105 Cards in this Set
- Front
- Back
Assumptions About Psychological Testing |
1.) Psychological States and Traits Exist 2.) Traits and States Can Be Quantified and Measured 3.) Test-related Behavior Predicts Non-Test-Related Behavior 4.) Tests Have Strengths and Weaknesses 5.) Error is Part of Assessment 6.) Testing Can Be Conducted in a Fair Manner 7.) Testing and Assessment Benefit Society |
|
Norm-Referenced Testing and Assessment |
a method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to scores of a group of test takers. |
|
Norms |
the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores (the normative sample is the reference group) |
|
Stratified Sampling |
Sampling that includes different subgroups, or strata, from the population -in stratified-random sampling, members are drawn at random from within each stratum, so each member of a stratum has an equal opportunity of being included in the sample |
|
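The two-step logic above (divide the population into strata, then draw randomly within each) can be sketched in Python; the grade-level strata and sample sizes here are hypothetical:

```python
import random

def stratified_random_sample(population, stratum_of, per_stratum, seed=0):
    """Group the population into strata, then draw members at random
    from within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical norm group: (student id, grade) pairs across two strata.
population = [(i, "9th" if i < 60 else "10th") for i in range(100)]
sample = stratified_random_sample(population, lambda m: m[1], per_stratum=5)
```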
Purposive Sample |
Arbitrarily selecting a sample that is believed to be representative of the population (not a random selection) |
|
Incidental/ convenience sample: |
A sample that is convenient or available for use. May not be representative of the population -generalization of findings must be made with caution |
|
Standardization |
the process of administering a test to a representative sample of test takers for the purpose of establishing norms |
|
Sampling |
when test developers select a population, for which the test is intended, that has at least one common, observable characteristic |
|
Percentile Norms |
percentile: the percentage of people whose score on a test or measure falls below a particular raw score -popular for tests because they are easily calculated -differences between raw scores may be minimized at the ends and exaggerated in the middle of the distribution |
|
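As a minimal sketch of the definition above (the norm-group scores are hypothetical):

```python
def percentile_rank(norm_scores, raw_score):
    """Percentage of people in the norm group whose score falls
    below the given raw score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)

# Hypothetical norm group of 10 scores.
norm_group = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]
rank = percentile_rank(norm_group, 72)  # 5 of 10 scores fall below 72
```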
Other Types of Norms |
-age norms -grade norms -national norms -national anchor norms -subgroup norms -local norms |
|
Fixed Reference Group Scoring Systems |
The distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations of the test |
|
Reliability |
consistency in measurement |
|
Reliability Coefficient |
an index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance. Observed score = true score plus error (X = T + E) |
|
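The ratio can be illustrated with toy numbers, assuming hypothetical true scores plus errors that happen to be uncorrelated with them:

```python
from statistics import pvariance

# X = T + E: each observed score is a hypothetical true score plus error.
true_scores = [10, 10, 14, 14]
errors      = [1, -1, 1, -1]   # mean zero, uncorrelated with true scores
observed    = [t + e for t, e in zip(true_scores, errors)]

# Reliability coefficient: true score variance / total observed variance.
reliability = pvariance(true_scores) / pvariance(observed)
```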
Variance |
true variance plus error variance (the standard deviation squared)
|
|
Measurement Error |
all the factors associated with the process of measuring some variable, other than the variable being measured. random error: caused by unpredictable fluctuations and inconsistencies of other variables. systematic error: error that is constant or proportional and can therefore be accounted for |
|
Sources of Error Variance |
-Test construction -Test administration -Test scoring and interpretation -sampling error -methodological error |
|
Test-Retest Reliability |
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test -good for variables that should be stable over time -estimates decrease over time -after 6 months, estimate is called the coefficient of stability |
|
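A sketch of the estimate, correlating hypothetical score pairs from two administrations:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from the same five people, tested twice.
time1 = [88, 92, 75, 80, 95]
time2 = [86, 94, 78, 79, 96]
r_test_retest = pearson_r(time1, time2)
```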
Parallel-Forms |
-for each form of the test, the means and the variances of observed test scores are equal |
|
Alternate-Forms |
different versions of a test that have been constructed so as to be parallel. do not meet the strict requirements of parallel forms, but typically item content and difficulty are similar between tests |
|
Coefficient of Equivalence |
the degree of the relationship between various forms of a test |
|
Split-Half Reliability |
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. step 1.) divide the test into equivalent halves step 2.) calculate a Pearson r between the two halves step 3.) adjust the half-test reliability using the Spearman-Brown formula |
|
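The three steps can be sketched as follows, using an odd-even split and a hypothetical matrix of 0/1 item scores:

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den

def split_half_reliability(item_scores):
    """item_scores: one row of item scores per test taker."""
    odd  = [sum(row[0::2]) for row in item_scores]   # step 1: split into halves
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)                    # step 2: Pearson r
    return (2 * r_half) / (1 + r_half)               # step 3: Spearman-Brown

# Hypothetical 0/1 scores: 3 test takers x 4 items.
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
```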
Spearman-Brown formula |
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test |
|
Inter-Item Consistency |
The degree of relatedness of items on a test. Able to gauge the homogeneity of a test |
|
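One widely used index of inter-item consistency is coefficient alpha; a minimal sketch (the score matrices are hypothetical):

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """item_scores: one row per test taker, one column per item.
    alpha = (k / (k - 1)) * (1 - sum of item variances / total variance)."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    sum_item_vars = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical 0/1 scores: a perfectly homogeneous two-item test.
homogeneous = [[1, 1], [0, 0], [1, 1], [0, 0]]
```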
Inter-Scorer Reliability |
The degree of agreement or consistency between two or more scorers with regard to a particular measure. Often used with behavioral measures, guards against biases. |
|
Classical Test Theory (CTT) |
most widely used true-score model due to its simplicity
|
|
True Score |
a value that according to CTT genuinely reflects an individual's ability or trait level as measured by a particular test |
|
Domain-Sampling Theory |
estimates the extent to which specific sources of variation under defined conditions are contributing to the test score
|
|
Generalizability Theory |
based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation |
|
Item-Response Theory (IRT) |
provides a way to model the probability that a person with X ability will be able to perform at a level of Y |
|
Discrimination |
the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or other variable being measured -considered alongside item difficulty, which relates to how easily an item is accomplished |
|
Standard Error of Measurement (SEM) |
provides a measure of the precision of an observed test score. an estimate of the amount of error inherent in an observed score or measurement -the higher the reliability, the lower the SEM |
|
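A common formula for the SEM is SD × √(1 − r); a sketch with hypothetical values:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

# Hypothetical IQ-style scale: SD = 15, reliability = .91.
error = sem(15, 0.91)
```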
Confidence Interval |
a range or band of test scores that is likely to contain the true score |
|
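Built from the SEM, assuming an approximately normal error distribution (values hypothetical; z = 1.96 gives roughly 95% confidence):

```python
from math import sqrt

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score that is likely to contain the true score."""
    sem = sd * sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

low, high = confidence_interval(observed=100, sd=15, reliability=0.91)
```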
Standard Error of Difference |
a measure that can aid a test user in determining how large a difference in test scores should be expected before it is considered statistically significant |
|
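A standard way to compute it combines the two tests' SEMs (the sd and reliabilities are hypothetical; both tests are assumed to be on the same score scale):

```python
from math import sqrt

def sed(sd, r1, r2):
    """Standard error of the difference: sqrt(SEM1^2 + SEM2^2)."""
    return sqrt(sd ** 2 * (1 - r1) + sd ** 2 * (1 - r2))

# A difference larger than ~1.96 SEDs is unlikely to be chance alone.
difference_needed = 1.96 * sed(15, 0.91, 0.84)
```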
SED Answers These Questions: |
1.) How did this individual's performance on test 1 compare with his or her performance on test 2? 2.) How did this individual's performance on test 1 compare with someone else's performance on test 1? 3.) How did this individual's performance on test 1 compare with someone else's performance on test 2? |
|
Validity |
a judgement or estimate of how well a test measures what it purports to measure in a certain context |
|
Validation |
the process of gathering and evaluating evidence about validity |
|
3 Types of Validity |
1.) Content Validity: a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in a test 2.) Criterion-Related Validity: a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests 3.) Construct Validity: a measure of validity arrived at by executing a comprehensive analysis of: a.) how scores on the test relate to other test scores and measures and b.) how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure |
|
Face Validity |
a judgement concerning how relevant the test items appear to be -self-report tests tend to be high, the Rorschach is low -low face validity can lead to low confidence in a test's validity |
|
Test Blueprint |
a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, etc. |
|
Criterion |
the standard against which a test or test score is evaluated. -a criterion should be relevant, valid, and uncontaminated, meaning it is not based in part on the predictor itself |
|
Concurrent Validity |
an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently) |
|
Predictive Validity: |
an index of the degree to which a test score predicts some criterion, or outcome, measure in the future. Tests are evaluated as to their predictive validity. |
|
Validity Coefficient |
a correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure -affected by restriction or inflation of range |
|
Incremental Validity |
the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use |
|
Expectancy Tables |
show the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion (pass vs fail categories) |
|
Evidence of Construct Validity |
-homogeneity: how uniform a test is in measuring a single construct -changes with age/changes over time -pretest/posttest changes: scores change as a result of experience -from distinct groups: scores vary because of membership in a group
|
|
Convergent Evidence |
scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established tests designed to measure the same/ similar construct |
|
Discriminant Evidence |
validity coefficient showing little relationship between test scores and other variables with which scores on the test should not theoretically be correlated |
|
Factor Analysis |
a new test should load on a common factor with other tests of the same construct |
|
Bias |
a factor inherent in a test that systematically prevents accurate, impartial measurement -implies systematic variation in tests -prevention during development is best cure |
|
Rating Error |
a judgement resulting from the intentional or unintentional misuse of a rating scale |
|
Halo Effect |
a tendency to give a particular person a higher rating than he or she objectively deserves because of a favorable overall impression |
|
Fairness |
The extent to which a test is used in an impartial, just, and equitable way |
|
Utility |
the usefulness or practical value of testing to improve efficiency |
|
Factors Affecting Utility |
-psychometric soundness: high criterion validity, higher utility -costs -benefits
|
|
Costs of Utility |
-money -time -place you're using -doctor insurance -skill of assessor -travel |
|
Benefits of Utility |
-conclusive outcome -improved well-being -better functioning in life -better vocational placement -increase in performance -assessment is worth it |
|
Utility Analysis |
a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment -ask question "what test will give us the most bang for our buck" |
|
Taylor-Russell Tables |
provide an estimate of the % of employees hired by the use of a particular test who will be successful at their jobs, given the test's validity (validity coefficient), the selection ratio (a # that represents the # of people to be hired vs. the # of people available to be hired), and the base rate (the % of people hired under the existing system for a particular position) |
|
Naylor-Shine Tables |
entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures |
|
Brogden-Cronbach-Gleser Formula |
used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument under specified conditions |
|
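A hedged sketch of the computation: the form shown here (N × T × r × SDy × mean z of those selected, minus total testing cost) and every input value are illustrative assumptions, not the only parameterization in use:

```python
def bcg_utility_gain(n_hired, avg_tenure_years, validity, sd_y,
                     mean_z_selected, n_applicants, cost_per_applicant):
    """Dollar utility gain = (N)(T)(r_xy)(SD_y)(mean z of selected)
    minus the total cost of testing all applicants (hypothetical form)."""
    benefit = n_hired * avg_tenure_years * validity * sd_y * mean_z_selected
    cost = n_applicants * cost_per_applicant
    return benefit - cost

# Hypothetical: hire 10 of 100 applicants, 2-year average tenure, r = .40,
# SD of job performance in dollars = $10,000, selected-group mean z = 1.0,
# test costs $25 per applicant.
gain = bcg_utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 25)
```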
Cronbach and Gleser presented |
1.) a classification of decision problems 2.) various selection strategies ranging from single-stage to sequential analysis 3.) quantitative analysis of the relationship between test utility, selection ratio, cost, and expected value of the outcome 4.) a recommendation that in some instances, job requirements be tailored to the applicant's ability instead of vice versa (adaptive treatment) |
|
Practical Considerations |
-the pool of job applicants (infinite vs finite) -the complexity of the job -the cut score in use -multiple cut scores (Student gets A, B, C, etc.) -multiple hurdles (need to answer a question right to move onto the next one) |
|
The Angoff Method |
judgements of experts are averaged to yield cut scores for the test -problems arise if there is low agreement between experts |
|
The Known Groups Method |
entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest -no standard guidelines exist for establishing the cut score |
|
IRT Based Method |
each item is associated with a particular level of difficulty -in order to "pass" the test, test taker must answer items that are deemed to be above minimum level of difficulty |
|
Method of Predictive Yield |
takes into account the # of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores |
|
Discriminant Analysis |
a family of statistical techniques used to shed light on the relationship between identified variables and naturally occurring groups |
|
Five Stages of Test Development |
1.) Conceptualization 2.) Construction 3.) Tryout 4.) Analysis 5.) Revision |
|
Scaling |
the process of setting rules for assigning numbers in measurement |
|
Rating Scales |
a grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude or emotion are indicated by the test taker |
|
Likert Scale |
Each item presents the test taker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum |
|
Scaling Methods |
-numbers can be assigned to responses to calculate test scores using a number of methods -unidimensional: one dimension is presumed to underlie the ratings -multidimensional: more than one dimension is thought to underlie the ratings
|
|
Method of Paired-Comparisons |
test takers must choose between two alternatives according to some rule (which is more justified) -test takers receive more points for choosing option deemed more justifiable by majority group of judges
|
|
Comparative Scaling |
Entails judgements of a stimulus in comparison with every other stimulus on the scale
|
|
Categorical Scaling |
stimuli are placed into one of two or more alternative categories |
|
Guttman Scale |
Items range sequentially from weaker to strong expressions of attitude, belief, or feeling being measured -respondents who agree with the stronger statements of the attitude will agree with milder statements |
|
Item Pool |
the reservoir or well from which items will or will not be drawn for the final version of a test |
|
Item Format |
includes variables like form, plan, structure, arrangement, and layout of individual questions |
|
Selected Response Format |
Items require test takers to select a response from a set of alternative responses |
|
Constructed Response Format |
items require test takers to supply or create the correct answer, not just select it |
|
Multiple Choice |
three components: 1.) a stem, the question 2.) a correct alternative or option, the answer 3.) incorrect alternatives referred to as distractors or foils |
|
Computerized Adaptive Testing (CAT) |
an interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items -provides economy in testing time and the # of items presented |
|
Floor Effect |
a test's diminished ability to distinguish test takers at the low end of the trait or ability being measured; your base level |
|
Ceiling Effect |
a test's diminished ability to distinguish test takers at the high end of the trait or ability being measured -in CAT, for example, the test may conclude after three consecutive wrong answers, establishing your ceiling |
|
Cumulatively Scored Test |
assumption that the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure |
|
Class Scoring |
responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way (diagnostic testing) |
|
Ipsative Scoring |
comparing a test taker's score on one scale within a test to another scale within that same test |
|
Test Tryout |
-should be tried out on the population for which it was designed -should be tried out on no fewer than 5 (and preferably 10) subjects per test item -should be administered consistently and fairly |
|
Item Analysis |
test developers use indexes of an item's difficulty, reliability, validity, and discrimination |
|
Item-Difficulty Index |
the proportion of respondents answering an item correctly |
|
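A minimal sketch (1 = correct, 0 = incorrect; the responses are hypothetical):

```python
def item_difficulty(responses):
    """Proportion of test takers answering the item correctly.
    (Despite the name, higher values mean an easier item.)"""
    return sum(responses) / len(responses)

p = item_difficulty([1, 1, 1, 0, 0])  # 3 of 5 answered correctly
```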
Item Reliability Index |
indication of the internal consistency of the scale -factor analysis can also provide this |
|
Item-Validity Index |
allows test developers to evaluate the validity of items in relation to a criterion measure |
|
Item-Discrimination Index |
indicates how adequately an item separates or discriminates between high scorers and low scorers on an entire test -a measure of the difference between the proportion of high scorers answering a question right and the proportion of low scorers answering the question right |
|
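A sketch of the index d = p(upper) − p(lower), here defining the comparison groups as the top and bottom 27% of total scorers (a common but not universal choice); all data are hypothetical:

```python
def item_discrimination(item_correct, total_scores, fraction=0.27):
    """d = proportion of high scorers answering the item right minus
    the proportion of low scorers answering it right."""
    n = max(1, int(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

# Hypothetical: only the 5 highest total scorers got this item right.
totals  = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
correct = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = item_discrimination(correct, totals)
```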
Item Characteristic Curves (ICC) |
a graphic representation of item difficulty and discrimination |
|
Other Considerations |
-guessing -item fairness -biased test items -speed tests |
|
Qualitative Methods |
techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures -e.g., in a "think aloud" administration, a question is read out loud and respondents are asked to verbalize their thoughts as they occur during testing |
|
Sensitivity Review |
items are examined in relation to fairness to all prospective test takers (check for offensiveness) |
|
Revision In Test Development |
items are evaluated as to their strengths and weaknesses -some may be replaced by items from item pool -revised tests will be tried again -once a test has been finalized, norms may be developed from the data and it is said to be standardized |
|
Cross-Validation |
the revalidation of a test on a sample of test takers other than those on whom test performance was originally found to be a valid predictor of some criterion |
|
Co-Validation |
a test validation process conducted on two or more tests using the same sample of test takers |
|
Anchor Protocol |
test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies |
|
Scoring Drift |
a discrepancy between scoring in an anchor protocol and the scoring of another protocol |
|
IRT Applications In Building/Revising Tests |
1.) evaluating existing tests for the purpose of mapping test revisions 2.) determining measurement equivalence across test taker populations 3.) developing item banks |
|
hey cas guess what |
YOU'RE AWESOME AS HECK OKAY GO YOU. YOU CAN DO THIS. YOU WILL DO AMAZING ON THIS TEST. ILY.
-cas |