73 Cards in this Set
 Front
 Back
1) what does "p" stand for?
2) what is the formula for calculating "p"?
3) what is the range of "p"?
4) what do larger/smaller values mean?
5) many tests retain items with moderate difficulty levels. What is "p" at this level?
1) item difficulty index
2) total # examinees passing item / total # examinees
3) 0 to 1.0
4) larger = easier item; smaller = more difficult item
5) p is close to .50
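The formula on this card can be checked with a quick sketch (the pass/fail data below are made up for illustration):

```python
# Item difficulty index p, sketched with made-up pass/fail data (1 = pass, 0 = fail)
responses = [1, 1, 0, 1, 0, 1, 1, 0]

p = sum(responses) / len(responses)   # examinees passing / total examinees
print(p)  # 0.625 -> a fairly easy item (larger p = easier)
```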

when p is close to .50:
1) test score variability increases/decreases?
2) test reliability increases/decreases?
3) discrimination between examinees increases/decreases?
4) what is the distribution?
1) increases
2) increases
3) increases
4) normal

1) while most tests look for moderate difficulty levels (p = .50), what type of test prefers retaining easier items (higher p) to offset guessing?
2) what value of p is usually optimal for these tests?
1) true/false
2) .75 (halfway between the chance-level p of .50 and 1.0; most other tests look for moderate difficulty, p = .50)

1) what is item discrimination?
2) to measure item discrimination, you calculate __. What is the symbol for this index?
3) how is it calculated?
4) what is the range of this index?
5) an item with what index is generally considered acceptable?
1) extent to which a test item discriminates BETWEEN examinees
2) item discrimination index (D)
3) D = U − L, where U = % of examinees in the upper-scoring group who answered correctly and L = % of examinees in the lower-scoring group who answered correctly
4) −1.0 to +1.0
5) .35 or higher

1) What is "D"?
2) when D = +1.0, what does this mean?
3) when D = −1.0, what does this mean?
1) D = item discrimination index
2) all of the upper-scoring group got the item correct; none of the lower-scoring group did
3) all of the lower-scoring group got the item correct; none of the upper-scoring group did
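The D index from these cards, sketched with hypothetical group proportions (the .80 and .30 values are illustrative, not from the cards):

```python
# Item discrimination index D = U - L, with hypothetical group proportions
U = 0.80  # proportion of the upper-scoring group answering the item correctly
L = 0.30  # proportion of the lower-scoring group answering the item correctly

D = round(U - L, 2)
print(D)  # 0.5 -> above .35, so the item discriminates acceptably
```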

classical test theory vs. item response theory
1) which is sample invariant (same across different samples)?
2) which is sample dependent (varies from sample to sample)?
1) item response theory
2) classical test theory

1) Item response theory involves deriving __ for EACH item
2) what does it show?
1) item characteristic curve
2) the relationship between ability level AND the probability of answering the item correctly

On an item characteristic curve
1) what is on the x axis? y axis?
2) how do you determine difficulty level?
3) how do you determine the item's ability to discriminate btwn high and low achievers?
4) how do you determine the probability of guessing correctly?
1) x = ability level; y = probability of correct response
2) find where on the curve 50% got the correct response, then look for the corresponding ability level
3) slope of the curve
4) y-intercept (point at which the curve intercepts the vertical axis): the proportion of people with low ability who answered the item correctly

on an item characteristic curve, what does a steep slope indicate

the steeper the slope, the greater the item's ability to discriminate btwn high and low achievers


when using an achievement test developed on the basis of item response theory, an examinee's test score will indicate __

ability level


in classical test theory, an examinee's score is composed of 2 components: __ & __

true score and error


reliability provides an estimate of the proportion of variability in an examinee's score that is due to __

TRUE differences among examinees on attributes measured by test


reliability coefficient
1) range?
2) what does a low "r" mean?
3) what does a high "r" mean?
1) 0.0 to +1.0
2) r = 0 → all variability in scores is due to error
3) r = +1 → all variability reflects true score variability (perfectly reliable)

1) r with subscript "xx" stands for __
2) r with subscript "xy" stands for __
1) reliability coefficient
2) validity coefficient

a reliability coefficient of .84 indicates that
1) __% of variability in scores is due to TRUE score differences
2) __% is due to error
1) 84%
2) 16%

which method for estimating reliability is associated with:
1) degree of stability (consistency)?
2) coefficient of equivalence?
3) coefficient alpha?
1) test-retest
2) alternate forms
3) internal consistency

test-retest reliability
1) what is the primary source of measurement error?
2) it is inappropriate for determining the reliability of a test measuring what type of attribute?
1) time sampling factors
2) an attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)

which method for estimating reliability can be used for the following:
1) aptitude?
2) mood?
3) speeded test?
1) test-retest, alternate forms, internal consistency
2) internal consistency
3) alternate forms

alternate forms reliability
1) 2 primary sources of measurement error?
2) it is inappropriate for determining the reliability of a test measuring what type of attribute?
1) content sampling and time sampling factors
2) an attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)

internal consistency reliability
1) methods for evaluating?
2) not appropriate for assessing the reliability of what type of test? It will produce a coefficient that is too high/low?
1) split-half and coefficient alpha
2) speeded; too high

1) split-half reliability assesses what type of reliability?
2) it usually under/over?estimates a test's true reliability
3) how is this corrected?
1) internal consistency
2) under
3) Spearman-Brown prophecy formula

As the length of a test decreases, the reliability decreases/increases?

decreases
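The Spearman-Brown prophecy formula from the neighboring cards makes this relationship concrete; a minimal sketch with illustrative values (r = .60 is made up):

```python
# Spearman-Brown prophecy formula: predicted reliability when a test's
# length changes by a factor n
def spearman_brown(r, n):
    """r = current reliability; n = new length / old length."""
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.60, 2), 2))    # doubling the test: 0.75
print(round(spearman_brown(0.60, 0.5), 2))  # halving the test: 0.43
```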


1) Cronbach's coefficient alpha assesses what type of reliability?
2) it provides the lower/upper? boundary of a test's reliability
1) internal consistency
2) lower

1) KR-20 is used for what type of reliability?
2) it is a variation of what other method?
3) how does it differ?
1) internal consistency
2) coefficient alpha
3) KR-20 is used when items are scored dichotomously
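A minimal sketch of the KR-20 computation for dichotomously scored items, using a made-up response matrix (rows = examinees, columns = items):

```python
# KR-20 = (k / (k-1)) * (1 - sum(p_i * q_i) / variance of total scores)
def kr20(item_matrix):
    k = len(item_matrix[0])  # number of items
    n = len(item_matrix)     # number of examinees
    p = [sum(col) / n for col in zip(*item_matrix)]  # proportion passing each item
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # population variance of totals
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical data: 4 examinees, 3 dichotomously scored items
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```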

which method for evaluating internal consistency reliability is used when items are scored dichotomously

KR-20


coefficient alpha
1) as the test content becomes more heterogeneous, coefficient alpha increases/decreases?
1) decreases


what correlation coefficient is used with interrater reliability?

kappa statistic


for interrater reliability, percent agreement will provide an over/under? estimate of the test's reliability

overestimate


consensual observer drift will artificially inflate/deflate? interrater reliability

inflate


1) what is the most thorough method for estimating reliability?
2) which method is NOT appropriate for speeded tests?
1) alternate forms
2) internal consistency

what method is used to estimate the effects of lengthening and shortening a test on its reliability coefficient

Spearman-Brown


Spearman-Brown tends to over/under?estimate a test's true reliability

overestimate


when the range of scores is restricted, the reliability coefficient is high/low

low


is the reliability coefficient high or low when:
1) items have low difficulty (easy items)?
2) items have average difficulty?
3) items have high difficulty?
1) low (easy items reduce score variability)
2) high (p close to .50 maximizes variability and reliability)
3) low (difficult items reduce score variability)

to maximize reliability coefficient
1) increase/decrease test length?
2) increase/decrease range of scores?
3) increase/decrease heterogeneity among examinees?
4) increase/decrease the probability of guessing correctly?
5) p should be close to __
1) increase
2) increase
3) increase
4) decrease
5) .50 (average item difficulty)

a reliability coefficient of __ is considered acceptable

.80 or larger


1) what is the standard error of measurement?
2) what is the standard error of estimate?
1) used to construct a confidence interval around a measured (obtained) score
2) used to construct a confidence interval around a predicted (estimated) criterion score

what is the formula for
1) standard error of measurement?
2) standard error of estimate?
1) SEM = SD × √(1 − reliability coefficient) — note: the reliability coefficient is NOT squared
2) SEE = SD × √(1 − validity coefficient²)
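The two formulas can be checked directly with illustrative numbers (an IQ-style SD of 15, the .84 reliability from a nearby card, and a hypothetical validity coefficient of .60); the reliability coefficient is not squared in the SEM, while the validity coefficient is squared in the SEE:

```python
import math

SD, r_xx, r_xy = 15, 0.84, 0.60   # illustrative values

SEM = SD * math.sqrt(1 - r_xx)        # reliability NOT squared
SEE = SD * math.sqrt(1 - r_xy ** 2)   # validity IS squared
print(round(SEM, 2))  # 6.0
print(round(SEE, 2))  # 12.0
```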

obtained test scores tend to be inaccurate estimates of true scores
1) scores ABOVE the mean tend to over/under?estimate true scores
2) scores BELOW the mean tend to over/under?estimate true scores
1) over
2) under

when the standard error of measurement = __,
an examinee's obtained scores can be interpreted as her true score 
0


which of the following would be most appropriate for estimating reliability for anxiety?
1) test-retest
2) alternate forms
3) coefficient alpha
3) coefficient alpha (anxiety is unstable over time, ruling out test-retest and alternate forms)


what are the minimum and maximum values of the standard error of measurement

minimum = 0
maximum = SD of test scores

how do you establish content validity

judgment of subject matter experts


1) what are the types of construct validity?
2) what methods are used to assess them?
1) convergent and discriminant
2) multitrait-multimethod matrix AND factor analysis

multitrait-multimethod matrix
1) which coefficient provides evidence of convergent validity? Is the coefficient large/small?
2) which coefficient provides evidence of discriminant validity? Is the coefficient large/small?
1) large monotrait-heteromethod
2) small heterotrait-monomethod OR small heterotrait-heteromethod

what does a factor analysis assess?

construct validity (convergent and discriminant)


In a factor matrix, correlation coefficients (factor loadings) indicate the degree of association btwn __ and __

each test and each factor


a test has a factor loading of .78 for Factor I. This means that __% of variability in test scores is accounted for by Factor I.

61% (.78 squared)


what is communality

total amount of variability in test scores explained by identified factors


Communality for a test is .64
This means that __% of variability in scores is explained by a combination of identified factors 
64%
NOT squared, because it is already a squared value: communality IS the amount of shared variance.

according to factor analysis, a test's reliability consists of what two components

communality and specificity
communality = factors tests share in common; specificity = factors specific to the test (not measured by other tests)

a communality is a lower/upper? limit estimate of a test's reliability coefficient

lowerlimit


if a test has a communality of .64, the reliability coefficient will necessarily be __

.64 or larger


two types of rotations in a factor analysis
1) which one is associated with uncorrelated factors?
2) with correlated factors?
1) orthogonal
2) oblique

when factors are orthogonal/oblique?, a test's communality can be calculated from factor loadings

orthogonal


In factor analysis, when factors are orthogonal, how do you calculate communality?

communality = SUM of squared factor loadings
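The sum-of-squared-loadings rule sketched with hypothetical loadings for one test on two orthogonal factors (the .78 echoes the earlier card; the .10 is made up):

```python
# Communality under an orthogonal rotation = sum of squared factor loadings
loadings = [0.78, 0.10]   # hypothetical loadings on Factor I and Factor II

communality = sum(l ** 2 for l in loadings)
print(round(communality, 4))  # 0.6184
```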


when a criterion-related validity coefficient is large, what does this indicate?

the predictor has criterion-related validity


what are the forms of criterion-related validity?

concurrent and predictive


1) validity coefficients rarely exceed __
2) validity coefficients as low as __ might be acceptable
1) .60
2) .20 to .30

how do you evaluate a predictor's incremental validity

scatterplot


for a scatterplot used to assess incremental validity, what determines:
1) positive/negative?
2) true/false?
1) predictor
2) criterion

how do you calculate incremental validity? How is each component calculated?
incremental validity = positive hit rate − base rate
base rate = (true positives + false negatives) / total # of people
positive hit rate = true positives / total positives
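The calculation above, sketched with a hypothetical 2×2 selection table (all counts are made up):

```python
# Incremental validity from a hypothetical selection table
TP, FP, FN, TN = 30, 10, 20, 40   # true/false positives and negatives
total = TP + FP + FN + TN

base_rate = (TP + FN) / total        # successful people / everyone
positive_hit_rate = TP / (TP + FP)   # true positives / total positives
incremental_validity = positive_hit_rate - base_rate
print(incremental_validity)  # 0.25
```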

if incremental validity = .34
the test can be expected to increase the proportion of successful employees by __%
34


relationship btwn reliability and validity
1) what is the equation relating a predictor's reliability to its validity?
2) what is the equation relating BOTH predictor and criterion reliability to validity?
1) a predictor's criterion-related validity coefficient is less than or equal to (cannot exceed) the square root of its reliability coefficient
2) the predictor's validity coefficient is less than or equal to (cannot exceed) the square root of the predictor's reliability coefficient TIMES the criterion's reliability coefficient

If a predictor has a reliability coefficient of .81, its validity coefficient will necessarily be __ (exact number)

.90 or less
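Both ceilings from the preceding cards, sketched with the .81 reliability from this card and a hypothetical criterion reliability:

```python
import math

r_xx = 0.81   # predictor reliability (from the card)
r_yy = 0.64   # criterion reliability (made-up value)

ceiling_predictor_only = math.sqrt(r_xx)         # validity cannot exceed this
ceiling_with_criterion = math.sqrt(r_xx * r_yy)  # tighter bound using both
print(round(ceiling_predictor_only, 2))  # 0.9
print(round(ceiling_with_criterion, 2))  # 0.72
```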


1) what is the correction for attenuation formula used for?
2) does it tend to over/under?estimate the actual validity coefficient?
1) to estimate what a predictor's validity coefficient WOULD be if the predictor and/or criterion were perfectly reliable (reliability coefficients = 1.0)
2) overestimate

criterion contamination
1) tends to inflate the relationship btwn __
2) results in an artificially high __ coefficient
1) predictor and criterion
2) criterion-related validity

when cross-validating a predictor on another sample, the cross-validation coefficient tends to __

shrink


"shrinkage" refers to the shrinking of __ when __

the validity coefficient, when the predictor is cross-validated on a new sample


norm-referenced vs. criterion-referenced
Which are the following:
1) percentile ranks?
2) percentages?
3) regression equation?
4) z-score?
5) IQ score?
1) norm
2) criterion
3) criterion
4) norm
5) norm

distribution of percentile ranks has what kind of shape

flat (rectangular)
regardless of the shape of the raw score distribution 
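The flat shape can be demonstrated with a sketch: rank-based percentile ranks of any raw-score distribution (here, made-up normally distributed scores) spread examinees evenly across the 0–100 range.

```python
import random

random.seed(0)
raw = [random.gauss(100, 15) for _ in range(1000)]   # raw-score shape is irrelevant

# Assign each score its rank-based percentile
order = {score: i for i, score in enumerate(sorted(raw))}
percentile_ranks = [100 * (order[x] + 0.5) / len(raw) for x in raw]

# Each decile of percentile ranks holds exactly 10% of examinees -> flat histogram
decile_counts = [sum(1 for pr in percentile_ranks if d <= pr < d + 10)
                 for d in range(0, 100, 10)]
print(decile_counts)  # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
```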

what is the transformation called when:
1) the distribution of transformed scores DIFFERS in shape from the distribution of raw scores?
2) has the SAME shape?
3) example of the first?
1) nonlinear transformation
2) linear transformation
3) percentile ranks

when using a correction for guessing, the resulting distribution will have (compared to the original distribution)
1) lower/higher mean?
2) smaller/larger SD?
1) lower
2) larger