61 Cards in this Set
Def: reliability
- Repeatable and consistent
- Free from error
- Reflects 'true score'

Def: validity
- Measures what it says it does

Def: power test
- Assesses the attainable level of difficulty
- No time limit
- Graduated difficulty, from Qs that everyone can do to Qs that no one can do
- Eg: WAIS Information subtest

Def: ipsative measures
- Scores reported in terms of relative strength within the individual
- Preference is expressed for one item over another

Def: mastery test
- Cutoff at a predetermined level of performance

Def: normative measures
- Absolute strength measured
- All items answered
- Comparison among people possible

Range and interpretation of a reliability coefficient
- Ranges from 0 (unreliable) to 1 (perfectly reliable)
- .9 means 90% of the variance is accounted for
- You do NOT square a reliability coefficient

Factors affecting reliability coefficient
- Anything reducing the range of obtained scores (eg a homogeneous population)
- Anything increasing measurement error
- Short (vs long) tests
- Presence of floor or ceiling effects
- High probability of guessing a correct answer

Factors affecting test-retest reliability
- Maturation
- Differences in conditions
- Practice effects

Measures of internal consistency
- Split-half: divide the test in two and correlate scores on the subtests; sensitive to the selection strategy
- Coefficient alpha: used with multiple-choice questions (see sketch below)
- Kuder-Richardson Formula 20 (KR-20): used for questions with dichotomous answers
- Reliability increases with item homogeneity

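A minimal sketch of how coefficient alpha can be computed, assuming a made-up examinee-by-item score matrix; with dichotomous (0/1) items like these, alpha equals KR-20:

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha for an examinee-by-item score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees answering 4 dichotomous items.
scores = [[1, 1, 1, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(coefficient_alpha(scores), 2))  # 0.79
```
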
Utility of internal consistency measures
- Appropriate for unstable traits, where retesting would be confounded
- Not good for speed tests
- Sensitive to item content / sampling

Appropriate measure of speed test reliability
- Test-retest
- Alternate forms

Measure of inter-rater reliability
- Kappa coefficient

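A minimal sketch of the kappa coefficient for two raters, assuming made-up categorical ratings; kappa is observed agreement corrected for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick a category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two raters over 10 observations.
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(a, b), 2))  # 0.57
```
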
Factors improving inter-rater reliability
- Well-trained raters
- Explicit observation of the raters
- Mutually exclusive and exhaustive scoring categories

Def: interval recording
- All behavior within a specified period of time

Def: standard error of measurement
- How much error is expected in an individual test score

Formula: standard error of measurement *
- SE = SD × √(1 − r), where r is the reliability coefficient (0 to 1)
- Eg: SD = 15, r = .89 → SE = 15 × √.11 ≈ 5

Use: standard error of measurement
- Construction of a confidence interval around an obtained score

Probability of scores falling within a specified confidence interval
- 68%: ±1 SE
- 95%: ±1.96 SE
- 99%: ±2.58 SE

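A minimal sketch combining the SEM formula with these multipliers to build a confidence interval around an obtained score; the score, SD, and reliability values are made up:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """CI around an obtained score; z = 1.0, 1.96, or 2.58 for 68/95/99%."""
    e = sem(sd, reliability)
    return score - z * e, score + z * e

# Hypothetical IQ score of 110 on a test with SD 15 and reliability .89.
low, high = confidence_interval(110, 15, 0.89)  # 95% interval
print(f"SEM = {sem(15, 0.89):.2f}; 95% CI = ({low:.1f}, {high:.1f})")
```
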
Use: eta *
- Correlation between continuous variables whose relationship is non-linear

Def: types of criterion related validity
- Concurrent validity: predictor and criterion scores collected at the same time; useful for diagnostic tests
- Predictive validity: predictor scores collected first, criterion scores later; useful for eg job selection tests

Factors affecting criterion related validity
- Restricted range of scores
- Unreliability of the predictor or criterion
- Regression
- Criterion contamination

Def: criterion contamination
- Occurs when the person assessing the criterion knows an individual's predictor score

Def: convergent/divergent analysis
- Convergent validity: high correlation between different measures of the same construct
- Divergent validity: low correlation between measures of different constructs

Relationship between reliability and validity
- The criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient
- The reliability coefficient therefore sets a ceiling on the validity coefficient (eg reliability .64 → maximum validity √.64 = .80)

Def: face validity
- Appearance of validity to test takers, administrators, and other untrained people

Def: criterion related validity coefficient
- Pearson r correlation between predictor and criterion
- Acceptable magnitudes typically fall between .3 and .6

Differences between standard error of measurement and standard error of estimate
- Standard error of measurement: related to the reliability coefficient; used to estimate the true score on a given test
- Standard error of estimate: determines where a criterion score will fall given a predictor score

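For comparison, the standard formula for the standard error of estimate in the simple two-variable case is SEest = SDY × √(1 − rXY²), where SDY is the criterion's standard deviation and rXY is the validity coefficient; eg SDY = 10 and rXY = .6 give SEest = 10 × √.64 = 8.
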
Def: shrinkage
- Reduction in the validity coefficient on cross-validation (revalidation with a second sample)
- A result of noise in the original sample

Factors affecting shrinkage
- Small original validation sample
- Large original item pool
- Small proportion of items retained
- Items not sensibly chosen

Def: construct validity
- Extent to which a test successfully measures an unobservable, abstract concept such as IQ

Techniques for assessing construct validity
- Convergent validity techniques: high correlation on a trait even with different methods
- Divergent / discriminant validity techniques: low correlation on different traits even with the same method
- Factor analysis

Def: factor loading
- Correlation between a given test and a factor derived from a factor analysis
- Can be squared to give the % of the test's variance accounted for by the factor

Def: communality (factor analysis)
- The proportion of a test's variance accounted for by the factors
- Sum of the squared factor loadings
- Interpreted directly, ie .4 = 40%
- Only valid when factors are orthogonal

Def: unique variance (factor analysis)
- Variance not accounted for by the factors
- u² = 1 − h², where h² is the communality

Def: eigenvalue
- Explained variance = sum of the squares of the loadings on a factor
- Sum of the eigenvalues ≤ number of tests
- Applies to unrotated factors only

Formula to convert eigenvalue to %
- % of variance = eigenvalue × 100 / number of tests

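A minimal sketch tying the factor-analysis cards together, assuming a made-up orthogonal loading matrix for three tests on two factors; it computes communalities, unique variances, eigenvalues, and the eigenvalue-to-% conversion:

```python
import numpy as np

# Hypothetical orthogonal factor loadings: rows = tests, columns = factors.
loadings = np.array([[0.8, 0.3],
                     [0.6, 0.5],
                     [0.2, 0.7]])

communality = (loadings ** 2).sum(axis=1)   # h^2: per-test sum of squared loadings
uniqueness = 1 - communality                # u^2 = 1 - h^2
eigenvalues = (loadings ** 2).sum(axis=0)   # per-factor sum of squared loadings
pct_variance = eigenvalues * 100 / loadings.shape[0]  # eigenvalue -> % of variance

print("h^2:", communality.round(2))          # [0.73 0.61 0.53]
print("u^2:", uniqueness.round(2))           # [0.27 0.39 0.47]
print("% variance:", pct_variance.round(1))  # [34.7 27.7]
```
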
Types of rotation (factor analysis) *
- Orthogonal: factors uncorrelated
- Oblique: factors correlated
- Choice depends on what you believe the relationship is among the factors

Differences between principal components analysis and factor analysis
- In principal components analysis: components are always uncorrelated; variance = explained + error
- In factor analysis: variance = common + specific + error

Use: cluster analysis
- Categorize or taxonomize a set of objects

Differences between cluster analysis and factor analysis
- Cluster analysis: all types of data; clusters interpreted as categories
- Factor analysis: interval or ratio data only; factors interpreted as underlying constructs

Def: correction for attenuation
- Estimate of how much more valid a predictor would be if it and the criterion were perfectly reliable

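The standard correction-for-attenuation formula: rcorrected = rXY / √(rXX × rYY), where rXX and rYY are the reliabilities of predictor and criterion; eg an observed validity of .40 with reliabilities .80 and .70 corrects to .40 / √.56 ≈ .53.
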
Def: content validity
- Adequate sampling of the relevant content domain

To reduce the number of false positives...
- Raise the predictor cutoff
- and/or lower the criterion cutoff

Def: false negative
- Predicted not to meet the criterion but in reality does

Def: item difficulty or difficulty index *
- % of examinees answering the item correctly
- An ordinal value: an item with an index of .2 is not necessarily twice as difficult as one with an index of .4

Def: item discriminability
- Degree to which an item differentiates between low and high scorers
- D = difference between the % answering correctly in the high and low scoring groups
- Ranges from −100 to 100
- Moderate difficulty is optimal

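A minimal sketch computing both indices from a made-up 0/1 score matrix, splitting examinees into top and bottom halves by total score (a simple stand-in for the usual upper/lower scoring groups):

```python
import numpy as np

# Hypothetical dichotomous scores: rows = examinees, columns = items.
scores = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0],
                   [1, 1, 1],
                   [0, 1, 0]])

difficulty = scores.mean(axis=0)      # p: proportion answering each item correctly

order = scores.sum(axis=1).argsort()  # rank examinees by total score
half = len(order) // 2
low, high = scores[order[:half]], scores[order[-half:]]
# D: proportion correct in the high group minus proportion correct in the low group.
discrimination = high.mean(axis=0) - low.mean(axis=0)

print("difficulty p:", difficulty.round(2))          # [0.67 0.67 0.33]
print("discrimination D:", discrimination.round(2))  # [0.67 0.67 0.67]
```
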
Target values for item difficulty by objective
- .5 for most tests
- .25 for a high cutoff (matching the selection %)
- .8 or .9 for mastery tests
- When guessing is possible, halfway between chance and 1, eg .75 for true/false exams

Relationship between item difficulty and discriminability
- Difficulty creates a ceiling for discriminability
- Difficulty of .5 allows maximum discriminability
- The greater the mean discriminability, the greater the reliability

What can you determine from an item response (aka item characteristic) curve?
- Difficulty: the point where p(correct response) = .5
- Discriminability: the slope of the curve; steeper = more discriminable
- Probability of a correct guess: the intersection with the y-axis

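The cards don't name a specific model; a common choice is the three-parameter logistic (3PL) curve sketched below, where b is difficulty, a is discrimination (steeper slope = more discriminable), and c is the lower asymptote (chance of a correct guess). All parameter values are made up; note that with c > 0, p at theta = b is (1 + c)/2 rather than exactly .5.

```python
import math

def icc(theta, a=1.2, b=0.0, c=0.25):
    """3PL item characteristic curve: P(correct answer | ability theta)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Larger a = steeper slope = more discriminating; c is the guessing floor.
for theta in (-3, -1, 0, 1, 3):
    print(f"theta={theta:+d}  p={icc(theta):.2f}")
```
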
Def: computer adaptive assessment
- Computerized selection of test items based on periodic estimates of ability

What are the advantages of a test item of moderate difficulty (p = .5)?
- Increases variability, which increases reliability and validity
- Maximally differentiates between low and high scorers

Techniques for assessing an item's discriminability
- Correlation with the total score
- Correlation with an external criterion

What are the mean and std deviation for the following standard scores: z, t, stanine and deviation IQ?
- z: mean 0, SD 1
- t: mean 50, SD 10
- Stanine: mean 5, SD ~2
- Deviation IQ: mean 100, SD 15

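A minimal sketch converting a raw score into each of these standard scores, assuming a made-up raw-score mean and SD; the stanine is a z-band rounded and clipped to 1–9:

```python
def to_standard_scores(raw, mean, sd):
    """Convert a raw score into z, t, deviation IQ, and stanine."""
    z = (raw - mean) / sd
    t = 50 + 10 * z                              # mean 50, SD 10
    iq = 100 + 15 * z                            # mean 100, SD 15
    stanine = max(1, min(9, round(5 + 2 * z)))   # mean 5, SD ~2, clipped to 1..9
    return z, t, iq, stanine

# Hypothetical raw score of 65 on a test with mean 50 and SD 10.
z, t, iq, stanine = to_standard_scores(65, 50, 10)
print(f"z={z:.1f}  t={t:.0f}  IQ={iq:.0f}  stanine={stanine}")
```
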
The difference between norm-referenced and criterion referenced scores
- Norm-referenced: comparison to others in a sample
- Criterion-referenced: measured against an external criterion

Characteristics of alternate forms reliability coefficient
- Considered the best coefficient, because to be high it must reflect consistency across both time and content
- Likely to be lower in magnitude than other reliability coefficients

Def: moderator variable
- Variables affecting the validity of a test
- A moderator variable confers differential validity on the test

Def: 'testing the limits' in dynamic assessment
- Following a standardized test, using hints to elicit correct performance; the more hints necessary, the more severe the learning disability

Contents of the Mental Measurements Yearbook
- Author
- Publisher
- Target population
- Administration time
- Critical reviews

Effect on the floor of adding easy questions to a test *
- Will lower the floor (extends measurement downward, allowing discrimination among low scorers)

Def: dynamic assessment
- A variety of procedures following standardized testing to obtain further information; usually used with learning disability or mental retardation