39 Cards in this Set

  • Front
  • Back
Classical Test Theory
Any observed score is a combination of true score and error (X = T + E)
Classical Test Theory: Variability
Total variability in a group is a combination of true score variability (differences between test takers; the 'reliable' part) and error variability.
Sources of Error in Variability
Content Sampling (items do or don't tap into the domain, by chance)

Time Sampling

Test Heterogeneity (the more domains tapped into, the more error due to chance)
Reliability Coefficient (rxx or rtt)

Vs. Pearson R
rxx/rtt
Range: 0.00 to 1.00
Acceptable reliability: .80 or higher
Interpreted directly: rxx = .80 means 80% of the variability is true score variability

Pearson R: range -1.00 to +1.00
Must be squared to get the proportion of shared (true) variance
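A quick sketch of the contrast (the .80 values are illustrative):

```python
# Reliability coefficient: read directly as the proportion of true score variance.
r_xx = 0.80
print(f"true score variance: {r_xx:.0%}")   # 80%

# Pearson r: must be squared to get the proportion of shared variance.
r_xy = 0.80
print(f"shared variance: {r_xy ** 2:.0%}")  # 64%
```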
Factors Affecting Reliability
Number of Items (more items = higher reliability)

Range of Scores (full, unrestricted range = higher reliability)

Homogeneity of Items (more homogeneous items = higher reliability)

Ability to Guess (easier guessing = lower reliability)
Estimates of Reliability:
Test-Retest reliability (AKA coefficient of stability)
Same test given twice to the same group; is the test steady over time?
Source of Error: Time Sampling
Estimates of Reliability:
Parallel Forms Reliability (AKA Coefficient of Equivalence)
Two different forms given to one group.
Source of Error: Time sampling, content sampling
Estimates of Reliability:
Internal Consistency Reliability
-Split-half reliability

-Kuder-Richardson / Cronbach's coefficient alpha
Consistency across items. Given one time to one group.
1. Split half reliability
Spearman-Brown Prophecy Formula: estimates how much more reliable the test would be with all items (vs. the half-test, which has lower reliability); see the sketch below.
-Inappropriate for speeded (vs. 'power') tests.

Source of error: Item/content sampling
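A minimal sketch of the Spearman-Brown projection (the function name and the .70 example value are illustrative, not from the cards):

```python
def spearman_brown(r_half, k=2):
    """Projected reliability when test length is multiplied by k.

    k=2 projects full-test reliability from a split-half correlation.
    """
    return k * r_half / (1 + (k - 1) * r_half)

print(round(spearman_brown(0.70), 2))  # 0.82: the full test beats its halves
```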
Estimates of Reliability:
Internal Consistency Reliability
-Kuder-Richardson (KR-20 & KR-21) & Cronbach's coefficient alpha
Compares all possible halves.
KR-20/21 used for dichotomous data (KR-20 if items vary in difficulty, KR-21 if not)

Cronbach's coefficient alpha: for non-dichotomous data (Likert-type questions)
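A minimal sketch of coefficient alpha, assuming a respondents-by-items score matrix (with 0/1 items this reduces to KR-20):

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2D array, rows = respondents, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]  # 4 respondents, 3 items
print(round(cronbach_alpha(data), 2))  # 0.75
```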
Inter-rater Reliability
Used when ratings are subjectively scored.
Calculated with Pearson r, Cohen's kappa, or Yule's Y.

Improved by: group discussion, practice exercises, feedback.
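A minimal sketch of Cohen's kappa for two raters (chance-corrected agreement; the ratings shown are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

print(round(cohens_kappa("YYNNY", "YNNNY"), 2))  # 0.62
```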
Standard Error of Measurement
Average amount of measurement error in any given test

SEM = SDx × √(1 - rxx)
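A minimal sketch of the formula (SD = 15 and rxx = .84 are illustrative values, chosen to match the confidence-band example below):

```python
import math

def standard_error_of_measurement(sd, r_xx):
    """SEM = SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - r_xx)

print(round(standard_error_of_measurement(15, 0.84), 2))  # 6.0
```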
Standard Error of Measurement Range
0.0 (perfectly reliable test) up to the SD of the test (completely unreliable test), per the formula SEM = SDx × √(1 - rxx)
*Calculating Confidence Bands:
-Need to know (2 things):
The person's score on the test (ex. 120)
The test's standard error of measurement (ex. 6)
Add/subtract the standard error of measurement:
68%: 114-126 (±1 SEM)
95%: 108-132 (±2 SEM)
99%: 102-138 (±3 SEM)
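A minimal sketch using the card's ±1/2/3 SEM convention (a 95% band is more precisely ±1.96 SEM):

```python
def confidence_bands(score, sem):
    """68/95/99% bands: score +/- 1, 2, 3 SEMs."""
    return {pct: (score - k * sem, score + k * sem)
            for pct, k in ((68, 1), (95, 2), (99, 3))}

print(confidence_bands(120, 6))
# {68: (114, 126), 95: (108, 132), 99: (102, 138)}
```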
Validity - what are the three subtypes?
Content: Is the tool measuring the skills it should? (Expert-validated)

Criterion: Is the test an accurate predictor? 2 subtypes (concurrent/predictive)

Construct: Is the tool measuring the trait it should?
Criterion Related Validity:
Calculated by?

Subtypes definitions?
Pearson r - correlation of scores on X and Y (range -1.00 to +1.00); valid = .20 or higher

Variance is calculated by squaring: a validity of .50 squared gives .25, so 25% of the variability in Y scores is accounted for by X.
Criterion Related Validity:

2 Subtypes
Concurrent: Test and criterion are given/measured at about the same time

Predictive: Predictor is given long before the criterion is measured (ex. SAT → college GPA)
Review:

Standard Error of Mean

Standard Error of the Measurement

Standard Error of Estimate
Error in a group mean vs. the population mean

Error in scores on a given test

Error in predicting the criterion from the predictor
Standard Error of Estimate formula
SEest = SDy × √(1 - rxy²)
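A minimal sketch (SDy = 10 and rxy = .60 are illustrative values):

```python
import math

def standard_error_of_estimate(sd_y, r_xy):
    """SEest = SDy * sqrt(1 - rxy^2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

print(standard_error_of_estimate(10, 0.60))  # 8.0
```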
Standard Error of Estimate range
0 (perfect prediction) up to the SD of the criterion
Mnemonic: Estimate goes with Y (the criterion) - you cry over a repair estimate: 'Y so high?'

Measurement goes with X - that's it.
Criterion Related Validity Coefficient application:

Expectancy Table
Table showing the probability that a person with a given predictor score will fall within a given range on the criterion.
*Taylor Russell Tables
Base Rate, Selection Ratio, Incremental Validity
Table that outlines how much better hiring decisions will be when using a test vs. no test

Base Rate - optimal: .5 (moderate; 50% of employees successful). Selection Ratio - optimal: .1 (large pool; 10 applicants per opening)

Incremental Validity: Degree to which a new predictor will improve prediction of the criterion
*Taylor Russell Tables - Base Rate
Rate of successful employees without using any test.

ex. 80% turn out to be good employees (base rate = .80)
*Taylor Russell Tables - Selection Ratio
# of openings / # of applicants

ex. 1 opening, 10 applicants: 1/10 = .10 selection ratio (low)
Incremental Validity
Amount of improvement in successful hires when using the predictor test vs. no test
Taylor Russell table
Incremental Validity Calculation
Base rate (ex. .40) - 40% of hires are good without the test. With the test, 65% are good.
Incremental validity: .65 - .40 = .25 (amount of improvement)
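A minimal sketch of the arithmetic, using the card's example values:

```python
def incremental_validity(rate_with_test, base_rate):
    """Improvement in the success rate attributable to the test."""
    return rate_with_test - base_rate

print(round(incremental_validity(0.65, 0.40), 2))  # 0.25
```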
Taylor Russell Tables:
Factors affecting Incremental Validity
1. Criterion-related validity of the instrument (rxy)
2. Base rate
3. Selection ratio
How to optimize incremental validity
*Moderate base rate (.5)

*Low selection ratio (.1)
Decision Making Theory:
4 options in predictions
True positive (predicted success, succeeded)
False positive (predicted success, failed)
True negative (predicted failure, failed)
False negative (predicted failure, succeeded)
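A minimal sketch mapping one prediction/outcome pair to its decision-theory cell:

```python
def classify_decision(predicted_success, actual_success):
    """Return the decision-theory cell for one prediction and its outcome."""
    if predicted_success and actual_success:
        return "true positive"    # predicted success, succeeded
    if predicted_success:
        return "false positive"   # predicted success, failed
    if actual_success:
        return "false negative"   # predicted failure, succeeded
    return "true negative"        # predicted failure, failed

print(classify_decision(True, False))  # false positive
```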
Decision Making Theory:
How do you decrease false positives?
Raise the predictor cutoff (1st choice; sometimes you can't change the criterion)

Lower the criterion cutoff
Developing a Predictor Test
1. Conceptualization (objective, administration format, etc.)
2. Test construction: choose item format, write items
Item difficulty
Want it between .3 and .8 (30%-80% of test takers get it right)
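A minimal sketch, assuming responses are coded 1 = correct, 0 = incorrect:

```python
def item_difficulty(responses):
    """Proportion answering the item correctly (target: .30-.80)."""
    return sum(responses) / len(responses)

print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))  # 0.7
```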
Item Characteristic Curve
Shows the degree to which an item indicates that the test taker has the trait being measured
Item Response Theory
Used to develop individually tailored, adaptive tests; each answer determines the following items.
Test Revision
Items are retained after validation, then cross-validated on a new sample. Results in SHRINKAGE (the validity coefficient is smaller in the new sample).
Factors Affecting Validity Coefficient
Range of Scores (want a broad, unrestricted range)

Reliability of Predictor (caps validity)

Reliability of Predictor and Criterion (see correction for attenuation)

Criterion Contamination: Y (the criterion) is subjectively scored and the rater has knowledge of the predictor score. Inflates validity.
*Correction for Attenuation
Tells you how much more valid the instrument would be if X and/or Y were perfectly reliable
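A minimal sketch of the standard correction, r_xy / √(r_xx · r_yy) (the example values are illustrative):

```python
import math

def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Estimated validity if predictor (r_xx) and/or criterion (r_yy) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

print(round(correct_for_attenuation(0.40, r_xx=0.64, r_yy=0.81), 3))  # 0.556
```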
Construct Validity (trait)
2 subtypes
Convergent: degree to which the test aligns with instruments measuring the same trait (monotrait) - want moderate/high correlation

Divergent (discriminant): alignment with tests measuring different traits (heterotrait) - want low correlation
Construct Validity - what table shows different validity types?
Multitrait-multimethod matrix