24 Cards in this Set
- Front
- Back
- 3rd side (hint)
Reliability Coefficient
|
Measure of how much of an obtained score reflects true ability
-Interpret directly (0.70 means 70% of the score variance is true ability, 30% is error)
-A good test should have a reliability of at least 0.70 |
A measure of how much score is actual.....
|
|
Classical Test Theory
|
Results are:
1. True Score (Ability) -true variance
2. Some Error (Fatigue...) -error variance |
Two parts...
|
|
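A minimal sketch of the two-part idea above, assuming the usual classical test theory definition (reliability = true variance / total variance). The function name and the variance values are made up for illustration.

```python
def reliability_coefficient(true_variance, error_variance):
    """Proportion of obtained-score variance that is true variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical variances: 70 units of true variance, 30 units of error variance.
r_xx = reliability_coefficient(70.0, 30.0)
print(f"reliability = {r_xx:.2f}")
```

Read the result directly: the share of variance that is true ability, with the remainder being error.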
Reliability
|
-Establish reliability first (Test can be reliable but not valid.)
-Consistency |
Do you get the same results each time?
|
|
Validity
|
-Accuracy
|
Does test measure what it is supposed to?
|
|
Validity can not exceed...
|
the square root of reliability
|
|
|
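A quick sketch of the ceiling stated on this card: the highest possible validity coefficient is the square root of the reliability coefficient. The function name is an assumption for illustration.

```python
import math


def max_validity(reliability):
    """Upper bound on the validity coefficient for a given reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(reliability)


# Hypothetical test with a reliability of 0.81.
print(f"validity cannot exceed {max_validity(0.81):.2f}")
```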
Types of Reliability
|
1. Test-retest reliability (Coefficient of stability)
2. Alternate Forms (Considered the best but least used)
3. Internal Consistency (Compares test against itself) |
T-A-I
|
|
Types of Internal Consistency Reliability
|
1. Split-Half (split the test in two and correlate the halves; problem is that each half is shorter, which restricts the range)
-can use Spearman-Brown Prophecy Formula to estimate reliability for the full-length test
2. Inter-Item Consistency (compare the items on one test against each other in a systematic way)
-can use Cronbach's Alpha (compare each item individually against all the others) or Kuder-Richardson Formula 20 (special case of Cronbach's Alpha, use when you have dichotomous test items such as true/false or yes/no) |
S-I
|
|
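A rough sketch of the two formulas named on this card, using the standard textbook forms. The function names and the sample data are hypothetical. On dichotomous (0/1) items, Cronbach's Alpha computed this way gives the same value as KR-20, because the same variance convention is used in the numerator and the denominator.

```python
from statistics import variance


def spearman_brown(half_test_r, length_factor=2.0):
    """Project the reliability of a test lengthened by length_factor."""
    return (length_factor * half_test_r) / (1.0 + (length_factor - 1.0) * half_test_r)


def cronbach_alpha(scores):
    """scores: one row per test taker, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item (column) across test takers.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of the total scores across test takers.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical split-half correlation of 0.60, corrected to full length.
print(f"Spearman-Brown corrected r = {spearman_brown(0.60):.2f}")

# Hypothetical 0/1 item responses from five test takers on four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```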
Kappa Coefficient
|
Inter-rater reliability (agreement between raters)
|
IRR
|
|
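A minimal sketch of Cohen's kappa for two raters, assuming the standard formula: (observed agreement - chance agreement) / (1 - chance agreement). The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical yes/no (1/0) ratings of four cases by two raters.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```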
Standard Error of Measurement
|
-Based on reliability coefficient
-Try to get an idea of what a person's true ability is
-Based on a person's single score but has properties of a normal curve
-The more reliable the test, the smaller the standard error of measurement |
based on another coefficient,
testing issue |
|
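A short sketch of how the standard error of measurement follows from the reliability coefficient, assuming the usual formula SEM = SD * sqrt(1 - reliability). The function names and numbers are made up for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM shrinks toward 0 as reliability approaches 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def true_score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEMs around one obtained score (z=1 is roughly 68%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD of 15, reliability of 0.91, obtained score of 110.
print(standard_error_of_measurement(15.0, 0.91))
print(true_score_band(110.0, 15.0, 0.91))
```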
Standard Error of Mean
|
-How well does the sample mean represent the population mean?
|
|
|
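For contrast with the standard error of measurement, a minimal sketch of the standard error of the mean, assuming the usual formula SD / sqrt(n). The function name and values are hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Expected spread of sample means around the population mean."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


# Hypothetical sample: SD of 15 with 25 test takers.
print(standard_error_of_mean(15.0, 25))
```

Larger samples give a smaller standard error, so the sample mean represents the population mean more closely.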
It is best to have ___________ items and _____________ test takers for a test to be most reliable.
|
Homogeneous
Heterogeneous |
|
|
Content Validity
|
Based on expert judgement
-academic tests |
Ex: the EPPP
|
|
Criterion-Related Validity
|
Outcome
-look at the relationship between predictor and outcome
-used most often in personnel psych (predicting job performance, etc.)
-two types are predictive validity (who will become schizophrenic?, predicts future behavior) and concurrent validity (who is schizophrenic now?, test results NOW) |
OUTCOME
-two types |
|
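One common way to express the predictor-outcome relationship is a validity coefficient, sketched here as a Pearson correlation. The function name and the scores are hypothetical, and other indices may be used in practice.

```python
import math


def pearson_r(predictor, criterion):
    """Pearson correlation between predictor scores and outcome scores."""
    n = len(predictor)
    if n != len(criterion) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sum of cross-products and sums of squares around each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical selection-test scores and later job-performance ratings.
test_scores = [52, 61, 70, 75, 88]
performance = [3.1, 3.4, 3.3, 4.0, 4.6]
print(f"validity coefficient = {pearson_r(test_scores, performance):.2f}")
```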
Construct Validity
|
Cannot directly define or observe the construct
-Two types are convergent (compare new test with an established test that measures the same construct) and divergent (discriminant validity - you want your test to have nothing in common with another test of a different construct) |
Can not define
-two types |
|
Multitrait-Multimethod Matrix
|
If it's a single trait measured by different methods, need a HIGH monotrait coefficient to establish convergent validity
-If it's different (heterogeneous) traits, need a LOW heterotrait coefficient to establish divergent validity |
single or heterogeneous traits/trait numbers/convergent or divergent validity
|
|
Face Validity
|
Does the test make sense to the people who are taking it?
|
Just what it says :)
|
|
Cross Validation
|
Give the test instrument again to a new sample and re-check its validity
-Shrinkage may occur (the validity coefficient will shrink slightly when you initially cross validate instruments) |
|
|
Criterion-related Validity
|
OUTCOME
-job performance in the future |
OUTCOME
|
|
Incremental Validity
|
Can we increase the number of correct decisions we are already making?
|
|
|
Three things to establish Incremental Validity
|
1. base rate - moderate (number of decisions you are already making correctly)
2. selection ratio - need low selection ratio (number of jobs available to number of applicants)
3. validity coefficient - high validity on predictor and criterion |
B-S-V
|
|
Criterion-Referenced Scores
|
-Do not compare score to anyone else, just meeting a standard
|
Ex: Driver's license test
|
|
Norm-Referenced Scores
|
-Score is compared to other individuals
-Two types: percentile ranks (not used as much now) and standard scores (transformed scores that allow you to compare) |
Exs: Grade Equivalents, z-scores
|
|
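A small sketch of how a raw score becomes a standard score and a percentile rank, assuming scores are roughly normally distributed. The function names, the norm-group mean and SD, and the raw score are all hypothetical.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (raw - mean) / sd


def t_score(z):
    """T-score: a standard score with mean 50 and SD 10."""
    return 50.0 + 10.0 * z


def percentile_rank(z):
    """Percent of a normal norm group scoring at or below this z-score."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical norm group: mean 100, SD 15; raw score of 115.
z = z_score(115.0, 100.0, 15.0)
print(z, t_score(z), round(percentile_rank(z), 1))
```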
Floor Effect
|
-bunch of test takers at bottom of test range
-need to have enough easy items |
|
|
Ceiling Effect
|
-need to have enough difficult items to discriminate between the best test takers
|
|