31 Cards in this Set
Describe why you want reliability in a test (and a man) :)
|
Test scores (or behavior) are dependable, consistent, and stable across items.
It's the *results* we are looking at (just like a man's behavior), not the test itself. Reliability always refers to a *specific type* of reliability (time, raters, items, etc.).
Remember: none are totally consistent or error-free. All are subject to some degree of error and fluctuation. |
|
Any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured is called
|
Measurement error.
|
|
True Score + Measurement Error =
|
Observed score.
|
|
This is associated with the fluctuation in test scores obtained from repeated testing of the same individual.
|
Time-sampling error
|
|
This is when the first testing session influences the scores on the second session
|
Carryover effect
|
|
This term is used to label the error that results from selecting test items that inadequately cover the content area that the test is supposed to evaluate.
|
Content-sampling error
|
|
This considers differences in scorers as a potential source of error
|
Interrater differences
|
|
List and explain 4 "other sources of error"
|
Quality of test items (not vague or ambiguous)
Test length (the greater the number of items, the greater the reliability)
Test-Taker Variables (motivation, fatigue, illness, physical discomfort, mood, etc.)
Test Administration (room temperature, administration instructions, lighting, noise, etc.) |
|
List 4 methods of estimating reliability
|
Test-Retest
Alternate Forms
Internal Consistency
Interrater |
|
This is when the same test is given twice with a time interval between testings
|
Test-Retest
|
|
What source of error is associated with the test-retest model of estimating reliability?
|
Time-sampling error
|
|
This is when equivalent tests are given at the same time in order to estimate reliability.
|
Simultaneous Administration (main category of this method is Alternate Forms)
|
|
What source of error could occur with the Simultaneous Administration method of estimating reliability?
|
Content-sampling error
|
|
This is when equivalent tests are given with a time interval between testings
|
Delayed Administration (main category = Alternate Forms)
|
|
What sources of error are correlated with Delayed Administration?
|
Time sampling and content sampling errors
|
|
This is when one test is divided into two comparable halves, and both halves are given during one testing session
|
Split half (main category = Internal Consistency)
|
|
What source of error is associated with the split-half method of estimating reliability?
|
Content sampling.
|
|
This is when one test is given at one time (items compared to other items or to the whole test)
|
KR Formulas and Coefficient Alpha (main category = Internal Consistency)
|
|
What source of error is associated with KR Formulas and Coefficient Alpha?
|
Content Sampling
|
|
This is when one test is given and two individuals independently score the test.
|
Interrater
|
|
What source of error is associated with the Interrater method of estimating reliability?
|
Interrater differences
|
|
This is an alternative to the split-half method of estimating internal consistency. It calculates the reliability of a test using a single form and a single test administration, without arbitrarily dividing the test into halves. Dichotomous items only (true/false).
|
Kuder-Richardson Formulas (KR-20 and KR-21)
|
|
(N / (N - 1)) x (1 - (sum of pq / variance)) =
|
KR-20 formula
|
|
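A minimal numeric sketch of the KR-20 formula above, assuming N is the number of items, p is the proportion passing each item, q = 1 - p, and the variance is the population variance of total scores. The 0/1 response matrix is invented:

```python
# KR-20 sketch: (N/(N-1)) * (1 - sum(pq)/variance) for dichotomous items.
from statistics import pvariance

responses = [          # rows = examinees, columns = 0/1 items (hypothetical)
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

n_items = len(responses[0])
var_total = pvariance([sum(row) for row in responses])  # variance of total scores

sum_pq = 0.0
for i in range(n_items):
    p = sum(row[i] for row in responses) / len(responses)  # proportion passing item i
    sum_pq += p * (1 - p)

kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
```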
This internal consistency method of estimating reliability is used when items on a test are *not* scored dichotomously (no right or wrong answers), like a rating scale.
|
Coefficient Alpha
|
|
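Coefficient alpha follows the same logic as KR-20, with the sum of item variances in place of sum of pq, which is why it works for rating-scale items. A sketch on invented 1-5 ratings:

```python
# Coefficient alpha sketch: (k/(k-1)) * (1 - sum(item variances)/total variance).
from statistics import pvariance

ratings = [          # rows = respondents, columns = 1-5 rating items (hypothetical)
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

k = len(ratings[0])
item_vars = [pvariance([row[i] for row in ratings]) for i in range(k)]
total_var = pvariance([sum(row) for row in ratings])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With 0/1 items, each item variance equals p*q, so alpha reduces to KR-20.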
This is the extent to which two or more raters agree. It assesses how consistently the raters implement the rating scale.
|
Interrater Reliability
|
|
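The simplest interrater index is percent agreement; chance-corrected statistics such as Cohen's kappa are usually preferred, but the basic idea can be sketched with invented ratings:

```python
# Percent agreement sketch for interrater reliability (hypothetical ratings).
rater_a = ["pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = agreements / len(rater_a)  # 5 of 6 ratings match here
```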
A test that measures depressed mood assesses a single content domain and thus contains homogeneous test items. In this case, which reliability coefficient would be appropriate?
|
KR-20 or coefficient alpha, because they assess internal consistency by correlating test items with each other.
|
|
If a test instrument measures two separate constructs (like depressed mood and anxiety), the heterogeneity of the test items should be handled by this reliability coefficient:
|
Split-half method. (Or homogeneous parts of the test could be subdivided into appropriate groupings, and KR-20 or coefficient alpha reliability estimates could be calculated for each group.)
|
|
Reliability coefficients of .70 or higher are considered
|
acceptable
|
|
Do coefficients relate to individuals or a group of scores?
|
a group of scores.
|
|
What measurement is used to estimate the amount of variation in test scores for a single test taker?
|
Standard Error of Measurement (SEM)
|
|
Do SEMs estimate the accuracy of true scores or observed scores?
|
Observed scores.
|
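The SEM is commonly computed as SD x sqrt(1 - reliability); a quick sketch with illustrative values (not from the source):

```python
# Standard Error of Measurement sketch: SEM = SD * sqrt(1 - reliability).
import math

sd = 15.0           # standard deviation of test scores (illustrative value)
reliability = 0.91  # reliability coefficient of the test (illustrative value)

sem = sd * math.sqrt(1 - reliability)  # expected spread of observed scores
# A roughly 68% confidence band around an observed score X is X +/- sem.
```

Note how the two concepts connect: the higher the reliability coefficient, the smaller the SEM, and at perfect reliability the SEM is zero.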