31 Cards in this Set

  • Front
  • Back
Describe why you want reliability in a test (and a man) :)
Test scores (or behavior) are dependable, consistent, and stable across items.

It's the *results* we are looking at (just like a man's behavior), not the test itself.

It always refers to a *specific type* of reliability (time, raters, items, etc.).

Remember: no measure is totally consistent or error free. All are subject to some degree of error and fluctuation.
Any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured is
Measurement error.
True Score + Measurement Error =
Observed score.
This is associated with the fluctuation in test scores obtained from repeated testing of the same individual.
Time-sampling error
This is when the first testing session influences the scores on the second session
Carryover effect
This term is used to label the error that results from selecting test items that inadequately cover the content area that the test is supposed to evaluate.
Content-sampling error
This considers differences in scorers as a potential source of error.
Interrater differences
List and explain 4 "other sources of error"
Quality of test items (items should not be vague or ambiguous)

Test length (the greater the number of items, the greater the reliability)

Test-Taker Variables (motivation, fatigue, illness, physical discomfort, mood, etc.)

Test Administration (room temperature, administration instructions, lighting, noise, etc.)
List 4 methods of estimating reliability
Test-Retest
Alternate Forms
Internal Consistency
Interrater
This is when the same test is given twice with a time interval between testings.
Test-Retest
What source of error is associated with the test-retest model of estimating reliability?
Time-sampling error
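
A minimal sketch of the test-retest method, treating the reliability estimate as the Pearson correlation between scores from the two administrations; the score lists below are hypothetical.

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test (statistics.correlation needs Python 3.10+).
# The scores are hypothetical.
from statistics import correlation

time1 = [12, 15, 9, 20, 17, 11, 14]   # scores from the first testing
time2 = [13, 14, 10, 19, 18, 10, 15]  # scores from the retest

r_test_retest = correlation(time1, time2)
print(f"Test-retest reliability: {r_test_retest:.2f}")
```
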
This is when equivalent tests are given at the same time in order to estimate reliability.
Simultaneous Administration (main category of this method is Alternate Forms)
What source of error could occur with the Simultaneous Administration method of estimating reliability?
Content-sampling error
This is when equivalent tests are given with a time interval between testings
Delayed Administration (main category = Alternate Forms)
What sources of error are associated with Delayed Administration?
Time sampling and content sampling errors
This is when one test is divided into two comparable halves, and both halves are given during one testing session
Split-half (main category = Internal Consistency)
What source of error is associated with the split-half method of estimating reliability?
Content sampling.
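
A minimal sketch of the split-half approach, assuming an odd/even split of hypothetical dichotomous items; the Spearman-Brown correction (the usual adjustment when correlating two half-length tests) is applied to the half-test correlation.

```python
# Split-half reliability: correlate odd-item and even-item half scores,
# then apply the Spearman-Brown correction. Item data are hypothetical.
from statistics import correlation

# Each row is one test taker's item scores (1 = correct, 0 = incorrect).
items = [
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
]

odd_half = [sum(row[0::2]) for row in items]   # total on odd-numbered items
even_half = [sum(row[1::2]) for row in items]  # total on even-numbered items

r_half = correlation(odd_half, even_half)
split_half_reliability = (2 * r_half) / (1 + r_half)  # Spearman-Brown
print(f"Split-half reliability: {split_half_reliability:.2f}")
```
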
This is when one test is given at one time (items compared to other items or to the whole test)
KR Formulas and Coefficient Alpha (main category = Internal Consistency)
What source of error is associated with KR Formulas and Coefficient Alpha?
Content Sampling
This is when one test is given and two individuals independently score the test.
Interrater
What source of error is associated with the Interrater method of estimating reliability?
Interrater differences
This is an alternative to the split-half method of estimating internal consistency. It calculates the reliability of a test using a single form and a single test administration, without arbitrarily dividing the test into halves. Dichotomous items only (true/false).
Kuder-Richardson Formulas (KR-20 and KR-21)
(N / (N − 1)) × (1 − Σpq / σ²) =
KR-20 formula
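
A minimal sketch of the KR-20 formula above, using hypothetical dichotomous item data: N is the number of items, p is the proportion passing each item, q = 1 − p, and σ² is the variance of total test scores (population variance is used here as a convention).

```python
# KR-20 = (N / (N - 1)) * (1 - sum(pq) / variance of total scores).
# Rows = test takers, columns = items (1 = correct, 0 = incorrect); hypothetical data.
from statistics import pvariance

items = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
]

n_items = len(items[0])
n_people = len(items)

totals = [sum(row) for row in items]
p_values = [sum(row[j] for row in items) / n_people for j in range(n_items)]
sum_pq = sum(p * (1 - p) for p in p_values)

kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))
print(f"KR-20: {kr20:.2f}")
```
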
This internal consistency method of estimating reliability is used when items on a test are *not* scored dichotomously (no right or wrong answers), like a rating scale.
Coefficient Alpha
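
A minimal sketch of coefficient alpha for rating-scale items, using hypothetical 1-5 ratings; it has the same form as KR-20, with the sum of item variances replacing Σpq.

```python
# Coefficient (Cronbach's) alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance).
# Rows = respondents, columns = rating-scale items; hypothetical data,
# population variances used for consistency with the KR-20 sketch above.
from statistics import pvariance

ratings = [
    [4, 5, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
]

k = len(ratings[0])
totals = [sum(row) for row in ratings]
item_variances = [pvariance([row[j] for row in ratings]) for j in range(k)]

alpha = (k / (k - 1)) * (1 - sum(item_variances) / pvariance(totals))
print(f"Coefficient alpha: {alpha:.2f}")
```
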
This is the extent to which two or more raters agree. It assesses how consistently the raters implement the rating scale.
Interrater Reliability
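
A minimal sketch of interrater reliability for two raters, using hypothetical ratings: the Pearson correlation between the raters' scores, with percent exact agreement shown as a simpler alternative index.

```python
# Two raters independently score the same set of tests; how consistent are they?
# Ratings are hypothetical.
from statistics import correlation

rater_a = [3, 4, 2, 5, 4, 3, 1]
rater_b = [3, 5, 2, 4, 4, 3, 2]

r = correlation(rater_a, rater_b)
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Interrater correlation: {r:.2f}, exact agreement: {agreement:.0%}")
```
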
A test that measures depressed mood assesses a single content domain and thus contains homogeneous test items. In this case, which reliability coefficient would be appropriate?
KR-20 or coefficient alpha, because they assess internal consistency by correlating test items with each other.
If a test instrument measures two separate constructs (like depressed mood and anxiety), the heterogeneity of the test items means reliability should be estimated with this coefficient:
Split-half method. (Or the homogeneous parts of the test could be subdivided into appropriate groupings, and the KR-20 or coefficient alpha reliability estimates could be calculated for each group.)
Reliability coefficients of .70 or higher are considered
acceptable
Do coefficients relate to individuals or a group of scores?
a group of scores.
What measurement is used to estimate the amount of variation in test scores for a single test taker?
Standard Error of Measurement (SEM)
Do SEMs estimate the accuracy of true scores or observed scores?
Observed scores.
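
A minimal sketch of the SEM using the standard formula SEM = SD × √(1 − reliability); the score standard deviation and reliability coefficient below are hypothetical.

```python
# Standard error of measurement: how much observed scores are expected to
# fluctuate around the true score. Inputs are hypothetical.
from math import sqrt

score_sd = 15.0      # standard deviation of observed test scores
reliability = 0.90   # e.g. a test-retest or coefficient alpha estimate

sem = score_sd * sqrt(1 - reliability)
print(f"SEM: {sem:.2f}")  # about 68% of observed scores fall within +/- 1 SEM of the true score
```
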