42 Cards in this Set
reliability |
consistency in measurement |
|
reliability coefficient |
ratio of true score variance to total variance |
|
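The ratio above can be sketched as a short computation; the variance values are hypothetical illustration numbers, and total variance is assumed to be true variance plus error variance.

```python
# Sketch: reliability as the ratio of true-score variance to total variance.
# The variance values used below are hypothetical.
def reliability_coefficient(true_variance, error_variance):
    """Reliability = true variance / (true variance + error variance)."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

# Example: 80 units of true variance, 20 units of error variance
print(reliability_coefficient(80.0, 20.0))  # 0.8
```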
true variance |
true differences |
|
error variance |
variance from irrelevant, random sources |
|
measurement error |
all of the factors associated with the process of measuring some variable, other than the variable being measured |
|
random error (aka noise) |
error caused by unpredictable inconsistencies of other variables in the measurement process
ex: unanticipated events in the testing environment |
|
systematic error |
error that is typically constant
ex: ruler is off by 1/10 of an inch, everything measured with that ruler was systematically off by 1/10 of an inch |
|
Sources of error variance |
test construction, test administration, test scoring/interpretation |
|
item sampling or content sampling |
variation among and between tests
ex: differences in wording and content
the test developer tries to maximize true variance and minimize error variance |
|
examples of error due to test administration |
environment: temperature, lighting, noise
test-taker variables: physical and emotional problems
examiner-related variables: appearance, demeanor, providing clues |
|
examples of error due to test scoring |
subjectivity in scoring/scorer interpretation
computer glitches |
|
other sources of error |
sampling error: did not obtain a sample representative of the population
methodological error: improper training, ambiguous wording, biased test items |
|
test-retest reliability |
correlating scores from the same people on two different administrations of the same test |
|
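In practice, the test-retest estimate is the Pearson correlation between the two administrations; a minimal sketch, using hypothetical scores for five test-takers:

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same test. The score lists are hypothetical.
import statistics

def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

time1 = [10, 12, 14, 16, 18]   # scores at first administration
time2 = [11, 12, 15, 15, 19]   # scores at second administration
print(round(pearson_r(time1, time2), 3))  # ≈ 0.96
```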
coefficient of stability |
when the interval between tests is longer than 6 months, the estimate of test-retest reliability is referred to as the coefficient of stability |
|
coefficient of equivalence |
relationship between various forms of a test |
|
parallel forms |
for each form of the test, the means and variances of test scores are equal |
|
alternate forms |
different versions of a test that are equivalent in content and difficulty |
|
internal consistency estimate of reliability |
an estimate of reliability that can be measured without administering the test twice to the same people |
|
split-half reliability |
obtained by correlating scores on two equivalent halves of a single test administered once |
|
acceptable ways to split a test |
1. randomly assign items to one or the other half of the test
2. odd-even reliability: assign odd-numbered items to one half of the test and even-numbered items to the other half
3. divide the test by content so items are equivalent in content and difficulty |
|
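The cards do not name it, but a split-half correlation is conventionally adjusted upward with the Spearman-Brown formula, since each half is only half the test's length; a sketch:

```python
# Sketch: the Spearman-Brown correction. The correlation between two
# half-tests underestimates full-length reliability, so it is adjusted:
# r_full = 2r / (1 + r). The 0.7 input below is a hypothetical value.
def spearman_brown(half_test_r):
    """Estimate full-length reliability from a split-half correlation."""
    return (2 * half_test_r) / (1 + half_test_r)

print(round(spearman_brown(0.7), 3))  # 1.4 / 1.7 ≈ 0.824
```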
inter-item consistency |
correlation among all items on a scale |
|
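A standard index of inter-item consistency (not named in these cards) is Cronbach's alpha; a minimal sketch, assuming a small hypothetical matrix of item scores with one row per test-taker:

```python
# Sketch: Cronbach's alpha from an item-score matrix.
# Rows = test-takers, columns = items; the data are hypothetical.
import statistics

def cronbach_alpha(scores):
    k = len(scores[0])                      # number of items
    item_vars = [statistics.variance(col) for col in zip(*scores)]
    totals = [sum(row) for row in scores]   # each person's total score
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(scores), 3))  # ≈ 0.934
```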
homogeneity |
degree to which items measure a single trait
the more homogeneous the test, the more inter-item consistency it is expected to have
allows straightforward interpretation, but insufficient for complex constructs such as intelligence or personality |
|
heterogeneity |
degree to which a test measures different factors (i.e., more than one trait) |
|
inter-scorer reliability |
degree of agreement or consistency between two or more scorers |
|
when inter-scorer reliability is relevant |
when a researcher wishes to quantify nonverbal behavior |
|
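Agreement between raters can be quantified as simple percent agreement or, correcting for chance, with Cohen's kappa (a standard statistic, though not named in these cards); a sketch using hypothetical behavior codes from two raters:

```python
# Sketch: percent agreement and Cohen's kappa for two raters who coded
# the same six observations. The ratings are hypothetical.
from collections import Counter

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    observed = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's marginal proportions
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

rater1 = ["on-task", "off-task", "on-task", "on-task", "off-task", "on-task"]
rater2 = ["on-task", "off-task", "off-task", "on-task", "off-task", "on-task"]
print(round(percent_agreement(rater1, rater2), 3))  # ≈ 0.833
print(round(cohens_kappa(rater1, rater2), 3))       # ≈ 0.667
```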
rule of thumb for reliability coefficient grading |
.90 - A (.95 used for the most important decisions)
.80 - B
.65-.70 - weak |
|
test-retest purpose |
evaluate the stability of a measure |
|
test-retest uses |
when assessing the stability of various personality traits |
|
test-retest sources of error variance |
administration |
|
alternate forms purpose |
to evaluate the relationship between different forms of a measure |
|
alternate forms uses |
where there is a need for different forms of a test (ex: makeup test) |
|
alternate forms sources of error |
test construction or administration |
|
internal consistency purpose |
to evaluate the extent to which items on a scale relate to one another |
|
internal consistency uses |
when evaluating the homogeneity of a measure (i.e., all items tap a single construct) |
|
internal consistency sources of error variance |
test construction |
|
inter-scorer purpose |
to evaluate the level of agreement between raters |
|
inter-scorer uses |
interviews or coding of behavior
when researchers need to show that there is consensus in the way that different raters view a behavior pattern |
|
inter-scorer sources of error variance |
scoring and interpretation |
|
item-response theory (IRT) |
provides a way to model the probability that a person with X ability will be able to perform at a standard of Y |
|
two characteristics of items within an IRT framework |
difficulty and discrimination |
|
difficulty |
attribute of not being easily accomplished, solved, or comprehended
physical difficulty: how hard it is for a person to engage in a particular activity |
|
discrimination |
the degree to which an item differentiates among people with higher or lower levels of the trait or ability being measured |
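The cards list difficulty and discrimination as the two item characteristics; a common way to combine them is the two-parameter logistic (2PL) IRT model, sketched below with hypothetical parameter values:

```python
# Sketch: the 2PL IRT model. The probability of a correct response depends
# on the person's ability (theta), the item's difficulty (b), and its
# discrimination (a). All parameter values below are hypothetical.
import math

def p_correct(theta, difficulty, discrimination):
    """P(correct) = 1 / (1 + exp(-a * (theta - b)))"""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A person whose ability equals the item's difficulty answers correctly
# half the time, regardless of discrimination:
print(p_correct(theta=0.0, difficulty=0.0, discrimination=1.5))  # 0.5

# Higher discrimination makes probability change faster around b:
print(round(p_correct(1.0, 0.0, 0.5), 3))  # gentle slope, ≈ 0.622
print(round(p_correct(1.0, 0.0, 2.0), 3))  # steep slope,  ≈ 0.881
```

A highly discriminating item separates people just above b from people just below it much more sharply than a weakly discriminating one.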