28 Cards in this Set

  • Front
  • Back
Reliability coefficients range from
Reliability coefficients range from

0 to 1.0
What does classical test theory boil down to?
CLASSICAL TEST THEORY:

Reliability means
1) a test yields repeatable, consistent results
2) a test is reliable to the degree that your score reflects the true score on the test rather than error
What does a reliability coefficient of .90 indicate?
90% of the observed variability is due to true score differences and the remaining 10% is due to measurement error.
Which do you square to interpret?

a) the reliability coefficient
b) the correlation coefficient?
square the correlation coefficient to interpret it
i.e., to determine the proportion of variability that’s shared between 2 measures
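
As a rough illustration of the point above, a minimal Python sketch that correlates two made-up sets of scores and squares r to get the proportion of shared variance (all numbers are invented for illustration):

```python
# Hedged sketch: correlate two hypothetical measures, then square r
# to interpret it as the proportion of variance the measures share.
# The score values below are invented, not from any real data set.
import numpy as np

measure_a = np.array([10, 12, 15, 18, 20, 22, 25])
measure_b = np.array([11, 13, 14, 19, 21, 20, 26])

r = np.corrcoef(measure_a, measure_b)[0, 1]   # correlation coefficient
shared_variance = r ** 2                      # proportion of shared variability
print(f"r = {r:.2f}, shared variance = {shared_variance:.0%}")
```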
Test-retest reliability isn’t appropriate for…
attributes that change over time, e.g., mood.
How do you derive an alternate forms reliability coefficient?
ALTERNATE FORMS RELIABILITY COEFFICIENT

administer 2 alternate forms of a test to the same group of examinees (Form A at time 1, then Form B at time 2), then obtain the correlation between the 2 sets of scores. So, everyone completes Form A and Form B.

reduces practice effects
What does Internal Consistency Reliability measure?
measures the correlations among individual items in a test
What are the 3 different methods of determining the coefficient of internal consistency?
1) Split-half
2) Cronbach’s coefficient alpha (for items scored on multiple points, e.g., Likert scales)
3) Kuder-Richardson formula 20 (for dichotomously scored right/wrong items)
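
For the second method above, a minimal sketch of Cronbach’s coefficient alpha computed directly from its definition on a small invented data set (rows are examinees, columns are Likert-type items):

```python
# Hedged sketch of Cronbach's alpha: alpha = (k / (k - 1)) *
# (1 - sum of item variances / variance of total scores).
# The ratings below are made up purely for illustration.
import numpy as np

scores = np.array([
    [4, 5, 4, 3],
    [3, 3, 4, 2],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 3],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```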
What’s the kappa coefficient for?
a measure of inter-rater reliability
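
A minimal sketch of Cohen’s kappa computed from its definition (observed agreement corrected for chance agreement); the two raters’ judgments below are invented:

```python
# Hedged sketch of Cohen's kappa: (p_observed - p_chance) / (1 - p_chance).
# Ratings are invented for illustration only.
from collections import Counter

rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_2 = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes"]

n = len(rater_1)
p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# chance agreement expected from each rater's marginal proportions
counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
p_chance = sum((counts_1[c] / n) * (counts_2[c] / n)
               for c in set(rater_1) | set(rater_2))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"kappa = {kappa:.2f}")   # 1.0 = perfect agreement, 0 = chance-level
```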
The standard error of measurement is different than a reliability coefficient in that…
the standard error of measurement
is used to determine the
CONFIDENCE INTERVAL
for an INDIVIDUAL test score.
Whereas
the reliability coefficient represents how much error a whole TEST contains
How do you calculate a 68, 95, and 99% CI given a standard error of the measurement of 4.0, and a score of 110 on an IQ test?
68%CI = Score +/- 1x(Stan Err of Measurement)
95%CI = Score +/- 2x(Stan Err of Measurement)
99%CI = Score +/- 2.5x(Stan Err of Measurement)

68%CI = 110 +/- 4 = 106-114
95%CI = 110 +/- 8 = 102-118
99%CI = 110 +/- 10 = 100-120
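
The same arithmetic as the card above, as a short sketch (the score of 110 and SEM of 4.0 come from the card; the 1, 2, and 2.5 multipliers are the card’s approximations of 1.0, 1.96, and 2.58):

```python
# Hedged sketch: confidence intervals around an individual test score,
# using the approximate multipliers from the flashcard above.
observed_score = 110
sem = 4.0

for level, multiplier in [(68, 1), (95, 2), (99, 2.5)]:
    lower = observed_score - multiplier * sem
    upper = observed_score + multiplier * sem
    print(f"{level}% CI: {lower:g} to {upper:g}")
```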
How does a decrease in variability of scores impact reliability coefficient of a test?
it DECREASES reliability
Floor effects result from too many
a) easy items
b) difficult items
FLOOR EFFECT =
too many difficult questions
A ceiling effect results from too many
a) easy items
b) difficult items
CEILING EFFECT =
too many easy questions
How is content validity different than construct validity?
CONTENT validity asks
”how much does this test adequately and representatively sample the content area?”
and it’s based on careful judgment & selection of items covering all content domains, and/or good convergent or criterion-related validity

Construct validity asks
”how much does this test measure a theoretical construct or trait?”
and it’s assessed over time as data accumulate and tests of convergent/divergent validity and/or factor analyses are conducted.
Criterion-related validity is…
the correlation between the predictor (e.g., the SATs) and the criterion (what it’s supposed to predict, e.g., college GPA)
What’s the difference b/w concurrent validity and predictive validity?
concurrent = predictor and criterion scores are collected concurrently.

predictive = predictor scores are collected first and criterion data are collected later.
The multitrait-multimethod matrix is one way of assessing a test’s….
convergent and divergent validity

that is, how much a given test correlates with another test that measures the same construct
and
how much it doesn’t correlate with a test designed to measure another construct
What’s a factor analysis for?
it measures the degree to which a set of tests are all measuring the same underlying factor, or construct.
In factor analysis, what is a factor loading?
it’s a particular test’s correlation with a particular factor that’s been found in the factor analysis.
In factor analysis, what’s the purpose of rotating factors?
What are the two types of rotations?
When is it done?
It facilitates the interpretation of the analysis.
orthogonal rotation (resulting in uncorrelated factors), and oblique (resulting in correlated factors)
factors are rotated as the final step in a factor analysis.
What’s the difference between the standard error of the estimate vs the standard error of the measurement?
the standard error of the ESTIMATE tells us how much error is in the estimated/predicted CRITERION score (e.g., if using SATs to predict GPA, this tells us how off our prediction of GPA might be)

standard error of the MEASUREMENT tells us how much error is in the TEST score itself (e.g., where an examinee’s TRUE test score is likely to fall on the SATs)
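
A minimal sketch of the ESTIMATE side of that contrast, using the standard formula SE(est) = SD of the criterion × √(1 − r²); the SAT/GPA numbers are invented for illustration:

```python
# Hedged sketch: standard error of the ESTIMATE, i.e., how far off a
# predicted criterion score (GPA) is likely to be. Values are made up.
import math

sd_gpa = 0.60        # SD of the criterion (college GPA), illustrative
r_sat_gpa = 0.50     # predictor-criterion correlation, illustrative

see = sd_gpa * math.sqrt(1 - r_sat_gpa ** 2)
print(f"standard error of the estimate = {see:.2f} GPA points")
```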
How do you calculate a standard error of the measurement?
Standard Error of the Measurement =

SD × √(1 − reliability coefficient)
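
And the MEASUREMENT side, plugged straight into the formula above (the SD of 15 and reliability of .89 are illustrative values, roughly IQ-scale-like):

```python
# Hedged sketch: standard error of measurement = SD * sqrt(1 - reliability).
# The SD and reliability values below are illustrative assumptions.
import math

sd = 15.0
reliability = 0.89

sem = sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f}")   # about 5 score points
```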
When the criterion-related validity of a test is moderated by a variable, the test is said to have…
differential validity
What’s shrinkage?
SHRINKAGE

the reduction that occurs in a criterion-related validity coefficient upon cross-validation (i.e., when a test is developed and validated with an initial sample, and then tested again using only the retained items within a second sample)
What is an EIGENVALUE?
EIGENVALUE

1) applies to factor analysis,

2) is used to describe the FACTORS, not the particular tests

3) it is the amount of variance across all tests that is accounted for by the factor. In other words, an eigenvalue tells us how significant or big each factor is.
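
As a rough numerical illustration, a sketch that extracts eigenvalues from a made-up correlation matrix for four tests (a principal-components-style extraction; every value below is invented):

```python
# Hedged sketch: eigenvalues of a correlation matrix among four tests.
# Each eigenvalue is the amount of variance across all tests accounted
# for by one factor; dividing by their sum gives the proportion.
import numpy as np

corr = np.array([
    [1.00, 0.70, 0.10, 0.05],
    [0.70, 1.00, 0.05, 0.10],
    [0.10, 0.05, 1.00, 0.65],
    [0.05, 0.10, 0.65, 1.00],
])

eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first
print(eigenvalues)                             # two "big" factors, two small
print(eigenvalues / eigenvalues.sum())         # proportion of total variance each explains
```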
If for a particular item, the p value is .80, what does this mean?
80% of test takers get the question RIGHT
How hard should test items be to maximize their value in discriminating between high and low scoring test takers?
moderately difficult, p=.50
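
A final sketch tying the last two cards together: item difficulty (p, the proportion answering correctly) and a simple discrimination index (p in the top-scoring group minus p in the bottom-scoring group); the right/wrong data are invented:

```python
# Hedged sketch: item difficulty and a simple discrimination index
# for one item, using invented 1 = right / 0 = wrong responses.
top_group    = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # examinees with high total scores
bottom_group = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # examinees with low total scores

n_total = len(top_group) + len(bottom_group)
p_overall = sum(top_group + bottom_group) / n_total
discrimination = sum(top_group) / len(top_group) - sum(bottom_group) / len(bottom_group)

print(f"p = {p_overall:.2f}, discrimination index = {discrimination:.2f}")
```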