41 Cards in this Set

  • Front
  • Back
Refers to the extent to which test items contribute to achieving the stated goals of testing.
Factors that influence:
Content Appropriateness - Does the item assess the domain it is designed to evaluate?
Taxonomic Level - Does the item reflect the appropriate cognitive level?
Extraneous Abilities - Does it assess other abilities?
Item Difficulty
The proportion of examinees who pass the item: p = examinees passing / total examinees. p ranges from 0 to 1. The optimal p is .5 for most tests and .75 for true/false tests.
Item Discrimination
The extent to which a test item discriminates between examinees with low and high scores. D = U - L (proportion passing in the upper group minus proportion passing in the lower group). Ranges from -1 to 1. (Most tests look for .35 or .5.)
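As a rough illustration of the two item statistics above, the sketch below computes p and D for one item. The function names, the sample data, and the 27% cutoff used to form the upper and lower groups are assumptions made for the example; the cards do not say how the groups are formed.

```python
def item_difficulty(responses):
    """p: proportion of examinees who answered the item correctly (1 = pass, 0 = fail)."""
    return sum(responses) / len(responses)


def item_discrimination(responses, total_scores, fraction=0.27):
    """D = U - L: proportion passing in the upper group minus the lower group.

    Groups are the top and bottom `fraction` of examinees by total test score.
    The 0.27 default is an assumed convention, not taken from the cards.
    """
    ranked = sorted(zip(total_scores, responses), key=lambda pair: pair[0])
    n = max(1, round(len(ranked) * fraction))
    lower = [resp for _, resp in ranked[:n]]
    upper = [resp for _, resp in ranked[-n:]]
    return sum(upper) / n - sum(lower) / n


# Hypothetical data: one item's responses and each examinee's total score.
responses = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
total_scores = [95, 90, 85, 80, 75, 60, 55, 50, 45, 40]

p = item_difficulty(responses)
d = item_discrimination(responses, total_scores)
```

With this data half the examinees pass the item, and the upper group passes it more often than the lower group, so D is positive.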
Item Characteristic Curve (ICC)
In item response theory, a curve is constructed for each item and shows the probability that a respondent will get the item correct. Difficulty level, discrimination, and guessing are accounted for in the ICC.

An ICC provides one to three pieces of information about a test item: its difficulty (the position of the curve, left versus right); its ability to discriminate between high and low scorers (the slope of the curve); and the probability of answering the item correctly just by guessing (the Y-intercept).
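One common way to draw such a curve is the three-parameter logistic model, sketched below. The cards do not name a specific model, so treating the ICC as a 3PL curve is an assumption, as are the parameter names a (discrimination), b (difficulty), and c (guessing). In this model the guessing parameter is the curve's lower asymptote.

```python
import math


def icc_3pl(theta, a, b, c):
    """Probability of a correct response at ability level theta.

    a: discrimination (slope), b: difficulty (left/right position),
    c: guessing (lower asymptote). Assumed 3PL form, for illustration only.
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# At theta == b the probability sits halfway between c and 1.
p_at_difficulty = icc_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)

# For very low ability the probability approaches the guessing level c.
p_low_ability = icc_3pl(theta=-50.0, a=1.2, b=0.0, c=0.2)
```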
Classical Test Theory
Reliability Coefficient
A correlation coefficient (ranging from 0 to 1) that estimates the proportion of variability in obtained scores due to true score variability rather than error. Interpret directly; no need to square it. It increases when similar items are added, the range of scores is unrestricted, and guessing is reduced.
Test-Retest Reliability
Known as the coefficient of stability. Good for tests that are relatively stable over time and not affected by repeated measurement.
Alternative Forms Reliability
Good for tests that are stable. Most thorough method for estimating reliability.
Internal Consistency Reliability
Not appropriate for speeded tests. Split-half & Coefficient alpha are 2 types.
Split-Half Reliability
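The test is split into two halves and the halves are correlated. Each half must have enough items or the reliability estimate is low. The Spearman-Brown (S-B) formula helps when it is low.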
Spearman-Brown Prophecy Formula
Estimates the reliability of the full-length test from a shortened version, such as the halves used in split-half reliability.
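A small sketch of the Spearman-Brown formula in its general form, r_new = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened (k = 2 for stepping up from a half to the full test). The function name and the sample value are made up for the example.

```python
def spearman_brown(r, k=2.0):
    """Projected reliability when a test with reliability r is lengthened k times."""
    return (k * r) / (1 + (k - 1) * r)


# Hypothetical split-half correlation of .60, stepped up to full length.
full_length_reliability = spearman_brown(0.60)
```

Here a half-test correlation of .60 steps up to .75 for the full-length test.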
Coefficient Alpha
Method of assessing internal consistency reliability (i.e., special formula) that provides an index of average inter-item consistency.
Kuder-Richardson 20 (KR-20)
A substitute for coefficient alpha when test items are scored dichotomously (right or wrong).
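The relationship between the two statistics can be sketched as follows. Both functions use the standard formulas, with population variances throughout so that the two agree on right/wrong data; the function names and the score matrix are assumptions for the example.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Cronbach's alpha. `scores` is a list of examinee rows of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR-20 for dichotomous (0/1) items: item variance is p * (1 - p)."""
    k = len(scores[0])
    n = len(scores)
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


# Hypothetical data: 5 examinees by 4 right/wrong items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = coefficient_alpha(scores)
kr = kr20(scores)
```

On dichotomous data like this, the two values coincide.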
Inter-Rater Reliability
The consistency of different raters' assessments. Assessed using the kappa statistic.
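As an illustration, the sketch below computes Cohen's kappa for two raters, kappa = (observed agreement − chance agreement) / (1 − chance agreement). The card says only "kappa statistic", so the choice of the two-rater Cohen's version is an assumption, as are the function name and the ratings.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no ratings of the same 10 cases by two raters.
rater_a = ["y", "y", "n", "n", "y", "n", "y", "n", "y", "y"]
rater_b = ["y", "y", "n", "y", "y", "n", "n", "n", "y", "y"]

kappa = cohens_kappa(rater_a, rater_b)
```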
Standard Error of Measurement (SEM)
SEM = SD times the square root of (1 - reliability coefficient). Used to build a confidence interval around an obtained score. Ex: a 68% confidence interval spans one standard error on either side of the obtained score.
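A worked sketch of the SEM and the confidence interval built from it. The function names and the numbers (SD of 15, reliability of .91, obtained score of 100) are hypothetical values chosen for the example.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(obtained_score, sd, reliability, n_sems=1.0):
    """Interval of n_sems standard errors on either side of the obtained score.

    n_sems=1 gives roughly a 68% interval; about 2 gives roughly 95%.
    """
    sem = standard_error_of_measurement(sd, reliability)
    return obtained_score - n_sems * sem, obtained_score + n_sems * sem


sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(obtained_score=100, sd=15, reliability=0.91)
```

With these values the SEM is 4.5, so the 68% interval runs from about 95.5 to 104.5.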
Content Validity
The extent that a test adequately samples the content that it is designed to measure.
Construct Validity
The extent that a test measures the hypothetical trait.
Convergent and Divergent Validity
Methods to assess construct validity (multitrait-multimethod matrix or factor analysis).
"Mono" means same and "Hetero" means different. Same trait-different methods (monotrait-heteromethod) coefficients are large = convergent validity. Different traits-same method (heterotrait-monomethod) coefficients are small = discriminant validity.
Factor Analysis
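Identifies the minimum number of common factors needed to account for the intercorrelations among a set of tests. Factor loadings can be squared to give the proportion of variance explained. Communality is the proportion of a test's variance accounted for by the common factors.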
Orthogonal Rotation
FA rotation resulting in separate or uncorrelated factors.
Oblique Rotation
FA rotation resulting in similar or correlated factors.
Criterion-Related Validity
When test scores are to be used to draw conclusions about an examinee's likely standing on another measure.
Concurrent Validity
When criterion data are collected prior to or during the predictor.
Predictive Validity
When the criterion data are collected after the predictor.
Standard Error of Estimate
Index of error when predicting criterion scores from a predictor score. Uses the criterion's SD and the predictor's validity coefficient: the criterion's SD times the square root of (1 - the squared validity coefficient).
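A short sketch of the calculation, using the standard formula in which the validity coefficient is squared: SEE = SD of the criterion × sqrt(1 − r²). The function name and the numbers are hypothetical.

```python
import math


def standard_error_of_estimate(criterion_sd, validity_coefficient):
    """SEE = SD_y * sqrt(1 - r_xy ** 2)."""
    return criterion_sd * math.sqrt(1 - validity_coefficient ** 2)


# Hypothetical criterion SD of 10 and validity coefficient of .60.
see = standard_error_of_estimate(criterion_sd=10, validity_coefficient=0.60)
```

A validity coefficient of .60 leaves a standard error of estimate of 8, so prediction error is still 80% of the criterion's SD.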
Incremental Validity
The extent to which a predictor increases decision-making accuracy. (Positive Hit Rate-Base Rate)
True Positive
ID by predictor; meet criterion
False Positive
ID by predictor; do not meet criterion
True Negative
No ID by predictor; do not meet criterion
The predictor determines if a subject (S) is positive or negative (+/-).
False Negative
No ID by predictor; do meet criterion
The criterion determines if a subject (S) is true or false (T/F).
Base Rate
(True Positives + False Negatives)/Total #
Positive Hit Rate
True Positives/Total Positives
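The decision-accuracy terms on these cards fit together as in the sketch below. The function names and the cell counts are made up for the example; the formulas are the ones given on the cards.

```python
def base_rate(tp, fp, tn, fn):
    """Proportion of all cases that meet the criterion: (TP + FN) / total."""
    return (tp + fn) / (tp + fp + tn + fn)


def positive_hit_rate(tp, fp):
    """Proportion of cases identified by the predictor that meet the criterion."""
    return tp / (tp + fp)


def incremental_validity(tp, fp, tn, fn):
    """Positive hit rate minus base rate."""
    return positive_hit_rate(tp, fp) - base_rate(tp, fp, tn, fn)


# Hypothetical counts for 100 examinees.
tp, fp, tn, fn = 30, 10, 40, 20

br = base_rate(tp, fp, tn, fn)
phr = positive_hit_rate(tp, fp)
gain = incremental_validity(tp, fp, tn, fn)
```

With these counts the base rate is .50 and the positive hit rate is .75, so using the predictor raises decision-making accuracy by .25.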
Criterion Contamination
When a criterion rater knows the predictor score.
When cross validating...
...shrinkage may occur.
Norm-Referenced Interpretation
Norms must be derived from individuals w/ similar characteristics to be valid. Norms become obsolete quickly.
Low reliability means low validity, but...
...high reliability does not always mean high validity.
Percentile Ranks
Nonlinear. Distribution is always flat regardless of the shape of the raw scores. Disadvantage is that it is an ordinal scale.
Criterion-Referenced Interpretation
Interpreting scores against a predetermined standard, using either a percentage score or a criterion score derived from a regression equation or an expectancy table.