### 41 Cards in this Set

• **Relevance:** The extent to which test items contribute to achieving the stated goals of testing. Influencing factors: content appropriateness (does the item assess the domain it is designed to evaluate?), taxonomic level (does the item reflect the appropriate cognitive level?), and extraneous abilities (does the item require abilities other than the one being measured?).
• **Item Difficulty:** p = examinees passing the item / total examinees. Ranges from 0 to 1. An average of about .5 is optimal for most tests; about .75 for true/false tests.
• **Item Discrimination:** The extent to which a test item discriminates between examinees with low and high scores. D = U - L (proportion of the upper-scoring group passing the item minus proportion of the lower-scoring group passing it). Ranges from -1 to +1; most tests look for values of about .35 to .50.
• **Item Characteristic Curve (ICC):** In item response theory, a curve constructed for each item giving the probability of a correct response. An ICC provides up to three pieces of information about an item: its difficulty (the position of the curve, left versus right), its ability to discriminate between high and low scorers (the slope of the curve), and the probability of answering correctly just by guessing (the Y-intercept).
• **Classical Test Theory:** X = T + E (obtained score = true score + measurement error).
• **Reliability Coefficient:** A correlation coefficient (ranging from 0 to 1) indicating the proportion of observed-score variance that is true-score rather than error variance. Interpreted directly; no need to square it. Increased when similar items are added, the range of scores is unrestricted, and guessing is reduced.
• **Test-Retest Reliability:** Yields the coefficient of stability. Appropriate for attributes that are stable over time and not affected by repeated measurement.
• **Alternate Forms Reliability:** Appropriate for stable attributes. Considered the most thorough method for estimating reliability.
• **Internal Consistency Reliability:** Not appropriate for speeded tests. Split-half and coefficient alpha are two methods.
• **Split-Half Reliability:** The test is divided into two halves, which are then correlated. Because each half has fewer items, the estimate runs low; the Spearman-Brown formula corrects for this.
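The item difficulty and item discrimination formulas on the cards above can be sketched in a few lines of Python (the function names and numbers here are illustrative, not from the card set):

```python
def item_difficulty(passing, total):
    """p = examinees passing the item / total examinees (ranges 0 to 1)."""
    return passing / total

def item_discrimination(upper_passing, lower_passing, group_size):
    """D = U - L: proportion passing in the upper-scoring group
    minus proportion passing in the lower-scoring group (-1 to +1)."""
    return upper_passing / group_size - lower_passing / group_size

# 30 of 40 examinees pass the item:
p = item_difficulty(30, 40)        # 0.75 (about right for a T/F item)

# Of the top 10 and bottom 10 scorers, 9 and 4 pass, respectively:
d = item_discrimination(9, 4, 10)  # 0.5, a well-discriminating item
```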
• **Spearman-Brown Prophecy Formula:** Estimates the reliability of the full-length test from a split-half correlation (more generally, the effect of lengthening or shortening a test).
• **Coefficient Alpha:** A method of assessing internal consistency reliability that provides an index of average inter-item consistency.
• **Kuder-Richardson 20 (KR-20):** A substitute for coefficient alpha when test items are scored dichotomously (right or wrong).
• **Inter-Rater Reliability:** Agreement between different raters' assessments. Assessed using the kappa statistic.
• **Standard Error of Measurement (SEM):** SD times the square root of (1 - reliability coefficient). Used to construct a confidence interval around an obtained score; e.g., a 68% confidence interval extends one SEM on each side of the obtained score.
• **Content Validity:** The extent to which a test adequately samples the content it is designed to measure.
• **Construct Validity:** The extent to which a test measures the hypothetical trait (construct) it is intended to measure.
• **Convergent and Divergent Validity:** Methods for assessing construct validity (e.g., the multitrait-multimethod matrix or factor analysis).
• **Multitrait-Multimethod Matrix:** "Mono" means same; "hetero" means different. Large same trait-different methods (monotrait-heteromethod) coefficients indicate convergent validity; small different traits-same method (heterotrait-monomethod) coefficients indicate discriminant validity.
• **Factor Analysis:** Identifies the minimum number of common factors that account for the intercorrelations among a set of tests. Factor loadings can be squared to give the proportion of variance explained; communality is the proportion of a test's variance accounted for by the common factors.
• **Orthogonal Rotation:** A factor-analytic rotation resulting in separate (uncorrelated) factors.
• **Oblique Rotation:** A factor-analytic rotation resulting in related (correlated) factors.
• **Criterion-Related Validity:** Relevant when test scores are used to draw conclusions about an examinee's likely standing on another measure (the criterion).
• **Concurrent Validity:** Criterion data are collected prior to or at the same time as the predictor data.
• **Predictive Validity:** Criterion data are collected after the predictor data.
• **Standard Error of Estimate:** Index of error when predicting criterion scores from a predictor score. Uses the criterion's standard deviation and the predictor's validity coefficient:
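The Spearman-Brown and SEM cards above translate directly into code. A minimal sketch (the test statistics below are made-up illustrations):

```python
import math

def spearman_brown(r_half, k=2):
    """Projected reliability when test length is multiplied by k.
    For the split-half correction, k = 2."""
    return k * r_half / (1 + (k - 1) * r_half)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Split-half correlation of .60 projects to full-length reliability of .75:
r_full = spearman_brown(0.6)   # 2(.60) / (1 + .60) ≈ 0.75

# SD = 15, reliability = .84 gives SEM = 15 * sqrt(.16) ≈ 6.0,
# so a 68% confidence interval around an obtained score of 100:
error = sem(15, 0.84)
low, high = 100 - error, 100 + error   # ≈ (94.0, 106.0)
```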
SEE = SD(criterion) times the square root of (1 - validity coefficient squared).
• **Incremental Validity:** The extent to which a predictor increases decision-making accuracy. Incremental validity = positive hit rate - base rate.
• **True Positive:** Identified by the predictor; meets the criterion.
• **False Positive:** Identified by the predictor; does not meet the criterion.
• **True Negative:** Not identified by the predictor; does not meet the criterion.
• **Predictor:** Determines whether a subject is classified as positive or negative.
• **False Negative:** Not identified by the predictor; does meet the criterion.
• **Criterion:** Determines whether a subject's classification is true or false.
• **Base Rate:** (true positives + false negatives) / total number of examinees.
• **Positive Hit Rate:** true positives / total positives (true positives + false positives).
• **Criterion Contamination:** Occurs when a criterion rater knows examinees' predictor scores, which can artificially inflate validity.
• **When cross-validating...** ...shrinkage of the validity coefficient may occur.
• **Norm-Referenced Interpretation:** Norms are valid only when derived from individuals with characteristics similar to the examinee's. Norms become obsolete quickly.
• **Low reliability means low validity, but...** ...high reliability does not always mean high validity.
• **Percentile Ranks:** A nonlinear transformation; the distribution is always flat regardless of the shape of the raw score distribution. Disadvantage: it is an ordinal scale.
• **Criterion-Referenced Interpretation:** Interpreting scores against a predetermined standard, using either a percentage score or a criterion score derived from a regression equation and expectancy table.
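The decision-outcome cards (true/false positives and negatives, base rate, positive hit rate, incremental validity) fit a single 2x2 table. A sketch with hypothetical counts:

```python
# Hypothetical selection outcomes for 100 examinees (counts are illustrative):
TP = 30  # identified by predictor, meets criterion
FP = 10  # identified by predictor, does not meet criterion
TN = 45  # not identified, does not meet criterion
FN = 15  # not identified, does meet criterion

total = TP + FP + TN + FN                   # 100 examinees

# Base rate: proportion who meet the criterion regardless of the predictor.
base_rate = (TP + FN) / total               # 0.45

# Positive hit rate: proportion of predictor-identified examinees who succeed.
positive_hit_rate = TP / (TP + FP)          # 0.75

# Incremental validity: gain in accuracy from using the predictor.
incremental_validity = positive_hit_rate - base_rate   # 0.30
```

A positive difference means the predictor improves on selecting at random from the applicant pool; if the predictor added nothing, the positive hit rate would equal the base rate.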