### 21 Cards in this Set

- Kappa statistic: think of inter-rater reliability.
- Item difficulty: number who answered the item correctly / number who answered the item. .50 is ideal in general; .75 is ideal for true/false items.
- Item discrimination ("D"): ranges from -1.0 to +1.0; the item's ability to discriminate between high and low scorers. D approaches +1.0 when only the high group answers correctly and -1.0 when only the low group does.
- True positive, false positive, true negative, false negative:
  - T+ = high predictor, high criterion
  - F+ = high predictor, low criterion
  - T- = low predictor, low criterion
  - F- = low predictor, high criterion
  - The criterion determines true/false; the predictor determines positive/negative.
- Examples of norm-referenced scores: percentile rank, z-score (mean 0, SD 1), T-score (mean 50, SD 10).
- Relevance: the extent to which test items contribute to the goals of testing.
- Reliability (AKA consistency), ways to estimate it:
  - Test-retest
  - Alternate forms (e.g., forms A and B of the WJ3)
  - Split-half (Spearman-Brown)
  - Coefficient alpha (KR-20 for dichotomous items)
  - Inter-rater (kappa statistic)
- Standard error of measurement: for measurement error; gives a confidence interval around the obtained test score.
- Standard error of estimate: the error in predicting the criterion from the predictor; gives a confidence interval around the predicted criterion score.
- Two types of construct validity: construct validity is the extent to which a test measures the hypothetical construct (e.g., intelligence).
  - Convergent: correlates highly with other measures of the same construct.
  - Discriminant: has low correlations with measures of unrelated constructs.
- Criterion-referenced interpretation: interpret scores against a pre-specified standard. Examples: percent correct, regression equation, expectancy table.
- Content validity: the extent to which a test samples the domain of information, knowledge, or skill it claims to. Determined by expert judgment. Important for achievement tests and job samples. Cf. construct validity.
- Classical test theory: variability in observed scores reflects a combination of true score differences and the effects of error (measurement error, etc.). Thus reliability is the proportion of observed-score variability due to true score differences (e.g., a reliability of .80 means 80% true score, 20% error).
- For a test item with a discrimination index (D) of +1:
  - a. high achievers get more correct than low achievers
  - b. low achievers get more correct than high achievers
  - c. low and high achievers are equally likely to be correct
  - d. low and high achievers are equally likely to be incorrect
  - Answer: a. The high group gets more (all) correct.
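As a rough illustration of the item difficulty and item discrimination cards, the sketch below computes both for one item. The function names and the 0/1 response data are hypothetical, and D is taken here as the difference in proportion correct between an upper and a lower scoring group, which is one common way to define it.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (p)."""
    return sum(responses) / len(responses)


def discrimination_index(high_group, low_group):
    """D = p(high group) - p(low group); ranges from -1.0 to +1.0."""
    return item_difficulty(high_group) - item_difficulty(low_group)


# Hypothetical 0/1 item scores (1 = correct) for the upper and lower groups.
high = [1, 1, 1, 1]
low = [0, 0, 0, 0]

p = item_difficulty(high + low)      # half of all examinees got it right
d = discrimination_index(high, low)  # only the high group got it right
print(p, d)  # 0.5 1.0
```

Swapping the two groups gives D = -1.0, the case where only low scorers answer correctly.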
- Internal consistency on dichotomous items:
  - a. Spearman-Brown
  - b. Kappa statistic
  - c. KR-20
  - d. Coefficient of concordance
  - Answer: c. KR-20 is best for dichotomous items (true/false, right/wrong, etc.).
- To make coefficient alpha stronger:
  - a. add more similar items
  - b. add more heterogeneous items
  - c. use true/false items
  - d. all of the above
  - Answer: a. Add more similar items to increase coefficient alpha. For example, asking more questions about US history gives a more accurate test of knowledge than a ten-question test.
- Multitrait-multimethod matrix: for validity you want:
  - a. MM low and HH high
  - b. MH high, HM low
  - c. MM high, HM low
  - d. MM high, HH low
  - Answer: b. MH (monotrait-heteromethod) high, HM (heterotrait-monomethod) low.
- Preventing criterion contamination:
  - a. keep raters independent from each other
  - b. make sure they don't have the predictor scores
  - c. make them aware of possible biases through training
  - d.
  - Answer: make sure raters don't have the predictor scores.
- Use oblique rotation in factor analysis if:
  - a. assessing the construct validity of a single-trait test
  - b. the researcher believes the constructs in the analysis are correlated
  - c. determining factorial validity
  - d. determining reliability
  - Answer: b. Oblique rotation looks for correlations between the constructs measured by the test.
- Orthogonal vs. oblique: orthogonal factors are uncorrelated (independent); oblique factors are correlated.
- Shared variability: calculated by squaring the factor loading.
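To make the KR-20 card concrete, here is a minimal sketch of the usual formula, KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores). The `kr20` function name and the score matrix are hypothetical, and the population variance (dividing by n) is assumed for the total scores; the sketch also assumes the totals are not all identical, since that would make the variance zero.

```python
def kr20(scores):
    """KR-20 for rows of 0/1 item scores (one row per examinee)."""
    n = len(scores)     # number of examinees
    k = len(scores[0])  # number of items
    # p_i is the proportion correct on item i; p_i * (1 - p_i) is its variance.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of the examinees' total scores (assumption: divide by n).
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 examinees, 3 right/wrong items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # 0.75
```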