80 Cards in this Set
RELIABILITY COEFFICIENT?
|
An estimate of the proportion of test score variability that is due to TRUE SCORE VARIABILITY.
|
|
RELEVANCE?
|
When test items actually measure or contribute to achieving the stated GOALS of testing.
|
|
How do you determine Relevance in testing?
|
CONTENT APPROPRIATENESS, EXTRANEOUS ABILITIES (to what extent does the item require knowledge, skills, or abilities outside the domain of interest?), AND TAXONOMIC LEVEL (does the item reflect the appropriate cognitive or ability level?)
|
|
ITEM DIFFICULTY INDEX
|
p = TOTAL NUMBER OF EXAMINEES WHO PASSED (ANSWERED CORRECTLY) THE ITEM / TOTAL NUMBER OF EXAMINEES
p ranges from 0 (no one answered correctly) to 1.0 (everyone answered correctly). |
|
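A minimal Python sketch of the item difficulty index, assuming each examinee's answer to a single item is coded 1 (correct) or 0 (incorrect). The function name `item_difficulty` and the response data are hypothetical, for illustration only.

```python
def item_difficulty(responses):
    """Return p, the proportion of examinees who answered the item correctly.

    `responses` holds one score per examinee: 1 = correct, 0 = incorrect.
    """
    if not responses:
        raise ValueError("need at least one examinee")
    return sum(responses) / len(responses)


# Hypothetical item: 6 of 10 examinees answered correctly.
item = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(item_difficulty(item))  # 0.6
```

A larger p means an easier item.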
WHAT ARE THE SHORTCOMINGS OF THE CLASSICAL TEST THEORY?
|
1. Item & Test Parameters are SAMPLE-DEPENDENT; that is, the ITEM DIFFICULTY INDEX AND the RELIABILITY COEFFICIENT are likely to VARY from sample to sample.
2. It's DIFFICULT to EQUATE SCORES OBTAINED on different tests that have been developed on the basis of CLASSICAL TEST THEORY. E.g., a score of 50 on a math test is not equivalent to a score of 50 on an English test. |
|
ITEM DIFFICULTY INDEX (p): OPTIMAL p VALUES
|
.50 is optimal for most tests
.75 is optimal for TRUE/FALSE TESTS
-IRT defines Item Difficulty as the ability level at which an examinee has a .50 probability of answering the item CORRECTLY (the position of the item characteristic curve). |
|
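The .75 figure for true/false items follows a common rule of thumb: the optimal p lies halfway between 1.0 and the probability of answering correctly by chance. A small Python sketch of that rule, assuming every response option is equally likely to be guessed; `optimal_difficulty` is a hypothetical name.

```python
def optimal_difficulty(num_options=None):
    """Rule-of-thumb optimal item difficulty (p).

    With no chance of guessing (num_options=None) the optimum is .50.
    Otherwise it is halfway between 1.0 and the chance-success rate
    1 / num_options.
    """
    if num_options is None:
        return 0.50
    chance = 1 / num_options
    return (1.0 + chance) / 2


print(optimal_difficulty())    # 0.5   (no guessing)
print(optimal_difficulty(2))   # 0.75  (true/false)
print(optimal_difficulty(4))   # 0.625 (four-option multiple choice)
```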
ITEM DISCRIMINATION
|
AN ITEM'S ABILITY TO DISCRIMINATE (DIFFERENTIATE) BETWEEN EXAMINEES WHO OBTAIN LOW SCORES AND THOSE WHO OBTAIN HIGH SCORES ON THE ENTIRE TEST OR ON AN EXTERNAL CRITERION.
Range: -1.0 to +1.0. Acceptable: .35 or higher. Maximum discrimination is possible when p = .50. |
|
MEASURING ITEM DISCRIMINATION
|
THE DISCRIMINATION INDEX:
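D = U - L, WHERE U = PROPORTION OF THE UPPER-SCORING GROUP WHO ANSWERED THE ITEM CORRECTLY AND L = PROPORTION OF THE LOWER-SCORING GROUP WHO ANSWERED THE ITEM CORRECTLY.
WHEN D = +1.0, ALL OF THE UPPER GROUP AND NONE OF THE LOWER GROUP ANSWERED THE ITEM CORRECTLY; WHEN D = -1.0, ALL OF THE LOWER GROUP AND NONE OF THE UPPER GROUP ANSWERED THE ITEM CORRECTLY. |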
|
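A minimal Python sketch of the discrimination index, assuming each group's answers to one item are coded 1 (correct) / 0 (incorrect) and that the upper and lower groups have already been formed from total test scores. The function name and data are hypothetical.

```python
def discrimination_index(upper, lower):
    """D = U - L.

    U and L are the proportions of the upper- and lower-scoring groups
    that answered the item correctly (1 = correct, 0 = incorrect).
    """
    U = sum(upper) / len(upper)
    L = sum(lower) / len(lower)
    return U - L


# Hypothetical groups of high and low scorers on the total test
upper = [1, 1, 1, 1, 0]   # U = .80
lower = [1, 0, 0, 0, 0]   # L = .20
print(round(discrimination_index(upper, lower), 2))  # 0.6
```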
ITEM RESPONSE THEORY (IRT) VS. CLASSICAL TEST THEORY
|
IRT ("LATENT TRAIT APPROACH") ADVANTAGES:
1. The item characteristics (parameters) are SAMPLE INVARIANT (the SAME across different samples).
2. MEASURES AN EXAMINEE'S STATUS ON THE LATENT TRAIT OR ABILITY BEING MEASURED RATHER THAN JUST A TOTAL SCORE, WHICH MAKES IT POSSIBLE TO EQUATE SCORES FROM DIFFERENT SETS OF ITEMS AND FROM DIFFERENT TESTS.
3. EASIER TO DEVELOP COMPUTER-ADAPTIVE TESTS, IN WHICH THE ADMINISTRATION OF SUBSEQUENT ITEMS IS BASED ON THE EXAMINEE'S PERFORMANCE ON PREVIOUS ITEMS. |
|
ITEM CHARACTERISTICS CURVE (ICC)
|
-DESCRIBES THE RELATIONSHIP BETWEEN AN EXAMINEE'S LEVEL on the ABILITY OR TRAIT measured by the test and the PROBABILITY that he or she will RESPOND to the item correctly.
ICC INDICATES:
-THE DIFFICULTY LEVEL (position of the curve)
-DISCRIMINATION (steepness of the slope)
-PROBABILITY OF GUESSING CORRECTLY (point at which the curve intercepts the vertical axis) |
|
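The three properties indicated by the ICC correspond to the parameters of the three-parameter logistic (3PL) model, one common IRT model (simpler one- and two-parameter models also exist). The Python sketch below assumes that model; some formulations also multiply the exponent by a scaling constant of 1.7, which is omitted here. Function and parameter names are illustrative.

```python
import math


def icc_3pl(theta, a, b, c):
    """Probability of a correct response at ability level theta (3PL model).

    a: discrimination (steepness of the slope)
    b: difficulty (position of the curve on the ability scale)
    c: guessing (lower asymptote the curve approaches at low ability)
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate slope, average difficulty, 4-option guessing
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(icc_3pl(theta, a=1.2, b=0.0, c=0.25), 3))
```

At theta = b the probability is c + (1 - c)/2, so with no guessing (c = 0) the difficulty parameter is the ability level at which the probability of a correct response is .50.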
METHODS FOR ASSESSING RELIABILITY
|
1. TEST-RETEST reliability (Coefficient of STABILITY)
2. ALTERNATE/PARALLEL FORMS reliability (Coefficient of EQUIVALENCE)
3. INTERNAL CONSISTENCY reliability (Coefficient of INTERNAL CONSISTENCY)
-SPLIT-HALF (SPEARMAN-BROWN)
-COEFFICIENT ALPHA (KR-20)
(Internal consistency methods, including split-half, are generally not used to assess the reliability of speed tests because they overestimate it, producing a spuriously high reliability coefficient.)
4. INTER-RATER (KAPPA STATISTIC) |
|
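RELIABILITY COEFFICIENT
|
A MEASURE OF TRUE SCORE VARIABILITY
RANGE: 0.0 TO +1.0. 0.0 = ALL VARIABILITY IS MEASUREMENT ERROR; +1.0 = ALL VARIABILITY IS TRUE SCORE VARIABILITY.
NOTE: Unlike other correlation coefficients, the reliability coefficient is NEVER SQUARED (e.g., r = .80 means 80% of the variability is true score variability).
Reliability vs. validity: reliability = consistency, stability; validity = the test measures what it is designed to measure. |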
|
TEST-RETEST RELIABILITY
|
TEST-RETEST RELIABILITY INVOLVES TESTING THE SAME GROUP AT DIFFERENT TIMES. ITS COEFFICIENT INDICATES THE DEGREE OF STABILITY (CONSISTENCY) OF EXAMINEES' SCORES OVER TIME AND IS CALLED A COEFFICIENT OF STABILITY.
NOT GOOD FOR: MOOD TESTS. GOOD FOR: APTITUDE TESTS. |
|
ALTERNATIVE (EQUIVALENT, PARALLEL) FORMS RELIABILITY
|
-ASSESSES CONSISTENCY OF RESPONDING TO DIFFERENT ITEM SAMPLES (e.g., different test forms).
-ALTERNATIVE FORMS RELIABILITY ENTAILS ADMINISTERING 2 FORMS OF THE TEST TO THE SAME GROUP OF EXAMINEES AND CORRELATING THE 2 SETS OF SCORES.
-CONSIDERED THE BEST, MOST THOROUGH METHOD OF ALL.
-AKA: Coefficient of Equivalence |
|
INTERNAL CONSISTENCY RELIABILITY
|
INTERNAL CONSISTENCY RELIABILITY:
~SPLIT-HALF RELIABILITY tends to UNDERESTIMATE reliability because each half is shorter than the full test (so correct it with the SPEARMAN-BROWN formula).
~KUDER-RICHARDSON FORMULA 20 (KR-20) is used for items scored dichotomously (right or wrong); KR-20 is linked to COEFFICIENT ALPHA.
~NOT GOOD FOR SPEEDED TESTS: internal consistency methods produce a spuriously high reliability coefficient for them. |
|
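A minimal Python sketch of split-half reliability with the Spearman-Brown correction, assuming an odd-even split of dichotomously scored items (other splits are possible). For a test split into halves, the correction is r_full = 2r / (1 + r), where r is the correlation between the two half-test scores. The data and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """item_scores: one row per examinee, one 1/0 entry per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown: estimate the reliability of the full-length test
    return 2 * r_half / (1 + r_half)


# Hypothetical 6-item test taken by 5 examinees
data = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(round(split_half_reliability(data), 3))  # 0.844
```

Here the half-test correlation is about .73; the correction raises it to about .84 because the full test is twice as long as either half.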
INTER-RATER (INTER-SCORER, INTER-OBSERVER) RELIABILITY
|
-KAPPA STATISTIC (corrects the agreement between raters for chance agreement)
-CALCULATING a PERCENT AGREEMENT between 2 raters
-Used with nominal or ordinal data |
|
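A minimal Python sketch of Cohen's kappa, the most common form of the kappa statistic for two raters: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement (the percent agreement) and p_e is the agreement expected by chance from each rater's category proportions. The function name and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters' category labels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater1)
    # Observed agreement: simple percent agreement, as a proportion
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions per category
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings ("y" = disorder present) from two raters
r1 = ["y", "y", "y", "y", "y", "y", "n", "n", "n", "n"]
r2 = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "y"]
print(round(cohens_kappa(r1, r2), 2))  # 0.58
```

Percent agreement for these ratings is .80, but kappa is lower (about .58) because some of that agreement would be expected by chance alone.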
FACTORS THAT AFFECT THE RELIABILITY COEFFICIENT
|
1. TEST LENGTH: THE LONGER THE TEST (I.E., THE LARGER THE SAMPLE OF ITEMS), THE LARGER THE TEST'S RELIABILITY COEFFICIENT.
2. RANGE OF TEST SCORES: AN UNRESTRICTED RANGE IS IDEAL (heterogeneous examinees are also ideal, as is a mid-range difficulty level, p = .50).
3. GUESSING: AS THE PROBABILITY OF GUESSING CORRECTLY INCREASES, THE RELIABILITY COEFFICIENT DECREASES. |
|
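The effect of test length can be estimated with the general Spearman-Brown formula, r_new = k*r / (1 + (k - 1)*r), where k is the factor by which the test is lengthened. A minimal Python sketch, assuming the added items are comparable in quality to the existing ones; the function name and values are hypothetical.

```python
def spearman_brown(r, k):
    """Predicted reliability after multiplying test length by k.

    r: current reliability coefficient
    k: length factor (k = 2 doubles the test, k = 0.5 halves it)
    """
    return k * r / (1 + (k - 1) * r)


print(round(spearman_brown(0.60, 2), 2))    # 0.75
print(round(spearman_brown(0.60, 3), 2))    # 0.82
print(round(spearman_brown(0.60, 0.5), 2))  # 0.43
```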
STANDARD ERROR OF MEASUREMENT IS USED...
|
~TO CONSTRUCT A CONFIDENCE INTERVAL AROUND AN EXAMINEE'S OBTAINED TEST SCORE.
~AS AN INDEX OF MEASUREMENT ERROR. |
|
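The standard error of measurement is usually computed as SEM = SD * sqrt(1 - r_xx), where SD is the standard deviation of test scores and r_xx is the reliability coefficient. A minimal Python sketch of that formula and of a confidence interval around an obtained score, assuming normally distributed measurement error; the function names, SD, reliability, and score are hypothetical.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(obtained, sd, reliability, z=1.96):
    """Interval around an obtained score; z = 1.96 gives about 95%."""
    margin = z * sem(sd, reliability)
    return obtained - margin, obtained + margin


# Hypothetical test: SD = 15, reliability = .91, obtained score = 100
print(round(sem(15, 0.91), 2))  # 4.5
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 1), round(high, 1))  # 91.2 108.8
```

Note how a perfectly reliable test (r_xx = 1.0) would have an SEM of 0.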
METHOD FOR ASSESSING CONSTRUCT VALIDITY: CONVERGENT VALIDITY
|
HIGH CORRELATIONS WITH MEASURES OF THE SAME CHARACTERISTIC/TRAIT.
Note: Evidence of Convergent Validity is found when the MONOTRAIT-HETEROMETHOD COEFFICIENT IS LARGE. |
|
DISCRIMINANT (DIVERGENT) VALIDITY
|
LOW CORRELATIONS WITH MEASURES OF UNRELATED CHARACTERISTICS/TRAITS.
Note: Evidence of Discriminant Validity is found when the Heterotrait-Monomethod (and Heterotrait-Heteromethod) Coefficients are SMALL. |
|
What is the MULTITRAIT-MULTIMETHOD MATRIX used for?
|
-Assesses a test's CONVERGENT & DISCRIMINANT validity.
-Contains 4 types of correlation coefficients. |
|
CONTENT VALIDITY
|
-ADEQUATE & REPRESENTATIVE SAMPLE of the target domain.
-The test ADEQUATELY SAMPLES the CONTENT or BEHAVIOR DOMAIN that it is designed to measure.
-Associated mostly with ACHIEVEMENT-TYPE TESTS that measure knowledge of one or more domains.
NOTE: Content validity is NOT the same as FACE VALIDITY. |
|
Is FACE VALIDITY (FV) real? What happens if there is no FV?
|
NO. Face validity (whether a test APPEARS to measure what it is intended to measure) is not an actual type of VALIDITY.
But if there is no FV, examinees may not answer honestly. |
|
What are CONSTRUCTS?
|
HYPOTHETICAL TRAITS
E.G. -IQ -MECHANICAL APTITUDE -SELF-ESTEEM -NEUROTICISM |
|
What is CONSTRUCT VALIDITY?
|
When a test measures the HYPOTHETICAL TRAIT or CONSTRUCT it is supposed to measure.
|
|
What are the various ways to establish CONSTRUCT VALIDITY?
|
1. Assessing the test's INTERNAL CONSISTENCY
2. STUDYING GROUP DIFFERENCES
3. CONDUCTING RESEARCH to TEST HYPOTHESES about the construct
4. Assessing the test's CONVERGENT & DISCRIMINANT VALIDITY
5. ASSESSING THE TEST'S FACTORIAL VALIDITY |
|
Describe CONVERGENT VALIDITY.
|
HIGH CORRELATIONS with measures of the SAME TRAIT.
|
|
DISCRIMINANT (DIVERGENT) VALIDITY
|
LOW CORRELATIONS WITH MEASURES OF UNRELATED CHARACTERISTICS PROVIDE EVIDENCE OF A TEST'S DISCRIMINANT VALIDITY
|
|
HOW DO YOU ASSESS CONVERGENT AND DISCRIMINANT VALIDITY?
|
MULTITRAIT-MULTIMETHOD MATRIX
|
|
MONOTRAIT-MONOMETHOD COEFFICIENTS
|
~Indicates the correlation between a MEASURE & ITSELF (i.e., a reliability coefficient)
~SHOULD BE LARGE |
|
MONOTRAIT-HETEROMETHOD COEFFICIENTS
|
~Indicates the correlation between DIFFERENT MEASURES of the SAME TRAIT
~Provides evidence of CONVERGENT VALIDITY
~SHOULD BE LARGE |
|
HETEROTRAIT-MONOMETHOD COEFFICIENTS
|
~The correlation between DIFFERENT TRAITS that have been measured by the SAME METHOD
~Provides evidence of DISCRIMINANT VALIDITY
~NEEDS TO BE SMALL |
|
HETEROTRAIT-HETEROMETHOD COEFFICIENTS
|
~Indicates the correlation between DIFFERENT TRAITS that have been measured by DIFFERENT METHODS
~Provides evidence of DISCRIMINANT VALIDITY
~NEEDS TO BE SMALL |
|
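The four coefficient types in a multitrait-multimethod matrix depend only on whether two measures share a trait and whether they share a method. A minimal Python sketch of that classification; the function name and the traits and methods in the example are hypothetical.

```python
def mtmm_coefficient_type(measure_a, measure_b):
    """Classify the correlation between two measures in an MTMM matrix.

    Each measure is a (trait, method) tuple.
    """
    same_trait = measure_a[0] == measure_b[0]
    same_method = measure_a[1] == measure_b[1]
    if same_trait and same_method:
        return "monotrait-monomethod: reliability, should be large"
    if same_trait:
        return "monotrait-heteromethod: convergent validity, should be large"
    if same_method:
        return "heterotrait-monomethod: discriminant validity, should be small"
    return "heterotrait-heteromethod: discriminant validity, should be small"


self_report_anxiety = ("anxiety", "self-report")
peer_rating_anxiety = ("anxiety", "peer rating")
self_report_assertiveness = ("assertiveness", "self-report")

print(mtmm_coefficient_type(self_report_anxiety, peer_rating_anxiety))
print(mtmm_coefficient_type(self_report_anxiety, self_report_assertiveness))
```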
CRITERION-RELATED VALIDITY
|
-USE WHEN TEST (X) SCORES WILL BE USED TO PREDICT SCORES ON SOME OTHER MEASURE or CRITERION (Y) AND IT IS THE SCORES ON Y THAT ARE OF MOST INTEREST
-refers to the relationship between test scores and a criterion measure. |
|
ASSESSING CONSTRUCT VALIDITY
|
CONVERGENT AND DISCRIMINANT VALIDITY METHODS
|
|
CONVERGENT VALIDITY
|
LARGE MONOTRAIT-HETEROMETHOD COEFFICIENT
|
|
DISCRIMINANT VALIDITY
|
SMALL HETEROTRAIT-MONOMETHOD AND SMALL HETEROTRAIT-HETEROMETHOD COEFFICIENTS
|
|
FACTOR ANALYSIS
|
IS CONDUCTED TO IDENTIFY THE MINIMUM NUMBER OF COMMON FACTORS (DIMENSIONS) REQUIRED TO ACCOUNT FOR THE INTERCORRELATIONS AMONG A SET OF TESTS, SUBTESTS, OR TEST ITEMS.
USED TO ASSESS A TEST'S CONSTRUCT VALIDITY |
|
5 STEPS OF FACTOR ANALYSIS
|
1. ADMINISTER SEVERAL TESTS TO A GROUP OF EXAMINEES.
2. CORRELATE SCORES ON EACH TEST WITH SCORES ON EVERY OTHER TEST TO OBTAIN A CORRELATION (R) MATRIX.
3. USING ONE OF SEVERAL AVAILABLE FACTOR ANALYTIC TECHNIQUES, CONVERT THE CORRELATION MATRIX TO A FACTOR MATRIX.
4. SIMPLIFY THE INTERPRETATION OF THE FACTORS BY "ROTATING" THEM.
5. INTERPRET AND NAME THE FACTORS IN THE ROTATED FACTOR MATRIX. |
|
FACTOR LOADINGS
|
-Correlation coefficients that indicate the DEGREE of ASSOCIATION between each TEST & EACH FACTOR.
-A factor loading can be interpreted by SQUARING it to obtain a measure of shared variability. E.g., a factor loading of .70 means that 49% (.70 squared) of the variability in the test is accounted for by the factor. |
|
COMMUNALITY
|
INDICATES "COMMON VARIANCE," OR THE AMOUNT OF VARIABILITY IN TEST SCORES THAT IS DUE TO THE FACTORS THAT THE TEST SHARES IN COMMON, TO SOME DEGREE, WITH THE OTHER TESTS INCLUDED IN THE ANALYSIS.
IN OTHER WORDS, A TEST'S COMMUNALITY INDICATES THE TOTAL AMOUNT OF VARIABILITY IN TEST SCORES THAT IS EXPLAINED BY THE IDENTIFIED FACTORS (the proportion of variance in a single variable accounted for by multiple factors). |
|
TWO TYPES OF ROTATION
|
-ORTHOGONAL AND OBLIQUE
-The purpose of ROTATION in FACTOR ANALYSIS is to make the factors EASIER TO INTERPRET.
-Rotation ALTERS the FACTOR LOADINGS for each variable and the eigenvalue for each factor (although the total of the eigenvalues remains the same). |
|
ORTHOGONAL rotation
|
~The resulting FACTORS are UNCORRELATED
~A TEST'S COMMUNALITY CAN BE CALCULATED FROM ITS FACTOR LOADINGS.
~THE COMMUNALITY IS EQUAL TO THE SUM OF THE SQUARED FACTOR LOADINGS. |
|
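With orthogonal (uncorrelated) factors, squaring each loading gives the variability a test shares with that factor, and summing the squared loadings gives the test's communality. A minimal Python sketch, assuming orthogonal factors; the function names and loadings are hypothetical.

```python
def shared_variance(loading):
    """Proportion of test variability shared with one factor."""
    return loading ** 2


def communality(loadings):
    """Orthogonal factors only: sum of a test's squared factor loadings."""
    return sum(l ** 2 for l in loadings)


# Hypothetical test loading .70 on Factor I and .40 on Factor II
print(round(shared_variance(0.70), 2))      # 0.49
print(round(communality([0.70, 0.40]), 2))  # 0.65
```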
OBLIQUE rotation
|
The resulting factors are CORRELATED and the ATTRIBUTES measured by the FACTORS are NOT INDEPENDENT.
THE SUM OF THE SQUARED FACTOR LOADINGS EXCEEDS THE COMMUNALITY. |
|
WHEN YOU SQUARE A FACTOR LOADING WHAT TYPE OF MEASURE DOES IT PROVIDE?
|
A MEASURE OF "SHARED VARIABILITY"
|
|
IN FACTOR ANALYSIS, WHEN FACTORS ARE ORTHOGONAL, HOW DO YOU CALCULATE THE TEST'S COMMUNALITY?
|
SQUARE AND ADD THE TEST'S FACTOR LOADINGS.
|
|
CRITERION-RELATED VALIDITY
|
CRITERION-RELATED VALIDITY IS OF INTEREST WHENEVER TEST SCORES ARE TO BE USED TO DRAW CONCLUSIONS ABOUT AN EXAMINEE'S LIKELY STANDING OR PERFORMANCE ON ANOTHER MEASURE.
Note: Criterion-related validity coefficients rarely exceed .60; coefficients as low as .20 or .30 may be acceptable. |
|
2 TYPES OF CRITERION-RELATED VALIDITY
|
1. CONCURRENT VALIDITY
2. PREDICTIVE VALIDITY |
|
2 TYPES OF CONSTRUCT VALIDITY
|
1. CONVERGENT VALIDITY
2. DISCRIMINANT (DIVERGENT) VALIDITY |
|
CONCURRENT VALIDITY
|
WHEN CRITERION DATA ARE COLLECTED PRIOR TO OR AT ABOUT THE SAME TIME AS DATA ON THE PREDICTOR, THE PREDICTOR'S CONCURRENT VALIDITY IS BEING ASSESSED.
E.g., concurrent validity would be how the WISC correlates with the WJ Cog., the Stanford-Binet, etc.; predictive validity would be how the WISC correlates with academic achievement (e.g., WIAT, WRAT), job performance, etc. |
|
When is PREDICTIVE VALIDITY EVALUATED?
|
When the criterion is measured some time AFTER the predictor has been administered.
|
|
THE STANDARD ERROR OF ESTIMATE
|
USED TO CONSTRUCT A CONFIDENCE INTERVAL AROUND A PREDICTED (ESTIMATED) CRITERION SCORE.
68% CONFIDENCE INTERVAL = ±1 STANDARD ERROR; 95% = ±1.96 (ABOUT 2) STANDARD ERRORS; 99% = ±2.58 STANDARD ERRORS (±3 STANDARD ERRORS COVERS ABOUT 99.7%). |
|
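The standard error of estimate is commonly computed as SE_est = SD_y * sqrt(1 - r_xy^2), where SD_y is the standard deviation of criterion scores and r_xy is the validity coefficient. A minimal Python sketch, assuming normally distributed prediction errors; the function name, the predicted criterion score of 50, SD_y = 10, and r_xy = .60 are hypothetical.

```python
import math


def standard_error_of_estimate(sd_y, r_xy):
    """Spread of actual criterion scores around predicted ones."""
    return sd_y * math.sqrt(1 - r_xy ** 2)


see = standard_error_of_estimate(10, 0.60)
print(round(see, 2))  # 8.0

predicted = 50
for label, z in (("68%", 1.0), ("95%", 1.96), ("99%", 2.58)):
    print(label, round(predicted - z * see, 2), round(predicted + z * see, 2))
```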
STANDARD ERROR OF ESTIMATE
|
USED TO CONSTRUCT A CONFIDENCE INTERVAL AROUND AN ESTIMATED (PREDICTED) SCORE.
|
|
WHY WOULD YOU WANT INCREMENTAL VALIDITY? WHAT DO YOU NEED TO ESTIMATE A TEST'S INCREMENTAL VALIDITY?
|
-IT INCREASES DECISION-MAKING ACCURACY.
-The selection ratio, base rate, and validity coefficient are used to estimate a test's incremental validity using the Taylor-Russell tables.
-INCREMENTAL VALIDITY = + HIT RATE - BASE RATE |
|
HOW DO YOU ESTIMATE INCREMENTAL VALIDITY?
|
INCREMENTAL VALIDITY = + HIT RATE - BASE RATE
|
|
What is the BASE RATE FORMULA?
|
BASE RATE = (TRUE POSITIVES + FALSE NEGATIVES) / TOTAL NUMBER OF PEOPLE
|
|
What is the + HIT RATE FORMULA?
|
+ HIT RATE = TRUE POSITIVES / TOTAL POSITIVES (TRUE POSITIVES + FALSE POSITIVES)
|
|
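The base rate, + hit rate, and incremental validity formulas can be combined in a short Python sketch, assuming a 2x2 table of predictor decisions (positive or negative) against criterion outcomes (successful or unsuccessful). The function name and counts are hypothetical.

```python
def decision_rates(tp, fp, tn, fn):
    """Return (base rate, + hit rate, incremental validity).

    tp: predictor positive, criterion successful (true positive)
    fp: predictor positive, criterion unsuccessful (false positive)
    tn: predictor negative, criterion unsuccessful (true negative)
    fn: predictor negative, criterion successful (false negative)
    """
    total = tp + fp + tn + fn
    base_rate = (tp + fn) / total          # proportion successful on the criterion
    positive_hit_rate = tp / (tp + fp)     # proportion of positives who succeed
    incremental_validity = positive_hit_rate - base_rate
    return base_rate, positive_hit_rate, incremental_validity


# Hypothetical selection study with 100 applicants
br, phr, iv = decision_rates(tp=40, fp=10, tn=30, fn=20)
print(round(br, 2), round(phr, 2), round(iv, 2))  # 0.6 0.8 0.2
```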
What is the PREDICTOR?
|
- (X, IV)
- DETERMINES WHETHER A PERSON IS A "POSITIVE" OR A "NEGATIVE" (SCORES ABOVE OR BELOW THE PREDICTOR CUTOFF) |
|
What is the CRITERION ?
|
- (Y, DV)
- DETERMINES WHETHER THE PERSON IS A "TRUE" OR A "FALSE" POSITIVE/NEGATIVE |
|
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
|
LOW RELIABILITY = A HIGH DEGREE OF CONTENT, CONSTRUCT, OR CRITERION-RELATED VALIDITY IS NOT POSSIBLE
HIGH RELIABILITY = DOES NOT GUARANTEE VALIDITY |
|
CORRECTION FOR ATTENUATION FORMULA
|
-A WAY OF ESTIMATING WHAT A PREDICTOR'S VALIDITY COEFFICIENT WOULD BE IF THE PREDICTOR AND/OR THE CRITERION WERE PERFECTLY RELIABLE (I.E., HAD A RELIABILITY COEFFICIENT OF 1.0).
-Useful for estimating a validity coefficient when measurement error has affected the magnitude of the coefficient. |
|
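The usual correction for attenuation formula is r_corrected = r_xy / sqrt(r_xx * r_yy), where r_xy is the obtained validity coefficient and r_xx and r_yy are the reliability coefficients of the predictor and criterion; to correct for only one of them, leave the other at 1.0. A minimal Python sketch with a hypothetical function name and values:

```python
import math


def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Estimated validity if predictor and/or criterion were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Correcting for unreliability in both predictor and criterion
print(round(correct_for_attenuation(0.40, r_xx=0.80, r_yy=0.50), 2))  # 0.63
# Correcting for unreliability in the criterion only
print(round(correct_for_attenuation(0.40, r_yy=0.64), 2))             # 0.5
```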
CRITERION CONTAMINATION
|
-TENDS TO INFLATE THE RELATIONSHIP BETWEEN A PREDICTOR AND A CRITERION, RESULTING IN AN ARTIFICIALLY HIGH CRITERION-RELATED VALIDITY COEFFICIENT.
-Criterion contamination occurs when a rater's knowledge of a ratee's performance on a predictor biases his/her ratings of the ratee on the criterion.
-Is of concern when the measure of performance is subjectively scored. |
|
CROSS-VALIDATION
|
THE CROSS-VALIDATION COEFFICIENT TENDS TO "SHRINK" OR BE SMALLER THAN THE ORIGINAL COEFFICIENT.
THE SMALLER THE INITIAL VALIDATION SAMPLE, THE GREATER THE SHRINKAGE OF THE VALIDITY COEFFICIENT WHEN THE PREDICTOR IS CROSS-VALIDATED. |
|
What are the different types of NORM-REFERENCED INTERPRETATION?
|
-PERCENTILE RANKS
-STANDARD SCORES
-THEY TELL YOU HOW WELL AN INDIVIDUAL IS DOING COMPARED TO OTHER INDIVIDUALS |
|
PERCENTILE RANKS
|
EXPRESSES AN EXAMINEE'S RAW SCORE IN TERMS OF THE PERCENTAGE OF EXAMINEES IN THE NORM SAMPLE WHO ACHIEVED LOWER SCORES.
A PR OF 88 MEANS 88% OF THE PEOPLE IN THE NORM SAMPLE OBTAINED SCORES LOWER THAN THE EXAMINEE'S SCORE. PRs ARE A NONLINEAR TRANSFORMATION: THEIR DISTRIBUTION IS ALWAYS FLAT (rectangular), REGARDLESS OF THE SHAPE OF THE RAW SCORE DISTRIBUTION. |
|
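A minimal Python sketch of a percentile rank as defined on this card (the percentage of the norm sample scoring lower than the examinee). Some definitions also count half of the tied scores; that variant is not shown. The function name and norm sample are hypothetical.

```python
def percentile_rank(score, norm_sample):
    """Percentage of the norm sample with scores lower than `score`."""
    below = sum(1 for s in norm_sample if s < score)
    return 100 * below / len(norm_sample)


# Hypothetical norm sample of 10 raw scores
norm = [10, 12, 12, 14, 15, 15, 16, 18, 19, 20]
print(percentile_rank(18, norm))  # 70.0
```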
STANDARD SCORES
|
AN EXAMINEE'S POSITION IN THE NORMATIVE SAMPLE IN TERMS OF STANDARD DEVIATIONS FROM THE MEAN.
Z-SCORES: MEAN = 0, SD = 1. T-SCORES: MEAN = 50, SD = 10. (E.g., in a normal distribution, about 97.7% of scores fall below the score that is two standard deviations above the mean.) |
|
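Converting a raw score to z- and T-scores, plus the percentage of a normal distribution falling below it, can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). The function names and the mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Number of standard deviations the raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    """T-score scale: mean 50, SD 10."""
    return 50 + 10 * z


z = z_score(130, mean=100, sd=15)
print(z, t_score(z))  # 2.0 70.0
# Percentage of a normal distribution below this z (assumes normality)
print(round(NormalDist().cdf(z) * 100, 1))  # 97.7
```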
CRITERION-REFERENCED INTERPRETATION/SCORES (eg PERCENTAGE SCORES)
|
INVOLVES INTERPRETING SCORES IN TERMS OF A PRESPECIFIED STANDARD.
E.G. A PERCENTAGE SCORE (OR PERCENT CORRECT) INDICATES THE PERCENTAGE OF THE TEST CONTENT THAT AN EXAMINEE ANSWERED CORRECTLY.
-Indicates how much of the test content the examinee MASTERED.
-IT TELLS YOU HOW MANY ITEMS AN INDIVIDUAL GOT CORRECT.
NOTE: When the goal of testing is to determine the amount of content an individual has mastered, criterion-referenced (or content-referenced) scores are most useful. |
|
NORM-REFERENCED INTERPRETATIONS
|
-PERCENTILE RANKS
-STANDARD SCORES |
|
CRITERION-REFERENCED INTERPRETATIONS
|
-PERCENTAGE SCORES
-REGRESSION EQUATION
-EXPECTANCY TABLE |
|
PERCENTAGE SCORES
|
-Indicates how much of the test content the examinee MASTERED.
NOTE: When the goal of testing is to determine the amount of content an individual has mastered, criterion-referenced (or content-referenced) scores are most useful. |
|
Lord's chi-square is used to:
|
-Evaluate the DIFFERENTIAL ITEM FUNCTIONING (DIF) of an item included in a test (DIF is AKA ITEM BIAS).
-DIF occurs when one group responds differently to an item than another group even though both groups have similar levels of the latent trait (attribute) measured by the test.
-Several statistical techniques are used to evaluate DIF; Lord's chi-square is one of these techniques. |
|
KUDER-RICHARDSON FORMULA 20
|
-Measures INTERNAL CONSISTENCY
-A HIGH KR-20 COEFFICIENT indicates a HOMOGENEOUS test. |
|
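KR-20 is commonly written as KR-20 = (k / (k - 1)) * (1 - sum(p*q) / var_total), where k is the number of items, p and q are the proportions passing and failing each item, and var_total is the variance of total scores. A minimal Python sketch, assuming items scored 1/0 and using the population variance of total scores so it matches p*q; the function name and data are hypothetical.

```python
from statistics import pvariance


def kr20(item_scores):
    """KR-20 for dichotomous items.

    item_scores: one row per examinee, one 1/0 entry per item.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


# Hypothetical 6-item test taken by 5 examinees
data = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(round(kr20(data), 3))  # 0.827
```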
EIGENVALUES ARE ASSOCIATED WITH
|
PRINCIPAL COMPONENT ANALYSIS
Eigenvalues can be calculated for each component "extracted" in a principal component analysis. |
|
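In a principal component analysis, a component's eigenvalue equals the sum of the squared loadings of all variables on that component. A minimal Python sketch of that arithmetic, assuming the loadings have already been extracted; the function name and loading matrix are hypothetical.

```python
def eigenvalues_from_loadings(loadings):
    """Column sums of squared loadings.

    loadings[i][j] is the loading of variable i on component j.
    """
    n_components = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_components)]


# Hypothetical loadings of 4 tests on 2 components
loading_matrix = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.20, 0.90],
    [0.10, 0.60],
]
eig = eigenvalues_from_loadings(loading_matrix)
print([round(e, 2) for e in eig])  # [1.18, 1.22]
```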
TEST-RETEST RELIABILITY?
|
COEFFICIENT OF STABILITY
|
|
ALTERNATE/PARALLEL FORMS RELIABILITY?
|
COEFFICIENT OF EQUIVALENCE
|
|
INTERNAL CONSISTENCY RELIABILITY?
|
COEFFICIENT OF INTERNAL CONSISTENCY
|
|
SPLIT-HALF?
|
SPEARMAN-BROWN
|
|
COEFFICIENT ALPHA?
|
KR-20
|
|
INTERNAL CONSISTENCY RELIABILITY
|
-COEFFICIENT OF INTERNAL CONSISTENCY
-TWO TYPES: -SPLIT-HALF (SPEARMAN-BROWN) -COEFFICIENT ALPHA (KR-20) |
|
IPSATIVE SCORES
|
TELLS YOU THE RELATIVE STRENGTHS OF THE DIFFERENT CHARACTERISTICS MEASURED BY A TEST.
|