80 Cards in this Set

  • Front
  • Back
RELIABILITY COEFFICIENT?
An estimate of TRUE SCORE VARIABILITY.
RELEVANCE?
When test items actually measure or contribute to achieving the stated GOALS of testing.
How do you determine Relevance in testing?
CONTENT APPROPRIATENESS, EXTRANEOUS ABILITIES (to what extent does the item require knowledge, skills, or abilities outside the domain of interest), AND TAXONOMIC LEVEL (whether the item reflects the appropriate cognitive or ability level)
ITEM DIFFICULTY INDEX
P = TOTAL PE / TOTAL E

PE = number of examinees passing the item

E = total number of examinees

ITEM DIFFICULTY INDEX
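A minimal Python sketch of the item difficulty index (passing examinees divided by total examinees). The function name and the sample data are illustrative assumptions, not part of the original cards.

```python
def item_difficulty(responses):
    """Return p: the proportion of examinees who answered the item correctly.

    `responses` holds one 0/1 score per examinee (1 = passed the item).
    """
    if not responses:
        raise ValueError("need at least one examinee")
    return sum(responses) / len(responses)


# Hypothetical data: 10 examinees, 6 of whom passed the item.
item_scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = item_difficulty(item_scores)
print(round(p, 2))  # 0.6
```

A larger p means an easier item, because more examinees passed it.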
WHAT ARE THE SHORTCOMINGS OF THE CLASSICAL TEST THEORY?
1. Item & Test Parameters are SAMPLE-DEPENDENT, that is the ITEM DIFFICULTY INDEX AND the RELIABILITY COEFFICIENT are likely to VARY from sample to sample.

2. It's DIFFICULT to EQUATE SCORES OBTAINED on different tests that have been developed on the basis of CLASSICAL TEST THEORY. E.g., a score of 50 on a Math test does not equate to a score of 50 on an English test.
ITEM DIFFICULTY index (P)

p values:
.50 is optimal

.75 for TRUE/FALSE TEST

-IRT defines Item Difficulty as: the probability that an examinee with a given level of the ability measured by the test will answer the item CORRECTLY
ITEM DISCRIMINATION
AN ITEM'S ABILITY TO DISCRIMINATE (DIFFERENTIATE) BETWEEN EXAMINEES WHO OBTAINED LOW SCORES AND THOSE WHO OBTAINED HIGH SCORES ON THE ENTIRE TEST OR ON AN EXTERNAL CRITERION.

Range: -1.0 to +1.0

Acceptable Range: .35 or higher (items with p = .50 allow maximum discrimination)
MEASURING ITEM DISCRIMINATION
THE DISCRIMINATION INDEX:

D = U - L

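U = PROPORTION OF THE UPPER-SCORING GROUP THAT ANSWERED THE ITEM CORRECTLY

L = PROPORTION OF THE LOWER-SCORING GROUP THAT ANSWERED THE ITEM CORRECTLY

WHEN D = +1.0, ALL OF THE UPPER GROUP AND NONE OF THE LOWER GROUP ANSWERED THE ITEM CORRECTLY.

WHEN D = -1.0, ALL OF THE LOWER GROUP AND NONE OF THE UPPER GROUP ANSWERED THE ITEM CORRECTLY.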
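A short Python sketch of D = U - L, treating U and L as the proportion of each group that answered the item correctly. The function name and the sample groups are hypothetical.

```python
def discrimination_index(upper, lower):
    """Return D = U - L for one item.

    `upper` and `lower` are lists of 0/1 item scores for the
    upper-scoring and lower-scoring groups (1 = correct).
    """
    u = sum(upper) / len(upper)  # proportion correct in the upper group
    l = sum(lower) / len(lower)  # proportion correct in the lower group
    return u - l


# Hypothetical groups of five examinees each.
upper_group = [1, 1, 1, 1, 0]  # U = .80
lower_group = [1, 0, 0, 0, 0]  # L = .20
print(round(discrimination_index(upper_group, lower_group), 2))  # 0.6
```

The two extreme cases follow directly: D is +1.0 when only the upper group answers correctly and -1.0 when only the lower group does.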
ITEM RESPONSE THEORY (IRT) VS. CLASSICAL TEST THEORY
IRT ("LATENT TRAIT APPROACH") ADVANTAGES:

1. The item characteristics (parameters) are SAMPLE INVARIANT (the SAME across different samples).

2. BECAUSE IRT MEASURES AN EXAMINEE'S LEVEL ON THE TRAIT BEING MEASURED (status on a latent trait or ability) RATHER THAN JUST A TOTAL SCORE, IT IS POSSIBLE TO EQUATE SCORES FROM DIFFERENT SETS OF ITEMS AND FROM DIFFERENT TESTS.

3. IT IS EASIER TO DEVELOP COMPUTER-ADAPTIVE TESTS, IN WHICH THE ADMINISTRATION OF SUBSEQUENT ITEMS IS BASED ON THE EXAMINEE'S PERFORMANCE ON PREVIOUS ITEMS.
ITEM CHARACTERISTIC CURVE (ICC)
-Shows the relationship between an EXAMINEE'S LEVEL on the ABILITY OR TRAIT measured by the test and the PROBABILITY that he or she will RESPOND to the item correctly.

ICC INDICATES:

-THE DIFFICULTY LEVEL (position of the curve)

-DISCRIMINATION (steepness of slope)

-PROBABILITY OF GUESSING CORRECTLY (point at which the curve intercepts the vertical axis)
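One way to picture the three ICC features is the three-parameter logistic (3PL) model. The cards do not name a specific model, so the choice of 3PL, the parameter names a, b, c, and the values below are all assumptions made for illustration.

```python
import math


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed 3PL model.

    theta: examinee's level on the latent trait
    a: discrimination (steepness of the slope)
    b: difficulty (position of the curve along the trait axis)
    c: guessing (lower asymptote of the curve)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate slope, average difficulty, 20% guessing.
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.2), 3))
```

In this sketch the probability at theta = b is halfway between c and 1.0, raising a makes the curve steeper around b, and very low trait levels approach the guessing value c.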
METHODS FOR ASSESSING RELIABILITY
1. TEST-RETEST reliability (Coefficient of STABILITY)
2. ALTERNATE/PARALLEL FORMS reliability (Coefficient of EQUIVALENCE)
3. INTERNAL CONSISTENCY reliability (Coefficient of INTERNAL CONSISTENCY)
-SPLIT-HALF (SPEARMAN-BROWN)
-COEFFICIENT ALPHA (KR-20)

(remember that split-half and other forms of internal consistency reliability overestimate the reliability of a speed test)

(Internal consistency reliability is generally not used to assess the reliability of speed tests because it produces a spuriously high reliability coefficient)

4. INTER-RATER reliability (KAPPA STATISTIC)
RELIABILITY COEFFICIENT
A MEASURE OF TRUE SCORE VARIABILITY

RANGE: 0.0 to +1.0

0.0 = ALL VARIABILITY IS DUE TO MEASUREMENT ERROR
+1.0 = ALL VARIABILITY IS DUE TO TRUE SCORE VARIABILITY

NOTE:

-INTERPRETED DIRECTLY; NEVER SQUARED, UNLIKE OTHER CORRELATION COEFFICIENTS

-reliability vs. validity:

reliability = consistency, stability
validity = assess what it is designed to measure
TEST-RETEST RELIABILITY
TEST-RETEST RELIABILITY INVOLVES TESTING THE SAME GROUP AT DIFFERENT TIMES. IT INDICATES THE DEGREE OF STABILITY (CONSISTENCY) OF EXAMINEES' SCORES OVER TIME AND YIELDS A COEFFICIENT OF STABILITY.

NOT GOOD FOR: MOOD TESTS
GOOD FOR: APTITUDE TESTS
ALTERNATIVE (EQUIVALENT, PARALLEL) FORMS RELIABILITY
-ASSESSES CONSISTENCY OF RESPONDING TO DIFFERENT ITEM SAMPLES (e.g., different test forms)

-ALTERNATIVE FORMS RELIABILITY ENTAILS ADMINISTERING 2 FORMS OF THE TEST TO THE SAME GROUP OF EXAMINEES AND CORRELATING THE 2 SETS OF SCORES.

-BEST METHOD OF ALL, MOST THOROUGH METHOD

-AKA: Coefficient Of Equivalence
INTERNAL CONSISTENCY RELIABILITY
INTERNAL CONSISTENCY RELIABILITY:

~SPLIT-HALF RELIABILITY tends to underestimate a test's reliability because each half is shorter than the full test
(so correct it with the SPEARMAN-BROWN formula)

^KUDER-RICHARDSON FORMULA 20 (KR-20) for items scored dichotomously (right or wrong) (inappropriate for speeded tests because it produces a spuriously high reliability coefficient)
(KR-20 is a special case of COEFFICIENT ALPHA for dichotomous items)

NOT GOOD FOR SPEEDED TESTS
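A hedged Python sketch of the two internal consistency calculations named on this card. The formulas are the standard textbook forms (Spearman-Brown correction for a test split in half, and KR-20 using the population variance of total scores); they are not spelled out on the cards, and the function names and data are hypothetical.

```python
def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def kr20(item_matrix):
    """KR-20 for dichotomous (0/1) items.

    `item_matrix` has one row per examinee and one column per item.
    Assumes total scores are not all identical (non-zero variance).
    """
    n = len(item_matrix)       # number of examinees
    k = len(item_matrix[0])    # number of items
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n  # item difficulty
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 examinees by 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(spearman_brown(0.60), 2))  # 0.75
print(round(kr20(scores), 2))          # 0.8
```

The Spearman-Brown result (.75) is higher than the half-test correlation (.60), which is the point of the correction.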
INTER-RATER (INTER-SCORER, INTER-OBSERVER) RELIABILITY
-KAPPA STATISTIC

-CALCULATING a PERCENT AGREEMENT between 2 raters

-nominal or ordinal data
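A small Python sketch of percent agreement and the kappa statistic for two raters, using Cohen's kappa (observed agreement corrected for the agreement expected by chance). The card does not give the formula, so this form, the function names, and the ratings are assumptions.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Proportion of cases on which the two raters gave the same category."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement).

    Undefined when chance agreement equals 1 (both raters use one category).
    """
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    p_chance = sum((counts_1[c] / n) * (counts_2[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical nominal ratings of 10 cases by two raters.
rater_a = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no"]
print(round(percent_agreement(rater_a, rater_b), 2))  # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))       # 0.6
```

Kappa (.60) is lower than raw percent agreement (.80) because half of the agreement here would be expected by chance alone.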
FACTORS THAT AFFECT THE RELIABILITY COEFFICIENT
1. TEST LENGTH: THE LONGER THE TEST (THE LARGER THE SAMPLE OF ITEMS), THE LARGER THE TEST'S RELIABILITY COEFFICIENT.

2. RANGE OF TEST SCORES: AN UNRESTRICTED RANGE IS IDEAL (heterogeneous examinees are also ideal, as is an item difficulty level in the mid-range, or p = .50)

3. GUESSING: AS THE PROBABILITY THAT TEST TAKERS CAN GUESS CORRECTLY INCREASES, THE RELIABILITY COEFFICIENT DECREASES.
STANDARD ERROR OF MEASUREMENT IS USED...
~TO DETERMINE OR CONSTRUCT CONFIDENCE INTERVALS.

~INDEX OF MEASUREMENT ERROR.

~Used to construct confidence interval around an examinee's OBTAINED test score.
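A minimal Python sketch of a confidence interval around an obtained score. It assumes the standard formula SEM = SD x sqrt(1 - reliability), which the card does not state, and uses hypothetical numbers; roughly, +/-1 SEM gives a 68% interval and +/-2 SEM a 95% interval.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), where r_xx is the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(obtained_score, sem, n_sem=2):
    """Interval of +/- n_sem standard errors around the obtained score."""
    return (obtained_score - n_sem * sem, obtained_score + n_sem * sem)


# Hypothetical test: SD = 15, reliability = .91, obtained score = 100.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(100, sem, n_sem=2)
print(round(sem, 2), round(low, 1), round(high, 1))  # 4.5 91.0 109.0
```

As reliability approaches 1.0 the SEM shrinks toward 0, so the interval around the obtained score narrows.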
METHOD FOR ASSESSING CONSTRUCT VALIDITY:

CONVERGENT VALIDITY
HIGH CORRELATIONS WITH MEASURES OF THE SAME CHARACTERISTICS/TRAIT.

Note: Evidence of Convergent Validity is found when the MONOTRAIT-HETEROMETHOD COEFFICIENT IS LARGE.
DISCRIMINANT (DIVERGENT) VALIDITY
LOW CORRELATIONS WITH MEASURES OF UNRELATED CHARACTERISTICS/TRAITS.

Note: Evidence of Discriminant validity is found when the Heterotrait-Monomethod Coefficient is SMALL.
What is the MULTITRAIT-MULTIMETHOD MATRIX used for?
-Assess a test's CONVERGENT & DISCRIMINANT validity.

-4 types of correlation coefficients.
CONTENT VALIDITY
-ADEQUATE & REPRESENTATIVE SAMPLE of the target domain.

-ADEQUATELY SAMPLES the CONTENT or BEHAVIOR DOMAIN that it is designed to measure.

-Associated mostly with ACHIEVEMENT-TYPE TESTS that measure knowledge of one or more domains.

NOTE: NOT FACE VALIDITY
Is FACE VALIDITY (FV) real? What happens if there is no FV?
Is FV real?

NO. It is not an actual type of VALIDITY.

What happens if there is no FV?

If there is no FV, examinees may not answer honestly.
What are CONSTRUCTS?
HYPOTHETICAL TRAITS

E.G.

-IQ
-MECHANICAL APTITUDE
-SELF-ESTEEM
-NEUROTICISM
What is CONSTRUCT VALIDITY?
When a test measures the HYPOTHETICAL TRAIT or CONSTRUCT it is supposed to measure.
What are the various ways to establish CONSTRUCT VALIDITY?
1. Assessing the test's INTERNAL CONSISTENCY

2. STUDY GROUP DIFFERENCES

3. CONDUCTING RESEARCH TO TEST HYPOTHESES ABOUT THE CONSTRUCT

4. Assess the test's CONVERGENT & DISCRIMINANT VALIDITY

5. ASSESSING THE TEST'S FACTORIAL VALIDITY
Describe CONVERGENT VALIDITY.
HIGH CORRELATIONS with measures of the SAME TRAIT.

A description of CONVERGENT VALIDITY.
DISCRIMINANT (DIVERGENT) VALIDITY
LOW CORRELATIONS WITH MEASURES OF UNRELATED CHARACTERISTICS PROVIDE EVIDENCE OF A TEST'S DISCRIMINANT VALIDITY
HOW DO YOU ASSESS CONVERGENT AND DISCRIMINANT VALIDITY?
MULTITRAIT-MULTIMETHOD MATRIX
MONOTRAIT-MONOMETHOD COEFFICIENTS
~Indicates the correlation between a MEASURE & ITSELF

~SHOULD BE LARGE
MONOTRAIT-HETEROMETHOD COEFFICIENTS
~Indicates the correlation between DIFFERENT MEASURES of the SAME TRAIT

~Provides evidence of *CONVERGENT VALIDITY

~SHOULD BE LARGE
HETEROTRAIT-MONOMETHOD COEFFICIENTS
~The correlation between DIFFERENT TRAITS that have been measured by the SAME METHOD

~When small, provides evidence of DISCRIMINANT VALIDITY

~NEEDS TO BE SMALL
HETEROTRAIT-HETEROMETHOD COEFFICIENTS
~Indicates the correlation between DIFFERENT TRAITS that have been measured by DIFFERENT METHODS

~When small, provides evidence of DISCRIMINANT VALIDITY

~NEEDS TO BE SMALL
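The four coefficient types in the multitrait-multimethod matrix can be summarized with a small Python sketch. Each measure is represented as a hypothetical (trait, method) pair; the function name, dictionary, and example measures are illustrative assumptions.

```python
def mtmm_coefficient_type(measure_a, measure_b):
    """Classify the correlation between two (trait, method) measures."""
    trait_a, method_a = measure_a
    trait_b, method_b = measure_b
    trait_part = "monotrait" if trait_a == trait_b else "heterotrait"
    method_part = "monomethod" if method_a == method_b else "heteromethod"
    return f"{trait_part}-{method_part}"


# What each coefficient should look like, as stated on the cards above.
EXPECTED = {
    "monotrait-monomethod": "large (a measure correlated with itself)",
    "monotrait-heteromethod": "large (convergent validity)",
    "heterotrait-monomethod": "small (discriminant validity)",
    "heterotrait-heteromethod": "small (discriminant validity)",
}

# Hypothetical measures: two traits, each assessed by two methods.
anxiety_self = ("anxiety", "self-report")
anxiety_obs = ("anxiety", "observer rating")
sociability_self = ("sociability", "self-report")

kind = mtmm_coefficient_type(anxiety_self, anxiety_obs)
print(kind, "->", EXPECTED[kind])
```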
CRITERION-RELATED VALIDITY
-USE WHEN TEST (X) SCORES WILL BE USED TO PREDICT SCORES ON SOME OTHER MEASURE or CRITERION (Y) AND IT IS THE SCORES ON Y THAT ARE OF MOST INTEREST

-refers to the relationship between test scores and a criterion measure.
ASSESSING CONSTRUCT VALIDITY
CONVERGENT AND DISCRIMINANT VALIDITY METHODS
CONVERGENT VALIDITY
LARGE MONOTRAIT-HETEROMETHOD COEFFICIENT
DISCRIMINANT VALIDITY
SMALL HETEROTRAIT-MONOMETHOD AND SMALL HETEROTRAIT-HETEROMETHOD COEFFICIENT
FACTOR ANALYSIS
IS CONDUCTED TO IDENTIFY THE MINIMUM NUMBER OF COMMON FACTORS (DIMENSIONS) REQUIRED TO ACCOUNT FOR THE INTERCORRELATIONS AMONG A SET OF TESTS, SUBTESTS, OR TEST ITEMS.

USED TO ASSESS A TEST'S CONSTRUCT VALIDITY
5 STEPS OF FACTOR ANALYSIS
1. ADMINISTER SEVERAL TESTS TO A GROUP OF EXAMINEES.

2. CORRELATE SCORES ON EACH TEST WITH SCORES ON EVERY OTHER TEST TO OBTAIN A CORRELATION (R) MATRIX

3. USING ONE OF SEVERAL AVAILABLE FACTOR ANALYTIC TECHNIQUES, CONVERT THE CORRELATION MATRIX TO A FACTOR MATRIX.

4. SIMPLIFY THE INTERPRETATION OF THE FACTORS BY "ROTATING" THEM.

5. INTERPRET AND NAME THE FACTORS IN THE ROTATED FACTOR MATRIX.
FACTOR LOADINGS
-Correlation Coefficients that indicate the DEGREE of ASSOCIATION Between each TEST & EACH FACTOR.

-A factor loading can be interpreted by SQUARING it to obtain a measure of shared variability. When the factor loading is .70, this means that 49% (.70 squared) of variability in the test is accounted for by the factor.
COMMUNALITY
INDICATES "COMMON VARIANCE" OR THE AMOUNT OF VARIABILITY IN TEST SCORES THAT IS DUE TO THE FACTORS THAT THE TEST SHARES IN COMMON, TO SOME DEGREE, WITH THE OTHER TESTS INCLUDED IN THE ANALYSIS.

OR

A TEST'S COMMUNALITY INDICATES THE TOTAL AMOUNT OF VARIABILITY IN TEST SCORES THAT IS EXPLAINED BY THE IDENTIFIED FACTORS.

or

the proportion of variance in a single variable that is accounted for by multiple factors
TWO TYPES OF ROTATION
-ORTHOGONAL AND OBLIQUE

-The reason for ROTATION in FACTOR ANALYSIS IS TO MAKE THE FACTORS EASIER TO INTERPRET.

-Rotation ALTERS the FACTOR LOADINGS for each variable and the eigenvalue for each factor (although the total of the eigenvalues remains the same).
ORTHOGONAL rotation
~The resulting FACTORS are UNCORRELATED

~A TEST'S COMMUNALITY CAN BE CALCULATED FROM ITS FACTOR LOADINGS.

~THE COMMUNALITY IS EQUAL TO THE SUM OF THE SQUARED FACTOR LOADINGS.
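A short Python sketch of the communality calculation for orthogonal factors (square each factor loading, then add). The function names and the loadings are hypothetical.

```python
def shared_variance(loading):
    """Proportion of a test's variability accounted for by one factor."""
    return loading ** 2


def communality(loadings):
    """Sum of squared factor loadings; valid when the factors are orthogonal."""
    return sum(shared_variance(l) for l in loadings)


# Hypothetical test with loadings of .70 on Factor I and .40 on Factor II.
test_loadings = [0.70, 0.40]
print(round(shared_variance(0.70), 2))       # 0.49
print(round(communality(test_loadings), 2))  # 0.65
```

Here 65% of the variability in the test is explained by the two factors together, leaving 35% as variance not shared with the other tests in the analysis.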
OBLIQUE rotation
The resulting factors are CORRELATED and the ATTRIBUTES measured by the FACTORS are NOT INDEPENDENT.

THE SUM OF THE SQUARED FACTOR LOADINGS EXCEEDS THE COMMUNALITY.
WHEN YOU SQUARE A FACTOR LOADING WHAT TYPE OF MEASURE DOES IT PROVIDE?
A MEASURE OF "SHARED VARIABILITY"


SQUARING A FACTOR LOADING
IN FACTOR ANALYSIS, WHEN FACTORS ARE ORTHOGONAL, HOW DO YOU CALCULATE THE TEST'S COMMUNALITY?
SQUARE AND ADD THE TEST'S FACTOR LOADINGS.

ORTHOGONAL ROTATIONS
CRITERION-RELATED VALIDITY
CRITERION-RELATED VALIDITY IS OF INTEREST WHENEVER TEST SCORES ARE TO BE USED TO DRAW CONCLUSIONS ABOUT AN EXAMINEE'S LIKELY STANDING OR PERFORMANCE ON ANOTHER MEASURE.

Note: A Criterion-Related Validity Coefficient of .20 OR .30 MAY BE ACCEPTABLE (validity coefficients rarely exceed .60)
2 TYPES OF CRITERION-RELATED VALIDITY
1. CONCURRENT VALIDITY

2. PREDICTIVE VALIDITY
2 TYPES OF CONSTRUCT VALIDITY
1. CONVERGENT VALIDITY

2. DISCRIMINANT (DIVERGENT) VALIDITY
CONCURRENT VALIDITY
WHEN CRITERION DATA ARE COLLECTED PRIOR TO OR AT ABOUT THE SAME TIME AS DATA ON THE PREDICTOR, THE PREDICTOR'S CONCURRENT VALIDITY IS BEING ASSESSED.

E.g., concurrent validity would be how the WISC correlates with the WJ Cog., the Stanford-Binet, etc. Predictive validity would be how the WISC correlates with academic achievement (e.g., WIAT, WRAT), job performance, etc.
When is PREDICTIVE VALIDITY EVALUATED?
When the criterion is measured some time AFTER the predictor has been administered.
THE STANDARD ERROR OF ESTIMATE
USED TO CONSTRUCT A CONFIDENCE INTERVAL AROUND A PREDICTED (ESTIMATED) CRITERION SCORE.

68% = 1 STANDARD ERROR

95% = 2 STANDARD ERRORS

99% = 3 STANDARD ERRORS
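A minimal Python sketch of a confidence interval around a predicted criterion score. It assumes the standard formula SEE = SD of the criterion x sqrt(1 - validity coefficient squared), which the card does not state; the function names and numbers are hypothetical.

```python
import math


def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD_y * sqrt(1 - r_xy ** 2), where r_xy is the validity coefficient."""
    return sd_criterion * math.sqrt(1 - validity ** 2)


def prediction_interval(predicted_score, see, n_see):
    """Interval of +/- n_see standard errors around the predicted criterion score."""
    return (predicted_score - n_see * see, predicted_score + n_see * see)


# Hypothetical criterion: SD = 10, validity coefficient = .60, predicted score = 50.
see = standard_error_of_estimate(sd_criterion=10, validity=0.60)
for n_see, level in ((1, "68%"), (2, "95%"), (3, "99%")):
    low, high = prediction_interval(50, see, n_see)
    print(level, round(low, 1), round(high, 1))
```

With these numbers the SEE is 8, so the 68% interval runs from 42 to 58 and the 95% interval from 34 to 66.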
STANDARD ERROR OF ESTIMATE
USED TO CONSTRUCT A CONFIDENCE INTERVAL AROUND AN ESTIMATED (PREDICTED) SCORE.
WHY WOULD YOU WANT INCREMENTAL VALIDITY? WHAT DO YOU NEED TO ESTIMATE A TEST'S INCREMENTAL VALIDITY?
-IT INCREASES DECISION-MAKING ACCURACY.

-The selection ratio, base rate, and validity coefficient are used to estimate a test's incremental validity using the Taylor-Russell tables.

INCREMENTAL VALIDITY = + HIT RATE - BASE RATE
HOW DO YOU ESTIMATE INCREMENTAL VALIDITY?
INCREMENTAL VALIDITY = + HIT RATE - BASE RATE
What is the BASE RATE FORMULA?
BASE RATE = (TRUE POSITIVES + FALSE NEGATIVES) DIVIDED BY TOTAL NUMBER OF PEOPLE

BASE RATE FORMULA
What is the + HIT RATE FORMULA?
+ HIT RATE = TRUE POSITIVES DIVIDED BY TOTAL POSITIVES (TRUE POSITIVES + FALSE POSITIVES)

+ HIT RATE FORMULA
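A Python sketch that puts the base rate, positive hit rate, and incremental validity formulas together. The function names and the counts in the example are hypothetical.

```python
def base_rate(tp, fp, tn, fn):
    """(True positives + false negatives) / total number of people."""
    return (tp + fn) / (tp + fp + tn + fn)


def positive_hit_rate(tp, fp):
    """True positives / total positives (true positives + false positives)."""
    return tp / (tp + fp)


def incremental_validity(tp, fp, tn, fn):
    """Positive hit rate minus base rate."""
    return positive_hit_rate(tp, fp) - base_rate(tp, fp, tn, fn)


# Hypothetical selection study with 100 people.
counts = dict(tp=30, fp=10, tn=40, fn=20)
print(base_rate(**counts))                                # 0.5
print(positive_hit_rate(counts["tp"], counts["fp"]))      # 0.75
print(incremental_validity(**counts))                     # 0.25
```

In this example, using the predictor raises the proportion of successful selections from 50% to 75%, an improvement of 25 percentage points in decision-making accuracy.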
What is the PREDICTOR?
- (X, IV)

- DETERMINES IF A PERSON IS POSITIVE OR NEGATIVE

PREDICTOR
What is the CRITERION ?
- (Y, DV)

- DETERMINES IF HE/SHE IS A "TRUE" OR A "FALSE"

CRITERION
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
LOW RELIABILITY = THE TEST CANNOT HAVE A HIGH DEGREE OF CONTENT, CONSTRUCT, OR CRITERION-RELATED VALIDITY

HIGH RELIABILITY = DOES NOT GUARANTEE VALIDITY
CORRECTION FOR ATTENUATION FORMULA
-A WAY OF ESTIMATING WHAT A PREDICTOR'S VALIDITY COEFFICIENT WOULD BE IF THE PREDICTOR AND/OR THE CRITERION WERE PERFECTLY RELIABLE (I.E., HAD A RELIABILITY COEFFICIENT OF 1.0).

-useful for estimating a validity coefficient when measurement error has affected the magnitude of the coefficient

-The correction for attenuation formula is used to estimate a test's validity coefficient when the reliability coefficient for the predictor and/or criterion has been increased to 1.0.
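A hedged Python sketch of the correction for attenuation, using the standard form: observed validity divided by the square root of the product of the two reliabilities. The card does not write out the formula, so this form, the function name, and the values are assumptions.

```python
import math


def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Estimate the validity coefficient if measurement error were removed.

    r_xy: observed validity coefficient
    r_xx: reliability of the predictor (leave at 1.0 to skip correcting it)
    r_yy: reliability of the criterion (leave at 1.0 to skip correcting it)
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed validity .36, reliabilities .81 and .64.
print(round(correct_for_attenuation(0.36, r_xx=0.81, r_yy=0.64), 2))  # 0.5
# Correcting for unreliability in the criterion only.
print(round(correct_for_attenuation(0.36, r_yy=0.64), 2))             # 0.45
```

The corrected coefficient is never smaller than the observed one, because reliabilities below 1.0 shrink the divisor.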
CRITERION CONTAMINATION
-TENDS TO INFLATE THE RELATIONSHIP BETWEEN A PREDICTOR AND A CRITERION, RESULTING IN AN ARTIFICIALLY HIGH CRITERION-RELATED VALIDITY COEFFICIENT

-Criterion contamination occurs when a rater's knowledge of a ratee's performance on a predictor biases his/her ratings of the ratee on the criterion.

-Is of concern when the measure of performance is subjectively scored.
CROSS-VALIDATION
THE CROSS-VALIDATION COEFFICIENT TENDS TO "SHRINK" OR BE SMALLER THAN THE ORIGINAL COEFFICIENT.

THE SMALLER THE INITIAL VALIDATION SAMPLE, THE GREATER THE SHRINKAGE OF THE VALIDITY COEFFICIENT WHEN THE PREDICTOR IS CROSS-VALIDATED.
What are the different types of NORM-REFERENCED INTERPRETATION?
-PERCENTILE RANKS

-STANDARD SCORES

-IT TELLS YOU HOW WELL AN INDIVIDUAL IS DOING COMPARED TO OTHER INDIVIDUALS

NORM-REFERENCED INTERPRETATION
PERCENTILE RANKS
EXPRESSES AN EXAMINEE'S RAW SCORE IN TERMS OF THE PERCENTAGE OF EXAMINEES IN THE NORM SAMPLE WHO ACHIEVED LOWER SCORES

A PR OF 88 MEANS 88% OF THE PEOPLE IN A NORM SAMPLE OBTAINED SCORES LOWER THAN THE APPLICANT'S SCORE.

PERCENTILE RANKS ARE A NONLINEAR TRANSFORMATION BECAUSE THEIR DISTRIBUTION IS ALWAYS FLAT (rectangular) REGARDLESS OF THE SHAPE OF THE RAW SCORE DISTRIBUTION.
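A minimal Python sketch of a percentile rank, following the definition above (percentage of the norm sample with lower scores). The function name and the norm sample are hypothetical.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of examinees in the norm sample who scored below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)


# Hypothetical norm sample of 100 examinees with scores 1 through 100.
norm_sample = list(range(1, 101))
print(percentile_rank(89, norm_sample))  # 88.0
```

Some definitions also count half of the tied scores; this sketch counts only strictly lower scores, as the card does.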
STANDARD SCORES
AN EXAMINEE'S POSITION IN THE NORMATIVE SAMPLE IN TERMS OF STANDARD DEVIATIONS FROM THE MEAN.

Z-SCORES: MEAN IS EQUAL TO 0, SD IS EQUAL TO 1

T-SCORES: MEAN OF 50, SD OF 10

(e.g., in a normal distribution, about 98% of scores fall below the score that is two standard deviations above the mean.)
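A short Python sketch converting a raw score to a z-score and a T-score, with the proportion of a normal distribution falling below that score. The function names and the example test (mean 100, SD 15) are hypothetical.

```python
from statistics import NormalDist


def z_score(raw_score, mean, sd):
    """Distance from the mean in standard deviation units (mean 0, SD 1)."""
    return (raw_score - mean) / sd


def t_score(z):
    """Rescale a z-score to a mean of 50 and an SD of 10."""
    return 50 + 10 * z


# Hypothetical test with mean 100 and SD 15; raw score of 130.
z = z_score(130, mean=100, sd=15)
print(z, t_score(z))  # 2.0 70.0

# Proportion of a normal distribution below z = +2 (about .977).
print(round(NormalDist().cdf(z), 3))
```

Unlike percentile ranks, these are linear transformations, so they keep the shape of the raw score distribution.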
CRITERION-REFERENCED INTERPRETATION/SCORES (eg PERCENTAGE SCORES)
INVOLVES INTERPRETING SCORES IN TERMS OF A PRESPECIFIED STANDARD.

E.G. PERCENTAGE SCORE (OR PERCENT CORRECT), INDICATES THE PERCENTAGE OF THE TEST CONTENT THAT AN EXAMINEE ANSWERED CORRECTLY.

-Indicates how much of the test content(s) the examinee MASTERED.

-IT TELLS YOU HOW MANY ITEMS AN INDIVIDUAL GOT CORRECT

NOTE: When the goal of testing is to determine the amount of content an individual has mastered, criterion-referenced (or content-referenced) scores are most useful.
NORM-REFERENCED INTERPRETATIONS
-PERCENTILE RANKS
-STANDARD SCORES


NORM-REFERENCED INTERPRETATIONS
CRITERION-REFERENCED INTERPRETATIONS
-PERCENTAGE SCORES

-REGRESSION EQUATION

-EXPECTANCY TABLE

CRITERION-REFERENCED INTERPRETATIONS
PERCENTAGE SCORES
-Indicates how much of the test content(s) the examinee MASTERED.

NOTE: When the goal of testing is to determine the amount of content an individual has mastered, criterion-referenced (or content-referenced) scores are most useful.
Lord's chi-square is used to:
-Evaluate the DIFFERENTIAL ITEM FUNCTIONING (DIF) of an item included in a test. (DIF is AKA ITEM BIAS)

-DIF occurs when one group responds differently to an item than another group even though both groups have similar levels of the latent trait (attribute) measured by the test.

-Several statistical techniques are used to evaluate DIF. Lord’s chi-square is one of these techniques.
KUDER-RICHARDSON FORMULA 20
-Measures INTERNAL CONSISTENCY

-A HIGH KR-20 COEFFICIENT indicates a HOMOGENEOUS test.
EIGENVALUES ARE ASSOCIATED WITH
PRINCIPAL COMPONENT ANALYSIS

Eigenvalues can be calculated for each component "extracted" in a principal component analysis.
TEST-RETEST RELIABILITY?
COEFFICIENT OF STABILITY
ALTERNATE/PARALLEL FORMS RELIABILITY?
COEFFICIENT OF EQUIVALENCE
INTERNAL CONSISTENCY RELIABILITY?
COEFFICIENT OF INTERNAL CONSISTENCY
SPLIT-HALF?
SPEARMAN-BROWN
COEFFICIENT ALPHA?
KR-20
INTERNAL CONSISTENCY RELIABILITY
-COEFFICIENT OF INTERNAL CONSISTENCY

-TWO TYPES:

-SPLIT-HALF (SPEARMAN-BROWN)
-COEFFICIENT ALPHA (KR-20)
IPSATIVE SCORES
TELLS YOU THE RELATIVE STRENGTHS OF THE DIFFERENT CHARACTERISTICS MEASURED BY A TEST.