67 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
What is RELEVANCE in Item Analysis? (This domain is related to the ethical use of tests.)
This describes the extent to which items contribute to the stated goals of testing
Think, what does it mean for something to be relevant?
The first dimension of Relevance is
CONTENT APPROPRIATENESS
If the content is appropriate, the item assesses the behavior domain the test is intended to evaluate
Think, Appropriate Behavior
2nd dimension of Relevance: TAXONOMIC LEVEL
Does the item reflect the appropriate cognitive or ability level of the population it's intended for?
Think "The Three Bears": not too hot, not too cold, it's "just right."
3rd dimension of Relevance: EXTRANEOUS ABILITIES
To what extent are knowledge or skills required that fall outside the domain being evaluated?
Think about all the other things/thoughts that interfere with getting your reports done! Extraneous info intrudes!
ITEM DIFFICULTY
The proportion (p) of people who answer an item correctly. p = 1 means everyone answered correctly; p = 0 means no one did, so lower p values indicate more difficult items. Items near p = .50 are typically retained to ensure a moderately difficult test, except on true/false items (target p ≈ .75).
When p = 0, no one got the item right; when p = 1, everyone did.
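The p value is simple to compute from raw responses. A minimal sketch in Python (the sample data and variable names are hypothetical, not from the cards):

```python
# Item difficulty: proportion of examinees answering the item correctly.
# responses: 1 = correct, 0 = incorrect (hypothetical sample data)
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

p = sum(responses) / len(responses)
print(p)  # 0.7 -> a relatively easy item; lower p means a harder item
```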
ITEM DISCRIMINATION
The extent to which an item differentiates between examinees who obtain high vs. low total scores. D = H (proportion of the highest scorers answering correctly) minus L (proportion of the lowest scorers answering correctly). D of .35 or greater is acceptable.
Good section in the notes.
D = +1 means everyone in the upper group and no one in the lower group got the item right.
D = -1 means no one in the upper group and everyone in the lower group got the item right.
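A quick sketch of the D = H - L calculation (hypothetical item responses for the two scoring groups):

```python
# Item discrimination: D = H - L, where H and L are the proportions of the
# upper and lower scoring groups that answered the item correctly.
# (Hypothetical data: 1 = correct, 0 = incorrect.)
upper_group = [1, 1, 1, 1, 0, 1, 1, 1]   # top scorers on the total test
lower_group = [0, 1, 0, 0, 1, 0, 0, 1]   # bottom scorers on the total test

H = sum(upper_group) / len(upper_group)  # 0.875
L = sum(lower_group) / len(lower_group)  # 0.375
D = H - L
print(round(D, 2))  # 0.5 -> above the .35 guideline, so the item is acceptable
```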
CLASSICAL TEST THEORY
Obtained scores reflect truth (true score) plus error; item and test parameters are sample-dependent. Issues considered: item difficulty, reliability, validity.
ITEM RESPONSE THEORY (IRT)
Test performance is interpreted in terms of the examinee's level on the trait being measured rather than the total test score.
Which test theory uses examinee's performance on prior items to determine the administration of subsequent items? Remember related concept "Item Characteristic Curve".
"Item Characteristic Curve".
Plots the proportion of people who answered the item correctly against total test score, an external criterion, or a derived estimate of ability.
Examinees below 0 on the ability scale are low ability; those above 0 are high ability. The steeper the slope, the better the item discriminates.
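The card does not give a formula, but one standard way to model an item characteristic curve is the two-parameter logistic function, where a is the discrimination (slope) and b is the difficulty. A hedged sketch of that common IRT model (the parameter values are illustrative):

```python
import math

def icc(theta, a, b):
    """Probability of a correct response at ability level theta.
    a = discrimination (steeper slope = better discrimination),
    b = difficulty (the ability level where P = .50)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An item with difficulty b = 0 and good discrimination a = 2:
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=2.0, b=0.0), 2))
# Low-ability examinees (theta < 0) have a low probability of success;
# high-ability examinees (theta > 0) have a high probability.
```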
RELIABILITY
The ability of a measure to provide consistent, dependable results.
An estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on what is being measured.
RELIABILITY COEFFICIENT
The proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are never squared to interpret them.
TEST-RETEST RELIABILITY
Administering the same test to the same group on two different occasions.
An appropriate method for determining reliability when the attribute is relatively stable over time (e.g., aptitude vs. emotion).
ALTERNATE FORM RELIABILITY
Two EQUIVALENT FORMS are ADMINISTERED. Measures the consistency of responding to different versions of a test administered at different times.
Think, (Form A/Form B)

The primary source of measurement error is content sampling; it is hard to develop truly equivalent forms.
INTERNAL CONSISTENCY RELIABILITY: 2 types:
A. Split Half
B. Coefficient Alpha
The test is administered once to a single group, and a coefficient of internal consistency is calculated.
Split-Half
Two scores are derived by splitting the test into equal halves and correlating them; the split often uses odd- vs. even-numbered items. This is often an underestimate of true reliability, corrected by the Spearman-Brown prophecy formula, which estimates what the reliability coefficient would have been for the full-length test.
Even-Steven
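The Spearman-Brown correction is a one-line formula; a minimal sketch (the half-test reliability value below is hypothetical):

```python
def spearman_brown(r_half, n=2):
    """Estimated reliability if the test were lengthened by a factor of n.
    For split-half correction, n = 2 (full-length test from half-length)."""
    return (n * r_half) / (1 + (n - 1) * r_half)

r_half = 0.70                            # correlation between the two halves
print(round(spearman_brown(r_half), 3))  # 0.824 -> estimated full-length reliability
```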
Cronbach's Coefficient Alpha
The test is given once and a formula is applied to determine reliability. The result is the average reliability obtained from all possible splits of the test.
Kuder-Richardson Formula
A variation of Cronbach's alpha used when items are scored dichotomously (e.g., yes/no).
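Coefficient alpha can be computed directly from an item-score matrix; with dichotomous (0/1) items the same calculation corresponds to the Kuder-Richardson approach. A minimal sketch with hypothetical data:

```python
def cronbach_alpha(items):
    """items: list of item-score lists, one list per item (same examinees).
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Four dichotomously scored items for five examinees (hypothetical):
items = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
print(round(cronbach_alpha(items), 3))  # ~0.55 for these hypothetical items
```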
Inter-Rater Reliability
Reliability determined by the percentage of agreement between two or more raters. Associate: the kappa statistic.
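Cohen's kappa corrects raw percent agreement for the agreement expected by chance. A minimal sketch for two raters making categorical judgments (the ratings are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (p_observed - p_chance) / (1 - p_chance)

rater_a = ["yes", "yes", "no", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # ~0.33 for this hypothetical data
```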
Standard Error of Measurement
(SEM)
An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test. The greater the reliability, the smaller the SEM. Know the formula.
Example: the SEM around a WAIS score of 125 works something like this. A 95% confidence interval is constructed around the score; essentially, you are saying there is a 95% chance that the true score falls between Score X and Score Y. You derive this confidence interval by adding and subtracting 2 standard errors of measurement from the person's obtained score.
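The usual formula is SEM = SD * sqrt(1 - reliability), and the 95% interval is the obtained score plus or minus roughly 2 SEMs (1.96 to be exact). A sketch using the card's score of 125 with an assumed SD of 15 and an assumed reliability of .95 (illustrative values, not given on the card):

```python
import math

sd = 15             # assumed standard deviation of the test
reliability = 0.95  # assumed reliability coefficient
obtained = 125      # obtained score from the card's example

sem = sd * math.sqrt(1 - reliability)   # ~3.35
lower = obtained - 1.96 * sem
upper = obtained + 1.96 * sem
print(round(sem, 2), round(lower, 1), round(upper, 1))
# ~95% chance the true score falls between about 118.4 and 131.6
```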
Standard Error of Estimation (SEE)
Another form of the SEM that takes regression to the mean into account. The SEE formula is used in the WAIS and WMS. It centers the confidence interval on the estimated true score rather than the observed score; it is a correction for true-score regression toward the mean.
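A hedged sketch of this idea using one common true-score formulation (the estimated true score regresses the obtained score toward the mean, and the interval is centered on that estimate; the numbers are illustrative):

```python
import math

mean, sd, reliability = 100, 15, 0.95   # assumed test parameters
obtained = 125

estimated_true = mean + reliability * (obtained - mean)   # 123.75, pulled toward the mean
see = sd * math.sqrt(reliability * (1 - reliability))     # standard error of estimation
lower = estimated_true - 1.96 * see
upper = estimated_true + 1.96 * see
print(round(estimated_true, 2), round(lower, 1), round(upper, 1))
```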
Validity
Test accuracy: does it measure what it's intended to measure?
Content Validity
The extent to which a test adequately samples the content or behavior domain it is trying to measure
Think Content Validity when scores on the test are important because they provide information on how much the examinee knows about the content domain being tested (e.g., Achievement Tests).
What is the primary way that content validity is established?
Answer: The judgment of subject matter experts. If experts agree the items are adequate and representative, then the test is said to have content validity.
What evidence do you look for in a test that has good content validity?
1) The coefficient of internal consistency will be large.
2) The test will correlate highly with other tests of the same domain.
3) Pre- and post-test evaluations of a program designed to increase familiarity with the domain will indicate appropriate changes.
Construct validity
When the test has been found to measure the trait or hypothetical construct that it is intended to measure
Mnemonic strategy: We hope that good constructions will converge and diverge at the appropriate junctions. Otherwise the building will fall down
In order to establish construct validity, what must occur?
Answer: There needs to be a systematic accumulation of evidence showing that the test actually measures the construct it was designed to measure (e.g., intelligence, self-esteem).
Convergent Validity
One method of evaluating a test's construct validity: correlate test scores with scores on measures that do and do not purport to assess the same trait. High correlations with measures of the same trait provide evidence of convergent validity.
Discriminant Validity
Another aspect of construct validity: low correlations with measures of unrelated characteristics.
Multitrait-Multimethod Matrix
A method of systematically organizing data to assess a test's convergent and discriminant validity. A matrix table comprised of correlation coefficients is generated. There need to be two or more traits that have each been assessed using two or more methods.
See the notes to understand the various kinds of monomethod coefficients.

Hint: Mono means same and hetero means different. Helps to translate the names of the correlation coefficients contained in the matrix.
Factor Analysis
A statistical analysis conducted to identify the minimum number of common factors required to account for the intercorrelations among a set of tests, subtests, or test items.
What method could you use to see if there is good construct validity and associated good convergent and discriminant validity?
Factor Analysis
If you're interested in the steps of factor analysis, take a look at the handout.
Question: How do you determine the meaning of a factor loading and the amount of variability in test scores that is explained by the factor?
Square the correlation coefficient (the factor loading) obtained in the factor analysis; the squared loading is the proportion of variability in test scores explained by the factor.
The correlation between a test and a factor is referred to as a what?
A Factor Loading
Communality
Indicates the total amount of variability in test scores that is explained by the identified factors combined (e.g., Factor I and Factor II).
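For orthogonal (uncorrelated) factors, a test's communality is the sum of its squared factor loadings. A minimal sketch with hypothetical loadings:

```python
# Hypothetical loadings of one test on Factor I and Factor II (orthogonal solution)
loadings = [0.60, 0.50]

communality = sum(loading ** 2 for loading in loadings)
print(round(communality, 2))  # 0.61 -> 61% of the test's variance is explained by the factors
```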
Specificity
The portion of true score variability that has not been explained by the factor analysis
Orthogonal Rotation
When a rotation is orthogonal, the resulting factors are uncorrelated. The attribute measured by one factor is independent from the attributes measured by the other factor.
Oblique Rotation
When the rotation is oblique, the resulting factors are correlated and the attributes measured by the factors are not independent.
Criterion-Related Validity
Especially important whenever test scores are used to draw conclusions about an examinee's likely standing or performance on another measure (the criterion).
Associate predictive and concurrent validity with criterion-related validity. There is a logical mnemonic: just remember that criteria are used to predict and criteria are used to concur.
Which type of validity is key in a situation where the goal of testing is to predict how well an applicant will do on a measure of job performance after they are hired?
Answer: Criterion-related validity. When the resulting criterion-related validity coefficient is sufficiently large, this confirms that the predictor (or test) has criterion-related validity.
Concurrent Validity
(associated with criterion related validity)
When criterion data is collected prior to or at the same time as predictor data
Hint: What type of validity should the Mental Status exam have?
Predictive Validity
Criterion data is collected after predictor data. Preferred when the purpose of testing is to predict future performance on the criterion.
How do you interpret criterion-related validity coefficients?
You square the correlation coefficient to interpret it only when it represents the correlation between two different tests or other variables.
What is shared variability?
When the correlation between two measures is squared, it provides a measure of shared variability.
E.g., how much variability in X can be accounted for/explained by Y. Just remember that to find the answer, you need to square the correlation coefficient.
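A quick worked example of squaring a validity coefficient to get shared variance (the coefficient value is hypothetical):

```python
r = 0.60                          # correlation between predictor X and criterion Y
shared_variance = r ** 2
print(round(shared_variance, 2))  # 0.36 -> 36% of the variability in Y is explained by X
```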
Standard Error of Estimate (SEE)
Used to construct a confidence interval around a predicted or estimated criterion score. The magnitude of the SEE is affected by the standard deviation of the criterion scores and the predictor's criterion-related validity coefficient.
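For a predicted criterion score the usual formula is SEE = SD_Y * sqrt(1 - r_xy^2), which reflects both influences the card names. A minimal sketch with illustrative numbers:

```python
import math

sd_criterion = 10   # standard deviation of the criterion scores (assumed)
r_xy = 0.60         # predictor's criterion-related validity coefficient (assumed)
predicted = 75      # a predicted criterion score (assumed)

see = sd_criterion * math.sqrt(1 - r_xy ** 2)   # 8.0
lower = predicted - 1.96 * see
upper = predicted + 1.96 * see
print(round(see, 2), round(lower, 1), round(upper, 1))
# Higher validity or a smaller criterion SD shrinks the interval.
```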
Incremental Validity
The increase in correct decisions that can be expected if the predictor is used as a decision-making tool. Criterion and predictor cutoff scores must be set.
True Positives
Predicted to succeed by the predictor and are successful on the criterion
False positives
Predicted to succeed by the predictor and are not successful on the criterion
True Negatives
Predicted to be unsuccessful by predictor and are unsuccessful on the criterion
False Negatives
Predicted to be unsuccessful by the predictor and are successful on the criterion
Base rate
The proportion of people who were selected without use of the predictor and who are currently considered successful on the criterion
Formula: Base rate = (True Positives + False Negatives) / Total number of people
Positive Hit Rate
The True Positives divided by the Total Positives. Positives are people who are identified as having the disorder by the predictor; negatives are people who are not identified as having the disorder by the predictor.
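Both the base rate and the positive hit rate come straight from the four decision counts above. A minimal sketch with hypothetical counts:

```python
# Hypothetical decision counts from using a predictor cutoff
true_pos, false_pos = 30, 10    # predicted to succeed (or to have the disorder)
true_neg, false_neg = 45, 15    # predicted to be unsuccessful (or not to have it)

total = true_pos + false_pos + true_neg + false_neg       # 100
base_rate = (true_pos + false_neg) / total                # 0.45: successful without the predictor
positive_hit_rate = true_pos / (true_pos + false_pos)     # 0.75: accuracy among predicted positives
print(base_rate, positive_hit_rate)
```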