What is RELEVANCE in Item Analysis? (This domain is related to the ethical use of tests.)

This describes the extent to which test items contribute to the stated goals of testing.

Think, what does it mean for something to be relevant?


1st dimension of Relevance: CONTENT APPROPRIATENESS

If the content is appropriate, the item assesses the behavior domain the test is intended to evaluate.

Think, Appropriate Behavior


2nd dimension of Relevance: TAXONOMIC LEVEL

Does the item reflect the appropriate cognitive or ability level of the population it's intended for?

Think, "The 3 Bears": not too hot, not too cold, it's "just right."


3rd dimension of Relevance: EXTRANEOUS ABILITIES

To what extent does the item require knowledge or skills outside the domain being evaluated?

Think about all the other things/thoughts that interfere with getting your reports done! Extraneous info intrudes!


ITEM DIFFICULTY

It's the proportion of people who get an item correct (p). p = 1 means all answered correctly; p = 0 means none did. So a lower p value = a more difficult item. Items with p near .50 are typically retained to ensure a moderate difficulty level, except on true/false tests (.75).

When p = 0, no one got it right. When p = 1, all got it correct.
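
The p value above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the response matrix and the function name item_difficulty are made up for the example.

```python
# Hypothetical scored responses (assumed data):
# rows are examinees, columns are items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]

def item_difficulty(scored):
    """Return p for each item: the proportion of examinees who answered it correctly."""
    n_examinees = len(scored)
    n_items = len(scored[0])
    return [sum(row[i] for row in scored) / n_examinees for i in range(n_items)]

p_values = item_difficulty(responses)
print(p_values)  # [1.0, 0.5, 0.0, 0.75]
```

Here item 1 (p = 1.0) is the easiest, item 3 (p = 0.0) is the hardest, and item 2 sits at the moderate .50 level.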


ITEM DISCRIMINATION

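Extent to which an item differentiates between those who get a high vs. low total score. D = H (proportion of the highest scorers who answer the item correctly) minus L (proportion of the lowest scorers who answer it correctly). A D of .35 or > is acceptable.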

Good section in the notes.
D = +1 if all in the upper group and none in the lower group get it right. D = -1 if none in the upper group and all in the lower group get the question right.
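
A small sketch of the D index, assuming H and L are the proportions answering correctly in the upper and lower scoring groups. The function name and the group counts are hypothetical.

```python
def item_discrimination(upper_correct, upper_n, lower_correct, lower_n):
    """D = proportion correct in the upper group minus proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# All of the upper group and none of the lower group answer correctly.
print(item_discrimination(10, 10, 0, 10))   # 1.0
# None of the upper group and all of the lower group answer correctly.
print(item_discrimination(0, 10, 10, 10))   # -1.0
# A made-up in-between case: 8 of 10 upper scorers vs. 4 of 10 lower scorers.
d_example = item_discrimination(8, 10, 4, 10)
print(round(d_example, 2))
```

The last case gives a D of about .40, which clears the .35 cutoff mentioned above.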

CLASSICAL TEST THEORY

Obtained scores reflect true score plus error; item and test parameters are sample dependent. Issues considered: item difficulty, reliability, validity.



ITEM RESPONSE THEORY (IRT)

Tests are based on the examinee's level on the trait being measured rather than on total test score.

Which test theory uses an examinee's performance on prior items to determine the administration of subsequent items? Remember the related concept "Item Characteristic Curve".


ITEM CHARACTERISTIC CURVE

A plot of the proportion of people who answered an item correctly against the total test score, an external criterion, or a derived estimate of ability.

Those at "0" are low ability.
Those above 0 are high ability, and the steeper the slope, the better the discrimination.
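
To see why a steeper slope means better discrimination, here is a rough sketch using a two-parameter logistic curve. The logistic form is an assumption made for illustration (the notes do not name a specific model), and the ability values, difficulty, and slopes are hypothetical.

```python
import math

def icc(ability, difficulty=1.0, discrimination=1.0):
    """Probability of a correct answer under an assumed two-parameter logistic curve."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Compare a flat curve and a steep curve at a low (0) and a high (2) ability level.
gaps = {}
for slope in (0.5, 2.0):
    low = icc(0.0, discrimination=slope)
    high = icc(2.0, discrimination=slope)
    gaps[slope] = high - low
    print(f"slope={slope}: P(low)={low:.2f}, P(high)={high:.2f}, gap={gaps[slope]:.2f}")
```

The steeper curve separates the low and high ability examinees by a wider margin, which is what "better discrimination" means here.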

RELIABILITY

The ability of a measure to provide consistent, dependable results.

Estimate of the proportion of variability in examinees' obtained scores due to true differences among examinees on what's being measured.

RELIABILITY COEFFICIENT

Proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are interpreted directly; they are never squared.
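
Written out as a formula, the definition above looks like this. The symbols are the usual classical test theory notation, not something taken from these notes: X is the obtained score, T the true score, E the error, and the variance split assumes error is unrelated to true score.

```latex
\begin{align*}
X &= T + E \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
r_{xx} &= \frac{\sigma_T^2}{\sigma_X^2}
\end{align*}
```

So a reliability coefficient of .84 is read directly: 84% of the variability in obtained scores reflects true score variability and 16% reflects error.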



TEST-RETEST RELIABILITY

Administering the same test to same group on 2 diff. occasions.

Appropriate method for determining reliability when the attribute is relatively stable over time (e.g., aptitude vs. emotion).
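
In practice the two sets of scores are correlated to get the coefficient. A minimal sketch, assuming a Pearson correlation and made-up scores for five examinees:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five examinees on two occasions.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]

retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))
```

A high value means examinees kept roughly the same rank order across the two administrations.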


ALTERNATE FORM RELIABILITY

2 EQUIVALENT FORMS are ADMINISTERED. Measures the consistency of responding to diff. versions of a test administered at diff. times.

Think, (Form A/Form B)
Primary source of measurement error is content sampling. Hard to develop truly equiv. forms 

INTERNAL CONSISTENCY RELIABILITY: 2 types:
A. Split-Half B. Coefficient Alpha
Admin test once to a single group. A coeff. of internal consistency is calculated.
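
For coefficient alpha, a rough sketch using the commonly taught formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The notes don't spell out the formula, so treat it and the data below as assumptions for illustration.

```python
from statistics import pvariance

def coefficient_alpha(scored):
    """Coefficient alpha from a matrix of item scores (rows = examinees, columns = items)."""
    k = len(scored[0])
    item_vars = [pvariance([row[i] for row in scored]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scored])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data from one administration to a single group.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
mixed = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

alpha_consistent = coefficient_alpha(consistent)
alpha_mixed = coefficient_alpha(mixed)
print(round(alpha_consistent, 2), round(alpha_mixed, 2))
```

When every item ranks examinees the same way, alpha comes out at 1; when the items disagree with each other, alpha drops sharply.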



Split-Half

2 scores are derived by splitting the test into equal halves, and the two are then correlated. Often uses odd- vs. even-numbered items. Often an underestimate of true reliability. Corrected by the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coeff. would have been for a full-length test.

Even-Steven
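
The odd-even split and the Spearman-Brown correction can be sketched as follows. The correction shown is the usual form for a test twice as long as each half, r_full = 2r / (1 + r); the data and function names are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

# Hypothetical scored responses: rows are examinees, columns are 6 items.
scored = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

odd_scores = [sum(row[0::2]) for row in scored]   # items 1, 3, 5
even_scores = [sum(row[1::2]) for row in scored]  # items 2, 4, 6

r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown(r_half)
print(round(r_half, 2), round(r_full, 2))
```

The corrected value is higher than the half-test correlation, which is why the uncorrected split-half figure is described as an underestimate.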


