23 Cards in this Set

  • Front
  • Back
Item analysis
Procedures used to assess and index an item's:
- Difficulty
- Reliability
- Validity
- Discrimination

- E.g., you have a pool of 100 items and have administered the test to a tryout sample of 100 university students; you are now at the stage where you need to analyse each item.
The Item-Difficulty Index
- Obtained by calculating the proportion of the total number of testtakers who answered the item correctly
- The larger the item-difficulty index, the easier the item is
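A minimal sketch of the calculation in Python, using hypothetical response data:

    # Hypothetical responses to one item from 10 testtakers (1 = correct, 0 = incorrect)
    responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]

    # Item-difficulty index p = proportion of testtakers answering correctly
    p = sum(responses) / len(responses)
    print(p)  # 0.7 -> the larger p is, the easier the item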
Item-Endorsement Index
- Measure of the proportion of testtakers who agreed with an item (e.g., in a personality test)
Average item difficulty
- Calculated by adding all the separate item difficulties and dividing this sum by the number of items
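Continuing the sketch above with hypothetical per-item difficulties:

    # Hypothetical item-difficulty indices for a 5-item test
    item_difficulties = [0.7, 0.5, 0.9, 0.6, 0.8]
    average_difficulty = sum(item_difficulties) / len(item_difficulties)
    print(average_difficulty)  # 3.5 / 5 = 0.7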
Optimum item difficulty
- Must take into account the probability of guessing
- Optimum item difficulty is the midpoint between 1.00 and the chance-success proportion

- e.g., for a 5-choice MC item: (0.20 + 1.00) / 2 = 0.60 (optimum item difficulty)
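Written as a general formula, where k is the number of answer choices (so the chance-success proportion is 1/k):

\[ \text{optimum item difficulty} = \frac{\frac{1}{k} + 1.00}{2} \]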
Item-Reliability Index
- Provides an indication of the internal consistency of the test
- Measured using factor analysis: do all the items "tap into" the same factor?
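Full factor analysis is the method the card names; as a rougher stand-in, corrected item-total correlations give a quick sense of whether items hang together (hypothetical data):

    import numpy as np

    # Hypothetical 0/1 response matrix: rows = testtakers, columns = items
    X = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 0, 0],
                  [1, 0, 1],
                  [1, 1, 1]])

    for j in range(X.shape[1]):
        rest = X.sum(axis=1) - X[:, j]  # total score excluding item j
        r = np.corrcoef(X[:, j], rest)[0, 1]
        print(f"item {j}: corrected item-total r = {r:.2f}")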
Item-Validity Index
- Provides an indication as to whether the item is measuring what it purports to measure
- The higher the item-validity index, the higher the item's criterion-related validity
- Complex formula (refer to TB pg 265)
The Item-Discrimination Index
- Indication of how accurately an item discriminates high scorers from low scorers
- Symbolised by a lower-case 'd' in italics
- Compares performance on an item by testtakers in the upper and lower regions of the total-score distribution
- Optimal boundary lines for the upper and lower regions = 27% of the distribution of scores, given a normal distribution
- Any percentage between 25% and 33% will yield similar results
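A minimal sketch with hypothetical data, taking d as the proportion correct in the upper group minus the proportion correct in the lower group:

    # Hypothetical total scores and one item's responses, paired per testtaker
    scores    = [95, 90, 88, 85, 60, 55, 50, 45, 30, 25]
    responses = [ 1,  1,  1,  0,  1,  0,  1,  0,  0,  0]

    # Upper and lower 27% of the distribution (~3 of 10 testtakers here)
    n = round(0.27 * len(scores))
    ranked = sorted(zip(scores, responses), reverse=True)
    upper = [r for _, r in ranked[:n]]
    lower = [r for _, r in ranked[-n:]]

    # d = p_upper - p_lower; a negative d flags a problem item
    d = sum(upper) / n - sum(lower) / n
    print(d)  # 1.0 here: all of the upper group and none of the lower group got it right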
Analysis of item alternatives
- Not statistical; more of an eyeballing technique
- Looking at response patterns to see the proportion of high scorers and low scorers who got the item correct
- If more low scorers than high scorers got the item correct, this indicates a bad item
- If many high scorers chose the same incorrect alternative (distractor), this would indicate something is wrong with the test item
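A minimal tabulation sketch of the kind of response pattern one would eyeball (hypothetical choices; the keyed answer is 'B'):

    from collections import Counter

    # Hypothetical alternative choices on one item, split by total score
    high_scorers = ['B', 'B', 'B', 'D', 'B', 'A', 'B', 'B', 'C', 'B']
    low_scorers  = ['A', 'D', 'B', 'E', 'A', 'C', 'D', 'B', 'A', 'E']

    print(Counter(high_scorers))  # mostly 'B': high scorers pick the keyed answer
    print(Counter(low_scorers))   # spread across distractors, as expected for a sound item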
Item characteristic curve
A graphic representation of item difficulty and discrimination
- Ability is plotted on the horizontal axis and the probability of a correct response on the vertical axis
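As an illustration, such a curve is often drawn with a logistic function (this parameterization comes from item response theory, flagged for research on the last card; it is a sketch, not the only way to construct an ICC):

    import math

    def icc(theta, a=1.0, b=0.0):
        """Probability of a correct response at ability theta
        (a = discrimination/steepness, b = difficulty/location)."""
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # Probability rises with ability; b shifts the curve, a steepens it
    for theta in [-2, -1, 0, 1, 2]:
        print(theta, round(icc(theta, a=1.5, b=0.5), 2))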
Other considerations for item analysis:
(1) Guessing
- Guesses can be based on a vague idea (i.e., partial knowledge), so guessed responses cannot simply be discarded
- Should omitted items be scored as wrong?
- Some testtakers are luckier than others when it comes to guessing

(2) Item fairness
- An item can be biased (e.g., an item in a psychology test that favours people with experience studying history)
- Item characteristic curves can be used to identify biased items

(3) Speed tests:
- Can yield misleading and uninterpretable results
- Items at the end of the test will appear more difficult simply because fewer testtakers reach them
- If speed is not an important aspect of the variable being assessed, ample time should be given for testtakers to complete the test
Qualitative Item Analysis
- Qualitative methods are techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures

(e.g., having testtakers verbalise their thoughts while completing the test and comment on item difficulty or their attitudes towards the test items)
Think aloud test administration
When testtakers commentate while completing the test
Sensitivity review
A study of test items that assesses the fairness and sensitivity of a test.

- Usually done using expert panels
Test revision
- Moulding the test into its final form
- One approach is to characterize each item according to its strengths and weaknesses, e.g., some items are reliable but lack criterion-related validity
- Test developers must balance various strengths and weaknesses across items
Test revision of an existing test
- No hard and fast rules as to when a test should be revised

Should be revised when:
- The materials look dated
- Verbal content of test is dated
- Test norms are no longer adequate
- The reliability and validity of the test can be significantly improved by revision
Cross-validation
Revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion
Validity shrinkage
The inevitable decrease in item validity that occurs with cross-validation
Co-validation a.k.a. co-norming
Validation processes conducted on two or more tests using the same sample
Quality assurance during test revision
- Not all test developers hold a doctoral degree
- Publishers can evaluate potential examiners by administering a quiz, etc.
Anchor protocol
A test protocol scored by a highly authoritative scorer that is designed as a model of scoring and a mechanism for resolving scoring discrepancies.
Scoring drift
A discrepancy between scoring in an anchor protocol and the scoring of another protocol
Item Response Theory (IRT)
*Research this