19 Cards in this Set

What are some reasons for Item Analysis?
-Provide validity evidence (e.g., quality of scores)
-Identify examinees’ misconceptions or misunderstandings
  -Content analysis of responses
  -Feedback to examinees
  -Feedback to teachers
-Identify flaws in test items
  -Inter-item correlation
  -Item-total correlation
  -Alpha if item eliminated
  -Analysis of distracters
  -Item difficulty and discrimination
-Revision of assessment tasks/selecting best items
What is reliability?
Reliability: The amount of measurement error associated with the scores.
The extent to which all the items are consistent with each other. Assessed via:
  -Inter-item correlation
  -Item-total correlation
  -Alpha if item deleted
What is speededness?
Speededness: The percentage of students who respond to the last item on the assessment.
If the test is not speeded, 90% of the students should be able to finish the test.
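The 90% rule above can be sketched as a simple check. The response format, with None marking an item the examinee never reached, is an assumption for illustration:

```python
# Flag a test as speeded if fewer than 90% of examinees reached
# (answered) the last item. `responses` is a list of per-examinee
# answer lists; None marks an unreached item (illustrative format).

def is_speeded(responses, threshold=0.90):
    """Return True if fewer than `threshold` of examinees answered the last item."""
    reached_last = sum(1 for r in responses if r[-1] is not None)
    return reached_last / len(responses) < threshold

responses = [
    [1, 0, 1],
    [1, 1, None],   # did not reach the last item
    [0, 1, 1],
    [1, 1, 0],
]
print(is_speeded(responses))  # True: only 75% reached the last item
```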
What is Analysis of the Distracters?
The extent to which each of the item options in your question are working properly.
What is difficulty?
The proportion of students who get the item correct.
What is Discrimination?
The extent to which items distinguish between those who have acquired the content and those who haven’t.
What is reliability analysis and how do we do this?
-It is important to assess the reliability of the entire scale.
-It is also important to assess what the reliability of the scale would be if each of the items were deleted.
-We do this in three ways:
  -Inter-item correlation
  -Item-total correlation
  -Alpha if item deleted
What is inter item correlation?
-Matrix displaying correlation of each item with every other item.
-Provides information about internal consistency of test.
-Each item should be correlated highly with other items assessing the same construct.
-Items that do not correlate highly with other items assessing the same construct can be dropped without reducing the test’s reliability.
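As a sketch, the inter-item matrix can be computed with NumPy; the 0/1 response data below are illustrative:

```python
import numpy as np

# Illustrative 0/1 response matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

# np.corrcoef treats each ROW as a variable, so transpose the matrix
# to correlate items (columns) with one another.
inter_item = np.corrcoef(scores.T)
print(np.round(inter_item, 2))
```

An item whose row in this matrix is uniformly low is a candidate to drop without hurting reliability.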
What is Item Total Correlation?
-The correlation between the score on each item and the total score obtained on the entire instrument.
-Interpretation of the item-total correlation illustrates two things:
-That the item in question is measuring the same construct as the test.
-That the item is successfully discriminating between those who perform well and those who perform poorly.
-If the item is assessing the same construct as others in the scale, the correlation will be high.
-This is somewhat similar to the Discrimination Index, which we will discuss later.
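A sketch of the computation follows. The "corrected" variant used here (correlating each item with the total of the remaining items, so an item cannot inflate its own correlation) is a common convention, not something the card specifies; the data are illustrative:

```python
import numpy as np

# Illustrative 0/1 response matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

def item_total_correlations(x):
    """Correlate each item with the total score of the REMAINING items."""
    total = x.sum(axis=1)
    return [float(np.corrcoef(x[:, j], total - x[:, j])[0, 1])
            for j in range(x.shape[1])]

rs = item_total_correlations(scores)
print([round(r, 2) for r in rs])
```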
Alpha if Item deleted?
-The Cronbach’s alpha of the scale if each individual item was deleted from the scale.
-If the item is assessing the same construct as others in the scale, the reliability will not change (or change very little).
-Again, this statistic gives an indication of the internal consistency of the items on the assessment.
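Both statistics can be sketched from the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); the response data are illustrative:

```python
import numpy as np

# Illustrative 0/1 response matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

def cronbach_alpha(x):
    """alpha = (k/(k-1)) * (1 - sum(item variances) / variance(total))."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

alpha_full = cronbach_alpha(scores)
alpha_if_deleted = [cronbach_alpha(np.delete(scores, j, axis=1))
                    for j in range(scores.shape[1])]
print(round(alpha_full, 2))                     # 0.8 for this data
print([round(a, 2) for a in alpha_if_deleted])  # one alpha per dropped item
```

If dropping an item raises alpha noticeably above the full-scale value, that item is out of step with the rest of the scale.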
When determining Item Difficulty, what is important to remember?
-You want items to discriminate between people.
-Thus, you must choose items of a moderate level of difficulty.
-Item difficulty is most commonly measured by calculating the percentage of test-takers who answer the item correctly.
-Difficulty of the item is ideal if it falls halfway between the mean chance score and a perfect score.
Determining Item Difficulty #2?
-Generally, items with p values of 0.5 yield test scores with the most variation.
-Often, items with difficulty levels between 0 – 0.2 and 0.8 – 1.0 are discarded.
Too hard/easy = low reliability = less validity
-You must also take the chance of “guessing” into account.
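A sketch of both ideas: the p-value as a proportion correct, and the "halfway between chance and perfect" target for an m-option multiple-choice item. Data are illustrative:

```python
import numpy as np

# Illustrative 0/1 response matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

# Item difficulty (p-value): proportion answering each item correctly.
p_values = scores.mean(axis=0)
print(p_values)  # [0.8 0.6 0.4 0.2]

def ideal_difficulty(n_options):
    """Halfway between the mean chance score and a perfect score."""
    chance = 1 / n_options
    return (chance + 1.0) / 2

print(ideal_difficulty(4))  # 0.625 for a 4-option MC item
```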
Determining Item Difficulty #3?
-For norm-referenced tests, we choose those items with moderate levels of difficulty.
-For criterion-referenced tests, it is not appropriate to choose items based on their difficulty.
-Instead, our focus is on adequately sampling the domain.
-If 100% of people get a question correct, who cares? We are interested in how well they do and how well they know the content.
-For this reason, criterion-referenced tests tend to be easier.
Important note about difficulty?
Although p-values provide an indication of how difficult test items are, they tell us very little about the item’s usefulness in measuring the test’s content.
What are the steps to determining Item Discrimination?
-For each item, high achievers should answer correctly and low achievers should answer incorrectly.
-Score the tests, compute total scores, sort from High to Low
-Determine upper and lower groups
e.g., Kelley’s 27% rule
-Calculate the percentage of test-takers passing each item in both groups.
-Calculate the D-index for each item.
-The difference between the percentages of test-takers passing each item
D = U - L
-1 ≤ D ≤ +1
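The steps above can be sketched as follows. Rounding the 27% group size up is an assumption (the card does not say how to round), and the data are illustrative:

```python
import math
import numpy as np

# Illustrative 0/1 response matrix: rows = examinees, columns = items.
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

totals = scores.sum(axis=1)
order = np.argsort(totals)[::-1]                 # sort examinees high to low
n_group = max(1, math.ceil(0.27 * len(scores)))  # 27% rule, rounded up
upper = scores[order[:n_group]]                  # top scorers
lower = scores[order[-n_group:]]                 # bottom scorers

# D = proportion passing in upper group minus proportion in lower group.
d_index = upper.mean(axis=0) - lower.mean(axis=0)
print(d_index)  # [0.5 1.  1.  0.5]
```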
What is the D-Index Value Reference?
D-index value reference:
D ≥ .40: satisfactory/good
.30 ≤ D ≤ .39: require little or no revision
.20 ≤ D ≤ .29: marginal and need revision
D ≤ .19: eliminate or completely revise
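The reference values can be encoded as a small helper (a sketch; the cutoff placement at the band edges follows the list above):

```python
def classify_d(d):
    """Classify a D-index per the reference bands above."""
    if d >= 0.40:
        return "satisfactory/good"
    if d >= 0.30:
        return "little or no revision"
    if d >= 0.20:
        return "marginal; needs revision"
    return "eliminate or completely revise"

print(classify_d(0.35))  # little or no revision
print(classify_d(0.10))  # eliminate or completely revise
```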
What is NR?
NR (norm-referenced)
Within each topical area of the test blueprint, select those items with:
.16 < p < .84 if the test represents a single ability
.40 < p < .60 if the test represents different abilities
p-value should be larger if guessing is a factor
D > +.30
D-index is secondary to content; p-value is secondary to D-index
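The selection rules above can be sketched as a filter. The function name, the strict inequalities at the band edges, and the data are illustrative assumptions:

```python
# Keep items whose p-value falls inside the stated band and whose
# D-index exceeds .30 (the NR rules of thumb listed above).
def select_nr_items(p_values, d_values, single_ability=True):
    lo, hi = (0.16, 0.84) if single_ability else (0.40, 0.60)
    return [i for i, (p, d) in enumerate(zip(p_values, d_values))
            if lo < p < hi and d > 0.30]

# Item 0 is too easy, item 2 too hard, item 3 discriminates poorly.
print(select_nr_items([0.90, 0.55, 0.10, 0.60],
                      [0.50, 0.45, 0.60, 0.10]))  # [1]
```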
What is CR?
CR (criterion-referenced)
Don’t select items on the basis of their p-values.
Examine p-value to see if it is signaling a poorly written item.
D > 0
What is the Analysis of Distracters?
-With MC tests, there is usually one correct answer and a few distracters. A lot can be learned from analyzing the frequency with which test-takers choose distracters.
-Consider that perfect MC questions should have 2 distinctive features:
-People who have knowledge pick the correct answer.
-People who don’t know guess among the possible responses.
-Thus, each distracter should be equally popular.
-The correct answer will also be chosen by those who don’t have knowledge:
# answering correctly = those who know + a share of those who guess (roughly, guessers ÷ number of options)
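A sketch of the tally that drives this analysis; the responses and the key ('A') are illustrative:

```python
from collections import Counter

# Tally how often each option was chosen. Under the "perfect item"
# model above, knowers pick the key and guessers split evenly, so
# each distracter should be about equally popular.
choices = list("AABACADABAACBDAA")  # illustrative responses; key is 'A'
counts = Counter(choices)
n = len(choices)
for option in sorted(counts):
    print(option, counts[option], f"{counts[option] / n:.2f}")
# The key ('A') dominates, while distracters B, C, D draw similar
# shares: the pattern a well-written item should show. A distracter
# chosen far more (or less) often than the others deserves a look.
```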