Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads the text aloud (text-to-speech);
49 Cards in this Set
- Front
- Back
(2) are dependent on the quality of the items
|
reliability and validity
|
|
(2) approaches to item analysis
|
qualitative
quantitative |
|
item difficulty index (p) =
only applicable to _______ tests; ranges from ____ to ____; easier items have a ____ decimal number; harder items have a ____ decimal number |
# of examinees correctly answering the item / # of examinees
max performance; 0-1; larger; smaller |
|
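The difficulty formula above can be sketched in Python (the function name is illustrative, not from the text):

```python
# Minimal sketch of the item difficulty index p for a maximum-
# performance test; responses are scored 1 (correct) or 0 (incorrect).

def item_difficulty(responses):
    """p = # of examinees correctly answering the item / # of examinees."""
    return sum(responses) / len(responses)

# 8 of 10 examinees answered correctly, so p = 0.80 (an easier item
# has a larger decimal number).
p = item_difficulty([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
```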
T or F
Items with p values of either 0.0 or 1.0 provide no information about individual differences and are of no use from a measurement perspective |
T
|
|
Item difficulty is dependent on the ___
|
sample
|
|
For maximizing variability and reliability, the optimal item difficulty is ____
|
0.50
*Not necessary for ALL items to have a difficulty level of 0.50; it is often desirable to select some items with difficulty levels below and above 0.50, but with a mean of 0.50 |
|
-T or F: Different difficulty levels are desirable in different testing situations
-for constructed-response items, ___ is typically the optimal level; for selected-response items? |
T
0.50; for selected-response items the optimal level varies due to the influence of guessing |
|
-in criterion-referenced tests the expectation is that most test takers will..
-common for items to have p values as high as ___ |
eventually be successful
0.90 |
|
Percent Endorsement Statistic =
used for ____ tests; dependent on the ____ |
-percentage of examinees that responded to an item in a given manner
-typical response tests
-sample |
|
Item Discrimination=
|
how well an item differentiates among test takers who differ on the construct being measured.
|
|
more than ___ different indexes of item discrimination have been developed
|
50
|
|
-Discrimination Index (D) =
-two groups are typically defined in terms of ___ performance
-common approach used? |
-difference in performance between two groups
-total test
-select the top and bottom 27% of test takers in terms of their overall performance on the test, and exclude the middle 46% |
|
calculation of D
|
-difficulty of the item is computed for each group separately, and these are labeled pT and pB (T for top, B for bottom)
-D = pT − pB |
|
example:
pT = 0.80 means that pB = 0.30 means that |
-80% of the examinees in the top group answered the item correctly
-30% of the examinees in the bottom group answered the item correctly |
|
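The D calculation above can be sketched in Python (names and data are invented for illustration):

```python
# Illustrative sketch of the discrimination index D = pT - pB, using
# the top and bottom 27% of examinees ranked by total test score.

def discrimination_index(results, proportion=0.27):
    """results: list of (total_score, item_correct) pairs, item_correct in {0, 1}."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * proportion))
    p_top = sum(item for _, item in ranked[:n]) / n        # pT
    p_bottom = sum(item for _, item in ranked[-n:]) / n    # pB
    return p_top - p_bottom

# Made-up data: every top-group examinee answered the item correctly
# and every bottom-group examinee missed it, so D = 1.0.
results = [(95, 1), (90, 1), (88, 1), (80, 1), (75, 0),
           (70, 1), (60, 0), (55, 0), (50, 0), (40, 0)]
d = discrimination_index(results)
```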
Items with D values over ___ are acceptable
Items with D values below ___ should be reviewed **these are general rules; there are exceptions |
-0.30 (larger is better)
-0.30 |
|
Most indexes of item discrimination are biased in favor of items with ______difficulty levels
|
intermediate
|
|
Items that all examinees either pass or fail (i.e., p values of either 0.00 or 1.0) do not provide any information and their D values will always be _____
|
zero
|
|
If half of the examinees correctly answered an item and half failed (i.e., p value of 0.50) then it is possible for the
item’s D value to be ___ |
1.0
*This does not mean that all items with p values of 0.50 will have D values of 1.0; just that the item can conceivably have a D value of 1.0 |
|
As a result of the relationship between p
and D, items that have excellent discrimination power (i.e., D values of 0.40 and above) will necessarily have p values between ___ and ___ |
0.20 and 0.80
|
|
In testing situations where it is desirable to have either very easy or very difficult items, D values can be expected to be ____than those normally desired.
|
lower
|
|
___D values often indicate problems, but these guidelines should be applied in a flexible manner.
|
Low
|
|
-The interpretation of discrimination indexes is also complicated on ___tests
-It is normal for traditional item discrimination indexes to __-estimate an item’s true measurement characteristics |
mastery
under |
|
approach for item discrimination on mastery tests
limitation of this approach |
-administer the test to two groups: one that has received instruction and one that has not; D = p instruction − p no instruction
-limited by the difficulty of locating a group that has not received instruction on the relevant material |
|
-another approach for item discrimination on mastery tests
-limitations |
-administer the test to the same sample twice, once before instruction and once after instruction; D = p posttest − p pretest
-requires that the test be used as both a pretest and a posttest; may involve carryover effects |
|
-another approach for item discrimination on mastery tests
|
-Use item difficulty values based on the test takers who reached the mastery cut-off score and those who did not reach mastery
D = pmastery − pnonmastery |
|
Item-Total Correlation Coefficients
Large item-total correlation suggests |
-Item discrimination can also be examined by correlating performance on the items (scored as either 0 or 1) with the total test score, calculated using the point-biserial correlation
-that the item is measuring the same construct as the overall test measures |
|
Item Difficulty and Discrimination on Speed Tests:
-Item performance depends largely on the ____
-Measures of item difficulty and discrimination will reflect ____, rather than the item's actual difficulty level or ability to discriminate |
-speed of performance
-the location of the item in the test |
|
Allows you to examine how many examinees in the top and bottom groups selected each option on a multiple choice item.
|
distractor analysis
|
|
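A distractor analysis can be sketched as a simple tally (all data and option labels here are hypothetical; option "B" is keyed correct):

```python
# Hypothetical distractor analysis for one multiple-choice item:
# count how many top-group and bottom-group examinees chose each option.
from collections import Counter

top_group = ["B", "B", "B", "A", "B"]      # options chosen by top 27%
bottom_group = ["A", "C", "B", "A", "D"]   # options chosen by bottom 27%

top_counts = Counter(top_group)
bottom_counts = Counter(bottom_group)
# Distractor "A" attracts more bottom-group (2) than top-group (1)
# examinees, the pattern expected of an effective distractor.
```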
an effective distractor attracts more examinees in the ___ group and demonstrates ____ discrimination
|
bottom
negative |
|
qualitative item analysis (4 tips)
|
-Set the test aside and review it a few days later
-Have a colleague review the test
-Have examinees provide feedback after taking the test
-Use both quantitative and qualitative approaches |
|
Theory of mental measurement that holds that the responses to items are accounted for by latent traits
|
Item Response Theory
|
|
_____is an ability or characteristic that is inferred based on theories of behavior, as well as empirical evidence, but cannot be assessed directly.
|
latent trait
|
|
Central to IRT is a complex mathematical model that describes
|
how examinees at different levels of ability will respond to individual test items.
|
|
Item Characteristic Curves (ICC):
graph with _____reflected on the horizontal axis and the ________ reflected on the vertical axis |
ability
probability of a correct response |
|
T or F
Each item has its own specific ICC |
T
|
|
ICCs incorporate information about(2)
|
item’s difficulty and discrimination ability
|
|
ICC:
point halfway between the lower and upper asymptotes is referred to as the ____ |
inflection point
|
|
inflection point represents what?
|
difficulty of the item (b parameter)
|
|
ICC:
Discrimination (i.e., the a parameter) is reflected by |
the slope of the ICC at the inflection point
|
|
ICCs with ____slopes demonstrate better
discrimination than those with ___slopes |
steeper
gentler |
|
the "simplest model" also referred to as a one-parameter IRT model
|
Rasch IRT Model
|
|
Rasch IRT Model assumes that items differ in only one parameter, which parameter?
|
difficulty (b parameter)
|
|
Two-Parameter IRT Model assumes that items differ in both (2)
|
difficulty and discrimination
|
|
which model, the one-parameter or the two-parameter IRT model, better reflects real-life test development applications?
|
Two-Parameter IRT Model
|
|
Three-parameter model assumes that
|
even if the respondent essentially has no “ability,” there is still a chance he or she may answer the item correctly simply by chance
|
|
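The a, b, and c parameters from the cards above can be combined in a sketch of a three-parameter logistic ICC (a common IRT formulation; this minimal version omits the optional 1.7 scaling constant):

```python
# Sketch of a three-parameter logistic (3PL) item characteristic curve:
# a = discrimination, b = difficulty, c = guessing (lower asymptote).
import math

def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response at ability level theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta = b (the inflection point) the probability is halfway between
# the lower asymptote c and the upper asymptote 1; with c = 0 (as in the
# one- and two-parameter models) that probability is 0.50.
```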
In both the one- and two-parameter IRT models, the ICCs asymptote toward a probability of ___
this assumes essentially a ____ percent chance of answering the item correctly by guessing |
zero
zero |
|
Item difficulty (i.e., p) and item discrimination (i.e., D) are based on _____theory
|
classical test (meaning both are sample dependent)
|
|
In IRT, the parameters of items (e.g., difficulty and discrimination) are sample-_____
|
free/independent
|
|
(4) Special Applications of IRT
|
• Computer Adaptive Testing (CAT)
• Scores based on IRT (see Chapter 3)
• Reliability (see Chapter 4)
• Detecting biased items |