84 Cards in this Set
- Front
- Back
A good test item is
|
reliable, valid and discriminates between test takers
|
|
Items that discriminate well
|
1. are answered correctly by high scorers
2. are answered incorrectly by low scorers
Items that show the opposite response pattern are not good items. |
|
Revising and discarding of faulty items to improve the reliability and validity of the test is known as what?
|
Item analysis
|
|
The identification of good items involves the analysis of
|
individual items and overall test-scores
|
|
The analytic tools that test developers use to analyse and select items include indices of:
|
1. item difficulties
2. item reliability
3. item validity
4. item discrimination |
|
The proportion of test takers who answered the item correctly (p) is called what?
|
item-difficulty index
|
|
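The item-difficulty index can be sketched in a few lines. This is a minimal illustration that assumes each response is scored 1 (correct) or 0 (incorrect); the function name `item_difficulty` and the sample data are hypothetical.

```python
def item_difficulty(responses):
    """Return p, the proportion of test takers who answered the item correctly.

    `responses` is a list of 0/1 item scores (an assumed scoring convention).
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical data: 8 of 10 test takers answered correctly.
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# Hypothetical data: 3 of 10 test takers answered correctly.
hard_item = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

p_easy = item_difficulty(easy_item)
p_hard = item_difficulty(hard_item)
```

The item with the larger p is the one that more test takers passed, which is why a larger p indicates an easier item.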
To ensure discriminability it is preferable to have a mix of items whose pass rates average out to about what?
|
p = .5
|
|
In ability testing the two item characteristics of interest are
|
difficulty and discrimination
|
|
Item difficulty is estimated in terms of what?
|
pass rate
|
|
Item difficulty depends on what?
|
the ability level of the group of test takers
|
|
The larger the item difficulty index (p) the
|
easier the item is
|
|
The item-difficulty index is used in the _________ context whereas the item-endorsement index is used in other contexts such as personality.
|
achievement
|
|
When assessing an item for the impact of guessing the optimal item difficulty is calculated as half
|
the sum of the probability of guessing correctly and 1, i.e. (probability of guessing correctly + 1) / 2
|
|
When accounting for guessing what is the optimal difficulty level of a multiple choice item with 5 possible responses?
|
0.6
|
|
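The guessing-adjusted optimum from the two cards above can be checked with a short sketch. It assumes every response option is equally likely to be guessed, so the chance success rate is 1 divided by the number of options; `optimal_difficulty` is a hypothetical name.

```python
def optimal_difficulty(n_options):
    """Midpoint between the chance success rate (1 / n_options) and 1.0."""
    if n_options < 1:
        raise ValueError("an item needs at least one response option")
    chance = 1 / n_options  # assumes all options are equally likely guesses
    return (chance + 1) / 2


five_option = optimal_difficulty(5)  # multiple choice with 5 responses
true_false = optimal_difficulty(2)   # two-option item
```

For a five-option item the chance rate is .2, so the midpoint between .2 and 1 is .6, matching the card. A two-option item works out to .75 under the same assumption.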
The item-reliability index is an indication of what?
|
the internal consistency of a test
|
|
What are the two ways of ascertaining the item-reliability index?
|
Calculation of the item-reliability and calculation of the test/scale reliability
|
|
The product of the item score standard deviation and the correlation between the item score and the total test score results in what?
|
The individual item-reliability index
|
|
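The product described on the card above can be sketched as follows. The sketch assumes the population standard deviation is intended (the card does not say which form to use), and the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def item_reliability_index(item_scores, total_scores):
    """Item-score standard deviation times the item-total correlation."""
    # pstdev (population SD) is an assumption; a sample SD would differ slightly.
    return pstdev(item_scores) * pearson_r(item_scores, total_scores)


# Hypothetical data for five test takers: one item and their total test scores.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 6, 4, 3]

index = item_reliability_index(item, totals)
```

Here the item is passed by the higher scorers and failed by the lower scorers, so the item-total correlation is high and the index is well above zero.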
An item with a low item-reliability index suggests that the item is
|
less homogeneous
|
|
Inter-item consistency (internal consistency reliability) measures the ________ of a test/scale or clusters of items within the test/scale.
|
homogeneity
|
|
Internal consistency of a dichotomous scale is calculated using the ______________. Internal consistency of a nondichotomous scale is calculated using the _________.
|
Kuder-Richardson Formula 20
Cronbach Coefficient Alpha |
|
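As a rough sketch of the two coefficients named above, the code below uses the commonly stated textbook forms: alpha = k/(k-1) × (1 − sum of item variances / total-score variance), and KR-20 with p(1 − p) as each item's variance. Population variances are assumed throughout, and the function names and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; `rows` holds one list of item scores per test taker."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def kr20(rows):
    """KR-20 for 0/1 items, using p * (1 - p) as each item's variance."""
    k = len(rows[0])
    n = len(rows)
    pq_sum = 0.0
    for item in zip(*rows):
        p = sum(item) / n  # item-difficulty index for this item
        pq_sum += p * (1 - p)
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


# Hypothetical 0/1 data: five test takers (rows) by four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

kr20_value = kr20(scores)
alpha_value = cronbach_alpha(scores)
```

On dichotomous data the two functions agree, because the population variance of a 0/1 item equals p(1 − p). Alpha is the form that also works when items have more than two score values.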
Item-Validity Index is a statistic designed to provide an indication of the
|
degree to which a test measures what it purports to measure
|
|
The higher the item-validity index the greater the what?
|
test's criterion-related validity
|
|
The item-validity index can be calculated once which two statistics are known?
|
1. the item-score standard deviation
2. the correlation between the item score and the criterion score |
|
Items that correlate well with the criterion will produce
|
a test that correlates well with the criterion and therefore predicts it.
|
|
Item discrimination refers to the degree to which
|
an item differentiates correctly among test-takers in the behaviour that the test is designed to measure. |
|
Measures of item discrimination indicate how adequately an item does what?
|
separates or discriminates between high scorers and low scorers on an entire test.
|
|
What are the main two approaches to determining item discrimination?
|
1. calculating item-total correlations; and
2. calculating an index of discrimination. |
|
The correlation between each item and the total test score is known as
|
Item-total correlation
|
|
The difference in pass rate on an item between the high ability group (i.e., approximately the top 27% of test-takers) and the low ability group (i.e., approximately the bottom 27% of test-takers) is known as the
|
The Index of Discrimination
|
|
A negative d-value on a particular item is a red flag because it indicates that low-scoring test-takers are more likely to what?
|
answer the item correctly than high-scoring test-takers
|
|
For the index of discrimination, if U = 30, L = 10, n = 32 and d = .63, what does this mean?
|
With 32 test takers in each of the high and low groups (n = 32), 30 high scorers (U = 30) and 10 low scorers (L = 10) answer the item correctly. d = (U - L)/n = (30 - 10)/32 ≈ .63 is the item discrimination index and indicates that this item moderately discriminates.
|
|
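The worked example above can be reproduced with a small sketch. It assumes n is the number of test takers in each of the two groups (the reading under which U = 30 out of n = 32 makes sense); `discrimination_index` is a hypothetical name.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = (U - L) / n, with n test takers in each of the two groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Values from the card: U = 30, L = 10, n = 32.
d = discrimination_index(upper_correct=30, lower_correct=10, group_size=32)

# Hypothetical reversed pattern: more low scorers than high scorers pass the item.
d_reversed = discrimination_index(upper_correct=10, lower_correct=30, group_size=32)
```

The first value is .625, which rounds to the .63 on the card. The reversed pattern gives a negative d, the red flag described earlier.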
A graphic representation of item difficulty and discrimination is called an
|
item-characteristic curve
|
|
In an item-characteristic curve the steeper the curve the
|
greater the discrimination
|
|
On an item-characteristic curve, competency based assessments would have a _____ _____ shape
|
right angle
|
|
Guessing interferes in item discrimination because
|
the developer cannot be sure that the discrimination represents low and high ability
|
|
Test fairness is undermined when items
|
favour a particular group of test takers
|
|
Item-characteristic curves can be used in assessing fairness by comparing the _____ of the curve for different groups.
|
shape
|
|
Item analysis of tests taken under speed conditions yields
|
misleading or uninterpretable results.
|
|
Speed tests pose difficulties in item analysis because they
|
introduce non-item reasons for why an item can be failed.
|
|
A remedy for conducting item analysis on a speeded test is to
|
conduct initial testing and item analysis under non-speeded conditions and then introduce speeded conditions to establish norms.
|
|
Qualitative item analysis uses
|
non-statistical procedures to analyse items
|
|
Qualitative methods of item analysis involve exploration
|
of the issues through verbal means such as interviews and group discussion conducted with test takers and other relevant parties.
|
|
Two popular qualitative approaches to item analysis are:
|
Think aloud and expert panels
|
|
The think aloud test administration is a qualitative research tool that yields valuable insights regarding
|
the way individuals perceive, interpret and respond to items.
|
|
A sensitivity review is a study of test items in which the items are examined for
|
fairness to all prospective test takers and for the presence of offensive language, stereotypes, or situations.
|
|
A sensitivity review is a type of
|
expert panel for qualitative item analysis
|
|
Tests of intelligence, aptitude, and achievement in which it is assumed that all examinees are equally highly motivated are referred to as
|
Maximum-performance tests
|
|
All aptitude tests imply
|
prediction
|
|
Typical-performance tests are a measure of what test takers
|
actually do or how they tend to behave, e.g., personality tests
|
|
The meaning of a test score depends on whether the test is ______-referenced or ____-referenced.
|
criterion
norm |
|
A criterion-referenced test is one in which scores are expressed in terms of the
|
skills or behaviours achieved, rather than in comparison with other people.
|
|
Criterion-referenced tests are used to identify the level of mastery that has been reached in some
|
domain where skills can be ordered in a hierarchy of difficulty or complexity
|
|
Criterion-referenced testing is common in assessing achievement in
|
educational settings.
|
|
Norm-referenced testing involves
|
comparing the test-taker with similar others. |
|
Norm-referenced testing uses _______ scores, not ____ scores
|
derived
raw |
|
What are the two reasons for using derived scores?
|
1. for comparison by using the same metric
2. more meaningful interpretations of the test results |
|
An individual's score may show a person doing well or poorly depending on the
|
groups with which he or she is compared.
|
|
A critical issue in norm-referencing is defining the
|
relevant reference or comparison group.
|
|
What are the three principal bases for expressing test results?
|
1. comparison with an 'absolute standard' (Type I)
2. inter-individual comparison (Type II)
3. intra-individual comparison (Type III) |
|
Type I scores (absolute scores) consider
|
only the individual's performance; the performance of all other examinees is ignored in assigning the score
|
|
Absolute test scores tell us about the test taker's
|
knowledge of the content of a domain of interest
|
|
Type II (inter-individual comparison) scores are typically used with
|
standardised tests
|
|
What are the 5 sub-types of the Type II inter-individual comparisons?
|
a) linear standard scores
b) rank within groups
c) range of scores within a group
d) status of those obtaining the same score |
|
Type II A (linear standard) scores are called standard because they are based on the
|
standard deviation
|
|
What 4 properties make Type II A (linear standard) scores valuable in research?
|
1. for each test and group each score gives the same mean and standard deviation
2. retain the shape of the raw-score distribution
3. permit inter-group and inter-test comparisons
4. can be treated mathematically |
|
All linear standard scores (Type II A) indicate the location of an examinee's raw score in relation to
|
the mean of some specified group in terms of the group's standard deviation
|
|
What are the 3 main types of linear standard score statistics?
|
1. z score
2. t score
3. deviation IQ scores |
|
Z-scores involve the transformation of
|
raw scores into standard scores, which are then related to the normal distribution.
|
|
What are the main 2 advantages of the z score?
|
1. permit comparison across different tests
2. permit comparison across different test takers |
|
What are the main 2 disadvantages of the z score?
|
1. half of the z scores are negative
2. all z scores are expressed to one or two decimal places |
|
The mean of the z score is ___, and the standard deviation is __.
|
0
1 |
|
The mean of the t score is ___, and the standard deviation is ___.
|
50
10 |
|
t =
|
10z + 50
|
|
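The z and t transformations on the cards above can be sketched like this. The sketch assumes z = (raw score − group mean) / group standard deviation with the population standard deviation of the comparison group; the function names and raw scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to z scores (mean 0, standard deviation 1)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of the group is an assumption
    if sd == 0:
        raise ValueError("all scores are identical; z is undefined")
    return [(x - m) / sd for x in raw_scores]


def t_scores(raw_scores):
    """Convert raw scores to t scores using t = 10z + 50."""
    return [10 * z + 50 for z in z_scores(raw_scores)]


# Hypothetical raw scores with mean 5 and standard deviation 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
t = t_scores(raw)
```

Half of these z values are zero or negative and some carry decimals, which the t scale avoids by shifting the mean to 50 and stretching the standard deviation to 10.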
Verbal IQ, Performance IQ and Full Scale IQ are yielded by which IQ test?
|
Wechsler IQ test
|
|
Wechsler IQ scores have a mean of ____ and a standard deviation of ___.
|
100
15 |
|
The formula for the Wechsler IQ test is IQ =
|
15z + 100
|
|
The Stanford-Binet IQ test was revised from the
|
Ratio IQ to Deviation IQ in the 1960s
|
|
Stanford-Binet IQ scores have a mean of ____ and a standard deviation of ____.
|
100
16 |
|
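The two deviation IQ scales on the cards above differ only in their standard deviation, which a short sketch makes concrete. The function name `deviation_iq` is hypothetical, and the standard deviations of 15 and 16 are taken from the cards.

```python
def deviation_iq(z, sd=15):
    """Deviation IQ on a scale with mean 100 and the given standard deviation."""
    return sd * z + 100


# The same relative standing (one standard deviation above the mean)
# expressed on each scale.
wechsler_iq = deviation_iq(1.0)               # SD of 15
stanford_binet_iq = deviation_iq(1.0, sd=16)  # SD of 16, as stated on the card
```

The same z score therefore maps to slightly different IQ values on the two scales, so scores are only comparable once the scale's standard deviation is known.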
Type II B (rank within group) scores are based on the number of people with scores
|
higher or lower than a specified score value
|
|
What are three examples of Type II B (rank within group) scores?
|
1. Ranks - relative position
2. Percentile ranks - frequency divided into 100 ranks
3. Decile ranks - frequency divided into 10 ranks |
|
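A percentile rank can be computed in more than one way. The sketch below assumes one common convention, the percentage of the group scoring below a value with ties counted as half; the cards do not specify a convention, and the function name and data are hypothetical.

```python
def percentile_rank(score, group_scores):
    """Percentage of the group scoring below `score`, counting ties as half."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    below = sum(1 for s in group_scores if s < score)
    tied = sum(1 for s in group_scores if s == score)
    return 100 * (below + 0.5 * tied) / len(group_scores)


# Hypothetical comparison group of eight raw scores.
group = [2, 4, 4, 4, 5, 5, 7, 9]

pr_of_5 = percentile_rank(5, group)
pr_of_9 = percentile_rank(9, group)
```

Because the result depends only on how many people score lower or the same, it changes whenever the comparison group changes, even if the raw score stays fixed.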
Normalised standard scores are derived scores that are assigned
|
standard-score-like values but are computed from percentile ranks. |
|
While linear standard scores reproduce the raw score distribution faithfully, normalised standard scores more closely approximate
|
a normal probability distribution
|
|
The term 'stanine' is derived from "_______________", and when normally distributed, stanines have a mean of _ and a standard deviation of _.
|
standard score with nine units
5
2 |
|
The term 'sten' is derived from ________ and when normally distributed have a mean of __ and a standard deviation of __.
|
standard scores with ten units
5.5
2 |
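One way to see how normalised standard scores are computed from percentile ranks is the sketch below. It assumes a common procedure: convert the percentile rank to the z value with the same cumulative normal probability, rescale to the stanine or sten mean and standard deviation, round to the nearest unit, and clamp to the scale's range. The rounding and clamping rules and the function names are assumptions, not taken from the cards.

```python
import math
from statistics import NormalDist


def normalised_z(percentile_rank):
    """z value whose cumulative normal probability equals the percentile rank.

    The percentile rank must lie strictly between 0 and 100.
    """
    return NormalDist().inv_cdf(percentile_rank / 100)


def stanine(percentile_rank):
    """Scale to mean 5, SD 2, round to the nearest unit, and clamp to 1..9."""
    value = math.floor(2 * normalised_z(percentile_rank) + 5 + 0.5)
    return max(1, min(9, value))


def sten(percentile_rank):
    """Scale to mean 5.5, SD 2, round to the nearest unit, and clamp to 1..10."""
    value = math.floor(2 * normalised_z(percentile_rank) + 5.5 + 0.5)
    return max(1, min(10, value))
```

Under these assumptions a percentile rank of 50 falls in stanine 5, and percentile ranks of 25 and 75 fall in stens 4 and 7.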