84 Cards in this Set

A good test item is
reliable, valid and discriminates between test takers
Items that discriminate well are
1. answered correctly by high scorers and
2. answered incorrectly by low scorers.
Items that show the opposite response pattern are not good items.
Revising and discarding of faulty items to improve the reliability and validity of the test is known as what?
Item analysis
The identification of good items involves the analysis of
individual items and overall test scores
The analytic tools that test developers use to analyse and select items include indices of:
1. item difficulties
2. item reliability
3. item validity
4. item discrimination
The proportion of test takers who answered the item correctly (p) is called what?
item-difficulty index
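A minimal Python sketch of the item-difficulty index, assuming dichotomously scored items (1 = correct, 0 = incorrect). The function name `item_difficulty` and the response data are hypothetical.

```python
def item_difficulty(responses):
    """Item-difficulty index p: proportion of test takers who answered correctly.

    `responses` holds one score per test taker: 1 = correct, 0 = incorrect.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical item answered correctly by 6 of 8 test takers.
p = item_difficulty([1, 1, 0, 1, 1, 0, 1, 1])
print(p)  # → 0.75 (a larger p means an easier item)
```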
To ensure discriminability it is preferable to have a mix of items whose pass rates average out to about what?
p = .5
In ability testing the two item characteristics of interest are
difficulty and discrimination
Item difficulty is estimated in terms of what?
pass rate
Item difficulty depends on what?
the ability level of the group of test takers
The larger the item difficulty index (p) the
easier the item is
The item-difficulty index is used in the _________ context, whereas the item-endorsement index is used in other contexts, such as personality testing.
achievement
When assessing an item for the impact of guessing, the optimal item difficulty is calculated as half of
(1 + the probability of guessing correctly), i.e., (1 + g) / 2
When accounting for guessing what is the optimal difficulty level of a multiple choice item with 5 possible responses?
0.6
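The guessing-corrected optimum in these cards, (1 + g) / 2 with g = 1 / (number of response options), can be checked with a short Python sketch. The function name `optimal_difficulty` is hypothetical, and the sketch assumes every option is equally likely to be picked by a guesser.

```python
def optimal_difficulty(n_options):
    """Optimal item difficulty corrected for guessing: (1 + g) / 2,
    where g = 1 / n_options is the chance of guessing correctly."""
    g = 1 / n_options
    return (1 + g) / 2


print(optimal_difficulty(5))  # → 0.6 for a five-option multiple-choice item
print(optimal_difficulty(4))  # → 0.625 for a four-option item
```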
The item-reliability index is an indication of what?
the internal consistency of a test
What are the two ways of ascertaining the item-reliability index?
Calculation of the item-reliability index and calculation of the test/scale reliability
The product of the item score standard deviation and the correlation between the item score and the total test score results in what?
The individual item-reliability index
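A minimal sketch of the item-reliability index as the product described above (item-score SD times the item-total correlation). It assumes population standard deviations (`statistics.pstdev`); the helper names and the data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def item_reliability_index(item_scores, total_scores):
    """Item-reliability index = item-score SD x item-total correlation."""
    return pstdev(item_scores) * pearson_r(item_scores, total_scores)


# Hypothetical data: one item (1 = correct) and each test taker's total score.
item = [1, 1, 0, 1, 0, 0]
total = [18, 15, 9, 14, 11, 7]
print(round(item_reliability_index(item, total), 3))  # → 0.447
```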
An item with a low item-reliability index suggests that the item is
less homogeneous
Inter-item consistency (internal consistency reliability) measures the ________ of a test/scale or of clusters of items within the test/scale.
homogeneity
Internal consistency of a dichotomous scale is calculated using the ______________. Internal consistency of a nondichotomous scale is calculated using the _________.
Kuder-Richardson Formula 20
Cronbach Coefficient Alpha
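Both coefficients can be sketched in a few lines of Python. This assumes population variances throughout; under that assumption p(1 − p) is exactly the variance of a 0/1 item, so KR-20 and alpha agree on dichotomous data. The function names and the 5 × 4 response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha. `scores` is a list of rows, one row of item scores per test taker."""
    k = len(scores[0])
    items = list(zip(*scores))                 # one tuple of scores per item
    totals = [sum(row) for row in scores]      # each test taker's total score
    item_var = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))


def kr20(scores):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items."""
    k = len(scores[0])
    n = len(scores)
    totals = [sum(row) for row in scores]
    pq = 0.0
    for col in zip(*scores):
        p = sum(col) / n                       # item difficulty (pass rate)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))


# Hypothetical 0/1 responses: 5 test takers x 4 items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3), round(cronbach_alpha(data), 3))  # → 0.8 0.8
```

For non-dichotomous items (e.g., rating scales) only `cronbach_alpha` applies.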
The item-validity index is a statistic designed to provide an indication of the
degree to which a test measures what it purports to measure
The higher the item-validity index the greater the what?
test's criterion-related validity
The item-validity index can be calculated once which two statistics are known?
1. the item-score standard deviation
2. the correlation between the item score and the criterion score
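Once those two statistics are known, the item-validity index is their product. A minimal sketch, assuming population standard deviations; the helper names and the criterion data (e.g., supervisor ratings) are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def item_validity_index(item_scores, criterion_scores):
    """Item-validity index = item-score SD x correlation of the item with the criterion."""
    return pstdev(item_scores) * pearson_r(item_scores, criterion_scores)


# Hypothetical data: item scores (1 = correct) and an external criterion score.
item = [1, 0, 1, 1, 0, 0]
criterion = [4, 2, 5, 3, 3, 1]
print(round(item_validity_index(item, criterion), 3))  # → 0.387
```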
Items that correlate well with the criterion will produce
a test that correlates well with the criterion and therefore predicts it.
Item discrimination refers to the degree to which
an item correctly differentiates among test-takers in the behaviour that the test is designed to measure.
Measures of item discrimination indicate how adequately an item does what?
separates or discriminates between high scorers and low scorers on an entire test.
What are the main two approaches to determining item discrimination?
1. calculating item-total correlations; and
2. calculating an index of discrimination.
The correlation between each item and the total test score is known as
Item-total correlation
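A sketch of item-total correlations for every item in a response matrix. The `corrected=True` option, which removes the item from the total before correlating so the item does not inflate its own correlation, is a common variant added here as an assumption; it is not described in this set. Names and data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def item_total_correlations(scores, corrected=False):
    """Correlate each item (column) with the total test score.

    If corrected=True, the item's own score is subtracted from the total first.
    """
    totals = [sum(row) for row in scores]
    result = []
    for j, col in enumerate(zip(*scores)):
        crit = [t - row[j] for t, row in zip(totals, scores)] if corrected else totals
        result.append(pearson_r(list(col), crit))
    return result


# Hypothetical 0/1 responses: 5 test takers x 4 items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print([round(r, 2) for r in item_total_correlations(data)])
print([round(r, 2) for r in item_total_correlations(data, corrected=True)])
```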
The difference in pass rate on an item between the high ability group (i.e., ~ top 27% of test-takers) and the low ability group (i.e., the bottom 27% of test-takers) is known as the
The Index of Discrimination
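A minimal sketch of the index of discrimination, d = (U − L) / n, where n is the size of each extreme group. It assumes 0/1 item scoring and breaks ties in total score by original order; the function name, the `fraction` parameter, and the data are hypothetical.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Index of discrimination d = (U - L) / n.

    Test takers are ranked by total score; the top and bottom `fraction`
    (about 27% by convention) form the upper and lower groups of n people each.
    U and L count the people in each group who answered the item correctly.
    Ties in total score keep their original order (sorted() is stable).
    """
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:n], ranked[-n:]
    U = sum(item_scores[i] for i in upper)
    L = sum(item_scores[i] for i in lower)
    return (U - L) / n


# Hypothetical group of 10 test takers, so each extreme group has n = 3.
totals = [30, 28, 27, 25, 22, 20, 18, 15, 12, 10]
good_item = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
bad_item = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]    # only low scorers get it right

print(discrimination_index(good_item, totals))  # positive d
print(discrimination_index(bad_item, totals))   # negative d: a red flag
```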
A negative d-value on a particular item is a red flag because it indicates that low-scoring test-takers are more likely to what?
answer the item correctly than high-scoring test-takers
In an item-discrimination analysis, if U = 30, L = 10, n = 32 and d = .63, what does this mean?
Each of the upper (high-scoring) and lower (low-scoring) groups contains 32 test takers (n = 32). Of these, 30 in the upper group (U = 30) and 10 in the lower group (L = 10) answered the item correctly. d = .63 is the index of discrimination and indicates that this item moderately discriminates.
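Applying the definition d = (U − L) / n, with n the number of test takers in each of the upper and lower groups, reproduces the value in this card:

```latex
d = \frac{U - L}{n} = \frac{30 - 10}{32} = \frac{20}{32} = 0.625 \approx .63
```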
A graphic representation of item difficulty and discrimination is called an
item-characteristic curve
In an item-characteristic curve the steeper the curve the
greater the discrimination
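This set does not name a model for the curve. The sketch below assumes the two-parameter logistic (2PL) model from item response theory, where `b` plays the role of difficulty and `a` controls steepness, i.e. discrimination. Function name and parameter values are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under a two-parameter logistic model.

    theta: test taker's ability; b: item difficulty (ability at which P = .5);
    a: item discrimination (how steeply P rises around b).
    """
    return 1 / (1 + math.exp(-a * (theta - b)))


# Two hypothetical items of equal difficulty (b = 0) but different discrimination.
for theta in (-1.0, 0.0, 1.0):
    flat = icc_2pl(theta, a=0.5, b=0.0)
    steep = icc_2pl(theta, a=2.5, b=0.0)
    print(f"theta={theta:+.1f}  flat item P={flat:.2f}  steep item P={steep:.2f}")
```

The steep item's probability of success changes much faster between low and high ability, so it separates the two groups better. As `a` grows very large, the curve approaches a step, close to the right-angle shape described on the next card.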
On an item-characteristic curve, competency-based assessments would have a _____ _____ shape
right angle
Guessing interferes in item discrimination because
the developer cannot be sure that the discrimination reflects genuine differences between low and high ability
Test fairness is undermined when items
favour a particular group of test takers
Item-characteristic curves can be used in assessing fairness by comparing the _____ of the curve for different groups.
shape
Item analysis of tests taken under speed conditions yields
misleading or uninterpretable results.
Speed tests pose difficulties in item analysis because they
introduce non-item reasons for failing an item (e.g., running out of time before reaching it).
A remedy for conducting item analysis on speeded tests is to
conduct initial testing and item analysis under non-speeded conditions and introduce speeded conditions to establish norms.
Qualitative item analysis uses
non-statistical procedures to analyse items
Qualitative methods of item analysis involve exploration
of the issues through verbal means such as interviews and group discussion conducted with test takers and other relevant parties.
Two popular qualitative approaches to item analysis are:
Think-aloud test administration and expert panels
The think-aloud test administration is a qualitative research tool that yields valuable insights regarding
the way individuals perceive, interpret and respond to items.
A sensitivity review is a study of test items in which the items are examined for
fairness to all prospective test takers and for the presence of offensive language, stereotypes, or situations.
A sensitivity review is a type of
expert panel for qualitative item analysis
Tests of intelligence, aptitude, and achievement, in which it is assumed that all examinees are equally highly motivated, are referred to as
Maximum-performance tests
All aptitude tests imply
prediction
Typical-performance tests are a measure of what test takers
actually do or how they tend to behave (e.g., personality tests)
The meaning of a test score depends on whether the test is ______-referenced or ____-referenced.
criterion
norm
A criterion-referenced test is one in which scores are expressed in terms of the
skills or behaviours achieved, rather than in comparison with other people.
Criterion-referenced tests are used to identify the level of mastery that has been reached in some
domain where skills can be ordered in a hierarchy of difficulty or complexity
Criterion-referenced testing is common in assessing achievement in
educational settings.
Norm-referenced testing involves
comparing the test-taker with similar others.
Norm-referenced testing uses _______ scores, not ____ scores
derived
raw
What are the two reasons for using derived scores?
1. for comparison by using the same metric
2. more meaningful interpretations of the test results
An individual's score may show a person doing well or poorly depending on the
groups with which he or she is compared.
A critical issue in norm-referencing is defining the
relevant reference or comparison group.
What are the three principal bases for expressing test results?
1. comparison with an 'absolute standard' (Type I)
2. inter-individual comparison (Type II)
3. intra-individual comparison (Type III)
Type I scores (absolute scores) consider
only the individual's performance; the performance of all other examinees is ignored in assigning the score
Absolute test scores tell us about the test taker's
knowledge of the content of a domain of interest
Type II (inter-individual comparison) scores are typically used with
standardised tests
What are the 5 sub-types of Type II inter-individual comparisons?
a) linear standard scores
b) rank within groups
c) range of scores within a group
d) status of those obtaining the same score
Type II A (linear standard) scores are called standard because they are based on the
standard deviation
What 4 properties make Type II A (linear standard) scores valuable in research?
1. for each test and group, the scores have the same mean and standard deviation
2. retain the shape of the raw-score distribution
3. permit inter-group and inter-test comparisons
4. can be treated mathematically
All linear standard scores (Type II A) indicate the location of an examinee's raw score in relation to
the mean of some specified group, in terms of that group's standard deviation
What are the 3 main types of linear standard score statistics?
1. z score
2. t score
3. deviation IQ scores
z scores involve the transformation of
raw scores into standard scores, which are then related to the normal distribution.
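The transformation is z = (x − mean) / SD, computed within the reference group. A minimal sketch, assuming the population SD (`statistics.pstdev`) of a hypothetical group; some sources use the sample SD instead.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to z scores: z = (x - mean) / SD of the group."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [(x - m) / sd for x in raw_scores]


# Hypothetical raw scores for one group on one test.
raw = [12, 15, 18, 21, 24]
z = z_scores(raw)
print([round(v, 2) for v in z])  # → [-1.41, -0.71, 0.0, 0.71, 1.41]
```

The output also shows the two drawbacks listed below: half the values are negative, and they need decimals.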
What are the main 2 advantages of the z score?
1. permit comparison across different tests
2. permit comparison across different test takers
What are the main 2 disadvantages of the z score?
1. half of all z scores are negative
2. z scores are usually expressed to one or two decimal places
The mean of the z score is ___, and the standard deviation is __.
0
1
The mean of the t score is ___, and the standard deviation is ___.
50
10
t =
10z + 50
Verbal IQ, Performance IQ and Full Scale IQ are yields of which IQ test?
Wechsler IQ test
Wechsler IQ scores have a mean of ____ and a standard deviation of ___.
100
15
The formula for the Wechsler deviation IQ is IQ =
15z + 100
The Stanford-Binet IQ test was revised from the
ratio IQ to the deviation IQ in the 1960s
Stanford-Binet IQ scores have a mean of ____ and a standard deviation of ____.
100
16
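The t score and both deviation IQs are linear rescalings of z (new score = SD × z + mean). A short sketch using the means and SDs given in this set (including SD 16 for the Stanford-Binet, as stated above); the function names are hypothetical.

```python
def to_t(z):
    """t score: mean 50, SD 10."""
    return 10 * z + 50


def to_wechsler_iq(z):
    """Wechsler deviation IQ: mean 100, SD 15."""
    return 15 * z + 100


def to_stanford_binet_iq(z):
    """Stanford-Binet deviation IQ as given in this set: mean 100, SD 16."""
    return 16 * z + 100


for z in (-1.0, 0.0, 1.5):
    print(z, to_t(z), to_wechsler_iq(z), to_stanford_binet_iq(z))
# → -1.0 40.0 85.0 84.0
# → 0.0 50.0 100.0 100.0
# → 1.5 65.0 122.5 124.0
```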
Type II B (rank within group) scores are based on the number of people with scores
higher or lower than a specified score value
What are three examples of Type II B (rank within group) scores?
1. Ranks - relative position
2. Percentile ranks - frequency divided into 100 ranks
3. Decile ranks - frequency divided into 10 ranks
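A sketch of the three rank-within-group scores. Percentile ranks have several conventions; this one assumes the percentage of the group scoring below the score, with tied scores counted as half below. The function names and the group data are hypothetical.

```python
def rank(score, scores):
    """Rank 1 = highest score; tied scores share the best rank."""
    return 1 + sum(1 for s in scores if s > score)


def percentile_rank(score, scores):
    """Percentage of the group scoring below `score`, counting ties as half below."""
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * equal) / len(scores)


def decile_rank(score, scores):
    """Decile rank 1-10: which tenth of the group the percentile rank falls in."""
    return min(10, int(percentile_rank(score, scores) // 10) + 1)


group = [55, 60, 60, 65, 70, 72, 75, 80, 85, 90]
print(rank(75, group), percentile_rank(75, group), decile_rank(75, group))  # → 4 65.0 7
```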
Normalised standard scores are derived scores that are assigned standard-score-like values but are computed from percentile ranks.
While linear standard scores reproduce the raw-score distribution faithfully, normalised standard scores more closely approximate a
normal probability distribution
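A normalised z score can be obtained by finding the z value that cuts off the given percentile rank of a standard normal curve, then rescaling it like any standard score (e.g., 10z + 50 for a normalised t score). A minimal sketch using Python's `statistics.NormalDist` (Python 3.8+); the function name is hypothetical.

```python
from statistics import NormalDist


def normalized_z(percentile_rank):
    """Normalised z score: the z value below which `percentile_rank` percent of a
    standard normal distribution falls (PR must be strictly between 0 and 100)."""
    return NormalDist().inv_cdf(percentile_rank / 100)


for pr in (2.5, 16, 50, 84, 97.5):
    z = normalized_z(pr)
    print(pr, round(z, 2), round(10 * z + 50, 1))  # PR, normalised z, normalised t
```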
The term 'stanine' is derived from "_______________", and when normally distributed, stanines have a mean of _ and a standard deviation of _.
standard score with nine units
5
2
The term 'sten' is derived from "________", and when normally distributed, stens have a mean of __ and a standard deviation of __.
standard scores with ten units
5.5
2
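Given the means and SDs above, stanines and stens can be approximated from z by a linear conversion, rounding, and clipping to the scale's range. This is an assumption: some test manuals instead assign stanines from fixed percentile bands (4, 7, 12, 17, 20, 17, 12, 7 and 4 percent). The function names are hypothetical.

```python
import math


def stanine(z):
    """Stanine (mean 5, SD 2): 2z + 5 rounded half-up, clipped to 1..9."""
    return min(9, max(1, math.floor(2 * z + 5 + 0.5)))


def sten(z):
    """Sten (mean 5.5, SD 2): 2z + 5.5 rounded half-up, clipped to 1..10."""
    return min(10, max(1, math.floor(2 * z + 5.5 + 0.5)))


for z in (-2.5, -0.3, 0.0, 0.3, 1.2, 3.0):
    print(z, stanine(z), sten(z))
```

Each stanine and sten unit spans half a standard deviation, which is why both scales have SD 2.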