31 Cards in this Set

When all examinees in the upper group answer the item correctly but none in the lower group do, D =
+1.0
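This answer follows from the usual definition of the item discrimination index, D = p(upper) - p(lower): the proportion of the upper-scoring group that passes the item minus the proportion of the lower-scoring group that passes it. Below is a minimal Python sketch. The function name and the group sizes are hypothetical.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """Item discrimination index D = p_upper - p_lower."""
    p_upper = upper_correct / upper_n  # proportion of upper group answering correctly
    p_lower = lower_correct / lower_n  # proportion of lower group answering correctly
    return p_upper - p_lower

# Hypothetical groups of 27 examinees each.
print(discrimination_index(27, 27, 0, 27))  # 1.0: all of the upper group pass, none of the lower
print(discrimination_index(0, 27, 27, 27))  # -1.0: the reverse pattern
```

These two cases are the endpoints of D's range, -1.0 to +1.0.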
What is classical test theory?
An obtained test score (X) is composed of a true score component (T) and an error component (E): X = T + E.
Reliability
an estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees (true score variability)
What is a reliability coefficient?
the proportion of variability in obtained test scores that reflects true score variability (ex: .84 means 84% of the variability is due to true score differences and the remaining 16% is due to measurement error)
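To make X = T + E and the reliability coefficient concrete, the Python sketch below builds obtained scores from hypothetical true scores and errors. It then computes reliability as true score variance divided by obtained score variance. The errors are deliberately chosen to be uncorrelated with the true scores, which is a classical test theory assumption.

```python
from statistics import pvariance

# Hypothetical true scores (T) and errors (E) for four examinees;
# the errors are chosen so that they are uncorrelated with the true scores.
true_scores = [97, 99, 101, 103]
errors = [1, -1, -1, 1]
obtained = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability = true score variance / obtained score variance
reliability = pvariance(true_scores) / pvariance(obtained)
print(obtained)               # [98, 98, 100, 104]
print(round(reliability, 3))  # 0.833
```

Here about 83% of the variability in obtained scores reflects true score differences. The rest comes from error.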
What are methods for estimating reliability?
Test-retest: give the same test to the same group on two different occasions and correlate the two sets of scores (best for stable traits)

Alternate forms: administer two equivalent forms of the test to the same group and correlate the two sets of scores (most thorough method)

Split-half, coefficient alpha, or KR-20: internal consistency reliability

Kappa statistic: inter-rater reliability
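Two of these estimates are easy to compute by hand: coefficient alpha (Cronbach's alpha) and Cohen's kappa. The Python sketch below uses the standard formulas with hypothetical data and hypothetical function names. It uses population variances throughout. The ratio inside alpha is the same with population or sample variances, as long as one kind is used consistently.

```python
from statistics import pvariance

def coefficient_alpha(items):
    """Cronbach's alpha; items[i][j] is examinee j's score on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each examinee's total score
    sum_item_var = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportion, summed over categories
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical data: 3 items answered by 5 examinees.
items = [[2, 3, 3, 4, 5], [1, 3, 2, 4, 5], [2, 2, 3, 5, 4]]
print(round(coefficient_alpha(items), 3))  # 0.923

# Hypothetical yes/no ratings of the same 10 cases by two raters.
rater_a = ["y", "y", "n", "n", "y", "n", "y", "y", "n", "n"]
rater_b = ["y", "n", "n", "n", "y", "n", "y", "y", "y", "n"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

In the kappa example, the raters agree on 80% of cases. However, 50% agreement would be expected by chance alone, so kappa is only .6.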
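What is the Spearman-Brown prophecy formula?
estimates what the reliability coefficient would be if the test were lengthened or shortened; with the split-half method, it provides an estimate of what the reliability coefficient would have been had it been based on the full length of the test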
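The formula is r_new = k * r / (1 + (k - 1) * r), where r is the current reliability and k is the factor by which test length changes. For a split-half coefficient, k = 2. A minimal Python sketch, with a hypothetical function name and values:

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by a factor of k."""
    return (k * r) / (1 + (k - 1) * r)

half_test_r = 0.70  # hypothetical correlation between the two halves of a test
print(round(spearman_brown(half_test_r, 2), 3))  # 0.824: estimated full-length reliability
print(round(spearman_brown(0.80, 0.5), 3))       # 0.667: halving a test with r = .80
```

The first line shows why the split-half coefficient needs correcting: each half is only half as long as the real test.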
What are two factors that affect the reliability coefficient?
test length (the longer the test, the larger the reliability coefficient)
range of scores (the coefficient is maximized when the range is unrestricted)
What is content validity?
the extent to which a test adequately samples the content or behavior domain it is designed to measure
What is construct validity?

Convergent vs discriminant validity
the extent to which a test has been found to measure the hypothetical trait (construct) it is intended to measure (ex: self-esteem)

Convergent = test scores are highly correlated w/scores on measures of the same trait
Discriminant = test scores have low correlations w/scores on measures of unrelated traits
What is factor analysis?

Factor loading

Communality
used to identify the minimum number of common factors required to account for the intercorrelations among a set of tests (used to assess construct validity)

Factor loading: correlation b/w a test and a factor

Communality: the proportion of variability in a single test's scores that is accounted for by all of the identified factors (i.e., the test's common variance)
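For orthogonal factors, squaring a factor loading gives the proportion of a test's variance explained by that factor. A test's communality is the sum of its squared loadings across all factors. The loadings in this Python sketch are hypothetical and do not come from a real factor analysis.

```python
# Hypothetical loadings of three tests on two orthogonal factors (I and II).
loadings = {
    "Test A": [0.80, 0.20],
    "Test B": [0.70, 0.30],
    "Test C": [0.10, 0.90],
}

def communality(row):
    """Sum of squared loadings across all factors (assumes orthogonal factors)."""
    return sum(loading ** 2 for loading in row)

for test, row in loadings.items():
    print(test, round(communality(row), 2))
# Test A 0.68
# Test B 0.58
# Test C 0.82
```

For example, Factor I alone explains .64 of Test A's variance (.80 squared). Both factors together explain .68.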
What are orthogonal and oblique factors?
orthogonal factors are uncorrelated, while oblique factors are correlated
What are three types of validity?
Content
Construct
Criterion-related (used to draw conclusions about an examinee's likely standing or performance on another measure, the criterion)
What is concurrent vs predictive validity?
Both are types of criterion-related validity
concurrent = criterion data are collected prior to or at about the same time as data on the predictor
predictive = criterion is measured some time after predictor is administered
What is incremental validity?
the increase in correct decisions that can be expected if the predictor is used as a decision-making tool
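One common way to quantify this is: incremental validity = positive hit rate - base rate. That formulation is assumed in this sketch. The base rate is the proportion of people who succeed on the criterion when the predictor is not used. The positive hit rate is the proportion who succeed among those the predictor selects. Computing the base rate as (true positives + false negatives) / total assumes that everyone in the table was selected without the predictor. The function name and counts are hypothetical.

```python
def incremental_validity(true_pos, false_pos, true_neg, false_neg):
    """Positive hit rate minus base rate, from a 2x2 predictor-by-criterion table."""
    total = true_pos + false_pos + true_neg + false_neg
    base_rate = (true_pos + false_neg) / total             # proportion successful overall
    positive_hit_rate = true_pos / (true_pos + false_pos)  # proportion successful among those selected
    return positive_hit_rate - base_rate

# Hypothetical counts for 100 employees.
print(round(incremental_validity(40, 10, 30, 20), 2))  # 0.2
```

In this example, 60% of employees succeed without the predictor, compared with 80% of those the predictor selects. That is a 20-point gain in correct decisions.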
What is the relationship between reliability and validity?
reliability is necessary for validity but not sufficient
What is criterion contamination?
occurs when a rater's knowledge of examinees' performance on the predictor influences their criterion ratings; it can artificially inflate the relationship b/w predictor and criterion
When monotrait-heteromethod coefficients are large, this provides evidence of _____ _____
convergent validity
When heterotrait-monomethod coefficients are small, this provides evidence of _____ _____
discriminant validity
You administer a test to a group of examinees on April 1 and then re-administer the test to the same group of examinees on May 1. When you correlate the two sets of scores, you will have obtained a?
a test-retest reliability coefficient (how stable scores are over time), also known as the coefficient of stability
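A test-retest coefficient is simply the Pearson correlation between the two administrations. The Python sketch below computes it from hypothetical April and May scores. The helper name is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

april = [12, 15, 18, 20, 25]  # hypothetical scores on April 1
may = [13, 14, 19, 21, 24]    # the same examinees on May 1
print(round(pearson_r(april, may), 3))
```

Here the examinees keep nearly the same rank order across the month, so the coefficient of stability is high (about .98).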
Adding more items to a test generally results in an increase in the test's ________
reliability
Which of the following would be used to determine the probability that examinees of different ability levels are able to answer a particular test item correctly?
a. criterion-related validity coefficient
b. item discrimination index
c. item difficulty index
d. item characteristic curve
d. item characteristic curve, which is associated with item response theory (a graph that depicts an individual test item in terms of the percentage of examinees at different ability levels who answered the item correctly)
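In item response theory, an item characteristic curve is often modeled with a logistic function. The three-parameter logistic form in this Python sketch is one common choice, assumed here. Some texts also include a scaling constant of 1.7, which is omitted. The function name and the item parameters are hypothetical.

```python
from math import exp

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic item characteristic curve.

    theta: examinee ability; a: item discrimination (slope);
    b: item difficulty (ability at the curve's midpoint);
    c: probability of answering correctly by guessing (lower asymptote).
    """
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

# Hypothetical item: average difficulty, fairly steep slope, 20% guessing floor.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=1.5, b=0.0, c=0.2), 2))
```

At theta = b, the probability is halfway between c and 1 (here .6). A larger a gives a steeper curve, which means the item discriminates better between ability levels.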
A person obtains a raw score of 70 on a math test with a mean of 50 and an SD of 10,
a percentile rank of 84 on a history test,
and a T-score of 65 on an English test.
What is the relative order of these scores?
Math (2 standard deviations above the mean) > English (a T-score of 65 is 1.5 standard deviations above the mean, since T-scores have a mean of 50 and an SD of 10) > History (a percentile rank of 84 is about 1 standard deviation above the mean in a normal distribution)
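All three scores can be compared by converting them to z-scores. The percentile rank conversion assumes normally distributed scores. The Python sketch below uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

math_z = (70 - 50) / 10                 # raw score -> z = 2.0
english_z = (65 - 50) / 10              # T-score (mean 50, SD 10) -> z = 1.5
history_z = NormalDist().inv_cdf(0.84)  # percentile rank 84 -> z of about 0.99

ranking = sorted(
    {"Math": math_z, "English": english_z, "History": history_z}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print([name for name, z in ranking])  # ['Math', 'English', 'History']
```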
If, in a normally shaped distribution, the mean is 100 and the standard error of measurement is 5, what would the 95% confidence interval be for an examinee who receives a score of 90?
To calculate, add and subtract two standard errors of measurement (2 × 5 = 10) to and from the obtained score of 90; the mean of 100 is not needed. So the confidence interval is 80-100
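The "two standard errors" rule rounds the 95% z-value of 1.96 up to 2. The Python sketch below supports both values. The function name is hypothetical.

```python
def confidence_interval(obtained, sem, z=1.96):
    """Band of +/- z standard errors of measurement around the obtained score."""
    margin = z * sem
    return obtained - margin, obtained + margin

print(confidence_interval(90, 5, z=2))  # (80, 100): the rounded rule used on this card
print(confidence_interval(90, 5))       # the same band using z = 1.96
```

With the more precise z = 1.96, the band is about 80.2 to 99.8, which is almost the same answer.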
R-squared is used as an indicator of:
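how much your ability to predict the criterion is improved by using the regression line compared to not using it (i.e., simply predicting the mean); equivalently, the proportion of variance in the criterion that is explained by the predictor(s)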
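The "improvement over not using the line" reading corresponds to R-squared = 1 - SS_residual / SS_total. SS_total is the squared error when the mean is predicted for everyone. SS_residual is the squared error when the least-squares line is used. The Python sketch below uses simple (one-predictor) regression with hypothetical data and a hypothetical function name.

```python
def r_squared(x, y):
    """Fit y = a + b*x by least squares and return 1 - SS_residual / SS_total."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_total = sum((yi - my) ** 2 for yi in y)                        # error predicting the mean
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # error using the line
    return 1 - ss_resid / ss_total

print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 0.6
```

Using the line cuts prediction error by 60%. With a single predictor, R-squared also equals the squared correlation between predictor and criterion.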
In a study examining the effects of relaxation training on test-taking anxiety, a pre-test measure of anxiety is administered to a group of self-identified highly anxious test takers, resulting in a split-half reliability coefficient of .75. If the pre-test is administered to a randomly selected group of the same number of people, the split-half reliability coefficient will most likely be:
a. Greater than .75
b. Less than .75
c. Equal to .75
d. Impossible to predict
a - A general rule for all correlation coefficients, including reliability coefficients, is that the more heterogeneous the group, i.e. the wider the variability, the higher the coefficient will be. Since a randomly selected group would be more heterogeneous than a group of highly anxious test-takers, the randomly selected group would most likely have a higher reliability coefficient.
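The Python sketch below shows the same relationship in the other direction: restricting the range of scores lowers a correlation. The scores are hypothetical. y tracks x plus a fixed error of +/-2, and the correlation is computed for the full group and then for a restricted, more homogeneous subgroup.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: y tracks x plus an alternating error of +2 or -2.
x = list(range(1, 21))
y = [xi + (2 if xi % 2 == 0 else -2) for xi in x]
full_r = pearson_r(x, y)

# Restrict the range: keep only examinees with x >= 15 (a homogeneous subgroup).
pairs = [(xi, yi) for xi, yi in zip(x, y) if xi >= 15]
restricted_r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

print(round(full_r, 2), round(restricted_r, 2))  # 0.95 0.77
```

The error is equally large in both groups. In the restricted group, however, it accounts for a bigger share of the remaining variability, so the coefficient drops.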
Which of the following statements is not true regarding concurrent validity?
a. It is used to establish criterion-related validity
b. It is appropriate for tests designed to assess a person's future status on a criterion
c. It is obtained by collecting predictor and criterion scores at about the same time
d. It indicates the extent to which a test yields the same results as other measures of the same phenomenon
b - There are two ways to establish the criterion-related validity of a test: concurrent validation and predictive validation. In concurrent validation, predictor and criterion scores are collected at about the same time; by contrast, in predictive validation, predictor scores are collected first and criterion data are collected at some future point. Concurrent validity is therefore appropriate for tests that estimate a person's current status on a criterion, while a test designed to assess future status calls for predictive validity. Concurrent validity also indicates the extent to which a test yields the same results as other measures of the same phenomenon. For example, if you developed a new test for depression, you might administer it along with the BDI and correlate the two sets of scores to assess the new test's concurrent validity.
Confidence intervals are used in order to:
a. calculate the test's mean
b. calculate the standard deviation
c. calculate the standard error of measurement
d. estimate true scores from obtained scores
d - Confidence intervals allow us to determine the range within which an examinee's true score on a test is likely to fall, given his or her obtained score. The standard error of measurement (option c) is used to construct confidence intervals, not the other way around.
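The standard error of measurement is computed from the test's standard deviation and reliability coefficient: SEM = SD * sqrt(1 - r_xx). The confidence interval is then built around the obtained score. The Python sketch below uses hypothetical values (SD 15, reliability .91, obtained score 110) and a hypothetical function name.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx)."""
    return sd * sqrt(1 - reliability)

sem = standard_error_of_measurement(15, 0.91)   # hypothetical test: SD 15, r_xx = .91
low, high = 110 - 1.96 * sem, 110 + 1.96 * sem  # 95% band around an obtained score of 110
print(round(sem, 2), round(low, 1), round(high, 1))
```

Here the SEM is 4.5, so the examinee's true score most likely falls between about 101.2 and 118.8. A more reliable test would give a smaller SEM and a narrower band.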
In a factor analysis, an eigenvalue corresponds to:

In principal components analysis, an eigenvalue indicates:
the amount of variance explained by one of the factors; it is calculated from the factor loadings (the sum of the squared loadings on that factor), so high loadings produce a high eigenvalue

the amount of variability in a group of variables accounted for by a single independent (uncorrelated) component
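For orthogonal factors or components, a factor's eigenvalue is the sum of its squared loadings across all variables. With standardized variables, the total variance equals the number of variables, so eigenvalue / number of variables gives the proportion of total variance that the factor explains. The Python sketch below uses hypothetical loadings that were not produced by an actual analysis.

```python
# Hypothetical loadings of three standardized tests on two orthogonal factors.
loadings = [
    [0.80, 0.20],  # Test A
    [0.70, 0.30],  # Test B
    [0.10, 0.90],  # Test C
]
n_tests = len(loadings)

# Eigenvalue of factor f = sum of squared loadings down column f.
eigenvalues = [sum(row[f] ** 2 for row in loadings) for f in range(len(loadings[0]))]

for f, eigenvalue in enumerate(eigenvalues, start=1):
    share = eigenvalue / n_tests  # proportion of total (standardized) variance
    print(f"Factor {f}: eigenvalue {eigenvalue:.2f}, {share:.0%} of total variance")
```

Summing down a column of loadings gives an eigenvalue. Summing across a row gives that test's communality.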
1. Item analysis is a procedure used to:

2. Item discrimination:

3. Item characteristic curve
1. determine which items will be retained for the final version of a test

2. the degree to which an item differentiates between examinees with high and low total test scores (range is -1.0 to +1.0, w/ .35 and higher considered acceptable)

3. a graph that depicts the percentage of people at each ability level who answer the item correctly (a steeper slope means greater discrimination)
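A basic item analysis computes each item's difficulty index (p, the proportion of examinees answering correctly) and discrimination index (D). It then flags items whose D falls below the .35 cutoff mentioned on this card. In this Python sketch, the upper and lower groups are simply the top and bottom halves by total score. That is a simplifying assumption, since many texts use the top and bottom 27%. The data and function name are hypothetical.

```python
def item_analysis(responses, min_d=0.35):
    """responses[e][i] is 1 if examinee e answered item i correctly, else 0.

    Returns (item number, difficulty p, discrimination D, keep?) for each item.
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]       # top and bottom halves
    results = []
    for i in range(len(responses[0])):
        p = sum(r[i] for r in responses) / len(responses)                 # item difficulty
        d = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / half  # item discrimination
        results.append((i + 1, round(p, 2), round(d, 2), d >= min_d))
    return results

# Hypothetical answers of 6 examinees to 4 items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for row in item_analysis(responses):
    print(row)
```

Item 1 is easy (p = .83), but it barely separates high and low scorers (D = .33), so it falls below the cutoff. Item 2 discriminates perfectly (D = 1.0).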
The correction for attenuation formula is used to determine:
the impact of increasing the reliability of the predictor (test) and/or the criterion on the predictor’s validity
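The correction for attenuation estimates what the validity coefficient would be if the predictor and/or the criterion were perfectly reliable: r_xy / sqrt(r_xx * r_yy). To correct for only one measure, leave the other reliability at 1.0. The Python sketch below uses a hypothetical function name and hypothetical values.

```python
from math import sqrt

def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Estimated validity if predictor (r_xx) and/or criterion (r_yy) were perfectly reliable.

    Leave a reliability at 1.0 to correct for only the other measure.
    """
    return r_xy / sqrt(r_xx * r_yy)

# Hypothetical values: observed validity .30, predictor reliability .90, criterion reliability .40.
print(round(correct_for_attenuation(0.30, r_xx=0.90, r_yy=0.40), 2))  # 0.5
print(round(correct_for_attenuation(0.30, r_yy=0.40), 2))             # 0.47: criterion corrected only
```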
Which method yields the best reliability coefficient but is also likely to produce a lower coefficient than other methods?
alternate forms

There are two sources of error (factors that could lower the coefficient) for the alternate forms coefficient: the time interval and different content (in technical terms, these sources of error are referred to as "time sampling" and "content sampling," respectively). The alternate forms coefficient is considered the best reliability coefficient by many because, for it to be high, the test must demonstrate consistency across both a time interval and different content.