74 Cards in this Set
- Front
- Back
Pearson r
|
bivariate correlational test
assesses the relationship b/w 2 variables that are measured with continuous (interval or ratio) data; used for criterion-related validity; ranges from -1.0 to +1.0; ex. ages and scores on an exam |
|
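A minimal sketch of how Pearson r could be computed by hand, using the ages-and-exam-scores example from the card. The function name `pearson_r` and the data values are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of continuous scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: exam scores rise in a perfectly straight line with age
ages = [20, 25, 30, 35, 40]
scores = [60, 65, 70, 75, 80]
r = pearson_r(ages, scores)  # perfect positive linear relationship
```

Reversing one of the lists gives a perfect negative relationship, the other end of the -1.0 to +1.0 range.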
Statistical hypothesis tests
|
such as the ANOVA
designed to test hypotheses based on population values |
|
a significant finding for a one-way ANOVA indicates that the
|
population means were different
an observed difference b/w two or more sample means very likely reflects a true difference in the population |
|
t-test can assess the effects of
|
one IV on one DV
the IV can have only two levels; one or two groups are compared; ex. effectiveness of CBT versus medication for treatment of depression |
|
in a negatively skewed distribution, most scores are very __________ but there are a few ____________
|
high
extreme low scores; mode is highest, then median, then mean |
|
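A small sketch, with made-up scores, of the ordering of mode, median, and mean in a negatively skewed distribution. It uses only Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical negatively skewed scores: mostly high, with one extreme low score
scores = [2, 6, 8, 9, 10, 10, 10]

score_mode = mode(scores)      # most frequent score
score_median = median(scores)  # middle score
score_mean = mean(scores)      # pulled toward the low tail

# In this sample the mode is highest, then the median, then the mean
ordering_holds = score_mode > score_median > score_mean
```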
mean and SD of a z-score
|
M = 0
SD = 1; the shape of a z-score distribution is identical to the shape of the raw score distribution |
|
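A quick sketch of the z-score conversion, assuming the population SD is used. The raw scores and the function name `z_scores` are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Convert raw scores to z-scores using the population SD."""
    m = mean(raw)
    sd = pstdev(raw)
    return [(x - m) / sd for x in raw]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores (M = 5, SD = 2)
z = z_scores(raw)

z_mean = mean(z)   # expected to be 0
z_sd = pstdev(z)   # expected to be 1
```

Because every score is shifted and rescaled by the same constants, the z-scores keep the relative spacing of the raw scores, which is why the shape of the distribution does not change.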
a one-way ANOVA can be used when you compare ___________________________
|
two or more groups
can only have one IV (type of treatment); ex. effectiveness of CBT vs medication vs combined treatment for depression are compared; one-way ANOVA for repeated measures is used when the same subjects are measured at multiple times |
|
an item's difficulty level =
|
p
the proportion of examinees who answer the item correctly; ranges from 0 to 1.0, with low values (e.g. .01) indicating a very difficult item and high values (e.g. .99) a very easy item |
|
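A minimal sketch of the item difficulty index, assuming each response is scored 1 (correct) or 0 (incorrect). The response data are invented for illustration.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (p)."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)

# Hypothetical item: 7 of 10 examinees answered correctly
responses = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
p = item_difficulty(responses)  # 0.7, a fairly easy item
```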
item analysis is a procedure used to
|
evaluate the difficulty and discriminability of test items
can be quantitative or qualitative |
|
item discriminability
|
the ability of individual items to differentiate b/w two subgroups
|
|
content validity
|
the degree to which the test measures knowledge of the content domain it is supposed to measure
evaluated by asking a panel of experts about the items; no numerical validity coefficient is derived |
|
predictive validity
|
part of criterion-related validity
the degree to which scores of a test are predictive of the examinee's performance on another measure administered at a later time |
|
the range of possible values of the standard error of measurement is
|
0 to the test's SD
|
|
the formula for the standard error of measurement
|
SD x the square root of (1- the reliability coefficient)
bc the reliability coefficient ranges from 0.0 to 1.0, if reliability is 1.0 the formula evaluates to 0; if reliability is 0.0 the formula evaluates to the SD of the test scores |
|
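A short sketch of the standard error of measurement formula and its two limiting cases. The function name and the SD of 15 are assumptions chosen for illustration.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD x sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0.0 and 1.0")
    return sd * sqrt(1.0 - reliability)

# Hypothetical test with SD = 15
sem_typical = standard_error_of_measurement(15, 0.91)   # about 4.5
sem_perfect = standard_error_of_measurement(15, 1.0)    # 0: no measurement error
sem_worthless = standard_error_of_measurement(15, 0.0)  # equals the test's SD
```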
what is the preferred method of calculating the reliability of a test?
|
alternate forms reliability
|
|
shrinkage
|
the reduction of the validity coefficient upon cross-validation (the use of a second sample to try out retained items)
|
|
incremental validity
|
associated with criterion-related validity
The usefulness of a selection test in terms of decision-making accuracy. Subtract the base rate (% of employees hired using the current selection procedure who turn out to be good workers) from the positive hit rate. Ex. if the base rate is 50% and data from the validity study show that when a new selection test is used, 70% of those hired will be good workers, the positive hit rate is 70%. So 70 - 50 = 20; the incremental validity of the test is 20%, i.e. the new selection test will increase the company's decision-making accuracy by 20%. Can also be determined using Taylor-Russell tables |
|
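The arithmetic in the incremental validity example can be sketched as follows, taking the 50% base rate and 70% positive hit rate from the card. The function name is hypothetical.

```python
def incremental_validity(positive_hit_rate, base_rate):
    """Gain in decision-making accuracy, in percentage points."""
    return positive_hit_rate - base_rate

# Base rate of 50% under the current procedure, 70% positive hit rate with the new test
gain = incremental_validity(positive_hit_rate=70, base_rate=50)  # 20 percentage points
```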
A moderate base rate of a selection test is __% and indicates ___________ as opposed to a high base rate which indicates __________ or a low base rate which indicates ____________
|
50%, a new measure would be beneficial
a high base rate means the current methods are working fine; a low base rate indicates something other than selection is the problem (e.g. the standard for judging performance is too high); a low selection ratio indicates there is a large number of applicants and thus there are many qualified applicants to choose from |
|
multiple regression
|
used to estimate a score on a criterion based on scores on two or more predictors
a compensatory technique (a low score on one predictor can be compensated for by a high score on another); predictors should not be highly related to one another, but each should have a linear relationship with the criterion. Multicollinearity is a problem that can occur when the predictors (x's) are highly correlated with one another |
|
reliability
|
as it applies to test construction = consistency
test-retest; *alternate forms; internal consistency reliability; interscorer reliability |
|
valid
|
does it measure what it purports to measure?
|
|
Maximum performance tests
|
achievement and aptitude tests
|
|
Typical performance tests
|
interest and personality tests
|
|
classical test theory
|
a test score consists of 2 factors: "truth" or the true score, and random error
A) a test is reliable to the degree that it is free from error and provides information about the examinee's true test scores; B) a test is reliable to the degree that it provides repeatable, consistent results |
|
Reliability coefficient
|
From 0.0 (no reliability) to +1.0 (perfect reliability)
|
|
Internal consistency
|
Measured by coefficient alpha or the Kuder-Richardson Formula 20
Split-half reliability; inter-item consistency |
|
Standard error of measurement
|
Used to construct confidence intervals
Multiply the SD of the test scores by the square root of (1 - the reliability coefficient). There is a 68% probability that an examinee's true test score will be within the obtained score ± 1 SEM; a 95% probability that his or her true score will be within the obtained score ± (1.96)(SEM); and a 99% probability that the true score will be within the obtained score ± (2.58)(SEM) |
|
What is the reliability coefficient affected by?
|
Any factor that reduces score variability or increases measurement error will reduce the reliability coefficient
decreased reliability: short tests, very easy or very difficult tests, tests that allow guessing (true-false), and homogeneous populations, b/c score variability is decreased |
|
cluster sampling
|
uses a naturally occurring group as a study group
often used in marketing research; ex. using a group of coffee drinkers and a group of tea drinkers in a study |
|
ratio data
|
interval data with an absolute zero
the only type of data that can be meaningfully multiplied or divided |
|
case studies
|
based on data, refines hypotheses
based on the assumption that specific cases can be generalized; based on the close examination of one case; most useful as pilot studies to identify variables to study using different methods |
|
frequency distribution
|
provides a summary of a data set and indicates number of cases at any given score within a given range
a skewed distribution has an asymmetrical frequency distribution |
|
standard error of the mean
|
the expected error of a given sample mean
|
|
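The card gives no formula; a sketch using the standard one (sample SD divided by the square root of the sample size) might look like this. The sample values and the function name are made up.

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(sample):
    """Sample SD divided by the square root of n."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return stdev(sample) / sqrt(n)

sample = [10, 12, 14, 16, 18]  # hypothetical scores
se_mean = standard_error_of_mean(sample)
```

Larger samples give a smaller standard error, so the sample mean is expected to fall closer to the population mean.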
Spearman-Brown formula
|
estimates the effect that shortening or lengthening the test will have on the reliability coefficient
|
|
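The card does not spell the formula out; a sketch using the standard Spearman-Brown prophecy formula, where the length factor is the new length divided by the original length, could be written as below. The reliability of .60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = 2 doubles the test; length_factor = 0.5 halves it.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

original = 0.60  # hypothetical reliability of the current test
doubled = spearman_brown(original, 2)    # about 0.75
halved = spearman_brown(original, 0.5)   # about 0.43
```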
Cronbach's coefficient alpha
|
preferable to the Spearman-Brown formula
used for multiple choice tests |
|
Kuder-Richardson formula (KR-20)
|
a special case of coefficient alpha for dichotomously scored items
the average of all possible split halves; used for estimating the internal consistency reliability of tests with dichotomously scored items (true/false) |
|
measures of internal consistency
|
Spearman-Brown formula
Kuder-Richardson formula; Cronbach's coefficient alpha; good for assessing reliability of tests that measure unstable traits or that are affected by repeated administration; not appropriate for speed tests |
|
kappa coefficient
|
measure of the agreement between two raters
inter-rater reliability |
|
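A sketch of Cohen's kappa for two raters, assuming the usual definition: observed agreement corrected for the agreement expected by chance. The ratings and the function name are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement: product of the two raters' marginal proportions per category
    chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    # Note: undefined (division by zero) if chance agreement is exactly 1
    return (observed - chance) / (1 - chance)

# Hypothetical ratings (1 = diagnosis present, 0 = absent) for 8 cases
rater_a = [1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(rater_a, rater_b)  # 75% observed vs 50% chance agreement
```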
finding confidence intervals (68%, 95%, and 99%)
|
68% - achieved score plus and minus 1x standard error of measurement
95% - achieved score plus and minus 1.96x SEM (can round up to 2 for easier figuring); 99% - achieved score plus and minus 2.58x SEM (can round to 2.5 for easier figuring) |
|
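The three intervals can be sketched as one helper. The obtained score of 100 and SEM of 5 are invented values, and the function name is hypothetical.

```python
# Multipliers for the SEM at each confidence level, as given on the card
Z_MULTIPLIERS = {68: 1.0, 95: 1.96, 99: 2.58}

def confidence_interval(obtained_score, sem, level=95):
    """Return (lower, upper) bounds around an obtained score."""
    z = Z_MULTIPLIERS[level]
    margin = z * sem
    return (obtained_score - margin, obtained_score + margin)

# Hypothetical examinee: obtained score 100, SEM 5
ci_68 = confidence_interval(100, 5, level=68)  # roughly 95 to 105
ci_95 = confidence_interval(100, 5, level=95)  # roughly 90.2 to 109.8
ci_99 = confidence_interval(100, 5, level=99)  # roughly 87.1 to 112.9
```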
restriction of range
|
correlation (along with reliability and validity) is generally lower when the range is restricted for one or both variables
ex. elementary and high school students span a greater range than college and graduate school students; heterogeneous subjects (less restricted range) increase the reliability coefficient |
|
what is the advantage of using a two-way ANOVA over two separate one-way ANOVAs?
|
the possibility of detecting an interaction effect
|
|
interaction effects
|
Interaction effects represent the combined effects of variables on the criterion or dependent measure. When an interaction effect is present, the impact of one variable depends on the level of the other variable. Part of the power of multiple regression (MR) is the ability to estimate and test interaction effects when the predictor variables are either categorical or continuous.
When interaction effects are present, it means that interpretation of the individual variables may be incomplete or misleading. If there are interaction effects, F tests must be conducted |
|
when should the median be used instead of the mean?
|
when there are either some very extreme scores or a substantial % of maximum scores
ex. average housing prices are usually reported as the median b/c there are some extreme values |
|
Chi square
|
non-parametric test of difference
for nominal and categorical data; when there is 1+ IV the multiple sample chi square is run; *not appropriate when repeated observations are made (pre and post data) |
|
Mann-Whitney
|
non-parametric test for ordinal data
|
|
shape of the distribution of %ile ranks
|
flat or rectangular
|
|
idiographic
|
describes single subject approaches
|
|
nomothetic
|
group approaches
|
|
normative data
|
data that can be compared within and across subjects
|
|
ipsative data
|
results from forced-choice format
can only describe relative strengths or interests within a subject and cannot be used for comparison across subjects |
|
power
|
ability to correctly reject a false null
power = 1 - beta (the type II error rate), so as one increases the other decreases |
|
type II error
|
AKA beta
incorrectly retaining (failing to reject) a false null; inverse relationship with power |
|
effect size
|
difference b/w groups expressed in SD units
how many SDs one group differs from another; ex. ppl treated with therapy do about .85 SD better than untreated ppl |
|
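One common way to express an effect size in SD units is Cohen's d with a pooled SD; the card does not name a specific index, so this choice is an assumption. The group scores are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference divided by the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Hypothetical outcome scores
treated = [6, 8, 10, 12, 14]
untreated = [4, 6, 8, 10, 12]
d = cohens_d(treated, untreated)  # treated group's advantage in SD units
```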
assumptions of interval/ratio tests
|
T-tests and ANOVAS
random selection; normally distributed data; homoscedasticity (same variance); interval/ratio data |
|
when to use factorial ANOVA
|
two or more IVs (type of tx and sex)
data for each IV are independent; ex. effectiveness of CBT vs medication and differences between men and women |
|
when to use split plot ANOVA
|
two or more IV's - each has multiple levels; one factor is independent (b/w subjects factor), one is correlated (w/in subjects factor)
one DV; ex. difference b/w men and women in their ability to reduce cigarette use; # of cigs smoked daily is measured pre, post, & 6 month follow-up |
|
rectangular/flat distribution
|
percentile scores
bc the frequency is identical at each rank |
|
leptokurtic distribution
|
more peaked in the middle than the normal curve
|
|
non-linear transformation
|
when raw scores are converted to %ile ranks
|
|
coefficient of multiple determination
|
R squared (R²)
provides an index of the amount of variability in the criterion (y) accounted for by the predictor variables (x's) |
|
regression equation is derived by
|
the line of best fit
|
|
item characteristic curves
|
a plot of the relationship b/w item performance (p) and total score
|
|
Spearman Rho
|
bivariate correlational test
both variables are ordinal |
|
eta
|
bivariate correlational test
both variables are continuous (ratio/interval) and the relationship is curvilinear; ex. relationship b/w performance and arousal |
|
phi
|
bivariate correlational test
both variables are true dichotomies; ex. relationship b/w sex and eye color |
|
biserial
|
bivariate correlational test
one variable is an artificial dichotomy (depressed/not depressed) and the other is continuous (IQ) |
|
point biserial
|
bivariate correlational test
one variable is a true dichotomy (male/female) and the other is continuous (IQ) |
|
tetrachoric
|
bivariate correlational test
both variables are artificial dichotomies |
|
criterion related validity
|
assesses how adequately a test score can be used to predict or estimate a criterion outcome
Pearson r; two subtypes: concurrent validity and predictive validity |
|
construct validity
|
assesses how adequately a test measures a hypothetical construct or trait
factor analysis or multi-trait, multi-method matrix; two types: convergent validity and divergent validity |
|
concurrent validity
|
a type of criterion-related validity
|
|
convergent validity
|
used to determine construct validity
tells us the extent to which a test correlates with other tests that measure the same construct |
|
4 estimates of reliability
|
test-retest - stability
parallel forms - equivalence - alternate forms; internal consistency reliability - within/split-half: KR-20, KR-21, or Cronbach's coefficient alpha; interrater reliability - agreement: Pearson r, kappa, Yule's Y |
|
criterion contamination
|
obtaining a spuriously high validity coefficient because ratings on the criterion are contaminated by knowledge of ratings on a predictor
|