74 Cards in this Set

  • Front
  • Back
Pearson r
bivariate correlational test
assesses the relationship b/w 2 variables that are measured with continuous (interval or ratio) data
criterion related validity
ranges from -1.0 to + 1.0
ex. ages and scores on an exam
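A minimal sketch of how Pearson r could be computed by hand for two continuous variables. The ages and exam scores are made-up illustration data, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of continuous scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: ages and exam scores for five examinees
ages = [20, 25, 30, 35, 40]
scores = [62, 70, 71, 80, 85]
r = pearson_r(ages, scores)
print(f"r = {r:.3f}")  # always falls between -1.0 and +1.0
```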
Statistical hypothesis tests
such as the ANOVA
designed to test hypotheses based on population values
a significant finding for a one-way ANOVA indicates that the
population means were different
an observed difference b/w two or more sample means very likely reflects a true difference in the population
t-test can assess the effects of
one IV on one DV
the IV must only have two levels
one or two groups compared
ex. effectiveness of CBT versus medication for treatment of depression
in a negatively skewed distribution, most scores are very __________ but there are a few ____________
high
extreme low scores
mode is highest, then median, then mean
mean and SD of a z-score
M = 0
SD = 1
the shape of a z-score distribution is identical to the shape of the raw score distribution
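A short sketch, using made-up raw scores, of why z-scores end up with M = 0 and SD = 1. It assumes the population SD (`statistics.pstdev`) is used for standardizing.

```python
import statistics

# Hypothetical raw scores
raw_scores = [50, 60, 70, 80, 90]

mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)  # population SD, an assumption of this sketch

# z = (raw score - mean) / SD; a linear transformation, so the shape is unchanged
z_scores = [(x - mean) / sd for x in raw_scores]

print(statistics.mean(z_scores))    # approximately 0
print(statistics.pstdev(z_scores))  # approximately 1
```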
a one-way ANOVA can be used when you compare ___________________________
two or more groups
can only have one IV (type of treatment)
ex. effectiveness of CBT vs medication vs combined treatment for depression are compared
one-way ANOVA for repeated measures is used when the same subjects are measured at multiple times
an item's difficulty level =
p
the proportion of examinees who answer the item correctly
ranges from 0 (no one answers correctly; very difficult) to 1.0 (everyone answers correctly; very easy)
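As a rough illustration, p can be computed from scored responses to a single item. The response data and the `item_difficulty` name are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (p)."""
    return sum(responses) / len(responses)

# Hypothetical scored responses for one item: 1 = correct, 0 = incorrect
item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p = item_difficulty(item_responses)
print(f"p = {p:.2f}")  # higher p means an easier item
```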
item analysis is a procedure used to
evaluate the difficulty and discriminability of test items
can be quantitative or qualitative
item discriminability
the ability of individual items to differentiate b/w two subgroups
content validity
the degree to which the test measures knowledge of the content domain it is supposed to measure
evaluated by asking a panel of experts to judge the items
no numerical validity coefficient is derived
predictive validity
part of criterion-related validity
the degree to which scores of a test are predictive of the examinee's performance on another measure administered at a later time
the range of possible values of the standard error of measurement is
0 to the test's SD
the formula for the standard error of measurement
SD x the square root of (1- the reliability coefficient)

b/c the reliability coefficient ranges from 0.0 to 1.0: if reliability is 1.0 the formula evaluates to 0; if reliability is 0.0 the formula evaluates to the SD of the test scores
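The formula and its two boundary cases can be checked with a small sketch. The SD of 15 and reliability of .91 are illustrative values, not from the card set.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD x sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must be between 0.0 and 1.0")
    return sd * math.sqrt(1.0 - reliability)

# Boundary cases: perfect reliability gives 0, zero reliability gives the SD
sem_perfect = standard_error_of_measurement(15, 1.0)
sem_none = standard_error_of_measurement(15, 0.0)

# Hypothetical test with SD = 15 and reliability = .91
sem_typical = standard_error_of_measurement(15, 0.91)
print(sem_perfect, sem_none, round(sem_typical, 2))
```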
what is the preferred method of calculating the reliability of a test
alternate forms reliability
shrinkage
the reduction of the validity coefficient upon cross-validation (the use of a second sample to try out retained items)
incremental validity
associated with criterion-related validity

The usefulness of a selection test in terms of decision-making accuracy. Subtract the base rate (% of employees hired using the current selection procedure who turn out to be good workers) from the positive hit rate. Ex. the base rate is 50%, and data from the validity study show that when a new selection test is used, 70% of those hired will be good workers, so the positive hit rate is 70%. 70 - 50 is 20; the incremental validity of the test is 20%: the new selection test will increase the company's decision-making accuracy by 20%

Can also be determined using Taylor-Russell tables
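The arithmetic on this card, sketched as positive hit rate minus base rate. The 50% base rate is taken from the card's own subtraction, and `incremental_validity` is a hypothetical helper name.

```python
def incremental_validity(positive_hit_rate, base_rate):
    """Gain in decision-making accuracy, in percentage points."""
    return positive_hit_rate - base_rate

# Values from the card: 70% positive hit rate with the new test, 50% base rate
gain = incremental_validity(positive_hit_rate=70, base_rate=50)
print(f"incremental validity = {gain}%")
```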
A moderate base rate of a selection test is __% and indicates ___________ as opposed to a high base rate which indicates __________ or a low base rate which indicates ____________
50%, a new measure would be beneficial
a high base rate means the current methods are working fine
a low base rate indicates something other than selection is the problem (standard for judging performance is too high)
a low selection ratio indicates there is a large number of applicants and thus there are many qualified applicants to choose from
multiple regression
used to estimate a score on a criterion based on scores on two or more predictors
a compensatory technique (low score on one can be compensated for with a high score on another)
predictors should have low correlations with one another, and each should have a linear relationship with the criterion. Otherwise, multicollinearity is a problem: it occurs when the predictors (x's) are highly correlated with one another
reliability
as it applies to test construction = consistency
test-retest
*alternate forms
internal consistency reliability
interscorer reliability
valid
does it measure what it purports to measure?
Maximum performance tests
achievement and aptitude tests
Typical performance tests
interest and personality tests
classical test theory
a test score consists of 2 factors: "truth" or the true score, and random error
A) a test is reliable to the degree that it is free from error and provides information about the examinee's true test scores B) a test is reliable to the degree that it provides repeatable consistent results
Reliability coefficient
From 0.0 (no reliability) to +1.0 (perfect reliability)
Internal consistency
Measured by coefficient alpha or the Kuder-Richardson Formula 20
Split-half reliability
Inter-item consistency
Standard error of measurement
Used to construct confidence intervals
Multiply the SD of a test score by the square root of the value of 1-the reliability coefficient
There is a 68% probability that an examinee's true test score will be within the obtained score ± 1 standard error of measurement (σmeas); a 95% probability that his or her true score will be within the obtained score ± (1.96)(σmeas); and a 99% probability that the true score will be within the obtained score ± (2.58)(σmeas)
What is the reliability coefficient affected by?
Any factor that reduces score variability or increases measurement error will reduce the reliability coefficient
decreased reliability: short tests, very easy or very difficult tests, tests that allow guessing (true-false), also homogeneous populations b/c the score variability is decreased
cluster sampling
uses a naturally occurring group as a study group
often used in marketing research
ex. using a group of coffee drinkers and a group of tea drinkers in a study
ratio data
interval data with an absolute zero
the only type of data that can be meaningfully multiplied or divided
case studies
based on data, refines hypotheses
based on the assumption that specific cases can be generalized
they are based on the close examination of one case
most useful as pilot studies to identify variables to study using different methods
frequency distribution
provides a summary of a data set and indicates number of cases at any given score within a given range

a skewed distribution has asymmetrical frequency distributions
standard error of the mean
the expected error of a given sample mean
Spearman-Brown formula
estimates the effect that shortening or lengthening the test will have on the reliability coefficient
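The card does not state the formula itself; the sketch below uses the standard Spearman-Brown prophecy formula, r_new = (n x r) / (1 + (n - 1) x r), where n is the factor by which test length changes. The reliability of .70 is an illustrative value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability .70
doubled = spearman_brown(0.70, 2)    # test made twice as long
halved = spearman_brown(0.70, 0.5)   # test cut to half its length
print(f"doubled: {doubled:.2f}, halved: {halved:.2f}")
```

Lengthening raises the estimate and shortening lowers it, which is consistent with short tests appearing elsewhere in this set as a source of decreased reliability.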
Cronbach's coefficient alpha
preferable to the Spearman-Brown formula
used for multiple choice tests
Kuder-Richardson formula (KR-20)
a special case of coefficient alpha for dichotomously scored items
the average of all possible split halves
used for estimating internal consistency reliability of tests with dichotomous items (true/false)
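A minimal sketch of KR-20 for dichotomous (0/1) items, assuming the usual form KR-20 = (k / (k - 1)) x (1 - Σpq / variance of total scores) with the population variance of totals. The score matrix is made up, and `kr20` is a hypothetical helper name.

```python
import statistics

def kr20(score_matrix):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_people = len(score_matrix)
    n_items = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    total_variance = statistics.pvariance(totals)  # population variance (assumption)
    # Sum of p * q across items, where p = proportion correct and q = 1 - p
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in score_matrix) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)

# Hypothetical data: 5 examinees x 3 true/false items
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
reliability = kr20(scores)
print(f"KR-20 = {reliability:.3f}")
```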
measures of internal consistency
Spearman-Brown formula
Kuder-Richardson formula
Cronbach's coefficient alpha
good for assessing reliability of tests that measure unstable traits or that are affected by repeated administration
not appropriate for speed tests
kappa coefficient
measure of the agreement between two raters
inter rater reliability
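One common version of this statistic is Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement). The sketch below assumes that version; the ratings are made up.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same category
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Note: undefined (division by zero) if chance agreement equals 1
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no ratings of ten cases by two raters
rater_1 = ["Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N"]
rater_2 = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "Y"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```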
finding confidence intervals (68%, 95%, and 99%)
68% - achieved score plus and minus 1x standard error of measurement
95% - achieved score plus and minus 1.96x SEM (can round up to 2 for easier figuring)
99% - achieved score plus and minus 2.58x SEM (can round to 2.5 for easier figuring)
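The three intervals can be sketched as follows, using the multipliers from this card. The obtained score of 100 and SEM of 4.5 are illustrative values.

```python
# Multipliers from the card: 1 SEM for 68%, 1.96 for 95%, 2.58 for 99%
Z_MULTIPLIERS = {68: 1.0, 95: 1.96, 99: 2.58}

def confidence_interval(obtained_score, sem, level):
    """Return (lower, upper) bounds around the obtained score."""
    margin = Z_MULTIPLIERS[level] * sem
    return obtained_score - margin, obtained_score + margin

# Hypothetical examinee: obtained score 100, SEM 4.5
for level in (68, 95, 99):
    lower, upper = confidence_interval(100, 4.5, level)
    print(f"{level}%: {lower:.2f} to {upper:.2f}")
```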
restriction of range
correlation (along with reliability and validity) is always lower when the range is restricted for one or both variables
ex. elementary and high school students is a greater range than college and graduate school students
heterogeneous subjects (less restricted range) increases the reliability coefficient
what is the advantage of using a two-way ANOVA over two separate one-way ANOVAs?
the possibility of detecting an interaction effect
interaction effects
Interaction effects represent the combined effects of variables on the criterion or dependent measure. When an interaction effect is present, the impact of one variable depends on the level of the other variable. Part of the power of MR is the ability to estimate and test interaction effects when the predictor variables are either categorical or continuous.
When interaction effects are present, it means that interpretation of the individual variables may be
incomplete or misleading.
If there are interaction effects, F tests must be conducted
when should the median be used instead of the mean?
when there are either some very extreme scores or a substantial % of maximum scores
ex. average housing prices are usually reported as the median b/c there are some extreme values
Chi square
non-parametric test of difference
for nominal and categorical data
when there is 1+ IV the multiple sample chi square is run
*not appropriate when repeated observations are made (pre and post data)
Mann-Whitney
non-parametric test for ordinal data
shape of the distribution of %ile ranks
flat or rectangular
idiographic
describes single subject approaches
nomothetic
group approaches
normative data
data that can be compared within and across subjects
ipsative data
results from forced-choice format
can only describe relative strengths or interests within a subject and cannot be used for comparison across subjects
power
ability to correctly reject the null
inverse relationship with beta (type II error)
type II error
AKA beta
incorrectly retaining (failing to reject) a false null
inverse relationship with power
effect size
a difference expressed in SD units
how many SD's one group differs from another

ex. ppl treated with therapy do about .85 SD better than untreated ppl
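One way to express "how many SDs one group differs from another" is Cohen's d with a pooled SD. The card does not name a specific index, so that choice is an assumption of this sketch, and the outcome scores are made up.

```python
import math
import statistics

def cohens_d(group_1, group_2):
    """Mean difference divided by the pooled sample SD."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd

# Hypothetical outcome scores (higher = better)
treated = [12, 14, 16, 18, 20]
untreated = [10, 12, 14, 16, 18]
d = cohens_d(treated, untreated)
print(f"d = {d:.2f}")  # treated group's mean advantage in SD units
```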
assumptions of interval/ratio tests
t-tests and ANOVAs
random selection
normally distributed data
homoscedasticity (same variance)
interval/ratio data
when to use factorial ANOVA
two or more IVs (type of tx and sex)
data for each IV are independent
ex. effectiveness of CBT vs medication and differences between men and women
when to use split plot ANOVA
two or more IV's - each has multiple levels; one factor is independent (b/w subjects factor), one is correlated (w/in subjects factor)
one DV
ex. difference b/w men and women in their ability to reduce cigarette use. # of cigs smoked daily is measured pre, post, & 6 month follow-up
rectangular/flat distribution
percentile scores
bc the frequency is identical at each rank
leptokurtic distribution
more peaked in the middle than the normal curve
non-linear distribution
when raw scores are converted to %ile ranks
coefficient of multiple determination
R squared (R²)
provides an index of the amount of variability in the criterion (y) accounted for by the predictor variables (x's)
regression equation is derived by
the line of best fit
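A simplified sketch of a line of best fit, assuming ordinary least squares with a single predictor (the multiple-predictor case follows the same idea but needs matrix algebra). It also reports the proportion of criterion variability accounted for by the predictor. The data are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Proportion of variability in y accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical predictor (x) and criterion (y) scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
fit = r_squared(x, y, slope, intercept)
print(f"y' = {intercept:.1f} + {slope:.1f}x, R squared = {fit:.2f}")
```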
item characteristic curves
a plot of the relationship b/w item performance (p) and total score
Spearman Rho
bivariate correlational test
both variables are ordinal
eta
bivariate correlational test
both variables are continuous (ratio/interval) and the relationship is curvilinear
ex. relationships b/w performance and arousal
phi
bivariate correlational test
both variables are true dichotomies
ex. relationship b/w sex and eye color
biserial
bivariate correlational test
one variable is an artificial dichotomy (depressed/not depressed) and the other is continuous (IQ)
point biserial
bivariate correlational test
one variable is a true dichotomy (male/female) and the other is continuous (IQ)
tetrachoric
bivariate correlational test
both variables are artificial dichotomies
criterion related validity
assess how adequately a test score can be used to predict or estimate criterion outcome
Pearson r
two subtypes
concurrent validity
predictive validity
construct validity
assesses how adequately a test measures a hypothetical construct or trait
factor analysis or multi-trait, multi-method matrix
two types
convergent validity
divergent validity
concurrent validity
a type of criterion-related validity
convergent validity
used to determine construct validity
tells us the extent to which a test correlates with other tests that measure the same construct
4 estimates of reliability
test-retest - stability
parallel forms - equivalence - alternate forms
internal consistency reliability - within/split-half: KR-20, KR-21, or Cronbach's coefficient alpha
interrater reliability - agreement; Pearson r, kappa, Yule's Y
criterion contamination
obtaining a spuriously high validity coefficient because ratings on the criterion are contaminated by knowledge of ratings on a predictor