545 Cards in this Set

  • Front
  • Back
What is the item difficulty index(p)?
indicates the percentage of examinees in the sample who answered the item correctly. In most situations p = .50 is optimal, except for true/false tests, where the optimal p is .75. The closer p is to .50, the more differentiating the item is.
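To make the arithmetic concrete, here is a minimal Python sketch of the difficulty index as described above; the `item_difficulty` helper and the 0/1 response list are hypothetical, not part of the original cards.

```python
# Hypothetical sketch: item difficulty index p for a dichotomously scored item.
# `responses` holds 0/1 scores (1 = correct) for one item across examinees.

def item_difficulty(responses):
    """p = proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)

item = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p = item_difficulty(item)      # 0.7 -> a relatively easy item
print(f"p = {p:.2f}")          # values near .50 are the most differentiating
```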
What is the basis of classical test theory?
views an obtained test score as reflecting a combination of truth and error
What is the problem with classical test theory?
item statistics depend on the original sample, and scores obtained on different tests cannot be compared directly
What is the basis of item response theory?
involves the use of an item characteristic curve that provides information on the relationship between an examinee's level on the trait measured by the test and the probability that he or she will respond correctly to the item
What are the 3 advantages of item response theory?
item parameters are sample invariant; it is possible to equate scores from different tests; it is easier to develop computer-adaptive tests
What does the error component represent in classical test theory?
represents measurement error which is due to factors that are irrelevant to what is being measured and have an unsystematic effect on the score
What is criterion referenced interpretation?
score is interpreted in terms of the total amount of the test content mastered (% correct) or in terms of some external criterion
What is reliability?
extent to which test performance is immune to the effects of measurement error
How do you interpret a reliability coefficient?
the proportion of variability in obtained test scores that reflects true score variability; the reliability coefficient is never squared. r(xx) = true score variability; 1 - r(xx) = error
What are the different forms of reliability?
test-retest (coefficient of stability); alternate forms (coefficient of equivalence); split-half (coefficient of internal consistency); coefficient alpha (coefficient of internal consistency); inter-rater reliability (coefficient of concordance)
What type of reliability is appropriate to measure time sampling error?
test-retest (coefficient of stability); used for attributes that are relatively stable over time
What type of reliability is appropriate to measure time sampling and content sampling errors?
alternate forms (coefficient of equivalence); not appropriate when the attribute measured is expected to fluctuate over time; considered the most rigorous and best method for estimating reliability
Why is alternate forms reliability often not assessed?
difficulty in developing forms that are truly equivalent
what are 2 methods for evaluating internal consistency?
split-half and coefficient alpha
What is the problem with using split-half reliability?
the reliability coefficient is based on scores from only one-half of the entire test; because reliability tends to decrease as test length decreases, split-half usually underestimates the test's true reliability
How can you correct for the problems with split-half reliability?
use the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coefficient would have been had it been based on the full length of the test
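A small sketch of the Spearman-Brown correction may help; it assumes the general form of the formula with a lengthening factor n (n = 2 corrects a split-half coefficient), and the reliability value is made up.

```python
# Hypothetical sketch of the Spearman-Brown prophecy formula.
# n is the factor by which the test is lengthened (n = 2 for a split-half correction).

def spearman_brown(r, n=2):
    return (n * r) / (1 + (n - 1) * r)

r_half = 0.70                       # reliability based on one half of the test
print(spearman_brown(r_half))       # ~0.82: estimated full-length reliability
```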
When do you use the Kuder-Richardson Formula 20 (KR-20)?
when test items are scored dichotomously; it is a variation of coefficient alpha; not appropriate for speeded tests
What is a drawback of using coefficient alpha?
it provides only a lower-bound estimate of a test's reliability
What is the purpose of using coefficient alpha?
measure inter-item consistency
When is it appropriate to use inter-rater reliability?
whenever test scores depend on a rater's judgement
When is a kappa coefficient used?
it is the reliability coefficient used for inter-rater reliability
What are the factors that affect the reliability coefficient?
test length; range of test scores; guessing
What is the acceptable level of a reliability coefficient?
.80 or larger
What is the standard error of measurement?
an index of the amount of error that can be expected in obtained scores due to the unreliability of the test; used to calculate a confidence interval around an obtained score
What is the formula for the standard error of measurement?
the standard deviation of test scores multiplied by the square root of 1 - r(xx) (the reliability coefficient)
What affects the magnitude of the standard error?
the standard deviation of test scores and the test's reliability coefficient; the lower the test's standard deviation and the higher its reliability coefficient, the smaller the standard error of measurement
How can you interpret the standard error of measurement?
it is a type of standard deviation, so it is interpreted in terms of areas under the normal curve; 68%, 95%, and 99% confidence intervals correspond to roughly 1, 2, and 3 standard errors around the obtained score
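The two cards above can be tied together in a short sketch; the SD, reliability, and obtained score below are illustrative, not taken from the cards.

```python
# Hypothetical sketch: standard error of measurement and confidence intervals
# around an obtained score.
import math

def sem(sd, rxx):
    """SEM = SD of test scores times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - rxx)

se = sem(sd=15, rxx=0.89)          # roughly 5 points on an IQ-style metric
score = 110
print(f"SEM = {se:.1f}")
print(f"68% CI: {score - se:.0f} to {score + se:.0f}")      # +/- 1 SEM
print(f"95% CI: {score - 2*se:.0f} to {score + 2*se:.0f}")  # +/- 2 SEM
```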
What is validity?
test's accuracy in providing information it was designed to provide
What are the 3 categories of validity?
content validity, construct validity, and criterion-related validity
What type of validity is important when scores on a test provide information on how much each examinee knows about a domain?
content validity
What type of validity is important when scores on a test provide information on each examinee's status with regard to the trait being measured?
construct validity
What type of validity is important when scores will be used to predict scores on some other measure and you are interested in the predicted scores?
criterion-related validity
What is content validity?
the extent to which test items sample the content or behavior the test was designed to measure
How do you establish content validity?
through the judgement of experts
What type of tests consider content validity to be important?
achievement-type tests and work samples
What additional evidence supports good content validity?
a large coefficient of internal consistency; high correlations with other tests that measure the same domain; pre-/post-test evaluations showing change after a program designed to increase familiarity with the material
What is construct validity?
the extent to which the test measures the theoretical trait or construct it was designed to measure
What are some methods to establish construct validity?
assess internal consistency; study group differences; test hypotheses (do scores change following an experimental manipulation?); assess convergent validity (high correlations with measures of the same trait) and divergent validity (low correlations with measures of different traits); assess factorial validity
What are monotrait-monomethod coefficients?
same trait, same method; the correlation between a measure and itself (a reliability coefficient); should be large
What are monotrait-heteromethod coefficients?
same trait, different methods; the correlation between different measures of the same trait; indicates convergent validity when large
What are heterotrait-monomethod coefficients?
different traits, same method; correlations between different traits measured by the same method; indicate discriminant (divergent) validity when small
What are heterotrait-heteromethod coefficients?
different traits, different methods; correlations between different traits measured by different methods; indicate discriminant validity when small
What do factor loadings in factor analysis measure?
the correlation between a test and a factor; square the loading to determine the amount of variability in test scores explained by the factor
What is communality in factor analysis?
common variance; the amount of variability in test scores that is due to the factors the test shares in common, to some degree, with the other tests included in the analysis
From the perspective of factor analysis, what are the components of a test's reliability?
communality and specificity (error variance makes up the remaining, unreliable portion of total variance)
What is the relationship between reliability and communality?
communality is a lower-limit estimate of a test's reliability coefficient
What are the two types of rotation of a factor matrix?
orthogonal and oblique
What type of rotation has uncorrelated factors?
orthogonal
What type of rotation has correlated factors?
oblique; the attributes measured by the factors are not independent
When can you calculate a factor's communality from the factor loadings?
when factors are orthogonal; communality is then equal to the sum of the squared factor loadings
What is a measure of shared variability?
squared factor loading
What is criterion-related validity?
strong correlation between test and a criterion
How is criterion-related validity assessed?
by correlating the scores of a sample of individuals on the predictor with their scores on the criterion
What are the 2 types of criterion-related validity?
concurrent and predictive validity
What is the difference between concurrent and predictive validity?
the time at which the predictor and the criterion are administered; predictive validity predicts future status, while concurrent validity estimates current status
What is an acceptable level for a validity coefficient?
.20 to .30 can be acceptable; validity coefficients rarely exceed .60
How do you interpret validity coefficient?
since it is a correlation between 2 measures, square the coefficient and interpret it in terms of shared variability
How do you provide a measure of shared variability?
square the correlation between 2 measures (tests or variables); the result indicates how much variability in Y is explained by X
What is the standard error of estimate?
used to construct a confidence interval around a predicted criterion score
What is the formula for standard error of estimate?
the standard deviation of criterion scores multiplied by the square root of 1 minus the validity coefficient squared
When does the standard error of estimate = 0?
when the validity coefficient is equal to +/-1
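A minimal sketch of the standard error of estimate, assuming the formula above (SD of criterion scores times the square root of 1 minus the squared validity coefficient); the numbers are illustrative.

```python
# Hypothetical sketch: standard error of estimate for a predicted criterion score.
import math

def see(sd_y, r_xy):
    """sd_y = SD of criterion scores; r_xy = criterion-related validity coefficient."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

print(see(sd_y=0.5, r_xy=0.6))   # 0.4: sizable prediction error
print(see(sd_y=0.5, r_xy=1.0))   # 0.0: perfect validity, no error in prediction
```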
What is incremental validity?
the increase in correct decisions that can be expected if the predictor is used as a decision-making tool; evaluated using a scatterplot of predictor and criterion scores
In a scatterplot of criterion and predictor scores, if the goal is to maximize the proportion of true positives, how do you do this?
set a high predictor cutoff score, which will reduce the number of false positives
What is the formula for incremental validity?
positive hit rate - base rate
What is the base rate?
the proportion of people who would be successful on the criterion without use of the predictor; calculated by dividing the number of successful people (true positives + false negatives) by the total number of people
What is the positive hit rate?
the proportion of people who would have been selected on the basis of their predictor scores and who are successful on the criterion; calculated as true positives / total positives
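The decision-table quantities above fit together as in this sketch; the four counts are made up for illustration.

```python
# Hypothetical sketch: base rate, positive hit rate, and incremental validity
# from a 2x2 decision table.
TP, FP, TN, FN = 30, 10, 45, 15          # true/false positives and negatives
total = TP + FP + TN + FN

base_rate = (TP + FN) / total            # proportion successful without the predictor
positive_hit_rate = TP / (TP + FP)       # proportion successful among those selected
incremental_validity = positive_hit_rate - base_rate

print(base_rate, positive_hit_rate, round(incremental_validity, 2))   # 0.45 0.75 0.3
```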
What determines if a person is positive or negative?
predictor
What determines if a person is true or false?
criterion
What is the correction of attenuation formula used for?
to estimate what a predictor's validity coefficient would be if the predictor and/or criterion were perfectly reliable; it tends to overestimate the actual validity coefficient that can be achieved
What information is needed to calculate the correction of attenuation formula?
the predictor's current reliability coefficient; the criterion's current reliability coefficient; the criterion-related validity coefficient
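Assuming the standard form of the correction, r_xy divided by the square root of (r_xx times r_yy), the three quantities listed above plug in as follows; the values are illustrative.

```python
# Hypothetical sketch of the correction for attenuation.
import math

def corrected_validity(r_xy, r_xx, r_yy):
    """r_xy = observed validity; r_xx, r_yy = predictor and criterion reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

print(corrected_validity(r_xy=0.40, r_xx=0.80, r_yy=0.50))   # ~0.63
```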
What happens to the validity coefficient when it is cross-validated?
it tends to shrink, because all of the chance factors operating in the original sample will not be present in the new sample
What is norm-referenced interpretation?
comparing an examinee's test score to the scores obtained by people in a normative (standardization) sample; helps identify individual differences; examples include percentile ranks, standard scores, and age and grade equivalent scores
What is a nonlinear transformation?
a transformation in which the distribution of transformed scores differs in shape from the distribution of raw scores; percentile ranks are an example, because their distribution is always flat regardless of the shape of the raw score distribution
What is a standard score?
indicates the examinee's position in the normative sample in terms of standard deviations from the mean; permits comparisons of scores from different tests; examples include z-scores, T-scores, deviation IQs, and SAT scores
What is the formula for calculating a z-score?
z = (raw score minus the distribution's mean) divided by the distribution's standard deviation
What is a linear transformation?
transformation of raw scores to z-scores
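A brief sketch of the z-score transformation; the raw score, mean, and SD are made up, and the T-score line assumes the conventional mean-50, SD-10 scale.

```python
# Hypothetical sketch: z-score as a linear transformation of a raw score.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

z = z_score(raw=62, mean=50, sd=8)   # 1.5 SDs above the mean
t = 50 + 10 * z                      # the same standing expressed as a T-score
print(z, t)                          # 1.5 65.0
```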
What is the purpose of criterion-referenced (mastery) testing?
to make sure that all examinees eventually reach the same performance level
What is a type of criterion-referenced testing?
percentage score or interpreting test scores in terms of their likely status on an external criterion
When do you use a regression equation and expectancy table when interpreting test scores?
criterion-referenced interpretation
What is banding?
a score adjustment method that involves treating people within a specific score range (band) as having identical scores
What is exploratory factor analysis?
identify the minimum number of underlying "factors" (dimensions) needed to explain the intercorrelations among a set of tests, subtests, or test items
What is principal components analysis?
used to identify a set of variables that explains all (or nearly all) of the total variance in a set of test scores
What eigenvalue is used to retain components in a principal components analysis?
1.0 or higher
Relevance
Refers to the extent to which test items contribute to achieving the stated goals of testing. Factors that influence relevance: Content Appropriateness - Does the item assess the domain it is designed to evaluate? Taxonomic Level - Does the item reflect the appropriate cognitive level? Extraneous Abilities - Does the item require abilities other than the one being assessed?
Item Discrimination
The extent to which a test item discriminates between examinees with low and high scores. D = U - L, where U and L are the proportions of the upper- and lower-scoring groups passing the item; D ranges from -1 to +1. (Most tests look for D of at least .35 to .50.)
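A small sketch of D = U - L as defined above; the group counts are made up.

```python
# Hypothetical sketch: item discrimination index D = U - L, where U and L are
# the proportions of the upper- and lower-scoring groups passing the item.
def discrimination(upper_correct, upper_n, lower_correct, lower_n):
    return upper_correct / upper_n - lower_correct / lower_n

D = discrimination(upper_correct=18, upper_n=20, lower_correct=8, lower_n=20)
print(D)   # 0.5 -> the item separates high from low scorers reasonably well
```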
Item Charactersitic Curve (ICC)
In item response theory, a curve is plotted for each item showing the probability that a respondent at a given ability level will get the item correct. An ICC provides three pieces of information about a test item: its difficulty (the position of the curve, left versus right); its ability to discriminate between high and low scorers (the slope of the curve); and the probability of answering the item correctly just by guessing (the Y-intercept).
Classic Test Theory
X = T + E (obtained score = true score + error)
Reliability Coefficient
A correlation coefficient (ranging from 0 to 1) that reflects the proportion of true score variance versus error variance. Interpret it directly; there is no need to square it. Reliability increases when similar items are added, the range of scores is unrestricted, and guessing is reduced.
Test-Retest Reliability
Known as the coefficient of stability. Good for tests that are relatively stable over time and not affected by repeated measurement.
Internal Consistency Reliability
Not appropriate for speeded tests. Split-half & Coefficient alpha are 2 types.
Split-Half Reliability
The test is split into two halves and the halves are correlated. Because each half has fewer items than the full test, the coefficient underestimates the full test's reliability; the Spearman-Brown formula corrects for this.
Spearman-Brown Prophecy Formula
Estimates what the reliability of the full-length test would be from a split-half (shortened-test) coefficient.
Coefficient Alpha
Method of assessing internal consistency reliability (i.e., special formula) that provides an index of average inter-item consistency.
Kuder-Richardson 20 (KR-20)
A substitute for coefficient alpha when test items are scored dichotomously (right or wrong).
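The two internal-consistency formulas above can be sketched with the same few lines, since KR-20 is coefficient alpha applied to 0/1 items; the score matrix is toy data.

```python
# Hypothetical sketch: coefficient alpha from an examinee-by-item score matrix.
# With 0/1 items, as below, the result is also the KR-20 value.
import statistics as st

def cronbach_alpha(rows):
    k = len(rows[0])                                  # number of items
    item_vars = [st.pvariance(col) for col in zip(*rows)]
    total_var = st.pvariance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(round(cronbach_alpha(scores), 2))   # ~0.79 for this toy data
```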
Inter-Rater Reliability
Consistency of scores assigned by different raters. Assessed using the kappa statistic.
Standard Error of Measurement (SEM)
SD times the square root of (1 minus the reliability coefficient). Used to construct the confidence interval; e.g., a 68% confidence interval extends one standard error on each side of the obtained score.
Content Validity
The extent that a test adequately samples the content that it is designed to measure.
Construct Validity
The extent that a test measures the hypothetical trait.
Convergent and Divergent Validity
Methods to assess construct validity (multimethod-multitrait or factor analysis).
Multitrait-Multimethod
"Mono" means same and "Hetero" means different. Same trait-diff methods coeff are large = convergent validity. Diff traits-same method coff are small = discriminant validity.
Factor Analysis
Identifies the minimum number of common factors that account for the intercorrelations among a set of tests. Factor loadings can be squared to give the variance explained by a factor. Communality is the proportion of a test's variance accounted for by the common factors.
Orthogonal Rotation
FA rotation resulting in separate (uncorrelated) factors.
Criterion-Related Validity
When test scores are to be used to draw conclusions about an examinee's likely standing on another measure.
Predictive Validity
When the criterion data are collected after the predictor.
Incremental Validity
The extent to which a predictor increases decision-making accuracy. (Positive Hit Rate-Base Rate)
True Positive
ID by predictor; meet criterion
False Positive
ID by predictor; do not meet criterion
True Negative
No ID by predictor; do not meet criterion
Predictor
Determines if a S is +/-
False Negative
No ID by predictor; do meet criterion
Criterion
Determines if a S is T/F
Base Rate
(True Positives + False Negatives) / Total #
Positive Hit Rate
True Positives/Total Positives
Criterion Contamination
When a criterion rater knows the predictor score.
When cross validating...
...shrinkage may occur.
Low reliability means low validity, but...
...high reliability does not always mean high validity.
Percentile Ranks
Nonlinear. Distribution is always flat regardless of the shape of the raw scores. Disadvantage is that it is an ordinal scale.
Criterion-Referenced Interpretation
Interpreting against a predetermined standard using either a percentage score or a criterion score estimated from a regression equation and expectancy table.
Content Analysis
Method of organizing data into categories used to summarize information from a narrative record.
Protocol Analysis
A type of Content Analysis. Used when a psychologist is interested in the cognitive processes involved in a problem solving task. Requires the Subject to "think aloud" and then information is coded. Remember Ericsson :)
Behavioral Sampling
Looking at specific aspects of behavior instead of a complete record.
Interval Recording
Recording divided into equal intervals of time, e.g., measuring blood pressure at 10-minute intervals
Event Recording
Recording behavior each time it occurs, e.g., recording the number of times a subject uses his eraser during an exam
Situational Sampling
Observing behavior in different settings. Known to increase generalizability.
Sequential Analysis
Coding behavior sequences rather than isolated events. Used to study complex social behaviors.
Quasi-Experimental Vs True Experimental Research
Only True experimental research provides enough control necessary to conclude that the observed variability in a DV is actually caused by the IV. RANDOM ASSIGNMENT is the key.
Random Assignment Vs Random Selection
Random Assignment distinguishes between true and quasi-experimental research; it allows the researcher to be more certain that the DV is varying due to the IV. Random Selection enables the researcher to generalize her findings from the sample to the actual population.
sample
A sample is a collection of units from a population.
population
A collection of units being studied. Units can be people, places, objects, epochs, drugs, procedures, or many other things. Much of statistics is concerned with estimating numerical properties (parameters) of an entire population from a random sample of units from the population.
nominal scale
A variable whose value ranges over categories, such as {red, green, blue}, {male, female}, {Arizona, California, Montana, New York}, {short, tall}, {Asian, African-American, Caucasian, Hispanic, Native American, Polynesian}, {straight, curly}, etc. Some categorical variables are ordinal. The distinction between categorical variables and qualitative variables is a bit blurry. C.f. quantitative variable.
ordinal scale
In this scale type, the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd, etc.) of the entities assessed.
summative response scale
a scale formed by summing responses to multiple items (e.g., Likert items); the item scores can be added together and divided by a constant to yield a total or average score.
interval scale
On interval measurement scales, one unit on the scale represents the same magnitude on the trait or characteristic being measured across the whole range of the scale.
ratio scale
The scale type takes its name from the fact that measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind. Seen in: mean, standard deviation, correlation, regression, analysis of variance, geometric mean, harmonic mean, coefficient of variation, logarithms
qualitative
Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, it is often used interchangeably with "categorical" data.
categorical
Categorical variables that judge size (small, medium, large, etc.) are ordinal variables.
nonmetric
color, size, shape, etc
grouped
Grouping variables are utility variables used to indicate which elements in a data set are to be considered together when computing statistics and creating visualizations. They may be numeric vectors, string arrays, cell arrays of strings, or categorical arrays. Logical vectors can be used to indicate membership (or not) in a single group.
classification
Suppose you have a data set containing observations with measurements on different variables (called predictors) and their known class labels. If you obtain predictor values for new observations, could you determine to which classes those observations probably belong? This is the problem of classification.
quantitative
numerical
metric
Metric data is any reading that is at least at an interval scale, as opposed to nonmetric data, which can be nominal or ordinal.
ungrouped
data that has not been organized into groups.
independent variables
typically the variable representing the value being manipulated or changed
dependent variables
the observed result of the independent variable being manipulated
covariate variables
A variable that is possibly predictive of the outcome under study. A covariate may be of direct interest or it may be a confounding or interacting variable.
latent variables
variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (directly measured).
measured variables
directly measured variables
sampling distribution
the distribution of a given statistic based on a random sample of size n. It may be considered as the distribution of the statistic for all possible samples from the same population of a given size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, and the sample size used.
null and alternative hypotheses
The null hypothesis typically corresponds to a general or default position; for example, it might state that there is no relationship between two measured phenomena or that a potential treatment has no effect. The alternative hypothesis is the position the researcher seeks to support, i.e., that a relationship or effect does exist.
type I error
"error of the first kind", an α error, or a "false positive": the error of rejecting a null hypothesis when it is actually true.
type II error
"error of the second kind", a β error, or a "false negative": the error of failing to reject a null hypothesis when in fact we should have rejected it.
power of a test
The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false (i.e. that it will not make a Type II error, or a false negative decision).
skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.
kurtosis
In probability theory and statistics, kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources are insistent that heavy tails, and not peakedness, is what is really being measured by kurtosis[1]. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.
Multiple regression
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
general linear model
y = β0 + β1x1 + β2x2 + ... + βpxp + ε (error term)
regression coefficients
The slope b of a line obtained using linear least squares fitting is called the regression coefficient.
standardized coefficients
In statistics, standardized coefficients or beta coefficients are the estimates resulting from an analysis carried out on variables that have been standardized so that their variances are 1. This means that they refer to the expected change in the dependent variable, per standard deviation increase in the predictor variable.
least squares
The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e. sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in solving every single equation. The most important application is in data fitting. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model.
sum of squares of residuals
In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. It is also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE). It is a measure of the discrepancy between the data and an estimation model. A small RSS indicates a tight fit of the model to the data. In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate OLS case, see partitioning in the general OLS model.
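The regression ideas in the last few cards (coefficients, least squares, RSS, and R-squared) can be seen together in one short sketch with made-up data.

```python
# Hypothetical sketch: simple least-squares fit, residual sum of squares, and R^2.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# slope (regression coefficient) and intercept from the least-squares formulas
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

preds = [b0 + b1 * x for x in xs]
rss = sum((y - p) ** 2 for y, p in zip(ys, preds))   # residual sum of squares (SSE)
tss = sum((y - my) ** 2 for y in ys)                 # total sum of squares
r_squared = 1 - rss / tss                            # proportion of variance accounted for
print(round(b0, 2), round(b1, 2), round(r_squared, 3))
```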
F significance test
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the models have been fit to the data using least squares.
r squared
It is the proportion of variability in a data set that is accounted for by the statistical model.
t tests for coefficients
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution, if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
standard regression
In standard multiple regression, all of the independent variables are entered into the regression equation at the same time. Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample. A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.
stepwise regression
Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable. Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables. When none of the possible additions can make a statistically significant improvement in R², the analysis stops.
partial correlation
In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed.
semi-partial (part) correlation
The semipartial (or part) correlation statistic is similar to the partial correlation statistic. Both measure variance after certain factors are controlled for, but to calculate the semipartial correlation one holds the third variable constant for either X or Y, whereas for partial correlations one holds the third variable constant for both. The semipartial correlation measures unique and joint variance while the partial correlation measures unique variance. The semipartial (or part) correlation can be viewed as more practically relevant "because it is scaled to (i.e., relative to) the total variability in the dependent (response) variable." [5]
adjusted r squared
In a multiple linear regression model, adjusted R square measures the proportion of the variation in the dependent variable accounted for by the explanatory variables.
moderator variables
We begin with a linear causal relationship in which the variable X is presumed to cause the variable Y. A moderator variable M is a variable that alters the strength of the causal relationship. So for instance, psychotherapy may reduce depression more for men than for women, and so we would say that gender (M) moderates the causal effect of psychotherapy (X) on depression (Y).
suppressor variables
Suppression is defined as "a variable which increases the predictive validity of another variable (or set of variables) by its inclusion in a regression equation"[6]. For instance, in examining the effect of a treatment (e.g., medication) on an outcome (e.g., healing from a disease), suppression means that introducing the third variable increases, rather than reduces, the apparent direct effect of the treatment on the outcome - the opposite of the drop expected with a mediator.
dummy variables
A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in your study. In research design, a dummy variable is often used to distinguish different treatment groups.
homoscedasticity
In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance.
Mahalanobis distance for outliers
A multivariate distance measure based on correlations between variables, by which different patterns can be identified and analyzed. It is a useful way of determining the similarity of an unknown sample to a known one and of flagging multivariate outliers (cases far from the centroid of the predictors).
collinearity
In regression, collinearity (multicollinearity) occurs when two or more predictor variables are highly linearly related, which makes their individual regression coefficients unstable. (The term comes from geometry, where points are collinear if they lie on a single straight line.)
VIF (variance inflation factor)
In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient (the square of the estimate's standard deviation) is increased because of collinearity.
tolerance
In multicollinearity diagnostics, a predictor's tolerance is 1 minus the R² obtained from regressing that predictor on the other predictors; it is the reciprocal of the VIF, and low tolerance indicates multicollinearity.
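The VIF/tolerance relationship can be sketched directly; r2_j below stands for the R-squared from regressing predictor j on the other predictors and is an assumed, illustrative value.

```python
# Hypothetical sketch of tolerance and VIF for one predictor.
r2_j = 0.90                  # predictor j is largely explained by the other predictors
tolerance = 1 - r2_j         # low tolerance signals multicollinearity
vif = 1 / tolerance          # variance of b_j is inflated by this factor
print(round(tolerance, 2), round(vif, 2))   # 0.1 10.0
```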
standard multiple regression
several independent variables predicting the dependent variable.
hierarchical multiple regression
Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.
statistical or stepwise regression
In statistics, stepwise regression includes regression models in which the choice of predictive variables is carried out by an automatic procedure.
Logistic Regression
In statistics, logistic regression (sometimes called the logistic model or logit model) is used to predict the probability of occurrence of an event by fitting the data to a logistic (logit) function. It is a generalized linear model used for binomial regression.
dichotomous variables
A variable that has only two categories. Gender (male/female) is an example. See also TWO-POINT SCALE.
log likelihood
The log likelihood is the natural logarithm of the likelihood function and indexes how well a model fits the data. In a likelihood ratio test, the fit of two models is compared, one of which (the null model) is a special case of the other (the alternative model); the likelihood ratio, or equivalently the difference in log likelihoods, can be used to compute a p-value or compared to a critical value to decide whether to reject the null model in favour of the alternative model.
omnibus test of model
They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall.
Nagelkerke pseudo r squared
The Nagelkerke measure adjusts the Cox and Snell pseudo R² by dividing it by its maximum possible value, so that a value of 1 can be achieved.
Hosmer and Lemeshow test
The Hosmer–Lemeshow test is a statistical test for goodness of fit for logistic regression models. The test assesses whether or not the observed event rates match expected event rates in subgroups of the model population.
Wald test
The Wald test is usually used to assess the significance of each individual predictor. It is known to be overly conservative (increased Type II error), and when a predictor is multinomial it does not give a test of the whole predictor but only of the dummy-coded versions of the predictor.
discriminant analysis
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterize or separate two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
dependent and independent variables for discriminant analysis
LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements[1][2]. In the other two methods however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA, as they also explain a categorical variable. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
discriminant function(s)
Essentially the reverse of an ANOVA or MANOVA, which predict one (ANOVA) or multiple (MANOVA) continuous dependent variables from one or more categorical independent variables; a discriminant function instead uses a set of continuous variables to predict category membership. It is useful in determining whether a set of variables is effective in predicting category membership, and as a follow-up procedure to a MANOVA, instead of a series of one-way ANOVAs, for ascertaining how the groups differ on the composite of dependent variables.
Wilks Lambda test for significance
This can be interpreted as the proportion of the variance in the outcomes that is not explained by an effect. To calculate Wilks' Lambda, for each eigenvalue, calculate 1/(1 + the eigenvalue), then find the product of these ratios.
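Following the computation described above (the product of 1/(1 + eigenvalue) over the eigenvalues), here is a tiny sketch with made-up eigenvalues.

```python
# Hypothetical sketch: Wilks' lambda from discriminant-function eigenvalues.
import math

eigenvalues = [1.8, 0.4]
wilks_lambda = math.prod(1 / (1 + ev) for ev in eigenvalues)
print(round(wilks_lambda, 3))   # ~0.255; smaller lambda = more variance explained by the effect
```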
Wilks Lambda test for equality of group means
The classical Wilks' Lambda statistic for testing the equality of the group means of two or more groups can be made robust by substituting the classical estimates with the highly robust and efficient reweighted MCD estimates, which can be computed efficiently by the FAST-MCD algorithm (see CovMcd). An approximation for the finite-sample distribution of the Lambda statistic is obtained by matching the mean and variance of a multiple of a chi-squared distribution, which are computed by simulation.
canonical correlation
In statistics, canonical correlation analysis is a way of making sense of cross-covariance matrices. If we have two sets of variables and there are correlations among the variables, then canonical correlation analysis will enable us to find linear combinations of the X's and the Y's which have maximum correlation with each other.
Box's M test
MANOVA and LDF assume homogeneity of variance-covariance matrices. The assumption is usually tested with Box's M. Unfortunately the test is very sensitive to violations of normality, leading to rejection in most typical cases.
exploratory factor analysis
Exploratory factor analysis (EFA) is used to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory and one uses factor loadings to intuit the factor structure of the data.
confirmatory factor analysis
Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesizes beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors. The researcher seeks to determine, for instance, if measures created to represent a latent variable really belong together.
KMO Kaiser-Meyer-Olkin statistic
The Kaiser-Meyer-Olkin measure of sampling adequacy is an index for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients (refer to SPSS User's Guide). Large values for the KMO measure indicate that a factor analysis of the variables is a good idea.
Bartlett's test of sphericity
Used to test the null hypothesis that the variables in the population correlation matrix are uncorrelated. A very small observed significance level allows rejection of that hypothesis and supports the conclusion that the relationships among the variables are strong enough to proceed with a factor analysis.
principal components analysis vs. factor analysis
PCA takes into account all variability in the variables. factor analysis estimates how much of the variability is due to common factors ("communality").
eigenvalue (Kaiser) criterion
First, we can retain only factors with eigenvalues greater than 1. In essence this is like saying that, unless a factor extracts at least as much as the equivalent of one original variable, we drop it. This criterion was proposed by Kaiser (1960), and is probably the one most widely used.
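A minimal sketch of the Kaiser rule as stated above; the eigenvalues are illustrative.

```python
# Hypothetical sketch: retain factors with eigenvalues greater than 1.
eigenvalues = [3.2, 1.4, 0.9, 0.3, 0.2]
retained = [ev for ev in eigenvalues if ev > 1.0]
print(len(retained), retained)   # 2 factors kept under the Kaiser criterion
```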
Scree test criterion
Plot the eigenvalues against the factor number and retain the factors above the "elbow" where the curve levels off; the later factors are dominated by unique variance and are of little use for the purpose of data reduction and summary.
factor rotation
Sometimes, the estimated loadings from a factor analysis model can give a large weight on several factors for some of the measured variables, making it difficult to interpret what those factors represent. The goal of factor rotation is to find a solution for which each variable has only a small number of large loadings, i.e., is affected by a small number of factors, preferably only one.
oblique rotation
oblique rotation gives the correlation between the factors in addition to the loadings.
communalities in factor analysis
Factor analysis is a statistical technique used to identify a smaller number of underlying dimensions, or factors, that can be used to represent relationships among interrelated variables. The amount of the common factorial variance is initially unknown and has to be estimated. The most often used method for obtaining the communality estimate is to find the squared multiple correlations of each variable with all other variables.
loadings in factor analysis
A factor loading is a product-moment correlation of a variable with a factor. The matrix of factor loadings thus permits interpretation of a factor in terms of common properties of variables.
interval
ordered, with equal intervals between values; can add or subtract; the zero point is arbitrary and does not imply absence of the attribute (e.g., temperature)
mesokurtic
normal curve
leptokurtic
tall and peaked
platykurtic
flat or plateau-like
central limit theorem three principles
1. the sampling distribution of the mean increasingly approaches normal as sample size increases, regardless of the shape of the population distribution 2. the mean of the sampling distribution equals the population mean 3. the standard deviation of the sampling distribution (the standard error of the mean) equals the population standard deviation divided by the square root of the sample size
why/when do a one or two tailed test
1. one-tailed when you expect the results to show a specific direction 2. two-tailed when the predicted direction of the results is uncertain
alpha - definition and alternative names
the probability of a Type I error; defines the rejection region (the region of unlikely values), also called the critical region
how to increase confidence
reduce alpha
to increase power
1. increase alpha 2. increase sample size 3. use a reliable DV measure 4. use a one-tailed test
chi square test
nonparametric test for nominal variables; used when analyzing the frequency of cases in each category; use a multiple chi-square if there is more than one variable
when to use parametric tests
interval/ratio
when to use nonparametric tests
nominal/ordinal
t-test for single sample/group
aka Student's t-test; used to compare a single group's mean to a known population mean
t-test for correlated samples
compares correlated samples, e.g., the same subjects measured twice (pre-test, post-test) or matched subjects
t-test for independent samples
used when you have two independent groups, e.g., a treatment group and a control group, and are comparing them
ANOVA when to use and why
used to compare two or more group means; controls the experiment-wise error rate, managing the probability of a Type I error
2 way (factorial) ANOVA
2+ IVs, 2+ independent groups, 1 DV
ANCOVA
analysis of covariance; ANOVA plus regression analysis; reduces within-group variability and is therefore considered a more powerful test
randomized block ANOVA
blocks an extraneous variable and uses it as another IV; used to study its unique effects on the DV
split plot/mixed ANOVA
used with a mixed design that includes both a between-groups variable and a within-groups variable, e.g., a between-groups factor (treatment vs. control) combined with a within-groups (repeated measures) factor
trend analysis
used with a quantitative IV to describe the relationship with the DV in terms of its shape and form; four possible outcomes: linear, quadratic, cubic, quartic
why use post hoc
used after obtaining a significant omnibus result to make unplanned pairwise comparisons between group means (in contrast to planned comparisons, which are specified in advance)
Scheffe
least vulnerable to Type I error; most vulnerable to Type II error (the most conservative test)
Tukey
more conservative than Fisher but less conservative than Scheffé; relatively protected against Type I error at some cost in Type II error
fisher
most vulnerable to Type I error; least vulnerable to Type II error
f score
F = MS(between) / MS(within)
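A short sketch of the F ratio as defined above; the sums of squares and degrees of freedom are made up.

```python
# Hypothetical sketch: F = MS(between) / MS(within) in a one-way ANOVA.
ss_between, df_between = 120.0, 2      # df = k - 1 groups
ss_within, df_within = 300.0, 27       # df = N - k subjects

ms_between = ss_between / df_between   # 60.0
ms_within = ss_within / df_within      # ~11.1
F = ms_between / ms_within
print(round(F, 2))                     # ~5.4
```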
in correlation ______ is to IV as _____ is to DV
predictor, criterion
Pearson r when and under what assumptions
select whenever you have two continuous variables. Assumptions: 1. linearity 2. unrestricted range 3. homoscedasticity
what does the assumption of linearity mean relative to Pearson r
you should be able to draw a straight line through the data; if the relationship is not linear, r will underestimate the degree of association between the two variables
assumption of unrestricted range relative to pearson r
there must be an unrestricted range of scores along the x and y axes; a restricted range will cause r to underestimate the relationship
assumption of homoscedasticity
the range of y scores should be about the same at all values of x; if the data are not homoscedastic, r will not represent the relationship across the full range of scores on x and y
spearman rho
select with two rank-ordered variables (e.g., rank in high school and rank on the SAT)
biserial
one continuous variable and one artificially dichotomized variable
point biserial
one continuous variable and one true dichotomy (e.g., test scores and gender)
eta
two continuous variables with a nonlinear relationship (e.g., level of anxiety and performance on a memory test)
regression (aka prediction) analysis
used to predict a score on a criterion based upon the obtained score on a predictor (e.g., using SAT score to predict college GPA)
multivariate techniques for prediction
assess the degree of association among 3 or more variables and use it to make predictions
stepwise multiple regression
controls for error by identifying the smallest number of predictor variables needed to predict the criterion
multicollinearity
observed in stepwise multiple regression: if two variables are highly correlated, remove one, since they are redundant and increase the chance for error (like taking the same temperature with two thermometers)
LISREL
models one- or two-way paths; examines relationships between observed and latent variables
looking at main effects
examine the marginal means; the effect of each IV considered without regard to the effects of the other IVs
interaction effects
look for non-parallel (crossing or converging) lines in the plot of cell means; an interaction occurs when the effect of one IV differs at different levels of the other IV
What does classical test theory boil down to?
CLASSICAL TEST THEORY: Reliability means 1) a test yields repeatable, consistent results 2) a test is reliable to the degree that your score reflects the true score on the test rather than error
What does a reliability coefficient of .90 indicate?
90% of the observed variability is due to true score differences and the remaining 10% is due to measurement error.
Which do you square to interpret? a) the reliability coefficient b) the correlation coefficient?
square the correlation coefficient to interpret it i.e., to determine the proportion of variability that’s shared between 2 measures
Test-retest reliability isn’t appropriate for…
attributes that change over time, e.g., mood.
How do you derive an alternate forms reliability coefficient?
ALTERNATE FORMS RELIABILITY COEFFICIENT: administer 2 alternate forms of a test to the same group of examinees (Form A at time 1, then Form B at time 2), then obtain the correlation between the 2 sets of scores. So, everyone completes Form A and Form B. Reduces practice effects.
What does Internal Consistency Reliability measure?
measures the correlations among individual items in a test
What are the 3 different methods of determining the coefficient of internal consistency?
1) Split-half 2) Cronbach's coefficient alpha (for items with multiple score points, e.g., Likert scales) 3) Kuder-Richardson Formula 20 (for dichotomously scored, right-wrong items)
What’s the kappa coefficient for?
a measure of inter-rater reliability
The standard error of measurement is different than a reliability coefficient in that…
the standard error of measurement is used to determine the CONFIDENCE INTERVAL for an INDIVIDUAL test score. Whereas the reliability coefficient represents how much error a whole TEST contains
How do you calculate a 68, 95, and 99% CI given a standard error of the measurement of 4.0, and a score of 110 on an IQ test?
68%CI = Score +/- 1x(Stan Err of Measurement) 95%CI = Score +/- 2x(Stan Err of Measurement) 99%CI = Score +/- 2.5x(Stan Err of Measurement) 68%CI = 110 +/- 4 = 106-114 95%CI = 110 +/- 8 = 102-118 99%CI = 110 +/- 10 = 100-120
How does a decrease in variability of scores impact reliability coefficient of a test?
it DECREASES reliability
Floor effects result from too many a) easy items b) difficult items
FLOOR EFFECT = too many difficult questions
A ceiling effect results from too many a) easy items b) difficult items
CEILING EFFECT = too many easy questions
How is content validity different than construct validity?
CONTENT validity asks "how adequately and representatively does this test sample the content area?" and is based on careful judgment and selection of items covering all content domains, and/or good convergent or criterion-related validity. CONSTRUCT validity asks "how well does this test measure a theoretical construct or trait?" and is assessed over time as data accumulate and tests of convergent/divergent validity and/or factor analyses are made.
Criterion-related validity is…
the correlation between the predictor (e.g., the SATs) and the criterion (what it’s supposed to predict, e.g., college GPA)
What’s the difference b/w concurrent validity and predictive validity?
concurrent = predictor and criterion scores are collected concurrently. predictive = predictor scores are collected first and criterion data are collected later.
The multitrait-multimethod matrix is one way of assessing a test’s….
convergent and divergent validity that is, how much a given test correlates with another test that measures the same construct and how much it doesn’t correlate with a test designed to measure another construct
What’s a factor analysis for?
it measures the degree to which a set of tests are all measuring the same underlying factor, or construct.
In factor analysis, what is a factor loading?
it’s a particular test’s correlation with a particular factor that’s been found in the factor analysis.
In factor analysis, what’s the purpose of rotating factors? What are the two types of rotations? When is it done?
It facilitates the interpretation of the analysis. orthogonal rotation (resulting in uncorrelated factors), and oblique (resulting in correlated factors) factors are rotated as the final step in a factor analysis.
What’s the difference between the standard error of the estimate vs the standard error of the measurement?
the standard error of the ESTIMATE tells us how much error is in the estimated/predicted CRITERION score (e.g., if using SATs to predict GPA, this tells us how off our prediction of GPA might be) standard error of the MEASUREMENT tells us how much error is in the TEST score itself (e.g., where an examinees TRUE test score is likely to fall on the SATs)
How do you calculate a standard error of the measurement?
Standard Error of the Measurement = SD √(1 - reliability coefficient)
When the criterion-related validity of a test is moderated by a variable, the test is said to have…
differential validity
What’s shrinkage?
SHRINKAGE the reduction that occurs in a criterion-related validity coefficient upon cross-validation (i.e., when a test is developed and validated with an initial sample, and then tested again using only the retained items within a second sample)
What is an EIGENVALUE?
EIGENVALUE 1) applies to factor analysis, 2) is used to describe the FACTORS, not the particular tests 3) it is the amount of variance across all tests that is accounted for by the factor. In other words, an eigenvalue tells us how significant or big each factor is.
If for a particular item, the p value is .80, what does this mean?
80% of test takers get the question RIGHT
How hard should test items be to maximize their value in discriminating between high and low scoring test takers?
moderately difficult, p=.50
1) what does "p" stand for 2) what is the formula for calculating "p" 3) what is the range of "p"? 4) What do larger/smaller values mean? 5) many tests retain items with moderate difficulty levels. What is "p" at this level?
1) item difficulty index 2) total # of examinees passing the item / total # of examinees 3) 0 - 1.0 4) larger = easier item; smaller = more difficult item 5) p is close to .50
when p is close to .50: 1) test score variability increases/decreases? 2) test reliability increases/decreases? 3) discrimination between examinees increases/decreases? 4) what is the distribution
1) increase 2) increase 3) increase 4) normal
1) while most tests look for moderate difficulty levels (p), what type of test prefers retaining more difficult items 2) what value of p is usually optimal for these tests
1) true/false 2) .75 (whereas, most other tests look for moderate difficulty, p=.50)
1) What is "D" 2) when D = +1.0, what does this mean 3) when D = -1.0, what does this mean
1) D = item discrimination index 2) all of the upper-scoring group got the item correct and none of the lower-scoring group did 3) all of the lower-scoring group got the item correct and none of the upper-scoring group did
classical test theory vs. item response theory 1) which is sample invariant (same across different samples) 2) which is sample dependent (varies from sample to sample)
1) item response theory 2) classical test theory
1) Item response theory involves deriving __ for EACH item 2) what does it show
1) item characteristic curve 2) level of ability AND probability of answering item correctly
On an item characteristic curve 1) what is on the x axis? y axis? 2) how do you determine difficulty level 3) how do you determine the item's ability to discriminate btwn high and low achievers 4) how do you determine the probability of guessing correctly
1) x = ability level y = probability of correct response 2) look where on the curve, 50% got the correct response. Then look for corresponding ability level. 3) slope of the curve 4) y intercept (point at which curve intercepts the vertical axis) - proportion of people with low ability who answered the item correctly
on an item characteristic curve, what does a steep slope indicate
the steeper the slope, the greater the item's ability to discriminate btwn high and low achievers
when using an achievement test developed on the basis of item response theory, an examinee's test score will indicate __
ability level
reliability provides an estimate of the proportion of variability in an examinee's score that is due to __
TRUE differences among examinees on attributes measured by test
reliability coefficient 1) range 2) what does a low "r" mean? 3) what does a high "r" mean?
1) 0.0 to +1.0 2) r = 0 -> all variability in score is due to error 3) r = +1 -> all variability reflects true score variability (reliable)
1) r with subscript "xx" stands for 2) r with subscript "xy" stands for
1) reliability coefficient 2) validity coefficient
a reliability coefficient of .84 indicates that 1) __% of variability in scores is due to TRUE score differences 2) __% is due to error
1) 84% 2) 16%
which method for estimating reliability is associated with: 1) degree of stability (consistency) 2) coefficient of equivalence 3) coefficient alpha
1) test-retest 2) alternate forms 3) internal consistency
test-retest reliability 1) what is primary source of measurement error 2) it is inappropriate for determining reliability of test measuring what type of attribute
1) time sampling factors 2) attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)
which method for estimating reliability can be used for the following: 1) aptitude 2) mood 3) speeded test
1) test-retest, alternate forms, internal consistency 2) internal consistency 3) alternate forms
alternate forms reliability 1) 2 primary sources of measurement error 2) it is inappropriate for determining reliability of test measuring what type of attribute
1) content sampling and time sampling factors 2) attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)
1) split-half reliability assesses what type of reliability 2) it usually under/over? estimates a test's true reliability 3) how is this corrected
1) internal consistency 2) under 3) Spearman-Brown prophecy formula
As the length of a test decreses, the reliability decreases/increases?
decreases
1) Cronbach's coefficient alpha assesses what type of reliability 2) it provides the _____ boundary of a test's reliability
1) internal consistency 2) lower
1) KR-20 is used for what type of reliability? 2) it is a variation of what other method 3) how does it differ
1) internal consistency 2) coefficient alpha 3) KR-20 is used when items are scored dichotomously
which method for evaluating internal consistency reliability is used when items are scored dichotomously
KR-20
coefficient alpha 1) as the test content become more heterogeneous, coefficient alpha increases/decreases?
1) decreases
what correlation coefficient is used with inter-rater reliability
kappa statistic
for inter-rater reliability, percent agreement will provide an over/under? estimate of the test's reliability
overestimate
consensual observer drift will artificially inflate/deflate inter-rater reliability
inflate
1) what is the most thorough method for estimating reliability 2) which method is NOT appropriate for speed tests
1) alternate forms 2) internal consistency
what method is used to estimate the effects of lengthening and shortening a test on its reliability coefficient
Spearman-Brown
Spearman-Brown tends to over/under? estimate a test's true reliability
overestimate
when the range of scores is restricted, the reliability coefficient is high/low
low
is the reliability coefficient high or low when: 1) item has low difficulty 2) item has average difficulty 3) item has high difficulty
1) low 2) high 3) low (reliability is maximized when items are of average difficulty, p close to .50)
to maximize reliability coefficient 1) increase/decrease test length 2) increase/decrease range of scores 3) increase/decrease heterogeneity among examinees 4) increase/decrease the probability of guessing correctly 5) p should be close to __
1) increase 2) increase 3) increase 4) decrease 5) .50 (average item difficulty)
a reliability coefficient of __ is considered acceptable
.80 or larger
1) what is the standard error of measurement 2) what is the standard error of estimate
1) used to construct a confidence interval around a measured (obtained) score 2) used to construct a confidence interval around a predicted (estimated) criterion score
what is the formula for 1) standard error of measurement 2) standard error of estimate
1) SD x square root of (1 minus the reliability coefficient) 2) SD x square root of (1 minus the validity coefficient squared)
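As a quick sketch of the two formulas on this card (the SD, reliability, and validity values below are illustrative):

```python
import math

def sem(sd, r_xx):
    """Standard error of measurement: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - r_xx)

def see(sd_y, r_xy):
    """Standard error of estimate: criterion SD * sqrt(1 - validity coefficient squared)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

# A test with SD = 15 and reliability .91 has SEM = 15 * sqrt(.09) = 4.5, so a
# 68% confidence interval around an obtained score of 100 runs from 95.5 to 104.5.
print(round(sem(15, 0.91), 2))  # 4.5
print(round(see(10, 0.60), 2))  # 8.0
```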
obtained test scores tend to be inaccurate estimates of true scores 1) scores ABOVE the mean tend to over/under?estimate true scores 2) scores BELOW the mean tend to over/under?estimate true scores
1) over 2) under
when the standard error of measurement = __, an examinee's obtained scores can be interpreted as her true score
0
which of the following would be most appropriate for estimating reliability for anxiety 1) test-retest 2) alternate forms 3) coefficient alpha
3
what are the minimum and maximum values of the standard error of measurement
minimum = 0 maximum = SD of test scores
how do you establish content validity
judgment of subject matter experts
1) what are the types of construct validity 2) what methods are used to assess
1) convergent and discriminant 2) multitrait-multimethod matrix AND factor analysis
multitrait-multimethod matrix 1) which coefficient provides evidence of convergent validity? Is the coefficient large/small? 2) which coefficient provides evidence of discriminant validity? Is the coefficient large/small?
1) large monotrait-heteromethod 2) small heterotrait-monomethod OR small heterotrait-heteromethod
what does a factor analysis assess?
construct validity (convergent and discriminant)
In a factor matrix, correlation coefficients (factor loadings) indicate the degree of association btwn __ and __
each test and each factor
a test has a factor loading of .78 for Factor I. This means that __% of variability in the tests is accounted for by Factor I.
61% (.78 squared)
what is communality
total amount of variability in test scores explained by identified factors
Communality for a test is .64 This means that __% of variability in scores is explained by a combination of identified factors
64% NOT squared because it is already squared. Communality IS the amount of shared variance.
according to factor analysis, a test's reliability consists of what two components
communality and specificity communality = factors tests share in common specificity = factors specific to the test (not measured by other tests)
a communality is a lower/upper? estimate of a test's reliability coefficient
lower-limit
if a test has a communality of .64, the reliability coefficient will necessarily be __
.64 or larger
two types of rotations in a factor analysis 1) which one is associated with uncorrelated factors? 2) with correlated factors?
1) orthogonal 2) oblique
when factors are orthogonal/oblique?, a test's communality can be calculated from factor loadings
orthogonal
In factor analysis, when factors are orthogonal, how do you calculate communality?
communality = SUM of squared factor loadings
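A brief worked example of this calculation, assuming hypothetical orthogonal loadings for a single test:

```python
# Hypothetical orthogonal factor loadings for one test
loadings = [0.78, 0.40]  # loading on Factor I, loading on Factor II

# Communality = sum of the squared factor loadings
communality = sum(loading ** 2 for loading in loadings)
print(round(communality, 2))  # 0.77 -> 77% of score variability is explained
                              # by the identified factors
```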
when a criterion-related validity coefficient is large, what does this indicate
predictor has criterion-related validity
what are the forms of criterion-related validity
concurrent and predictive
1) validity coefficients rarely exceed __ 2) validity coefficients as low as __ might be acceptable
1) .60 2) .20-.30
how do you evaluate a predictor's incremental validity
scatterplot
for a scatterplot used to assess incremental validity, what determines: 1) positive/negative 2) true/false
1) predictor 2) criterion
how do you calculate incremental validity? How is each component calculated?
incremental validity = positive hit rate - base rate; base rate = (true positives + false negatives) / total # of people; positive hit rate = true positives / total positives
if incremental validity = .34, the test can be expected to increase the proportion of successful employees by __%
34
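A minimal sketch of the incremental-validity arithmetic described above, using a hypothetical 2x2 decision table (the counts are made up for illustration):

```python
# Hypothetical selection outcomes
true_pos, false_pos = 30, 10   # selected and successful / selected but unsuccessful
false_neg, true_neg = 20, 40   # rejected but would have succeeded / rejected and unsuccessful

total = true_pos + false_pos + false_neg + true_neg
base_rate = (true_pos + false_neg) / total             # success rate without the predictor
positive_hit_rate = true_pos / (true_pos + false_pos)  # success rate among those selected

incremental_validity = positive_hit_rate - base_rate
print(base_rate, positive_hit_rate, incremental_validity)  # 0.5 0.75 0.25
```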
relationship btwn reliability and validity 1) what is the equation relating the predictor's reliability to its validity 2) what is the equation relating predictor AND criterion reliability to validity
1) the predictor's criterion-related validity coefficient is less than or equal to (cannot exceed) the square root of its reliability coefficient 2) the predictor's validity coefficient is less than or equal to (cannot exceed) the square root of the predictor's reliability coefficient TIMES the criterion's reliability coefficient
If a predictor has a reliability coefficient of .81, its validity coefficient will necessarily be __ (exact number)
.90 or less
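A short sketch of these ceilings (the criterion reliability of .64 is an assumed value added for illustration):

```python
import math

r_xx = 0.81  # predictor reliability (from the card above)
r_yy = 0.64  # criterion reliability (hypothetical)

max_validity_from_predictor = math.sqrt(r_xx)    # validity cannot exceed sqrt(r_xx)
max_validity_from_both = math.sqrt(r_xx * r_yy)  # or sqrt(r_xx * r_yy)
print(round(max_validity_from_predictor, 2))  # 0.9
print(round(max_validity_from_both, 2))       # 0.72
```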
1) what is the correction for attenuation formula used for 2) does it tend to over/under?estimate the actual validity coefficient
1) to estimate what a predictor's validity coefficient WOULD be if the predictor and/or criterion were perfectly reliable (reliability coefficients = 1.0) 2) overestimate
criterion contamination 1) tends to inflate the relationship btwn __ 2) results in an artificially high __ coefficient
1) predictor and criterion 2) criterion-related validity coefficient
when cross-validating a predictor on another sample, the cross-validation coefficient tends to __
shrink
shrinkage refers to the shrinking of __ when __
validity coefficient when the predictor is cross-validated
norm-referenced vs. criterion-referenced Which are the following: 1) percentile ranks 2) percentages 3) regression equation 4) z-score 5) IQ score
1) norm 2) criterion 3) criterion 4) norm 5) norm
distribution of percentile ranks has what kind of shape
flat (rectangular) regardless of the shape of the raw score distribution
what is the tranformation called when: 1) distribution of transformed scores DIFFERS in shape from the distribution of raw scores 2) has SAME shape? 3) example of the first?
1) nonlinear transformation 2) linear transformation 3) percentile ranks
when using correction for guessing, the resulting distribution will have (compared to original distribution) 1) lower/higher mean 2) smaller/larger SD
1) lower 2) larger
single-sample chi-square 1) how many variables 2) measurement scale 3) df (degrees of freedom)
variables = 1 nominal (frequency) data df = (C - 1)
multiple-sample chi-square 1) how many variables 2) measurement scale 3) df (degrees of freedom)
variables = 2 or more nominal (frequency) data df = (C - 1)(R - 1)
Mann-Whitney U 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 1; 2 independent groups DV = 1; rank-ordered data
Wilcoxon Matched-Pairs Signed-Ranks 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 1; 2 correlated groups DV = 1; rank-ordered data
Kruskal-Wallis 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 1; 2 or more independent groups DV = 1; rank-ordered data
t-test for a single sample 1) how many IV's 2) # & type of groups for IV 3) measurement scale 4) df
IV = 1; single group DV = 1; interval or ratio data df = (N - 1)
t-test for independent samples 1) how many IV's 2) # & type of groups for IV 3) measurement scale 4) df
IV = 1; 2 independent groups DV = 1; interval or ratio data df = N - 2 (N = total number of subjects across both groups)
t-test for correlated samples 1) how many IV's 2) # & type of groups for IV 3) measurement scale 4) df
IV = 1; 2 correlated samples DV = 1; interval or ratio data df = N - 1 (N = number of pairs of scores)
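The degrees-of-freedom rules on the three t-test cards above can be summarized in a small sketch (the sample sizes are illustrative):

```python
def df_single_sample(n_subjects):
    """t-test for a single sample: df = N - 1."""
    return n_subjects - 1

def df_independent_samples(n_total):
    """t-test for independent samples: df = N - 2, N = total subjects in both groups."""
    return n_total - 2

def df_correlated_samples(n_pairs):
    """t-test for correlated samples: df = N - 1, N = pairs of scores."""
    return n_pairs - 1

print(df_single_sample(30), df_independent_samples(30), df_correlated_samples(30))  # 29 28 29
```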
one-way ANOVA 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 1; 2 or more independent groups DV = 1; interval or ratio data
Factorial ANOVA (two-way, three way) 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 2 or more; independent groups DV = 1; interval or ratio data
MANOVA 1) how many IV's 2) measurement scale
IV = 1 or more DV = 2 or more; interval or ratio data
mixed (split-plot) ANOVA 1) how many IV's 2) # & type of groups for IV 3) measurement scale
IV = 2 or more; at least 1 is within-subjects and 1 is between-groups DV = 1; interval or ratio data
Name two types of construct validity.
Convergent and discriminant.
What method is most useful to assess convergent validity?
Multitrait-Multimethod matrix displays the correlations between different tests measuring same and different traits.
What is criterion-referenced scoring?
Pegs performance to some criterion level. A yes or no answer. Pass/fail.
For timed tests, which measure of reliability is most effective?
Alternate forms, because measures of internal consistency aren't good, since time is the important variable.
Correction for attenuation formula
Estimates what a test's validity coefficient would be if the test and/or the criterion were perfectly reliable, i.e., the impact on validity of increasing reliability.
What is an eigenvalue?
A statistic that indicates the variance accounted for by a factor extracted from a factor analysis. Used to determine whether a factor should be kept; does it account for a significant amount of variance?
Research can be categorized as: a. qualitative b. quantitative c. both a. and b.
c. Research is the systematic study and investigation of a phenomenon in order to reveal, analyze, and establish facts, principles, and theories. The various methods of research can be categorized as qualitative or quantitative.
True or False. Qualitative research is conducted to obtain a holistic description of the naturalistic, contextual approach, emphasizes understanding and interpretation, and is primarily inductive in nature. The investigator's perspective is an important element of the research process.
True. Qualitative Research is conducted to obtain a *holistic* (relating to or concerned with wholes or complete systems rather than with the analysis, treatment, or dissection of parts) description of the quality of relationships, actions, situations, or other phenomena. It uses a *naturalistic approach* (subjects are observed without interruption under normal or natural circumstances) and a *contextual approach*, emphasizes understanding and interpretation, and is primarily *inductive* (ideas are processed from the specific to the general).
True or False. Quantitative research is conducted to obtain numerical data on variables. It makes use of empirical methods and statistical procedures, emphasizes prediction, generalizability, and causality, and is primarily deductive.
True. Quantitative Research is conducted to obtain numerical data on variables. It makes use of *empirical* (capable of being verified or disproved by observation or experiment) methods and statistical procedures, emphasizes prediction, generalizability, and *causality*, and is primarily *deductive* (ideas are processed from the general to the specific).
Quantitative research is further categorized as nonexperimental or _____.
experimental
Nonexperimental research is conducted to: a. to test hypotheses b. collect data on variables rather than to test hypotheses c. is emphasized on the EPPP
b. nonexperimental (descriptive) research is conducted to collect data on variables rather than to test hypotheses about the relationship between them. Correlational research, archival research, case studies, and surveys are ordinarily nonexperimental.
Experimental research is conducted to: a. to test hypotheses b. collect data on variables rather than to test hypotheses c. is emphasized on the EPPP d. both a. and c.
d. Experimental research is conducted to test hypotheses about the effects of one or more independent variables on one or more dependent variables. Experimental research is emphasized on the psychology licensing exam.
Name the steps to Planning and Conducting Experimental Research using the acronym: Dumb Calculations Stop-up Cranial Arterial Reasoning
1.Developing An Idea Into A Testable Hypothesis 2.Choosing An Appropriate Research Design 3.Selecting A Sample 4.Conducting The Study 5.Analyzing The Obtained Data 6.Reporting The Results
a _____ is any characteristic, behavior, event, or other phenomenon that is capable of varying or existing in at least two different states, conditions, or levels (e.g., gender).
variable
A _____ is a characteristic that is restricted to a single state or condition.
Constant; For example, gender may be treated as a constant if only male subjects are used in the study.
Researchers normally distinguish between two types of variables: A person's status on the _____ variable is assumed to affect his/her status on the ______ variable.
independent; dependent
If a psychologist conducts a research study to test the hypothesis that children who watch violent films are more aggressive than children who do not, the study's independent variable is ______.
To answer this question correctly you would need to ask yourself, "What are the effects of (INDEPENDENT VARIABLE) on (DEPENDENT VARIABLE)?" Here the question becomes: What are the effects of (films: violent vs. nonviolent) on aggressiveness? So the independent variable is the type of film (violent vs. nonviolent).
To assess the effects of an independent variable on a dependent variable, the independent variable must have at least a. one level b. two levels c. three levels
b. two levels; The IV(s) affect(s) or alter(s) the status of the dependent variable (DV); it is manipulated by the experimenter. Each IV must have at least two levels, which provides a point for comparison. Comparisons on the DV are made across different levels of the IV. When the psychologist is using only one variable as the IV, the effects of that IV may be compared by using a self-control procedure whereby a no-treatment condition serves as the second level.
The dependent variable is: a. manipulated b. measured c. left alone d. not necessary
Dependent Variable (DV, outcome, “Y”) – status on this variable seems to “depend on” the status of another variable (the IV). It is considered the outcome of the study and is measured by pretests and posttests. This variable is not manipulated, but measured only.
To identify the IV(s) and DV(s) in a study, translate the information into a question: What is the effect of _____ on _____ ?
Independent Variable; Dependent Variable
When using manipulated independent variables, the psychologist will be able to determine which levels of the IVs will be administered to subjects. However, in some studies, the psychologist cannot control the independent variables. When this happens, the IVs are considered ______ variables. a. constant b. organismic c. dependent
b. organismic; The use of organismic (a complex structure of interdependent and subordinate elements whose relations and properties are largely determined by their function in the whole) variables also limits the study in that the psychologist will not be able to determine if any observed relationships are causal in nature.
The IV and DV must be defined in terms of the method or process that will be used to identify or measure them. Once this is done, the variables are said to be: a. descriptively analyzed b. operationally defined c. appropriate for the study
b. operationally defined; Each variable must be defined and measured(e.g., score on a measure such as the WAIS or use of observation).
An important decision when using _____ to identify or measure a behavior is how to record that behavior. a. observation b. quasi experimental research c. experimental research
a. Whenever observation is used to identify or measure behavior, an important decision is how to record or measure that behavior.
True or False. When using observational methods to obtain measures on the DV, there are four main ways to measure variables: content analysis, behavioral sampling, situational sampling, sequential analysis.
True.Content Analysis – organizing the data into categories; Behavioral Sampling – systematic method for sampling and recording the frequency or duration of the behavior and/or rating the behavior in terms of its qualitative characteristics; Situational Sampling – alternative to behavioral sampling, used when the goal of the study is to observe a behavior in a number of settings, helps increase generalizability of the study’s findings; Sequential Analysis – entails coding of behavioral sequences rather than isolated behavioral events and is used to study complex social behaviors.
_____ analysis involves recording a subject's verbalizations when she has been instructed to "think aloud" while solving a complex cognitive problem.
Protocol Analysis – subject is asked to think aloud while solving a problem. The subject’s verbalizations are recorded and coded in term of relevant categories. Protocol = (record of a document or transaction).
_____ recording is particularly useful when the target behavior has no clear beginning or end.
Interval recording – observing a behavior for a period of time that has been divided into equal intervals (e.g., a 30-minute period that has been divided into 15-second intervals) and recording whether or not the behavior occurs during each interval. Use for studying complex interactions and behaviors that have no clear beginning or end such as laughing, talking, or playing.
_____ sampling is an effective technique when the behavior occurs infrequently or leaves a permanent record.
Event Sampling (recording) – observing a behavior each time that it occurs. This technique is good for studying behaviors that occur infrequently, that have a long duration, or that leave a permanent record or other product (e.g., a completed worksheet or test).
Experimental research is categorized as either true experimental or quasi-experimental. The primary feature that distinguishes true experimental research from quasi-experimental research is that, in the former, the experimenter can randomly _____ subjects to different treatment groups: a. assign b. rotate
a. assign subjects; True Experimental Research provides the amount of control necessary to conclude that the observed variability in the dependent variable is actually caused by variability in an independent variable. In order for the study to be “true experimental research”, the psychologist must be able to: ·Control the experimental conditions ·Determine which levels of the IV to include ·Randomly assign subjects to different treatment groups (i.e., to different levels of the IV)
Randomization of subjects to different treatment groups allows the experimenter to be more certain that subjects in different groups are initially similar and, consequently, that any observed differences between them on the _____ variable(s) were caused by the _____ variable(s): a. dependent; independent b. independent; dependent c. constant; organismic
a. dependent; independent Random Assignment (randomization) helps ensure that any observed differences between groups on the dependent variable are actually due to the effects of the IV.
True or False. When using Quasi-experimental Research, an experimenter can sometimes control the assignment of subjects to treatment groups.
False. Quasi-experimental Research – experimenter cannot control the assignment of subjects to treatment groups; must use intact (pre-existing) groups or a single treatment group.
When a researcher must use intact or (pre-existing) groups or a single treatment group, s/he is doing: a. experimental research b. quasi-experimental research
b. Quasi-experimental Research – experimenter cannot control the assignment of subjects to treatment groups; must use intact (pre-existing) groups or a single treatment group.
With regard to standard sampling techniques, when using this method, every member of the population has an equal chance of being included in the sample, and the selection of one member from the population has no effect on the selection of another member: a. cluster sampling b. stratified random sampling c. simple random sampling
c. Simple Random Sampling -Each member of the population has an equal chance of being included in the sample -Selection of one member from the population has no effect on the selection of another member
With regard to standard sampling techniques, the experiementor might want to use this method when the population of interest varies in terms of specific "strata" (characteristics) that are relevant to the research hypothesis: a. cluster sampling b. stratified random sampling c. simple random sampling
b. Stratified Random Sampling -Use when the population varies in terms of “strata” (characteristics) that are relevant to study -Divide the population into the appropriate strata (e.g., SES, race, age, etc.) and randomly select subjects from each stratum.
With regard to standard sampling techniques, the experimenter might want to use this method when it is not possible to identify or obtain access to the entire population of interest: a. cluster sampling b. stratified random sampling c. simple random sampling
a. Cluster Sampling -Select units (clusters) of individuals that are relevant to the study -Include all individuals in those units/clusters or randomly select individuals from the units/clusters (Multistage Cluster Sampling) when it is not possible to identify or obtain access to the entire population of interest.
Explain the difference between random assignment and random selection.
Both are important in research but for different reasons. Random Assignment – allows investigator to be more certain that an observed effect on the DV was actually caused by the IV. Random Selection – enables the investigator to generalize his/her findings from the sample to the population. It is random assignment that distinguishes true experimental research from quasi-experimental research.
An educational psychologist believes that children are better spellers if they are provided with "spaced" practice rather than "massed" practice while they are learning new words. Identify the IV and DV: IV(s): DV(s):
IV(s): type of practice (spaced vs. massed) DV(s): spelling ability
Dr. Mean wants to test the hypothesis that a mastery learning technique is more effective than the traditional instructional approach for teaching college algebra but that its effectiveness is a function of a student's need for achievement and math aptitude. IV(s): DV(s):
IV(s): instructional method, need for achievement, math aptitude DV(s): algebra achievement
Dr. Freud wants to compare the effects of cognitive-behavioral therapy, client-centered therapy, and psychoanalytic psychotherapy for reducing test anxiety in high- and low-achieving college students as measured by a physiological measure of anxiety and the Taylor Manifest Anxiety Scale. Identify the IV and DV: IV(s): DV(s):
IV(s): type of therapy, achievement level DV(s): physiological measure of anxiety, Taylor Manifest Anxiety Scale
A school principal suspects that a teacher's expectations about a student's academic performance will have a "self-fulfilling prophecy" effect on the student's own expectations and actual academic achievement but that the magnitude of the effect will depend on the student's level of self-esteem. Identify the IV and DV: IV(s): DV(s):
IV(s): teacher expectations, student self-esteem DV(s): student expectations, student achievement
A researcher asks a sample of male and female mental health professionals to describe a "healthy male adult" and a "healthy female adult." Based on his review of the literature, he expects that the adjectives used by both male and female professionals to describe a healthy male will be more positive than the adjectives used to describe a healthy female. Identify the IV and DV: IV(s): DV(s):
IV(s):gender of mental health professionals, gender of healthy adult DV(s): descriptive adjectives of healthy adult
To investigate the effects of watching violent movies on aggressive behavior, Dr. Hatchet has male and female children who have been identified as either very aggressive, moderately aggressive, mildly aggressive, or nonaggressive watch either a violent or neutral film. Following the film, he observes each child during a 60-minute free play period and counts the number of aggressive acts the child exhibits: Identify the IV and DV: IV(s): DV(s):
IV(s): initial aggressiveness, gender, type of film DV(s): number of aggressive acts
An investigator compares the performance of a single group of subjects before and after exposure to an intervention. This study is a (true/quasi-) experimental study.
quasi-experimental study, because the experimenter could not control the assignment of subjects to treatment groups and had to use intact (pre-existing) groups or a single treatment group.
When a study's independent variable is an organismic variable, the study is considered to be a (true/quasi-) experimental study.
quasi-experimental
When using protocol analysis, an investigator is interested in: a. infrequent behaviors b. verbal reports c. historical events
b. verbal reports; Protocol Analysis – subject is asked to think aloud while solving a problem. The subject’s verbalizations are recorded and coded in term of relevant categories. a. INCORRECT - Event Sampling is good for studying behaviors that occur infrequently, that have a long duration, or that leave a permanent record or other product (e.g., a completed worksheet or test).
A psychologist designs a study to assess prosocial behaviors (smiling, making eye contact, etc.) in infants while interacting with caregivers. the best sampling (recording) technique for these behaviors would be: a. interval b. cluster c. event
a. Interval recording – observing a behavior for a period of time that has been divided into equal intervals (e.g., a 30-minute period that has been divided into 15-second intervals) and recording whether or not the behavior occurs during each interval. Use for studying complex interactions and behaviors that have no clear beginning or end such as laughing, talking, or playing. b. INCORRECT - cluster sampling is a technique for selecting subjects, not for recording behavior. c. INCORRECT - Event Sampling (recording) – observing a behavior each time that it occurs. This technique is good for studying behaviors that occur infrequently, that have a long duration, or that leave a permanent record or other product (e.g., a completed worksheet or test).
The "hallmark" of true (versus quasi-) experimental research is: a. the ability to randomly select subjects from the population b. the ability to randomly assign subjects to treatment groups c. the ability to test hypotheses about the relationship bewteen variables.
b.
To obtain a sample of elementary school children for your research study, you randomly select several schools from the population of schools and then randomly choose students from the schools that you have selected. This is an example of: a. quota sampling b. stratified random sampling c. cluster sampling
c. cluster sampling; Cluster Sampling -Select units (clusters) of individuals that are relevant to the study -Include all individuals in those units/clusters or randomly select individuals from the units/clusters (Multistage Cluster Sampling) when it is not possible to identify or obtain access to the entire population of interest.
Which of the following would be most useful for studying behaviors that leave a permanent record: a. time sampling b. situation sampling c. event recording
c. event recording; Event Sampling (recording) – observing a behavior each time that it occurs. This technique is good for studying behaviors that occur infrequently, that have a long duration, or that leave a permanent record or other product (e.g., a completed worksheet or test).
When conducting an experimental research study, an experimenter wants a design that will maximize variability in the dependent variable that is due to the _____, control variability due to _____, and minimize variability due to _____.
independent variable, extraneous variables (systematic error), random error
Experimental variability, or variability in the dependent variable that is due to the _____ variable, is maximized when groups are made as different as possible with respect to that variable, while variability due to _____ error is minimized by ensuring that random fluctuations in subjects, conditions, and measuring instruments are eliminated or equalized among all treatment groups.
independent, random True experimental research helps an investigator minimize the effects of random (unpredictable) fluctuations in subjects, conditions, and measuring instruments. Tip! It is important to remember to pick a design that minimizes the effects of both systematic error (error due to extraneous variables) and random error.
A number of techniques are used to control the effects of extraneous variables, which are irrelevant to the research hypothesis but correlate with the _____ variable.
dependent Extraneous (Confounding) Variable – source of systematic error; a variable that is irrelevant to the purpose of the research study but confounds its results because it has a systematic effect on (correlates with) the DV.
Randomization, or the random _____ of subject to different levels of the independent variable, is considered the most powerful method of control because it helps ensure that groups are initially _____ with regard to all known and unknown extraneous variables.
assignment; equivalent Random Assignment of Subjects to Treatment Groups (Randomization) ·Equalizes the effects of extraneous variables ·Most “powerful” method of experimental control ·Primary characteristic of “true experimental research”
Matching is useful for controlling an extraneous variable when the number of subjects is too _____ to guarantee that random assignment will equalize the groups in terms of an extraneous variable.
small; Matching Subjects on the Extraneous Variable (Matching) ·Match subjects in terms of their status on that variable ·Randomly assign matched subjects to one of the treatment groups ·Useful when the sample size is too small to guarantee that random assignment will equalize the groups with regard to the effects of the extraneous variable
Blocking is similar to matching except that subjects are not individually matched but are _____ in terms of their status on the extraneous variable, and subjects within each _____ are randomly assigned to one of the treatment groups.
blocked (grouped); block Building the Extraneous Variable into the Study (Blocking) ·Include extraneous variable as IV so that its effects on the DV may be statistically analyzed ·Subjects are grouped (blocked) on the basis of their status on the extraneous variable ·Subjects are then randomly assigned to one of the treatment groups
The ANCOVA or other statistical technique can be used to statistically _____ the effects of an extraneous variable.
remove; Statistical Control of the Extraneous Variable – use ANCOVA (Analysis of Covariance) or another statistical technique to remove variability in the DV that is due to the extraneous variable (equalizing all subjects with regard to their status on that variable).
When a study has _____ validity, the experimenter can conclude that observations in the dependent variable were caused by variations in the independent variable rather than by other factors: a. internal b. external c. face
a. internal; a study has internal validity when it allows an investigator to determine if there is a causal relationship between independent and dependent variables.
Internal validity is threatened when the investigator cannot:
a. Control the effects of the IV
b. Control the effects of extraneous variables
c. Minimize the effects of random error
d. all of the above
d.
_____ refers to an external event that is irrelevant to the research hypothesis but that occurs duing the course of a study and affects the subjects' status on the dependent variable: a. history b. maturation
a. history; History – external event systematically affects the status of subjects on the DV. History = a significant event that affects people.
_____ refers to changes that occur within subjects during the course of a study as the result of the passage of time and that have a systematic effect on the DV: a. history b. maturation
b. maturation; Maturation – any biological or psychological change that occurs within subjects during the course of a study as a function of time and is not relevant to the research hypothesis (e.g., fatigue, boredom, hunger, physical growth, intellectual growth). Maturation = the emergence of personal and behavioral characteristics through growth processes.
Statistical _____ is the tendency for very high and low scores to move toward the mean on retesting.
regression; Statistical Regression – tendency of extreme scores on a measure to “regress” or move toward the mean when the measure is readministered to the same group of people. Statistical regression threatens a study’s internal validity whenever subjects have been selected because of their extreme status on the dependent variable.
Statistical regression threatens a study's internal validity whenever subjects are selected to participate in the study because of their extreme scores on the _____ variable measure
dependent
_____ is a problem when subjects in different treatment groups are not similar in terms of important characteristics at the onset of the study.
selection; Selection – method used to assign subjects results in systematic differences between the groups at the beginning of the study (e.g., forced to use intact groups).
_____ limits the study's internal validity when subjects who drop out of the study differ in some important way from subjects who remain in the study for its duration.
attrition
_____ can interact with history and threaten a study's internal validity if one group of subjects is exposed to an external condition that does not affect subjects in other groups.
selection; Interactions With Selection – there would be an interaction between selection and history, for example, when one group of subjects is unintentionally exposed to an external event that does not affect subject in other groups.
To control for maturation, the experimenter could: a. Include more than one group and randomly assign subjects to groups b. Use single-group time-series design c. Include more than one group d. both a. and b.
d. both a. and b.; Maturation – any biological or psychological change that occurs within subjects during the course of a study as a function of time and is not relevant to the research hypothesis (e.g., fatigue, boredom, hunger, physical growth, intellectual growth). Maturation = the emergence of personal and behavioral characteristics through growth processes. Control: ·Include more than one group and randomly assign subjects to groups ·Use single-group time-series design
To control for history, the experimenter would: a. Design measure in a way that minimizes memory and practice effects b. Include more than one group c. Randomly assign subjects to groups d. both b. and c.
d. both b. and c.; History – external event systematically affects the status of subjects on the DV. History = a significant event that affects people. Control: ·Include more than one group – history is more problematic when the study includes only one group and the event occurs at the same time that the independent variable is applied. ·Randomly assign subjects to groups
To control for the effects of testing, the experimenter would: a. Administer the DV measure only once b. Design measure in a way that minimizes memory and practice effects c. Include at least two groups in study d. all of the above
d. all of the above Testing – exposure to a test might alter the subjects’ performance on subsequent tests (tests that are readministered) Control: ·Administer the DV measure only once ·Design measure in a way that minimizes memory and practice effects ·Include at least two groups in study
To control for the effects of instrumentation, the experimenter could: a. Include more than one group in study b. use the same measuring devices and procedures with all subjects c. both a. and b.
c. both a. and b. Instrumentation – changes in the accuracy or sensitivity of measuring devices or procedures. (e.g., rater’s accuracy improves over time) Control: ·Include more than one group in study ·Ensure that all groups are subject to the same instrumentation effects by using the same measuring devices and procedures with all subjects
To control for statistical regression, you could: a. include only extreme scores b. NOT include extreme scores c. include another group that consists of subjects who are similarly extreme. d. Either b. or c.
d. Either b. or c. Statistical Regression – tendency of extreme scores on a measure to “regress” or move toward the mean when the measure is readministered to the same group of people. Statistical regression threatens a study’s internal validity whenever subjects have been selected because of their extreme status on the dependent variable. Control: ·NOT including only extreme scorers in the study ·Include more than one group and ensure that all groups consist of subjects who are similarly extreme.
To control for selection effects, you could: a. Randomly assigning subjects to groups b. Administering pretest to subjects to determine if the groups differ initially with regard to the DV c. either a. or b.
c. either a. or b. Selection – method used to assign subjects results in systematic differences between the groups at the beginning of the study (e.g., forced to use intact groups). Use of the term selection here is somewhat misleading as we are really talking about “assignment”. Control: ·Randomly assigning subjects to groups, or when not possible, ·Administering pretest to subjects to determine if the groups differ initially with regard to the DV
True or false. Pretests can help with attrition because they can determine if dropouts and non-dropouts differ with regard to their initial status on the DV.
True. Attrition (Mortality) – subjects who drop out of one group differ in an important way from subjects who drop out of other groups. Control: ·Pretest can help determine if dropouts and non-dropouts differ with regard to their initial status on the DV.
When blocking is used to control an extraneous variable, the extraneous variable is treated as an independent variable, and its effects on the _____ variable are statistically analyzed.
dependent; blocking is a method used to control an extraneous variable when an investigator wants to statistically analyze its main and interaction effects on the DV. Involves blocking (grouping) subjects with regard to their status on the extraneous variable and then randomly assigning subjects in each block to one of the treatment groups.
In a research study, variability in the dependent variable that is attributable to the _____ variable is referred to as "experimental variability."
independent variable
External validity refers to the _____ of the results of a research study.
generalizability
The random selection of subjects for a research study is most useful for maximizing a study's _____ validity, while random assignment of subjects to treatment groups is most important for ensuring that the study has adequate _____ validity.
external; internal
When using the analysis of covariance, the "covariate": a. is treated as an independent variable b. is an extraneous variable c. is the dependent variable
b. is an extraneous variable; Statistical Control of the Extraneous Variable ANCOVA (Analysis of Covariance) or other statistical technique to remove variability (equalizing all subjects with regard to their status on that variable) in the DV.
Extraneous variables: a. correlate with the DV b. correlate with the IV c. correlate with the DV and the IV
a. correlate with the DV
Which of the following is an example of demand characteristics: a. an experimenter double-checks his data whenever it doesn't conform to the research hypothesis b. subjects alter their behaviors in ways that help them avoid negative evaluations by the experiment c. subtle cues in the environment communicate to subjects what behaviors are expected of them
c. demand characteristics are cues in the experimental situation that inform research participants of how they are expected to behave during the course of the study. Demand characteristics can threaten a study's internal and external validity.
Counterbalancing is used to control: a. order effects b. statistical regression c. demand characteristics
a. order effects; Multiple Treatment Interference (Order Effects, Carryover Effects) – when a study involves exposing each subject to two or more levels of an independent variable (i.e., when the study utilizes within-subjects design) the effects of one level of the independent variable can be affected by previous exposure to another level. Control: ·Counterbalanced design – different subjects (or groups of subjects) receive the levels of the IV in a different order (e.g., Latin Square Design).
A psychologist evaluates the effects of a 15-month training program on the conservation skills of preoperational children by administering a measure of conservation to the same group of children before and at the end of the training. The psychologist finds that a significantly greater number of children conserve after the program than before. The biggest threat to this study's internal validity is: a. maturation b. history c. selection
a. Maturation reflects changes that occur within subjects as the result of the passage of time.
Dr. Dogood includes only students who have very low GPAs in her study that is designed to test the hypothesis that a motivational training course will improve academic achievement. The biggest threat to this study's internal validity is: a. reactivity b. statistical regression c. maturation
b. Statistical Regression – tendency of extreme scores on a measure to “regress” or move toward the mean when the measure is readministered to the same group of people. Statistical regression threatens a study’s internal validity whenever subjects have been selected because of their extreme status on the dependent variable. Control: ·NOT including only extreme scorers in the study ·Include more than one group and ensure that all groups consist of subjects who are similarly extreme.
An experimenter compares the effects of three different diets on weight loss by assessing overweight subjects either to Diet A, Diet B, or Diet C and then determining each subject's weight one week, six weeks, and three months after beginning the diet. This study is an example of which type of research design: a. between groups b. within subjects c. mixed
c. mixed designs are research designs in which both between-groups and within-subjects comparisons can be made.
When a study has both main and interaction effects: a. the main effects take precedence over the interaction effects b. the main effects should be interpreted in light of the interaction effects c. the main and interaction effects should be interpreted seperately
b.
Use of an ABAB design involves: a. applying two different treatments to subjects at two different times b. applying a single treatment to subjects at two different times c. applying a single treatment to two different behaviors
b.
The single-subject AB design is most similar to which of the following group designs: a. counterbalanced b. one-group time-series c. factorial
b.
A factorial design includes two or more _____.
independent variables; the name given to any research design that includes two or more "factors" (IVs). Factorial designs permit analysis of main and interaction effects. (An interaction occurs when the impact of one IV differs at different levels of another variable.)
When using a multiple baseline design, a treatment is _____ applied to the different baselines.
sequentially; A single-subject design that involves sequentially applying a treatment to different "baselines" (e.g., to different behaviors, settings, or subjects). Useful when a reversal design would be impractical or unethical.
Single-subject research designs always include at least one _____ (A) phase and at least one _____ (B) phase.
baseline (no treatment); treatment
An experimenter conducts a study to investigate the effects of task complexity and motivation on performance and obtains the following mean scores on a measure of task performance:
                 High Motivation   Low Motivation
Simple Task            50                35
Complex Task           10                25
Based on these data (and assuming that there are the same number of subjects in each group), you can tentatively conclude that there is: a. a main effect of task complexity b. a main effect of task complexity and a main effect of motivation c. a main effect of task complexity and an interaction
c.
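One way to verify this answer is to compare the marginal (row and column) means of the cells shown in the question; a short sketch:

```python
# Cell means from the card: rows = task complexity, columns = motivation level
cells = {"simple":  {"high": 50, "low": 35},
         "complex": {"high": 10, "low": 25}}

# Task-complexity marginal means differ (42.5 vs. 17.5) -> main effect of complexity
simple_mean  = (cells["simple"]["high"]  + cells["simple"]["low"])  / 2
complex_mean = (cells["complex"]["high"] + cells["complex"]["low"]) / 2

# Motivation marginal means are equal (30 vs. 30) -> no main effect of motivation
high_mean = (cells["simple"]["high"] + cells["complex"]["high"]) / 2
low_mean  = (cells["simple"]["low"]  + cells["complex"]["low"])  / 2

# The effect of motivation reverses across tasks (+15 vs. -15) -> an interaction
print(simple_mean, complex_mean, high_mean, low_mean)  # 42.5 17.5 30.0 30.0
```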
WAIS scores are measured on a: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
c.
Minutes to complete a task are measured on a: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
d. ratio scale
Number of siblings consists of a: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
d.
Ranking of peers in terms of popularity would occur on which of the following scales: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
b.
College level (freshman, sophomore, junior, senior) would occur on which of the following scale: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
b.
Athletes' numbers would occur on which of the following scales: a. nominal scale b. ordinal scale c. interval scale d. ratio scale
a.
The assumption of equal intervals between successive points on a measurement scale is characteristic of: a. ordinal, interval, and ratio scales b. interval and ratio scales c. ratio scales only
b.
When using a(n) _____ scale of measurement, a score of zero indicates that the person has none of the characteristics being measured: a. ordinal, interval, or ratio b. interval or ratio c. ratio
c.
When a study's dependent variable is measured on a(n) _____ scale, a researcher does not have scale values or scores to analyze but can only compare frequencies.
nominal
If you can conclude, on the basis of their test scores, that Keisha has twice as much of a characteristic as Kali, the test scores represent a(n) _____ scale of measurement.
ratio
Which of the following describes the relationship between the variance and the standard deviation: a. the variance is twice the size of the standard deviation. b. the variance is the square root of the standard deviation c. the variance is the square of the standard deviation
c.
A teacher administers a test of reading achievement to a 4th grade class. An inspection of the distribution of scores indicates that there are very few high scores but many low scores. If the teacher is most interested in impressing the administration with how well her students are doing, she will report which of the following: a. mean b. median c. mode
a.mean. The outcome for the teacher's class' scores indicates a positively skewed distribution (most of the scores are on the low end). In positively skewed distributions, the mean is greater than the median, which, in turn, is greater than the mode. Pos skew = mo, md, m, (from lowest to highest) Neg skew = m, md, mo (from lowest to highest)
The test scores of a group of 35 students are fairly evenly distributed throughout the range of possible scores. The distribution is best described as: a. mesokurtic b. platykurtic c. leptokurtic
b. platykurtic refers to a "flatter" distribution. "Kurtosis" refers to the relative peakedness (height or flatness) of a distribution: when a distribution is more "peaked" than the normal distribution, it is referred to as "leptokurtic"; when a distribution is flatter, it is called "platykurtic" and a normal curve is "mesokurtic".
If the dependent variable in a research study is college major, the _____ is the appropriate measure of central tendency.
mode
In a normal distribution, approximately ____ % of observations fall between the scores that are plus and minus one standard deviation from the mean.
68
In a _____ skewed distribution, the median is greater (has a higher value) than the mean.
negatively
In the population, an IQ test has a mean of 100 and a standard deviation of 12, and scores on the test are normally distributed. Consequently, it is possible to conclude that about _____% of people have scores between 76 and 124.
95%
In a normal distribution, _____ % of scores fall below the mean and about _____ % of scores fall below the score that is one standard deviation above the mean.
50; 84
A reading test is to be used to select students whose scores are in the bottom 16% in order to provide them with appropriate remedial instruction. If scores are normally distributed, and the distribution's mean is 100 and its standard deviation is 10, the cutoff score should be set at _____.
90 (one standard deviation below the mean)
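The same cutoff can be checked numerically; a sketch assuming scipy is available:

```python
from scipy.stats import norm

# Score below which 16% of a normal distribution with mean 100 and SD 10 falls
cutoff = norm.ppf(0.16, loc=100, scale=10)
print(round(cutoff, 1))  # about 90.1, i.e., roughly one SD below the mean
```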
True or False. The probability of incorrectly retaining the null hypothesis is equal to alpha.
False. This describes beta (type II error) in which a false null hypothesis is retained.
True or False. Power refers to the probability of correctly rejecting a false null hypothesis.
True.
The standard error of the mean is the _____ of the sampling distribution of the mean.
standard deviation
In the population, a test has a mean of 150 and standard deviation of 25. If the research study in which the test will serve as the DV measure includes 100 subjects, the standard error of the mean is equal to _____.
2.5
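The arithmetic behind this answer, as a brief sketch:

```python
import math

population_sd = 25
sample_size = 100

# Standard error of the mean = population SD / square root of N
standard_error_of_mean = population_sd / math.sqrt(sample_size)
print(standard_error_of_mean)  # 2.5
```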
When alpha is increased from .01 to .05, the probability of making a Type II error _____ and power _____.
decreases; increases
Based on the results of his study, a researcher concludes that a workshop did not improve participants' test performance when, in fact, the workshop did improve performance but the improvement was not detected due to the unreliability of the dependent variable measure. The researcher has made a Type _____ error.
II
According to the Central Limit Theorem, the shape of the sampling distribution of means: a. is normal only when the population distribution is normal b. approaches normal as the number of samples increases regardless of the shape of the population distribution c. approaches normal as the size of the sample increases regardless of the shape of the population distribution
c.
A researcher has analyzed the difference between mean posttest scores of experimental and control group subjects. The researcher will be MOST confident that his decision to reject the null hypothesis is correct if the results of his study are significant at: a. the .10 level b. the .05 level c. the .01 level
c.
Less power means:
a. an increased chance of retaining a false null hypothesis
b. a decreased chance of rejecting a true null hypothesis
c. an increased chance of achieving statistical significance
a.
The mean of a theoretical sampling distribution of the mean is equal to: a. zero b. the sample mean c. the population mean
c.
A researcher conducts a study to test the hypothesis that level of conflict (high, moderate, or low) and participation in a communication skills workshop (yes or no) predicts a couple's relationship status one year later (together or separated). This study has: a. one IV and two nominal DVs b. two IVs and one ratio DV c. two IVs and one nominal DV
c.
The researcher in the above study would use which of the following tests to analyze the data she has collected: a. Student's t-test b. chi-square test c. analysis of variance
b.
Dr. V. T. Min is interested in comparing the reaction time (in seconds) of two groups of men. One group has been placed on a nutritional supplement for six months; the other group has received no supplement. This study has: a. one IV and one ratio DV b. one IV and one nominal DV c. one variable
a.
The appropriate statistical test for the data Dr. Min has collected is: a. multiple-sample chi-square test b. Mann-Whitney U Test c. t-test for independent samples
c.
An experimenter wants to assess the effectiveness of a training course for improving SAT scores by comparing the pretest and posttest scores of a group of high school students. To analyze the data obtained in this study, the experimenter should use which statistical test: a. two-way ANOVA b. t-test for single samples c. t-test for related samples
c. t-test for related samples
You have collected scores on a measure of cognitive functioning from patients who have and have not received a diagnosis of schizophrenia and whose families have been classified as either high, moderate, or low in "expressed emotion." To analyze the data you have collected, you will use the: a. factorial ANOVA b. one-way ANOVA c. multiple-sample chi-square test
a. factorial ANOVA
The Wilcoxon test can be considered a "nonparametric alternative" to the t-test for _____ .
correlated (related) samples
A researcher will use trend analysis when her study's _____ is quantitative.
independent variable
A psychologist uses a t-test to analyze the data he has obtained from a single-group pretest-posttest design that included 29 subjects. The degrees of freedom are _____.
29 - 1 = 28
The numerator of the F-ratio is a measure of variability due to _____ and the denominator is a measure of variability due to _____ .
treatment and error; error
Parametric and nonparametric tests share in common which of the following assumptions: a. random assignment of samples to groups b. random selection of the sample from the population c. a normally-shaped distribution of DV scores in the population.
b. random selection of the sample from the population
Dr. Frugal decides to use the MANOVA rather than separate ANOVAs to analyze the data he has collected. Most likely, this is because Dr. Frugal wants to:
a. statistically remove the effects of systematic error
b. statistically analyze both main and interaction effects
c. increase statistical power
c. increase statistical power
A chi-square test would not be the appropriate statistical test in which of the following situations: a. the population distribution is non-normal b. subjects can appear in more than one category c. a quasi-experimental design has been used
b. subjects can appear in more than one category
An experimenter would decide to use a one-way ANOVA instead of separate t-tests to analyze the data she has collected in a study involving one IV with four levels because: a. she wants to reduce the Type I error rate. b. she wants to control systematic error c. her study includes more than one DV
a. she wants to reduce the Type I error rate.
A _____ (positive/negative) correlation indicates that people scoring low on one variable tend to obtain high scores on another variable.
negative
When both variables are reported in terms of ranks, the appropriate correlation coefficient is the _____.
Spearman rho
Which of the following would be the best correlation coefficient when x is cigarette use (smoker vs. non-smoker) and Y is the number of car accidents: a. Spearman rho b. point biserial c. contingency
b. point biserial
The "least squares criterion" is used to: a. determine the optimal location for the "line of best fit" b. statistically "partial out" the effects of a third variable c. identify the criterion group that an examinee most closely resembles.
a. determine the optimal location for the "line of best fit"
To measure the degree of association between two variables when their relationship is known to be curvilinear, you should use: a. eta b. phi c. biserial
a. eta
A correlation of 0 between X and Y is suggested by a scattergram when: a. the variability of Y scores is the same at all values of X b. the variability of Y scores at all values of X is equal to the total variabilty of Y scores c. the variability of Y scores is less than the variablity of X scores at all values of X
b. the variability of Y scores at all values of X is equal to the total variabilty of Y scores
Multicollinearity: a. increases the probability that a correlation coefficient will be statistically significant b. refers to high correlations between predictors and is a problem in multiple regression c. refers to high correlations between each predictor and the criterion and is desirable in multiple regression
b. refers to high correlations between predictors and is a problem in multiple regression
A psychologist wants to use attitude toward the company, years of experience, and need for achievement to predict whether a job applicant is likely to be a "successful manager" or an "unsuccessful manager." The psychologist knows there is a nonlinear relationship between need for achievement and success, with a moderate need for achievement being characteristic of successful managers and a low and high need for achievement being more characteristic of unsuccessful managers. The correct multivariate technique for this situation is: a. canonical correlation b. discriminant analysis c. logistic regression
c. logistic regression
Path analysis is used to: a. test a theory of causal order among a set of variables b. develop a causal model involving multiple variables c. identify causal antecedents
a. test a theory of causal order among a set of variables
If the correlation bewteen X and Y is .70 this means that _____ percent of the variability in Y is explained by variablity in X.
49
If the Pearson r is used to correlate two variables that have a curvilinear relationship, the correlation coefficient is likely to _____ (overestimate/underestimate) their true relationship.
underestimate
For your original sample, R-squared equals .64. When you cross-validate on another sample, R-squared is likely to _____ .
be smaller (less than .64)
A school psychologist wants to determine if there is a significant difference in reading readiness scores between male and female students in the school's preschool program. She obtains scores on a standardized reading readiness test for 17 girls and 13 boys. Which statistical test will be most appropriate for determining if there is a significant difference between the scores obtained by boys and girls: a. two-way ANOVA b. Student's t-test c. Kolmogorov test d. chi-square test for contingency tables
b. A t-test (a.k.a. Student's t-test) is used to compare the mean scores obtained by two groups. a. INCORRECT - the two-way ANOVA is used when a study involves two independent variables. In this study, there is one IV (gender). c. INCORRECT - the Kolmogorov test is used with a single sample and ordinal data d. INCORRECT - the chi-square test is used to analyze frequency (nominal) data.
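Note: a minimal sketch of the analysis with hypothetical reading readiness scores, using scipy's ttest_ind.

# Independent-samples t-test comparing two groups (made-up scores)
from scipy.stats import ttest_ind

girls = [88, 92, 85, 90, 79, 95, 84, 91, 87, 93, 80, 89, 86, 94, 82, 90, 88]  # n = 17
boys = [83, 78, 90, 85, 80, 88, 76, 84, 82, 87, 79, 86, 81]                    # n = 13

t_stat, p_value = ttest_ind(girls, boys)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")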
A multiple regression equation yields a predicted criterion score for an examinee based on the examinee's scores on the predictors included in a test battery. When computing a multiple regression equation, each test is weighted: a. in direct proportion to its correlation with the criterion and in inverse proportion to its correlation with the other predictors in the test battery b. in inverse proportion to its correlation with the criterion and in direct proportion to its correlation with the other predictors in the test battery c. in direct proportion to its correlation with the criterion and with the other predictors in the test battery d. in inverse proportion to its correlation with the criterion and with the other predictors in the test battery.
a. by computing the multiple regression equation so that each test is weighted in direct proportion to its correlation with the criterion and in inverse proportion to its correlation with other tests, the test with the highest criterion-related validity and the least amount of overlap (correlation) with the other tests will be given the largest weight, while the test with the lowest criterion-related validity and the most overlap with other tests will be given the smallest weight.
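Note: the sketch below illustrates the weighting with simulated, standardized data; the predictor names, effect sizes, and sample size are arbitrary.

# Standardized regression weights from a two-predictor battery (simulated data)
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)                        # predictor 1
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)  # predictor 2, correlated with x1
y = 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)   # criterion

standardize = lambda v: (v - v.mean()) / v.std()  # so the weights are comparable
X = np.column_stack([standardize(x1), standardize(x2)])
beta, *_ = np.linalg.lstsq(X, standardize(y), rcond=None)
print(beta)  # the larger weight goes to the predictor with higher validity and less overlap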
During the course of the data analysis, a researcher more often double-checks results that seem to conflict with her hypothesis than results that confirm it. This is an example of: a. the experimenter expectancy effect b. demand characteristics c. the Pygmalion Effect d. a correspondence bias
a. Experimenter expectancy (bias) occurs when the experimenter's behavior biases the research results in some (usually unconscious) way so that the results are consistent with the research hypothesis. b. INCORRECT - Demand characteristics are cues in the research situation that communicate to subjects what behaviors are expected of them. Experimenter expectancies can act as a source of demand characteristics (although that wouldn't be the case in this situation). c. INCORRECT - The Pygmalion Effect (aka the self-fulfilling prophecy or Rosenthal effect) occurs when a person's expectations about another individual actually produce subtle changes in the individual's behavior so that the behavior conforms to the person's expectations. d. INCORRECT - Correspondence bias is another name for the fundamental attribution bias, which is the tendency for observers to attribute another person's behavior to dispositional (rather than situational) factors.
If your statistical test has low "power," this also means that:
a. there is low probability of making a Type II error
b. there is high probability of making a Type I error
c. you will not likely obtain statistically significant results
d. you have set the level of significance too high
c. although it will be more difficult to reject the null hypothesis, it would not be impossible. a. INCORRECT - power is equal to one minus beta, where beta is equal to the probability of making a Type II error (of retaining a false null hypothesis). Thus, there is an inverse relationship between power and a Type II error, and if there is low power, there's a high probability that a Type II error will be made. b. INCORRECT - The easier it is to make a Type II error, the more difficult it is to make a Type I error. As noted above, when there is low power, there is a high probability of making a Type II error. Consequently, there is a low probability of making a Type I error. d. INCORRECT - One way to increase power is to increase the level of significance since this has the effect of increasing the rejection region. Thus, if you have low power, you are more likely to have set the level of significance too low.
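Note: power can be estimated before a study is run; the sketch below assumes an independent-samples t-test, a hypothetical medium effect size and sample size, and statsmodels' TTestIndPower.

# Estimated power of an independent-samples t-test (hypothetical inputs)
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"Power = {power:.2f}")  # with low power, significant results are unlikely even if a true effect exists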
When conducting a research study, you want to ensure that you will detect a difference between the treatment group and the control group. Therefore you will:
a. decrease error variance by decreasing the magnitude of the independent variable
b. increase experimental variance by controlling the effects of extraneous variables
c. increase experimental variance by increasing the magnitude of the IV
d. decrease the probability of making a Type II error (retaining a false null hypothesis) by increasing beta
c. Increasing the magnitude of the IV increases experimental (between-groups) variance, which makes a true difference between the treatment and control groups easier to detect.
How do you determine Relevance in testing?
CONTENT APPROPRIATENESS, EXTRANEOUS ABILITIES (to what extent does the item require knowledge, skills, or abilities outside the domain of interest), AND TAXONOMIC LEVEL (item reflects the appropriate cognitive or ability level)
WHAT ARE THE SHORTCOMINGS OF THE CLASSICAL TEST THEORY?
1. Item & Test Parameters are SAMPLE-DEPENDENT; that is, the ITEM DIFFICULTY INDEX and the RELIABILITY COEFFICIENT are likely to VARY from sample to sample. 2. It's DIFFICULT to EQUATE SCORES OBTAINED on different tests that have been developed on the basis of CLASSICAL TEST THEORY, e.g., a score of 50 on a math test does not equate to a score of 50 on an English test.
ITEM RESPONSE THEORY (IRT) VS. CLASSICAL TEST THEORY
IRT ("LATENT TRAIT APPROACH") ADVANTAGES: 1. The item characteristics (parameters) are SAMPLE INVARIANT (the SAME accross different samples). 2. MEASURES SPECIFIC THINGS FOR EXAMPLE AN EXAMINEE'S LEVEL ON A TRAIT BEING MEASURED RATHER THAN JUST A TOTAL SCORE, IT IS POSSIBLE TO EQUATE SCORES FROM DIFFERENT SETS OF ITEMS AND FROM DIFFERENT TESTS. (MEASURES AN INDIVIDUAL'S: status on a latent trait or ability) 3. EASIER TO DEVELOP COMPUTER-ADAPTIVE TETS, IN WHICH THE ADMINISTRATION OF SUBSEQUENT ITEMS IS BASED ON THE EXAMINEE'S PERFORMANCE ON PREVIOUS ITEMS.
ITEM CHARACTERISTIC CURVE (ICC)
-Provides information on the relationship between an EXAMINEE'S LEVEL on the ABILITY OR TRAIT measured by the test and the PROBABILITY that he or she will RESPOND to the item correctly. The ICC INDICATES: -THE DIFFICULTY LEVEL (position of the curve) -DISCRIMINATION (steepness of slope) -PROBABILITY OF GUESSING CORRECTLY (point at which the curve intercepts the vertical axis)
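Note: a minimal sketch of a three-parameter logistic ICC with illustrative parameter values (a = discrimination, b = difficulty, c = guessing); the specific numbers are arbitrary.

# Probability of a correct response as a function of trait level (3PL model)
import numpy as np

def icc(theta, a=1.2, b=0.0, c=0.2):
    """P(correct) given trait level theta, with illustrative a, b, c values."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(float(icc(theta)), 2))  # probability rises with trait level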
METHODS FOR ASSESSING RELIABILITY
1. TEST-RETEST reliability (Coefficient of STABILITY) 2. ALTERNATE/PARALLEL FORMS reliability (Coefficient of EQUIVALENCE) 3. INTERNAL CONSISTENCY reliability (Coefficient of INTERNAL CONSISTENCY) -SPLIT-HALF (SPEARMAN-BROWN) -COEFFICIENT ALPHA (KR-20) (remember that split-half and other forms of internal consistency reliability are generally not used for speed tests because they produce a spuriously high reliability coefficient) 4. INTER-RATER (KAPPA STATISTIC)
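Note: a small sketch of coefficient alpha computed by hand from a hypothetical examinee-by-item matrix; alpha = k/(k-1) x [1 - (sum of item variances / total score variance)], and for dichotomously scored items like these it equals KR-20.

# Coefficient alpha for a hypothetical 5-examinee x 5-item score matrix
import numpy as np

scores = np.array([
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
])  # rows = examinees, columns = items (made-up data)

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1).sum()
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"alpha = {alpha:.2f}")  # about .65 for these made-up data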
ALTERNATIVE (EQUIVALENT, PARALLEL) FORMS RELIABILITY
-ASSESSES CONSISTENCY OF RESPONDING TO DIFFERENT ITEM SAMPLES (e.g., different test forms) -ALTERNATIVE FORMS RELIABILITY ENTAILS ADMINISTERING 2 FORMS OF THE TEST TO THE SAME GROUP OF EXAMINEES AND CORRELATING THE 2 SETS OF SCORES. -CONSIDERED THE MOST RIGOROUS AND THOROUGH METHOD FOR ESTIMATING RELIABILITY -AKA: Coefficient of Equivalence
FACTORS THAT AFFECT THE RELIABILITY COEFFICIENT
1. TEST LENGTH: THE LONGER THE TEST, THE LARGER THE TEST'S RELIABILITY COEFFICIENT. 2. RANGE OF TEST SCORES: AN UNRESTRICTED RANGE IS IDEAL (a heterogeneous group of examinees is also ideal, as is an item difficulty level in the mid-range, p = .50). 3. GUESSING: AS TEST TAKERS' PROBABILITY OF GUESSING CORRECTLY INCREASES, THE RELIABILITY COEFFICIENT DECREASES.
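Note: the effect of test length can be projected with the Spearman-Brown prophecy formula; the sketch below assumes a starting reliability of .60 and doubling/tripling of test length.

# Spearman-Brown prophecy formula: projected reliability when a test is lengthened
def spearman_brown(r_xx, length_factor):
    return (length_factor * r_xx) / (1 + (length_factor - 1) * r_xx)

for n in (1, 2, 3):
    print(n, round(spearman_brown(0.60, n), 2))  # 0.60 -> 0.75 -> 0.82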
METHOD FOR ASSESSING CONSTRUCT VALIDITY: CONVERGENT VALIDITY
HIGH CORRELATIONS WITH MEASURES OF THE SAME CHARACTERISTIC/TRAIT. Note: Evidence of Convergent Validity is found when the MONOTRAIT-HETEROMETHOD COEFFICIENT IS LARGE.
DISCRIMINANT (DIVERGENT) VALIDITY
LOW CORRELATIONS WITH MEASURES OF UNRELATED CHARACTERISTICS/TRAITS. Note: Evidence of Discriminant Validity is found when the Heterotrait-Monomethod Coefficient is SMALL.
What is the MULTITRAIT-MULTIMETHOD MATRIX used for?
-Assess a test's CONVERGENT & DISCRIMINANT validity. -4 types of correlation coefficients.
Is FACE VALIDITY (FV) real? What happens if there is no FV?
NO. Face validity is not an actual type of validity. But if a test lacks face validity, examinees may not answer honestly.
What is CONSTRUCT VALIDITY?
When a test measures the HYPOTHETICAL TRAIT or CONSTRUCT it is supposed to measure.
What are the various ways to establish CONSTRUCT VALIDITY?
1. Assessing the test's INTERNAL CONSISTENCY 2. STUDYING GROUP DIFFERENCES 3. CONDUCTING RESEARCH to TEST HYPOTHESES about the construct 4. Assessing the test's CONVERGENT & DISCRIMINANT VALIDITY 5. Assessing the test's FACTORIAL VALIDITY
Describe CONVERGENT VALIDITY.
HIGH CORRELATIONS with measures of the SAME TRAIT.
HOW DO YOU ASSESS CONVERGENT AND DISCRIMINANT VALIDITY?
MULTITRAIT-MULTIMETHOD MATRIX
MONOTRAIT-MONOMETHOD COEFFICIENTS
~Indicates the correlation between a MEASURE & ITSELF (i.e., the test's reliability coefficient) ~SHOULD BE LARGE
MONOTRAIT-HETEROMETHOD COEFFICIENTS
~Indicates the correlation between DIFFERENT MEASURES of the SAME TRAIT ~Provides evidence of *CONVERGENT VALIDITY ~SHOULD BE LARGE
HETEROTRAIT-MONOMETHOD COEFFICIENTS
~The correlation between DIFFERENT TRAITS that have been measured by the SAME METHOD ~Provides evidence of DISCRIMINANT VALIDITY ~NEEDS TO BE SMALL
HETEROTRAIT-HETEROMETHOD COEFFICIENTS
~Indicates the correlation between DIFFERENT TRAITS that have been measured by DIFFERENT METHODS ~Provides evidence of DISCRIMINANT VALIDITY ~NEEDS TO BE SMALL
DISCRIMINANT VALIDITY
SMALL HETEROTRAIT-MONOMETHOD AND SMALL HETEROTRAIT-HETEROMETHOD COEFFICIENTS
5 STEPS OF FACTOR ANALYSIS
1. ADMINISTER SEVERAL TESTS TO A GROUP OF EXAMINEES. 2. CORRELATE SCORES ON EACH TEST WITH SCORES ON EVERY OTHER TEST TO OBTAIN A CORRELATION (R) MATRIX. 3. USING ONE OF SEVERAL AVAILABLE FACTOR ANALYTIC TECHNIQUES, CONVERT THE CORRELATION MATRIX TO A FACTOR MATRIX. 4. SIMPLIFY THE INTERPRETATION OF THE FACTORS BY "ROTATING" THEM. 5. INTERPRET AND NAME THE FACTORS IN THE ROTATED FACTOR MATRIX.
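Note: a compressed illustration of steps 1-4 with simulated test scores; scikit-learn's FactorAnalysis is used here only as one convenient implementation (the varimax rotation argument assumes a recent scikit-learn version), and the two-factor structure is built into the fake data.

# Two verbal tests and two quantitative tests should load on separate factors
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
verbal = rng.normal(size=(100, 1))
quant = rng.normal(size=(100, 1))
tests = np.hstack([
    verbal + rng.normal(scale=0.5, size=(100, 2)),  # two verbal tests
    quant + rng.normal(scale=0.5, size=(100, 2)),   # two quantitative tests
])

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(tests)
print(np.round(fa.components_.T, 2))  # approximate loadings: rows = tests, columns = factors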
FACTOR LOADINGS
-Correlation Coefficients that indicate the DEGREE of ASSOCIATION between each TEST & EACH FACTOR. -A factor loading can be interpreted by SQUARING it to obtain a measure of shared variability. When the factor loading is .70, this means that 49% (.70 squared) of the variability in the test is accounted for by the factor.
WHEN YOU SQUARE A FACTOR LOADING WHAT TYPE OF MEASURE DOES IT PROVIDE?
A MEASURE OF "SHARED VARIABILITY" SQUARING A FACTOR LOADING
2 TYPES OF CRITERION-RELATED VALIDITY
1. CONCURRENT VALIDITY 2. PREDICTIVE VALIDITY
2 TYPES OF CONSTRUCT VALIDITY
1. CONVERGENT VALIDITY 2. DISCRIMINANT (DIVERGENT) VALIDITY
CONCURRENT VALIDITY
WHEN CRITERION DATA ARE COLLECTED PRIOR TO OR AT ABOUT THE SAME TIME AS DATA ON THE PREDICTOR, THE PREDICTOR'S CONCURRENT VALIDITY IS BEING ASSESSED. Concurrent validity would be how the WISC correlates with the WJ Cog., the Stanford-Binet, etc. Predictive validity would be how the WISC correlates with academic achievement (e.g., WIAT, WRAT), job performance, etc.
When is PREDICTIVE VALIDITY EVALUATED?
When the criterion is measured some time AFTER the predictor has been administered.
STANDARD ERROR OF ESTIMATE
USED TO CONSTRUCT A CONFIDENCE INTERVAL AROUND AN ESTIMATED (PREDICTED) SCORE.
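Note: the standard error of estimate equals the SD of the criterion times the square root of (1 minus the squared validity coefficient); a small sketch with hypothetical values follows.

# 95% confidence interval around a predicted score (hypothetical SD, r, and prediction)
import math

sd_y, r_xy, predicted = 10, 0.60, 75
se_est = sd_y * math.sqrt(1 - r_xy ** 2)          # standard error of estimate = 8.0 here
lower, upper = predicted - 1.96 * se_est, predicted + 1.96 * se_est
print(f"SEest = {se_est:.1f}; 95% CI = {lower:.1f} to {upper:.1f}")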
WHY WOULD YOU WANT INCREMENTAL VALIDITY? WHAT DO YOU NEED TO ESTIMATE A TEST'S INCREMENTAL VALIDITY?
-IT INCREASES DECISION-MAKING ACCURACY. -The selection ratio, base rate, and validity coefficient are used to estimate a test's incremental validity using the Taylor-Russell tables. INCREMENTAL VALIDITY = POSITIVE HIT RATE - BASE RATE
HOW DO YOU ESTIMATE INCREMENTAL VALIDITY?
INCREMENTAL VALIDITY = POSITIVE HIT RATE - BASE RATE
What is the BASE RATE FORMULA?
BASE RATE = (TRUE POSITIVES + FALSE NEGATIVES) / TOTAL NUMBER OF PEOPLE
What is the + HIT RATE FORMULA?
POSITIVE HIT RATE = TRUE POSITIVES / TOTAL NUMBER OF POSITIVES (i.e., everyone identified as positive by the predictor)
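Note: a worked example tying the three formulas above together, using a hypothetical 2 x 2 decision table (predictor cutoff vs. actual criterion status).

# Base rate, positive hit rate, and incremental validity from hypothetical cell counts
true_pos, false_pos, false_neg, true_neg = 30, 10, 20, 40
total = true_pos + false_pos + false_neg + true_neg

base_rate = (true_pos + false_neg) / total        # proportion of successes without the test
pos_hit_rate = true_pos / (true_pos + false_pos)  # proportion of successes among those the test selects
incremental_validity = pos_hit_rate - base_rate

print(base_rate, round(pos_hit_rate, 2), round(incremental_validity, 2))  # 0.5, 0.75, 0.25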
What is the PREDICTOR?
- (X, IV) - DETERMINES IF A PERSON IS IDENTIFIED AS POSITIVE OR NEGATIVE
What is the CRITERION ?
- (Y, DV) - DETERMINES IF HE/SHE IS A "TRUE" OR A "FALSE" POSITIVE/NEGATIVE
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
LOW RELIABILITY = the test CANNOT have a high degree of content, construct, or criterion-related validity. HIGH RELIABILITY = does NOT guarantee validity.
CROSS-VALIDATION
THE CROSS-VALIDATION COEFFICIENT TENDS TO "SHRINK" OR BE SMALLER THAN THE ORIGINAL COEFFICIENT. THE SMALLER THE INITIAL VALIDATION SAMPLE, THE GREATER THE SHRINKAGE OF THE VALIDITY COEFFICIENT WHEN THE PREDICTOR IS CROSS-VALIDATED.
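Note: a simulated illustration of shrinkage: fit a regression on one sample, then score it on a new sample (the data, sample size, and number of predictors below are arbitrary).

# R-squared on the derivation sample vs. a cross-validation sample
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 8))                  # small sample, several predictors
y = 0.4 * X[:, 0] + rng.normal(size=60)

X_new = rng.normal(size=(60, 8))              # independent cross-validation sample
y_new = 0.4 * X_new[:, 0] + rng.normal(size=60)

model = LinearRegression().fit(X, y)
print("Original R^2:        ", round(model.score(X, y), 2))
print("Cross-validated R^2: ", round(model.score(X_new, y_new), 2))  # typically smaller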
STANDARD SCORES
AN EXAMINEE'S POSITION IN THE NORMATIVE SAMPLE IN TERMS OF STANDARD DEVIATIONS FROM THE MEAN. Z-SCORES: MEAN = 0, SD = 1; T-SCORES: MEAN = 50, SD = 10 (e.g., about 98% [97.7%] of scores fall below the score that is two standard deviations above the mean).
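Note: a quick sketch converting a hypothetical raw score (assumed norms: mean = 100, SD = 15) to a z-score and T-score, and checking the percentile for z = +2.

# Raw score -> z-score -> T-score, plus the area below z = 2 on the normal curve
from scipy.stats import norm

raw, mean, sd = 130, 100, 15
z = (raw - mean) / sd
t = 50 + 10 * z
print(f"z = {z:.1f}, T = {t:.0f}, percentile for z = 2: {norm.cdf(2):.1%}")  # about 97.7%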
PERCENTAGE SCORES
-Indicates how much of the test content(s) the examinee MASTERED. NOTE: When the goal of testing is to determine the amount of content an individual has mastered, criterion-referenced (or content-referenced) scores are most useful.
Lord's chi-square is used to:
-A way to evaluate the DIFFERENTIAL ITEM FUNCTIONING (DIF) of an item included in a test. (DIF AKA ITEM BIAS) -Occurs when one group responds differently to an item than another group even though both groups have similar levels of the latent trait (attribute) measured by the test. -Several statistical techniques are used to evaluate DIF. Lord’s chi-square is one of these techniques.
EIGENVALUES ARE ASSOCIATED WITH
PRINCIPAL COMPONENT ANALYSIS Eigenvalues can be calculated for each component "extracted" in a principal component analysis.
IPSATIVE SCORES
TELLS YOU THE RELATIVE STRENGTHS OF THE DIFFERENT CHARACTERISTICS MEASURED BY A TEST.