Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads the card aloud (text-to-speech);
63 Cards in this Set
- Front
- Back
Goal of multiple regression
|
Explain criterion variance
|
|
Applications of multiple regression
|
1. Assess independent effects of particular predictors controlling for others
2. Compare sets of variables (which combination leads to the best model?)
3. Test causal models
4. Test nonlinear relations |
|
Level of measurement required for regression
|
Criterion must be continuous
Predictors can be continuous, dichotomous, or categorical |
|
Equations
|
Ŷ = a + B1X1 + B2X2
zy = β1(zx1) + β2(zx2) |
|
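The unstandardized equation above can be sketched in a few lines of Python; the intercept and weights below are made-up values for illustration, not from the cards:

```python
# Hypothetical coefficients for illustration only.
a = 2.0               # intercept
B1, B2 = 0.5, -0.3    # unstandardized weights for X1 and X2

def predict(x1, x2):
    """Y-hat = a + B1*X1 + B2*X2"""
    return a + B1 * x1 + B2 * x2

print(predict(4, 2))  # 2.0 + 0.5*4 - 0.3*2 = 3.4
```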
R^2
|
proportion of the DV's variance shared with the IVs
|
|
Sums of squares total
|
total variability in system
SS total = ∑(y - Mean of y)^2 |
|
Sums of squares regression
|
variability accounted for by model
SS regression = ∑(ŷ - Mean of y)^2 |
|
Sums of squares residual
|
variability model could not account for
SS residual = ∑(y - ŷ)^2 |
|
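The three sums of squares and R^2 fit together as SS total = SS regression + SS residual. A minimal sketch on toy data (a single predictor is used here for brevity, but the decomposition is the same in multiple regression):

```python
# Toy data; fit a least-squares line, then decompose the variability.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a0 = my - b * mx
y_hat = [a0 + b * xi for xi in x]          # model predictions

ss_total      = sum((yi - my) ** 2 for yi in y)                   # total variability
ss_regression = sum((yh - my) ** 2 for yh in y_hat)               # explained by model
ss_residual   = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained

r_squared = ss_regression / ss_total       # proportion of y's variance explained
```

For an OLS fit, ss_regression + ss_residual reproduces ss_total exactly.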
Partial correlations
|
Correlation between y and a given predictor (x1) after all other predictors have been partialed out of both x1 and y
|
|
Zero-order correlations
|
bivariate correlation between each predictor and y (x1 and y, x2 and y)
|
|
Semi-partial/part correlations
|
correlation between y and a given predictor (x1) after all other predictors have been partialed out of x1 (but not y). Relationship between all of y and the variance unique to x1.
Squared semi-partials give the proportion of variance in y explained uniquely by each predictor. |
|
Adjusted R2
|
sampling error inflates R2; adjusted R2 corrects for this bias (a function of sample size and number of predictors)
|
|
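One common adjustment formula is 1 - (1 - R^2)(n - 1)/(n - k - 1), which shrinks R^2 more as predictors (k) are added or sample size (n) drops. A quick sketch with made-up values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R^2 = .50 with 30 cases and 5 predictors.
adj = adjusted_r2(0.50, n=30, k=5)   # always below the raw R^2
```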
Standard error of estimate
|
typical magnitude of the residuals; shows how much the model improved predictive accuracy compared to SDy; lower = better
|
|
Beta Weights
|
weights that control for the other predictors/relationships in the system.
Standard error: how much the weight would vary across repeated samples |
|
Suppression
|
variable increases the predictive validity of another variable by its inclusion in the regression equation. X2 suppresses the variance in X1 that is irrelevant to Y. Partial regression coefficients are larger than the zero-order correlation.
|
|
Assumptions of multiple regression
|
•Form of relation is correctly specified (linear relationships)
•All relevant variables have been included as predictors
•All variables are measured without error
•Homoscedasticity
•Residuals are independent
•Residuals are normally distributed |
|
Forward Entry
|
finds the best single predictor (e.g., x1) and pulls out its variance, then asks: can any remaining predictor explain more variance? Continue until no predictor adds significant variance in y. Builds a model by adding predictors as long as they explain significant additional variance.
•Output: shows step and which variables were entered at each step. |
|
Backward Entry
|
enter all predictors, then remove, one at a time, those that explain the least variance.
|
|
Stepwise entry
|
starts like forward entry: enter the variable that explains the most variance. At each step, variables already in the model are re-evaluated for whether they should be kept or removed.
|
|
Approaches to Multiple regression: logical/rational
|
Hierarchical regression/forced entry: focus on change in R2. You decide order of entry based on prior literature/theory.
•Block 1: enter controls; Block 2: enter the other predictors. Ask for the R2 change. |
|
Mediation
|
x causes m causes y
|
|
Partial mediation
|
still a direct relationship b/w x and y
m and x have nonzero weights |
|
Full mediation
|
no relationship b/w x and y except through m
m significant, x not |
|
Steps in mediation
|
1. regress y onto x
2. regress m onto x
3. regress y onto x and m |
|
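The three regressions can be sketched on toy data (all numbers below are made up). A handy check: for OLS fit to the same cases, the total effect decomposes exactly as c = c' + a*b.

```python
def ols(X, y):
    """Least-squares coefficients via normal equations (tiny Gaussian elimination)."""
    X = [[1.0] + row for row in X]            # prepend intercept column
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))] for p in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):            # back substitution
        beta[r] = (A[r][k] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

x = [0.0, 1.0, 2.0, 3.0, 4.0]
m = [0.1, 2.2, 3.9, 6.1, 8.0]                 # roughly 2*x (fabricated)
y = [0.3, 2.1, 4.2, 5.9, 8.1]                 # roughly 1*m (fabricated)

c = ols([[xi] for xi in x], y)[1]             # step 1: total effect of x on y
a_path = ols([[xi] for xi in x], m)[1]        # step 2: x -> m
b0, c_prime, b_path = ols([[xi, mi] for xi, mi in zip(x, m)], y)  # step 3
# c_prime is the direct effect; a_path * b_path is the indirect effect.
```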
Sobel test
|
tests whether the indirect path (a*b) is statistically significant using the path weights and their standard errors
|
|
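One common form of the Sobel statistic is z = a*b / sqrt(b^2*SEa^2 + a^2*SEb^2). A sketch with hypothetical path weights and standard errors:

```python
import math

# Hypothetical path estimates (not from the cards).
a, se_a = 0.50, 0.10   # x -> m path and its standard error
b, se_b = 0.40, 0.08   # m -> y path (controlling for x) and its standard error

sobel_z = (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
significant = abs(sobel_z) > 1.96   # two-tailed .05 criterion
```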
Moderation
|
z changes relationship between x and y (interaction)
|
|
Disordinal interaction
|
graph looks like: X
at one level of z, x and y are positively related; at the other level of z, negatively related |
|
Ordinal interaction
|
graph looks like: >
|
|
Steps in moderation
|
Step 1: Form product term (XZ)
Step 2: Generate a regression equation in which y is regressed on x and z
Step 3: Add the product term to the regression equation and evaluate its significance
Step 4: Plot the interactions |
|
Centering
|
make zero a meaningful value on that variable. Most of our data do not have a meaningful zero, which makes coefficients difficult to interpret.
Ways to center: subtract the mean (x - x̄), which maintains the original units (doesn't matter if you're only interested in significance); or convert to z-scores, where the mean is 0 |
|
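Both centering options fit in a few lines of Python (data values are made up):

```python
x = [3.0, 5.0, 7.0, 9.0]

mean_x = sum(x) / len(x)                       # 6.0 for this toy data
centered = [xi - mean_x for xi in x]           # mean-centered: keeps original units

sd_x = (sum(c**2 for c in centered) / (len(x) - 1)) ** 0.5
z_scores = [c / sd_x for c in centered]        # standardized: mean 0, SD 1
```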
Using continuous predictors
|
Pick values on the low and high end of the continuous variable and plug them into the regression equation to plot the line; any values work because the variable is continuous, but we usually pick ±1 SD.
|
|
Simple slopes
|
predict Y from X at a particular level of Z. Is slope of line different from zero? Hold moderator variable constant at a particular level, is regression equation significant?
Ŷ = B0 + (B1 +B3Z)X +B2Z |
|
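The rearranged equation above makes the simple slope explicit: at a fixed Z, the slope on X is B1 + B3*Z. A sketch with made-up coefficients, evaluating the slope at Z = ±1 SD (Z assumed centered, so its mean is 0):

```python
# Hypothetical regression weights: intercept, x, z, and x*z product term.
B0, B1, B2, B3 = 1.0, 0.5, 0.2, 0.3
sd_z = 1.5                            # assumed SD of the (centered) moderator

slope_low  = B1 + B3 * (-sd_z)        # simple slope of X at Z = -1 SD
slope_high = B1 + B3 * (+sd_z)        # simple slope of X at Z = +1 SD

def y_hat(x, z):
    """Y-hat = B0 + (B1 + B3*Z)*X + B2*Z"""
    return B0 + (B1 + B3 * z) * x + B2 * z
```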
Polytomous categorical predictors
|
Create more than one dummy variable to account for multiple levels (number of groups - 1)
Put in all dummy variables as a set
Significance of predictors is in comparison to the referent group (is the group mean significantly different from the referent group mean?) |
|
What happens if we change the codes for categorical variables
|
CODES DON'T MATTER! R value and model summary will be the same, only the regression weights change.
|
|
Effects coding: unweighted
|
Use when no obvious referent group
Compare each group to the overall mean
Same as dummy coding, except one group is coded -1 on every code variable; there is no test comparing that group to the mean
Constant = unweighted grand mean |
|
Effects coding: weighted
|
No reference group but want to give weight to groups based on sample size, use when sample representative of population. Compare to overall weighted mean.
The -1 codes are replaced by ratios of sample sizes: numerator = sample size of the coded group (e.g., x1), denominator = sample size of the omitted group (e.g., x4) |
|
Contrast coding
|
Use when we want to test specific hypotheses
C1: x1 = 1/3, x2 = 1/3, x3 = 0, x4 = -2/3
C2: x1 = -1/2, x2 = 1/2, x3 = 0, x4 = 0
C3: x1 = -1/4, x2 = -1/4, x3 = 3/4, x4 = -1/4
Constant = unweighted grand mean |
|
2 rules for contrast coding
|
•Codes for each contrast must sum to zero
•Sum of products for each pair of contrasts must sum to zero: ∑C1*C2 = 0
Ex. 1/3 + 1/3 + 0 + -2/3 = 0
(1/3)(-1/2) + (1/3)(1/2) + (0)(0) + (-2/3)(0) = 0 |
|
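Both rules can be checked mechanically. This sketch verifies them for the three contrasts given in the contrast-coding card, using exact fractions to avoid rounding:

```python
from fractions import Fraction as F
from itertools import combinations

# The three contrasts from the contrast-coding card (4 groups: x1..x4).
contrasts = [
    [F(1, 3), F(1, 3), F(0), F(-2, 3)],
    [F(-1, 2), F(1, 2), F(0), F(0)],
    [F(-1, 4), F(-1, 4), F(3, 4), F(-1, 4)],
]

# Rule 1: each contrast's codes sum to zero.
sums_zero = all(sum(c) == 0 for c in contrasts)

# Rule 2: every pair of contrasts is orthogonal (sum of products = 0).
orthogonal = all(sum(u * v for u, v in zip(ca, cb)) == 0
                 for ca, cb in combinations(contrasts, 2))
```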
Polynomial regression
|
Step 1: regress y onto x; Step 2: add x^2 and test whether it improves prediction
If we include a higher-order term, we must include all lower-order terms. Can add x^3 if it significantly improves prediction. |
|
Partial regression coefficient
|
Change in Y per unit change in X1 when we partial out the effect of X2. Regression coefficient in multiple regression context.
|
|
Effects in mediation
|
Direct: c', x on y
Indirect: a*b, pathway through m
Total: c |
|
Dummy codes
|
D1: x1 = 1, all others = 0
D2: x2 = 1, all others = 0
D3: x3 = 1, all others = 0
Significance of weights is in comparison to the referent group |
|
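Building the codes is mechanical: with g groups, create g - 1 indicator columns and leave the referent group all zeros. A sketch with a hypothetical 4-group factor where x4 is the referent:

```python
# Hypothetical group labels for six cases; x4 is the assumed referent group.
groups = ["x1", "x2", "x3", "x4", "x2", "x4"]
coded_levels = ["x1", "x2", "x3"]     # g - 1 dummies; x4 gets all zeros

dummies = [[1 if g == lvl else 0 for lvl in coded_levels] for g in groups]
# Each referent-group case is [0, 0, 0]; each dummy's weight compares
# its group's mean to the referent group's mean.
```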
Bulging rule
|
Power down X if in left quadrants, power up X if in right quadrants
|
|
Discrepancy
|
Degree to which a case is out of line with the swarm of points
Studentized residuals |
|
Internally studentized residuals
|
ratio of a case's residual to the SD of the residuals. Cases close to 0 are not discrepant, but there is no exact distribution, so no formal cutoff for identifying discrepancy
|
|
Externally studentized residuals
|
the case is dropped, the regression equation is re-estimated and applied to the held-out case, and the residual and its SD are computed. Distributed as t, so use a t-test for significance
|
|
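The leave-one-out idea can be sketched for a single suspicious case (simple regression and made-up data; the full procedure would also compute the residual's SD and a t-test):

```python
# Fabricated data: the last case looks discrepant.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 9.0]

hold = len(x) - 1                      # index of the case to examine
xt = [v for i, v in enumerate(x) if i != hold]
yt = [v for i, v in enumerate(y) if i != hold]

# Refit the line without the held-out case.
mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
slope = (sum((a - mx) * (b - my) for a, b in zip(xt, yt))
         / sum((a - mx) ** 2 for a in xt))
intercept = my - slope * mx

pred = intercept + slope * x[hold]     # apply the refit model to the hold-out
residual = y[hold] - pred              # large residual -> discrepant case
```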
Distance/Leverage
|
Conveys how far a given case is from the swarm of other cases
Mahalanobis Distance |
|
Mahalanobis Distance
|
distributed as chi-square, so we can look up the critical value in a table with degrees of freedom = # of predictors. Look for cases that exceed the critical value. Considers the combination of variables jointly.
|
|
Influence
|
Leverage X Discrepancy. Addresses the impact of removing the case on the resulting regression equation and coefficients.
|
|
Global Measures of Influence
|
Look at impact on overall model (R^2)
Cook's D
DFFit |
|
Specific Measures of Influence
|
impact on individual regression weights
DFBeta |
|
Outliers in the solution
|
look for outliers after the analysis has been run: how well does the model fit each case? Casewise diagnostics
|
|
Multicollinearity
|
Highly correlated predictors are included in the analysis
Changes weights and increases the standard errors of the weights
Predictions are less accurate and more unstable
Need to consider all variables simultaneously |
|
How do we evaluate multicollinearity?
|
Treat X1 as the DV, run a regression with the other predictors as IVs, and look at R2; if it is small, X1 is mostly unique.
VIF
Tolerance |
|
VIF
|
variance inflation factor. How much larger is the standard error of a particular coefficient than it would have been if the predictor were completely unrelated to the others? Rule of thumb: serious if VIF > 10
|
|
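VIF is 1/(1 - R2_j), where R2_j comes from regressing predictor j on the others. In the special two-predictor case that R2 is just the squared correlation between the predictors, which makes a quick sketch possible (the correlation below is made up):

```python
# Assumed two-predictor case: R^2 of x1 on x2 is simply r12 squared.
r12 = 0.8                  # hypothetical correlation between the predictors

r2_j = r12 ** 2            # R^2 from regressing x1 on x2
tolerance = 1 - r2_j       # unique variance in x1
vif = 1 / tolerance        # variance inflation factor

serious = vif > 10         # rule of thumb from the card
```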
Tolerance
|
reciprocal of VIF: how much variance in a given predictor is independent of the others? Ranges from 0 to 1.0 (0 = no unique variance, 1.0 = totally independent). Rule of thumb: serious if tolerance < .10
|
|
Violations of relationship form
|
Plot residuals against each predictor. Visual evaluation of linearity (Loess lines). “Handwritten” regression line. Is it a generally linear relationship?
To fix: incorporate nonlinear terms |
|
Missing predictors
|
can’t check for it. Know the literature. If you measured predictor, include it.
To fix: if you didn’t measure predictor, you’re screwed. If you did, include it. If R2 increases, model improves. |
|
Violations: reliability
|
Use measures that have demonstrated reliability. Adds random error to system variance. Usually don’t worry about because unreliability makes it more difficult to find a relationship.
o To fix: can use equation to correct for unreliability, ignore it, or use SEM. |
|
Heteroscedasticity
|
Visual approach: compare residuals to each of the predictors and to the predicted criterion. There should be no relationship; we want variability to be equal around the line. Concern if it starts small and fans out.
Empirical approach: use the residuals as the DV and individual predictors as IVs and conduct t-tests. Or split a variable into groups (e.g., 5 groups with 20% of the data each) and run an ANOVA; if Levene's test is significant, variance differs across groups. Rare to have heteroscedasticity.
Rule of thumb: ratio of largest to smallest conditional variance > 10.
Weighted Least Squares: differentially weight cases based on the variance of the residuals. Cannot be interpreted the same way as OLS. |
|
Nonindependence of residuals
|
Visual: plot case number against residual (case on x, residual on y); there should be no relationship.
Empirical: Durbin-Watson test. Centered around 2; look for extreme values toward 0 or 4. Is there a relationship between case order and residuals? Worry about this in studies where cases are meaningfully ordered (e.g., something at the beginning of the data differs from the end); in most data sets, case numbers are arbitrary.
To fix: if related to clustering, create dummy variables to represent the groups, or use multilevel modeling procedures (effects of higher-level variables on lower-level variables; applies when variables are nested). If related to serial dependency, transform the data. |
|
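The Durbin-Watson statistic itself is simple arithmetic on the ordered residuals: DW = Σ(e_t - e_{t-1})^2 / Σe_t^2. A sketch on fabricated residuals:

```python
# Made-up residuals in case order.
e = [0.5, -0.3, 0.4, -0.6, 0.2, -0.1]

dw = (sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
      / sum(r ** 2 for r in e))
# Values near 2 suggest independent residuals; values toward 0 or 4
# suggest positive or negative serial dependence, respectively.
```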
Checking if residuals are normally distributed
|
Visual: histogram, q-q plot (plot residuals), best approach
Empirical: Shapiro-Wilk test, conservative |
|
R
|
Correlation between observed and predicted values of the DV
|