79 Cards in this Set

Describe the components of the linearity assumption.
1. There is a linear relationship between Y and the predictors.

2. The slope of the relationship between x and predicted y is the same (constant), regardless of the value of the other variables.
How do you test for linearity?
Examine a residual plot.
The residual plot involves placing the standardized residuals on the Y axis and the standardized predicted values on the X axis.

(This is the graph with the horizontal zero error line and lots of points clustered above and below it).

Ze = standardized residuals
Zyhat= standardized predicted values.
How do you know if there is a violation of the linearity assumption?
You'll see a curve in the residual plot. Most residuals will be above the zero line at some predicted values and below zero at others.
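The residual-plot check above can be sketched in plain Python (the helper names are illustrative, not from the course materials): fit a straight line to deliberately curved data, standardize the residuals and the predicted values, and look for the sign pattern that signals nonlinearity.

```python
import statistics

def slr_fit(x, y):
    """Ordinary least squares for simple linear regression: returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def standardized(values):
    """Convert values to z-scores (mean 0, SD 1)."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Quadratic data: a straight-line fit leaves a curved pattern in the residuals.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
a, b = slr_fit(x, y)
yhat = [a + b * xi for xi in x]
z_resid = standardized([yi - yh for yi, yh in zip(y, yhat)])   # Ze: goes on the y axis
z_pred  = standardized(yhat)                                   # Zyhat: goes on the x axis
# The violation: residuals sit above zero at the extreme predicted values
# and below zero in the middle, rather than scattering randomly.
print(z_resid[0] > 0 and z_resid[-1] > 0 and z_resid[4] < 0)
```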
Discuss homoskedasticity and heteroskedasticity in terms of their relative desirability for MLR.
Homoskedasticity is desirable; heteroskedasticity violates the equal-variance assumption of MLR.
Discuss potential violations of the linearity assumption.
1. Interaction: when X1 changes, y changes by one amount at one level of another variable and by a different amount at another level. This means that the prediction of y depends upon the level of another variable.
2. curvilinear relationship.
Discuss what we would hope to see on a residual plot and what it means.
y axis = Ze (standardized residuals)
x axis = Zyhat (standardized predicted values)

We want the errors to fall equally above and below the zero line, with constant scatter; that indicates the model predicts y consistently across the range of predicted values.
What would heteroskedasticity look like on a residual plot?
megaphone shape
What are 2 problematic things that you might see on a residual plot?
1. curvilinear relationship
2. Megaphone shape to the errors (this means heteroskedasticity)
If you see a curvilinear relationship on a residual plot (standardized residuals vs. standardized predicted values), what do you need to do?
If there is a curvilinear relationship in the errors, including an x^2 or x^3 term might allow you to capture the relationship in order to better predict y. (This process is polynomial regression.)
What do you do if you detect nonlinearity?
1. try polynomial regression
2. try a nonlinear transformation of y
3. If there is a nonlinear regression relationship, but the distribution of error terms is normal and the error terms are homoskedastic, then transform x instead of y.
Describe the homoskedasticity assumption.
1. There are equal variances for residuals across each value of the IVs.

2. There is constant scatter around e=0.
If you see a megaphone shape in your error points on the residual plot, what does this tell you?
you have heteroskedasticity

there is a greater error of measurement at some levels of the IV than at others.
On a residual plot of vacation spending versus family income, the error points are megaphone shaped. How do you correct for this?
Use an interaction term (here, the interaction term will be level of enjoyment of the vacation: low, mid, high. Errors cluster more closely around each of these three separate lines).
For a megaphone-shaped residual plot, what does adding an interaction term do?
Adding an interaction term helps to reduce the heteroskedasticity of error variance.
What are the three central testing assumptions in MLR?
1. linearity assumption
2. homoscedasticity assumption
3. assumption of normality of errors
How do you test for homoscedasticity?
Plot standardized residuals (Ze) on y axis and standardized predicted values (Zy hat) on x axis.
Residual plots allow you to test for:
1. linearity
2. homoskedasticity
How do you identify a violation of homoscedasticity?
you see megaphone shape in residual plot.
What might cause a violation of homoscedasticity?
1. nonnormality of a variable
2. one variable related to some transformation of another
3. greater error of measurement at some levels of an IV than at others (this is the situation for which you have to add an interaction term).
What is the assumption of normality of errors?
Errors will be normally distributed across each value of the IVs.
How do you test for normality of errors?
Use SPSS output to look at histogram of residuals, PP plot, QQ plot.
If you use SPSS output to look at histogram of residuals, PP plot, QQ plot, which MLR assumption are you testing?
Assumption of normality of errors
If you plot standardized residuals (Ze) on the y axis and standardized predicted values (Zy hat) on the x axis in order to get a residual plot, what assumptions of MLR are you testing?
linearity assumption
homoskedasticity assumption
This plot rank orders residuals and compares them to the rank order of residuals expected from a normal distribution.
PP Plot
This plot compares standardized residuals to expected values.
QQ plot
Describe what good/bad histograms and PP plots look like.
Good histogram = normally distributed
Bad histogram = skewed (highest point off center, to left or right)

Good PP plot: points form straight diagonal line
Bad PP plot: points form a curvy line.
the potential to change the slope
leverage
have unusual values on the independent variable
outliers
A point that is high in leverage....
is unusual in its value on the independent variable
Looks at the distance/difference between predicted y and observed y.
discrepancy
On a graph for which a diagonal line has been drawn through a cloud of points, the point that falls directly above the mean (and far above the cloud of points) can be characterized in what ways based upon discrepancy and leverage?
This point is high in discrepancy and low in leverage.
For a cloud of points through which a diagonal line has been drawn, a point that is far to the right of the cloud of point but is still along the line of the diagonal (if the diagonal line were extended beyond the cloud of points) can be said to have what qualities in terms of discrepancy and leverage?
High in leverage (change slope), but low in discrepancy (difference in predicted y and obs y).
For a cloud of data points through which a diagonal line has been drawn, a point that is far from the cloud (not directly above the mean) and does not fall along the extended diagonal would be considered to be what (in terms of discrepancy and leverage)?
High in leverage (change slope), and high in discrepancy (difference in predicted y and obs y). This means that this is an influential data point.
This type of point is high in both leverage and discrepancy.
influential data pt
This type of point is particularly likely to change regression coefficients and impact the slope.
influential data pt
If we keep a point that is high in discrepancy but low in leverage in the regression equation, what will happen?
If this type of point is above the mean, the regression constant will go up.
If this high-discrepancy, low-leverage point is below the mean, the regression constant will go down.
How do you know if there is a violation of normality of errors with the histogram of residuals?
Residuals should be normally distributed in the histogram.
How do you know if there is a violation of normality of errors with the PP and QQ plots?
In PP and QQ plots, the points for the cases should fall along the diagonal running from lower left to upper right. If the pts are way off line (often below the line at one end and above at the other end) there's a violation.
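The idea of comparing ranked residuals against the quantiles expected under normality can be computed directly; a minimal sketch (the function name is illustrative) using statistics.NormalDist from the standard library:

```python
from statistics import NormalDist, fmean, stdev

def qq_points(residuals):
    """Pair each sorted standardized residual with the normal quantile
    expected for its rank, using the (i - 0.5)/n plotting position."""
    n = len(residuals)
    m, s = fmean(residuals), stdev(residuals)
    z = sorted((r - m) / s for r in residuals)
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, z))

# Roughly normal residuals: points should hug the lower-left-to-upper-right diagonal.
resid = [-1.2, -0.8, -0.4, -0.1, 0.0, 0.1, 0.3, 0.7, 0.9, 1.4]
for exp_q, obs_q in qq_points(resid):
    print(f"{exp_q:+.2f}  {obs_q:+.2f}")
```

A violation would show up as large gaps between the two columns, typically with observed quantiles below expected at one end and above at the other.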
What do you do if you detect nonnormality?
First, correct nonconstant error variance to see if that fixes the problem.

Most transformations that correct for heteroscedasticity also correct for non-normality.
Most transformations that correct for _____ also correct for _____.
Most transformations that correct for heteroscedasticity also correct for non-normality.
A point that is high in leverage is...
far from the mean on the IV.
A point that is highly discrepant has...
a big difference between observed y and predicted y.
This is the number of SDs by which predicted y (y hat) would change if we deleted case i.
DFFITS (difference in fit, standardized).
Indicates how much deleting this case affects each individual b (for instance, b for variable X1, b for variable X2, etc.).
DFBETA
this is at the mean of all of your variables
centroid
We are looking at cases that are far from the centroid when determining...
Mahalanobis distance
What are outliers?
1. Cases that do not fit the pattern of data
2. points that are far from the mean of x or y or both
3. in SLR, an observation whose DV value is unusual given the value of the IV.
What are the consequences of having outliers in my data?
1. Outliers may not be predicted well by the regression line, so they have large residual values.
2. Outliers may influence the regression line (by impacting either the regression constant or the slope)
3. Outliers may lead to type I and II errors
4. Outliers may lead to results that do not generalize except to another sample with the same type of outlier.
How do I detect possible outliers in my data?
1. Examine cases with large standardized scores on one or more variables (z > 3.29, or simply disconnected from the other z scores). This helps you spot a case that is way out of range.

2. Examine the split for your dichotomous variables to make sure it's not terribly uneven

3. use graphical methods- histograms, box plots, normal probability plots. Outliers will appear to be far away from the rest of the data points

4. Examine studentized residuals

5. Examine studentized deleted residual (t1)
In a residual plot, points that fall ______ are potential outliers.
In a residual plot, points that fall far away from the regression surface (e=0) are potential outliers.
Most of these fall between -3 and 3 on the standardized normal curve
studentized residuals
How do you create studentized residuals (which are used to detect outliers)?
To create studentized residuals, you must standardize the residuals by dividing by the standard error of the residuals.
When creating a studentized deleted residual...
1. delete case i from the regression and compute the regression
2. use that regression solution to compute error for case i, this is the deleted residual
3. studentize by dividing by its standard error.
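The three steps above can be sketched for simple linear regression (helper names are illustrative, not from the course materials):

```python
import math
import statistics

def slr_fit(x, y):
    """OLS intercept and slope for simple linear regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def studentized_deleted_residual(x, y, i):
    # 1. delete case i and recompute the regression
    x_d, y_d = x[:i] + x[i+1:], y[:i] + y[i+1:]
    a, b = slr_fit(x_d, y_d)
    # 2. deleted residual: error for case i under the refit line
    d = y[i] - (a + b * x[i])
    # 3. studentize: divide by the standard error of prediction at x[i]
    n = len(x_d)
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x_d, y_d)) / (n - 2)
    mx = statistics.fmean(x_d)
    sxx = sum((xi - mx) ** 2 for xi in x_d)
    se = math.sqrt(mse * (1 + 1/n + (x[i] - mx) ** 2 / sxx))
    return d / se

x = [1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 12.0]   # last case is a clear outlier
t = [studentized_deleted_residual(x, y, i) for i in range(len(x))]
print([round(ti, 2) for ti in t])          # the outlier stands out far beyond +/-3
```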
Why can it be difficult to detect outliers?
Sometimes, outliers can be well predicted by the line.

Also, when you run a regression, an outlier might shift the line by influencing the slope, so it might not look like an outlier.
This statistic is distributed as a t distribution with (n-k-2) df, so we can formally test for outlier status.
studentized deleted residual
What is the question that we are asking when we look for an influential data point?
When I take this case out, does it change the slope of the line, b?
What are influential data points?
1. cases that exert an influence on the position of the regression line
2. The removal of an IDP will result in a significant change in the regression.
3. Are both highly discrepant and have high leverage
Influence on coefficients =
discrepancy × leverage
How do I test for influence?
See how much the regression coefficients change when a case is deleted.
Measures the observation's influence on the regression coefficients
Cook's distance
Represents the difference between the bs (slope) when the case is and is not included in the regression.
Cook's distance
How do I know if I have an influential Data point?
1. Cases with influence scores larger than 1 are suspected of being outliers.

2. Formal test: Cook's distance is distributed similarly to an F stat with (k+1, n-k-1) df. (H0 = the case does not change the regression coefficients.)
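A leave-one-out computation of Cook's distance, matching the "difference between the bs when the case is and is not included" description, might look like this sketch for simple linear regression (helper names are illustrative):

```python
import statistics

def slr_fit(x, y):
    """OLS intercept and slope for simple linear regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def cooks_distance(x, y, i):
    """Cook's D for case i: the total squared shift in fitted values when
    case i is deleted, scaled by (k+1) * MSE."""
    a, b = slr_fit(x, y)
    a_d, b_d = slr_fit(x[:i] + x[i+1:], y[:i] + y[i+1:])
    n, p = len(x), 2                      # p = k + 1 parameters (intercept + one slope)
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    shift = sum(((a + b * xi) - (a_d + b_d * xi)) ** 2 for xi in x)
    return shift / (p * mse)

x = [1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 12.0]   # last case pulls the line upward
d = [cooks_distance(x, y, i) for i in range(len(x))]
print([round(di, 2) for di in d])           # only the influential case exceeds the cutoff of 1
```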
What do I do if I have Influential Data points?
1. Exclude them (but this could lead to unstable regression coefficients).
2. Examine curvilinear relationships or interactions
3. try a transformation
How do I test for high leverage data points?
1. Use hat values
2. Use a related index called Mahalanobis distance
3. Use the mean shift outlier model (not discussed much in class).
Index of how much a single case can impact predicted Ys
hat values (which are used to test for high leverage data points).
If the hat value is large for a particular case, then...
that case has a big impact on the fitted values.
(hat values are used when testing for high leverage data points).
Hat values range from...
1/n to 1
In SLR, hat values measure distance from __, while in MLR, hat values measure distance from__.
In SLR, hat values measure distance from the mean of X, while in MLR, hat values measure distance from the centroid.
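In SLR the hat value has a closed form, h_i = 1/n + (x_i - mean of X)^2 / Sxx, which makes the "distance from the mean of X" interpretation concrete; a small sketch (function name is illustrative):

```python
import statistics

def hat_values(x):
    """SLR hat values: h_i = 1/n + (x_i - mean)^2 / Sxx.
    Note they depend only on the IV, not on y."""
    n = len(x)
    mx = statistics.fmean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1/n + (xi - mx) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5, 20]            # 20 is far from the mean of X: high leverage
h = hat_values(x)
print([round(hi, 3) for hi in h])  # each h_i lies in [1/n, 1]; they always sum to 2 in SLR
```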
What is the centroid?
Centroid = point at the intersection of the means of all the variables in multivariate space.
In MLR, hat values measure distance from the centroid.
Measures the distance of a case from the centroid of the remaining cases
Mahalanobis distance
Mahalanobis distance tests for what type of data points?
high leverage data pts
Mahalanobis distance allows you to evaluate each case using a _________
Chi square distribution.
As hat values reach one, what does this mean?
Hat values approaching one have the potential to exert a big influence on the line.
The further away from the centroid a case gets, its hat value will ____.
increase.

Hat values increase the further a case is from the centroid.