28 Cards in this Set
- Front
- Back
what is a linear regression line |
the line that summarizes a scatterplot by, on average, passing through the center of the Y scores at each X
-- the best fitting straight line |
|
what is the linear regression procedure used for |
to predict Y scores based on the scores from a correlated X |
|
what is Y' and how do you obtain it |
Y' = Y prime = the predicted Y score for a given X; computed from the regression equation |
|
general form of a linear regression equation |
Y' = bX + a |
|
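A minimal sketch of how the regression equation Y' = bX + a can be computed, assuming the usual least-squares formulas for the slope and intercept. The function names `fit_line` and `predict` and the sample scores are made up for illustration and are not from the card set.

```python
def fit_line(xs, ys):
    """Return (b, a): least-squares slope and Y intercept for Y' = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    # Intercept: forces the line through the point (mean of X, mean of Y).
    a = mean_y - b * mean_x
    return b, a


def predict(x, b, a):
    """Return Y', the predicted Y score for a given X."""
    return b * x + a


# Hypothetical sample data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
y_prime = predict(6, b, a)
print(f"slope b = {b:.2f}, intercept a = {a:.2f}, Y' at X=6 is {y_prime:.2f}")
```

Here the predictor X goes in, and the predicted criterion score Y' comes out.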
what does the Y intercept indicate |
the value of Y at the point where the regression line crosses the Y axis (that is, when X = 0) |
|
what does the slope indicate |
the direction and steepness of the regression line's slant: how much Y' changes for each one-unit increase in X |
|
distinguish between the PREDICTOR variable and the CRITERION variable in linear regression |
X = predictor variable = the given X variable
Y = criterion variable = the to-be-predicted Y variable |
|
what is Sy' |
the standard error of the estimate |
|
what does Sy' tell you about the spread in the Y scores |
it is a standard deviation, indicating the "average" amount that the Y scores deviate from their corresponding values of Y' |
|
why does Sy' tell you about your errors in prediction |
it indicates the average amount that the actual Y scores differ from the predicted Y' scores, so it is the "average" error |
|
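A rough sketch of computing Sy' as the "average" deviation of Y from Y'. It assumes the descriptive formula that divides the sum of squared errors by N (inferential versions often divide by N - 2 instead). The function name `standard_error_of_estimate` and the sample scores are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Return Sy' using the descriptive (divide-by-N) formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept for Y' = bX + a.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    # Squared prediction errors (Y - Y')^2, summed over all pairs.
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_error / n)


# Hypothetical sample data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_y_prime = standard_error_of_estimate(xs, ys)
print(f"Sy' = {s_y_prime:.3f}")
```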
in order for the standard error of the estimate to be accurate, there are 2 assumptions: |
assumes Y scores are:
1) homoscedastic -- scores are equally spread out around the regression line at each X
2) normally distributed -- forming a normal distribution around the regression line at each X |
|
how does heteroscedasticity lead to an inaccurate description of the data? |
heteroscedasticity means Y scores are not spread out from Y' to the same extent at all Xs, so the standard error of the estimate will over- or underestimate the error, depending on the value of X |
|
how is the value of Sy' related to the size of r |
Sy' is inversely related to the absolute value of r, because a smaller Sy' indicates the Y scores are closer to the regression line (and Y') at each X, which is what happens with a stronger relationship (a larger r) |
|
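The inverse relationship between Sy' and r can be sketched with the standard identity Sy' = Sy * sqrt(1 - r^2), assuming both standard deviations use the divide-by-N formula. The helper names and both data sets below are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def std_dev(values):
    """Return the descriptive (divide-by-N) standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def s_y_prime_from_r(xs, ys):
    """Return Sy' via Sy * sqrt(1 - r^2)."""
    r = pearson_r(xs, ys)
    # max() guards against a tiny negative value from floating-point rounding.
    return std_dev(ys) * math.sqrt(max(0.0, 1 - r ** 2))


xs = [1, 2, 3, 4, 5]
related_ys = [2, 4, 5, 4, 5]      # hypothetical scores, fairly strong r
unrelated_ys = [1, 2, 3, 2, 1]    # hypothetical scores chosen so that r = 0

s_related = s_y_prime_from_r(xs, related_ys)
s_unrelated = s_y_prime_from_r(xs, unrelated_ys)
print(f"Sy' with a stronger r: {s_related:.3f} (Sy = {std_dev(related_ys):.3f})")
print(f"Sy' with r = 0:        {s_unrelated:.3f} (Sy = {std_dev(unrelated_ys):.3f})")
```

With r = 0 the error shrinks by nothing, so Sy' equals Sy; as |r| grows, Sy' drops toward 0.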
when are multiple regression procedures used |
when more than one predictor variable (X variable) is correlated with and used to predict the scores on one criterion variable (Y variable) |
|
what are 2 statistical names for r^2 |
1) the coefficient of determination
2) the proportion of variance in Y that is accounted for by the relationship with X |
|
how do you interpret r^2 |
indicates the proportional improvement in accuracy when using the relationship with X to predict Y scores, compared to using the overall mean of Y to predict Y scores |
|
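One way to see r^2 as a proportional improvement is to compare the squared error around the mean of Y with the squared error around Y'. This is a sketch assuming a least-squares line. The function name and sample scores are hypothetical.

```python
def proportion_of_variance_accounted_for(xs, ys):
    """Return (error using mean of Y - error using Y') / error using mean of Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    # Total squared error when everyone's predicted score is the mean of Y.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Total squared error when each predicted score is Y' = bX + a.
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return (ss_total - ss_error) / ss_total


# Hypothetical sample data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

improvement = proportion_of_variance_accounted_for(xs, ys)
print(f"proportion of variance accounted for = {improvement:.2f}")
```

For a least-squares line this ratio equals the squared Pearson r, which is why r^2 can be read directly as the proportion of variance accounted for.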
regression analysis is used when: |
we have 2 variables and they are correlated
uses the correlation between 2 variables to predict unknown or future scores |
|
the mean is the best predictor when... |
we only have ONE variable and we want to predict an unknown score on that variable
--avg amount of error in prediction = standard deviation for that variable |
|
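A small sketch of why the mean is the best single-variable predictor: the "average" (root-mean-square) error is smallest when the guess is the mean, and at that point it equals the standard deviation. The scores and the alternative guess of 4.5 are invented for illustration; `rms_error` is a hypothetical helper.

```python
import math
import statistics


def rms_error(scores, guess):
    """Return the root-mean-square error when `guess` is predicted for every score."""
    return math.sqrt(sum((s - guess) ** 2 for s in scores) / len(scores))


# Hypothetical scores on a single variable (illustration only).
scores = [2, 4, 5, 4, 5]
mean_score = statistics.mean(scores)

error_using_mean = rms_error(scores, mean_score)
error_using_other = rms_error(scores, 4.5)  # any other guess does worse

print(f"error predicting the mean: {error_using_mean:.3f}")
print(f"error predicting 4.5:      {error_using_other:.3f}")
print(f"standard deviation:        {statistics.pstdev(scores):.3f}")
```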
accuracy of prediction ENTIRELY depends on |
the correlation of your two variables
the closer r is to +1 or -1, the greater the accuracy of prediction |
|
error in prediction |
the distance from each real data point (y) to the predicted value (y')
best fitting regression line = smallest amount of total error |
|
the stronger the r (correlation), the _______ the error in prediction |
smaller |
|
2 elements describe the regression line |
1) slope 2) y-intercept |
|
if r = 0, your standard error of estimate = ? |
the standard deviation of the Y scores, Sy (prediction is no better than using the mean of Y) |
|
assumptions of linear regression (when it's OK to perform linear regression) |
1) linearity -- there is a linear relationship between variables
2) homoscedasticity -- occurs when Y scores are spread out to the same degree at every X
3) Y scores at each X form an approximately normal distribution |
|
when we do NOT use the relationship to predict scores |
--use the overall mean of Y scores as everyone's predicted Y
--error is Y - (mean of Y)
--average squared error is Sy^2 (the variance of Y) |
|
when we DO use the relationship to predict scores |
--use the corresponding Y', as determined by the linear regression equation, as our predicted Y value
--error is (Y - Y')
--average squared error is Sy'^2 (the variance of Y around Y') |
|
proportion of variance accounted for is.... |
the proportional improvement in accuracy when using the relationship with X to predict Y, compared to using (mean of Y) to predict Y |
|
how is proportion of variance accounted for computed |
find r^2 "the coefficient of determination" |