39 Cards in this Set
- Front
- Back
The response variable
|
the variable whose value can be explained by the value of the explanatory or predictor variable.
|
|
scatter diagram
|
is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
|
|
linear correlation coefficient OR Pearson product moment correlation coefficient
|
is a measure of the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient. We present only the formula of the sample correlation coefficient.
|
|
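The definition above can be sketched in Python. This is a minimal illustration, not the textbook's formula verbatim; it builds r from the average of the products of z-scores, and the data values are made up:

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient r, built from z-scores."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    # r is the sum of the products of z-scores, divided by n - 1
    return sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
               for xi, yi in zip(x, y)) / (n - 1)

# Made-up sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```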
The linear correlation coefficient is always between
|
-1 and 1, inclusive. That is, -1 ≤ r ≤ 1
|
|
if r = +1
|
then a perfect positive linear relation exists between the two variables
|
|
if r = -1
|
then a perfect negative linear relation exists between the two variables
|
|
if r is close to 0
|
then little or no evidence exists of a linear relation between the two variables. SO r CLOSE TO 0 DOES NOT IMPLY NO RELATION, JUST NO LINEAR RELATION.
|
|
The linear correlation coefficient is a unitless measure of association, so
|
the units of measure for x and y play no role in the interpretation of r.
|
|
The correlation coefficient is
|
not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
|
|
positively associated
|
two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.
|
|
negatively associated
|
two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.
|
|
correlation matrix
|
Excel provides this. For every pair of columns in the spreadsheet, it computes and displays the correlation in the lower triangle of the matrix.
|
|
How do you test for a linear relation
|
1. Compute the absolute value of the correlation coefficient.
2. Find the critical value in Table 2 from Appendix A.
3. If the absolute value of the correlation coefficient is greater than the critical value, we say a linear relation exists between the two variables. Otherwise, no linear relation exists.
* If the correlation coefficient is positive and greater than the critical value, the variables are positively associated. If the correlation coefficient is negative and less than the opposite of the critical value, the variables are negatively associated. |
|
The linear correlation coefficient that implies strong positive or negative association does not imply causation if
|
it was computed using observational data. Why? Lurking variables.
e.g., as air-conditioning bills increase, so does the crime rate. |
|
xbar
|
sample mean of the explanatory variable
|
|
sx
|
sample standard deviation of the explanatory variable
|
|
Ybar
|
sample mean of the response variable
|
|
Sy
|
sample standard deviation of the response variable
|
|
the closer r is to +1
|
the stronger the evidence of positive association
|
|
The closer r is to -1
|
the stronger the evidence of negative association between two variables
|
|
residual
|
a residual represents how close our prediction comes to the actual observation. The smaller the residual, the better the prediction.
residual = observed y - predicted y |
|
slope (m)
|
(y₂ - y₁) ÷ (x₂ - x₁), or rise/run, or change in y ÷ change in x
|
|
point-slope formula
|
y - y₁ = m (x - x₁)
|
|
least-squares regression line
|
the line that minimizes the sum of the squared errors (or residuals). This line minimizes the sum of the squared vertical distances between the observed values of y and those predicted by the line, yhat. We represent this as "minimize ∑residuals²"
|
|
The equation of the least-squares regression line is given by
|
yhat = b₁x + b₀
|
|
a good fit
|
means that the line drawn appears to describe the relation between two variables well
|
|
the slope of the least squares regression line
|
b₁ = r × (sy ÷ sx), AND b₀ = ybar - b₁xbar is the y-intercept of the least-squares regression line
note: xbar is the sample mean and sx the sample standard deviation of the explanatory variable x; ybar and sy are the sample mean and standard deviation of the response variable y |
|
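The two formulas on this card can be sketched in Python. The data values below are made up for illustration:

```python
import statistics as st

def least_squares_line(x, y):
    """Return (b1, b0) for yhat = b1*x + b0 using the card's formulas."""
    xbar, ybar = st.mean(x), st.mean(y)
    sx, sy = st.stdev(x), st.stdev(y)
    n = len(x)
    # Sample correlation coefficient r
    r = sum((xi - xbar) * (yi - ybar)
            for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
    b1 = r * (sy / sx)         # slope: b1 = r * (sy / sx)
    b0 = ybar - b1 * xbar      # y-intercept: b0 = ybar - b1 * xbar
    return b1, b0

b1, b0 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 4), round(b0, 4))  # 0.6 2.2
```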
in a TI-84 the slope is
|
a
|
|
in a TI-84 the y-intercept is
|
b
|
|
Be careful when using the least-squares regression line to make predictions that are
|
much larger or much smaller than those observed
|
|
if the linear correlation between two variables is negative
|
the slope of the regression line will also be negative
|
|
the least-squares regression line always passes through
|
the point (xbar, ybar)
|
|
residual =
|
observed y - predicted y = y - yhat
|
|
coefficient of determination (R²)
|
measures the proportion of total variation in the response variable that is explained by the least-squares regression line
in other words, the coefficient of determination is a measure of how well the least-squares regression line describes the relation between the explanatory and response variables. The closer R² is to 1, the better the line describes how changes in the explanatory variable affect the value of the response variable. |
|
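A minimal sketch of computing R² as 1 minus (unexplained variation ÷ total variation). The data and the fitted line below are made up; for a simple linear regression R² also equals r²:

```python
import statistics as st

def r_squared(x, y, b1, b0):
    """R^2 = 1 - (sum of squared residuals) / (total variation)."""
    ybar = st.mean(y)
    # Unexplained variation: sum of squared residuals around the line
    sse = sum((yi - (b1 * xi + b0)) ** 2 for xi, yi in zip(x, y))
    # Total variation: sum of squared deviations from the mean
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up data whose least-squares line is yhat = 0.6x + 2.2
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2), 4))  # 0.6
```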
total deviation
|
the deviation between the observed and mean values of the response variable is called the total deviation. So total deviation = y - ybar
total deviation = unexplained deviation + explained deviation |
|
explained deviation
|
The deviation between predicted and mean values of the response variable is called the explained deviation.
the explained deviation = yhat- ybar |
|
unexplained deviation
|
the deviation between observed and predicted values is the unexplained deviation.
unexplained deviation = y - yhat. The unexplained variation is also found by summing the squares of the residuals. |
|
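The three deviation cards above can be sketched for a single observation. The numbers used here are illustrative, not from the deck:

```python
def deviations(y_obs, y_hat, ybar):
    """Split one observation's total deviation into its two pieces."""
    total = y_obs - ybar         # total deviation: y - ybar
    explained = y_hat - ybar     # explained deviation: yhat - ybar
    unexplained = y_obs - y_hat  # unexplained deviation: y - yhat (the residual)
    return total, explained, unexplained

# Illustrative values: observed y = 5, predicted yhat = 4.0, mean ybar = 4
t, e, u = deviations(5, 4.0, 4)
print(t == e + u)  # True: total = explained + unexplained
```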
residual plot
|
a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis.
|
|
Constant error variance aka homoscedasticity
|
if a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then the linear model's requirement of constant error variance is violated.
|