86 Cards in this Set

  • Front
  • Back
Central Limit Theorem
For populations that aren't normally distributed, if n ≥ 25, X̄ is approximately ~ N(μ, σ²/n).
significance level
aka Type I error rate

If H0 is true, then as the number of tests goes to infinity, the percentage of tests rejected by chance at level alpha approaches alpha (e.g., 5 percent when alpha = 0.05)
cor(X, Y)
cov(X, Y) / (σX σY)
cor(X, Y) = 0 ⇒ ??
X, Y are independent ONLY IF X, Y are jointly normal; in general, zero correlation does not imply independence
Four Kinds of Influence
Reversibility of correlation
Cor(X,Y) = Cor(Y,X)
Error term in Simple Regression
Assumed mean E(epsilon-i) = 0, variance constant, and epsilon-i, epsilon-j uncorrelated (so covariance is zero) for all i ≠ j
Adjectives for regression models and what they mean
simple = one predictor
'linear in the parameters' = no parameter appears as an exponent or is multiplied/divided by another parameter (otherwise the model is nonlinear)
Yi (response variable) -- random?
Yes: the added random error term makes Yi a random variable
E(Yi)
Since E(error-i) = 0, E(Yi) = Beta0 + Beta1*(Xi)

so predicted model is E(Y|X) = Beta0 + Beta1*(X)
Distribution of Yi
Since the errors are assumed to be normally distributed N(0, sigma^2),

Yi ~ N(Beta0 + Beta1(Xi), sigma^2)
Predicting changes in Y when X changes
Beta0 and Beta1 describe the relationship between the *mean* of Y and X, while the variance of the residuals sigma^2 describes variability of Yi around the mean
Beta0 and Beta1 in terms of E(Y)
E(Y|X=0) = Beta0
E(Y|X=x+1) - E(Y|X=x) = Beta1
Ordinary Least Squares regression
a way to find the Betas; minimizes the sum of squared residuals, where residual ei = Yi - Yhat-i (i.e., solves the calculus minimization of Sigma[(Yi - Beta0 - Beta1*Xi)^2])
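For reference, the minimization described above has a closed-form solution in simple regression; a sketch of the standard result (notation not specific to this course) in LaTeX:

```latex
% Minimize Q(b0, b1) = \sum_i (Y_i - b_0 - b_1 X_i)^2.
% Setting the partial derivatives to zero (the normal equations) and solving gives:
\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```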
Three Assumptions of LS Regression
All observations are uncorrelated:
- corr(Yi, Yj) = corr(error-i, error-j) = 0

All observations are equally informative:
- Var(Yi) = Var(error-i) = sigma^2

There is no systematic bias in the model:
- E(error-i) = 0

For testing & inference, a fourth assumption is that the errors are ~ N(0, sigma^2)
Least Square regression estimate of B1
E(Beta1-hat) = Beta1;
var(Beta1-hat) = sigma^2 / Sigma[(Xi - Xbar)^2]
Least Square regression estimate of B0
E(Beta0-hat) = Beta0
B0-hat Variance
Var(Beta0-hat) = sigma^2 * [1/n + Xbar^2 / Sigma[(Xi - Xbar)^2]]
B0hat and B1hat distributions
Because sigma^2 (the variance of Yi and error-i) isn't known and must be estimated by s^2, the standardized estimates are distributed t with n-2 degrees of freedom
Consequences of LS assumptions
Sum of the residuals is 0; sum of the observations equals the sum of the fitted values (Sigma(Yi) = Sigma(Yhat-i)); the regression line always passes through (Xbar, Ybar)
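These consequences are easy to verify numerically. A minimal Stata sketch, using Stata's shipped auto dataset purely as stand-in data (not from the course):

```stata
sysuse auto, clear
regress price weight          // illustrative simple regression

predict yhat                  // fitted values
predict r, residuals          // residuals

summarize r                   // mean (and hence sum) of residuals is ~0
quietly summarize price
display "sum of Y    = " r(sum)
quietly summarize yhat
display "sum of Yhat = " r(sum)   // matches the sum of Y, up to rounding
```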
Why least squares estimators are random variables
Different samples yield different estimates!
Distribution of least square estimators & associated test statistic
If the errors are normally distributed as assumed, the least squares estimators are normally distributed.

For testing H0: Beta = 0, the test statistic (Beta-hat - 0) / s.e.(Beta-hat) ~ N(0,1) or t(n-2), depending on sample size
"Is there a relationship between...?"
First look at the correlation to determine whether the variables are linearly related and, if so, in what direction; a scatterplot shows the same thing. To quantify the relationship, we fit a simple linear regression (SLR) and look at R^2, which (for simple linear regression) is the square of the correlation coefficient between X and Y.
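A minimal Stata sketch of this workflow, again using the auto data as a stand-in (the variable names are illustrative, not from the course):

```stata
sysuse auto, clear

correlate price weight        // direction and strength of the linear relationship
scatter price weight          // eyeball the same thing

regress price weight          // simple linear regression
display "R^2 = " e(r2)        // in SLR this equals the squared correlation
```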
What R^2 is and means
R^2 is the proportion of variation in the response that can be explained by the predictor(s) in the model ALONE, ignoring the effect of other things not in the model; R^2 = SSR/SST

R^2 is NOT GOOD FOR TESTING, only for understanding.
Bonferroni correction
Testing both hypotheses at once requires adjustment to the significance level of each in order to preserve the overall significance level of the entire test

If k tests are performed, at least one of them needs to be significant at alpha/k for the overall test to be significant
Standard error for mean response (prediction)
This is the "standard error of the fitted value"; note it is smaller than the forecast standard error, because estimating the mean response is easier than predicting a single new observation
Standard error for one observation (forecast)
"standard error of the forecast value"

Forecast interval is technically NOT a confidence interval. It is not for testing! This is a 95% highest probability density interval for the new forecast variable.
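In Stata both standard errors are available from predict after regress (stdp for the fitted value, stdf for the forecast); a hedged sketch with illustrative data:

```stata
sysuse auto, clear
regress price weight

predict yhat                        // fitted mean response
predict se_fit, stdp                // s.e. of the fitted (mean) value -- the smaller one
predict se_fc,  stdf                // s.e. of the forecast for one new observation -- the larger one

* 95% intervals based on t with n-2 df; the forecast interval is the wider one
generate lo_fc = yhat - invttail(e(df_r), 0.025)*se_fc
generate hi_fc = yhat + invttail(e(df_r), 0.025)*se_fc
```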
Graphical interpretation of Betas in MLR
Beta1 = effect of X1 on Yhat with all other predictors held constant ('adjusted for other Xs')
Collinearity
a relationship among the predictors themselves (predictors correlated with each other)
An example of model fitting procedure
1) look at scatter matrix, check out which predictors have a clearly linear relation to outcome

2)
Comparing R^2 of regress y x1 and regress y x2 to R^2 of regress y x1 x2
Rcombined^2 <= R1^2 + R2^2; only equal if X1 and X2 are totally unrelated
Deriving betas for regress y x1 x2 from regress y x1 and regress y x2
To find the unique contribution of x2 that is not already explained by x1, we regress the unexplained part of y onto the unexplained part of x2, i.e. we regress e(y|x1) on e(x2|x1). The slope from this residual-on-residual regression is x2's coefficient in the multiple regression.
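This residual-on-residual recipe can be checked directly. A Stata sketch (auto data as hypothetical stand-ins for y, x1, x2):

```stata
sysuse auto, clear

regress price weight length      // full model: note the coefficient on length

regress price weight             // y regressed on x1 only
predict e_y, residuals           // unexplained part of y

regress length weight            // x2 regressed on x1
predict e_x2, residuals          // unexplained part of x2

regress e_y e_x2                 // slope here matches length's coefficient above
```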
Marginal vs. partial coefficients
marginal coefficient Beta1 in SLR describes effect of X1 on Y ignoring all other variables; partial coefficient Beta1 in MLR describes effect of X1 on Y 'adjusted for' other predictors (i.e. with other predictors fixed).
Difference in distribution of predictors for MLR
All holds as in SLR, but the estimates are distributed t with n-p-1 degrees of freedom. The residual variance is still estimated with s^2.
How to test H0: all coefficients are zero
1) Compare the p-values associated with each Beta to alpha/(# parameters); if at least one is below that threshold, H0 can be rejected

OR

2) Cf. the overall F-test in the regression output and its associated p-value; this p-value directly measures the extremity of the data against this H0
How to test H0: some coefficients are zero
1) Bonferroni

2) Run FM (full model) and RM (reduced model) regressions and compare adjusted R^2 (adjusts for # parameters)

3) Partial F-test ('exact' way)
Adjusted R^2
unlike R^2, adjusted R^2 has expectation zero (when the predictors have no real effect), and it can be negative
Partial F-test
df of the F distribution are the difference in # of parameters between the two models and n-p-1 (the df of the full model)

Used judiciously, it covers the following scenarios:

Testing whether one single coefficient is 0, testing whether all coefficients are simultaneously 0, testing whether several coefficients are simultaneously 0
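In Stata, test after fitting the full model gives the same F statistic as the partial F-test in each of these scenarios; a sketch with illustrative variables:

```stata
sysuse auto, clear
regress price weight length mpg   // full model

test mpg                          // one single coefficient = 0
test weight length mpg            // all slope coefficients = 0 (matches the overall F)
test length mpg                   // several coefficients simultaneously = 0
```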
Testing H0: Beta1=Beta2
Two ways to do this; the first is to generate a new variable Star = X1 + X2

If the full model is, for example:
Y = Beta0 + Beta1*X1 + Beta2*X2
the reduced model to compare is
Y = Beta0 + Beta1*Star   (where Star = X1 + X2)

Alternatively, the SSE of the reduced model can be derived from the Root MSE in the constrained-regression ('constrain') output, and the partial F-statistic calculated from there.
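A Stata sketch of both routes (assuming the 'constrain' output on the card refers to constrained regression via constraint/cnsreg; variable names are illustrative):

```stata
sysuse auto, clear

* Route 1: Wald/partial F test of the equality directly
regress price weight length
test weight = length

* Route 2: reduced model with a common coefficient, i.e. Y = b0 + b1*(X1 + X2)
constraint define 1 weight = length
cnsreg price weight length, constraints(1)    // reduced-model fit; SSE recoverable from Root MSE

* Same reduced model built by hand with Star = X1 + X2
generate star = weight + length
regress price star
```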
Implications of Unmet Assumptions
For example: if the errors are not ~N, the least squares estimates don't have a known distribution, so we can't test hypotheses because the test statistics' reference distributions are undefined
Main assumption of regression inference
Errors are iid ~ N(0, sigma^2)
Leverage
hii, the amount of weight Yi is given in predicting Yhat-i, is the leverage of the i-th observation
About Variance of Residuals and Why its the case
Although the residuals do have mean 0, each one has a different variance; in addition, they are correlated

because, looking at Yhat-i in terms of leverage, all residuals are estimated from the same data, and each depends on the rest of the observations
Leverage point
If hii (leverage) is close to 1, then Yi plays a strong role in predicting Yhat-i; this further means Yi is an outlier
Screen for leverage points
Leverage in SLR
In simple linear regression, hii = 1/n + (Xi - Xbar)^2 / Sigma[(Xj - Xbar)^2]
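Leverage values come from predict after regress; a Stata sketch that also flags high-leverage points using the common 2(p+1)/n rule of thumb (an assumption here, since the course may use a different cutoff):

```stata
sysuse auto, clear
regress price weight length

predict h, hat                          // leverage h_ii for each observation

* Rule-of-thumb screen: flag leverage above twice the average, 2(p+1)/n
scalar cutoff = 2*(e(df_m) + 1)/e(N)
list price weight length h if h > cutoff
```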
standardized residual
mean 0, variance 1; note that this relies on sigma (the error standard deviation), which is infrequently known (IE THESE ARE RARE!)
studentized residual
s is the estimate of sigma

in STATA: rstandard = studentized (our class) = 'internally studentized' (the book)

"too big" means |studentized residual| > 2, and these residuals no longer add up to 0
Checking Homoscedasticity with Studentized residuals
plot of rstandard vs fitted values ('box of judgement')
Checking normality with studentized residuals
Q-Q plot!
Checking linearity with studentized residuals
rstandard vs fitted values scatter
Checking independence with studentized residuals
Only pertinent if order matters in data; index plot of rstandard
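The four checks above map onto a handful of standard Stata plots; a sketch with illustrative data:

```stata
sysuse auto, clear
regress price weight length

predict rs, rstandard          // internally studentized residuals
predict yhat                   // fitted values

scatter rs yhat, yline(0)      // homoscedasticity ('box of judgement') and linearity
qnorm rs                       // normality: Q-Q plot of the studentized residuals

generate obsno = _n
scatter rs obsno, yline(0)     // independence: index plot (only if order matters)
```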
I decided to omit a predictor from my model. How do I check if this omitted content is correlated with my new model's residuals?
scatter plot of residuals vs omitted predictor; if you see a linear pattern you should see if adding the variable back in is a benefit!
Insidious Outliers!
outlier(s) --> big residuals --> overstated s.e. --> internally studentized residuals come out too SMALL --> ability to detect less glaring outliers and to check assumptions is impaired

EXTERNALLY STUDENTIZED residuals were conceived in response -- these use an MSE that doesn't need the whole data set (the suspect observation can be left out to give a more 'usual' data set)
externally studentized or 'jackknife' procedure
don't want to contaminate the standard error with an observation that will be dropped later, so calculate the s.e. without the current observation

tends to make outlier residuals more prominent than internally studentized residuals do
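In Stata the externally studentized (jackknife) residuals come from predict's rstudent option; a sketch comparing them with the internally studentized version:

```stata
sysuse auto, clear
regress price weight length

predict r_int, rstandard       // internally studentized: s uses all observations
predict r_ext, rstudent        // externally studentized: s computed without observation i

* outliers tend to stand out more in the externally studentized version
list price weight length r_int r_ext if abs(r_ext) > 2
```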
Potential harm of keeping an outlier
The regression surface gets tilted to accommodate them, OR the MSE gets inflated too much. This extra noisiness could mask other outliers and/or regression violations
Potential harm of dropping an outlier
may introduce bias!
Influence
a point is influential if removing the point causes substantial model change

In order to have much influence, a case must have both large leverage AND a large residual
Cook's Distance
Residual^2 * potential

Cook’s distance combines the size of the residuals with the amount of leverage. Both are required to be large for Cook’s distance to be large. Large Cook’s distance means likely outlier and likely leverage point (i.e. likely influence point).
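Cook's distance is also a predict option in Stata; a sketch that flags large values with the common (but not universal) 4/n rule of thumb, which is an assumption here rather than anything from the card:

```stata
sysuse auto, clear
regress price weight length

predict d,  cooksd             // Cook's distance
predict h,  hat                // leverage
predict rs, rstandard          // studentized residual

* large D needs both a sizable residual and sizable leverage
list price weight length rs h d if d > 4/e(N)
```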