66 Cards in this Set
- Front
- Back
NB: When you've transformed a variable for regression ...
|
Make sure you remember the units are transformed, too!
|
|
First step when you're not sure if a data set needs to be transformed to be linear
|
'center' the independent variables (predictors): i.e., subtract the mean of x from each x, so the transformed variable is the original's distance from the mean
just wanna bring observations closer to zero! |
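The centering step can be sketched in Python (hypothetical predictor values; numpy assumed available):

```python
import numpy as np

# hypothetical predictor values
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# center: each transformed value is the original's distance from the mean
x_centered = x - x.mean()

print(x_centered)         # values now straddle zero
print(x_centered.mean())  # a centered variable has mean 0
```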
|
Second step when you're not sure if data needs to be transformed
|
look at the X-Y scatter -- is there curvature? If so, take a transform (maybe log or exp) of Y. If the result is linear: E(log(Y)) = Beta0 + Beta1*X
the meaning of the betas can be derived as above |
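A minimal sketch of this check in Python, using fabricated exactly-exponential data so that log(Y) is exactly linear in X (numpy assumed; the 0.5 and 0.3 are made-up Beta0 and Beta1):

```python
import numpy as np

x = np.arange(1.0, 11.0)
y = np.exp(0.5 + 0.3 * x)  # hypothetical curved (exponential) relationship

# after logging Y, the relationship is linear: E(log Y) = Beta0 + Beta1*X
b1, b0 = np.polyfit(x, np.log(y), 1)  # polyfit returns slope first
```

Fitting log(y) on x recovers the betas of the log-scale model, which is why the units (and the betas' interpretation) are on the log scale afterwards.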
|
how confidence bands change when converting to and from transformation
|
|
|
'Least serious' regression violation vis-à-vis transformation considerations
|
Non-normality; it also usually happens to be remedied automatically once the other violations are addressed
|
|
Violations of linearity
|
1) Analytically (Table 6.1 in the book; e.g., try a logit if the response is a probability)
2) Numerically (look at the scatter & try what looks best!) |
|
Other transformations other than Logarithmic (mostly about boxcox)
|
Ladder of powers: raising Y to the power -1, -0.5, 0.5, 1, or 2
Box-Cox transform: attached. NB: after running boxcox,
- if lambda = 0 can't be rejected, 'simply' use the log transform (i.e., Y' = log(Y))
- if lambda = -1 can't be rejected, use Y' = 1/Y (reciprocal)
- if lambda = 1 can't be rejected, do nothing! Y' = Y!
After running boxcox, you can either use the suggested lambda from the boxcox fit (last line of the printout) or follow the guidelines above if the final tests didn't reject all hypotheses |
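In Python, scipy's `stats.boxcox` estimates lambda by maximum likelihood, after which the decision rules above apply (fabricated skewed data; for log-normal data the estimated lambda should come out near 0, pointing to the plain log transform):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.5, size=500)  # skewed, strictly positive response

# returns the transformed data and the MLE of lambda
y_t, lam = stats.boxcox(y)
# lam near 0 here would suggest simply using Y' = log(Y)
```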
|
How do we spot non-linearities?
|
Scatterplots, partial plots, augmented component-plus-residual plots ('acprplot' in Stata) -- LOOK FOR CURVED PATTERNS IN THESE PLOTS
acprplot is the augmented partial residual plot; with multiple predictors you need to do them one by one. The ', lowess' option fits a smoothed curve alongside the fitted line |
|
Encountering heteroscedastic errors (POWER TRANSFORM)
|
In applications, when increasing variance over a predictor is encountered, the st.dev. of the residuals grows as the predictor increases, so we hypothesize that st.dev(error) = k*x
leading to the following transform: Y' = Y/X and X' = 1/X
generate ty = y/x
generate tx = 1/x
and re-scatter. Can try a log too to see if it helps, but in the example it didn't help the way this did! Next we tried: regress y x (x^2). It worked OK, but it's better to use Y/X and 1/X because that model has fewer predictors than the quadratic Y ~ X + X^2 |
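A sketch of this transform in Python with fabricated data where st.dev(error) = k*x (numpy assumed). Dividing the model Y = b0 + b1*X + e through by X gives Y/X = b1 + b0*(1/X) + e/X, so in the transformed regression the intercept estimates the original slope and the slope estimates the original intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
# hypothetical model: error st.dev grows proportionally with x
y = 2.0 + 3.0 * x + x * rng.normal(0.0, 0.1, size=200)

ty = y / x    # transformed response, Y' = Y/X
tx = 1.0 / x  # transformed predictor, X' = 1/X

# after the transform, the error e/x has constant spread
slope, intercept = np.polyfit(tx, ty, 1)
# slope estimates the original intercept (~2.0);
# intercept estimates the original slope (~3.0)
```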
|
Transform options
|
square root, reciprocal (^(-1)), reciprocal of square root!, log, log(sqrt(y))!
~ AVOID UNNECESSARY TRANSFORMATIONS ~ |
|
Box-Cox before and after
|
|
|
Weighted Least Square method
|
another way to correct for heteroscedasticity, by assigning a weight to each observation's term in the SSE
. regress y x [w=1/x^2]
(analytic weights assumed)
(sum of wgt is 1.0470e-04)
Useful when it looks like the predictor is influencing the magnitude of the variance spread; we 'standardize' by dividing observations by their own variance: "each observation is weighted by the estimate of its standard deviation"
Computing the weights from the data -- how to estimate these within-region variances? Answer: find the MSE for each region!
. regress y x1 x2 x3
. predict res, resid
. generate r2 = res^2
. egen r = mean(r2), by(region)
r represents the MSE for each region; Weighted Least Squares (WLS) divides each term in the SSE by r |
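The weighting idea behind `regress y x [w=1/x^2]` can be sketched directly in Python with the closed-form WLS estimator (X'WX)^(-1) X'Wy (fabricated data with sd(error) proportional to x; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 300)
y = 1.0 + 2.0 * x + x * rng.normal(0.0, 0.2, size=300)  # sd(error) grows with x

w = 1.0 / x**2  # weight each observation by 1 / (its error variance)
X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept + slope

# WLS normal equations: (X'WX) beta = X'Wy
# (equivalent to dividing each observation's SSE term by its variance)
WX = X * w[:, None]
beta = np.linalg.solve(X.T @ WX, X.T @ (w * y))
# beta[0] estimates the intercept (~1.0), beta[1] the slope (~2.0)
```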
|
what's good...
|
Compared with LS, the R², F-statistic, and MSE of WLS are much better! The t-statistics are much better, too |
|
NB Regarding WLS
|
|