66 Cards in this Set

  • Front
  • Back
NB: When you've transformed a variable for regression ...
Make sure you remember the units are transformed, too!
First step when you're not sure if the data set needs to be transformed to be linear
'center' the independent variables (predictors): i.e., subtract the mean of x from each x, so the transformed var is the original's distance from the mean

just wanna bring observations closer to zero!
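A minimal Stata sketch of centering (hypothetical variable x; summarize stores the sample mean in r(mean)):

* subtract the sample mean from each observation
. summarize x
. generate cx = x - r(mean)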
Second step when you're not sure if data needs to be transformed
look at the X-Y scatter -- is there curvature? if so, take a transform (maybe log or exp) of Y.... if the result is linear: E(log(Y)) = Beta0 + Beta1(X)

can derive the meaning of the betas like above
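A quick Stata sketch of this check (hypothetical y and x):

* log-transform Y and re-examine the scatter for linearity
. generate logy = ln(y)
. scatter logy x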
how confidence bands change when converting to and from transformation
'Least serious' regression violation vis-à-vis transformation considerations
Non-normality; it also usually happens to be remedied automatically once the other violations are addressed
Violations of linearity
1) Analytically (Table 6.1 in book; try logit if response is a probability etc.)

2) Numerically (look at scatter & try what looks best!)
Transformations other than the logarithmic (mostly about Box-Cox)
Ladder of powers: exponentiating Y by -1, -0.5, 0.5, 1, 2

Box-Cox transform: attached;

NB: If after running boxcox, lambda = 0 can't be rejected, 'simply' use the log transform (i.e. Y-prime = log(Y));

if lambda = -1 can't be rejected, use Y-prime = 1/Y (reciprocal);

if lambda = 1 can't be rejected, do nothing! Y-prime = Y!

After running boxcox, you can either use the suggested lambda from the Box-Cox fit (last part of the printout) or follow the guidelines above if that last bit didn't reject all the hypotheses
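A minimal Stata sketch (hypothetical y and x; boxcox prints the estimated lambda along with tests of lambda = -1, 0, 1 at the end of its output):

* estimate the Box-Cox lambda and inspect the hypothesis tests
. boxcox y x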
How do we spot non-linearities?
Scatterplots, partial plots, augmented CPR plots ('acprplot' in Stata); LOOK FOR CURVED PATTERNS IN THESE PLOTS

acprplot is the augmented component-plus-residual (partial residual) plot; need to do it one predictor at a time if you have multiple predictors

the ', lowess' option fits a curved (lowess) line along with the fitted line
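A minimal Stata sketch (hypothetical y, x1, x2):

* augmented component-plus-residual plot for one predictor,
* with a lowess smooth added to reveal curvature
. regress y x1 x2
. acprplot x1, lowess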
Encountering heteroscedastic errors (POWER TRANSFORM)
In applications, when increasing variance over a predictor var is encountered, the st.dev of the residuals increases as the predictor increases, so we hypothesize that st.dev(error) = k*x,

leading to the following transform: Y' = Y/X with new predictor X' = 1/X (dividing the whole model through by X makes the error term, error/X, have constant st.dev k)

generate ty = y/x
generate tx = 1/x

and re-scatter

Can try log too to see if it helps, but it didn't help the way this did in the example!

Next thing we did: regress y x (x^2);

it worked OK, but it's better to use Y/X and 1/X because that has fewer predictors than log(Y) ~ X + X^2
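A minimal Stata sketch of the full transform-and-refit (hypothetical y and x):

* divide through by x so the error variance becomes constant
. generate ty = y/x
. generate tx = 1/x
. scatter ty tx
. regress ty tx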
Transform options
square root, reciprocal (^(-1)), reciprocal of square root!, log, log(sqrt(y))!

~ AVOID UNNECESSARY TRANSFORMATIONS ~
Box-Cox before and after
Weighted Least Squares (WLS) method
another way to correct for heteroscedasticity, by assigning a weight to each observation's term in the SSE

. regress y x [w=1/x^2]
(analytic weights assumed)
(sum of wgt is 1.0470e-04)

useful when it looks like a predictor is influencing the magnitude of the variance spread; we 'standardize' observations by dividing by their own variance

"each observation is weighted by the estimate of its standard deviation"

Computing the weights from the data: how do we estimate these within-region variances?
Answer: find the MSE for each region!

. regress y x1 x2 x3
. predict res, resid
. generate r2 = res^2
. egen r = mean(r2), by(region)

r represents the MSE for each region

Weighted Least Squares (WLS) divides each term in the SSE by r
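A minimal Stata sketch of the final WLS fit using those estimated weights (hypothetical y, x1-x3, region; analytic weights implement the inverse-variance weighting):

* weight each observation by the inverse of its region's MSE
. regress y x1 x2 x3 [aweight=1/r]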
what's good...
Compared with LS, the R^2, F-statistic, and MSE of WLS are much better! Much better t-statistics, too.
NB Regarding WLS
The WLS returns the residuals as y - y-hat, but our fitted y-hat is obtained by WLS. To check the residuals, we need to obtain the weighted residuals.
autocorrelation
if observations follow a natural order ---> have patterns, the errors are correlated, and that correlation is called 'autocorrelation'
Common autocorrelation patterns
Common causes of autocorrelation
1. same observations measured in adjacent periods (stock market today and tomorrow)
2. spatially close observations (temp in NC and VA)
3. another important predictor omitted
How autocorrelation and heteroscedasticity really endanger Ordinary Least Squares regression
1. OLS estimates are unbiased but not efficient (no longer minimum-variance among unbiased estimators)
2. sigma^2 and the s.e. of the betas may be way over- or under-estimated. Positive correlation -> under-estimation of s.e. -> increased false positives

==> C.I., OTHER TESTS INVALID
Time series & Runs
Time series (ordered) data -> index plot (standardized residual vs time)

IF errors are correlated, the plot will display clusters of + or - residuals called 'runs'; a sequence plot shows runs

under H0: no autocorrelation, the expected number of runs / variance of the statistic is as attached

'runtest' in Stata tests if the observed # of runs is different from expected; "thresh(0) just means 0 is the threshold or fulcrum for +/- whereas the median is the default threshold"
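A minimal Stata sketch (hypothetical y and x, observations already in time order):

* runs test on the signs of the OLS residuals, split at zero
. regress y x
. predict res, resid
. runtest res, thresh(0)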
Durbin-Watson test
for detecting serial correlation; based on the assumption that successive errors are correlated as in the attached model, where e_t is the t-th OLS residual. H0: rho = 0 => errors are uncorrelated
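A minimal Stata sketch (hypothetical time variable t; the statistic d is roughly 2(1 - rho-hat), so d near 2 means little estimated serial correlation):

* declare the time series, then compute the Durbin-Watson d
. tsset t
. regress y x
. estat dwatson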
Approximations and tips on Durbin-Watson statistic
when d is near 2, there isn't much autocorrelation in the error, but if d is far from 2, it suggests autocorrelation is present
How close is 'close to 2' for D-W statistic?
for positive autocorrelations (d is below 2):

d < dL => reject null (that there is NO positive autocorrelation)

d > dU => we fail to reject the null

dL < d < dU => test is inconclusive

dL and dU are in tables, vary by n, p

*FOR NEGATIVE AUTOCORRELATION, "WORK WITH (4-d) and use the same procedure as above"
You can remove autocorrelation by transformation!
Cochrane-Orcutt transformation is one way to remove autocorrelation:

'prais' command in Stata;

the 'two' (twostep) option stops the procedure after the first estimate of rho,

OR you can let the iteration run until convergence

we generate the residual from e_t and e_(t-1), generating the latter from the lagged variable "L.r2"
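A minimal Stata sketch (hypothetical y, x, and time variable t; 'two' in the card abbreviates the twostep option):

* Cochrane-Orcutt estimation, stopping after the first
* estimate of rho instead of iterating to convergence
. tsset t
. prais y x, corc twostep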
Remember that one cause of autocorrelation is...
artificial: the omission of another predictor variable
Limitations of DW statistic
When TWO LEVELS OF AUTOCORRELATION are PRESENT, the DW statistic only looks at the overall autocorrelation and can't detect the higher level of autocorrelation. Thus, a DW statistic near 2 does not necessarily indicate no autocorrelation.

i.e., an omitted season in the ski rental data; regressing with season will fix the autocorrelation, which is different in ski season and warm season
Multicollinearity
The symptom: p-values for the betas in the regression are insignificant, but the F-statistic is signaling significance!

==> multicollinearity, which is when some of the 'independent' predictors are correlated

multicollinearity almost always exists, but the concern is about severity
If all predictors in a model are independent...
we call the predictors 'orthogonal'
Multicollinearity is not...
an error! It comes from a lack of info in the data set... what if X3 can be approximately expressed as a combo of X1 and X2? etc.
Collinearity
when one variable can be expressed completely as a function of other variables; "most serious case"; HAVE TO DROP A VAR FROM THE MODEL!

Stata autodetects perfect collinearity and makes the decision about which var to drop
What if we ignore multicollinearity?
If mild, major consequence is that CIs are wider than usual ("more conservative tests"), but if multicollinearity is too severe CIs are TOO big and Beta estimates are uselessly haphazard
If we add a predictor and C.I. blows up in size / changes other estimates drastically, then
the new predictor is highly correlated with the old one(s)

we need the assumption of independence b/c we assume each beta represents the rate of change of Xn on Y WITH ALL OTHER VARS HELD CONSTANT
When is multicollinearity "serious" and how do we detect it?
If F-test is significant and ALL individual t-tests are INsignificant, multicollinearity is likely

if fewer t-tests are insignificant, probably just means that some variables are non-important

another sign is if a predictor has a coefficient w/ a sign that is opposite of sense/context/expectation
BEWARE: multicollinearity and pairwise combos
Multicollinearity can arise out of any linear combo of the predictor vars, so it is impossible to screen for with just pairwise scatterplots.
tolerance
tolerance for predictor Xj measures the fraction of unexplained variance, 1 - (R^2)_j, in Xj after adjustment for (i.e., regression on) the other variables
VIF
variance inflation factor; 1/tolerance

VIF > 10 suggests multicollinearity
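A minimal Stata sketch (hypothetical y and x1-x3; the output lists each predictor's VIF and 1/VIF = tolerance):

* variance inflation factors after an OLS fit
. regress y x1 x2 x3
. estat vif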
Once we detect multicollinearity...
First: if Xj is collinear with the other variables => Xj is basically redundant

So:
1) can OMIT the predictor (~ fitting the model with Beta_j = 0)
2) can CENTER variables near their mean before constructing powers and interaction terms... this reduces the collinearity of quadratic/interaction terms with their (uncentered) parent variables
3) get more data / study up on the mechanism
4) CONSTRAINED REGRESSION
Constrained regression
realize that there's a conceptual reason that your residual plot looks like this (i.e., a distinctive change in error correlation around a point): DO A SEPARATE ANALYSIS! here, around x = 12

Next, make sure R^2 is high and the predicted Betas all make sense conceptually... if the concept-check turns up something unreasonable, it's very likely highly correlated

check with pwcorr; VIF --> if high, run separate regressions on each var that has a high VIF

HERE IT IS: instead of omitting one regressor, we keep both of them in a 'constrained' context
How constrained regression works
Constrained regression avoids collinearity by using the same parameter for two predictors. The relationship of the two coefficients is predefined by the constraint function.
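A minimal Stata sketch (hypothetical y, x1, x2; the equal-coefficients constraint is just one example of a predefined relationship):

* define a linear constraint, then fit the constrained regression
. constraint 1 x1 = x2
. cnsreg y x1 x2, constraints(1)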
Variable Selection
How do we pick between a bunch of OK models?

Various methods: F-test (if nested), adjusted R^2, Mallows' Cp, likelihood-based criteria like AIC and BIC. Gets annoying though! n variables --> 2^n possible models
How to be efficient in var selection?
What is the model being used for? Exploration (building simple -> complicated, considering variables' conceptual inclusion), predictive (already know what's up, not thinking about vars much), explanatory (parsimonious variable choosing here, worried about confounding, i.e., correlation vs. causation)

ULTIMATELY we want some of all => minimize bias^2 + variance == Mean Square Error

the model should be easy to interpret, and watch out for confounders!
Stepwise regression
USE W/ CAUTION!
Stepwise regression: forward selection
4. With the new model, return to step 2

The probability-to-enter option, pe, was set to .99 so that all of the variables would enter, and their order of entry depends on their significance.

The procedure is repeated until adding any of the remaining variables would give the added variable a p-value > pe.
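A minimal Stata sketch of forward selection (hypothetical y and x1-x3; pe(.99) mirrors the card, letting every variable enter so you can watch the order of entry):

* forward selection: a variable enters while its p-value < pe
. stepwise, pe(.99): regress y x1 x2 x3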
AIC and BIC
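For reference, the standard definitions in the document's notation (L = maximized likelihood, k = number of estimated parameters, n = sample size; smaller is better for both):

AIC = 2k - 2*log(L)
BIC = k*log(n) - 2*log(L)

In Stata, both are reported by 'estat ic' after fitting a model.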
Stepwise regression: Backward selection
The probability-to-remove option, pr, was set to .01 so that all of the variables except the last one would be removed, and their order of removal also depends on their significance.

The procedure is repeated until the p-values of the variables in the model are all smaller than pr.

(In another run, the probability-to-remove option, pr, was set to .33 to correspond to a t-statistic of 1.0.)
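A minimal Stata sketch of backward selection (hypothetical y and x1-x3):

* backward selection: a variable is removed while its p-value > pr
. stepwise, pr(.01): regress y x1 x2 x3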
Warning about stepwise regression:
Completely meaningless results can happen. Unless you can use theory and common sense to justify the resulting models, don't rely on this method.

NB: pr should be > pe
Robustness
find a relationship that holds for most of the data and isn't excessively influenced by a small # of deviant data points

the mean is NOT a robust estimate of center, but the median is
Least Median of Squares regression
Robust version of Ordinary Least Squares regression (which minimizes the mean of the squared residuals): LMS minimizes the MEDIAN of the squared residuals instead
Robust regression
another alternative to Least Squares regression that deals better with data contamination like outliers or overly influential observations; 'rreg' in Stata

adjusts the weights of the data so that observations with Cook's D > 1 are excluded from the robust analysis

doesn't assume normality, allows for outliers
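A minimal Stata sketch (hypothetical y, x1, x2; genwt() saves the final case weights so you can see which observations were downweighted):

* robust regression, saving the final weights in w
. rreg y x1 x2, genwt(w)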
robust regression vs OLS
In OLS regression, all cases have a weight of 1. Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions.

Try robust; if it's close to OLS, go back to the OLS model and treat the robust regression as a (negative) check for outliers
Balancing weight in robust regression
Poisson regression
used to model count variables as the outcome (can't be negative), aka 'log-linear'

N.B. The 'i.' before a categorical var indicates that it is a factor variable (i.e., a categorical variable) and that it should be included in the model as a series of indicator variables.
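A minimal Stata sketch (hypothetical count outcome y, continuous x, categorical cat):

* Poisson regression with a factor variable, then goodness of fit
. poisson y x i.cat
. estat gof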
Big assumption of Poisson
recall mean and variance of Poisson are the same => BIG assumption: conditional on predictors, est. mean and est. var are the same
estat gof
goodness-of-fit chi-square; it is not a test of the model coefficients but a test of the model form: does the Poisson model form fit our data? A large p-value indicates good fit (i.e., H0 is that the model fits)
What if the chi-square goodness-of-fit test indicates a bad fit?
Try to figure out if there are missing predictors, if the linearity assumption holds, and/or if the conditional mean and conditional variance are very different
In count situation, what if conditional mean and variance aren't the same?
Can try to fit negative binomial
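A minimal Stata sketch (same hypothetical variables as above; the negative binomial relaxes the variance = mean restriction via an overdispersion parameter):

* negative binomial regression for overdispersed counts
. nbreg y x i.cat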
Poisson (for counts) regression summary
Poisson model is characterized by the very strong assumption that the conditional variance of the outcome variable equals the conditional mean

if bad fit, first check that the model is "appropriately specified", i.e., no omitted variables or bad functional forms

if the model looks bona fide, the assumption that conditional var = conditional mean should be checked

Poisson regression is estimated with MLE, * requires large sample *
Logistic regression
for binary situations (binomial); Y ~ Bin(n, pi)

logit(pi) = log(pi / (1 - pi)) is called the 'log odds', and pi / (1 - pi) is called the 'odds'
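A minimal Stata sketch (hypothetical binary outcome y, predictors x1 and x2):

* logit reports coefficients on the log-odds scale;
* logistic fits the same model but reports odds ratios
. logit y x1 x2
. logistic y x1 x2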
Generalized linear models and the 'link' function
Commonly used link functions
identity link: f(t) = t; if Y ~ Normal => OLS regression

log link: f(t) = log(t); if Y ~ Poisson => Poisson regression

logit link: f(t) = log(t / (1 - t)); if Y ~ Binomial => logistic regression

Cf. command 'glm'
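A minimal Stata sketch showing how 'glm' expresses each special case through its family and link (hypothetical y and x):

* the same three models, written as GLMs
. glm y x, family(gaussian) link(identity)
. glm y x, family(poisson) link(log)
. glm y x, family(binomial) link(logit)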
Interpreting logistic regression results
Relationship between 'odds' and Betas (logistic regression output)
Multiple logistic regression
How do we interpret the Beta-hats here? Analogous to multiple linear regression: Beta-hat_j is interpreted as the partial log-odds ratio.

What Beta-hat_j is, in words: when comparing the odds of two groups that differ by only one unit of Xj, Beta-hat_j is the log of the odds ratio between those two groups.
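The corollary in the document's notation (a standard fact, worth one extra line):

exp(Beta-hat_j) = the odds ratio for a one-unit increase in Xj, holding the other predictors constant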
Multiple logistic regression if interaction present
Logistic regression for grouped data
mind that in both cases you want fitted values between 0 and 1 (because they estimate probabilities)
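A minimal Stata sketch of the grouped case (hypothetical s = number of successes per group, n = group size, predictor x):

* binomial GLM where each row summarizes n Bernoulli trials
. glm s x, family(binomial n) link(logit)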
Logistic reg for grouped data vs ordinary logistic reg