91 Cards in this Set
- Front
- Back
Special case: when the regressor is binary, i.e., it can take only one of two values
Where Di is a dummy variable: Di = 1 if STR < 20, Di = 0 if STR ≥ 20 (see the sketch below) |
Regression when X is a binary variable
|
|
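A minimal sketch of the binary-regressor case, assuming Python with numpy and statsmodels and simulated (hypothetical) data: the slope estimate is the difference in group means.

```python
# Sketch: regression on a binary regressor D (D = 1 if STR < 20),
# with simulated data; beta_1 estimates the difference in group means.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
str_ratio = rng.uniform(15, 25, 200)           # hypothetical STR values
d = (str_ratio < 20).astype(float)             # dummy regressor
score = 650 + 7 * d + rng.normal(0, 10, 200)   # hypothetical test scores

res = sm.OLS(score, sm.add_constant(d)).fit()
print(res.params)  # ~[650, 7]: intercept = mean of D=0 group, slope = gap
```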
Binary variables are also known as...
|
Indicator variables or dummy variables
|
|
Omitting variables generates a bias in OLS estimates if...
|
1) The omitted variable is a determinant of the dependent variable
2) The omitted variable is correlated with the included regressor |
|
The estimators of β0, β1, ..., βk that minimize the sum of squared prediction mistakes are called...
|
Ordinary Least Squares
|
|
Ideal randomized controlled experiments in economics are
|
Useful because they give a definition of a causal effect
|
|
When the estimated slope coefficient in the simple regression model, β1, is zero,
|
R2 =0
|
|
Which of the following statements is true?
|
TSS = ESS + SSR (verified numerically in the sketch below)
|
|
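A numerical check of the identity above, a sketch assuming simulated data and Python with numpy/statsmodels:

```python
# Sketch: verify TSS = ESS + SSR on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 + 3 * x + rng.normal(size=100)
res = sm.OLS(y, sm.add_constant(x)).fit()

tss = np.sum((y - y.mean()) ** 2)                 # total sum of squares
ess = np.sum((res.fittedvalues - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(res.resid ** 2)                      # sum of squared residuals
print(np.isclose(tss, ess + ssr))                 # True
```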
In the simple linear regression model, the regression slope
|
Indicates by how many units Y increases, given a one unit increase in X
|
|
The OLS estimator is derived by
|
Minimizing the sum of squared residuals
|
|
To obtain the slope estimator using the least squares principle, you divide the
|
Sample covariance of X and Y by the sample variance of X (see the sketch below)
|
|
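A sketch of the slope formula, checked against the true coefficients (simulated data; numpy assumed):

```python
# Sketch: beta_1_hat = sample cov(X, Y) / sample var(X).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)

beta1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # both use the n-1 convention
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)                             # close to (1, 2)
```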
Interpreting the intercept in a sample regression function is
|
Reasonable if your sample contains values of X around the origin
|
|
In the simple linear regression model, Yi = β0 + β1Xi + ui,
|
β0 + β1Xi represents the population regression function
|
|
The t-statistic is calculated by
|
Dividing the estimator minus its hypothesized value by the standard error of the estimator (see the sketch below)
|
|
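A sketch of the t-statistic computed by hand and compared with the packaged value (simulated data; statsmodels assumed):

```python
# Sketch: t = (estimate - hypothesized value) / SE(estimate), for H0: beta_1 = 0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)
res = sm.OLS(y, sm.add_constant(x)).fit()

t_by_hand = (res.params[1] - 0) / res.bse[1]
print(t_by_hand, res.tvalues[1])   # identical
```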
The construction of the t-statistic for a one-sided and a two-sided hypothesis
|
is the same
|
|
The 95% confidence interval for β0 is the interval that
|
Contains the true value of β0 in 95% of all possible randomly drawn samples
|
|
The OLS residuals, ûi, are defined as follows
|
ûi = Yi - Ŷi
(where Ŷi is the OLS predicted value) |
|
All of the following are desirable properties of the OLS estimators except
|
The OLS estimators are not efficient
|
|
To test the hypothesis that the intercept coefficient is zero against the alternative that it is positive at the 1% level, one has to compare the t-statistic against
|
The critical value of 2.33
|
|
Occurs when two conditions hold:
1) The omitted variable is a determinant of the dependent variable
2) The omitted variable is correlated with the included regressor
Note: OVB can occur in regressions with a low R2, a moderate R2, or a high R2; conversely, a low R2 does not imply that there necessarily is OVB (p. 191) |
Omitted Variable Bias
|
|
1) u has a positive effect on Y and ρXu > 0: positive bias (overestimate the true effect)
2) u has a positive effect on Y and ρXu < 0: negative bias (underestimate the relationship between Y and X) 3) u has a negative effect on Y and ρXu > 0: negative bias (underestimate the true effect of X on Y) 4) u has a negative effect on Y and ρXu < 0: positive bias (overestimate the true effect) |
The 4 possible cases of omitted variable bias (direction of the bias)
|
|
1) Estimate the regression for subsets of districts with similar fractions of English learners
- Divide into quartiles or deciles and run the regression for each subset 2) Include the % of English learners as an additional regressor in the model |
How do we hold other things constant?
|
|
Binary Variables...
|
Can take on only two values
|
|
Most economic data are obtained
|
By observing real-world data
|
|
Studying inflation in the US from 1970-2006 is an example of using
|
Time series data
|
|
In the simple linear regression model, the regression slope
|
Indicates by how many units Y increases, given a one unit increase in X
|
|
The regression R2 is a measure of
|
Goodness of fit of your regression line
|
|
to obtain the slope estimator using the least squares principle, you divide
|
Sample covariance of X and Y by the sample variance of X
|
|
In the linear regression model, Yi = β0 + β1Xi + ui, β0 + β1Xi is referred to as
|
The population regression function
|
|
If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution, you can...
|
Reject the null hypothesis
|
|
The error term is homoskedastic if
|
var(ui | Xi = x) is constant for i = 1, ..., n
|
|
Imagine you regressed earnings of individuals on a constant, a binary variable ("Male") which takes on the value 1 for males and is 0 otherwise, and another binary variable ("Female") which takes on the value 1 for females and is 0 otherwise. Because females typically earn less than males, you would expect
|
None of the OLS estimators to exist because there is perfect multicollinearity
|
|
When there are omitted variables in the regression, which are determinants of the dependent variable, then...
|
The OLS estimator is biased if the omitted variable is correlated with the included variable
|
|
For a single restriction (q = 1), the F-statistic...
|
Is the square of the t-statistic (see the check below)
|
|
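A quick numerical check of the q = 1 case (simulated data; with plain arrays statsmodels names the slope "x1"):

```python
# Sketch: for a single restriction, F = t^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 1 + 0.8 * x + rng.normal(size=100)
res = sm.OLS(y, sm.add_constant(x)).fit()

f_res = res.f_test("x1 = 0")              # one restriction (q = 1)
print(f_res.fvalue, res.tvalues[1] ** 2)  # equal
```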
All of the following are examples of joint hypotheses on multiple regression coefficients, with the exception of...
|
H0: B1+B2=1
|
|
3 Goodness of fit measures
|
1) SER (standard error of the regression): SER = s_û, where s_û^2 = SSR/(n - k - 1). Estimates the spread of the distribution of Yi around the regression line
2) R2 = ESS/TSS = 1 - SSR/TSS. Estimates the fraction of the sample variance of Yi that is explained by the regressors 3) Adjusted R2 = 1 - [(n - 1)/(n - k - 1)] × SSR/TSS. Estimates the fraction of the variance of Yi explained by the regressors, correcting by a factor that accounts for the number of regressors included (see the sketch below) |
|
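A sketch computing the three fit measures from their formulas and comparing with the packaged values (simulated data; k is the number of regressors excluding the constant):

```python
# Sketch: SER, R2, and adjusted R2 from SSR and TSS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)
res = sm.OLS(y, sm.add_constant(X)).fit()

ssr = np.sum(res.resid ** 2)
tss = np.sum((y - y.mean()) ** 2)
ser = np.sqrt(ssr / (n - k - 1))                   # standard error of regression
r2 = 1 - ssr / tss
adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
print(r2, res.rsquared, adj_r2, res.rsquared_adj)  # matching pairs
```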
Imperfect multicollinearity arises when...
|
Two or more regressors are highly correlated
The coefficients will be less precisely estimated, i.e., their standard errors are large, and it becomes difficult to do hypothesis tests and estimate confidence intervals |
|
Imperfect multicollinearity doesn't generate a problem for the theory of OLS estimators, but...
|
The coefficients will be imprecisely estimated
|
|
Hypothesis testing for a single coefficient in a multiple regression
|
(i) Calculate the standard error SE(βk)
(ii) Calculate the actual t-statistic: t = (βk - βk,0) / SE(βk), where βk,0 is the hypothesized value (iii) Compare the actual t with the critical value |
|
Learning methods to estimate non-linear functions
|
1) The first group of methods allows us to address the effect on Y of a change in one independent variable, X, when the effect depends on the value of X itself
2) The second group of methods is useful when the effect on Y of a unit change in X depends on another independent variable, e.g., X2 |
|
Quadratic Regression Model
|
TestScore = β0 + β1 Income_i + β2 Income_i^2 + ui
- Approximates a curve - β0, β1, and β2 are unknown and must be estimated from a sample of data - OLS and related methods are the same, since this is simply a variant of the multiple regression model |
|
Quadratic Regression Model...
|
TestScore = β0 + β1 Income_i + β2 Income_i^2 + ui
(see notes; a fitted example appears in the sketch below) |
|
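A sketch of fitting the quadratic model by OLS (hypothetical income data; numpy/statsmodels assumed): the squared term is simply added as a second regressor.

```python
# Sketch: TestScore = b0 + b1*Income + b2*Income^2 + u, estimated by OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
income = rng.uniform(5, 55, 200)   # hypothetical district income (thousands)
score = 600 + 3.8 * income - 0.04 * income ** 2 + rng.normal(0, 9, 200)

X = sm.add_constant(np.column_stack([income, income ** 2]))
res = sm.OLS(score, X).fit()
print(res.params)                  # estimates of b0, b1, b2
```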
1) Identify possible non-linear relationships
2) Specify the non-linear function and estimate it using OLS 3) Determine whether the non-linear model improves on the linear model, using t-statistics or F-statistics 4) Plot the non-linear regression 5) Estimate the effect on Y of a change in X |
How do we go about modeling non-linearities?
|
|
1) Functions involving polynomials or polynomial regression
2) Functions involving logarithms |
Aside from a quadratic model, other possible non-linear functions
|
|
Yi = β0 + β1Xi + β2Xi^2 + β3Xi^3 + ... + βrXi^r + ui
where r denotes the highest power of X included in the regression. Use sequential hypothesis testing to decide how many polynomial terms to include. If r = 2: quadratic regression model; if r = 3: cubic regression model |
Polynomial Regression Model
|
|
How do we test the linear model against the polynomial model?
|
H0: β2 = 0, β3 = 0, ..., βr = 0
H1: at least one of these coefficients is different from 0 (see the sketch below) |
|
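A sketch of this test against a cubic alternative (simulated data; with plain arrays statsmodels names the regressors x1, x2, x3):

```python
# Sketch: F-test of H0: beta_2 = beta_3 = 0 against a cubic model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 200)
y = 1 + 0.5 * x + 0.3 * x ** 2 + rng.normal(size=200)  # truly nonlinear

X = sm.add_constant(np.column_stack([x, x ** 2, x ** 3]))
res = sm.OLS(y, X).fit()
print(res.f_test("x2 = 0, x3 = 0"))  # small p-value: reject linearity
```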
Methods to identify effect on Y of a change in X1, when the effect depends on value of X1 itself
|
a) Polynomial functions
b) Functions that depend on logarithms |
|
Another way of specifying a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes
|
Logarithm functions
|
|
The natural logarithm is written as "ln" and it is the inverse of the....
|
Exponential function
|
|
3 Types of logarithmic regression models
|
(i) Linear-log model: a 1% change in X generates a change in Y of 0.01 × β1
(ii) Log-linear model: a 1-unit change in X generates a change in Y of 100 × β1 % (iii) Log-log model: a 1% change in X generates a change in Y of β1 % (see the sketch below) |
|
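A sketch of the three specifications side by side (simulated data with a true log-log relationship; numpy/statsmodels assumed):

```python
# Sketch: linear-log, log-linear, and log-log models.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 50, 300)
y = 10 * x ** 0.3 * np.exp(rng.normal(0, 0.1, 300))  # true elasticity 0.3

lin_log = sm.OLS(y, sm.add_constant(np.log(x))).fit()          # Y on ln(X)
log_lin = sm.OLS(np.log(y), sm.add_constant(x)).fit()          # ln(Y) on X
log_log = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()  # ln(Y) on ln(X)
print(log_log.params[1])  # ~0.3: a 1% rise in X raises Y by about 0.3%
```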
Effect on Y of a change in X depends on the value of X1 itself:
a) Polynomial Regression Function b) Logarithmic Model |
Non-linear Regression Model
|
|
(i) Both independent variables are binary
(ii) One independent variable is binary and the other one is continuous (iii) Both independent variables are continuous |
Effects on Y of a change in X1, when the effect depends on the value of X2 or other X's
|
|
(i) Both independent variables are binary
(ii) One independent variable is binary and the other is continuous (iii) Both independent variables are continuous |
Methods that allow us to examine the effect on Y of a change in X1 when this effect depends on the value of X2 or other X's
|
|
A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied
|
Internal validity
|
|
A statistical analysis is externally valid if its inferences and conclusions can be generalized from the population and setting being studied to other populations and settings
|
External Validity
|
|
1) The estimator of the causal effect is NOT unbiased and/or consistent, e.g., the coefficient on STR is biased and/or inconsistent
2) Hypothesis tests do not have the desired significance level and confidence intervals do not have the desired confidence level |
Threats to internal validity
|
|
Arises when a variable that is omitted from the regression determines Y and is correlated with one or more of the included regressors
Solutions: (i) Include the omitted variable (ii) If you don't observe the variable, rely on panel data (observations at different points in time) to control for unobserved omitted variables (iii) Use instrumental variables, i.e., variables that are correlated with the regressor but not with the omitted variables (iv) Design randomized controlled experiments |
Biased &/or Inconsistent OLS Estimator:
Omitted Variable Bias |
|
A type of omitted variable bias, in which the omitted variables are the non-linear terms
Solution: Include the non-linear terms |
Misspecification of functional form of the regression function
|
|
Regressors are measured with error (see notes, Chapter 9)
Solutions: (i) Use instrumental variables that are correlated with the regressor of interest but uncorrelated with the error term (ii) Adjust the estimates for measurement error, using a measurement-error correction formula |
Measurement error; Errors in variables
|
|
Arises when the selection process influences the availability of data and this same process is related to the dependent variable
Solutions: Include a selection correction term and use information on something that determines selection but is uncorrelated with the outcome |
Sample selection
|
|
So far we assumed that causality runs from X to Y, but in fact causality could run the other way, i.e., from Y to X
Solutions: (i) Use instrumental variables (ii) Design a randomized experiment |
Simultaneous/Reverse Causality
|
|
Omitted variable bias occurs when an omitted variable
|
1) Is correlated with an included regressor
2) Is a determinant of Y |
|
The coefficents in multiple regression can be estimated by OLS...
|
When the four least squares assumptions are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples
|
|
One or more regressors can be expressed as an exact linear combination of the other regressors
- The model cannot be estimated, so you need to drop one of the variables - Dummy variable trap: occurs when you include a constant and dummies for all possible categories (see the sketch below). Perfect multicollinearity usually arises from a mistake in choosing which regressors to include in a multiple regression; solving it requires changing the set of regressors |
Perfect Multicollinearity
|
|
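A sketch of the dummy variable trap (numpy/statsmodels assumed): with a constant plus dummies for every category, the regressor matrix loses full column rank.

```python
# Sketch: constant + Male + Female is perfectly multicollinear,
# since Male + Female = 1 = the constant column.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
male = rng.integers(0, 2, 100).astype(float)
female = 1.0 - male

X = sm.add_constant(np.column_stack([male, female]))
print(np.linalg.matrix_rank(X), X.shape[1])  # rank 2 < 3 columns
# Fix: drop the constant or drop one dummy category.
```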
The standard error of the regression,
|
together with the R2 and the adjusted R2, is a measure of fit for the multiple regression model (p. 211)
|
|
A regression model in which β1 represents the expected change in Y in response to a 1-unit increase in X1 is
|
Y = β0 + β1X1 + u.
|
|
A regression model in which .01β1 represents the expected change in Y in response to a 1% increase in X1 is
|
Y = β0 + β1 ln(X1) + u.
|
|
A regression model in which 100×β1 represents the expected percentage change in Y in response to a 1-unit increase in X1 is
|
ln(Y) = β0 + β1X1 + u.
|
|
A regression model in which β1 represents the expected percentage change in Y in response to a 1% increase in X1 is
|
ln(Y) = β0 + β1 ln(X1) + u.
|
|
A quadratic regression...
|
includes X and X^2 as regressors
|
|
a) Omitted Variable Bias
b) Misspecification of functional form c) Measurement error d) Sample Selection e) Simultaneous/Reverse Causality |
Biased and/or Inconsistent OLS estimator
|
|
If you use homoskedasticity-only standard errors when the errors are heteroskedastic, you get unreliable confidence intervals
Solution: Use the heteroskedasticity-robust standard error formula (see the sketch below) |
Inconsistent OLS standard errors: Heteroskedasticity
|
|
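A sketch of requesting heteroskedasticity-robust standard errors in statsmodels (simulated data with error variance growing in x; HC1 is one common robust estimator):

```python
# Sketch: homoskedasticity-only vs. heteroskedasticity-robust SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 0.5 + 0.3 * x, 200)  # variance grows with x

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()                 # homoskedasticity-only SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs
print(plain.bse[1], robust.bse[1])         # same estimates, different SEs
```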
If there are repeated observations for entities over time, e.g., time series or panel data, then Yi and Xi are not independently distributed across observations and the usual standard error formula is incorrect
Solution: Fix the formula for the standard errors to account for serial correlation |
Inconsistent OLS standard errors: Serial Correlation of error term
|
|
Threats to external validity arise from differences between the population and setting being studied and the population and setting of interest
(i) Differences in populations, e.g., test scores in Virginia vs. another population (type of people in the sample) (ii) Differences in settings/institutions (regulations may be different) |
Threats to external validity
|
|
_______ allows us to obtain consistent estimates when the regressor Xi is correlated with the error term ui
To understand IV regression, it is useful to think of the variation in Xi as divided into two parts: 1) one part that, for whatever reason, is correlated with the error term, and 2) another part that is uncorrelated with ui. "Instrumental variables" or instruments are variables that allow us to isolate the second component of the variation in Xi. For instruments to work, they have to satisfy two conditions: (i) Instrument relevance: for an instrument Zi, Corr(Zi, Xi) ≠ 0 (ii) Instrument exogeneity: Corr(Zi, ui) = 0 |
Instrumental variables (IV) regression
|
|
_____________ works in two stages (two stage least squares, TSLS) to estimate Yi = β0 + β1Xi + ui
First stage: decompose Xi into two parts: 1) a problematic component correlated with ui, and 2) a problem-free component (uncorrelated with ui): Xi = π0 + π1Zi + vi, where vi is the problematic component. While π0 and π1 are population parameters, we can obtain their OLS estimates and get a predictor of the problem-free component: X̂i = π̂0 + π̂1Zi. Second stage: run a regression of Yi = β0 + β1X̂i + ui; then E(β̂1) = β1 (see the sketch below) |
Instrumental variables (IV) regression
|
|
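A sketch of TSLS done by hand on simulated data (a dedicated IV routine would also correct the second-stage standard errors):

```python
# Sketch: two-stage least squares with one instrument z.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
z = rng.normal(size=n)                          # instrument: relevant, exogenous
u = rng.normal(size=n)
x = 1 + 0.8 * z + 0.5 * u + rng.normal(size=n)  # x correlated with u
y = 2 + 3 * x + u                               # true beta_1 = 3

stage1 = sm.OLS(x, sm.add_constant(z)).fit()    # decompose x
x_hat = stage1.fittedvalues                     # problem-free component
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(stage2.params[1])  # close to 3; plain OLS of y on x would be biased
```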
Why is internal validity an important criterion for evaluating an econometric study?
|
If an econometric study is not internally valid, then it does not provide valid statistical inferences for causal effects for the population being studied.
|
|
Why is external validity an important criterion for evaluating an econometric study?
|
If an econometric study is not externally valid, then its conclusions cannot be generalized to
other populations. |
|
Key Concept 7.4, p. 238 lists four least squares assumptions for the multiple regression model.
Which of these assumptions is/are likely to fail in the following circumstances? a. There is misspecification of the regression's functional form. b. The regressors are measured with error. c. Data are not available when the dependent variable falls in a certain range. d. There is reverse causation flowing from Y to one of the regressors. e. The regression error terms are correlated across observations. |
a. Assumption 1 (the conditional mean of ui given the regressors is no longer zero)
b. Assumption 1 (the conditional mean of ui given the regressors is no longer zero) c. Assumption 1 (the conditional mean of ui given the regressors is no longer zero) d. Assumption 1 (the conditional mean of ui given the regressors is no longer zero) e. Assumption 2 (the observations are no longer i.i.d.) |
|
Data that are specifically collected when conducting experiments
|
Two Sources of Data:
Experimental Data |
|
Data that are obtained by observing behavior in the real world, using surveys, administrative records, or other sources
|
Two Sources of Data:
Observational Data |
|
The science of using economic theory combined with statistical techniques to analyze data
|
Econometrics (Definition)
|
|
Data on different entities for a single period of time
|
3 Types of Data:
- Cross Sectional - Time Series - Panel Data (this card: Cross Sectional Data) |
|
Data on a single entity collected over multiple periods of time
|
3 Types of Data:
- Cross Sectional - Time Series - Panel Data (this card: Time Series Data) |
|
Data for multiple entities in which each entity is observed for two or more periods
|
3 Types of Data:
- Cross Sectional - Time Series - Panel Data (this card: Panel Data) |
|
Yi = β0 + β1Xi + ui
|
Single Regression Framework
|
|
Occurs when two conditions hold:
1) The omitted variable is a determinant of the dependent variable, i.e., the omitted variable is part of ui 2) The omitted variable is correlated with the included regressor |
Omitted Variable Bias
|
|
E(Yi | X1i = x1, ..., Xki = xk) = β0 + β1x1 + β2x2 + ... + βkxk
β0: the expected value of Y when all X's are zero. βk: the expected change in Y from a unit change in Xk, holding all other X's constant |
Population regression line
|
|
Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui (see the sketch below)
|
Interaction regression Models
|
|
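A sketch of an interaction model with a continuous X1 and a binary X2 (simulated data): the effect of X1 on Y is β1 + β3 × X2.

```python
# Sketch: Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2) + u.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
x1 = rng.normal(size=200)
x2 = rng.integers(0, 2, 200).astype(float)  # binary second regressor
y = 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
res = sm.OLS(y, X).fit()
b = res.params
print(b[1], b[1] + b[3])  # effect of X1 when X2 = 0 vs. X2 = 1
```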
Including an additional regressor in a regression
|
Always reduces the sum of squared residuals
|
|
Simultaneous causality bias
|
Arises because at least one of the regressors is correlated with the regression error term
|
|
If regression error terms are correlated with one another, then...
|
T-Statistics will not be distributed as standard normal variables in large samples
|