91 Cards in this Set

  • Front
  • Back
Special case: when the regressor is binary, i.e., it can take only one of two values

Where Di is a dummy variable:

Di = 1 if STR < 20
Di = 0 if STR ≥ 20
Regression when X is a binary variable
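The binary-regressor case above can be checked numerically: with a dummy regressor, the OLS intercept is the mean of Y for the D = 0 group and the slope is the difference in group means. A minimal sketch with made-up test-score data (all numbers are hypothetical):

```python
# Sketch: OLS with a binary regressor D (hypothetical test-score data).
# With a dummy regressor, b0 is the mean of Y when D = 0 and b1 is the
# difference in group means: mean(Y | D=1) - mean(Y | D=0).

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

# D = 1 if STR < 20, else 0 (hypothetical districts)
D = [1, 1, 1, 0, 0, 0]
Y = [660, 650, 670, 630, 640, 620]

b0, b1 = ols(D, Y)
print(b0, b1)  # b0 = mean(Y | D=0) = 630.0, b1 = 660 - 630 = 30.0
```

Here the D = 0 group averages 630 and the D = 1 group averages 660, so OLS returns exactly those group statistics.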
Binary variables are also known as...
Indicator variables or dummy variables
Omitting variables generates a bias in the OLS estimates if...
1) The omitted variable is a determinant of the dependent variable

2) The omitted variable is correlated with the included regressor
The estimators of B0, B1, ..., Bk that minimize the sum of squared mistakes are called...
Ordinary Least Squares
Ideal randomized controlled experiments in economics are
Useful because they give a definition of a causal effect
When the estimated slope coefficient in the simple regression model, B1, is zero,
R2 =0
Which of the following statements is true?
TSS=ESS+SSR
In the simple linear regression model, the regression slope
Indicates by how many units Y increases, given a one unit increase in X
The OLS estimator is derived by
Minimizing the sum of squared residuals
To obtain the slope estimator using the least squares principle, you divide the
Sample covariance of X and Y by the sample variance of X
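The rule in this card can be verified directly by computing the sample covariance and sample variance by hand; the data below are made up for illustration:

```python
# Sketch: the OLS slope equals the sample covariance of X and Y divided
# by the sample variance of X (hypothetical data).

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# Sample covariance and variance (both use the n - 1 divisor, which cancels)
cov_xy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n - 1)
var_x = sum((x - mx) ** 2 for x in X) / (n - 1)

b1 = cov_xy / var_x   # slope
b0 = my - b1 * mx     # intercept
print(b1, b0)         # slope about 1.96, intercept about 0.14
```

Because the same n − 1 divisor appears in numerator and denominator, it cancels; dividing both sums by n would give the identical slope.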
Interpreting the intercept in a sample regression function is
Reasonable if your sample contains values of X around the origin
In the simple linear regression model, Yi = B0 + B1Xi + ui,
B0 + B1Xi represents the population regression function
The t-statistic is calculated by
Dividing the estimator minus its hypothesized value by the standard error of the estimator
The construction of the t-statistic for a one and a two sided hypothesis
is the same
The 95% CI for B0 is the interval that
Contains the true value of B0 in 95% of all repeated samples
The OLS residuals, ûi, are defined as follows
ûi = Yi − Ŷi
All of the following are desirable properties of the OLS estimators except
The OLS estimators are not efficient
To test the hypothesis that the intercept coefficient is zero against the alternative that it is positive at the 1% level, one has to compare the t-statistic against
The critical value of 2.33
OVB can occur in regressions with a low R2, a moderate R2, or a high R2; conversely, a low R2 does not imply that there necessarily is OVB (p. 191). OVB requires two conditions:

1) The omitted variable is a determinant of the dependent variable

2) The excluded variable is correlated with the included regressor
Omitted Variable Bias
1) U has a positive effect on Y and ρXu > 0. Positive bias: overestimates the true relation

2) U has a positive effect on Y and ρXu < 0. Negative bias: underestimates the relationship between Y and X

3) U has a negative effect on Y and ρXu > 0. Negative bias: underestimates the true effect of X on Y

4) U has a negative effect on Y and ρXu < 0. Positive bias: overestimates the true effect
4 Possible cases of omitted variable bias
1) Estimate the regression for subsets of districts with similar fractions of English learners.

- Divide into quartiles or deciles and run the regression for each subset.

2) Include % of English learners as an additional regressor in the model
How do we hold other things constant?
Binary Variables...
Can take on only two values
Most economic data are obtained
By observing real-world data
Studying inflation in the US from 1970-2006 is an example of using
Time series data
In the simple linear regression model, the regression slope
Indicates by how many units Y increases, given a one unit increase in X
The regression R2 is a measure of
Goodness of fit of your regression line
To obtain the slope estimator using the least squares principle, you divide the
Sample covariance of X and Y by the sample variance of X
In the linear regression model, Yi = B0 + B1Xi + ui, B0 + B1Xi is referred to as
The population regression function
If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal distribution, you can...
Reject the null hypothesis
The error term is homoskedastic if
var(ui | Xi = x) is constant for i = 1, ..., n
Imagine you regressed earnings of individuals on a constant, a binary variable ("Male") which takes on the value 1 for males and 0 otherwise, and another binary variable ("Female") which takes on the value 1 for females and 0 otherwise. Because females typically earn less than males, you would expect
None of the OLS estimators to exist because there is perfect multicollinearity
When there are omitted variables in the regression which are determinants of the dependent variable, then...
The OLS estimator is biased if the omitted variable is correlated with the included variable
For a single restriction (q=1), the F-statistic...
Is the square of the T-statistic
All of the following are examples of joint hypotheses on multiple regression coefficients, with the exception of...
H0: B1+B2=1
3 Goodness-of-fit measures
1) Standard error of the regression (SER): SER = su, where
su² = SSR/(n − k − 1)

The SER estimates the spread of the distribution of Yi around the regression line

2) R2 = ESS/TSS = 1 − SSR/TSS

R2 estimates the fraction of the sample variance of Yi that is explained by the regressors

3) Adjusted R2:
R̄2 = 1 − [(n − 1)/(n − k − 1)] × SSR/TSS

Adjusted R̄2 estimates the fraction of the variance of Yi explained by the regressors, but corrects/adjusts by a factor that takes into account the number of regressors included
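The three measures above can be computed from any fitted regression. A sketch for the simplest case of k = 1 regressor, using hypothetical data:

```python
# Sketch: TSS, SSR, ESS and the three fit measures from the card,
# computed for a fitted simple regression (k = 1 regressor).
# The data are made up for illustration.
from math import sqrt

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
n, k = len(Y), 1

# OLS fit
mx, my = sum(X) / n, sum(Y) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / \
     sum((x - mx) ** 2 for x in X)
b0 = my - b1 * mx
Yhat = [b0 + b1 * x for x in X]

TSS = sum((y - my) ** 2 for y in Y)                    # total sum of squares
SSR = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))     # sum of squared residuals
ESS = TSS - SSR                                        # explained sum of squares

R2 = 1 - SSR / TSS                            # = ESS / TSS
adjR2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS # adjusted R-squared
SER = sqrt(SSR / (n - k - 1))                 # standard error of the regression
print(R2, adjR2, SER)
```

Note that adjusted R̄2 is always at most R2, since the correction factor (n − 1)/(n − k − 1) is at least 1.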
Imperfect multicollinearity arises when....
Two or more regressors are highly correlated

The coefficients will be less precisely estimated, i.e., the SEs are large

It becomes difficult to do hypothesis tests and estimate confidence intervals
Imperfect multicollinearity doesn't generate a problem for the theory of OLS estimators, but...
The coefficients will be imprecisely estimated
Hypothesis testing for a single coefficient in a multiple regression
(i) Calculate SE(B̂k)

(ii) Calculate t-actual:
t = (B̂k − Bk,0) / SE(B̂k)

(iii) Compare t-actual with the critical value
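The three steps can be sketched with hypothetical numbers for the estimate and its standard error:

```python
# Sketch of the three steps in the card, with made-up inputs.

beta_hat = 2.28    # hypothetical estimated coefficient B_k
se_beta = 0.52     # hypothetical standard error SE(B_k), step (i)
beta_null = 0.0    # hypothesized value under H0: B_k = 0

# Step (ii): t-actual = (estimate - hypothesized value) / SE
t_actual = (beta_hat - beta_null) / se_beta

# Step (iii): compare with the 5% two-sided critical value 1.96
critical = 1.96
reject_h0 = abs(t_actual) > critical
print(round(t_actual, 2), reject_h0)  # 4.38 True
```

Here |t| ≈ 4.38 > 1.96, so H0 is rejected at the 5% level.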
Learning methods to estimate non-linear functions
1) The first group of methods addresses the effect on Y of a change in one independent variable, X, when the effect depends on the value of X itself.

2) The second group of methods is useful when the effect on Y
of a unit change in X depends on another independent variable, e.g., X2
Quadratic Regression model
Test scorei = B0 + B1 Incomei + B2 Incomei² + ui

- Approximates a curve

- B0, B1, & B2 are unknown and must be estimated from a sample of data

- OLS and methods are the same, since this is simply a variant of the multiple regression model
Quadratic Regression Model...
Test Scorei = B0 + B1 Incomei + B2 Incomei² + ui
(see notes)
1) Identify possible non-linear relationships

2) Specify non-linear function and estimate using OLS

3) Determine whether the non-linear model improves over the linear model: use t-stats or F-stats

4) Plot nonlinear regression

5) Estimate effect on Y of change in X
How do we go about modeling non-linearities?
1) Functions involving polynomials or polynomial regression

2) Functions involving logarithms
Aside from a quadratic model, other possible non-linear functions:
Yi = β0 + β1Xi + β2Xi² + β3Xi³ + … + βrXi^r + ui
where r denotes the highest power of X included in the regression.

Sequential hypothesis testing is used to decide how many polynomial terms to include

If r = 2: Quadratic regression model

If r = 3: Cubic regression model
Polynomial Regression Model
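A polynomial regression is just OLS on the regressors 1, X, X², ..., X^r. A sketch that builds that design matrix and solves the normal equations on made-up, exactly quadratic data, so the fit should recover the true coefficients:

```python
# Sketch: polynomial regression as OLS on powers of X. The data are
# hypothetical and exactly quadratic (Y = 1 + 2X + 3X^2), so OLS should
# recover beta = (1, 2, 3).

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small linear system.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def polyfit_ols(X, Y, r):
    # Design matrix Z with columns 1, X, ..., X^r; solve (Z'Z) b = Z'Y.
    n = len(X)
    Z = [[x ** p for p in range(r + 1)] for x in X]
    ZtZ = [[sum(Z[i][a] * Z[i][c] for i in range(n)) for c in range(r + 1)]
           for a in range(r + 1)]
    ZtY = [sum(Z[i][a] * Y[i] for i in range(n)) for a in range(r + 1)]
    return solve(ZtZ, ZtY)

X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1 + 2 * x + 3 * x ** 2 for x in X]
beta = polyfit_ols(X, Y, 2)
print([round(b, 6) for b in beta])  # [1.0, 2.0, 3.0]
```

Because the model is linear in the parameters, the usual OLS machinery applies unchanged even though the fitted curve is non-linear in X.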
How do we test the linear against the polynomial model?
H0: B2 = 0, B3 = 0, ..., Br = 0
H1: one or more of the coefficients is different from 0
Methods to identify effect on Y of a change in X1, when the effect depends on value of X1 itself
a) Polynomial functions

b) Functions that depend on logarithms
Another way of specifying a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percent changes
Logarithm functions
The natural logarithm is written as "ln" and it is the inverse of the....
Exponential function
3 Types of logarithmic regression models
(i) Linear-log model: A 1% change in X generates a change in Y of 0.01 × B1

(ii) Log-linear model: A change of 1 unit in X generates a change in Y of 100 × B1 %

(iii) Log-log model: A 1% change in X generates a change in Y of B1 %
Effect on Y of a change in X depends on the value of X1

a) Polynomial Regression Function

b) Logarithmic Model
Non-linear Regression Model
(i) Both independent variables are binary

(ii) One independent variable is binary and the other one is continuous

(iii) Two independent variables are continuous
Effects on Y of a change in X, when the effect depends on the value of X2 or other X's
(i) Both independent variables are binary

(ii) One independent variable is binary and the other is continuous

(iii) Two or both independent variables are continuous
Methods that allow us to examine: the effect on Y of a change in X1 when this effect depends on the value of X2 or other X's
A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied
Internal validity
A statistical analysis is externally valid if its inferences and conclusions can be generalized from the population and setting being studied to other populations and settings
External Validity
1) The estimator of the causal effect is NOT unbiased and/or consistent, e.g., the coefficient on STR is biased and/or inconsistent

2) Hypothesis tests do not have the desired significance level and confidence intervals do not have the desired confidence level
Threats to internal validity
Arises when a variable that is omitted from the regression determines Y and is correlated with one or more of the included regressors

Solutions:
(i) Include the omitted variable

(ii) If you don't observe the variable → rely on panel data (observations at different points in time) to control for unobserved omitted variables

- Use instrumental variables, which are variables that are correlated with the regressor but not with the omitted variables

- Design randomized controlled experiments
Biased &/or Inconsistent OLS Estimator:
Omitted Variable Bias
A form of omitted variable bias in which the omitted terms are the non-linear terms

Solution: Include non-linear terms
Misspecification of functional form of the regression function
Regressors are measured with error; (see notes) Chapter 9.

Solutions:
(i) Use instrumental variables which are correlated with the regressor of interest but uncorrelated with the error term

(ii) Adjust estimates for measurement error; use a measurement-error correction formula
Measurement error; Errors in variables
Arises when the selection process influences the availability of data and this same process is related to the dependent variable.

Solutions to sample selection:

Include a selection correction term and use information on something that determines selection but is uncorrelated with outcome
Sample selection
So far we assumed that causality goes from X to Y, but in fact causality could run the other way, i.e., from Y to X.

Solutions:

(i) Use instrumental variables
(ii) Design a randomized experiment
Simultaneous/Reverse Causality
Omitted variable bias occurs when an omitted variable
1) Is correlated with an included regressor

2) Is a determinant of Y
The coefficients in multiple regression can be estimated by OLS...
When the four least squares assumptions are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples
One or more regressors can be expressed as a linear combination of the other variables/regressors

- The model cannot be estimated, so you need to drop one of the variables

- Dummy Variable Trap: occurs when you include a constant and all possible categories

Perfect multicollinearity occurs when one regressor is an exact linear function of the other regressors; it usually arises from a mistake in choosing which regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of regressors
Perfect Multicollinearity
The standard error of the regression, the R2, and the adjusted R̄2 are measures of fit for the multiple regression model (p. 211)
A regression model in which β1 represents the expected change in Y in response to a 1-unit increase in X1 is
Y = β0 + β1X1 + u.
A regression model in which .01β1 represents the expected change in Y in response to a 1% increase in X1 is
Y = β0 + β1 ln(X1) + u.
A regression model in which 100×β1 represents the expected percentage change in Y in response to a 1-unit increase in X1 is
ln(Y) = β0 + β1X1 + u.
A regression model in which β1 represents the expected percentage change in Y in response to a 1% increase in X1 is
ln(Y) = β0 + β1 ln(X1) + u.
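The interpretations in the four cards above rest on the approximation ln(1 + x) ≈ x for small x. A quick numeric check with a hypothetical β1 = 0.05:

```python
# Sketch: numeric check of the log-model interpretations above, using a
# hypothetical beta1 = 0.05.
from math import exp, log

beta1 = 0.05

# Linear-log: Y = b0 + b1*ln(X). A 1% increase in X (say 100 -> 101)
# changes Y by b1 * ln(101/100), which is close to 0.01 * b1.
dY_linear_log = beta1 * (log(101.0) - log(100.0))

# Log-linear: ln(Y) = b0 + b1*X. A 1-unit increase in X multiplies Y by
# exp(b1), i.e. changes Y by close to 100 * b1 percent.
pct_log_linear = (exp(beta1) - 1) * 100

print(dY_linear_log, pct_log_linear)  # about 0.0005 and about 5.13
```

The approximations are tight for small β1 and small percent changes, and drift apart as those grow (e.g. exp(0.05) − 1 is 5.13%, not exactly 5%).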
A quadratic regression...
Includes X and X² as regressors
a) Omitted Variable Bias
b) Misspecification of functional form
c) Measurement error
d) Sample Selection
e) Simultaneous/Reverse Causality
Biased and/or inconsistent OLS estimator
If using homoskedasticity-only standard errors, the confidence intervals are unreliable

Solution: Use the heteroskedasticity-robust standard error formula
Inconsistent OLS standard errors: Heteroskedasticity
If there are repeated observations for entities over time, e.g., time series data or panel data, then Yi and Xi are not independently distributed and the standard error formula is incorrect

Solution: Fix formula for standard error to account for serial correlation
Inconsistent OLS standard errors: Serial Correlation of error term
Threats to external validity arise from differences between the population and setting being studied and the population and setting of interest

(i) Differences in populations
Ex: test scores in Virginia and the population (type of people in the sample)

(ii) Differences in settings/institutions (regulations may be different)
Threats to external validity
_______ allows us to obtain consistent estimates when the regressor Xi is correlated with the error term ui

To understand IV regression, it is useful to think about the variation in Xi as being divided into 2 parts:

1) One part, which for whatever reason, is correlated with the error term

2) Another part that is uncorrelated with ui

"Instrumental variables" or instruments are variables that allow us to isolate the second component of the variation in Xi

For instruments to work they have to satisfy two conditions:

(i) Instrument relevance: (instrument labeled Zi) there has to be a nonzero correlation between Z and X, Corr(Zi, Xi) ≠ 0

(ii) Instrument exogeneity: Corr(Zi, ui) = 0
Instrumental variables (IV) regression
_____________ works in two stages (two stage least squares, TSLS) to estimate Yi = B0 + B1Xi + ui

First stage: Decompose Xi into two parts:

1) A problematic component correlated with ui
2) A problem-free component (uncorrelated with ui)

Xi = π0 + π1Zi + vi (vi is the problematic component)

While π0 and π1 are population parameters, we can obtain OLS estimators of π0 and π1 and get a predictor of the problem-free component: X̂i = π̂0 + π̂1Zi

Second stage: Run a regression of Yi on X̂i: Yi = B0 + B1X̂i + ui; the resulting estimator B̂1 is consistent for B1
Instrumental variables (IV) regression
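The two stages can be sketched on simulated data where X is deliberately correlated with the error U (so plain OLS is biased) while the instrument Z satisfies both relevance and exogeneity; all the data-generating numbers are made up:

```python
# Sketch: two-stage least squares on simulated data. True model is
# Y = 1 + 2X + U, but X is built to be correlated with U (endogeneity),
# so OLS of Y on X is biased while TSLS using Z is consistent.
import random

random.seed(0)
n = 5000
Z = [random.gauss(0, 1) for _ in range(n)]   # instrument: independent of U
U = [random.gauss(0, 1) for _ in range(n)]   # regression error
# X depends on Z (relevance) and on U (the endogeneity problem)
X = [zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(Z, U)]
Y = [1.0 + 2.0 * xi + ui for xi, ui in zip(X, U)]

def ols(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Stage 1: regress X on Z and form the problem-free predictor Xhat
p0, p1 = ols(Z, X)
Xhat = [p0 + p1 * z for z in Z]

# Stage 2: regress Y on Xhat
_, b1_tsls = ols(Xhat, Y)
_, b1_ols = ols(X, Y)   # plain OLS for comparison (biased upward here)
print(b1_ols, b1_tsls)
```

With these settings the OLS slope converges to about 2 + 0.8/2.64 ≈ 2.3, while the TSLS slope converges to the true value 2.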
Why is internal validity an important criterion for evaluating an econometric study?
If an econometric study is not internally valid, then it does not provide valid statistical inferences for causal effects for the population being studied.
Why is external validity an important criterion for evaluating an econometric study?
If an econometric study is not externally valid, then its conclusions cannot be generalized to other populations.
Key Concept 7.4, p. 238 lists four least squares assumptions for the multiple regression model.
Which of these assumptions is/are likely to fail in the following circumstances?

a. There is misspecification of the regression’s functional form.
b. The regressors are measured with error.
c. Data are not available when the dependent variable falls in a certain range.
d. There is reverse causation flowing from Y to one of the regressors.
e. The regression error terms are correlated across observations.
a. Assumption 1 fails (the conditional mean of the error given the regressors is no longer zero)
b. Assumption 1 fails
c. Assumption 2 fails (the draws are no longer i.i.d.)
d. Assumption 1 fails
e. Assumption 2 fails
Data that are specifically collected by conducting experiments
Two Sources of Data:
Experimental Data
Data that are obtained by observing behavior in the real world, using surveys, administrative records, or other sources
Two Sources of Data:
Observational Data
The science of using economic theory combined with statistical techniques to analyze data
Econometrics (Definition)
Data on different entities for a single period of time
3 Types of Data:
- Cross Sectional
- Time Series
- Panel Data

Cross Sectional Data
Data on a single entity collected over multiple periods of time
3 Types of Data:
- Cross Sectional
- Time Series
- Panel Data

Time Series Data
Data for multiple entities in which each entity is observed for two or more periods
3 Types of Data:
- Cross Sectional
- Time Series
- Panel Data

Panel Data
Yi = B0 + B1Xi + ui
Single Regression Framework
Occurs when two conditions hold

1) The first condition is that the omitted variable is a determinant of the dependent variable, i.e., the omitted variable is part of ui

2) The omitted variable is correlated with the included regressor
Omitted Variable Bias
E(Yi | X1i = x1, X2i = x2, ..., Xki = xk) = B0 + B1x1 + B2x2 + ... + Bkxk

B0: expected value of Y when all X's are zero

Bk: expected change in Y from a unit change in Xk, holding all other X's constant
Population regression line
Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui
Interaction regression Models
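In the interaction model above, the effect on Y of a unit change in X1 is β1 + β3X2, so it depends on X2. A sketch with hypothetical coefficient values:

```python
# Sketch: in the interaction model, the effect of a unit change in X1 is
# b1 + b3 * X2. Coefficients below are hypothetical.

b0, b1, b2, b3 = 5.0, 2.0, 1.0, 0.5

def y(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of a one-unit increase in X1, evaluated at two values of X2:
effect_at_0 = y(1, 0) - y(0, 0)   # = b1 = 2.0
effect_at_4 = y(1, 4) - y(0, 4)   # = b1 + b3 * 4 = 4.0
print(effect_at_0, effect_at_4)
```

The same unit change in X1 moves Y by 2 when X2 = 0 but by 4 when X2 = 4, which is exactly what the interaction term β3 buys you.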
Including an additional regressor in a regression
Always reduces the sum of squared residuals
Simultaneous causality bias
Arises because at least one of the regressors is correlated with the regression error
If regression error terms are correlated with one another, then...
t-statistics will not be distributed as standard normal variables in large samples