200 Cards in this Set
 Front
 Back
Econometrics 
The application of statistical and mathematical methods to economics for the purpose of testing hypotheses and forecasting future trends; takes economic models and tests them through statistical trials; the branch of economics concerned with the use of mathematical methods to describe economic systems


Tools of Econometrics

Econometrics uses tools such as frequency distributions, probability and probability distributions, statistical inference, simple and multiple regression analysis, simultaneous-equations models, and time series methods


Limitations to the Collection of Data Process in Econometrics

(1) Aggregation of Data
(2) Statistically correlated but economically irrelevant variables
(3) Qualitative Data
(4) Classical Linear Regression Model Assumption Failures

Linear Regression

A statistical measure that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables; linear regression uses one independent variable to explain and/or predict the outcome of a dependent variable


Ordinary Least Squares

A statistical technique for determining the line of best fit for a model; a straight line is fitted through a number of points so as to minimize the sum of the squared distances from the points to the line
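A minimal sketch of these least squares formulas in Python; the data points are made up for illustration:

```python
# Simple OLS for one regressor: fit y = b0 + b1*x by minimizing the
# sum of squared vertical distances from the points to the line.

def ols_fit(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sample covariance of (x, y) over sample variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x   # line passes through (mean_x, mean_y)
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1)  # exactly linear data: intercept 0.0, slope 2.0
```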


Typical Data Types in Econometrics

(1) Cross-Sectional
(2) Time Series
(3) Panel/Longitudinal

Cross Sectional Data

Consists of measurements for individual observations at a given point in time; popular in labor economics, industrial organization, urban economics, and other micro-based fields; data typically collected through surveys


Time Series Data

Consists of measurements on one or more variables over time; a sequence of data points measured at successive points in time at uniform intervals; often used for examining seasonal trends and adjustments; data often collected by government agencies


Panel/Longitudinal Data

Consists of a time series for each cross-sectional unit in a sample; involves repeated observations of the same variables over a period of time; data typically collected through surveys


2 CLRM Assumption Failures

(1) Inability to account for heteroskedasticity
(2) Inability to account for autocorrelation 

Heteroskedasticity

The standard deviations of a variable, monitored over a specific amount of time, are nonconstant; volatility isn't constant


Autocorrelation

Also known as serial correlation; may exist in a regression model when the order of the observations in the data is relevant; refers to the correlation of a time series with its own past and future values; also known as lagged correlation; complicates the application of statistical tests by reducing the number of independent observations


Positive Autocorrelation

A form of "persistence" whereby a system has a tendency to remain in the same state from one observation to the next


Data Mining

Using statistics to find models that fit data well; an approach viewed unfavorably in economics


Dummy Variables

A variable that takes on the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome; a true/false variable
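A sketch of building 0/1 dummy columns by hand; the category names below are illustrative, not from any particular dataset:

```python
# Turn a categorical column into 0/1 dummy variables, omitting a base
# (reference) group so that J categories yield J - 1 dummies.

def make_dummies(values, base):
    """One 0/1 column per category, with the base group left out."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

region = ["north", "south", "north", "west"]
dummies = make_dummies(region, base="north")
print(dummies)  # {'south': [0, 1, 0, 0], 'west': [0, 0, 0, 1]}
```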


Random Variables

A variable whose value is subject to variations due to chance; conceptually does not have a single, fixed value (even if unknown), rather it can take on a set of possible different values, each with an associated probability; uncertain values


Discrete Variable

Variables that take on only a finite number of values (thus, all qualitative variables and some quantitative variables); can be described by integers, and the outcomes are countable


Continuous Variable

Variables that can take on any value in a certain range; infinite and noncountable


Probability Density Functions (PDF)

Shows the probabilities of a random variable for all its possible values; a function that describes the relative likelihood of the random variable taking on a given value


Normal Distribution

Also known as the Gaussian distribution; a continuous probability distribution that plots all values in a symmetrical fashion, with most of the results situated around the distribution's mean


Normal Distribution Characteristics

(1) Total area under the curve equals 1
(2) About 68% of the density is within one standard deviation of the mean
(3) About 95% of the density is within two standard deviations of the mean
(4) About 99.7% of the density is within three standard deviations of the mean
(5) Because a continuous random variable can take on infinitely many values, the probability of any specific value occurring is zero
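The 68/95/99.7 rule above can be checked directly with the Python standard library's NormalDist:

```python
# Area of the standard normal within k standard deviations of the mean.
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in areas.items()})
# {1: 0.6827, 2: 0.9545, 3: 0.9973}
```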

Cumulative Distribution Function (CDF)

The sum or accrual of probabilities up to some value; gives the area under the probability density function (PDF) from negative infinity to X.


Integration (Calculus)

Allows us to find areas (densities) under nonlinear functions


Differential (Calculus)

Concerns the rates of change and slopes of curves


Bivariate or Joint Probability Density

Provides the relative frequencies or chances that events with more than one random variable will occur; the probability that two events will occur simultaneously


Marginal (Unconditional) Probability

The probability of the occurrence of a single event; the probability of one variable taking a specific value irrespective of the values of the others


Conditional Probabilities

Calculate the chance that a specific value for a random variable will occur given that another random variable has already taken a value; requires both joint and marginal probabilities in order to calculate


Independent Events

One event has no statistical relationship with the other event; to check for independence observe that the probability of one event is unaffected by the occurrence of another event; independent events if the conditional and unconditional probabilities are equal


Expected Value

The mean of a random variable; provides a measure of central tendency, or one measurement of where the data tends to cluster; the sum of all values weighted by their respective probabilities (with continuous variables, it is the integral of each value times its probability density)


Variance

A measure of dispersion, or how far a set of numbers is spread out; the square of the standard deviation; the average squared difference between the value of a random variable and its mean


Covariance

Measures how two variables are related: 0 if the variables are independent or have no clear relation, positive if there is a direct relationship, negative if there is an inverse relationship. Note: covariance does not provide information about the strength of the relationship between two variables, only its direction


Standard Deviation

The square root of the variance; a measure of the dispersion of a set of data from its mean; commonly reported because it is measured in the same units as the random variable


Correlation

Measures the strength of the relationship between two variables; can only identify linear relationships (other techniques available for nonlinear relationships)
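The moments defined in the last few cards can be computed by hand on a toy sample (population formulas, dividing by n, to match the definitions):

```python
# Mean, variance, covariance, and correlation from first principles.
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    # covariance rescaled by both standard deviations
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(correlation(x, y))  # exactly linear data, so correlation is 1.0
```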


Statistical Inference

Focuses on the process of making generalizations for a population from sample information


Descriptive Statistics

Measurements that can be used to summarize your sample data and subsequently make predictions about your population of interest; quantitatively describe the main features of a collection of data


Parameter

A numerical characteristic of a population, as distinct from a statistic of a sample


Estimators

Calculating descriptive measures using sample data


Point Estimate

Calculating a statistic with data from a random sample produces this; a single estimate of a population parameter


Unbiased Estimator

If in repeated estimations using the same calculation method, the mean value of the estimator coincides with the true parameter value


Efficient Estimator

An estimator that achieves the smallest variance among estimators of its kind


Linearity Estimators

An estimator has this property if a statistic is a linear function of the sample observations


Consistent Estimator

An estimator that approaches the true parameter value as the sample size gets larger and larger; this is known as an asymptotic property: the estimator gradually approaches the true parameter value as the sample size approaches infinity


Linear Combination of Normally Distributed Random Variables

If a random variable is a linear combination of another normally distributed random variable(s), it also has a normal distribution


Standard Normal Distribution

A normal distribution with a mean of 0 and a standard deviation of 1; useful because any normally distributed random variable can be converted to this scale allowing for quick and easy computation of probabilities; denoted by the letter "Z"


Z Score

Obtained by dividing the difference between a measurement and the mean by the standard deviation; this translates the variable into an easily measurable form, and the probability is then obtainable from a table
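A quick sketch of the conversion; the score, mean, and standard deviation are illustrative numbers:

```python
# Z score and the probability it unlocks, using the stdlib NormalDist
# in place of a printed standard normal table.
from statistics import NormalDist

x, mu, sigma = 85, 70, 10
z = (x - mu) / sigma               # (measurement - mean) / sd
prob_below = NormalDist().cdf(z)   # P(X <= 85)
print(z, round(prob_below, 4))     # 1.5 0.9332
```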


Sampling Distribution

A probability distribution or density of a statistic when random samples of size "n" are repeatedly drawn from a population; it is not the distribution of the sample measurements


Central Limit Theorem (CLT)

A statistical theory that states that given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, all of the samples will follow an approximate normal distribution pattern, with all variances being approximately equal to the variance of the population divided by each sample's size.
Distributions of sample means can thus be converted to standard normals 
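A small simulation makes the theorem concrete: means of uniform samples cluster around the population mean, with variance close to the population variance divided by n (sample sizes and seed are arbitrary):

```python
# CLT sketch: repeatedly draw samples from Uniform(0, 1), whose mean is
# 0.5 and variance is 1/12, and look at the distribution of sample means.
import random
random.seed(0)

n, reps = 50, 2000
sample_means = [sum(random.random() for _ in range(n)) / n
                for _ in range(reps)]

grand_mean = sum(sample_means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in sample_means) / reps
print(round(grand_mean, 3), round(var_of_means, 5))
# grand mean near 0.5; variance near (1/12)/50 ~ 0.00167
```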

Chi-squared Distribution

A probability density function that gives the distribution of the sum of the squares of several independent random variables, each with a normal distribution with zero mean and unit variance; the higher the degrees of freedom (or more observations), the less skewed (and more symmetrical) the distribution
The sum of the squares of several independent standard normal random variables is distributed according to the chi-squared distribution with "k" degrees of freedom (or "k" number of variables)

Uses of the Chi-squared Distribution

Used for comparing estimated variance values from a sample to those values based on theoretical assumptions; used to develop confidence intervals and hypothesis tests for population variance


t Distribution

Probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown
If we take a sample of "n" observations from a normal distribution with fixed unknown mean and variance, and if we compute the sample mean and sample variance of these "n" observations, then the t-distribution can be defined as the distribution of the location of the true mean relative to the sample mean, divided by the sample standard deviation after multiplying by the normalizing term SQRT("n"). Used to estimate how likely it is that the true mean lies in any given range

Characteristics of the t-Distribution

Bell-shaped, symmetrical around zero, approaches a normal distribution as the degrees of freedom (number of observations) increase; the ratio of a standard normal to the square root of a chi-squared variable divided by its degrees of freedom


F Distribution

A ratio of two chi-squared distributions, each divided by its respective degrees of freedom; as the degrees of freedom in the numerator and denominator increase, the distribution approaches normal
A probability density function used especially in analysis of variance; a function of the ratio of two independent random variables, each of which has a chi-squared distribution and is divided by its number of degrees of freedom

Methods of Hypothesis Test

(1) Z distribution
(2) t distribution
(3) Chi-squared distribution
(4) F distribution

Null Hypothesis (Ho)

The general or default position that there is no relationship between two measured phenomena; an assumption or prior belief about a population parameter to be tested; the hypothesis test attempts to overturn it


Alternative Hypothesis

Reflects that there will be an observed effect in our experiment; tested against the null hypothesis


Steps to Performing a Hypothesis Test

(1) Estimate the population parameters using sample data
(2) Determine the appropriate distribution
(3) Calculate an interval estimate or test statistic
(4) Determine the hypothesis test outcome

Point Estimation

Involves the use of sample data to calculate a single value (statistic) which is to serve as a best estimate of an unknown population parameter; a single estimate of your parameter of interest


Confidence Interval Approach to Testing Hypotheses

A type of interval estimate of a population parameter that is used to indicate the reliability of an estimate; an observed interval that frequently includes the parameter of interest if the experiment is repeated


Significance Test Approach to Testing Hypotheses

Allows an analyst to estimate how reliably the results derived from a study based on a randomly selected sample can be generalizable to the population from which the sample was drawn; a result that is statistically significant is a result not likely to occur randomly, but likely to be attributable to a specific cause


Measurements of Significance and Confidence

Alpha and (1 - Alpha), respectively


Examples of Hypothesis Tests Situations

Value of one mean: Z
Value of one mean with unknown population variance: t
Value of a variance: Chi-squared
Comparing two means: t
Comparing two variances: F

Details on the Confidence Interval Approach

Calculate a lower limit and an upper limit for a random interval and attach some likelihood that the interval contains the true parameter value; if the hypothesized value for your parameter of interest is in the critical region (outside of the (1 - Alpha) confidence interval), then you reject the null hypothesis
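A sketch of the interval calculation for a mean, assuming (for illustration) a known population standard deviation so the standard normal applies; the data and alpha are made up:

```python
# 95% confidence interval for a mean with known population sd.
from statistics import NormalDist
from math import sqrt

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
n = len(data)
xbar = sum(data) / n
sigma = 0.25                                  # assumed known population sd
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

half_width = z_crit * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 3), round(upper, 3))
```

A hypothesized mean outside (lower, upper) would be rejected at the 5% level.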


Details on the Significance Test Approach

Calculate a test statistic and then compare the calculated value to the critical value from one of the probability distributions to determine the outcome of your test; if the calculated test statistic is in the critical region, you reject the null hypothesis and you can also say that your test is statistically significant


p-value

The level of marginal significance within a statistical hypothesis test, representing the probability of the occurrence of a given event; the smaller the p-value, the more strongly the test rejects the null hypothesis
The lowest level of significance at which you could reject the null hypothesis given your calculated statistic

Type I Error

Rejecting a null hypothesis that is in fact true
Increasing the value of alpha (the level of significance) increases the chance of rejecting the null hypothesis and the chance of committing a Type I error 

Type II Error

Failing to reject a null hypothesis that is in fact false
Reducing the value of alpha (the level of significance) increases the chance of failing to reject the null hypothesis and the chance of committing a Type II error 

Use of Econometrics Techniques

Determining the magnitude of the various relationships that economic theory proposes; used to predict or forecast future events and to explain how one or more factors affect some outcome of interest


Model Specification

Selecting an outcome of interest or dependent variable (Y) and one or more independent factors (or explanatory variables X); the determination of which independent variables should be included in or excluded from a regression equation


Overspecification

Using or including numerous irrelevant variables in the model


Spurious Correlation

A correlation between two variables that does not result from any direct relation between them but from their relation to other variables; the variables coincidentally have a statistical relationship, but one doesn't cause the other; causation can never be proven by statistical results in any circumstance


Population Regression Function (PRF)

Defines, as a mathematical function, our perception of reality


Setting up a PRF Model

(1) Provide the general mathematical specification of the model: denote the dependent variable and all independent variables
(2) Derive the econometric specification of your model: develop a function that can be used to calculate econometric results
(3) Specify the random nature of your model: introduce an error variable

Conditional Mean Operator

Indicates that the relationship is expected to hold, on average, for given values of independent variables


Constant

The expected value of the dependent variable (Y) when all independent variables (X) are equal to 0


Stochastic Population Regression Function

A function that introduces a random error term associated with the observation


Stochastic

Random; involving a random variable; involving chance or probability


Random Error

Results from:
(1) Insufficient or incorrectly measured data
(2) A lack of theoretical insight to fully account for all the factors that affect the dependent variable
(3) Applying the incorrect functional form
(4) Unobservable characteristics
(5) Unpredictable elements of behavior

Pooled CrossSectional Data

Combines independent cross-sectional data that has been collected over time; an event study uses pooled cross-sectional data


Regression Analysis

Techniques that allow for estimation of economic relationships using data; used for estimating the relationships among variables


Least Squares Principle

The Sample Regression Function should be constructed (with the constant and slope values) so that the sum of the squared distance between the observed values of your dependent variable and the values estimated from your SRF is minimized.


Why OLS is most popular method for estimating regressions

(1) Easier than other alternatives
(2) Sensibility
(3) Has desirable characteristics

Numerical Properties of OLS

(1) The regression line always passes through the sample means of Y and X
(2) The mean of the estimated (predicted) Y is equal to the mean value of the actual Y
(3) The mean of the residuals is 0
(4) The residuals are uncorrelated with the predicted Y
(5) The residuals are uncorrelated with observed values of the independent variable

Residuals

The difference between the observed value and the estimated function value; distance from data points to the regression line


Multiple Regression

A regression model that contains more than one explanatory variable


Slope Coefficients

Tell the estimated direction of the impacts that the independent variables have on the dependent variables, and also show by how much the dependent variable changes (value or magnitude) when one of the independent variables increases or decreases


Partial Slope Coefficients

The coefficients of the independent variables in a multiple regression; provide an estimate of the change in the dependent variable for a 1unit change in the explanatory variable, assuming the value of all other variables in the regression model hold constant


Issues when Comparing Regression Coefficients

(1) In standard OLS regression, the coefficient with the largest magnitude is not necessarily associated with the "most important" variable
(2) Coefficient magnitudes can be affected by changing units of measurement; scale matters
(3) Even variables measured on similar scales can have different amounts of variability

Standardized Regression Coefficients

The calculation of standardized regression coefficients allows for the comparison of coefficient magnitudes in a multiple regression


Methods of Calculating Standardized Regression Coefficients

(1) Calculating a Z-score for every variable of every observation and then performing OLS with the Z values rather than the raw data
(2) Obtaining the OLS regression coefficients using the raw data and then multiplying each coefficient by the standard deviation of the X variable over the standard deviation of the Y variable
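Method (2) is a one-line rescaling; the coefficient and standard deviations below are illustrative numbers, not from a real regression:

```python
# Standardized (beta) coefficient from a raw OLS coefficient:
# rescale by sd(X) / sd(Y).

def beta_coefficient(b, sd_x, sd_y):
    return b * sd_x / sd_y

print(beta_coefficient(2.0, 1.5, 6.0))  # -> 0.5
```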

Beta Coefficients

Standardized regression coefficients; not to be confused with Beta in finance; an unfortunate name, as the Greek letter Beta is also used for regular OLS coefficients
Estimates the standard deviation change in the dependent variable for a 1-standard-deviation change in the independent variable, holding other variables constant

Goodness of Fit

Describes how well a statistical model fits a set of observations; generally requires decomposing the variation in the dependent variable into explained and unexplained (residual) parts, then using R-squared to measure the fit


Coefficient of Determination

R-squared; indicates how well data points fit a line or curve; the measure of fit most commonly used with OLS regression


Explained and Residual Variation

Explained variation is the difference between the regression line and the mean value. Residual/unexplained variation is the difference between the observed value and the regression line.


R-Squared

Measures the proportion of variation in the dependent variable explained by the independent variables; a ratio between 0 and 1; equals the explained sum of squares over the total sum of squares; maximizing R-squared means the line has a good fit because we seek to minimize the residual sum of squares (the closer to 1, the better); can only remain the same or increase as more explanatory variables are added
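The decomposition can be checked numerically; the data is a toy sample and the predictions come from an OLS fit with a constant, so ESS/TSS and 1 - RSS/TSS agree:

```python
# R-squared two ways: explained/total and 1 - residual/total.

y      = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat  = [2.8, 3.4, 4.0, 4.6, 5.2]   # OLS fit of y on x = 0..4
y_mean = sum(y) / len(y)

tss = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares
ess = sum((yh - y_mean) ** 2 for yh in y_hat)          # explained sum of squares
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares

print(round(ess / tss, 3), round(1 - rss / tss, 3))  # both 0.6
```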


Adjusted R-Squared

An attempt to account for the phenomenon of R-squared automatically increasing when extra explanatory variables are added to the model; this measure includes a "degrees of freedom penalty" that adjusts the value for the number of explanatory variables used; may increase, decrease, or remain the same as more explanatory variables are added
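The usual penalty formula, sketched with illustrative numbers (n observations, k regressors excluding the constant):

```python
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1).
# The degrees-of-freedom penalty keeps the measure from rising
# automatically as regressors are added.

def adjusted_r_squared(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r_squared(0.60, 30, 3), 4))  # below the raw 0.60
```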


Reasons to Avoid Using R-Squared as the Only Measure of a Regression's Quality

(1) A regression may have a high R-squared but no meaningful interpretation because the model equation isn't supported by economic theory or common sense
(2) Using a small data set, or one that includes inaccuracies, can lead to a high R-squared value but deceptive results
(3) Obsessing over R-squared may cause you to overlook important econometric problems: high R-squared values may be associated with regressions that violate assumptions; in econometric settings, R-squared values too close to 1 often indicate something is wrong

OLS Assumptions (or the Classical Linear Regression Model)

(1) The model is linear in parameters and has an additive error term
(2) The values for the independent variables are derived from a random sample of the population and contain variability
(3) No independent variable is a perfect linear function of any other independent variable(s) (no perfect collinearity)
(4) The model is correctly specified and the error term has a zero conditional MEAN (not necessarily sum)
(5) The error term has a constant variance (no heteroskedasticity)
(6) The values of the error term aren't correlated with each other (no autocorrelation or serial correlation)

Linear Functions vs. Linear in Parameters

A function doesn't need to be linear in order for OLS to be applicable, but the parameters must be; in other words, the formula can have exponents, but the parameters (Betas) cannot be exponents (a log transformation may be used to linearize this type of function)


Multicollinearity

When two or more predictor variables in a multiple regression are highly correlated, meaning that one can be linearly predicted from the others with a nontrivial degree of accuracy; in this case, the CLRM and OLS should not be used


Misspecification

When you fail to include a relevant independent variable or you use an incorrect functional form; along with restricted dependent variables (qualitative or percent scale data) may lead to the failing of a CLRM assumption


Homoskedasticity

A situation in which the error has the same variance regardless of the value(s) taken by the independent variable(s)


GaussMarkov Theorem

States that the OLS estimators are the Best Linear Unbiased Estimators (BLUE) given the assumptions of the CLRM


Biased and Unbiased Statistics

A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest; unbiased statistics represent the population fairly well


Factors Influencing Variance of OLS Estimators

(1) The variance of the error term: the larger the variance of the error, the larger the variance of the OLS estimates
(2) The variance of X: the larger the sample variance of X, the smaller the variance of the OLS estimates
(3) Multicollinearity: as the correlation between two or more independent variables approaches 1, the variance of the OLS estimates becomes increasingly large and approaches infinity (less efficient)

Efficient Estimators

The lower the variance of a variable, the more efficient; sometimes, a balance must be struck between inefficient vs. efficient and biased vs. unbiased estimators (it may be better to accept a biased estimator if it is more efficient than an unbiased one)


Best Linear Unbiased Estimators (BLUE)

Best: achieves the smallest possible variance among all similar estimators
Linear: estimates are derived using linear combinations of the data values
Unbiased: estimators (coefficients) on average equal their true parameter values
Given the assumptions of the CLRM, the OLS estimators are "BLUE": this is the Gauss-Markov Theorem

Consistency

An asymptotic property: as the sample size approaches infinity, the variance of the estimator gets smaller and the value of the estimator approaches the true population parameter value
Used when CLRM assumptions fail and the alternative method doesn't produce a BLUE 

Assumption of Normality in Econometrics

For any given X value, the error term follows a normal distribution with a zero mean and constant variance; for large sample sizes, normality is not a major issue because the OLS estimators are approximately normal even if the errors are not normal
If you assume that the error term is normally distributed, that translates to a normal distribution of OLS estimators 

Mean Square Error

A value that provides an estimate of the true variance of the error; the unbiased estimate of the error variance: the residual sum of squares divided by the number of degrees of freedom


Confidence Interval Approach to Statistical Significance

Provides a range of possible values for the estimator in repeated sampling, such that the range would contain the true value (parameter) a certain percentage of the time; intervals commonly used are 90%, 95%, and 99%; if a hypothesized value is not contained in your calculated confidence interval, then your coefficient is statistically significant


Test of Significance Approach to Statistical Significance

Provides a test statistic that's used to determine the likelihood of the hypothesis; a t-test is generally performed; if the t-statistic is in the critical region, then the coefficient is statistically significant


Notation for Statistical Significance

(*) Significant at the 10% level
(**) Significant at the 5% level
(***) Significant at the 1% level
Reporting of p-values: the lowest level of significance at which the null hypothesis could be rejected

Overall (Joint) Significance

Determine if the variation in your Y variable explained by all or some of your variables is nontrivial; uses the Fstatistic


FStatistic

Tests overall (joint) significance; to see how changes to your model affect explained variation, you compare the different components of variance, which can be done by using the F-statistic and generating an F-distribution


Specification Issues

In regression analysis and related fields such as economics, this is the process of changing a theory into a regression model. This consists of choosing an appropriate functional form for the model and choosing which variables to add, and is one of the first basic steps in regression analysis. If an estimated model is misspecified, it will be inconsistent and biased.


Types of Nonlinear models

(1) Quadratic Functions
(2) Cubic Functions
(3) Inverse Functions

Quadratic Functions in Econometrics

Allows the effect of the independent variable on the dependent variable to change; as the value of X increases, the impact on the dependent variable increases or decreases; best for finding maximums and minimums; observable in total variable cost and total cost curves
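For a quadratic specification Y = b0 + b1*X + b2*X^2, the maximum or minimum sits where the slope b1 + 2*b2*X is zero; the coefficients below are illustrative:

```python
# Turning point of a quadratic specification: X* = -b1 / (2 * b2).
# A negative b2 gives a maximum, a positive b2 a minimum.

def turning_point(b1, b2):
    return -b1 / (2 * b2)

print(turning_point(6.0, -0.5))  # maximum at X = 6.0, since b2 < 0
```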


Cubic Functions in Econometrics

Allows for the effect of the independent variable (X) on the dependent variable (Y) to change, but this relationship changes at some unique value of X; often observed in total variable cost curves and total cost curves


Inflexion Point

The point at which an increasing effect becomes decreasing or a decreasing effect becomes increasing; observed in a cubic function


Inverse Functions

Used when the outcome (dependent variable Y) is likely to approach some value asymptotically (as the independent variable approaches 0 or infinity); observable in economic phenomena where the variables are related inversely (inflation and unemployment, price and quantity demanded)


LogLog Model

Useful model when the relationship is nonlinear in parameter because the log transformation generates the desired linearity in parameters; may be used to transform a model that's nonlinear in parameters to one that is linear


Elasticity in a Log Model

The coefficients of a linear model that was derived from a nonlinear model using logarithms represent the elasticity of the dependent variable with respect to the independent variable; the coefficient is the estimated percent change in Y for a 1 percent change in X
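The elasticity interpretation can be verified numerically; the parameters of the log-log model below are made up for illustration:

```python
# In a log-log model ln(Y) = b0 + b1*ln(X), the slope b1 is the
# elasticity: roughly the percent change in Y for a 1% change in X.
from math import log, exp

b0, b1 = 1.0, 0.8                     # illustrative; b1 is the elasticity

def y(x):
    return exp(b0 + b1 * log(x))

pct_change = (y(101) - y(100)) / y(100) * 100   # Y's response to a 1% rise in X
print(round(pct_change, 3))                     # close to b1 = 0.8
```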


LogLinear Model

An econometric specification whereby the natural log values are used for the dependent variable Y and the independent variable X is kept in its original scale; often used when the variables are expected to have an exponential growth relationship


Coefficients in a LogLinear Model

The coefficients represent the estimated percent change in your dependent variable for a unit change in your independent variable; the regression coefficients in a loglinear model don't represent slope


LinearLog Model

An econometric specification whereby the natural log values are used for the independent variable X and the dependent variable Y is kept in its original scale; typically used when the impact of the independent variable on the dependent variable decreases as the value of the independent variable increases (similar to a quadratic, but it never reaches a maximum or minimum value for Y); used to model diminishing marginal returns


Coefficients in a LinearLog Model

Coefficients represent the estimated unit change in the dependent variable for a percentage change in the independent variable


Misspecification in Econometric Models

(1) Omitting relevant variables
(2) Including irrelevant variables
Note: just because an estimated coefficient doesn't have statistical significance doesn't mean it is irrelevant; a well-specified model includes both significant and nonsignificant variables

Regression Specification Error Test (RESET)

A test that can be used to detect specification issues related to omitted variables and certain functional forms; allows you to identify whether there is misspecification in your model, but it doesn't identify the source


Chow Test

A misspecification test that checks the structural stability of the model; used to detect whether the parameters in the model are unstable or change across subsamples


Robustness

Robustness refers to the sensitivity (or rather, the lack thereof) of the estimated coefficients when you make changes to your model's specification; misspecification is less problematic when the results are robust


Core Variables

Independent variables of primary interest


Using multiple dummy variables

If you have J groups, you need J − 1 dummy variables with 1s and 0s to capture all the qualitative information; the group that does not have a dummy variable is identified when all other dummy variables are 0 (known as the reference or base group)
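A minimal sketch of this coding scheme (the group names are invented): J = 3 groups are captured with J − 1 = 2 dummies, and the base group is the one with all dummies equal to 0:

```python
# Encode a qualitative variable with J = 3 groups ("north", "south", "west")
# using J - 1 = 2 dummies; "north" is the reference (base) group, identified
# when both dummies are 0. Names are illustrative only.
groups = ["north", "south", "west", "south", "north"]
levels = ["south", "west"]  # base group "north" gets no dummy

dummies = [{lvl: int(g == lvl) for lvl in levels} for g in groups]
print(dummies[0])  # base-group row: {'south': 0, 'west': 0}
print(dummies[1])  # "south" row:    {'south': 1, 'west': 0}
```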


Interaction Term and Interacted Econometric Model

The product of two independent variables; an interacted econometric model includes a term that is the product of the dummy and quantitative variables for any given observation
This model is useful if the qualitative characteristic contains only two groups. The inclusion of the interaction term allows the regression function to have a different intercept and slope for each group identified by the dummy variable 

Results of an Econometric Model with Dummy, Quantitative, and Interaction Terms

(1) One Regression Line: the dummy and interaction coefficients are zero (not statistically significant)
(2) Two Regression Lines with Different Intercepts but the Same Slope: the dummy coefficient is significant, but the interaction coefficient is zero (not statistically significant)
(3) Two Regression Lines with the Same Intercept but Different Slopes: the dummy coefficient is zero, but the interaction coefficient is significant
(4) Two Regression Lines with Different Intercepts and Slopes: both the dummy and interaction coefficients are significant 

Interactive Qualitative Characteristic

An interaction (product) of two dummy variables if you have reason to believe that the simultaneous presence of two (or more) characteristics has an additional influence on your dependent variable


Results of an Econometric Model with Two Dummy Variables and an Interaction Between those Two Characteristics

(1) One Regression Line: the dummy and interaction coefficients are all zero
(2) Two Regression Lines: one dummy coefficient is significant, the other coefficients are zero
(3) Three Regression Lines: both dummy coefficients are significant, but the interaction coefficient is zero
(4) Four Regression Lines: the dummy coefficients and the interaction coefficient are all significant 

Methods of Testing for Joint Significance Among Dummy Variables

(1) F-Test
(2) Chow Test 

Violations of the Classical Regression Model Assumption

(1) Multicollinearity
(2) Heteroskedasticity
(3) Autocorrelation 

Types of Multicollinearity

(1) Perfect Multicollinearity (rare)
(2) High Multicollinearity (much more common) 

Perfect Multicollinearity

Two or more independent variables in a regression model exhibit a deterministic linear relationship (meaning it is perfectly predictable and contains no randomness)
In a model with perfect multicollinearity, the regression coefficients are indeterminate and their standard errors are infinite 

High Multicollinearity

Results from a linear relationship between your independent variables with a high degree of correlation, but they aren't completely deterministic


Variables Leading to Multicollinearity

(1) Variables that are lagged values of one another
(2) Variables that share a common time trend component
(3) Variables that capture similar phenomena 

Typical Consequences of High Multicollinearity

(1) Larger standard errors and insignificant t-statistics
(2) Coefficient estimates that are sensitive to changes in specification
(3) Nonsensical coefficient signs and magnitudes 

Measuring the Degree or Severity of Multicollinearity

(1) Pairwise Correlation Coefficients
(2) Variance Inflation Factors (VIF) 

Pairwise Correlation Coefficients

The value of the sample correlation for every pair of independent variables; as a general rule of thumb, correlation coefficients around 0.8 or above may signal a multicollinearity problem
Note that just because the correlation coefficient isn't near 0.8 or above doesn't mean that you are clear of multicollinearity problems 
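A small sketch of this check (data invented): compute the Pearson correlation for a pair of regressors and flag it against the 0.8 rule of thumb:

```python
import math

# Pairwise correlation check between two hypothetical regressors;
# x2 is nearly 2 * x1, so the correlation should be very high.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
r = cov / math.sqrt(sum((a - m1) ** 2 for a in x1) *
                    sum((b - m2) ** 2 for b in x2))

flag = abs(r) >= 0.8  # rule-of-thumb multicollinearity warning
print(round(r, 3), flag)
```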

Variance Inflation Factor (VIF)

Measures the linear association between an independent variable and all the other independent variables; VIFs greater than 10 signal a highly likely multicollinearity problem, and VIFs between 5 and 10 signal a somewhat likely multicollinearity issue
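The VIF arithmetic behind those thresholds, as a small sketch: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing X_j on the remaining regressors (with only two regressors, R_j² is just the squared pairwise correlation):

```python
# VIF for one regressor: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the
# R-squared from regressing X_j on all the other independent variables.
def vif_from_r2(r_squared):
    return 1.0 / (1.0 - r_squared)

print(round(vif_from_r2(0.95), 2))  # 20.0 -> above 10: highly likely problem
print(round(vif_from_r2(0.85), 2))  # 6.67 -> between 5 and 10: somewhat likely
```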


Resolving Multicollinearity

(1) Acquire more data
(2) Apply a new model
(3) Cut the problem variable loose 

"Acquiring More Data" Solution to Multicollinearity

(1) Ensures that the multicollinearity isn't just an artifact of your sample
(2) Make sure the population doesn't change
(3) For cross-sectional data, use more specific data at either the current time or a future time
(4) For time-series data, increase the frequency of the data 

"Using a New Model" Solution to Multicollinearity

(1) Respecify using log transformations or reciprocal functions
(2) Use "first-differencing"
(3) Create a composite index variable 

First-Differencing

Involves subtracting the previous period's value from the current period's value; requires that the variables have variation over time
Disadvantages: (1) losing observations; (2) losing variation in the independent variables; (3) changing the specification (possibly resulting in misspecification bias) 
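The transformation itself is simple arithmetic; a sketch on a short invented series, which also shows the first disadvantage (one observation is lost):

```python
# First-differencing: subtract the previous period's value from the
# current one. The differenced series has one fewer observation.
y = [100.0, 104.0, 103.0, 110.0]
dy = [curr - prev for prev, curr in zip(y, y[1:])]
print(dy)  # [4.0, -1.0, 7.0]
```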

Composite Index Variable

Combine collinear variables with similar characteristics into one variable; requires that the association between the two variables is logical


Detecting Heteroskedasticity

(1) Examining the residuals graphically
(2) Breusch-Pagan Test
(3) White Test
(4) Goldfeld-Quandt Test
(5) Park Test 

Breusch-Pagan Test

Allows the heteroskedasticity process to be a function of one or more of the independent variables, and it's usually applied by assuming that heteroskedasticity may be a linear function of all the independent variables in the model
Failing to find evidence of heteroskedasticity with BP doesn't rule out a nonlinear relationship between the independent variables and the error variance 

White Test

Allows the heteroskedasticity process to be a function of one or more independent variables; allows the independent variable to have a nonlinear and interactive effect on the error variance
Useful for identifying nearly any pattern of heteroskedasticity, but not useful in showing how to correct the model 

Goldfeld-Quandt Test

Assumes that a defining point exists and can be used to differentiate the variance of the error term
The result depends on the criteria chosen; this is often an arbitrary process, so failing to find evidence of heteroskedasticity doesn't rule it out 

Park Test

Assumes that the heteroskedastic process may be proportional to some power of an independent variable
Assumes heteroskedasticity has a particular functional form 

Correcting the Regression Model for the Presence of Heteroskedasticity

(1) Weighted Least Squares
(2) Robust (White-Corrected) Standard Errors 

Weighted Least Squares (WLS) Technique

Transforms the heteroskedastic model into a homoskedastic model by using information about the nature of the heteroskedasticity; divides both sides of the model by the component of heteroskedasticity that gives the error term a constant variance
Corrected coefficients should be near the OLS coefficients or the problem may have been something other than heteroskedasticity 
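A sketch of the WLS transformation step under an assumed (illustrative) variance structure Var(u_i) = σ² · x_i: dividing every observation by √x_i gives the transformed error term a constant variance:

```python
import math

# WLS transformation sketch. Assumption (for illustration only):
# Var(u_i) = sigma^2 * x_i, so dividing each observation by sqrt(x_i)
# makes the transformed error homoskedastic.
x = [1.0, 4.0, 9.0]
y = [2.0, 9.0, 20.0]

w = [math.sqrt(xi) for xi in x]          # the weighting component
y_star = [yi / wi for yi, wi in zip(y, w)]
x_star = [xi / wi for xi, wi in zip(x, w)]
print(x_star)  # [1.0, 2.0, 3.0]
```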

Robust (White-Corrected) Standard Errors

The most popular remedy for heteroskedasticity; uses the OLS coefficient estimates but adjusts the OLS standard errors for heteroskedasticity without transforming the model being estimated; makes no assumptions about the functional form of the heteroskedasticity


No Autocorrelation

A situation where no identifiable relationship exists between the values of the error terms among data, or the correlation and covariance among error terms are 0; the positive and negative error values are random


Positive Autocorrelation

A situation where the correlation from one error term to the next is positive; more common and more likely than negative autocorrelation


Negative Autocorrelation

A situation where the correlation from one error term to the next is negative; an unlikely situation


Sequencing

An autocorrelation situation where most positive error terms are followed or preceded by additional positive errors, or where most negative errors are followed or preceded by other negative errors


Analyzing Residuals to Test for Autocorrelation

(1) Graphical inspection of the residuals
(2) The "run test" (or "Geary test")
(3) Durbin-Watson Test
(4) Breusch-Godfrey Test 

Run Test

A "run" is defined as a sequence of consecutive positive or negative residuals
This test involves observing the number of runs within your data and determining whether this is an acceptable number of runs based on your confidence interval. If the number of observed runs is below the expected interval, there is evidence of positive autocorrelation; if above the expected interval, evidence of negative autocorrelation 
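Counting runs is the mechanical part of this test; a sketch on an invented residual series (the comparison against the expected interval is omitted here):

```python
# Count "runs" (maximal sequences of same-sign residuals) in a
# hypothetical residual series. Few runs suggest positive
# autocorrelation; many runs suggest negative autocorrelation.
residuals = [0.5, 0.3, 0.8, -0.2, -0.6, 0.4, 0.9, -0.1]
signs = [1 if r > 0 else -1 for r in residuals]

# A new run starts wherever the sign changes.
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print(runs)  # 4 runs: (+ + +)(- -)(+ +)(-)
```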

d-Statistic

A test statistic developed for the Durbin-Watson test in order to detect the presence of autocorrelation (but only identifying first-order autocorrelation)
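The statistic's formula, d = Σ(e_t − e_{t−1})² / Σ e_t², can be computed directly; a sketch on an invented residual series (values near 2 suggest no first-order autocorrelation, near 0 positive, near 4 negative):

```python
# Durbin-Watson d-statistic for a hypothetical residual series:
# d = sum of squared successive differences over the sum of squares.
e = [0.5, 0.4, 0.6, -0.3, -0.5, 0.2]
num = sum((b - a) ** 2 for a, b in zip(e, e[1:]))
den = sum(x ** 2 for x in e)
d = num / den
print(round(d, 3))
```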


Remedying Harmful Autocorrelation

(1) Feasible Generalized Least Squares (FGLS)
(2) Serial correlation robust standard errors 

Feasible Generalized Least Squares (FGLS) Techniques and Description

(1) Cochrane-Orcutt Transformation
(2) Prais-Winsten Transformation
The goal of these transformations is to make the error term in the original model uncorrelated; both involve quasi-differencing, which subtracts the previous value of each variable scaled by the autocorrelation parameter 
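The quasi-differencing step these transformations share, sketched with an assumed (invented) AR(1) parameter rho:

```python
# Quasi-differencing: each transformed value is y_t - rho * y_{t-1}.
# This is the core step in Cochrane-Orcutt / Prais-Winsten style FGLS.
rho = 0.7  # hypothetical estimate of the autocorrelation parameter
y = [10.0, 12.0, 11.0, 13.0]
y_star = [curr - rho * prev for prev, curr in zip(y, y[1:])]
print([round(v, 2) for v in y_star])
```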

Serial Correlation Robust Standard Errors

Allows the biased estimates to be adjusted while the unbiased estimates are untouched, so no model transformation is required
Adjusting the OLS standard errors for autocorrelation produces serial correlation robust standard errors (Newey-West standard errors) 

Linear Probability Model (LPM)

Using the OLS technique to estimate a model with a dummy dependent variable creates this model


Three Main LPM Problems

(1) Nonnormality of the error term
(2) Heteroskedastic errors
(3) Potentially nonsensical predictions 

Nonnormality of the error term in LPM

The error term of an LPM has a binomial distribution, which implies that the t-tests and F-tests are invalid; because for any given X the error term can take only two values (the distances from the regression line to 0 and to 1), it cannot have a normal distribution


Heteroskedasticity in LPM

The variance of the LPM error term isn't constant; the variance of an LPM error term depends on the value of the independent variables


Probit and Logit Models

Used instead of OLS for situations involving a qualitative (dummy) dependent variable; the conditional probabilities are nonlinearly related to the independent variable(s); both models asymptotically approach 0 and 1, so the predicted probabilities are always sensible, unlike OLS (the LPM), whose predicted probabilities can extend beyond 0 and 1
The probit model is based on the standard normal CDF, while the logit model is based on the logistic CDF 
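A sketch of why logit predictions are always sensible: the logistic CDF maps any linear-index value into the open interval (0, 1):

```python
import math

# The logit model maps any value of the linear index into (0, 1)
# via the logistic CDF, so predicted probabilities never stray
# outside the unit interval (unlike the LPM).
def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10.0, 0.0, 10.0):
    p = logistic_cdf(z)
    assert 0.0 < p < 1.0  # always a sensible probability
print(round(logistic_cdf(0.0), 2))  # 0.5
```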

Maximum Likelihood (ML) Estimation

Chooses values for the estimated parameters that maximize the probability of observing the Y values in the sample given the X values; calculates the joint probability of observing all values of the dependent variable, assuming each observation is drawn randomly and independently from the population


Limited Dependent Variables

Arise when some minimum threshold value must be reached before the values of the dependent variable are observed and/or when some maximum threshold value restricts the observed values of the dependent variable
Ex: ticket sales cease after a stadium sells out even if demand is still high; people drop out of the labor force if wages become too low 

Censored Dependent Variables

Information is lost because some of the actual values for the dependent variable are limited to a minimum and/or maximum threshold value
Examples: (1) Number of hours worked in a week (2) Income earned (3) Sale of tickets to an event (4) Exam scores 

Truncated Dependent Variables

Information is lost because some of the values for the variables are missing, meaning that they aren't observed if they are above or below some threshold; common when the selection of a sample is nonrandom (i.e. people below the poverty line)


Difference between Censored and Truncated Dependent Variables

Censored: observed, but suppressed
Truncated: not observed 
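The distinction in miniature, on an invented latent series with a threshold at 0: censoring keeps every observation but suppresses the values at the limit, while truncation drops the below-threshold observations entirely:

```python
# Censoring vs. truncation with a hypothetical floor at 0.
latent = [-2.0, -0.5, 1.0, 3.0]

censored = [max(v, 0.0) for v in latent]    # observed, but suppressed at 0
truncated = [v for v in latent if v > 0.0]  # below-threshold cases unobserved

print(censored)   # [0.0, 0.0, 1.0, 3.0]
print(truncated)  # [1.0, 3.0]
```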

Methods of Dealing with Limited Dependent Variables

(1) Tobin's Tobit
(2) Truncated Normal
(3) Heckman Selection 

Static Model

If your dependent variable reacts instantaneously to changes in the independent variable(s), then the model is static and will estimate a contemporaneous relationship at time t


Dynamic Model

If your dependent variable doesn't fully react to a change in the independent variable(s) during the period in which the change occurs, then your model is dynamic and will estimate both a contemporaneous relationship at time t and a lagged relationship at time t−1


Autoregressive Model

A type of dynamic model that seeks to fix the estimation issues associated with distributed lag models by replacing the lagged values of the independent variable with a lagged value of the dependent variable


Spurious Correlation Problem

Exists when a regression model contains dependent and independent variables that are trending; may appear to show that X has a strong effect on Y when this may not be the case: it is the trend of the data causing the observed results


Detrending TimeSeries Data

Removing trending patterns from data in order to derive the explanatory power of the independent variables; helps to solve the spurious correlation problem


Dealing with Seasonal Adjustments

Seasonality can be correlated with both your dependent and independent variables; it is necessary to explicitly control for the season in which measurements occur: use dummy variables for the seasons
Data that has been stripped of its seasonal patterns is referred to as seasonally adjusted or deseasonalized data 

Panel Dataset vs. Pooled CrossSectional Measurements

Both contain crosssectional measurements in multiple periods, but a panel dataset includes the same crosssectional units in each time period rather than being randomly selected in each period as is the case with pooled crosssectional data


True Experiment

Subjects are randomly assigned to two (or more) groups; one group from the population of interest is randomly assigned to the control group and the remainder is assigned to the treatment group(s)


Natural (Quasi) Experiment

Subjects are assigned to groups based on conditions beyond the control of the researcher, such as public policy


Difference-in-Differences (DinD)

A technique that measures the effect of a treatment in a given period of time; identifies and separates a pre-existing difference between the groups from the difference that exists after the introduction of a treatment (or event or public policy change)
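The arithmetic behind this separation, sketched with invented group means: the pre-existing gap is netted out by differencing twice:

```python
# Difference-in-differences with hypothetical group means:
# effect = (treated_after - treated_before) - (control_after - control_before)
treated_before, treated_after = 50.0, 60.0
control_before, control_after = 45.0, 48.0

effect = (treated_after - treated_before) - (control_after - control_before)
print(effect)  # 7.0: the 10-point treated change minus the 3-point trend
```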


Heterogeneity Bias

Occurs if you ignore characteristics that are unique to your crosssectional units and they're correlated with any of your independent variables


Estimation Methods when Using Panel Data for Unobservable Factors

(1) First Difference (FD) Transformation
(2) Dummy Variable (DV) Regression
(3) Fixed Effects (FE) Estimator 

First Difference Transformation

Subtract the previous value of a variable from the current value of that variable for a particular crosssectional unit and repeat the process for all variables in the analysis


Dummy Variable Regression

Involves the inclusion of dummy variables in the model for each crosssectional unit, making it a straightforward extension to the basic use of dummy variables


Composite Error

Found by estimating a model for panel data using OLS, essentially ignoring the panel nature of the data
The composite error term includes individual fixed effects (unobservable factors associated with the individual subjects) and idiosyncratic error (the truly random element associated with a particular subject at a point in time) 

Fixed Effects Estimator

The most common method of dealing with fixed effects of cross-sectional units; applied by time-demeaning the data: calculate the average value of a variable over time for each cross-sectional unit, subtract this mean from all observed values of that unit, and repeat the procedure for all units
This deals with unobservable factors because it takes out any component constant over time 
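A sketch of time-demeaning on a tiny invented panel: each unit's own time average is subtracted, so anything constant over time for that unit (its fixed effect) drops out:

```python
# Time-demeaning for a fixed-effects estimator: for each cross-sectional
# unit, subtract that unit's own time average from every observation.
panel = {
    "unit_a": [10.0, 12.0, 14.0],
    "unit_b": [3.0, 5.0, 7.0],
}

demeaned = {}
for unit, values in panel.items():
    mean = sum(values) / len(values)
    demeaned[unit] = [v - mean for v in values]

# Both units have the same demeaned values: the level difference
# (the fixed effect) has been removed.
print(demeaned["unit_a"])  # [-2.0, 0.0, 2.0]
```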

Random Effects (RE) Model

An econometric model that allows all unobserved effects to be relegated to the error term; this provides more efficient estimates of the regression parameters


Hausman Test

Examines the differences in the estimated parameters, and the result is used to determine whether the RE and FE estimates are significantly different


Ten Components of a Good Research Project

(1) Introducing your topic and posing the primary question of interest
(2) Discussing the relevance and importance of your topic
(3) Reviewing the existing literature
(4) Describing the conceptual or theoretical framework
(5) Explaining your econometric model
(6) Discussing the estimation method(s)
(7) Providing a detailed description of the data
(8) Constructing tables and graphs to display the results
(9) Interpreting the reported results
(10) Summarizing what was learned 

Ten Common Mistakes in Applied Econometrics

(1) Failing to use common sense and knowledge of economic theory
(2) Asking the wrong questions first
(3) Ignoring the work and contributions of others
(4) Failing to familiarize yourself with the data
(5) Making it too complicated
(6) Being inflexible to real-world complications
(7) Looking the other way when you see bizarre results
(8) Obsessing over measures of fit and statistical significance
(9) Forgetting about economic significance
(10) Assuming your results are robust 