Econometrics For Dummies

by maxyork9, Oct. 2014

Subjects: Econometrics for Dummies, Farizo

200 Cards in this Set

Front
Back

	Econometrics	The application of statistical and mathematical theories to economics for the purpose of testing hypotheses and forecasting future trends; takes economic models and tests them through statistical trials; the branch of economics concerned with the use of mathematical methods in describing economic systems
	Tools of Econometrics	Econometrics uses tools such as frequency distributions, probability and probability distributions, statistical inference, simple and multiple regression analysis, simultaneous equations models, and time series methods
	Limitations to the Collection of Data Process in Econometrics	(1) Aggregation of Data (2) Statistically correlated but economically irrelevant variables (3) Qualitative Data (4) Classical Linear Regression Model Assumption Failures
	Linear Regression	A statistical measure that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables; simple linear regression uses one independent variable to explain and/or predict the outcome of a dependent variable
	Ordinary Least Squares	A statistical technique to determine the line of best fit for a model; a straight line is fitted through a number of points so as to minimize the sum of the squares of the vertical distances from the points to the line of best fit
	Typical Data Types in Econometrics	(1) Cross Sectional (2) Time Series (3) Panel/Longitudinal
	Cross Sectional Data	Consists of measurements for individual observations at a given point in time; tend to be popular in labor economics, industrial organization, urban economics, and other micro-based fields; data typically collected through surveys
	Time Series Data	Consists of measurements on one or more variables over time; a sequence of data points measured in successive points in time at uniform time intervals; often used for examining seasonal trends and adjustments; data often collected by government agencies
	Panel/Longitudinal Data	Consist of time series for each cross-sectional unit in a sample; involves repeated observations of the same variables over a period of time; data typically collected through surveys
	2 CLRM Assumption Failures	(1) Inability to account for heteroskedasticity (2) Inability to account for autocorrelation
	Heteroskedasticity	The standard deviations of a variable, monitored over a specific amount of time, are non-constant; volatility isn't constant
	Autocorrelation	Also known as serial correlation or lagged correlation; may exist in a regression model when the order of the observations in the data is relevant; refers to the correlation of a time series with its own past and future values; complicates the application of statistical tests by reducing the number of independent observations
	Positive Autocorrelation	A form of "persistence" whereby a system has a tendency to remain in the same state from one observation to the next
	Data Mining	Using statistics to find models that fit data well; an approach viewed unfavorably in economics
	Dummy Variables	A variable that takes on the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome; a true/false variable
	Random Variables	A variable whose value is subject to variations due to chance; conceptually does not have a single, fixed value (even if unknown), rather it can take on a set of possible different values, each with an associated probability; uncertain values
	Discrete Variable	Variables that only take on a finite number of values (thus, all qualitative variables and some quantitative variables); can be described by integers and the outcomes are countable
	Continuous Variable	Variables that can take on any value in a certain range; infinite and non-countable
	Probability Density Functions (PDF)	Shows the probabilities of a random variable for all its possible values; a function that describes the relative likelihood for this random variable to take on a given value
	Normal Distribution	Also known as the Gaussian distribution; a continuous probability distribution that plots all values in a symmetrical fashion, with most of the results situated around the distribution's mean
	Normal Distribution Characteristics	(1) Total area under the curve equals 1 (2) About 68% of the density is within one standard deviation of the mean (3) About 95% of the density is within two standard deviations of the mean (4) About 99.7% of the density is within three standard deviations of the mean (5) Because a continuous random variable can take on infinitely many values, the probability of any specific value occurring is zero
	Cumulative Density Function (CDF)	The sum or accrual of probabilities up to some value; gives the area under the probability density function (PDF) from -infinity to X.
	Integration (Calculus)	Allows us to find densities under nonlinear functions
	Differential (Calculus)	Concerns the rates of change and slopes of curves
	Bivariate or Joint Probability Density	Provides the relative frequencies or chances that events with more than one random variable will occur; the probability that two events will occur simultaneously
	Marginal (Unconditional) Probability	The probability of the occurrence of a single event; the probability of one variable taking a specific value irrespective of the values of the others
	Conditional Probabilities	Calculate the chance that a specific value for a random variable will occur given that another random variable has already taken a value; requires both joint and marginal probabilities in order to calculate
	Independent Events	One event has no statistical relationship with the other event; to check for independence observe that the probability of one event is unaffected by the occurrence of another event; independent events if the conditional and unconditional probabilities are equal
	Expected Value	Mean of a random variable, provides a measure of central tendency or one measurement of where the data tends to cluster; the sum of all possible values weighted by their respective probabilities (with continuous variables, it is the integral of the values weighted by their probability density)
	Variance	A measure of dispersion, or how far a set of numbers is spread out; the square of the standard deviation; the average squared difference between the value of a random variable and its mean
	Covariance	Measures how two variables are related: 0 if the variables are independent or no clear relation between the two, + if there is a direct relationship, - if there is an inverse relationship; DOES NOT PROVIDE INFORMATION AS TO THE STRENGTH OF THE RELATIONSHIP OF TWO VARIABLES, JUST THE DIRECTION OF THE RELATIONSHIP
	Standard Deviation	The square root of the variance; a measure of the dispersion of a set of data from its mean; commonly reported because it is measured in the same units as the random variable
	Correlation	Measures the strength of the relationship between two variables; can only identify linear relationships (other techniques available for non-linear relationships)
	Statistical Inference	Focuses on the process of making generalizations for a population from sample information
	Descriptive Statistics	Measurements that can be used to summarize your sample data and subsequently make predictions about your population of interest; quantitatively describe the main features of a collection of data
	Parameter	A numerical characteristic of a population, as distinct from a statistic of a sample
	Estimators	Calculating descriptive measures using sample data
	Point Estimate	Calculating a statistic with data from a random sample produces this; a single estimate of a population parameter
	Unbiased Estimator	If in repeated estimations using the same calculation method, the mean value of the estimator coincides with the true parameter value
	Efficient Estimator	An estimator that achieves the smallest variance among estimators of its kind
	Linearity Estimators	An estimator has this property if a statistic is a linear function of the sample observations
	Consistent Estimator	An estimator that approaches the true parameter value as the sample size gets larger and larger; this is known as an asymptotic property- it gradually approaches the true parameter value as the sample size approaches infinity
	Linear Combination of Normally Distributed Random Variables	If a random variable is a linear combination of another normally distributed random variable(s), it also has a normal distribution
	Standard Normal Distribution	A normal distribution with a mean of 0 and a standard deviation of 1; useful because any normally distributed random variable can be converted to this scale allowing for quick and easy computation of probabilities; denoted by the letter "Z"
	Z Score	The Z value or Z score is obtained by dividing the difference of a measurement and the mean by the standard deviation; this translates the variable into an easily measurable form and the probability will now be obtainable using a table
	Sampling Distribution	A probability distribution or density of a statistic when random samples of size "n" are repeatedly drawn from a population- it is not the distribution of the sample measurements
	Central Limit Theorem (CLT)	A statistical theory that states that given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, all of the samples will follow an approximate normal distribution pattern, with all variances being approximately equal to the variance of the population divided by each sample's size. Distributions of sample means can thus be converted to standard normals
	Chi-squared Distribution	A probability density function that gives the distribution of the sum of the squares of several independent random variables each with a normal distribution with zero mean and unit variance; the higher the degrees of freedom (or more observations), the less skewed (and more symmetrical) the distribution. The sum of the squares of several independent standard normal random variables is distributed according to the chi-squared distribution with "k" degrees of freedom (or "k" number of variables)
	Uses of Chi-squared Distribution	Used for comparing estimated variance values from a sample to those values based on theoretical assumptions; used to develop confidence intervals and hypothesis tests for population variance
	t- Distribution	Probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown. If we take a sample of "n" observations from a normal distribution with fixed unknown mean and variance, and if we compute the sample mean and sample variance of these "n" observations, then the t-distribution can be defined as the distribution of the location of the true mean, relative to the sample mean and divided by the sample standard deviation after multiplying by the normalizing term (SQRT("n")). Used to estimate how likely it is that the true mean lies in any given range
	Characteristics of t-Distribution	Bell shaped, symmetrical around zero, approaches a normal distribution as the degrees of freedom (number of observations) increases; the ratio of a standard normal to the square root of a chi-squared variable divided by its degrees of freedom
	F- Distribution	A ratio of two chi-squared distributions divided by their respective degrees of freedom; as degrees of freedom in the numerator and denominator increase, the distribution approaches normal. A probability density function that is used especially in analysis of variance, and is a function of the ratio of two independent random variables each of which has a chi-square distribution and is divided by its number of degrees of freedom
	Methods of Hypothesis Test	(1) Z distribution (2) t distribution (3) Chi-squared distribution (4) F distribution
	Null Hypothesis (Ho)	General or default position: no relationship between two measured phenomena; an assumption or prior belief about a population parameter to be tested; the position our hypothesis test attempts to overturn
	Alternative Hypothesis	Reflects that there will be an observed effect for our experiment; tested against the null hypothesis
	Steps to Performing a Hypothesis Test	(1) Estimate the population parameters using sample data (2) Determine the appropriate distribution (3) Calculate an interval estimate or test statistic (4) Determine the hypothesis test outcome
	Point Estimation	Involves the use of sample data to calculate a single value (statistic) which is to serve as a best estimate of an unknown population parameter; a single estimate of your parameter of interest
	Confidence Interval Approach to Testing Hypotheses	A type of interval estimate of a population parameter that is used to indicate the reliability of an estimate; an observed interval that frequently includes the parameter of interest if the experiment is repeated
	Significance Test Approach to Testing Hypotheses	Allows an analyst to estimate how reliably the results derived from a study based on a randomly selected sample can be generalized to the population from which the sample was drawn; a result that is statistically significant is a result not likely to occur randomly, but likely to be attributable to a specific cause
	Measurements of Significance and Confidence	Alpha and (1-Alpha), respectively
	Examples of Hypothesis Tests Situations	Value of one mean: Z; Value of one mean with unknown population variance: t; Value of variance: Chi-squared; Comparing two means: t; Comparing two variances: F
	Details on the Confidence Interval Approach	Calculate a lower limit and an upper limit for a random interval and attach some likelihood that the interval contains the true parameter value; if the hypothesized value for your parameter of interest is in the critical region (outside of the confidence interval 1-Alpha), then you reject the null hypothesis
	Details on the Significance Test Approach	Calculate a test statistic and then compare the calculated value to the critical value from one of the probability distributions to determine the outcome of your test; if the calculated test statistic is in the critical region, you reject the null hypothesis and you can also say that your test is statistically significant
	p-value	The level of marginal significance within a statistical hypothesis test, representing the probability of the occurrence of a given event; the smaller the p-value, the more strongly the test rejects the null hypothesis. The lowest level of significance at which you could reject the null hypothesis given your calculated statistic
	Type I Error	Rejecting a null hypothesis that is in fact true. Increasing the value of alpha (the level of significance) increases the chance of rejecting the null hypothesis and the chance of committing a Type I error
	Type II Error	Failing to reject a null hypothesis that is in fact false. Reducing the value of alpha (the level of significance) increases the chance of failing to reject the null hypothesis and the chance of committing a Type II error
	Use of Econometrics Techniques	Determining the magnitude of various relationships that economics introduces; used to predict or forecast future events and to explain how one or more factors affect some outcome of interest
	Model Specification	Selecting an outcome of interest or dependent variable (Y) and one or more independent factors (or explanatory variables X); the determination of which independent variables should be included in or excluded from a regression equation
	Overspecification	Using or including numerous irrelevant variables in the model
	Spurious Correlation	A correlation between two variables that does not result from any direct relation between them but from their relation to other variables; variables coincidentally have a statistical relationship but one doesn't cause the other; causation can never be proven by statistical results in any circumstance
	Population Regression Function (PRF)	Defines your perception of reality in a mathematical function
	Setting up a PRF Model	(1) Provide the general mathematical specification of the model: denote the dependent variable and all independent variables (2) Derive the econometric specification of your model: develop a function that can be used to calculate econometric results (3) Specify the random nature of your model: introduce an error variable
	Conditional Mean Operator	Indicates that the relationship is expected to hold, on average, for given values of independent variables
	Constant	The expected value of the dependent variable (Y) when all independent variables (X) are equal to 0
	Stochastic Population Regression Function	A function that introduces a random error term associated with the observation
	Stochastic	Random; involving a random variable; involving chance or probability
	Random Error	Results from: (1) Insufficient or incorrectly measured data (2) A lack of theoretical insights to fully account for all the factors that affect the dependent variable (3) Applying the incorrect functional form (4) Unobservable characteristics (5) Unpredictable elements of behavior
	Pooled Cross-Sectional Data	Combines independent cross-sectional data that has been collected over time; an event study is pooled cross sectional data
	Regression Analysis	Techniques that allow for estimation of economic relationships using data; used for estimating the relationships among variables
	Least Squares Principle	The Sample Regression Function should be constructed (with the constant and slope values) so that the sum of the squared distance between the observed values of your dependent variable and the values estimated from your SRF is minimized.
	Why OLS is most popular method for estimating regressions	(1) Easier than other alternatives (2) Sensibility (3) Has desirable characteristics
	Numerical Properties of OLS	(1) The regression line always passes through the sample means of Y and X (2) The mean of the estimated (predicted) Y is equal to the mean value of the actual Y (3) The mean of the residuals is 0 (4) The residuals are uncorrelated with the predicted Y (5) The residuals are uncorrelated with observed values of the independent variable
	Residuals	The difference between the observed value and the estimated function value; distance from data points to the regression line
	Multiple Regression	A regression model that contains more than one explanatory variable
	Slope Coefficients	Tell the estimated direction of the impacts that the independent variables have on the dependent variables, and also show by how much the dependent variable changes (value or magnitude) when one of the independent variables increases or decreases
	Partial Slope Coefficients	The coefficients of the independent variables in a multiple regression; provide an estimate of the change in the dependent variable for a 1-unit change in the explanatory variable, assuming the value of all other variables in the regression model hold constant
	Issues when Comparing Regression Coefficients	(1) In standard OLS regression, the coefficient with the largest magnitude is not necessarily associated with the "most important" variable (2) Coefficient magnitudes can be affected by changing units of measurement; scale matters (3) Even variables measured on similar scales can have different amounts of variability
	Standardized Regression Coefficients	The calculation of standardized regression coefficients allows for the comparison of coefficient magnitudes in a multiple regression
	Methods of Calculating Standardized Regression Coefficients	(1) Calculating a Z-score for every variable of every observation and then performing OLS with the Z values rather than the raw data (2) Obtaining the OLS regression coefficients using the raw data and then multiplying each coefficient by the standard deviation of the X variable over the standard deviation of the Y variable
	Beta Coefficients	Standardized regression coefficients; not to be confused with Beta in finance; unfortunate name as the Greek letter Beta is also used for regular OLS coefficients. Estimates the standard deviation change in the dependent variable for a 1-standard deviation change in the independent variable, holding other variables constant
	Goodness of Fit	Describes how well a statistical model fits a set of observations; generally requires decomposing the variation in the dependent variable into explained and unexplained (residual) parts, then using R-squared to measure the fit
	Coefficient of Determination	R-squared; indicates how well data points fit a line or curve; the measure of fit most commonly used with OLS regression
	Explained and Residual Variation	Explained variation is the difference between the regression line and the mean value. Residual/unexplained variation is the difference between the observed value and the regression line.
	R-Squared	Measures the proportion of variation in the dependent variable that is explained by variation in the independent variables; a ratio between 0 and 1; equals the explained sum of squares over the total sum of squares; maximizing R-Squared means the line has a good fit because we seek to minimize the residual sum of squares (the closer to 1, the better); can only remain the same or increase as more explanatory variables are added
	Adjusted R-Squared	This is an attempt to take account of the phenomenon of the R-Squared automatically increasing when extra explanatory variables are added to the model; it includes a "degrees of freedom penalty" that accounts for the number of explanatory variables used; may increase, decrease, or remain the same as more explanatory variables are added
	Reasons to Avoid using R-Squared as the only measure of a Regression's Quality	(1) A regression may have a high R-Squared but have no meaningful interpretation because the model equation isn't supported by economic theory or common sense (2) Using a small data set or one that includes inaccuracies can lead to a high R-Squared value but deceptive results (3) Obsessing over R-Squared may cause you to overlook important econometric problems. High R-Squared values may be associated with regressions that violate assumptions; in econometric settings, R-Squared values too close to 1 often indicate something is wrong
	OLS Assumptions (or the Classical Linear Regression Model)	(1) The model is linear in parameters and has an additive error term (2) The values for the independent variables are derived from a random sample of the population and contain variability (3) No independent variable is a perfect linear function of any other independent variable(s) (no perfect collinearity) (4) The model is correctly specified and the error term has a zero conditional MEAN (not necessarily sum) (5) The error term has a constant variance (no heteroskedasticity) (6) The values of the error term aren't correlated with each other (no autocorrelation or no serial correlation)
	Linear Functions vs. Linear in Parameters	A function doesn't need to be linear in order to be applicable to the OLS, but the parameters must be; in other words, the formula can have exponents, but the parameters (Betas) cannot be the exponents (a log transformation may be used in order to linearize this type of function)
	Multicollinearity	When two or more predictor variables in a multiple regression are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy; in this case, the CLRM and OLS should not be used
	Misspecification	When you fail to include a relevant independent variable or you use an incorrect functional form; along with restricted dependent variables (qualitative or percent scale data) may lead to the failing of a CLRM assumption
	Homoskedasticity	A situation in which the error has the same variance regardless of the value(s) taken by the independent variable(s)
	Gauss-Markov Theorem	States that the OLS estimators are the Best Linear Unbiased Estimators (BLUE) given the assumptions of the CLRM
	Biased and Unbiased Statistics	A statistic is biased if it is calculated in such a way that it is systematically different from the population parameter of interest; unbiased statistics represent the population fairly well
	Factors Influencing Variance of OLS Estimators	(1) The variance of the error term- the larger the variance of the error, the larger the variance of the OLS estimates (2) The variance of X- the larger the sample variance of X, the smaller the variance of the OLS estimates (3) Multicollinearity- As the correlation between two or more independent variables approaches 1, the variance of the OLS estimates becomes increasingly large and approaches infinity (less efficient)
	Efficient Estimators	The lower the variance of an estimator, the more efficient; sometimes, a balance must be struck between inefficient vs efficient and biased vs unbiased estimators (it may be better to accept a biased estimator if it is more efficient than an unbiased estimator)
	Best Linear Unbiased Estimators (BLUE)	Best: achieving the smallest possible variance among all similar estimators; Linear: estimates are derived using linear combinations of the data values; Unbiased: estimators (coefficients) on average equal their true parameter values. Given the assumptions of CLRM, the OLS estimators are "BLUE": this is the Gauss-Markov Theorem
	Consistency	An asymptotic property: as the sample size approaches infinity, the variance of the estimator gets smaller and the value of the estimator approaches the true population parameter value. Used when CLRM assumptions fail and the alternative method doesn't produce a BLUE
	Assumption of Normality in Econometrics	For any given X value, the error term follows a normal distribution with a zero mean and constant variance; for large sample sizes, normality is not a major issue because the OLS estimators are approximately normal even if the errors are not normal. If you assume that the error term is normally distributed, that translates to a normal distribution of OLS estimators
	Mean Square Error	A value that provides an estimate for the true variance of the error; the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom
	Confidence Interval Approach to Statistical Significance	Provides a range of possible values for the estimator in repeated sampling, and the range of values would contain the true value (parameter) a certain percentage of the time; intervals commonly used are 90%, 95%, and 99%; if a hypothesized value is not contained in your calculated confidence interval, then your coefficient is statistically significant
	Test of Significance Approach to Statistical Significance	Provides a test statistic that's used to determine the likelihood of the hypothesis; a t-test is generally performed- if the t-statistic is in the critical region, then the coefficient is statistically significant
	Notation for Statistical Significance	(*) Significant at 10% level (**) Significant at 5% level (***) Significant at 1% level. Reporting of p-values: the lowest level of significance at which the null hypothesis could be rejected
	Overall (Joint) Significance	Determine if the variation in your Y variable explained by all or some of your variables is nontrivial; uses the F-statistic
	F-Statistic	Tests overall (joint) significance; in order to see how changes to your model affect explained variation, you want to compare the different components of variance, which can be done by using the F-statistic and generating an F-Distribution
	Specification Issues	In regression analysis and related fields such as economics, specification is the process of changing a theory into a regression model. It consists of choosing an appropriate functional form for the model and choosing which variables to add. This is one of the first basic steps in regression analysis. If an estimated model is misspecified, it will be inconsistent and biased.
	Types of Non-linear models	(1) Quadratic Functions (2) Cubic Functions (3) Inverse Functions
	Quadratic Functions in Econometrics	Allows the effect of the independent variable on the dependent variable to change; as the value of X increases, the impact on the dependent variable increases or decreases; best for finding maximums and minimums; observable in total variable cost and total cost curves
	Cubic Functions in Econometrics	Allows for the effect of the independent variable (X) on the dependent variable (Y) to change, but this relationship changes at some unique value of X; often observed in total variable cost curves and total cost curves
	Inflexion Point	The point at which a decreasing effect becomes increasing or an increasing effect becomes decreasing; observed in a cubic function
	Inverse Functions	Used when the outcome (dependent variable Y) is likely to approach some value asymptotically (as independent variable approaches 0 or infinity); observable in economic phenomena where the variables are related inversely (inflation and unemployment, price and quantity demanded)
	Log-Log Model	Useful model when the relationship is nonlinear in parameters because the log transformation generates the desired linearity in parameters; may be used to transform a model that's nonlinear in parameters to one that is linear
	Elasticity in a Log Model	The coefficients of a linear model that was derived from a non-linear model using logarithms represent the elasticity of the dependent variable with respect to the independent variable; the coefficient is the estimated percent change in Y for a percent change in X
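As a small illustration of why a log-log coefficient is an elasticity, the sketch below takes a hypothetical power relationship Y = 3·X^0.5 (the constants are assumptions chosen for the example) and measures the slope between two points after logging both variables.

```python
import math


def power_model(x, a=3.0, b=0.5):
    """Hypothetical nonlinear relationship Y = a * X**b."""
    return a * x ** b


def log_log_slope(x0, y0, x1, y1):
    """Slope between two points after taking natural logs of X and Y."""
    return (math.log(y1) - math.log(y0)) / (math.log(x1) - math.log(x0))


elasticity = log_log_slope(1.0, power_model(1.0), 4.0, power_model(4.0))
print(round(elasticity, 6))  # 0.5
```

The recovered slope equals the exponent b, which is the percent change in Y for a one percent change in X.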
	Log-Linear Model	An econometric specification whereby the natural log values are used for the dependent variable Y and the independent variable X is kept in its original scale; often used when the variables are expected to have an exponential growth relationship
	Coefficients in a Log-Linear Model	The coefficients represent the estimated percent change in your dependent variable for a unit change in your independent variable; the regression coefficients in a log-linear model don't represent slope
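The percent-change reading of a log-linear coefficient is an approximation that works well for small coefficients. A minimal sketch, with an illustrative function name and coefficient value, comparing the usual 100·b shortcut with the exact figure 100·(e^b - 1):

```python
import math


def pct_change_log_linear(beta, exact=True):
    """Percent change in Y for a one-unit change in X when ln(Y) = a + beta*X."""
    if exact:
        return 100.0 * (math.exp(beta) - 1.0)
    return 100.0 * beta  # common shortcut, accurate when beta is small


beta = 0.05  # hypothetical estimated coefficient
approx = pct_change_log_linear(beta, exact=False)
exact = pct_change_log_linear(beta, exact=True)
print(approx, round(exact, 3))
```

The gap between the two figures grows as the coefficient moves away from zero.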
	Linear-Log Model	An econometric specification whereby the natural log values are used for the independent variable X and the dependent variable Y is kept in its original scale; typically used when the impact of the independent variable on the dependent variable decreases as the value of the independent variable increases (similar to a quadratic, but it never reaches a maximum or minimum value for Y); used to model diminishing marginal returns
	Coefficients in a Linear-Log Model	Coefficients represent the estimated unit change in the dependent variable for a percentage change in the independent variable
	Misspecification in Econometric Models	(1) Omitting Relevant Variables (2) Including Irrelevant Variables Note: just because an estimated coefficient doesn't have statistical significance doesn't mean it is irrelevant- a well specified model includes both significant and non-significant variables
	Regression Specification Error Test (RESET)	A test that can be used to detect specification issues related to omitted variables and certain functional forms; allows you to identify if there is misspecification in your model, but it doesn't identify the source
	Chow Test	A misspecification test that checks for the structural stability of the model; used when the parameters in the model aren't stable or they change
	Robustness	Robustness refers to the sensitivity (or rather, the lack thereof) of the estimated coefficients when you make changes to your model's specification; misspecification is less problematic when the results are robust
	Core Variables	Independent variables of primary interest
	Using multiple dummy variables	If you have J groups, you need J-1 dummy variables with 1s and 0s to capture all the qualitative information; the group that does not have a dummy variable is identified when all other dummy variables are 0 (known as the reference or base group)
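The J-1 rule can be sketched in a few lines. The helper name, the `is_` prefix, and the season data below are assumptions made for the example; four groups produce three dummy columns, and the base group is the row where every dummy is 0.

```python
def make_dummies(values, base_group):
    """Build J-1 dummy columns, leaving out the chosen base group."""
    groups = sorted(set(values) - {base_group})
    return [{f"is_{g}": int(v == g) for g in groups} for v in values]


seasons = ["winter", "spring", "summer", "fall", "spring"]
rows = make_dummies(seasons, base_group="winter")
for season, row in zip(seasons, rows):
    print(season, row)
```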
	Interaction Term and Interacted Econometric Model	The product of two independent variables; an interacted econometric model includes a term that is the product of the dummy and quantitative variable for any given observation. This model is useful if the qualitative characteristic only contains two groups. The inclusion of the interaction term allows the regression function to have a different intercept and slope for each group identified by the dummy variable
	Results of an Econometric Model with a Dummy, Quantitative, and Interaction Terms	(1) One Regression Line- The dummy and interaction coefficients are zero and not statistically significant (2) Two Regression Lines with Different Intercepts, but the Same Slope- The coefficient for the dummy variable is significant, but the interaction coefficient is zero (not statistically significant) (3) Two Regression Lines with the Same Intercept but different Slopes: Dummy coefficient zero, interaction coefficient significant (4) Two Regression Lines with Different Intercepts and Slopes- Both dummy and interaction coefficients are significant
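The four cases above follow from how the coefficients combine. Assuming the model Y = b0 + b1·X + b2·D + b3·(D·X), where D is the dummy (the symbol names are chosen here, not taken from the cards), each group's line can be read off as in this sketch:

```python
def group_lines(b0, b1, b2, b3):
    """Return (intercept, slope) for the D=0 group and the D=1 group."""
    return {0: (b0, b1), 1: (b0 + b2, b1 + b3)}


# Hypothetical coefficients: both the dummy and the interaction matter (case 4).
lines = group_lines(b0=1.0, b1=2.0, b2=0.5, b3=-0.25)
print(lines)  # {0: (1.0, 2.0), 1: (1.5, 1.75)}

# With b2 = b3 = 0 the two groups share one regression line (case 1).
same = group_lines(b0=1.0, b1=2.0, b2=0.0, b3=0.0)
```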
	Interactive Qualitative Characteristic	An interaction (product) of two dummy variables if you have reason to believe that the simultaneous presence of two (or more) characteristics has an additional influence on your dependent variable
	Results of an Econometric Model with Two Dummy Variables and an Interaction Between those Two Characteristics	(1) One Regression Line: Dummy and Interaction are zero (2) Two Regression Lines: One coefficient significant, the other zero (3) Three Regression Lines: Both Dummies significant, but the interaction coefficient is 0 (4) Four Regression Lines: The dummy coefficients and the interaction coefficients are all significant
	Methods of Testing for Joint Significance Among Dummy Variables	(1) F-Test (2) Chow Test
	Violations of the Classical Regression Model Assumption	(1) Multicollinearity (2) Heteroskedasticity (3) Autocorrelation
	Types of Multicollinearity	(1) Perfect Multicollinearity (rare) (2) High Multicollinearity (much more common)
	Perfect Multicollinearity	Two or more independent variables in a regression model exhibit a deterministic linear relationship (meaning it is perfectly predictable and contains no randomness). In a model with perfect multicollinearity, the regression coefficients are indeterminate and their standard errors are infinite
	High Multicollinearity	Results from a linear relationship between your independent variables with a high degree of correlation, but they aren't completely deterministic
	Variables Leading to Multicollinearity	(1) Variables that are lagged values of one another (2) Variables that share a common time trend component (3) Variables that capture similar phenomena
	Typical Consequences of High Multicollinearity	(1) Larger Standard Errors and Insignificant t-statistics (2) Coefficient estimates that are sensitive to changes in specification (3) Nonsensical coefficient signs and magnitudes
	Measuring the Degree or Severity of Multicollinearity	(1) Pairwise Correlation Coefficients (2) Variance Inflation Factors (VIF)
	Pairwise Correlation Coefficients	The value of sample correlation for every pair of independent variables; as a general rule of thumb, correlation coefficients around 0.8 or above may signal a multicollinearity problem. Note that just because the correlation coefficient isn't near 0.8 or above doesn't mean that you are clear of multicollinearity problems
	Variance Inflation Factor (VIF)	Measures the linear association between an independent variable and all the other independent variables; VIFs greater than 10 signal a highly likely multicollinearity problem, and VIFs between 5 and 10 signal a somewhat likely multicollinearity issue
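The VIF for a variable is usually computed as 1 / (1 - R²), where R² comes from an auxiliary regression of that variable on all the other independent variables. The sketch below assumes that R² is already available; the function names are illustrative, and the cutoffs are the rules of thumb stated on the card.

```python
def variance_inflation_factor(r_squared_aux):
    """VIF = 1 / (1 - R^2) from the auxiliary regression for one variable."""
    if not 0.0 <= r_squared_aux < 1.0:
        raise ValueError("auxiliary R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_aux)


def vif_flag(vif):
    """Apply the rule-of-thumb cutoffs of 5 and 10."""
    if vif > 10:
        return "highly likely"
    if vif > 5:
        return "somewhat likely"
    return "unlikely"


for r2 in (0.5, 0.85, 0.95):  # hypothetical auxiliary R^2 values
    vif = variance_inflation_factor(r2)
    print(r2, round(vif, 2), vif_flag(vif))
```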
	Resolving Multicollinearity	(1) Acquire more data (2) Apply a new model (3) Cut the problem variable loose
	"Acquiring More Data" Solution to Multicollinearity	(1) Ensures that multicollinearity not just in your sample (2) Make sure the population doesn't change (3) For cross-sectional data, use more specific data at either current time or future time (4) For time-series data, increase the frequency of the data
	"Using a New Model" Solution to Multicollinearity	(1) Respecify by log transformations, or reciprocal functions (2) Use "first-differencing" (3) Create a composite index variable
	First-Differencing	Involves subtracting the previous period's value from the current period's value; requires that the variables have variation over time. Disadvantages: (1) Losing observations (2) Losing variation in the independent variables (3) Changing the specification (possibly resulting in misspecification bias)
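A minimal sketch of first-differencing on a made-up series. It also shows two of the listed disadvantages: the result is one observation shorter, and a period with no change contributes no variation.

```python
def first_difference(series):
    """Subtract the previous period's value from the current period's value."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


levels = [10, 12, 15, 15]  # hypothetical values for periods 1 to 4
changes = first_difference(levels)
print(changes)  # [2, 3, 0]
```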
	Composite Index Variable	Combine collinear variables with similar characteristics into one variable; requires that the association between the two variables is logical
	Detecting Heteroskedasticity	(1) Examining the residuals graphically (2) Breusch-Pagan Test (3) White Test (4) Goldfeld-Quandt Test (5) Park Test
	Breusch-Pagan Test	Allows the heteroskedasticity process to be a function of one or more of the independent variables, and it's usually applied by assuming that heteroskedasticity may be a linear function of all the independent variables in the model. Failing to find evidence of heteroskedasticity with BP doesn't rule out a nonlinear relationship between the independent variables and the error variance
	White Test	Allows the heteroskedasticity process to be a function of one or more independent variables; allows the independent variable to have a nonlinear and interactive effect on the error variance Useful for identifying nearly any pattern of heteroskedasticity, but not useful in showing how to correct the model
	Goldfeld-Quandt Test	Assumes that a defining point exists and can be used to differentiate the variance of the error term Result is dependent on the criteria chosen; often an arbitrary process, so failing to find evidence of hetero doesn't rule out hetero
	Park Test	Assumes that the heteroskedastic process may be proportional to some power of an independent variable Assumes heteroskedasticity has a particular functional form
	Correcting the Regression Model for the Presence of Heteroskedasticity	(1) Weighted Least Squares (2) Robust (White-Corrected) Standard Errors
	Weighted Least Squares (WLS) Technique	Transforms the heteroskedastic model into a homoskedastic model by using info about the nature of heteroskedasticity; divides both sides of the model by the component of heteroskedasticity that gives the error term a constant variance. Corrected coefficients should be near the OLS coefficients or the problem may have been something other than heteroskedasticity
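As an illustration of the "divide both sides" step, suppose (purely as an assumption for this sketch) that the error standard deviation is proportional to X. Dividing Y = b0 + b1·X + e by X gives Y/X = b1 + b0·(1/X) + e/X, so the transformed intercept is the original slope and the transformed slope is the original intercept. The data below are noise-free and invented so the arithmetic is easy to check.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def wls_sd_proportional_to_x(x, y):
    """WLS under the assumed form sd(e) proportional to X."""
    inv_x = [1.0 / xi for xi in x]
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    # The roles swap in the transformed model (see the lead-in).
    b1, b0 = simple_ols(inv_x, y_over_x)
    return b0, b1


x = [1.0, 2.0, 4.0, 5.0]
y = [5.0, 8.0, 14.0, 17.0]  # generated from Y = 2 + 3X
b0, b1 = wls_sd_proportional_to_x(x, y)
print(round(b0, 6), round(b1, 6))
```

A different assumed variance structure would call for a different divisor, which is why WLS depends on knowing the nature of the heteroskedasticity.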
	Robust (White-Corrected) Standard Errors	The most popular remedy for heteroskedasticity; uses the OLS coefficient estimates but adjusts the OLS standard errors for hetero without transforming the model being estimated; makes no assumptions about the functional form of the heteroskedasticity
	No Autocorrelation	A situation where no identifiable relationship exists between the values of the error terms among data, or the correlation and covariance among error terms are 0; the positive and negative error values are random
	Positive Autocorrelation	A situation where the correlation from one error term to the next is positive; more common and more likely than negative autocorrelation
	Negative Autocorrelation	A situation where the correlation from one error term to the next is negative; an unlikely situation
	Sequencing	An autocorrelation situation where most positive error terms are followed or preceded by additional positive errors or when most negative errors are followed or preceded by other negative terms
	Analyzing Residuals to Test for Autocorrelation	(1) Graphical inspection of the residuals (2) The "run test" (or "Geary test") (3) Durbin-Watson (4) Breusch-Godfrey
	Run Test	A "run" is defined as a sequence of positive or negative residuals. This test involves observing the number of runs within your data and determining whether this is an acceptable number of runs based on your confidence interval. If the number of observed runs is below the expected interval, there is evidence of positive autocorrelation; if above the expected interval, evidence of negative autocorrelation
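Counting runs is straightforward to sketch. The residuals below are invented, and the expected-runs formula used is the standard one under randomness, 2·n1·n2 / (n1 + n2) + 1, where n1 and n2 are the counts of positive and negative residuals; building the confidence interval around it is left out.

```python
def count_runs(residuals):
    """Count sequences of same-signed residuals (zeros are skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def expected_runs(n_positive, n_negative):
    """Expected number of runs if the signs were arranged randomly."""
    return 2.0 * n_positive * n_negative / (n_positive + n_negative) + 1.0


residuals = [0.5, 0.2, -0.1, -0.4, -0.3, 0.6, 0.1]  # hypothetical
runs = count_runs(residuals)
print(runs, round(expected_runs(4, 3), 3))
```

Here the observed count falls below the expected value, the direction associated with positive autocorrelation, although a formal conclusion would need the interval.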
	d-Statistic	A test statistic developed in the Durbin-Watson test in order to detect the presence of autocorrelation (but only identifying first-order autocorrelation)
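The d-statistic is the sum of squared changes in consecutive residuals divided by the sum of squared residuals. A short sketch on made-up residuals; values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


d_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])     # same sign every period
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips every period
print(d_persistent, d_alternating)  # 0.0 3.0
```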
	Remedying Harmful Autocorrelation	(1) Feasible Generalized Least Squares (FGLS) (2) Serial correlation robust standard errors
	Feasible Generalized Least Squares (FGLS) Techniques and Description	(1) Cochrane-Orcutt Transformation (2) Prais-Winsten Transformation The goal of these models is to make the error term in the original model uncorrelated; involves quasi-differencing which subtracts the previous value of each variable scaled by the autocorrelation parameter
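Quasi-differencing can be sketched as below. The autocorrelation parameter rho is taken as known here, which is an assumption made for brevity; in FGLS it is estimated from the residuals. Cochrane-Orcutt drops the first observation, while Prais-Winsten keeps it after rescaling by the square root of (1 - rho²).

```python
import math


def quasi_difference(series, rho, keep_first=False):
    """Return x_t - rho * x_{t-1}; optionally keep a rescaled first value."""
    transformed = [curr - rho * prev for prev, curr in zip(series, series[1:])]
    if keep_first:  # Prais-Winsten style treatment of the first observation
        transformed.insert(0, math.sqrt(1.0 - rho ** 2) * series[0])
    return transformed


y = [10.0, 12.0, 11.0]  # hypothetical series
co_style = quasi_difference(y, rho=0.5)
pw_style = quasi_difference(y, rho=0.5, keep_first=True)
print(co_style)  # [7.0, 5.0]
print(len(pw_style))  # 3
```

The same transformation would be applied to every variable in the model, including the constant.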
	Serial Correlation Robust Standard Errors	Allows the biased OLS standard errors to be adjusted while the unbiased coefficient estimates are left untouched, so no model transformation is required. Adjusting the OLS standard errors for autocorrelation produces serial correlation robust standard errors (also known as Newey-West standard errors)
	Linear Probability Model (LPM)	Using the OLS technique to estimate a model with a dummy dependent variable creates this model
	Three Main LPM Problems	(1) Non-normality of the error term (2) Heteroskedastic errors (3) Potentially nonsensical predictions
	Non-normality of the error term in LPM	For any given X, the error term of an LPM can take only two values, the distance from the regression line to either 1 or 0, so it follows a binomial distribution and cannot be normally distributed; this implies that the usual t-tests and F-tests are invalid
	Heteroskedasticity in LPM	The variance of the LPM error term isn't constant; the variance of an LPM error term depends on the value of the independent variables
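Because the dependent variable is 0 or 1, the conditional error variance in an LPM works out to p·(1 - p), where p = b0 + b1·X is the fitted probability. The sketch below uses invented coefficients to show the variance changing with X and a prediction that leaves the 0 to 1 range.

```python
def lpm_probability(b0, b1, x):
    """Fitted 'probability' from a linear probability model."""
    return b0 + b1 * x


def lpm_error_variance(b0, b1, x):
    """Var(e | X) = p * (1 - p), which depends on X."""
    p = lpm_probability(b0, b1, x)
    return p * (1.0 - p)


b0, b1 = 0.1, 0.1  # hypothetical estimates
for x in (1.0, 4.0, 10.0):
    print(x, round(lpm_probability(b0, b1, x), 3), round(lpm_error_variance(b0, b1, x), 3))
```

At X = 10 the fitted value exceeds 1, which is the "nonsensical prediction" problem listed among the three main LPM problems.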
	Probit and Logit Models	Used instead of OLS for situations involving a qualitative (dummy) dependent variable; the conditional probabilities are nonlinearly related to the independent variable(s); both models asymptotically approach 0 and 1, so the predicted probabilities are always sensible, unlike the OLS-based LPM, whose predicted probabilities can extend beyond 0 and 1. Probit is based on the standard normal CDF, while logit is based on the logistic CDF
	Maximum Likelihood (ML) Estimation	Chooses values for the estimated parameters that would maximize the probability of observing the Y values in the sample with the given X values; calculates the joint probability of observing all values of the dependent variable; assumes each observation is drawn randomly and independently from the population
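The two link functions can be written with the standard library alone. In this sketch z stands for the linear index b0 + b1·X (a naming assumption); whatever its value, both functions return a probability strictly between 0 and 1 over the range shown.

```python
import math


def logistic_cdf(z):
    """Logit link: P(Y=1) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def standard_normal_cdf(z):
    """Probit link: the standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (-5.0, 0.0, 5.0):
    print(z, round(logistic_cdf(z), 4), round(standard_normal_cdf(z), 4))
```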
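To make the idea concrete, the sketch below strips the problem down to a single parameter: the probability p that Y = 1, with no X variables (a simplification assumed for this example). It evaluates the joint log-likelihood of an invented sample over a grid and picks the value that makes the sample most probable.

```python
import math


def bernoulli_log_likelihood(p, outcomes):
    """Log of the joint probability of independent 0/1 outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)


outcomes = [1, 0, 1, 1]  # hypothetical sample
grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01 ... 0.99
best_p = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))
print(best_p)  # 0.75
```

The maximizer matches the sample share of ones. Probit and logit estimation apply the same principle, with p replaced by a function of the X values and the coefficients.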
	Limited Dependent Variables	Arise when some minimum threshold value must be reached before the values of the dependent variable are observed and/or when some maximum threshold value restricts the observed values of the dependent variable Ex: ticket sales cease after a stadium sells out even if demand is still high; people drop out of the labor force if wages become too low
	Censored Dependent Variables	Information is lost because some of the actual values for the dependent variable are limited to a minimum and/or maximum threshold value. Examples: (1) Number of hours worked in a week (2) Income earned (3) Sale of tickets to an event (4) Exam scores
	Truncated Dependent Variables	Information is lost because some of the values for the variables are missing, meaning that they aren't observed if they are above or below some threshold; common when the selection of a sample is non-random (e.g. only people below the poverty line)
	Difference between Censored and Truncated Dependent Variables	Censored- Observed, but suppressed Truncated- Not observed
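The distinction is easy to see on a toy example. The sketch below uses invented ticket-demand figures and a capacity of 50 as the upper threshold: censoring keeps every observation but caps the value, while truncation drops the affected observations entirely.

```python
def censor(values, upper):
    """Observed but suppressed: values above the threshold are recorded at it."""
    return [min(v, upper) for v in values]


def truncate(values, upper):
    """Not observed: values above the threshold never enter the sample."""
    return [v for v in values if v <= upper]


demand = [30, 45, 60, 80]  # hypothetical true demand for four events
print(censor(demand, upper=50))    # [30, 45, 50, 50]
print(truncate(demand, upper=50))  # [30, 45]
```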
	Methods of Dealing with Limited Dependent Variables	(1) Tobin's Tobit (2) Truncated Normal (3) Heckman Selection
	Static Model	If your dependent variable reacts instantaneously to changes in the independent variable(s), then the model is static and will estimate a contemporaneous relationship at time t
	Dynamic	If your dependent variable doesn't fully react to a change in the independent variable(s) during the period in which the change occurs, then your model is dynamic and will estimate both a contemporaneous relationship at time t and lagged relationship at time t-1
	Autoregressive Model	A type of dynamic model that seeks to fix the estimation issues associated with distributed lag models by replacing the lagged values of the independent variable with a lagged value of the dependent variable
	Spurious Correlation Problem	Exists when a regression model contains dependent and independent variables that are trending; may appear to show that X has a strong effect on Y when this may not be the case: it is the trend of the data causing the observed results
	Detrending Time-Series Data	Removing trending patterns from data in order to derive the explanatory power of the independent variables; helps to solve the spurious correlation problem
	Dealing with Seasonal Adjustments	Seasonality can be correlated with both your dependent and independent variables; it is necessary to explicitly control for season in which measurements occur: use dummy variables for the seasons Data that has been stripped of its seasonal patterns is referred to as seasonally adjusted or deseasonalized data
	Panel Data-set vs. Pooled Cross-Sectional Measurements	Both contain cross-sectional measurements in multiple periods, but a panel dataset includes the same cross-sectional units in each time period rather than being randomly selected in each period as is the case with pooled cross-sectional data
	True Experiment	Subjects are randomly assigned to two (or more) groups; one group from the population of interest is randomly assigned to the control group and the remainder is assigned to the treatment group(s)
	Natural (Quasi) Experiment	Subjects are assigned to groups based on conditions beyond the control of the researcher, such as public policy
	Difference in Difference (D-in-D)	A technique that measures the effect of a treatment over a given period of time; it separates the preexisting difference between the groups from the difference that exists after the introduction of a treatment (or event or public policy change)
	Heterogeneity Bias	Occurs if you ignore characteristics that are unique to your cross-sectional units and they're correlated with any of your independent variables
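With group averages before and after the treatment, the estimate is the change in the treatment group minus the change in the control group. A minimal sketch with invented averages:

```python
def difference_in_differences(treat_before, treat_after, control_before, control_after):
    """(change in treatment group) - (change in control group)."""
    return (treat_after - treat_before) - (control_after - control_before)


# Hypothetical group means of the outcome variable.
effect = difference_in_differences(
    treat_before=20, treat_after=28, control_before=18, control_after=21
)
print(effect)  # 5
```

The preexisting gap of 2 between the groups drops out, and so does the common change of 3 that both groups would have experienced without the treatment.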
	Estimation Methods when Using Panel Data for Unobservable Factors	(1) First Difference (FD) Transformation (2) Dummy Variable (DV) Regression (3) Fixed Effects (FE) Estimator
	First Difference Transformation	Subtract the previous value of a variable from the current value of that variable for a particular cross-sectional unit and repeat the process for all variables in the analysis
	Dummy Variable Regression	Involves the inclusion of dummy variables in the model for each cross-sectional unit, making it a straightforward extension to the basic use of dummy variables
	Composite Error	Found by estimating a model for panel data by using OLS so that you're essentially ignoring the panel nature of the data The composite error term includes individual fixed effects (unobservable factors associated with the individual subjects) and idiosyncratic error (represents truly random element associated with a particular subject at a point in time)
	Fixed Effect Estimator	The most common method of dealing with fixed effects of cross-sectional units; applied by time demeaning the data, essentially calculating the average value of a variable over time for each cross-sectional unit and subtracting this mean from all observed values of a given cross-sectional unit, repeating the procedure for all units. This deals with unobservable factors because it takes out any component constant over time
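Time demeaning can be sketched as follows, using an invented two-unit panel stored as a dictionary from unit to its values over time (a layout assumed for the example).

```python
def time_demean(panel):
    """Subtract each unit's own time average from its observations."""
    demeaned = {}
    for unit, values in panel.items():
        unit_mean = sum(values) / len(values)
        demeaned[unit] = [v - unit_mean for v in values]
    return demeaned


panel = {"A": [1.0, 2.0, 3.0], "B": [10.0, 10.0, 13.0]}  # hypothetical data
print(time_demean(panel))
# {'A': [-1.0, 0.0, 1.0], 'B': [-1.0, -1.0, 2.0]}
```

Unit B's much higher level disappears after demeaning, which is how the estimator removes any unit-specific component that is constant over time.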
	Random Effects (RE) Model	An econometric model that allows for all unobserved effects to be relegated to the error term; this provides more efficient estimates of the regression parameter
	Hausman Test	Examines the differences in the estimated parameters, and the result is used to determine whether the RE and FE estimates are significantly different
	Ten Components of a Good Research Project	(1) Introducing your topic and posing the primary question of interest (2) Discussing the relevance and importance of your topic (3) Reviewing the existing literature (4) Describing the conceptual or theoretical framework (5) Explaining your econometric model (6) Discussing the estimation method(s) (7) Providing a detailed description of the data (8) Constructing tables and graphs to display the results (9) Interpreting the reported results (10) Summarizing what was learned
	Ten Common Mistakes in Applied Econometrics	(1) Failing to use common sense and knowledge of economic theory (2) Asking the wrong questions first (3) Ignoring the work and contributions of others (4) Failing to familiarize yourself with the data (5) Making it too complicated (6) Being inflexible to real world complications (7) Looking the other way when you see bizarre results (8) Obsessing over measures of fit and statistical significance (9) Forgetting about economic significance (10) Assuming your results are robust
