50 Cards in this Set
 Front
 Back
 3rd side (hint)
Sample Covariance

the average value of the product of the deviations of observations on two random variables from their sample means



Sample Correlation

the measure of linear association; how closely related two data series are
r = 1: perfect positive correlation; r = 0: no linear correlation; r = -1: perfect negative correlation
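
As a minimal sketch of the two definitions above, the functions below compute the sample covariance and sample correlation with the standard library only. The function names and toy data are illustrative assumptions, and the divisor n - 1 is the usual sample convention rather than something stated on the card.

```python
import math

def sample_covariance(x, y):
    """Sum of the products of deviations from the sample means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample covariance scaled by the two sample standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Toy data (hypothetical): y_up rises exactly with x, y_down falls exactly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]

cov_up = sample_covariance(x, y_up)
r_up = sample_correlation(x, y_up)      # perfect positive correlation
r_down = sample_correlation(x, y_down)  # perfect negative correlation
```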


Limitations of Correlation Analysis

a. Outliers = small numbers of observations at either extreme (small or large)
b. Spurious correlation – correlation between two variables that reflects chance relationship; correlation induced by a calculation; correlation between two variables arising from their relation to a third variable 


Dependent Variable

the "Y" in linear regression
"the variable you are seeking to explain" 


Independent Variable

the "X" in linear regression
"the variable you are using to explain changes in the dependent variable" 


Assumptions of Linear Regression

a. The relationship between X and Y is linear in the parameters b0 and b1 (each raised only to the first power)
b. The independent variable, X, is not random
c. The expected value of the error term, E, is zero
d. The variance of the error term is the same for all observations
e. The error term, E, is uncorrelated across observations
f. The error term, E, is normally distributed


Standard Error of Estimate (SEE)

measures how well a linear regression model captures the relationship between the dependent and independent variables; indicates how certain we can be about a particular prediction of Y using the regression equation



Coefficient of determination

how well the independent variable explains the variation in the dependent variable; the fraction of the total variation in the dependent variable that is explained by the independent variable
when k = 1 (one independent variable), R^2 = r^2, the squared sample correlation; R^2 = explained variation / total variation
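
A minimal sketch of how R^2 and the standard error of estimate could be computed for a one-variable regression. The function name and data are hypothetical, and the SEE divisor n - 2 assumes a single independent variable (two estimated parameters).

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns b0, b1, R^2, and SEE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Unexplained variation: sum of squared residuals.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of the dependent variable around its mean.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst       # explained variation / total variation
    see = math.sqrt(sse / (n - 2))    # standard error of estimate
    return b0, b1, r_squared, see

# Hypothetical data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r_squared, see = simple_regression(x, y)
```

A smaller SEE and an R^2 closer to 1 both point to a tighter fit on this sample.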


Confidence Interval

an interval of values that is believed to include the true parameter value with a given degree of confidence



Type 1 Error

rejecting the null hypothesis when, in fact, it is true



Type 2 error

failing to reject the null hypothesis when, in fact, it is false



Analysis of Variance (ANOVA) in regression analysis

a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources
in regression, ANOVA is used to determine the usefulness of the independent variable or variables in explaining variation in the dependent variable


Limitations of regression analysis

a. Regression relations can change over time, just like correlations (called parameter instability)
b. Public knowledge of these regression relationships may negate their future usefulness in the market
c. If regression assumptions are violated, predictions based on linear regression may not be valid


Assumptions of multiple regression model

a. Linear relationship between the dependent variable, Y, and the independent variables X1, X2, ..., Xn
b. Independent variables are not random
c. The expected value of the error term is 0
d. The error term is uncorrelated across observations
e. The error term is normally distributed


F-Test interpretation

tests the regression's overall significance



R squared vs. Adjusted R squared

R squared = the goodness of fit of the model
Adjusted R squared = needed with multiple independent variables, because it doesn't automatically increase upon the addition of another variable 
Adjusted R squared =
1 - {(n - 1)/(n - k - 1) * (1 - R squared)}
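
The adjusted R squared formula can be sketched as a one-line function, where n is the number of observations and k the number of independent variables. The function name and sample inputs are illustrative assumptions.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r_squared)

# Hypothetical case: same R^2 of 0.80 on 30 observations, with 3 versus 4 regressors.
adj_three = adjusted_r_squared(0.80, 30, 3)
adj_four = adjusted_r_squared(0.80, 30, 4)
```

If adding a fourth variable leaves R squared unchanged, the adjusted figure falls, which is the penalty the card describes.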

Heteroskedasticity definition

when the assumption that the variance of the errors is constant is violated (homoskedastic if the assumption holds)

a plot of data is heteroskedastic if the spread of the observations around the line of fit changes (for example, widens) across the range of the independent variable, vs. a uniform spread around the line with homoskedasticity


Impact of Heteroskedasticity

will result in unreliable standard errors and therefore unreliable computed t-tests; often standard errors will be understated, resulting in inflated t-stats that suggest significance when in fact there is none

Test with the Breusch-Pagan test: n * R squared (from regressing the squared residuals on the independent variables) compared to Chi squared at a given significance level, with df equal to the number of independent variables
Correct with robust standard errors or generalized least squares
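
A rough sketch of the Breusch-Pagan statistic for a regression with one independent variable: fit the regression, regress the squared residuals on X, and multiply that auxiliary R squared by n. The helper names and the deterministic toy data are assumptions for illustration; 3.841 is the 5% chi-squared critical value for df = 1.

```python
def ols_fit(x, y):
    """Return intercept, slope, and R^2 of a one-variable least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1.0 - sse / sst

def breusch_pagan_stat(x, y):
    """n * R^2 from regressing the squared residuals on x."""
    b0, b1, _ = ols_fit(x, y)
    squared_resid = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    _, _, r2_aux = ols_fit(x, squared_resid)
    return len(x) * r2_aux

CHI2_CRIT_5PCT_DF1 = 3.841  # one independent variable, 5% significance

# Hypothetical data: alternating-sign errors, constant size vs. size growing with x.
x = [float(i) for i in range(1, 21)]
signs = [1.0 if i % 2 == 1 else -1.0 for i in range(1, 21)]
y_homo = [1.0 + 2.0 * xi + s for xi, s in zip(x, signs)]
y_hetero = [1.0 + 2.0 * xi + 0.5 * xi * s for xi, s in zip(x, signs)]

bp_homo = breusch_pagan_stat(x, y_homo)      # error size does not depend on x
bp_hetero = breusch_pagan_stat(x, y_hetero)  # error size grows with x
```

A statistic above the critical value rejects the null hypothesis of constant error variance.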

Serial Correlation (autocorrelation)

when regression errors are correlated across observations
positive serial correlation is when a positive error for one observation increases the chance of a positive error for another observation
Test for serial correlation with the Durbin-Watson test


Interpretation of DW (Durbin-Watson) test

if the regression has no serial correlation: DW stat = 2
If the regression residuals are positively serially correlated, DW is less than 2
If negatively serially correlated, then DW > 2
Inconclusive if the DW stat lies between dl and du
DW stat is approximately 2(1 - r), where r is the sample correlation between consecutive residuals
Correct for serial correlation by adjusting the coefficient standard errors via Hansen's method (does not remove it completely, but diminishes its impact)
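
A small sketch of the Durbin-Watson statistic, using its usual definition (sum of squared changes in consecutive residuals divided by the sum of squared residuals), which the card does not spell out. The residual series are made-up examples.

```python
def durbin_watson(residuals):
    """Sum of squared consecutive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: runs of same-signed errors (positive serial correlation)
# versus errors that flip sign every period (negative serial correlation).
positive_runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0] * 2
alternating = [1.0, -1.0] * 6

dw_positive = durbin_watson(positive_runs)  # below 2
dw_negative = durbin_watson(alternating)    # above 2
```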

Multicollinearity definition

occurs when two or more independent variables are highly, but not perfectly, correlated with each other, making the interpretation of the regression output problematic: regression coefficients become imprecise and unreliable, and the individual impacts of the independent variables on the dependent variable cannot be distinguished



Detecting Multicollinearity

A high R squared and a significant F-stat when the t-stats are NOT significant is an indication of multicollinearity

Correct by excluding one or more of the regression (independent) variables


Model specification

refers to the set of variables included in the regression and the regression equation’s functional form



three types of model misspecification

1) misspecified functional form
2) regressors that are correlated with the error term
3) time-series misspecification: nonstationarity, random walks


Impact of misspecification

All misspecifications invalidate statistical inference, causing regression coefficients to be inconsistent



Time-series misspecification: nonstationarity

when a variable’s properties (mean and variance) are not constant through time



Time-series misspecification: random walks

time series for which the best predictor of next period’s value is this period’s value (when b1=1)



Probit model

is based on a normal distribution, estimating the probability that Y = 1

probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome


Logit model

is based on the logistic distribution

probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome
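
To make the difference between the two link functions concrete, the sketch below maps the same linear index z = b0 + b1*X to a probability with the standard normal CDF (probit) and with the logistic function (logit). The coefficients and the X value are hypothetical, not estimates from any data.

```python
import math

def probit_probability(z):
    """P(Y = 1) under a probit model: standard normal CDF of the linear index."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_probability(z):
    """P(Y = 1) under a logit model: logistic function of the linear index."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and observation.
b0, b1 = -1.0, 0.5
x_value = 3.0
z = b0 + b1 * x_value  # linear index

p_probit = probit_probability(z)
p_logit = logit_probability(z)
```

Both functions return a value strictly between 0 and 1, which is why they suit a dependent variable that takes only the values 0 and 1.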


Functional form misspecification

omitting an important variable, failing to transform a variable that needs it, or pooling data from samples that should not be pooled (can see this graphically)



Regressors that are correlated with the error term (misspecification)

including a lagged dependent variable as an independent variable, including a function of the dependent variable as an independent variable, or using independent variables measured with error



Limitation of trend models: time series, linear or log-linear

Regression error for one period must be uncorrelated with the regression error for all other periods
when this assumption is violated, the errors are serially correlated, which is the main limitation of trend models in time-series analysis


Covariance Stationary

a key assumption of time-series models; states that properties, like the mean and variance, do not change over time



Requirements for a time series to be covariance stationary

The expected value of the time series must be constant and finite in all periods
The variance of the time series must be constant and finite in all periods
The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods
If a time series is not covariance stationary, the estimation results will have NO ECONOMIC MEANING


Autoregressive model (AR)

is a time series regressed on its own past values;
Xt = b0 + b1*X(t-1) + e


How autocorrelation of the residuals can test whether an AR model fits the time series

a. Can test whether the time-series model is correct by testing whether the autocorrelations of the error term differ significantly from zero (are the t-stats of the residual autocorrelations significant?)
b. If any differ from zero, the model is not specified correctly; the sample autocorrelations of the residuals and their standard error are used to estimate and test the error autocorrelation
c. Standard error of each residual autocorrelation = 1/square root of T (T is the # of observations in the time series)
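
The check in a. to c. can be sketched as follows: compute the sample autocorrelation of the residuals at each lag and divide by 1/sqrt(T) to get a t-stat. The function names and the alternating residual series are illustrative assumptions.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator

def autocorrelation_t_stats(residuals, max_lag):
    """t-stat for each lag: autocorrelation divided by its standard error 1/sqrt(T)."""
    standard_error = 1.0 / math.sqrt(len(residuals))
    return [
        autocorrelation(residuals, lag) / standard_error
        for lag in range(1, max_lag + 1)
    ]

# Hypothetical residuals that flip sign every period (T = 100).
residuals = [1.0, -1.0] * 50
t_stats = autocorrelation_t_stats(residuals, 2)
```

Here both t-stats are far from zero in absolute value, so a model leaving residuals like these would be judged misspecified.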


Mean Reversion

a time series shows mean reversion if it tends to fall when its level is above its mean and rise when its level is below its mean

mean-reverting level: Xt = b0 / (1 - b1)
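
A minimal sketch of fitting an AR(1) model by least squares (regressing each value on the previous one) and computing the mean-reverting level b0 / (1 - b1). The function names are hypothetical, and the series is generated without noise from assumed coefficients b0 = 2 and b1 = 0.5 so the fit is exact.

```python
def fit_ar1(series):
    """Least-squares estimates of b0 and b1 in X(t) = b0 + b1*X(t-1) + e."""
    lagged = series[:-1]   # X(t-1)
    current = series[1:]   # X(t)
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((l - mean_lag) ** 2 for l in lagged)
    sxy = sum((l - mean_lag) * (c - mean_cur) for l, c in zip(lagged, current))
    b1 = sxy / sxx
    b0 = mean_cur - b1 * mean_lag
    return b0, b1

def mean_reverting_level(b0, b1):
    """Level at which the series is expected to stay put: b0 / (1 - b1)."""
    if b1 == 1.0:
        raise ValueError("undefined when b1 = 1 (unit root)")
    return b0 / (1.0 - b1)

# Noise-free toy series: starts above its mean-reverting level and falls toward it.
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

b0, b1 = fit_ar1(series)
level = mean_reverting_level(b0, b1)
```

The series starts at 10, above the level of 4, and declines toward it, matching the definition of mean reversion.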


In-sample forecast errors

are the residuals from a fitted time-series model (within the sample period used to fit it)



Out-of-sample forecast errors

are the differences between the actual and predicted values outside the sample period used to fit the model; gives a sense of how well the model will forecast in the future



Root mean squared error (RMSE)

the square root of the average squared error; the smaller the RMSE, the more accurate the forecasts
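
As a short illustration, RMSE can be computed from paired actual and forecast values and used to compare two models on the same out-of-sample period. The numbers below are invented for the example.

```python
import math

def rmse(actual, predicted):
    """Square root of the average squared forecast error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical out-of-sample values and forecasts from two candidate models.
actual = [3.0, 5.0, 7.0, 9.0]
forecast_a = [2.0, 5.0, 8.0, 9.0]
forecast_b = [1.0, 5.0, 7.0, 11.0]

rmse_a = rmse(actual, forecast_a)
rmse_b = rmse(actual, forecast_b)
```

Model A has the smaller RMSE, so on this sample it would be judged the more accurate forecaster.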



Why coefficients of time-series models are unstable

a. Sample period used is crucial for appropriate statistical inference and forecasting accuracy
b. Models are only valid if the time series is covariance stationary


Random Walk Process

a. Random walk is a time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error
i. Xt = X(t-1) + E (because b0 = 0 and b1 = 1)
ii. Random walk with a drift is when b1 = 1 and b0 is not equal to 0


Random Walk "Cure"

A random walk has an undefined mean-reverting level (b0 / (1 - b1) with b1 = 1 divides by zero) and no upper bound on its variance (it grows with t), so it has no finite variance and is not covariance stationary. This means you cannot use standard regression analysis with a random walk; instead, convert the data to a covariance stationary time series by first differencing (yt = xt - x(t-1))
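
A minimal sketch of first differencing: a random walk is simulated from random shocks (the seed, length, and shock distribution are arbitrary assumptions), and differencing recovers those shocks, which have a constant mean and variance.

```python
import random

def first_difference(series):
    """y(t) = x(t) - x(t-1); the result is one observation shorter."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

# Simulate a random walk without drift: x(t) = x(t-1) + e(t).
random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(200)]
walk = [0.0]
for shock in shocks:
    walk.append(walk[-1] + shock)

differenced = first_difference(walk)  # equals the shocks, up to rounding
```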



Unit Root

If b1 = 1, the time series has a unit root, is also by definition a random walk, and is therefore not covariance stationary



Impact of a Unit Root

The presence of a unit root makes t-stats unreliable; instead, use the Dickey-Fuller test (which subtracts x(t-1) from each side of the equation) to check for a unit root, first difference the series if one is found, and then reevaluate the t-stats of the residual autocorrelations until the model is properly specified



Dickey & Fuller Test

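subtract x(t-1) from each side of the AR(1) equation: xt - x(t-1) = b0 + (b1 - 1)*x(t-1) + e; then test whether (b1 - 1) = 0. The null hypothesis is that the series has a unit root; if it cannot be rejected, first difference the series and retest until the t-stats of the residuals are no longer significant
tests AR models for unit roots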


How to test and adjust for seasonality in time series

Adjust the equation to factor in a seasonal component (the lag whose residual autocorrelation is significant when the others in the AR model are not), and first difference until the t-stats of the residuals are no longer significant



autoregressive conditional heteroskedasticity (ARCH)

the presence of heteroskedasticity within a time series, where the variance of the error term is NOT constant and depends on the size of the errors in previous periods; this makes the standard errors unreliable, so t-stats may indicate significance when in fact there is none
TO CORRECT: use generalized least squares, or adjust the time period used
TO DETECT: regress the squared residuals on their own lagged values; a statistically significant t-stat on the lag coefficient indicates ARCH


Analysis of time-series variables prior to use in a linear regression

Will first need to test for a unit root (Dickey-Fuller test) for each of the time series
i. If neither has a unit root, can “safely” use a linear regression
ii. If one of the two time series has a unit root, should not use linear regression
iii. If both have a unit root and the time series are cointegrated, can use linear regression; if not cointegrated, should not use linear regression
The Dickey-Fuller test is also used to determine cointegration


Cointegration definition

two time series are cointegrated if a long-term financial or economic relationship exists between them such that they do not diverge from each other without bound in the long run
are cointegrated if they share a common trend
