50 Cards in this Set
- Front
- Back
- 3rd side (hint)
Sample Covariance
|
the average value of the product of the deviations of observations on two random variables from their sample means
|
|
|
Sample Correlation
|
the measure of linear association; how closely related two data series are
r = 1: perfect positive correlation; r = 0: no linear correlation; r = -1: perfect negative correlation |
|
|
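As a rough illustration of the two cards above, the sketch below computes a sample covariance and a sample correlation in plain Python. The function names and the data are hypothetical, and dividing by n - 1 (rather than n) is an assumed convention for the "average" in the covariance definition.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the sample means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample covariance scaled by the two sample standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Made-up data in which y is an exact positive multiple of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
```

Because y is an exact positive linear function of x here, the correlation comes out at the upper bound of 1.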
Limitations of Correlation Analysis
|
a. Outliers = small numbers of observations at either extreme (small or large)
b. Spurious correlation – correlation between two variables that reflects chance relationship; correlation induced by a calculation; correlation between two variables arising from their relation to a third variable |
|
|
Dependent Variable
|
the "Y" in linear regression
"the variable you are seeking to explain" |
|
|
Independent Variable
|
the "X" in linear regression
"the variable you are using to explain changes in the dependent variable" |
|
|
Assumptions of Linear Regression
|
a. The relationship between X and Y is linear in the parameters b0 and b1 (each parameter is raised only to the first power)
b. The independent variable, X, is not random
c. The expected value of the error term, E, is zero
d. The variance of the error term is the same for all observations
e. The error term, E, is uncorrelated across observations
f. The error term, E, is normally distributed |
|
|
Standard Error of Estimate (SEE)
|
measures how well a linear regression model captures the relationship between the dependent and independent variables; indicates how certain we can be about a particular prediction of Y using the regression equation (the smaller the SEE, the better the fit)
|
|
|
Coefficient of determination
|
how well the independent variable explains the variation in the dependent variable; the fraction of the total variation in the dependent variable that is explained by the independent variable
R^2 = explained variation / total variation; when k = 1 (one independent variable), R^2 is the square of the sample correlation between X and Y |
|
|
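The standard error of estimate and the coefficient of determination can both be read off the residuals of a fitted line. The sketch below is a minimal illustration for one independent variable, assuming ordinary least squares and the usual n - 2 degrees of freedom for SEE; the function names and data are made up.

```python
import math

def fit_simple_regression(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def fit_statistics(x, y):
    """Return (R squared, SEE) for the fitted line."""
    b0, b1 = fit_simple_regression(x, y)
    n = len(x)
    mean_y = sum(y) / n
    unexplained = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - unexplained / total      # explained / total variation
    see = math.sqrt(unexplained / (n - 2))     # n - 2: two estimated parameters
    return r_squared, see

# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_squared, see = fit_statistics(x, y)
```

For this data the unexplained variation is 2.4 out of a total of 6, so 60% of the variation in Y is explained by X.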
Confidence Interval
|
an interval of values that is believed to include the true parameter value with a given degree of confidence
|
|
|
Type 1 Error
|
rejecting the null hypothesis when, in fact, it is true
|
|
|
Type 2 error
|
failing to reject the null hypothesis when, in fact, it is false
|
|
|
Analysis of Variance (ANOVA) in regression analysis
|
a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources
used to determine the usefulness of the independent variable or variables in explaining variation in the dependent variable |
|
|
Limitations of regression analysis
|
a. Regression relations can change over time (just like correlations); this is called parameter instability
b. Public knowledge of these regression relationships may negate their future usefulness in the market
c. If regression assumptions are violated, predictions based on linear regression may not be valid |
|
|
Assumptions of multiple regression model
|
a. Linear relationship between the dependent variable, Y, and the independent variables X1, X2, ..., Xn
b. Independent variables are not random
c. The expected value of the error term is 0
d. The error term is uncorrelated across observations
e. The error term is normally distributed |
|
|
F-Test interpretation
|
tests the regression's overall significance
|
|
|
R squared vs. Adjusted R squared
|
R squared = the goodness of fit of the model
Adjusted R squared = needed with multiple independent variables, because it doesn't automatically increase upon the addition of another variable |
Adjusted R squared =
1 - {(n-1)/(n-k-1) * (1 - R squared)} |
|
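The adjusted R squared formula on this card translates directly into code. In the sketch below, n is the number of observations and k the number of independent variables; the function name and the numbers are illustrative only.

```python
def adjusted_r_squared(r_squared, n, k):
    """1 - (n - 1)/(n - k - 1) * (1 - R squared)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r_squared)

# Hypothetical case: a fourth variable nudges R squared from 0.500 to 0.501.
three_variables = adjusted_r_squared(0.500, n=30, k=3)
four_variables = adjusted_r_squared(0.501, n=30, k=4)
```

Here R squared rises slightly when the fourth variable is added, but adjusted R squared falls, which is the behaviour the card describes.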
Heteroskedasticity definition
|
when the assumption that the variance of the errors is constant is violated (the errors are homoskedastic if the assumption holds)
|
a plot of the data is heteroskedastic if the spread of the observations around the line of fit changes (for example, widens as X increases), whereas under homoskedasticity the spread around the line is roughly constant
|
|
Impact of Heteroskedasticity
|
will result in unreliable standard errors and therefore unreliable computed t-tests; often standard errors will be understated, resulting in inflated t-stats, and suggesting significance, when in fact there isn't significance
|
Test with the Breusch-Pagan test: n * R squared (from a regression of the squared residuals on the independent variables) is compared to the Chi squared critical value at the given significance level, with df equal to the number of independent variables
Correct with robust standard errors or generalized least squares |
|
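A minimal sketch of the Breusch-Pagan statistic for a regression with a single independent variable follows. It assumes the statistic is n times the R squared of a second regression of the squared residuals on X; the helper names and the data (errors that deliberately widen as X grows) are hypothetical. The value 3.841 is the 5% Chi squared critical value for 1 degree of freedom.

```python
def ols(x, y):
    """Ordinary least squares intercept and slope for one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """Share of the variation in y explained by a fitted line in x."""
    intercept, slope = ols(x, y)
    mean_y = sum(y) / len(y)
    unexplained = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - unexplained / total

def breusch_pagan_statistic(x, y):
    """n * R squared from regressing the squared residuals on x."""
    intercept, slope = ols(x, y)
    squared_residuals = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)

# Made-up data: the error alternates in sign and its size grows with x.
x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi + (0.2 * xi if i % 2 else -0.2 * xi)
     for i, xi in enumerate(x)]

bp_stat = breusch_pagan_statistic(x, y)
CHI2_CRITICAL_5PCT_DF1 = 3.841
reject_constant_variance = bp_stat > CHI2_CRITICAL_5PCT_DF1
```

With more than one independent variable the second regression would include all of them, and the degrees of freedom would rise accordingly.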
Serial Correlation (autocorrelated)
|
when regression errors are correlated across observations
positive serial correlation is when a positive error for one observation increases the chance of a positive error for another observation |
Test for serial correlation with the Durbin-Watson test
|
|
Interpretation of DW (Durbin-Watson) test
|
if the regression has no serial correlation: DW stat = 2
If the regression residuals are positively serially correlated, DW < 2
If the residuals are negatively serially correlated, DW > 2
Inconclusive if the DW stat lies between dl and du |
DW stat ≈ 2(1 - r), where r is the sample correlation between consecutive residuals
Correct for serial correlation by adjusting the coefficient standard errors via Hansen's method (does not remove it completely, but diminishes its impact) |
|
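The Durbin-Watson statistic itself is the sum of squared changes in consecutive residuals divided by the sum of squared residuals. The sketch below uses that standard definition with two made-up residual series; the function name is an assumption for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared first differences of residuals over sum of squared residuals."""
    squared_changes = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return squared_changes / sum(e ** 2 for e in residuals)

# Residuals that keep their sign for several periods (positive serial correlation).
dw_positive = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Residuals that flip sign every period (negative serial correlation).
dw_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

The first series gives a value below 2 and the second a value above 2, matching the interpretation on the card.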
Multicollinearity definition
|
occurs when two or more independent variables are highly, but not perfectly correlated with each other making the interpretation of the regression output problematic – regression coefficients become imprecise and unreliable – cannot distinguish the individual impacts of the independent variables on the dependent variable
|
|
|
Detecting Multicollinearity
|
A high R squared and a significant F-stat when t-stats are NOT significant is an indication of multicollinearity
|
Correct by excluding one or more of the regression (independent) variables
|
|
Model specification
|
refers to the set of variables included in the regression and the regression equation’s functional form
|
|
|
three types of model misspecification
|
1) misspecified functional form
2) regressors that are correlated with the error term
3) time-series misspecification: nonstationarity, random walks |
|
|
Impact of misspecification
|
All misspecifications invalidate statistical inference, causing regression coefficients to be inconsistent
|
|
|
Time-Series misspecification: non-stationarity
|
when a variable’s properties (mean and variance), are not constant through time
|
|
|
Time-Series misspecification: random walks
|
time series for which the best predictor of next period’s value is this period’s value (when b1=1)
|
|
|
Probit model
|
is based on a normal distribution, estimating the probability that Y = 1
|
probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome
|
|
Logit model
|
is based on the logistic distribution
|
probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome
|
|
Functional form misspecification
|
omitting an important variable, failing to transform a variable that needs transforming, or pooling data from samples that should not be pooled (can see this graphically)
|
|
|
Regressors that are correlated with the error term - misspecification
|
including a lagged dependent variable as an independent variable, including a function of a dependent variable as an independent variable, or independent variables measured with error
|
|
|
Limitation of trend models - time series, linear or log-linear
|
Regression error for one period must be uncorrelated with the regression error for all other periods
when this assumption is violated the errors are serially correlated, which is the main limitation of trend models in time-series analysis |
|
|
Covariance Stationary
|
a key assumption of time-series models: the properties of the series, like the mean and variance, do not change over time
|
|
|
Requirements for a time series to be covariance stationary
|
The expected value of the time series must be constant and finite in all periods
The variance of the time series must be constant and finite in all periods
The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods |
If a time series is not covariance stationary, the estimation results will have NO ECONOMIC MEANING
|
|
Autoregressive model (AR)
|
is a time series regressed on its own past values;
Xt = b0 + b1*Xt-1 + e |
|
|
How autocorrelation of the residuals can test whether an AR model fits the time series
|
a. Can test whether the time-series model is correct by testing whether the autocorrelations of the error term differ significantly from zero (are the t-stats of the residual autocorrelations significant?)
b. The error autocorrelations are estimated from the sample autocorrelations of the residuals; if any differ significantly from zero, the model is not specified correctly
c. Standard error of each residual autocorrelation = 1/square root of T (T is the # of observations in the time series) |
|
|
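The residual autocorrelation check above can be sketched as follows: estimate the autocorrelation of the residuals at a given lag, then divide by the standard error 1/sqrt(T) to get a t-stat. The function names and the alternating residual series are hypothetical.

```python
import math

def residual_autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    deviations = [e - mean for e in residuals]
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / sum(d * d for d in deviations)

def autocorrelation_t_stat(residuals, lag):
    """Autocorrelation divided by its standard error, 1 / sqrt(T)."""
    standard_error = 1.0 / math.sqrt(len(residuals))
    return residual_autocorrelation(residuals, lag) / standard_error

# Made-up residuals from T = 16 periods that flip sign every period.
residuals = [1.0, -1.0] * 8
rho_1 = residual_autocorrelation(residuals, lag=1)
t_1 = autocorrelation_t_stat(residuals, lag=1)
```

A t-stat this far from zero would suggest the model is not specified correctly.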
Mean Reversion
|
a time series shows mean reversion if it tends to fall when its level is above its mean and rise when its level is below its mean
|
Mean-reverting level of an AR(1) model: Xt = b0 / (1 - b1)
|
|
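A small sketch of the mean-reverting level for an AR(1) model, Xt = b0 + b1*Xt-1 + e, is given below. The coefficient values are invented for illustration, and the function names are assumptions.

```python
def mean_reverting_level(b0, b1):
    """Level at which the AR(1) forecast equals the current value: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("mean-reverting level is undefined when b1 = 1")
    return b0 / (1.0 - b1)

def one_step_forecast(b0, b1, current_value):
    """Expected next value from the AR(1) equation, ignoring the error term."""
    return b0 + b1 * current_value

level = mean_reverting_level(b0=1.0, b1=0.5)
forecast_from_above = one_step_forecast(1.0, 0.5, current_value=3.0)
forecast_from_below = one_step_forecast(1.0, 0.5, current_value=1.0)
```

Starting above the level the forecast falls toward it, and starting below the forecast rises toward it.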
In-sample forecast errors
|
are the residuals from a fitted time-series model (within the time-frame specified)
|
|
|
Out-of-sample forecast errors
|
are the differences between the actual and predicted values for periods outside the sample used to estimate the model; they give a sense of how well the model will forecast in the future
|
|
|
Root mean squared error (RMSE)
|
the square root of the average squared error; the smaller the RMSE, the more accurate the forecasts
|
|
|
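RMSE is straightforward to compute from a set of actual values and forecasts. The sketch below uses made-up out-of-sample numbers, and the function name is hypothetical.

```python
import math

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared forecast error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual values and model forecasts.
actual = [2.0, 4.0, 6.0]
predicted = [1.0, 5.0, 8.0]
rmse = root_mean_squared_error(actual, predicted)
```

When comparing models, the same set of out-of-sample periods would be used for each, and the model with the smaller RMSE preferred.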
Why coefficients of time-series models are unstable
|
a. Sample period used is crucial for appropriate statistical inference and forecasting accuracy
b. Models are only valid if the time series is covariance stationary |
|
|
Random Walk Process
|
a. A random walk is a time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error
i. Xt = Xt-1 + E (because b0 = 0 and b1 = 1)
ii. A random walk with a drift is when b1 = 1 and b0 is not equal to 0 |
|
|
Random Walk "Cure"
|
A random walk has an undefined mean-reverting level (b0 / (1 - b1) with b1 = 1) and a variance that grows with t without an upper bound, so it is not covariance stationary. Standard regression analysis therefore cannot be applied to a random walk directly; first convert the data to a covariance stationary time series by first differencing (yt = xt - xt-1)
|
|
|
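First differencing, yt = xt - xt-1, is a one-line transformation. The sketch below applies it to an invented series; the function name is an assumption.

```python
def first_difference(series):
    """Return y_t = x_t - x_(t-1); the result is one observation shorter."""
    return [current - previous for previous, current in zip(series[:-1], series[1:])]

# Hypothetical price levels.
levels = [10.0, 12.0, 11.0, 15.0]
changes = first_difference(levels)
```

The differenced series, rather than the original levels, would then be modeled.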
Unit Root
|
If b1 = 1, the time-series has a unit root, is also by definition a random walk, and therefore not covariance stationary
|
|
|
Impact of a Unit Root
|
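The presence of a unit root makes the usual t-stats unreliable, so b1 = 1 cannot be tested with a standard t-test. Instead use the Dickey-Fuller test: subtract xt-1 from each side of the AR(1) equation and test whether the coefficient on xt-1 (g1 = b1 - 1) is zero. If a unit root is present, first difference the series and re-estimate the model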
|
|
|
Dickey & Fuller Test
|
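subtract xt-1 from each side of the AR(1) equation: xt - xt-1 = b0 + g1*xt-1 + e, where g1 = b1 - 1
tests an AR model for a unit root: the null hypothesis g1 = 0 (unit root) is tested against g1 < 0 using Dickey-Fuller critical values instead of standard t critical values |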
|
|
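The regression behind the Dickey-Fuller test can be sketched as below: the change in the series is regressed on its lagged level, giving an estimate of g1 = b1 - 1 and its t-stat. This is only an illustration with a made-up mean-reverting series and a hypothetical function name; the resulting t-stat would have to be compared with Dickey-Fuller critical values, which are not included here.

```python
import math

def dickey_fuller_regression(series):
    """Regress x_t - x_(t-1) on x_(t-1); return (g1 estimate, its t-stat)."""
    lagged = series[:-1]
    changes = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_change = sum(changes) / n
    sxx = sum((v - mean_lag) ** 2 for v in lagged)
    sxy = sum((v - mean_lag) * (c - mean_change) for v, c in zip(lagged, changes))
    g1 = sxy / sxx
    b0 = mean_change - g1 * mean_lag
    sse = sum((c - b0 - g1 * v) ** 2 for v, c in zip(lagged, changes))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return g1, g1 / standard_error

# Made-up series generated with b0 = 1, b1 = 0.5 and small alternating shocks.
series = [5.0]
for t in range(1, 21):
    shock = 0.1 if t % 2 else -0.1
    series.append(1.0 + 0.5 * series[-1] + shock)

g1_hat, t_stat = dickey_fuller_regression(series)
```

A g1 estimate that is significantly below zero argues against a unit root; an estimate indistinguishable from zero does not.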
How to test and adjust for seasonality in time-series
|
Add a seasonal lag to the AR equation (the lag whose residual autocorrelation is significant when the others are not, e.g. the fourth lag for quarterly data), then re-estimate and check that the t-stats of the residual autocorrelations are no longer significant
|
|
|
autoregressive conditional heteroskedasticity (ARCH)
|
the presence of heteroskedasticity within a time series, where the variance of the error term is NOT constant and depends on the variance of the errors in previous periods; this causes the standard errors to be unreliable, so t-stats may indicate significance when in fact there is none
TO CORRECT: use generalized least squares, or adjust the time period used |
to detect: regress the squared residuals on their own lagged values; a statistically significant slope coefficient indicates ARCH
|
|
Analysis of time-series variables prior to use in a linear regression
|
First test each of the time series for a unit root (Dickey-Fuller test)
i. If neither has a unit root, can “safely” use a linear regression
ii. If only one of the two time series has a unit root, should not use linear regression
iii. If both have a unit root and the time series are cointegrated, can use linear regression; if not cointegrated, should not use linear regression
A Dickey-Fuller test on the regression residuals (with Engle-Granger critical values) is used to determine cointegration |
|
|
Cointegration definition
|
two time series are cointegrated if a long-term financial or economic relationship exists between them such that they do not diverge from each other without bound in the long-run
are cointegrated if they share a common trend |
|