50 Cards in this Set

Sample Covariance
the average value of the product of the deviations of observations on two random variables from their sample means
Sample Correlation
the measure of linear association; how closely related two data series are

r = +1: perfect positive correlation; r = 0: no linear correlation; r = -1: perfect negative correlation
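A minimal Python sketch (not part of the card set; the two return series are made up) showing how the sample covariance and sample correlation above are computed:

```python
# Sample covariance and sample correlation of two hypothetical return series,
# using the n-1 (sample) denominator.
import numpy as np

x = np.array([0.02, -0.01, 0.03, 0.00, 0.015])    # hypothetical returns, series X
y = np.array([0.018, -0.005, 0.025, 0.002, 0.01]) # hypothetical returns, series Y

n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # sample covariance
r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))             # sample correlation

print(cov_xy, r_xy)   # r_xy always lies between -1 and +1
```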
Limitations of Correlation Analysis
a. Outliers = small numbers of observations at either extreme (small or large)
b. Spurious correlation – correlation between two variables that reflects a chance relationship; correlation induced by a calculation; or correlation between two variables arising from their relation to a third variable
Dependent Variable
the "Y" in linear regression

"the variable you are seeking to explain"
Independent Variable
the "X" in linear regression

"the variable you are using to explain changes in the dependent variable"
Assumptions of Linear Regression
a. The relationship between X and Y is linear in the parameters b0 and b1 (the parameters are raised only to the first power)
b. The independent variable, X, is not random
c. The expected value of the error term, E, is zero
d. The variance of the error term is the same for all observations
e. The error term, E, is uncorrelated across observations
f. The error term, E, is normally distributed
Standard Error of Estimate (SEE)
determines how well a linear regression model captures the relationship between the dependent and independent variables; how certain we can be about a particular prediction of Y using the regression equation
Coefficient of determination
how well the independent variable explains the variation in the dependent variable; the fraction of the total variation in the dependent variable that is explained by the independent variable

R^2 = explained variation / total variation; when k = 1, R^2 equals the squared sample correlation (r^2)
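A sketch in Python (hypothetical data; variable names are illustrative) tying together the SEE and R^2 cards above for a simple linear regression:

```python
# Simple linear regression with its standard error of estimate (SEE) and
# coefficient of determination (R^2), computed from made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variable
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])   # hypothetical dependent variable

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope estimate
b0 = y.mean() - b1 * x.mean()                          # intercept estimate
resid = y - (b0 + b1 * x)

n, k = len(y), 1
see = np.sqrt(np.sum(resid**2) / (n - k - 1))          # standard error of estimate
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)  # explained / total variation

print(b0, b1, see, r2)   # with k = 1, r2 equals the squared sample correlation
```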
Confidence Interval
an interval of values that is believed to include the true parameter value with a given degree of confidence
Type 1 Error
rejecting the null hypothesis when, in fact, it is true
Type 2 error
failing to reject the null hypothesis when, in fact, it is false
Analysis of Variance (ANOVA) in regression analysis
a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources

used to assess the usefulness of the independent variable or variables in explaining variation in the dependent variable
Limitations of regression analysis
a. Regression relations can change over time (just like correlations) = called parameter instability
b. Public knowledge of these regression relationships may negate future usefulness in the market
c. If regression assumptions are violated, predictions based on linear regression may not be valid
Assumptions of multiple regression model
a. Linear relationship between the dependent variable, Y, and the independent variables X1, X2, ..., Xn
b. Independent variables are not random
c. The expected value of the error term is 0
d. The error term is uncorrelated across observations
e. The error term is normally distributed
F-Test interpretation
tests the regression's overall significance
R squared vs. Adjusted R squared
R squared = the goodness of fit of the model

Adjusted R squared = needed with multiple independent variables, because it doesn't automatically increase upon the addition of another variable
Adjusted R squared =
1 - {(n-1)/(n-k-1) * (1 - R squared)}
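A small helper (illustrative only; the numbers in the example call are made up) applying the adjusted R squared formula above:

```python
# Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# e.g. R^2 = 0.80 with n = 60 observations and k = 5 independent variables
print(adjusted_r2(0.80, 60, 5))   # about 0.781, lower than the raw R^2
```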
Heteroskedasticity definition
violated when the variance of the errors is not constant across observations (the errors are homoskedastic if the variance is constant)
in a plot of the data, the scatter around the fitted line widens or narrows across observations under heteroskedasticity, versus a roughly constant spread under homoskedasticity
Impact of Heteroskedasticity
will result in unreliable standard errors and therefore unreliable computed t-tests; often standard errors will be understated, resulting in inflated t-stats and suggesting significance when in fact there is none
Test with the Breusch-Pagan test: n * R^2 (from a regression of the squared residuals on the independent variables) is compared to a chi-squared critical value at the given significance level, with df equal to the number of independent variables

Correct with robust standard errors or generalized least squares
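A rough Python sketch of the Breusch-Pagan test described above (the data are simulated with an error variance that depends on x, so the example is an assumption, not from the cards):

```python
# Breusch-Pagan test: regress squared residuals on the independent variable(s)
# and compare n * R^2 to a chi-squared value with df = number of independents.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1 + np.abs(x), size=n)  # error variance depends on x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Auxiliary regression of squared residuals on the independent variable(s)
gamma = np.linalg.lstsq(X, resid**2, rcond=None)[0]
fitted = X @ gamma
r2_aux = 1 - np.sum((resid**2 - fitted)**2) / np.sum((resid**2 - np.mean(resid**2))**2)

bp_stat = n * r2_aux
p_value = 1 - stats.chi2.cdf(bp_stat, df=1)   # df = 1 independent variable
print(bp_stat, p_value)                        # a small p-value indicates heteroskedasticity
```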
Serial Correlation (autocorrelated)
when regression errors are correlated across observations

positive serial correlation is when a positive error for one observation increases the chance of a positive error for another observation
Test for serial correlation with the Durbin-Watson test
Interpretation of DW (Durbin-Watson) test
if the regression has no serial correlation: DW stat ≈ 2

If the regression residuals are positively serially correlated, DW is less than 2

If negatively serially correlated, then DW > 2

The test is inconclusive if the DW stat lies between the lower (dl) and upper (du) critical values
DW stat ≈ 2(1 - r), where r is the sample correlation between the residuals and their lagged values

Correct for serial correlation by adjusting the coefficient standard errors via Hansen's method (does not remove the serial correlation completely, but diminishes its impact)
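A minimal sketch of the Durbin-Watson statistic; resid is assumed to be a 1-D numpy array of regression residuals:

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum of squared successive differences / sum of squared residuals
    return np.sum(np.diff(resid)**2) / np.sum(resid**2)

# DW near 2: no serial correlation; below 2: positive; above 2: negative
```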
Multicollinearity definition
occurs when two or more independent variables are highly, but not perfectly, correlated with each other, making interpretation of the regression output problematic: regression coefficients become imprecise and unreliable, and the individual impacts of the independent variables on the dependent variable cannot be distinguished
Detecting Multicollinearity
A high R squared and a significant F-stat when t-stats are NOT significant is an indication of multicollinearity
Correct by excluding one or more of the regression (independent) variables
Model specification
refers to the set of variables included in the regression and the regression equation’s functional form
three types of model misspecification
1) misspecified functional form

2) regressors that are correlated with the error term

3) time-series misspecification: nonstationarity, random walks
Impact of misspecification
All misspecifications invalidate statistical inference, causing regression coefficients to be inconsistent
Time-Series misspecification: non-stationarity
when a variable's properties (mean and variance) are not constant through time
Time-Series misspecification: random walks
time series for which the best predictor of next period’s value is this period’s value (when b1=1)
Probit model
is based on a normal distribution, estimating the probability that Y = 1
probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome
Logit model
is based on the logistic distribution
probit and logit models estimate the probability of a discrete outcome given the values of the independent variables used to explain the outcome
Functional form misspecification
omitting an important variable; failing to transform a variable when needed; or pooling data from samples that should not be pooled (this can often be seen graphically)
Regressors that are correlated with the error term - misspecification
including a lagged dependent variable as an independent variable, including a function of a dependent variable as an independent variable, or independent variables measured with error
Limitation of trend models - time series, linear or log-linear
The regression error for one period must be uncorrelated with the regression error for all other periods

when this assumption is violated the errors are serially correlated, which is the main limitation of trend models in time-series analysis
Covariance Stationary
a key assumption of time-series models - states that properties, like the mean and variance, do not change over time
Requirements for a times series to be covariance stationary
The expected value of the time series must be constant and finite in all periods

The variance of the time series must be constant and finite in all periods

The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods
If a time series is not covariance stationary, the estimation results will have NO ECONOMIC MEANING
Autoregressive model (AR)
is a time series regressed on its own past values;

Xt = b0 + b1*(Xt-1) + e
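A minimal sketch (simulated data with assumed coefficients b0 = 0.5, b1 = 0.7) of fitting the AR(1) model above by ordinary least squares on the lagged series:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.5 + 0.7 * x[t - 1] + rng.normal()   # simulate an AR(1) series

X = np.column_stack([np.ones(199), x[:-1]])      # regress x_t on a constant and x_(t-1)
b0_hat, b1_hat = np.linalg.lstsq(X, x[1:], rcond=None)[0]
print(b0_hat, b1_hat)                            # estimates of b0 and b1
```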
How autocorrelation of the residuals can test whether an AR model fits the time series
a. Can test whether the correct time-series model is being used by testing whether the autocorrelations of the error term differ significantly from zero (are the t-stats of the residual autocorrelations significant? see the sketch after this list)

b. If they do differ from zero, the model is not specified correctly; the sample autocorrelations of the residuals and their sample variance can be used to estimate the error autocorrelation

c. Standard error of the residual autocorrelations = 1 / square root of T (T is the number of observations in the time series)
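A sketch of the test described in this card; resid is assumed to be a 1-D numpy array of residuals from a fitted AR model:

```python
import numpy as np

def residual_autocorr_tstats(resid, max_lag=4):
    T = len(resid)
    e = resid - resid.mean()
    tstats = []
    for lag in range(1, max_lag + 1):
        rho = np.sum(e[lag:] * e[:-lag]) / np.sum(e**2)  # residual autocorrelation at this lag
        tstats.append(rho / (1 / np.sqrt(T)))            # t-stat uses the 1/sqrt(T) standard error
    return tstats   # |t| greater than about 2 suggests the AR model is misspecified
```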
Mean Reversion
a time series shows mean reversion if it tends to fall when its level is above its mean and rise when its level is below its mean
mean-reverting level: Xt = b0 / (1 - b1)
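A hypothetical example: with b0 = 0.5 and b1 = 0.7, the series tends to revert toward 0.5 / (1 - 0.7) ≈ 1.67.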
In-sample forecast errors
are the residuals from a fitted time-series model (within the time-frame specified)
Out-of-sample forecast errors
are the differences between the actual and predicted values – gives a sense for how well it will forecast in the future
Root mean squared error (RMSE)
the square root of the average squared forecast error - the smaller the RMSE, the more accurate the model
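A small sketch (the actual and predicted values are made up) of the RMSE calculation:

```python
import numpy as np

actual    = np.array([1.10, 1.25, 1.18, 1.30])   # hypothetical out-of-sample actuals
predicted = np.array([1.05, 1.20, 1.22, 1.27])   # hypothetical model forecasts

rmse = np.sqrt(np.mean((actual - predicted) ** 2))   # root mean squared error
print(rmse)   # compare across models; the smaller RMSE wins
```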
Why coefficients of time-series models are unstable
a. The sample period used is crucial for appropriate statistical inference and forecasting accuracy

b. Models are only valid if the time series is covariance stationary
Random Walk Process
a. A random walk is a time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error
i. Xt = Xt-1 + E (because b0 = 0 and b1 = 1)
ii. A random walk with a drift is when b1 = 1 and b0 is not equal to 0
Random Walk "Cure"
A random walk has an undefined mean-reverting level and no upper bound on its variance (the variance grows with t), so it has no finite variance and is not covariance stationary - this means you cannot use standard regression analysis with a random walk; instead, convert the data to a covariance stationary time series by first differencing (yt = xt - xt-1)
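A sketch of first differencing a simulated random walk so the differenced series can be modeled as covariance stationary:

```python
import numpy as np

x = np.cumsum(np.random.default_rng(2).normal(size=500))  # simulated random walk
y = np.diff(x)                                            # y_t = x_t - x_(t-1)
# y has a constant (zero) mean and finite variance, so standard AR modeling applies to y
```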
Unit Root
If b1 = 1, the time-series has a unit root, is also by definition a random walk, and therefore not covariance stationary
Impact of a Unit Root
The presence of a unit root makes t-stats unreliable; instead, use the Dickey-Fuller test (subtracting xt-1 from each side of the equation), first difference the series, and re-evaluate the t-stats of the residuals until the model is properly specified
Dickey & Fuller Test
first difference (subtract xt-1 from each side) until the t-stats of the residuals are no longer significant

adjusts/tests AR models for unit roots
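A sketch using the augmented Dickey-Fuller test as implemented in statsmodels (assuming statsmodels is installed); the null hypothesis is that the series has a unit root:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

x = np.cumsum(np.random.default_rng(3).normal(size=300))  # simulated random walk
adf_stat, p_value, *_ = adfuller(x)
print(adf_stat, p_value)   # a large p-value means a unit root cannot be rejected

# First difference and retest; the differenced series should reject the unit root
adf_stat_d, p_value_d, *_ = adfuller(np.diff(x))
print(adf_stat_d, p_value_d)
```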
How to test and adjust for seasonality in time-series
Adjust the equation to factor in a seasonal component (include the seasonal lag whose residual autocorrelation is significant within the AR model when the others are not), and re-estimate until the t-stats of the residuals are no longer significant
autoregressive conditional heteroskedasticity (ARCH)
the presence of heteroskedasticity within a time series, where the variance of the error term is NOT constant and depends on the errors from previous periods - this will cause the standard errors to be unreliable, and therefore t-stats may indicate significance when in fact there is none

TO CORRECT: use generalized least squares, or adjust the time period used
TO DETECT: regress the squared residuals on the previous period's squared residuals and check whether the coefficient's t-stat is statistically significant (see the sketch below)
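A sketch of the ARCH(1) detection regression mentioned above; resid is assumed to be a 1-D numpy array of time-series regression residuals (the t-stat calculation itself is omitted for brevity):

```python
import numpy as np

def arch1_slope(resid):
    e2 = resid**2
    X = np.column_stack([np.ones(len(e2) - 1), e2[:-1]])  # constant and lagged squared residual
    a0, a1 = np.linalg.lstsq(X, e2[1:], rcond=None)[0]
    return a1   # a statistically significant a1 indicates ARCH effects
```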
Analysis of time-series variables prior to use in a linear regression
Will first need to test for a unit root (Dickey-Fuller test) in each of the time series (see the sketch after this card)
i. If neither has a unit root, can "safely" use a linear regression
ii. If only one of the two time series has a unit root, should not use linear regression
iii. If both have a unit root and the time series are cointegrated, can use linear regression; if not cointegrated, should not use linear regression

The Dickey-Fuller test (applied to the residuals of a regression of one series on the other) is used to determine cointegration
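A rough sketch (simulated series with an assumed common trend) of the workflow above, using the Engle-Granger approach of applying a Dickey-Fuller test to the regression residuals; note the residual test strictly requires Engle-Granger critical values, so the plain ADF p-value here is only indicative:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
common = np.cumsum(rng.normal(size=400))        # shared stochastic trend
x = common + rng.normal(size=400)
y = 2.0 + 0.5 * common + rng.normal(size=400)

print(adfuller(x)[1], adfuller(y)[1])           # large p-values: each series has a unit root

X = np.column_stack([np.ones(400), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
print(adfuller(resid)[1])                       # a small p-value suggests cointegration
```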
Cointegration definition
two time series are cointegrated if a long-term financial or economic relationship exists between them such that they do not diverge from each other without bound in the long-run

are cointegrated if they share a common trend