85 Cards in this Set

  • Front
  • Back

Sample Covariance

Statistical measure of the degree to which two variables move together




[Sum of (Xi - X mean)(Yi - Y mean)]/(n-1)

Sample Correlation Coefficient

Measure of the strength of the linear relationship (correlation) between two variables. Ranges from -1 to 1. Denoted r.




(Covariance of X and Y)/[(Sample Standard Deviation of X)(Sample Standard Deviation of Y)]
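As an aside, a minimal Python sketch of the covariance and correlation formulas on the two cards above (plain Python; the function names are illustrative only):

```python
def sample_covariance(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_std(x):
    n = len(x)
    mean_x = sum(x) / n
    return (sum((xi - mean_x) ** 2 for xi in x) / (n - 1)) ** 0.5

def sample_correlation(x, y):
    # r = Cov(X, Y) / (s_X * s_Y); always falls between -1 and 1
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))
```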

Limitations on Correlation Analysis

1. Outliers can greatly influence


2. Nonlinear relationships are not measured.


3. Spurious correlations that have no logical backing can occur.

Testing Statistical Significance of Correlation Coefficient

t = (r*sqrt(n-2)) / sqrt (1-r^2)
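A hedged sketch of this test in Python (the sample values r, n, and alpha are hypothetical; scipy is assumed to be available for the critical value):

```python
from scipy import stats

def correlation_t_stat(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5

r, n, alpha = 0.5, 30, 0.05                      # hypothetical sample values
t = correlation_t_stat(r, n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # two-tailed critical value
reject_null = abs(t) > t_crit                    # True -> correlation is significant
```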

General Form of Simple Linear Regression

Dependent Variable = Intercept + Slope Coefficient * Independent Variable + Residual Error
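A minimal sketch of fitting this form by ordinary least squares (numpy assumed; the data are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical dependent variable

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = Cov(X, Y) / Var(X)
b0 = y.mean() - b1 * x.mean()                         # intercept
residuals = y - (b0 + b1 * x)                         # residual error term
```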

Independent Variable vs. Dependent Variable for Linear Regression

Independent (X): Variable used to explain the variation in the dependent variable (Y)

Assumptions Made With Simple Linear Regression
1. There is a linear relationship between dependent and independent variable.

2. Independent variable is uncorrelated with residuals


3. The expected value of the residual term is zero


4. The variance of the residual term is constant for all observations


5. The residual term is independently and normally distributed



Sum of Squared Errors


(SSE)

The sum of the squared vertical distances between the estimated and actual Y-Values

Slope Coefficient

b1: The change in Y (the dependent variable) for a one-unit change in X. The beta of a stock is an example of a slope coefficient.

Degrees of Freedom in Linear Regression

(n - k - 1)




k: the number of independent variables used in the regression.

Test of Statistical Significance

Tests whether the correlation coefficient is significantly different from 0.

Conducting Hypothesis Tests for Linear Regressions

Construct the confidence interval:


slope coefficient (b1) +/- (critical t * standard error of coefficient)


If the hypothesized value (usually 0) falls inside the interval, do not reject the null hypothesis; if it falls outside, reject it.

Standard Error of Estimate (SEE)

Measures the variability of the actual Y-values relative to the estimated Y-values from a regression equation; the smaller the error, the better the line "fits". It is the standard deviation of the regression error terms.




for simple linear regression:


SEE = sqrt ((SSE)/(n-2))
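A small sketch of the SEE formula above (numpy assumed; the function name is illustrative):

```python
import numpy as np

def standard_error_of_estimate(y_actual, y_predicted):
    # SEE = sqrt(SSE / (n - 2)) for a simple linear regression
    y_actual, y_predicted = np.asarray(y_actual), np.asarray(y_predicted)
    sse = np.sum((y_actual - y_predicted) ** 2)
    return np.sqrt(sse / (len(y_actual) - 2))
```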

Coefficient of Determination

Proportion of the total variation in the dependent variable explained by the independent variable. An R^2 of 0.63 indicates that 63% of the variation in the dependent variable is explained by the independent variable.




R^2 = RSS/SST

Regression Sum of Squares

RSS: Measures the variation in the dependent variable explained by the independent variable. Sum of the squared distances between the predicted Y-Values and the mean of Y

Total Sum of Squares

Measures the total variation in the dependent variable. SST is equal to the sum of the squared differences between the actual Y-values and the mean of Y:




Sum of (Yi - Mean Y)^2




SST = Regression sum of squares + Sum of Squared Errors




Not the same as variance, which is SST/(n - 1)

ANOVA Tables

Analysis of variance tables partition the total variation in the dependent variable (SST) into explained (RSS) and unexplained (SSE) components, along with their degrees of freedom and mean squares (MSR, MSE).

Mean Regression Sum of Squares

MSR: Regression sum of squares/k

Mean Squared Error

Sum of Squared Errors/(n - k - 1)

Calculating Coefficient of Determination (R^2)

R^2 = (Total Variation (SST) - Unexplained Variation (SSE))/ Total Variation (SST)

Simple Linear Regression (R^2) Calculation

R^2 = r^2 = [(Covariance of X and Y) / (Std Dev of X * Std Dev of Y)]^2, i.e., the square of the sample correlation coefficient.

F-Statistic Calculation

MSR/MSE
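A hedged sketch tying together the ANOVA quantities from the last several cards (SST, SSE, RSS, MSR, MSE, R^2, and the F-statistic); the helper name and its inputs are hypothetical, and numpy is assumed:

```python
import numpy as np

def anova_summary(y_actual, y_predicted, k):
    # k = number of independent variables in the regression
    y_actual, y_predicted = np.asarray(y_actual), np.asarray(y_predicted)
    n = len(y_actual)
    sst = np.sum((y_actual - y_actual.mean()) ** 2)   # total variation
    sse = np.sum((y_actual - y_predicted) ** 2)       # unexplained variation
    rss = sst - sse                                   # explained variation
    msr = rss / k                                     # mean regression sum of squares
    mse = sse / (n - k - 1)                           # mean squared error
    return {"R^2": rss / sst, "F": msr / mse, "MSR": msr, "MSE": mse}
```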

Limitations of Linear Regression Testing

1. Parameter instability (Relationships change over time)


2. Public knowledge of relationships eliminate usefulness to traders


3. Assumptions can be violated.

Coefficient of Variation Next to Dependent Variable

The percent increase/decrease in the dependent variable for a one-unit change in the independent variable.

Alpha in Regression Lines

The intercept is a measure of alpha

F-Statistic

Always one-tailed. Assesses how well the set of independent variables, as a group, explains the variation in the dependent variable.



MSR/MSE

Standard Error of Forecast

Used to construct confidence intervals around a forecasted value of the dependent variable.

t-Statistic For Multiple Linear Regression Formula

(Estimated Slope Coefficient - Hypothesized Slope Coefficient)/ Standard Error



If you are testing whether the slope differs from 0, the hypothesized slope is 0 for this calculation. Reject the null if the absolute value of the t-stat is greater than the absolute value of the critical value.
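A minimal sketch of this test (the estimate, standard error, and sample size are hypothetical; scipy is assumed for the critical value):

```python
from scipy import stats

b1_hat, b1_null, se_b1 = 0.72, 0.0, 0.31            # hypothetical values
n, k, alpha = 60, 3, 0.05

t_stat = (b1_hat - b1_null) / se_b1
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k - 1)   # two-tailed critical value
reject_null = abs(t_stat) > abs(t_crit)
```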

p-Values for Multiple Linear Regression

The smallest level of significance at which the null hypothesis can be rejected. Reject the null if the p-value is less than the significance level (alpha). Reported p-values reflect whether the test was one- or two-tailed.




Reported p-values default to a two-tailed test. To test significance for a one-tailed test, divide the p-value by 2.



High p-values in Multiple Linear Regression

If the p-value is above the significance level, then the variable is not statistically significant.

Critical t-Stat Depends On:

1. Significance (alpha) = 1 - confidence level


2. Degrees of freedom: n - k - 1


3. Confidence intervals are always two-tailed

ANOVA Tables: Sum of Squares

RSS (explained variance) + SSE (unexplained variance) = SST (total variance)

F-Statistics in Multiple Linear Regressions

Tests the ratio of the average explained variation over the average unexplained variation. Tests whether any of the independent variables explain variation in the dependent variable. One tailed test.


Reject null if F-statistic exceeds critical value.

Determining Degrees of Freedom for F-Statistics

Numerator: explained variation per independent variable (RSS/k or MSR)




Denominator: SSE/(n - k - 1) (MSE), the average unexplained variation.

F-Statistic Testing

MSR/MSE = F-statistic




High F statistic means that there is a lot of average explained variation for every unit of unexplained variation.

Adjusted R^2

Unadjusted R^2 always rises as more variables are added; adjusted R^2 corrects for this. A model with a higher adjusted R^2 is preferred over one with a higher unadjusted R^2.




Adjusted R^2 = 1 - [(n-1)/(n - k - 1) * (1 - r^2)]
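A quick sketch of the formula, with made-up numbers showing how adding a variable can raise unadjusted R^2 while lowering adjusted R^2:

```python
def adjusted_r_squared(r2, n, k):
    # Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

adjusted_r_squared(0.630, 50, 3)   # ~0.606
adjusted_r_squared(0.635, 50, 4)   # ~0.603 -- higher R^2, lower adjusted R^2
```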

R^2

Explained Variance (RSS)/ Total Variance (SST)




Always goes up as new variables are added.

Dummy Variables

Can only take the values of 0 or 1. An example is calendar studies.

Dummy Variable Trap

Use one fewer dummy variable than the number of categories to avoid multicollinearity, e.g. 3 dummies for the 4 quarters in a year, or 11 dummies for the 12 months in a year.




One category must be omitted to serve as the base.


The intercept captures the omitted (base) category.
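A one-line sketch of avoiding the trap with pandas (assumed available); the quarterly data are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"]})      # hypothetical data
dummies = pd.get_dummies(df["quarter"], drop_first=True)  # columns Q2, Q3, Q4; Q1 is the base
```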

Heteroskedasticity Definition/Assumption Violation

Violates assumption that error term has constant variance




Heteroskedasticity means that the error term variance is non-constant.

Serial Correlation Definition/Assumption Violation

Violates assumption error terms are not correlated with each other




Each term tends to move with the previous term. Positive Autocorrelation means they move in the same direction (more common than negative autocorrelation)

Multicollinearity Definition/Assumption Violation

Violates the assumption that there is no exact linear relationship among the "X" variables.




Two or more X variables are highly correlated with each other.

Effects of Heteroskedasticity

The estimated slope is unaffected, but the standard errors of the coefficients tend to be too small, inflating the t-stats and increasing the chance of incorrectly rejecting the null (Type I error).

Effects of Serial Correlation

Slope is unaffected, standard errors are too low, t-statistic is inflated, increasing the chances of type I errors.

Effects of Multicollinearity

Inflates standard errors, reducing t-stats, and increases your chance of type II errors. Variables appear falsely unimportant.

How to Detect Heteroskedasticity

Either look at a scatter plot or conduct a Breusch-Pagan test: regress the squared residuals against X and compute the R^2 of that regression (you want it to be low, since the residuals should be unrelated to X). Test the null that this R^2 is 0 by comparing n * R^2 against a critical chi-squared value.
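A hedged sketch of the Breusch-Pagan procedure described above (numpy and scipy assumed; the helper name is illustrative):

```python
import numpy as np
from scipy import stats

def breusch_pagan(residuals, x, alpha=0.05):
    e2 = np.asarray(residuals) ** 2                         # squared residuals
    X = np.column_stack([np.ones(len(e2)), np.asarray(x)])  # X with an intercept
    beta, *_ = np.linalg.lstsq(X, e2, rcond=None)           # second regression
    fitted = X @ beta
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    bp_stat = len(e2) * r2                                  # n * R^2 ~ chi-squared, k df
    k = X.shape[1] - 1
    return bp_stat > stats.chi2.ppf(1 - alpha, df=k)        # True -> heteroskedasticity suspected
```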

How to Detect Serial Correlation

Use a scatter plot




Use the Durbin-Watson statistic (roughly 2(1 - r))
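A minimal sketch of the Durbin-Watson statistic (numpy assumed; the residuals would come from a fitted regression):

```python
import numpy as np

def durbin_watson(residuals):
    # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2 * (1 - r)
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```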

How to Detect Multicollinearity

F-stat says the model is valid and R^2 is high, yet all t-stats are insignificant.




If correlation between two X values is high, then there is likely multicollinearity. Only applies to regressions with 2 independent variables.

How to Correct Heteroskedasticity

1. Use robust standard errors (White-corrected standard errors). This inflates standard errors, lowers t-stats, and makes conclusions more accurate.




2. Use generalized least squares (change the model)

How to Correct Serial Correlation

Use the Hansen method, which inflates standard errors and allows t-statistics to be recalculated; the chance of a Type I error declines. The Hansen method can correct for conditional heteroskedasticity as well.

How to Correct Multicollinearity

Omit one or more X variables.

Conditional vs. Unconditional Heteroskedasticity

Unconditional: The variance of the error is not constant, but it is not related to the independent variable. Does not cause major problems.




Conditional: Variance is not constant and is related to the value of the independent variable. ex. the higher the value of X, the higher the variance of errors. Does cause issues.

End Points of Durbin-Watson Model

0: perfect positive serial correlation


2: no serial correlation


4: perfect negative serial correlation




If the computed Durbin-Watson statistic is less than the lower critical value (dL), there is positive serial correlation.

Functional Form Misspecifications

1. Important variables are omitted


2. Variables are not transformed properly


3. Data may be pooled improperly

Time-Series Misspecifications

1. X is lagged Y with serial correlation present

2. Forecasting the past


3. Measurement error.

Probit Model

Estimate the probability of default given values of X based on the normal distribution

Logit Models

Estimate probability of default based on logistic distribution

Discriminant Model

Produce a score or ranking used to classify into categories (credit score)

Standard Error of Estimate Calculation

sqrt(SSE/(n - k - 1))

Log-Linear Trend Model Formula

ln y(t) = b0 + b1(t) + error




Appropriate for exponential data.
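A small sketch of fitting this model (numpy assumed; the series is made up):

```python
import numpy as np

y = np.array([100.0, 112.0, 126.0, 141.0, 158.0, 178.0])  # hypothetical exponential-growth data
t = np.arange(1, len(y) + 1)
b1, b0 = np.polyfit(t, np.log(y), 1)   # fit ln(y_t) = b0 + b1*t; b1 ~ growth rate per period
```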

Covariance Stationary

Requirement for using Autoregressive Models.




Requires:


1. Constant and finite expected value


2. Constant and finite variance


3. Constant and finite covariance with leading or lagged values of itself

Autoregressive Models

The dependent variable is regressed on prior values of itself. Requires Covariance Stationarity.

Testing for Serial Correlation in AR Models

Test the significance of the residual autocorrelations with a t-statistic (each autocorrelation divided by its standard error, 1/sqrt(n)). If the t-stat exceeds the critical value (roughly 2), the model is not correctly specified; add a lag.

Autocorrelation Tables

The autocorrelation column shows the correlation between the variable (or residual) at time t and its value at the stated lag (e.g., t - 1 for lag one).




Standard Error (SE): 1/sqrt(n)

Testing if AR time-series Model is Correctly Specified

1. Estimate the AR model being evaluated using linear regression.


2. Calculate the autocorrelations of the model's residuals


3. Test whether autocorrelations are significant.

Mean Reverting Level for AR(1)

b0/ (1 - b1)



A time series is mean reverting if it tends to move back toward its mean. If b1 = 1, the denominator is zero and the model reduces to:


x(t) = b0 + 1*x(t-1) + error


which is a random walk: it is not covariance stationary and has no finite mean-reverting level.




Forecast values will always revert to MRL
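A tiny sketch of the mean-reverting level calculation (the coefficients are hypothetical):

```python
def mean_reverting_level(b0, b1):
    # MRL = b0 / (1 - b1); undefined when b1 = 1 (random walk)
    if b1 == 1:
        raise ValueError("b1 = 1: random walk, no mean-reverting level")
    return b0 / (1 - b1)

mean_reverting_level(2.0, 0.6)   # hypothetical AR(1) coefficients -> 5.0
```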

Root Mean Squared Error (RMSE)

Variable used to compare accuracy of AR models in forecasting out-of-sample values. Use lowest RMSE model when predicting out of sample values.




Square root of MSE = RMSE = SEE = Standard deviation of the error term.
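A minimal RMSE sketch for comparing out-of-sample forecasts (numpy assumed):

```python
import numpy as np

def rmse(actual, forecast):
    # Root mean squared error; the model with the lowest RMSE forecasts best out of sample
    a, f = np.asarray(actual), np.asarray(forecast)
    return np.sqrt(np.mean((a - f) ** 2))
```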

Instability of Time-Series Models

Estimated coefficients can change over time. Larger sample sizes are more dependable, but also contain data that may no longer be relevant.

Random Walk

A time series in which each value equals the previous period's value plus an error term. Has no mean-reverting level and is not covariance stationary.

Unit Roots

Occur when the coefficient on a lagged dependent variable is equal to 1. Such a series is not covariance stationary; the data must be first-differenced before use in a time-series model.




Unit roots can be tested for using Dickey-Fuller Test

Seasonality

Detected by calculating the autocorrelations of the error terms: a significant autocorrelation at the seasonal lag indicates seasonality. Add a seasonal lag term to the model to correct for it.

Dickey-Fuller Test

Tests for a unit root by regressing the first difference on the lagged level; the null hypothesis is that the transformed coefficient g = b1 - 1 equals 0 (a unit root). If the null is rejected, the series does not have a unit root.

Cointegration

Two variables are linked to the same trend and that relationship is not expected to change. A model can still be valid under this circumstance, e.g. a stock and the overall market are correlated.

One Period Ahead vs. Two Period Ahead Forecasts

One: x(t+1) = b0 + b1x(t)




Two: x(t+2) = b0 + b1x(t+1)
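A short sketch of chained AR(1) forecasts, where the two-period-ahead forecast reuses the one-period-ahead forecast (the values are hypothetical):

```python
def ar1_forecasts(x_t, b0, b1, periods=2):
    # Each step feeds the previous forecast back into x(t+1) = b0 + b1 * x(t)
    forecasts, current = [], x_t
    for _ in range(periods):
        current = b0 + b1 * current
        forecasts.append(current)
    return forecasts

ar1_forecasts(x_t=10.0, b0=2.0, b1=0.6)   # -> [8.0, 6.8]
```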

Standard Error of Autocorrelations Calculation

1/ sqrt(number of observations)

Steps for Running a Simulation

1. Determine Problematic Variables (uncertain inputs that influence the value of an investment)


2. Define probability distributions for these variables


3. Check for correlation among these variables


4. Run the simulation
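A hedged Monte Carlo sketch of these steps (numpy assumed; the two input variables, their distributions, and the valuation rule are all made up, and step 3 is skipped by treating the inputs as independent):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_trials = 10_000

# Step 2: assumed distributions for two "problematic" inputs (treated as independent here)
revenue_growth = rng.normal(loc=0.05, scale=0.02, size=n_trials)
margin = rng.uniform(low=0.10, high=0.20, size=n_trials)

# Step 4: run the simulation -- value each trial, then examine the whole distribution
base_revenue = 1_000.0
values = base_revenue * (1 + revenue_growth) * margin
expected_value, value_std = values.mean(), values.std()
```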

How To Treat Correlation Across Variables in A Simulation

1. Allow only the variable with the greatest influence to vary


2. Build the rules of correlation into the model

Three Ways to Define the Probability Distributions for a Simulation's Variables

1. Use historical data

2. Use cross sectional data (data from a similar firm)


3. Pick a distribution and use your discretion to estimate the parameters (ad-hoc)

Advantages of Using Simulations in Decision Making

1. Input estimation can be more careful, since a full distribution must be specified for each uncertain input


2. Expected value is a distribution, not just a point estimate.

Constraints on Simulations

1. Book Value Constraints (regulatory reasons)

2. Earnings and cash flow constraints


3. Market value constraints

Issues in Simulations

1. Garbage in, garbage out


2. Real data may not follow normal distributions


3. Non-stationarity (what worked yesterday, may not work tomorrow)


4. Dynamic correlations



Scenario Analysis

Defines best-case, worst-case, and base-case outcomes. The probabilities of the scenarios considered sum to less than 1, since all other possibilities are ignored. Better suited for discrete outcomes.

Decision Trees

Best suited for sequential and discrete risks. Correlated risks are difficult to model in Decision trees. Best with historical data since you need to estimate probabilities at each node.

Simulations

Considers all possible outcomes. Better suited for continuous risks. Allow for explicitly modelling correlations of input variables.

Simulations vs. Scenario Analysis vs. Decision Trees Chart.

Simulations: continuous risks, all outcomes considered, correlations can be modelled explicitly. Scenario analysis: discrete outcomes, only selected cases considered. Decision trees: sequential, discrete risks; correlated risks are difficult to model.