
58 Cards in this Set

  • Front
  • Back

Machine Learning

Gives a computer the ability to improve its performance on a task over time

Distributed ledger

A shared database with a consensus mechanism, ensuring identical copies

Simple Linear Regression

Correlation, t-test, estimated slope coefficient, estimated intercept, confidence interval for predicted y-value

Correlation

r_xy = Cov(x,y) / (s_x * s_y)

t-test for r (n − 2 df)

t = r * sqrt(n − 2) / sqrt(1 − r^2)
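As a worked sketch of the two formulas above, using made-up sample data, the correlation coefficient and its t-statistic can be computed directly from the definitions:

```python
import math

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (n - 1 denominator)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))

r = cov_xy / (sx * sy)                             # correlation coefficient
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t-stat with n - 2 df
```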

Estimated Slope Coefficient

b1 = Cov(x,y) / σ_x^2

Estimated intercept

b0 = Ȳ − b1X̄ (sample means of Y and X)

Confidence interval for predicted y-value

Ŷ ± (t_c × standard error of forecast)
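The slope and intercept formulas above can be applied by hand; the data below is hypothetical, and the forecast point (x = 6) is chosen for illustration:

```python
# Estimate b1 = Cov(x, y) / Var(x) and b0 = ybar - b1 * xbar
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - xbar) ** 2 for a in x) / (n - 1)

b1 = cov_xy / var_x     # estimated slope coefficient
b0 = ybar - b1 * xbar   # estimated intercept
y_hat = b0 + b1 * 6.0   # point forecast at x = 6
# A confidence interval around y_hat is y_hat +/- t_critical * SE(forecast)
```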

Multiple Regression

Yi = b0 + b1X1i + b2X2i + εi

Heteroskedasticity

Non-constant error variance. Detect with Breusch-Pagan test. Correct with White-corrected standard errors.
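A minimal Breusch–Pagan sketch, assuming simulated data in which the error variance grows with x (sample size, seed, and coefficients are all illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Hypothetical data with heteroskedastic errors: spread increases with x
n = 200
x = rng.uniform(1.0, 10.0, n)
eps = rng.normal(0.0, 1.0, n) * x
y = 2.0 + 3.0 * x + eps

# Step 1: fit the original regression and take squared residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid2 = (y - X @ beta) ** 2

# Step 2: regress squared residuals on x; BP statistic = n * R^2,
# chi-square distributed with df = number of independent variables
g, *_ = np.linalg.lstsq(X, resid2, rcond=None)
fitted = X @ g
r2 = 1.0 - np.sum((resid2 - fitted) ** 2) / np.sum((resid2 - resid2.mean()) ** 2)
bp_stat = n * r2
p_value = chi2.sf(bp_stat, df=1)
```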

Autocorrelation

Correlation among error terms.


Detect with the Durbin–Watson test; positive autocorrelation if DW < dL.


Correct by adjusting standard errors using the Hansen method.

Multicollinearity

High correlation among Xs.


Detect if F-test significant, t-tests insignificant.


Correct by dropping X variables.


To conclude multicollinearity, the independent variables must be both correlated with each other and individually insignificant

Model Misspecification

Omitting a variable


Variable should be transformed


Incorrectly pooling data


Using lagged dependent variable as independent variable


Forecasting the past


Measuring independent variables with error

Effects of Misspecification

Regression coefficients are biased and inconsistent, leading to a lack of confidence in hypothesis tests of the coefficients or in the model's predictions.

Supervised machine learning

Inputs, outputs are identified. Relationships modeled from labeled data.

Unsupervised machine learning

Algorithm itself seeks to describe the structure of unlabeled data

Linear Trend Model

yt = b0 + b1t + εt

Log - linear trend model

ln(yt) = b0 + b1t + εt

Covariance Stationary

Mean and variance don’t change over time. To determine whether a time series is covariance stationary: 1) plot the data, 2) run an AR model and test the autocorrelations, and/or 3) perform a Dickey–Fuller test

Unit Root

Coefficient on the lagged dependent variable = 1. A series with a unit root is not covariance stationary. First differencing will often eliminate the unit root.

Autoregressive Model

Specified correctly if the autocorrelations of the residuals are not significant

Mean reverting level for AR

bo/(1-b1)

RMSE

Square root of average squared error
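Both formulas above are simple arithmetic; a quick sketch with hypothetical AR(1) coefficients and forecast errors:

```python
import math

# Hypothetical AR(1) fit: x_t = b0 + b1 * x_{t-1} + eps_t
b0, b1 = 1.2, 0.7
mean_reverting_level = b0 / (1.0 - b1)   # = 4.0

# RMSE: square root of the average squared forecast error
actual = [4.1, 3.8, 4.3, 4.0]
forecast = [4.0, 4.0, 4.0, 4.0]
rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
```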

Random Walk Time Series

xt = xt-1 + εt
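A short simulation (arbitrary seed and length) showing that first-differencing a random walk recovers the stationary shock series:

```python
import numpy as np

rng = np.random.default_rng(42)

# Random walk: x_t = x_{t-1} + eps_t  (unit root, not covariance stationary)
eps = rng.normal(0.0, 1.0, 1000)
x = np.cumsum(eps)

# First differencing: x_t - x_{t-1} = eps_t, a stationary white-noise series
dx = np.diff(x)
```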

Seasonality

Indicated by a statistically significant autocorrelation at the seasonal lag. Correct by adding a seasonal lag term.

ARCH

Detected by estimating


ε^2_t = a0 + a1 * ε^2_(t−1) + μt
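A sketch of that detection regression, simulating an ARCH(1) series with hypothetical parameters and recovering the slope a1 by OLS on the squared errors:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an ARCH(1) process with illustrative parameters a0, a1
a0, a1 = 0.2, 0.5
n = 2000
eps = np.zeros(n)
for t in range(1, n):
    sigma2 = a0 + a1 * eps[t - 1] ** 2   # conditional variance
    eps[t] = rng.normal(0.0, np.sqrt(sigma2))

# Detect ARCH: regress eps_t^2 on eps_{t-1}^2; a significant slope
# estimate indicates ARCH effects
e2 = eps ** 2
X = np.column_stack([np.ones(n - 1), e2[:-1]])
coef, *_ = np.linalg.lstsq(X, e2[1:], rcond=None)
a1_hat = coef[1]
```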

Decision Trees

Have a discrete distribution of risk, are sequential and do not accommodate correlated variables

Scenario Analysis

Has a discrete distribution of risk


is not sequential


does accommodate correlated variables

Simulations

Continuous distribution of risk


Does not matter if sequential


Does accommodate correlated variables

Intercept Term

b0 is the line's intersection with the y-axis at X = 0

SEE^2

Variance of the residuals = the square of the standard error of the estimate

RSS

Regression sum of squares


Measures the variation in the dependent variable that is explained by the independent variable(s). Sum of squared differences between the predicted y-values and the mean of y.


Sum (Ŷi − Ȳ)^2

SST

Total sum of squares


Measures the total variation in the dependent variable. SST equals the sum of the squared differences between the actual y-values and the mean of y.


Sum (Yi − Ȳ)^2


SST=RSS+SSE
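The identity SST = RSS + SSE holds for any OLS fit with an intercept; it can be verified numerically on made-up data:

```python
import numpy as np

# Hypothetical data for a simple OLS fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)       # total variation
rss = np.sum((y_hat - y.mean()) ** 2)   # explained variation
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
```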

SSE

Sum of Squared Errors


Measures the unexplained variation in the dependent variable: the sum of the squared residuals.


Sum (Yi − Ŷi)^2

F - statistic

Assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. Always a one-tailed test.

P-Values

Smallest level of significance for which the null hypothesis can be rejected. Can compare the p-value to the significance level:


If the p-value is less than significance level, the null hypothesis can be rejected.


If the p-value is greater than the significance level, the null hypothesis cannot be rejected

F Stat in Multiple Regression

The F-stat is used to test the null hypothesis that all slope coefficients are jointly equal to 0. The higher, the better.

R squared

Explained variation / total variation. A high R-squared is good: it means the regression explains much of the variation

Multiple R squared

When you have multiple independent variables. Can also be called multiple coefficient of determination, or coefficient of determination

Error Term in Linear Regression

Difference of the actual outcome and the predicted value

Confidence Interval for Y

Ŷ ± (t_c × standard error of forecast)

F Stat calc

MSR/MSE
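A small worked example, with hypothetical ANOVA quantities (n, k, RSS, and SSE chosen purely for illustration):

```python
from scipy.stats import f

# Hypothetical regression with k = 2 slope coefficients, n = 30 observations
n, k = 30, 2
rss, sse = 120.0, 54.0

msr = rss / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f_stat = msr / mse        # = 30.0 here
p_value = f.sf(f_stat, k, n - k - 1)   # one-tailed p-value
```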

Model's Accuracy in a confusion matrix

              actual +   actual -
predicted +      TP         FP
predicted -      FN         TN


(TP + TN) / (TP + TN + FN + FP)


Measures the overall percentage of correct predictions.

Model's Recall

              actual +   actual -
predicted +      TP         FP
predicted -      FN         TN


Read down the first column; aim for high recall when the cost of a Type II error (false negative) is large.


TP / (TP + FN)

Model's Precision

              actual +   actual -
predicted +      TP         FP
predicted -      FN         TN


Read across the first row; aim for high precision when the cost of a Type I error (false positive) is large.


TP / (TP + FP)

Model's F1 score

(2 × P × R) / (P + R)
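The four metrics above can all be computed from one set of hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts:
#                 actual +   actual -
# predicted +        tp         fp
# predicted -        fn         tn
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # guards against Type I errors (false positives)
recall = tp / (tp + fn)      # guards against Type II errors (false negatives)
f1 = 2 * precision * recall / (precision + recall)
```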

Individual t-tests with high p-values but an F-test with a small p-value, leading to conflicting conclusions.

Multicollinearity - when the independent variables individually are not statistically significant but the F-test suggests that the variables as a whole do a good job of explaining variation in the dependent variable

Determine sales in following period using an AR(1) model

Find the mean-reverting level, which is intercept / (1 − slope). If, for example, you are forecasting sales and the period's change is above the mean-reverting level, sales growth will decline toward it in following periods

Determine whether to add a season lag at 5% level of significance

If p < .05, the seasonal lag is statistically significant and seasonal autocorrelation is present, so add the lag.

How to Detect multicollinearity

High R^2, low t-values

How to correct seasonality in a covariance-stationary AR(1) model fit to first differences of the natural logarithm

(ln x_t − ln x_(t−1)) = b0 + b1(ln x_(t−1) − ln x_(t−2)) + b2(ln x_(t−4) − ln x_(t−5)) + εt

Mean reversion calc

b0 / (1 − b1)

Covariance Stationary

Constant and finite mean and variance, and constant covariance with leading and lagged values (i.e., the series' properties do not change over time).


Conduct a Dickey–Fuller test for a unit root (stationarity).

What will the R^2 of a regression be if it DOES NOT exhibit conditional heteroskedasticity?

Close to 0. The Breusch–Pagan test regresses the squared residuals on the independent variables; without conditional heteroskedasticity, the squared residuals are unrelated to the independent variables, so that regression's R^2 (and slope) is very close to 0.

Misspecified Functional Form

Data is pooled across a time period when it should be split into two separate models: a pre-period and a post-period model.

What makes a regression unbiased and consistent?

Unbiased if the expected value of the estimates equals the true population value; consistent if the estimates approach the true population value as the sample size increases. Omitted variables (that are correlated with the included variables) destroy both properties.

Dickey Fuller/Engle-Granger Test

Regress one data series on the other and examine the residuals for a unit root. If you reject the null hypothesis of a unit root, the error terms are covariance stationary and the two series are cointegrated.