58 Cards in this Set
Machine Learning |
Gives a computer the ability to improve its performance of a task over time |
|
Distributed ledger |
A shared database with a consensus mechanism, ensuring identical copies |
|
Simple Linear Regression |
Correlation, t-test, estimated slope coefficient, estimated intercept, confidence interval for predicted y-value |
|
Correlation |
r(x,y) = Cov(x,y) / (s(x) × s(y)) |
|
T test for r (n-2df) |
t = r × sqrt(n − 2) / sqrt(1 − r^2) |
|
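The two formulas above can be sketched in Python (an illustrative sketch, not part of the deck; the sample data are made up):

```python
import math

def correlation(x, y):
    """Sample correlation: r = Cov(x, y) / (s_x * s_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return cov / (sx * sy)

def t_stat_for_r(r, n):
    """Test statistic for r with n - 2 df: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample data for illustration only:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r = correlation(x, y)
t = t_stat_for_r(r, len(x))
```

Compare t against the critical t-value with n − 2 degrees of freedom to decide whether the correlation is significant.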
Estimated Slope Coefficient |
b1 = Cov(x,y) / s^2(x) (covariance of x and y divided by the sample variance of x) |
|
Estimated intercept |
b0 = Ȳ − b1·X̄ (sample mean of Y minus slope times sample mean of X) |
|
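The slope and intercept cards can be combined into one sketch (illustrative only; the test values assume a perfectly linear toy dataset):

```python
def ols_simple(x, y):
    """Simple OLS estimates: b1 = Cov(x, y) / s_x^2; b0 = ybar - b1 * xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    b1 = cov / var_x          # estimated slope coefficient
    b0 = my - b1 * mx         # estimated intercept
    return b0, b1
```

For data generated by y = 2x + 1 this recovers slope 2 and intercept 1.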
Confidence interval for predicted y-value |
Ŷ ± (tc × SE of forecast) |
|
Multiple Regression |
Yi = b0 + b1·X1i + b2·X2i + εi |
|
Heteroskedasticity |
Non-constant error variance. Detect with Breusch-Pagan test. Correct with White-corrected standard errors. |
|
Autocorrelation |
Correlation among error terms. Detect with the Durbin-Watson test; positive autocorrelation if DW < dL. Correct by adjusting standard errors using the Hansen method. |
|
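The Durbin-Watson statistic mentioned above is straightforward to compute from the residuals (a minimal sketch; the residual series here are made up to show the two extremes):

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    Values near 2 suggest no autocorrelation; DW < dL signals positive
    autocorrelation; values approaching 4 signal negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```

Identical residuals give DW = 0 (strong positive autocorrelation); alternating-sign residuals push DW toward 4.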
Multicollinearity |
High correlation among the X variables. Detect when the F-test is significant but the individual t-tests are insignificant. Correct by dropping one or more X variables. For multicollinearity to be the diagnosis, the independent variables must be correlated with each other and individually insignificant. |
|
Model Misspecification |
Omitting a variable; failing to transform a variable; incorrectly pooling data; using a lagged dependent variable as an independent variable; forecasting the past; measuring independent variables with error |
|
Effects of Misspecification |
Regression coefficients are biased and inconsistent, undermining confidence in hypothesis tests of the coefficients and in the model's predictions. |
|
Supervised machine learning |
Inputs, outputs are identified. Relationships modeled from labeled data. |
|
Unsupervised machine learning |
Algorithm itself seeks to describe the structure of unlabeled data |
|
Linear Trend Model |
yt = bo + b1t + εt |
|
Log - linear trend model |
Ln(yt) = bo + b1t +εt |
|
Covariance Stationary |
Mean and variance don't change over time. To determine whether a time series is covariance stationary: 1) plot the data, 2) run an AR model and test the correlations, and/or 3) perform a Dickey-Fuller test |
|
Unit Root |
Coefficient on the lagged dependent variable = 1. A series with a unit root is not covariance stationary. First differencing will often eliminate the unit root. |
|
Auto regressive Model |
Specified correctly if the autocorrelations of the residuals are not significant |
|
Mean reverting level for AR |
b0 / (1 − b1) |
|
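The mean-reverting level can be sketched in Python, along with a quick demonstration that iterating the AR(1) recursion (with no shocks) converges to it (the coefficients b0 = 1.0, b1 = 0.5 are made up for illustration):

```python
def mean_reverting_level(b0, b1):
    """Long-run mean of AR(1) x_t = b0 + b1 * x_{t-1}: b0 / (1 - b1).
    Only meaningful when |b1| < 1 (covariance stationary)."""
    return b0 / (1 - b1)

# Iterating the recursion without shocks converges to the level:
x = 10.0
for _ in range(100):
    x = 1.0 + 0.5 * x   # b0 = 1.0, b1 = 0.5  ->  level = 1.0 / (1 - 0.5) = 2.0
```

If the current value is above the level the model forecasts a decline toward it, and vice versa.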
RMSE |
Square root of average squared error |
|
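The RMSE definition above translates directly (a minimal sketch; the sample values are arbitrary):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared forecast error."""
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Lower RMSE on out-of-sample data indicates better forecast accuracy.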
Random Walk Time Series |
xt = xt-1 + εt |
|
Seasonality |
Indicated by a statistically significant residual autocorrelation at the seasonal lag. Correct by adding a seasonal lag term. |
|
ARCH |
Detected by estimating ε²t = a0 + a1·ε²t−1 + μt and testing whether a1 is significant |
|
Decision Trees |
Have a discrete distribution of risk, are sequential and do not accommodate correlated variables |
|
Scenario Analysis |
Has a discrete distribution of risk; is not sequential; does accommodate correlated variables |
|
Simulations |
Continuous distribution of risk; does not matter if sequential; does accommodate correlated variables |
|
Intercept Term |
b0 is the line's intersection with the y-axis, at X = 0 |
|
SEE^2 |
Variance of the residuals = the square of the standard error of the estimate |
|
RSS |
Regression sum of squares. Measures the variation in the dependent variable that is explained by the independent variable(s). Sum of squared differences between the predicted y-values and the mean of y: Σ(Ŷi − Ȳ)² |
|
SST |
Total sum of squares. Measures the total variation in the dependent variable. SST equals the sum of the squared differences between the actual y-values and the mean of y: Σ(Yi − Ȳ)². SST = RSS + SSE |
|
SSE |
Sum of squared errors. Measures the unexplained variation in the dependent variable; the sum of squared residuals: Σ(Yi − Ŷi)² |
|
F - statistic |
Assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. Always a one-tailed test. |
|
P-Values |
Smallest level of significance for which the null hypothesis can be rejected. Compare the p-value to the significance level: if the p-value is less than the significance level, the null hypothesis can be rejected; if the p-value is greater, the null hypothesis cannot be rejected. |
|
F Stat in Multiple Regression |
The F-stat is used to test the null hypothesis that all slope coefficients are jointly equal to 0. The higher, the better. |
|
R squared |
Explained variation / total variation. A high R-squared is good: it means the regression explains a lot of the variation in the dependent variable. |
|
Multiple R squared |
When you have multiple independent variables. Can also be called multiple coefficient of determination, or coefficient of determination |
|
Error Term in Linear Regression |
Difference of the actual outcome and the predicted value |
|
Confidence Interval for Y |
Y +- (t x standard error of forecast) |
|
F Stat calc |
MSR/MSE |
|
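The RSS/SSE/SST cards and the F-stat calculation fit together in one sketch (illustrative only; the identity SST = RSS + SSE assumes the fitted values come from OLS with an intercept, and the toy numbers below are made up):

```python
def anova(y, y_hat, k):
    """Decompose variation for a regression with k slope coefficients:
    RSS (explained), SSE (unexplained), SST (total); R^2 = RSS / SST;
    F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    n = len(y)
    ybar = sum(y) / n
    rss = sum((yh - ybar) ** 2 for yh in y_hat)                # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))      # unexplained
    sst = sum((yi - ybar) ** 2 for yi in y)                    # total
    r2 = rss / sst
    f = (rss / k) / (sse / (n - k - 1))                        # MSR / MSE
    return rss, sse, sst, r2, f
```

A large F relative to the critical value rejects the null that all slope coefficients are jointly zero.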
Models Accuracy in confusion matrix |
TP, FP, FN, TN. Accuracy = (TP + TN) / (TP + TN + FN + FP). Increase accuracy to avoid Type I errors. |
|
Models Recall |
TP, FP, FN, TN. Recall = TP / (TP + FN); read down the matrix column. High recall matters when the cost of a Type II error (a false negative) is large. |
|
Models Precision |
TP, FP, FN, TN. Precision = TP / (TP + FP); read across the matrix row. |
|
Models F score |
F score = (2 × P × R) / (P + R) |
|
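The four confusion-matrix cards reduce to one helper (a minimal sketch; the counts in the example are made up):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)                 # across the row
    recall = tp / (tp + fn)                    # down the column
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score
```

The F score is the harmonic mean of precision and recall, so it penalizes a model that is strong on one but weak on the other.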
A test with high p-values but F-test has small values leading to conflicting conclusions. |
Multicollinearity - when the independent variables individually are not statistically significant but the F-test suggests that the variables as a whole do a good job of explaining variation in the dependent variable |
|
Determine sales in following period using an AR(1) model |
Find the mean reverting level, which is intercept / (1 − slope). If, for example, you are forecasting sales and the current value is above the mean reverting level, sales will decline in following periods. |
|
Determine whether to add a season lag at 5% level of significance |
If p < 0.05, the coefficient on the seasonal lag is statistically significant and seasonal autocorrelation is present, so add the lag. |
|
How to Detect multicollinearity |
Significant F-test and high R^2 but low (insignificant) t-values on the individual coefficients |
|
How to correct seasonality in a covariance stationary AR(1) model that is first differenced of the natural logarithm |
(ln xt − ln xt−1) = b0 + b1(ln xt−1 − ln xt−2) + b2(ln xt−4 − ln xt−5) |
|
Mean reversion calc |
b0 / (1 − b1) |
|
Covariance Stationary |
Constant and finite mean and variance and covariance with leading and lagged variables aka the data is static. Conduct a dickey fuller to test for a unit root. |
|
What will the R^2 of a regression be if it DOES NOT exhibit conditional heteroskedasticity? |
Close to 0. Under conditional heteroskedasticity the non-constant error variance is related to the independent variables; in the test regression of squared residuals on the independent variables, a low R^2 indicates the slopes are very close to 0, i.e., no conditional heteroskedasticity. |
|
Misspecified Functional Form |
Data are pooled across a time period when they should be split into two parts and modeled separately, with pre- and post- models. |
|
What makes a regression unbiased and consistent? |
Unbiased if the expected value of the estimates equals the true population value; consistent if the estimates approach the true population value as the sample size increases. Omitted variables that are correlated with the included variables destroy both properties. |
|
Dickey Fuller/Engle-Granger Test |
Regress one data series on the other and examine the residuals for a unit root using a Dickey-Fuller test. If you reject the null (of a unit root), the error terms of the two data series are covariance stationary and the series are cointegrated. |