Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
69 Cards in this Set
- Front
- Back
Covariance measures how two variables move together: positive means they move together, negative means they move in opposite directions |
. |
|
Sample covariance |
Sum of (observation of X - mean of X) * (observation of Y - mean of Y), divided by sample size minus 1, i.e., n - 1 |
|
Covariance on its own is not very meaningful since it is sensitive to scale and has a large range with squared units. So we find the correlation coefficient, which measures the strength of the linear relationship (correlation)
Sample correlation coefficient |
Covariance of X and Y / (SD of X * SD of Y) *Uses the covariance of the stock and index and the SDs of the stock and the index |
|
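The two cards above can be sketched in Python using only the standard library; the paired return data below are hypothetical, made-up values for illustration:

```python
import math

# Hypothetical paired observations (e.g., stock and index returns)
x = [0.02, 0.05, -0.01, 0.03, 0.04]
y = [0.03, 0.06, -0.02, 0.02, 0.05]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of cross-deviations divided by n - 1
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations (also with n - 1 in the denominator)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient: covariance scaled into the range [-1, +1]
r = cov_xy / (sd_x * sd_y)
```

For this data, covariance is positive and the correlation is close to +1, matching the "move together" description on the covariance card.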
For corr coefficient +1 perfectly correlated -1 perfectly negatively correlated 0 no relationship On Scatter plot upward right is positively correlated. Down is negatively. 1 and -1 correlation does not necessarily mean a slope of 1 or -1. |
. |
|
Limitations to correlation analysis: 1. Outliers 2. Spurious correlation 3. Nonlinear relationships |
2. The appearance of a linear relationship when there is none, i.e., correlation by chance. 3. Variables could have a nonlinear relationship but zero correlation, so correlation analysis wouldn't detect it |
|
Test of significance to test the strength of correlation. *A second way to test whether the correlation is significantly different from zero. In a practice question you are just given the correlation, number of observations, and critical value. A t-test is used (assuming normally distributed variables) to test whether the null should be rejected. Test statistic: |
(Sample correlation * SqRt(n - 2)) / SqRt(1 - sample correlation^2) |
|
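A quick Python sketch of this test statistic; the correlation, sample size, and critical value below are hypothetical example numbers, not from the card:

```python
import math

# Hypothetical inputs: sample correlation r = 0.6 from n = 32 observations
r = 0.6
n = 32

# Test statistic for H0: population correlation = 0
t_stat = (r * math.sqrt(n - 2)) / math.sqrt(1 - r ** 2)

# Two-tailed critical value at n - 2 = 30 degrees of freedom
# (about 2.042 at the 5% significance level, from a t-table)
t_crit = 2.042
reject_null = t_stat > t_crit or t_stat < -t_crit
```

Here t_stat is about 4.11, well beyond 2.042, so the null of zero correlation is rejected.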
To make a decision with the t-test, the test statistic is compared with the critical t-value for the appropriate degrees of freedom and level of significance. For a two-tailed test the decision rule is: |
Reject H0 if t > t critical or if t < -t critical
If -t critical <= t <= t critical, then cannot reject the null |
|
Simple linear regression |
Explains the variation of a dependent variable about its mean in terms of the variation in an independent variable |
|
Dependent variable |
Variable whose variation is explained by inDependent variable |
|
InDependent variable |
Variable used to EXplain the variation of Dependent variable |
|
If you want to explain stock returns with GDP growth, which is the dependent and which the independent variable?
Gdp- independent
Stock- dependent |
|
Linear regression requires assumptions (LOS 7.e) |
. |
|
The regression line is one of many possible lines that can be drawn on a scatter plot |
. |
|
Sum of squared errors (SSE): how to calculate (also used in multiple regression) |
Sum of squared vertical distances between estimated and actual Y values. (Related: SEE = SqRt of the mean squared error, MSE) |
|
The regression line is the line that minimizes the SSE, and this method is referred to as ordinary least squares (OLS) |
. |
|
Slope coefficient |
Describes change in y for one unit change in x on regression line |
|
Slope term calculated: |
Cov(X,Y) / variance of X (i.e., SD of X squared) |
|
Intercept |
The line's intersection with the Y axis, at X = 0 |
|
Intercept term calculated: |
Mean of Y - slope coefficient * mean of X |
|
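The slope and intercept formulas on the last few cards can be checked with a small Python sketch; the data points are hypothetical:

```python
# Hypothetical paired data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope = Cov(X, Y) / Var(X); the (n - 1) denominators cancel
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
b1 = cov_xy / var_x

# Intercept = mean of Y - slope coefficient * mean of X
b0 = mean_y - b1 * mean_x
```

For this data the slope works out to 1.95 and the intercept to 0.15.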
Residual is difference between actual and predicted return |
. |
|
Intercept is estimate of dependent variable when independent variable is zero |
. |
|
Slope coefficient in a regression line is the stock's beta and it measures... |
.the relative amount of systematic risk |
|
Intercept is called ex-post alpha: a measure of excess risk-adjusted returns |
. |
|
Standard error of estimate SEE |
Measures the variability between the actual and estimated Y values, i.e., the dispersion of the actual dependent variable about the estimated regression line. It is the SD of the error terms in the regression and an indicator of the strength of the relationship between the dependent and independent variables: SEE is low if the relationship is strong and high if it is weak. The smaller the SEE, the better the fit of the regression line |
|
Coefficient of determination R2 |
Percentage of total variation in the dependent variable explained by the independent variable
An R2 of .63 means variation in the independent variable explains 63% of the variation in the dependent variable |
|
For simple linear regressions R2= |
Correlation coefficient ^2 |
|
A FAQ is whether a slope coefficient is different from zero. The null is H0: b1 = 0 and the alternative is Ha: b1 not equal to 0. If the confidence interval does not include 0, the null is rejected and the coefficient is said to be significantly different from 0. |
. |
|
Confidence interval: formula 7.3, but don't need to memorize |
. |
|
Predicted values |
Values of the dependent variable predicted by the regression equation, given an estimate of the independent variable.
|
|
For a simple regression predicted value of y(dependent); |
Intercept + slope coefficient * forecasted value of independent variable |
|
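As a one-line sketch in Python, with hypothetical estimates for the intercept and slope:

```python
# Hypothetical regression estimates
b0 = 0.15  # intercept
b1 = 1.95  # slope coefficient

# Forecasted value of the independent variable
x_forecast = 6.0

# Predicted value of the dependent variable
y_hat = b0 + b1 * x_forecast
```

With these numbers the predicted value is 11.85.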
Confidence intervals for the predicted values of a dependent variable are calculated similarly to confidence intervals for regression coefficients. |
. |
|
Conf interval for predicted value 7i |
. |
|
Calculate 95% prediction interval (critical t-value from the t-table) |
Simple regression predicted value +- 2.03 * standard error of the forecast |
|
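A sketch of the interval calculation in Python; the predicted value and standard error of the forecast are hypothetical, while 2.03 is the critical t-value quoted on the card:

```python
# Hypothetical forecast and standard error of the forecast
y_hat = 11.85
se_forecast = 0.40

# 95% prediction interval using the critical t-value from the card
t_crit = 2.03
lower = y_hat - t_crit * se_forecast
upper = y_hat + t_crit * se_forecast
```

The interval is y_hat plus or minus 2.03 times the standard error, here roughly 11.04 to 12.66.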
Analysis of variance ANOVA |
Statistical procedure to analyze the total variability of the dependent variable. Attributes variation to 1 of 2 sources: the regression model or the residuals (error term) |
|
Total sum of squares (SST) |
Total variation of the dependent variable: the sum of squared differences between actual Y and the mean of Y. I.e., not the same as variance (it is not divided by n - 1) |
|
Regression sum of squares (RSS) |
Variation in dependent variable that is explained by independent
RSS is sum of squared differences between predicted Y values and mean of Y |
|
Sum of squared errors SSE |
Unexplained variation in dependent variable
Sum of squared residuals. Sum of squared vertical distances between actual Y values and predicted Y values on the regression line |
|
Sst= |
Rss + SSE |
|
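The SST = RSS + SSE identity can be verified with a small Python sketch; the data are hypothetical, and b0 and b1 are the OLS estimates for those points:

```python
import math

# Hypothetical data and their OLS estimates
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
b0, b1 = 0.15, 1.95

n = len(y)
mean_y = sum(y) / n
y_hat = [b0 + b1 * xi for xi in x]

# Total variation: actual Y about its mean
sst = sum((yi - mean_y) ** 2 for yi in y)
# Explained variation: predicted Y about the mean of Y
rss = sum((yh - mean_y) ** 2 for yh in y_hat)
# Unexplained variation: squared residuals (actual Y about predicted Y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Since SST = RSS + SSE, R2 and SEE follow directly
r2 = rss / sst
see = math.sqrt(sse / (n - 2))
```

For an OLS fit the two components always sum to the total variation, which is why R2 can be written either as 1 - SSE/SST or as RSS/SST.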
Mean regression sum of squares and mean squared error |
Appropriate sum of squares divided by its degrees of freedom |
|
R2 from the ANOVA table |
(Total variation (SST) - unexplained variation (SSE)) / total variation (SST) = explained variation (RSS) / total variation (SST) |
|
SEE is the sd of the regression error and is equal to |
SqRt of MSE (mean squared error), or SqRt of SSE/(n - 2) |
|
An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. In multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.
F stat = (always a 1-tailed test***) |
MSR (mean regression sum of squares) / MSE (mean squared error) |
|
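A sketch of the F-statistic in Python, using hypothetical ANOVA values for a simple regression (k = 1 independent variable):

```python
# Hypothetical ANOVA values
rss = 38.025  # regression sum of squares
sse = 0.075   # sum of squared errors
n, k = 25, 1  # observations and number of independent variables

msr = rss / k            # mean regression sum of squares
mse = sse / (n - k - 1)  # mean squared error

# One-tailed test: reject H0 if f_stat exceeds the critical F-value
f_stat = msr / mse
```

With k = 1, MSR equals RSS, and the degrees of freedom are 1 and n - 2, matching the simple-regression case on the cards below.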
For simple linear regression, there is only one independent variable, so the F-test tests the same hypothesis as the t-test for statistical significance of the slope coefficient: |
. |
|
To determine whether b1 is statistically significant using the F-test, the calculated F-statistic is compared with the critical F-value, Fc, at the appropriate level of significance. The degrees of freedom for the numerator and denominator with one independent variable are: df numerator = k = 1; df denominator = n - k - 1 = n - 2, where n = number of observations. The decision rule for the F-test is: reject H0 if F > Fc |
. |
|
Rejection of the null hypothesis at a stated level of significance indicates that the independent variable is significantly different than zero, which is interpreted to mean that it makes a significant contribution to the explanation of the dependent variable. In simple linear regression, it tells us the same thing as the t-test of the slope coefficient |
. |
|
The bottom line is that the F-test is not as useful when we only have one independent variable because it tells us the same thing as the t-test of the slope coefficient. Make sure you know that fact for the exam, and then concentrate on the application of the F-test in multiple regression. |
. |
|
Limitations of regression analysis include the following: 1. Linear relationships can change over time. This means that the estimation equation based on data from a specific time period may not be relevant for forecasts or predictions in another time period. This is referred to as parameter instability. 2. Even if the regression model accurately reflects the historical relationship between the two variables, its usefulness in investment analysis will be limited if other market participants are also aware of and act on this evidence. 3. If the assumptions underlying regression analysis do not hold, the interpretation and tests of hypotheses may not be valid |
. |
|
(From 2 sections later) Data collection as part of business intelligence often produces very large sets of data, both observations and attributes. Multiple regression |
. |
|
Big Data simply refers to these very large data sets which may include both structured (e.g., spreadsheet) data and unstructured (e.g., emails, text, or pictures) data. |
. |
|
Data analytics uses computer-based algorithms to analyze Big Data and obtain meaningful information about patterns and relationships in the data. |
. |
|
Machine learning (ML) refers to computer programs that learn from their errors and refine predictive models to improve their predictive accuracy over time. ML is one method used to extract useful information from Big Data. |
. |
|
ML terms: Target variable or tag variable
|
.is the dependent variable (i.e., the y-variable). Target variables can be continuous, categorical, or ordinal. |
|
Features |
are the independent variables (i.e., the x-variables). |
|
Feature engineering |
is curating a dataset of features for ML processing. |
|
Supervised learning uses labeled training data to guide the ML program in achieving superior forecasting accuracy. To forecast earnings manipulators, for example, a large collection of attributes could be provided for known manipulators and for known non-manipulators. A computer program could then be used to identify patterns that identify manipulators in another data set. |
. |
|
Typical data analytics tasks for supervised learning include classification and prediction |
. |
|
Unsupervised learning has no tag (target) variable*: the ML program is not given labeled training data. Instead, inputs are provided without any conclusions about those inputs. In the absence of any tagged data, the program seeks out structure or interrelationships in the data. |
. |
|
Clustering is one example of the output of an unsupervised ML program. |
. |
|
Supervised learning algorithms are used for prediction (i.e., regression) and classification. When the y-variable is continuous, the appropriate approach is that of regression (used in a broad, ML context). When the y-variable is categorical (i.e., belonging to a category or classification) or ordinal (i.e., ordered or ranked), a classification model is used. |
. |
|
Linear (previously discussed in our coverage of multiple regression) and nonlinear regression models can be used to generate forecasts. A special case of the generalized linear model (GLM) is penalized regression.
Penalized regression models =
|
.seek to minimize forecasting errors by reducing the problem of overfitting. |
|
Penalized regression cont In summary, penalized regression models seek to reduce the number of features included in the model while retaining as much predictive information in the data as possible. Overfitting results when a large number of features (i.e., independent variables) are included in the data sample. The resulting model can use the “noise” in the dependent variables to improve the model fit. Overfitting the model in this way will decrease the accuracy of model forecasts on other (out-of-sample) data. To reduce the problem of overfitting, researchers may impose a penalty based on the number of features used by the model. Penalized regression models seek to minimize the sum of square errors (same as in multiple regression models) as well as a penalty value |
. |
|
Classification trees are appropriate when the target variable is categorical while regression trees are appropriate when the target is continuous. More typically, classification trees are used when the target is binary (e.g., IPO will be successful vs. not successful). Logit and probit models are used when the target is binary but are ill-suited when there are significant nonlinear relationships among variables. In such cases, classification trees may be used |
. |
|
A variant of a classification tree is a random forest. A random forest is a collection of randomly generated classification trees from the same data set. The process of using multiple classification trees uses crowdsourcing (majority wins) in determining the final classification. Because each tree only uses a subset of features, random forests can mitigate the problem of overfitting. Using random forests can increase the signal-to-noise ratio because errors across different trees tend to cancel each other out. |
. |
|
Neural networks are constructed with nodes connected by links. The input layer is the nodes with values for the features (independent variables). Each hidden node uses an activation function, typically a nonlinear function, to generate a value from the weighted average of the input values from the nodes linked to it. The hyperparameters of a neural network are its structure, e.g., specifying a single hidden layer with 4 nodes and an output layer with 1. |
. |
|
Additional layers can improve the predictive accuracy of neural networks. Deep learning nets (DLNs) are neural networks with many hidden layers (often more than 20) |
. |
|
Recall that in case of unsupervised learning, there is no target variable; the task is to find a pattern in the features (independent or x-variables). Clustering Given a data set, clustering is the process of grouping observations into categories based on similarities in their attributes. For example, stocks can be assigned to different categories based on their past performances, rather than standard sector classifiers (e.g., finance, healthcare, technology, etc.). Clustering can be bottom-up or top-down. In the case of bottom-up clustering, we start with one observation as its own cluster and add other similar observations to that cluster, or form another non-overlapping cluster. Top-down clustering starts with one giant cluster and then partitions that cluster into smaller and smaller clusters. |
. |
|
Unsupervised learning cont'd
Dimension reduction seeks to remove the noise when the number of features in a dataset is excessive
One method is principal component analysis (PCA), which |
.summarizes the information in a large number of CORrelated factors into a much smaller set of UNCorrelated factors. The first factor in PCA is the most important factor in explaining the variation across observations, the second factor is the second most important, and so on |
|
0 correlation can still mean there is a relationship, just not a LINEAR one. +1 is a strong direct relationship; -1 is a strong OFFsetting relationship |
. |