69 Cards in this Set

  • Front
  • Back
Covariance measures the degree to which two variables move together.


Positive: the variables move together. Negative: they move in opposite directions.

Sample covariance

Σ [(observation of X − mean of X) × (observation of Y − mean of Y)]


/ (n − 1), where n is the sample size
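A minimal sketch of this formula in plain Python; the return series below are made-up numbers for illustration.

```python
# Sample covariance: sum of cross-products of deviations from the means,
# divided by n - 1.
x = [0.05, 0.02, -0.01, 0.04, 0.03]   # hypothetical stock returns
y = [0.04, 0.01, -0.02, 0.05, 0.02]   # hypothetical index returns

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

cov_xy = sum((xi - mean_x) * (yi - mean_y)
             for xi, yi in zip(x, y)) / (n - 1)
print(cov_xy)
```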

Covariance alone is hard to interpret: it is very sensitive to the scale of the data, can take on an unbounded range of values, and is expressed in squared units. So we compute the correlation coefficient, which measures the strength of the linear relationship (correlation).



Sample correlation coefficient

Covariance of X and Y / (sd of X × sd of Y)



*E.g., for a stock and an index: use the covariance of the stock and index,

and the sd of each, in the correlation coefficient formula
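Continuing the sketch above (same made-up data), the correlation divides the covariance by the product of the sample standard deviations:

```python
import statistics

x = [0.05, 0.02, -0.01, 0.04, 0.03]
y = [0.04, 0.01, -0.02, 0.05, 0.02]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
cov_xy = sum((xi - mean_x) * (yi - mean_y)
             for xi, yi in zip(x, y)) / (n - 1)

# r = Cov(X, Y) / (sd of X * sd of Y); statistics.stdev uses n - 1
r = cov_xy / (statistics.stdev(x) * statistics.stdev(y))
print(r)  # always falls between -1 and +1
```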


+1: perfectly positively correlated. −1: perfectly negatively correlated.


0: no linear relationship.



On a scatter plot, points trending up to the right indicate positive correlation; points trending down indicate negative correlation. A correlation of +1 or −1 does not necessarily mean a slope of +1 or −1.

.

Limitations to correlation analysis



1. Outliers



2. Spurious correlation



3. Nonlinear relationships

2. Spurious correlation: the appearance of a linear relationship when there is none, i.e., correlation by chance.



3. Nonlinear relationships: two variables could have a strong nonlinear relationship yet near-zero correlation, so correlation analysis would not detect it.

Test of significance to test the strength of correlation.


*A second way to test whether the correlation is significantly different from zero.


In practice questions you are typically given just the correlation, the number of observations, and the critical value.




A t-test is used (for normally distributed variables) to test whether the null hypothesis should be rejected.


Test statistic:

t = (sample correlation × √(n − 2)) /


√(1 − sample correlation²)

To make a decision with the t test.


The test statistic is compared with the critical t-value for the appropriate degrees of freedom and level of significance. For a two-tailed test the decision rule is:

Reject H0 if t > +t critical or if t < −t critical



If −t critical ≤ t ≤ +t critical, then we cannot reject the null
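A sketch of the test statistic and the two-tailed decision rule; the correlation, sample size, and significance level are made-up inputs.

```python
from math import sqrt
from scipy import stats

r, n = 0.475, 32   # hypothetical sample correlation and sample size

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Two-tailed critical value at a 5% significance level
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)

# Reject H0 (correlation = 0) if t > +t_crit or t < -t_crit
print("reject H0" if abs(t) > t_crit else "cannot reject H0")
```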

Simple linear regression

Explains the degree to which a dependent variable differs from its mean in terms of the variation in an independent variable

Dependent variable

Variable whose variation is explained by the independent variable

Independent variable

Variable used to explain the variation of the dependent variable

If you want to explain stock returns with GDP growth, which is the dependent and which the independent variable?

GDP growth: independent



Stock return: dependent

Linear regression requires assumptions (LOS 7.e)

.

The regression line is one of many possible lines that can be drawn through the points on a scatter plot

.

Sum of squared errors (SSE)



For multiple regression, how to calculate SEE

Sum of squared vertical distances between the estimated and actual Y values



SEE = square root of the mean squared error (MSE)

The regression line is the line that minimizes the SSE; this method is referred to as ordinary least squares (OLS)

.

Slope coefficient

Describes the change in Y for a one-unit change in X along the regression line

Slope term calculated:

b1 = Cov(X, Y) / variance of X (i.e., sd of X squared)

Intercept

The line's intersection with the Y axis, at X = 0

Intercept calculated:

b0 = mean of Y − slope coefficient × mean of X
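Both formulas in a short numpy sketch (made-up data):

```python
import numpy as np

x = np.array([1.2, 2.5, 3.1, 4.8, 5.0])   # hypothetical independent variable
y = np.array([2.0, 3.9, 4.1, 6.5, 7.2])   # hypothetical dependent variable

# Slope: b1 = Cov(X, Y) / Var(X); ddof=1 gives the sample (n - 1) versions
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept: b0 = mean of Y - b1 * mean of X (line passes through the means)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)
```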

The residual is the difference between the actual and predicted value of the dependent variable

.

The intercept is the estimate of the dependent variable when the independent variable is zero

.

The slope coefficient in a regression of a stock's returns on the market's returns is the stock's beta, and it measures...

...the relative amount of systematic risk

The intercept is called the ex-post alpha: a measure of excess risk-adjusted return

.

Standard error of estimate SEE

The standard deviation of the error terms in the regression: it measures the variability of the actual Y values about the estimated regression line, and is an indicator of the strength of the relationship between the dependent and independent variables.



SEE is low when the relationship is strong and high when it is weak; the smaller the SEE, the better the fit of the regression line.

Coefficient of determination R2

Percentage of total variation in dependent variable explained by independent variable



An R² of 0.63 means variation in the independent variable explains 63% of the variation in the dependent variable

For simple linear regression, R² =

correlation coefficient squared (r²)

A frequently asked question is whether a slope coefficient is different from zero. The null is H0: b1 = 0 and the alternative is Ha: b1 ≠ 0. If the confidence interval does not include 0, the null is rejected and the coefficient is said to be significantly different from zero.

.

Confidence interval (formula 7.3), but you don't need to memorize it

.

Predicted values

Values predicted by regression equation of dependent variable given an estimate of the independent variable.



For a simple regression, the predicted value of Y (dependent):

intercept + slope coefficient × forecasted value of the independent variable

Confidence intervals for the predicted values of a dependent variable are calculated similarly to confidence intervals for regression coefficients.

.

Confidence interval for a predicted value (LOS 7.i)

.

Calculate a 95% prediction interval (using the t-table)

Simple regression: predicted value ± 2.03 × standard error (here 2.03 is the two-tailed critical t for the relevant degrees of freedom)
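A sketch of the predicted value and the prediction interval; the intercept, slope, forecast X, and standard error of the forecast are made-up inputs you would normally be given.

```python
from scipy import stats

b0, b1 = 1.5, 0.8    # hypothetical intercept and slope
x_f = 4.0            # forecasted value of the independent variable
se_f = 0.6           # standard error of the forecast (assumed given)
n = 36               # hypothetical sample size, so df = n - 2 = 34

y_hat = b0 + b1 * x_f                   # predicted value of Y

t_crit = stats.t.ppf(0.975, df=n - 2)   # about 2.03 at 34 df, as on the card
print(y_hat - t_crit * se_f, y_hat + t_crit * se_f)
```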



Analysis of variance ANOVA

Statistical procedure used to analyze the total variability of the dependent variable



Attributes the variation to one of two sources:


the regression model, or


the residuals (error term)

Total sum of squares (SST)

Total variation of dependent variable



Sum of squared differences between the actual Y values and the mean of Y



Note: this is not the same as the variance (there is no division by n − 1)

Regression sum of squares (RSS)

Variation in the dependent variable that is explained by the independent variable



RSS is the sum of squared differences between the predicted Y values and the mean of Y

Sum of squared errors SSE

Unexplained variation in the dependent variable



The sum of squared residuals: the sum of squared vertical distances between the actual Y values and the predicted Y values on the regression line

SST =

RSS + SSE
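A numpy sketch (made-up data) showing the decomposition holds exactly for an OLS fit:

```python
import numpy as np

x = np.array([1.2, 2.5, 3.1, 4.8, 5.0])   # hypothetical data
y = np.array([2.0, 3.9, 4.1, 6.5, 7.2])

b1, b0 = np.polyfit(x, y, 1)              # OLS slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)         # total variation
rss = np.sum((y_hat - y.mean()) ** 2)     # explained variation
sse = np.sum((y - y_hat) ** 2)            # unexplained (residual) variation

print(np.isclose(sst, rss + sse))         # True: SST = RSS + SSE
print(rss / sst)                          # R-squared = explained / total
```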

Mean regression sum of squares (MSR) and mean squared error (MSE)

The appropriate sum of squares divided by its degrees of freedom: MSR = RSS / k and MSE = SSE / (n − k − 1), which is SSE / (n − 2) for simple regression

R² from the ANOVA table

[Total variation (SST) − unexplained variation (SSE)] /


total variation (SST)


=


explained variation (RSS) / total variation (SST)

SEE is the standard deviation of the regression error terms and is equal to:

√MSE (square root of the mean squared error)


or


√[SSE / (n − 2)]
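A one-liner check with made-up SSE and n:

```python
from math import sqrt

sse, n = 12.4, 36      # hypothetical SSE and sample size
mse = sse / (n - 2)    # mean squared error for simple regression
see = sqrt(mse)        # SEE = sqrt(MSE) = sqrt(SSE / (n - 2))
print(see)
```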

An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. In multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.



F-stat =


MSR (mean regression sum of squares) / MSE (mean squared error)


Always a one-tailed test***

For simple linear regression, there is only one independent variable, so the F-test tests the same hypothesis as the t-test for statistical significance of the slope coefficient:

.

To determine whether b1 is statistically significant using the F-test, the calculated F-statistic is compared with the critical F-value, Fc, at the appropriate level of significance. The degrees of freedom for the numerator and denominator with one independent variable are:

df numerator = k = 1

df denominator = n − k − 1 = n − 2

where n = number of observations.

Decision rule: reject H0 if F > Fc

.
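The F-stat and decision rule in a short sketch; the sums of squares and sample size are made-up inputs.

```python
from scipy import stats

rss, sse = 84.0, 42.0    # hypothetical explained and unexplained variation
n, k = 36, 1             # observations; k = 1 for simple regression

msr = rss / k            # mean regression sum of squares
mse = sse / (n - k - 1)  # mean squared error
f_stat = msr / mse

# One-tailed critical value at 5% with df = (k, n - k - 1)
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)
print("reject H0" if f_stat > f_crit else "cannot reject H0")
```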

Rejection of the null hypothesis at a stated level of significance indicates that the independent variable is significantly different from zero, which is interpreted to mean that it makes a significant contribution to the explanation of the dependent variable. In simple linear regression, it tells us the same thing as the t-test of the slope coefficient.

.

The bottom line is that the F-test is not as useful when we only have one independent variable because it tells us the same thing as the t-test of the slope coefficient. Make sure you know that fact for the exam, and then concentrate on the application of the F-test in multiple regression.

.

Limitations of regression analysis include the following:

1. Linear relationships can change over time. This means that the estimation equation based on data from a specific time period may not be relevant for forecasts or predictions in another time period. This is referred to as parameter instability.

2. Even if the regression model accurately reflects the historical relationship between the two variables, its usefulness in investment analysis will be limited if other market participants are also aware of and act on this evidence.

3. If the assumptions underlying regression analysis do not hold, the interpretation and tests of hypotheses may not be valid.

.

(From 2 sections later) Data collection as part of business intelligence often produces very large sets of data, both observations and attributes. Multiple regression

.

Big Data simply refers to these very large data sets which may include both structured (e.g., spreadsheet) data and unstructured (e.g., emails, text, or pictures) data.

.

Data analytics uses computer-based algorithms to analyze Big Data and obtain meaningful information about patterns and relationships in the data.

.

Machine learning (ML) refers to computer programs that learn from their errors and refine predictive models to improve their predictive accuracy over time. ML is one method used to extract useful information from Big Data.

.

ML terms: Target variable (or tag variable)


The dependent variable (i.e., the y-variable). Target variables can be continuous, categorical, or ordinal.

Features

are the independent variables (i.e., the x-variables).

Feature engineering

is curating a dataset of features for ML processing.

Supervised learning uses labeled training data to guide the ML program in achieving superior forecasting accuracy.


To forecast earnings manipulators, for example, a large collection of attributes could be provided for known manipulators and for known non-manipulators. A computer program could then be used to identify patterns that identify manipulators in another data set.

.

Typical data analytics tasks for supervised learning include classification and prediction

.

In unsupervised learning (no tag variable), the ML program is not given labeled training data. Instead, inputs are provided without any conclusions about those inputs. In the absence of any tagged data, the program seeks out structure or interrelationships in the data.

.

Clustering is one example of the output of an unsupervised ML program.

.

Supervised learning algorithms are used for prediction (i.e., regression) and classification. When the y-variable is continuous, the appropriate approach is that of regression (used in a broad, ML context). When the y-variable is categorical (i.e., belonging to a category or classification) or ordinal (i.e., ordered or ranked), a classification model is used.

.

Linear (previously discussed in our coverage of multiple regressions) and nonlinear regression models can be used to generate forecasts.


A special case of generalized linear model (GLM) is penalized regression.



Penalized regression models


seek to minimize forecasting errors by reducing the problem of overfitting.

Penalized regression cont




In summary, penalized regression models seek to reduce the number of features included in the model while retaining as much predictive information in the data as possible.



Overfitting results when a large number of features (i.e., independent variables) are included in the data sample. The resulting model can use the "noise" in the dependent variables to improve the model fit. Overfitting the model in this way will decrease the accuracy of model forecasts on other (out-of-sample) data. To reduce the problem of overfitting, researchers may impose a penalty based on the number of features used by the model. Penalized regression models seek to minimize the sum of squared errors (as in multiple regression models) plus a penalty value.

.
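A minimal sketch of penalized regression using scikit-learn's Lasso (one common penalized model); the data are randomly generated so only 2 of the 20 features actually matter:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # 20 candidate features, most irrelevant
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# The alpha penalty shrinks uninformative coefficients to exactly zero,
# effectively dropping those features and reducing overfitting
model = Lasso(alpha=0.1).fit(X, y)
print(np.count_nonzero(model.coef_), "of 20 features kept")
```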

Classification trees are appropriate when the target variable is categorical, while regression trees are appropriate when the target is continuous.



More typically, classification trees are used when the target is binary (e.g., IPO will be successful vs. not successful).



Logit and probit models are used when the target is binary but are ill-suited when there are significant nonlinear relationships among variables. In such cases, classification trees may be used

.
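A minimal classification-tree sketch with scikit-learn; the IPO features and labels are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical IPO attributes (e.g., deal size, underwriter rank)
# with a binary target: 1 = successful, 0 = not successful
X = [[2.0, 8], [0.5, 3], [3.1, 9], [0.8, 2], [2.7, 7], [0.4, 4]]
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[1.9, 6]]))   # classify a new IPO
```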

A variant of a classification tree is a random forest. A random forest is a collection of randomly generated classification trees from the same data set. The process of using multiple classification trees uses crowdsourcing (majority wins) in determining the final classification. Because each tree only uses a subset of features, random forests can mitigate the problem of overfitting. Using random forests can increase the signal-to-noise ratio because errors across different trees tend to cancel each other out.

.

Neural networks are constructed with nodes connected by links. The input layer consists of the nodes holding the values of the features (independent variables).



Each hidden node applies an activation function, typically a nonlinear function, to a weighted average of the input values from the nodes linked as its inputs.
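A single hidden node in miniature (made-up weights and inputs), using the sigmoid as the nonlinear activation function:

```python
import numpy as np

inputs = np.array([0.4, -1.2, 0.7])    # values from the linked input nodes
weights = np.array([0.5, 0.3, -0.8])   # hypothetical learned link weights

z = np.dot(weights, inputs)            # weighted sum of the inputs

node_value = 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes z into (0, 1)
print(node_value)
```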



Hyperparameters of the neural network define its structure, e.g., specifying a single hidden layer with 4 nodes and an output layer with 1.

.

Additional layers can improve the predictive accuracy of neural networks. Deep learning nets (DLNs) are neural networks with many hidden layers (often more than 20).

.

Recall that in case of unsupervised learning, there is no target variable; the task is to find a pattern in the features (independent or x-variables).



Clustering


Given a data set, clustering is the process of grouping observations into categories based on similarities in their attributes. For example, stocks can be assigned to different categories based on their past performances, rather than standard sector classifiers (e.g., finance, healthcare, technology, etc.). Clustering can be bottom-up or top-down. In the case of bottom-up clustering, we start with one observation as its own cluster and add other similar observations to that cluster, or form another non-overlapping cluster. Top-down clustering starts with one giant cluster and then partitions that cluster into smaller and smaller clusters.

.
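A minimal bottom-up (agglomerative) clustering sketch with scikit-learn; the stock attributes are made-up numbers:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical past-performance attributes for six stocks
X = np.array([[0.12, 0.30], [0.10, 0.28], [0.02, 0.15],
              [0.03, 0.14], [0.20, 0.45], [0.22, 0.44]])

# Bottom-up: each stock starts as its own cluster and similar
# clusters are merged until three remain
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print(labels)   # cluster assignment for each stock
```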

Unsupervised learning cont'd



Dimension reduction seeks to remove the noise when the number of features in a dataset is excessive



One method is principal component analysis (PCA), which

summarizes the information in a large number of CORrelated factors into a much smaller set of UNCorrelated factors. The first factor in PCA is the most important in explaining the variation across observations, the second is the second most important, and so on.
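A minimal PCA sketch with scikit-learn on randomly generated data (two of the ten features are made deliberately correlated):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                    # 10 hypothetical features
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)   # correlate two of them

# PCA summarizes correlated features into fewer uncorrelated components,
# ordered by how much of the variation each explains
pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)   # first component explains the most
```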


A correlation of 0 can still mean there is a relationship, just not a LINEAR one



+1 is a strong direct relationship



−1 is a strong OFFsetting relationship

.