26 Cards in this Set

Measures of Association
Allow us to talk about how strongly our variables are related (< .3 is weak, .3 to .6 is moderate, > .6 is strong). Some of these are PRE measures.
Proportionate Reduction in Error (PRE) Measures
Tell us how much better we can predict the value of the dependent variable when we know the value of the independent variable (e.g., if we know height, we can make a better guess at weight).
Measures of Association for Nominal Variables (non-parametric)
Phi Coefficient and Cramer's V
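The cards describe SPSS output, so the following is only a minimal sketch of how these two coefficients fall out of the chi-square statistic, assuming Python with numpy/scipy and a made-up crosstab:

```python
# Sketch: Phi and Cramer's V from a contingency table (made-up example data).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [15, 25]])   # rows = IV categories, columns = DV categories

# correction=False so the result matches the textbook chi-square formula
chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()

phi = np.sqrt(chi2 / n)                                    # Phi (2x2 tables)
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))   # Cramer's V (any size)

print(f"phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}, p = {p:.3f}")
```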
Measures of Association for Ordinal Variables
Goodman and Kruskal's Gamma, Kendall's Tau-b, Kendall's Tau-c, and Spearman's Rank Order Correlation
Bivariate Regression
Still looking at the relationship between two variables, but unlike correlation, you are specifying that the IV is affecting the DV. Bivariate linear regression is all about finding the "best fit line," the regression line that goes through the scatterplot. The equation for this line is:
• ŷ = a + bx
• a is the intercept
• b is the slope

How far away the points are from the line matters: Pearson's correlation (r) is a measure of how well the data fit this line.
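The cards work in SPSS; as an illustration only, here is a minimal Python sketch (numpy assumed, made-up x/y data) of fitting the best-fit line and checking the fit with Pearson's r:

```python
# Sketch: fit y-hat = a + b*x and compute Pearson's r (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b, a = np.polyfit(x, y, deg=1)     # degree-1 fit returns (slope, intercept)
r = np.corrcoef(x, y)[0, 1]        # Pearson's correlation

print(f"y-hat = {a:.2f} + {b:.2f}x,  r = {r:.3f}")
```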
Ordinary Least Squares (OLS) Regression
Sometimes we write the equation as y = a + bx + e, where e = error. In order to find the best fitting line, we select the line that minimizes the sum of squared errors. This is called Ordinary Least Squares (OLS) regression.

The error (residual) is the difference between where the actual point is and where the line predicts it to be; to find the best fitting line, we select the line that minimizes the sum of these squared errors.
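As a sketch of what "minimizing the sum of squared errors" means, the code below (Python/numpy assumed, made-up data) computes the closed-form least-squares slope and intercept and shows that the fitted line beats the mean-only prediction:

```python
# Sketch: OLS via the closed-form least-squares formulas, comparing the
# line's SSE against the mean-only baseline (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
a = y.mean() - b * x.mean()                                                # intercept

residuals = y - (a + b * x)             # e = actual - predicted
sse_line = np.sum(residuals ** 2)       # what OLS minimizes
sse_mean = np.sum((y - y.mean()) ** 2)  # baseline: always predict the mean of y

print(f"SSE(line) = {sse_line:.3f}  vs  SSE(mean only) = {sse_mean:.3f}")
```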
Multiple Regression
The difference between Bivariate and Multiple Regression is the number of variables. It is no longer possible to visualize the regression model after adding more than two independent variables. Mathematically, we are still creating the best fitting line.
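A minimal sketch of a multiple regression with two IVs, assuming Python with statsmodels instead of SPSS and entirely made-up data:

```python
# Sketch: multiple regression (two IVs) with statsmodels; made-up data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
model = sm.OLS(y, X).fit()                      # still the "best fitting line" (plane)
print(model.summary())
```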
5 Steps for Interpreting Multivariate Regression
1. Is the regression model statistically significant?
The F-test tells us whether our model predicts the DV better than the mean of the DV alone (statistical significance).

2. What percentage of the variance in the dependent variable does the model explain?
The Model r2 is a PRE measure that tells us how well our model predicts the DV. It ranges from 0-1. If you multiply r2 by 100, it can be directly interpreted as the percentage of variance in the DV explained by all the IVs.

3. Which independent variables have a statistically significant effect on the dependent variable?
The t-test. The standard error of the regression coefficient gives us the spread of the sampling distribution. In OLS regression, we use the t-test to test the significance of regression coefficients. A significant t value tells us that the b coefficient for the variable is significantly different from 0.

4. How do we interpret the effect of each significant independent variable?
The B Coefficient. “For every one unit change in X, we will have a b unit change in Y, holding all other independent variables constant.” The sign of b tells us whether we have a positive or negative effect.

5. Which independent variables have the strongest effect on the dependent variable?
The Beta Coefficient - a unit-free (standardized) version of the B coefficient, which lets us compare the relative strength of IVs measured on different scales.
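As an illustration only (the cards assume SPSS output), this sketch pulls each of the five quantities out of a fitted statsmodels OLS model with made-up data and variable names; the betas are obtained by refitting on z-scored variables, which is one common approach:

```python
# Sketch: the five interpretation quantities from a fitted statsmodels OLS model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import zscore

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 2.0 + 0.6 * df["x1"] - 0.2 * df["x2"] + rng.normal(size=200)

X = sm.add_constant(df[["x1", "x2"]])
m = sm.OLS(df["y"], X).fit()

print("1. F-test:", m.fvalue, "p =", m.f_pvalue)          # model significance
print("2. R^2:", m.rsquared)                              # x100 = % variance explained
print("3. t-tests:", dict(m.tvalues), dict(m.pvalues))    # which IVs are significant
print("4. b coefficients:", dict(m.params))               # one-unit-change interpretation

# 5. Beta (standardized) coefficients: refit with all variables z-scored
z = df.apply(zscore)
m_std = sm.OLS(z["y"], sm.add_constant(z[["x1", "x2"]])).fit()
print("5. Betas:", dict(m_std.params.drop("const")))
```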
Assumptions of OLS Regression
Explicit: Residuals are independent; residuals are normally distributed; residuals have a mean of 0 at all values of X; residuals have a constant variance (homoscedasticity).

Implicit: All X are fixed and measured without error; The model is linear in the parameters; the predictors are specified correctly.

Two main consequences of violating regression assumptions: Bias and Inefficiency
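The card does not prescribe specific diagnostics, but as a sketch, a few common residual checks for the explicit assumptions look like this (Python with statsmodels/scipy assumed, made-up data):

```python
# Sketch: quick checks of the explicit OLS assumptions on a model's residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy.stats import shapiro

rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 1.0 + 0.8 * x + rng.normal(size=150)
X = sm.add_constant(x)
m = sm.OLS(y, X).fit()

resid = m.resid
print("Mean of residuals (should be ~0):", resid.mean())
print("Normality (Shapiro-Wilk p):", shapiro(resid).pvalue)
print("Constant variance (Breusch-Pagan p):", het_breuschpagan(resid, X)[1])
print("Independence (Durbin-Watson, ~2 is good):", durbin_watson(resid))
```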
Bias
A “biased” coefficient is one where the mean of all sample coefficients is not equal to the true population coefficient. Bias is very serious (it is “pulling” the regression).
Inefficiency
A coefficient is inefficient if the standard error of the coefficient is inflated. Inefficiency does not entail bias. While bias is usually worse than inefficiency, it is possible that biased estimators are more consistent than very inefficient ones.
Multicollinearity
Perfect multicollinearity is a situation where one IV (or a combination of IVs) is perfectly correlated with another IV. Most frequently this is seen with dummy variables when you forget to exclude a reference category.

When you have perfect multicollinearity, your model simply won’t run (singularity).

High multicollinearity can also cause problems:
o Coefficient standard errors are inflated
o Significance levels of IVs fluctuate substantially between samples
o Hard to identify unique contributions of IVs
Univariate versus Multivariate Outliers
Univariate – A particular score on one variable is far away and separate from the distribution

Multivariate – A case's combination of scores across variables is far away from the pattern shown by the other cases (a lack of consistency)
Detecting Outliers (how)
Residual Plots (quick and dirty, can also detect heteroscedasticity); Leverage; Cook's D; Standardized Residuals; and Studentized Residuals.
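As an illustration (the cards use SPSS), these measures are also available from a fitted statsmodels OLS model; the sketch below uses made-up data with one planted influential case:

```python
# Sketch: leverage, Cook's D, and standardized/studentized residuals
# from statsmodels influence diagnostics (made-up data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)
y[0] += 8                      # plant one influential case for illustration

m = sm.OLS(y, sm.add_constant(x)).fit()
infl = m.get_influence()

leverage = infl.hat_matrix_diag               # leverage (hat values)
cooks_d = infl.cooks_distance[0]              # Cook's D
std_resid = infl.resid_studentized_internal   # standardized residuals
stud_resid = infl.resid_studentized_external  # studentized residuals

print("Case with largest Cook's D:", np.argmax(cooks_d))
```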
Handling Influential Cases (outliers)
1. Exclude (can lead to biased coefficients)
2. Transform the Variable to shrink the variance in the error term.
3. Separate Regressions Approach (estimate two models)
Dummy Variables
Variables that we are theoretically interested in and thus code as being a 0 or a 1 (yes or no essentially), which indicates the absence or presence of a characteristic or trait. We use dummy variables to include nominal level data in our regression analysis such as race, gender, and even whether a country is under a dictatorship or not. We cannot just stick nominal level data into a regression.
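A minimal sketch of creating dummy variables, assuming Python with pandas and a made-up nominal variable; one category is dropped as the reference to avoid perfect multicollinearity:

```python
# Sketch: dummy coding a nominal variable, dropping the reference category.
import pandas as pd

df = pd.DataFrame({"race": ["white", "black", "hispanic", "white", "black"]})

dummies = pd.get_dummies(df["race"], prefix="race", drop_first=True)
print(dummies)   # one dummy column per category except the dropped reference
```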
High Multicollinearity
Applies to any form of regression, no matter the DV. Essentially an overly high relationship between IVs that messes up the estimated effects. Ways to screen: look at the correlations between IVs; for instance, a correlation in excess of .7.

Can be detected using tolerance. One of the most common tests for multicollinearity is the Variance Inflation Factor, or VIF, which is simply 1/tolerance. A common cutoff is VIF > 4 (tolerance below .25). It is the best tool for finding variables affected by collinearity.
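As a sketch only (statsmodels assumed, made-up data with one deliberately collinear pair), VIFs can be computed like this and screened against the VIF > 4 cutoff:

```python
# Sketch: VIF (= 1 / tolerance) for each IV, flagging anything over 4.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # deliberately collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)   # x1 and x2 should blow well past the VIF > 4 cutoff
```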
Dealing with Multicollinearity
1. Get rid of any particular variable with a VIF > 4 (eliminate based upon conceptualization)
2. Combine variables that look to be overlapping
Mediation
The reason why two variables (IV and DV) are connected.
Ex. The reason why Strain and Delinquency are connected is because of Anger.
• Full mediation: the mediator fully explains the relationship.
• Partial mediation: part of the relationship remains even after accounting for the mediator.

How to test:
1. Regress C onto A (A = IV, C = DV)
2. Regress B (the mediator) onto A
3. Regress C onto A and B
We know if we have mediation because:
• A will predict C (significant Beta weight in SPSS)
• A will predict B (significant Beta weight in SPSS)
• When B is in the model, A will no longer directly predict C (A will not have a significant beta weight)
• If A still predicts C and the other two conditions are met, then partial mediation
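A minimal sketch of the three regressions described above, assuming Python with statsmodels (the cards use SPSS) and made-up A/B/C data built so that B fully carries the A-to-C effect:

```python
# Sketch: the three mediation regressions (A = IV, B = mediator, C = DV).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
A = rng.normal(size=300)               # e.g., strain
B = 0.7 * A + rng.normal(size=300)     # e.g., anger (the mediator)
C = 0.6 * B + rng.normal(size=300)     # e.g., delinquency

step1 = sm.OLS(C, sm.add_constant(A)).fit()                        # C on A
step2 = sm.OLS(B, sm.add_constant(A)).fit()                        # B on A
step3 = sm.OLS(C, sm.add_constant(np.column_stack([A, B]))).fit()  # C on A and B

print("Step 1, A -> C p:", step1.pvalues[1])
print("Step 2, A -> B p:", step2.pvalues[1])
print("Step 3, A p (should lose significance if full mediation):", step3.pvalues[1])
print("Step 3, B p:", step3.pvalues[2])
```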
Moderation
How a third variable can impact how two other variables are related (the hypothesis is that a third variable impacts how X is related to Y). In other words, B is not the reason A is connected to C, but it changes the relationship.

How to test:
1. Make two new variables: a centered or standardized (z-score) version of A and of B
- standardized = (value - mean) / sd; centered = value - mean
- Centering leaves the variable in the same metric, but standardizing is easier to work with
- Helps prevent multicollinearity
2. Multiply z_A by z_B (z_A*z_B – the interaction term)
3. Run a regression of A, B, and z_A*z_B, predicting C
4. If z_A*z_B is significant, you have moderation, i.e., the moderator changes (predicts) the relationship between A and C

The B weight of the interaction alone does not tell you the size of the effect. You must run a simple slopes analysis or compare the B-weights across different levels of the moderating variable.
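A sketch of the steps above, assuming Python with statsmodels/scipy and made-up data that contains a real A x B interaction:

```python
# Sketch: moderation via a standardized interaction term (made-up data).
import numpy as np
import statsmodels.api as sm
from scipy.stats import zscore

rng = np.random.default_rng(6)
A = rng.normal(size=300)
B = rng.normal(size=300)
C = 0.4 * A + 0.3 * B + 0.5 * A * B + rng.normal(size=300)

z_A, z_B = zscore(A), zscore(B)      # step 1: standardize (or center) A and B
interaction = z_A * z_B              # step 2: the z_A * z_B interaction term

X = sm.add_constant(np.column_stack([z_A, z_B, interaction]))
m = sm.OLS(C, X).fit()               # step 3: regress C on A, B, and the interaction
print("Interaction p-value:", m.pvalues[3])   # step 4: significant -> moderation
```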
Hierarchical versus Stepwise Regression
Hierarchical regression is sequential: the researcher enters variables (or blocks of variables) in a theoretically chosen order. Stepwise regression is computer-driven: the software selects variables based on statistical criteria.
Logistic Regression
Running a linear regression with a DV that is not continuous and unbounded will violate one of the assumptions. A logistic regression is less statistically powerful, so a linear regression is likely to find a larger effect.

Linear regressions are better suited for continuous outcomes. Logistic regression, on the other hand, is used to estimate the probability of an event, and this event is captured in binary format, i.e., 0 or 1.
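A minimal sketch of fitting a logistic regression for a binary 0/1 outcome, assuming Python with statsmodels rather than SPSS, and made-up data:

```python
# Sketch: logistic regression for a binary DV (made-up data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true probability of the event
y = rng.binomial(1, p)                    # binary outcome, coded 0 or 1

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit()
print(logit.summary())
```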
5 Steps for Logistic Regression
1. Establishing Statistical Significance
• We use an F-test in OLS; here we look at the model significance in the Omnibus table, p < .05. Ex. The obtained Chi-square (…) with _ degrees of freedom is statistically significant at the .05 level. In other words, does the model do better than chance?

2. Determining the Strength
• We use a pseudo-R2. In linear regression, r2 is how much variance is accounted for in the DV. You can't really do that with dichotomous variables. There are a number of pseudo-R2 measures:
- Cox and Snell R2 – aims to be interpreted like regular R2, but has the problem of not being able to reach one (based on the LL difference)
- Nagelkerke R2 – adjusts Cox and Snell so it can max out at 1. Usually higher than Cox and Snell
•“We can account for __% of the error in predicting if…”

3. Examining Specific Variables
The "Wald Statistic" compares the equation with the particular variable to the equation without the particular variable. In other words it looks at what independent variables are statistically significant. Exp(B) is the odds ratio, with the significance to its left in the table.
Either way, statistical significance means the same thing as does the sign of the coefficient.

4. Determining the Size of Effects for Significant Independent Variables
We explain the results in terms of the change in the odds given by the exponentiated coefficient: with each one-unit increase in ___, the odds of (DV) decrease (or increase, based on the sign of B) by a factor of Exp(B).
• The exponential of the b coefficient is the odds ratio (multiplier)
• This interpretation is the most popular interpretation of the logit coefficient.
• When we exponentiate the b-coefficient, the result is equal to the ratio of the odds that are one unit in x apart.
• Another way to say this is that the result is equal to the change in the odds for a one unit change in x.
o If we subtract 1 and then multiply by 100, we get the percentage change in odds for a 1 unit change in x.
o Ex. 1.30 - 1 = .30, and .30 x 100 = 30%

5. Determining Standardized Effects
• We can't really do this last step; logistic regression does not have a direct equivalent of the standardized beta.
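As an illustration of how these steps map onto a fitted model, here is a sketch using statsmodels instead of SPSS, with made-up data. Note the caveat that statsmodels reports McFadden's pseudo-R2 rather than Cox & Snell or Nagelkerke, and that the coefficient significance tests are Wald-type z-tests:

```python
# Sketch: interpreting a fitted statsmodels Logit (made-up data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

m = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

print("1. Model significance (LR test p):", m.llr_pvalue)
print("2. Pseudo-R^2 (McFadden):", m.prsquared)
print("3. Coefficient p-values (Wald-type z tests):", m.pvalues)
odds_ratios = np.exp(m.params)                       # Exp(B)
print("4. Odds ratios:", odds_ratios)
print("   % change in odds per 1-unit change in x:", (odds_ratios[1] - 1) * 100)
```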
Factor Analysis
• Conceptually: A data reduction technique that lets us take a bunch of variables and condense them into factors. There is a lot of ambiguity compared to an ANOVA, for instance.

Ex. Intelligence Testing
Two types of Factor Analysis
1. Exploratory Factor Analysis (EFA) – A set of procedures that extract one or more factors from the observed relationships between a set of variables. Factors are not defined a priori, but are generated through the process of condensing variables.
2. Confirmatory Factor Analysis (CFA) – Factors are defined based on theory and the analysis conducted to assess the “fit” of the data to theory using SEM.
Principal Components Analysis
The most common method of factor analysis.
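As a sketch only (scikit-learn assumed, made-up data built from one latent dimension), a principal components extraction condenses several correlated items into a small number of components:

```python
# Sketch: principal components extraction with scikit-learn (made-up data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(9)
latent = rng.normal(size=(200, 1))
items = latent @ np.ones((1, 4)) + 0.5 * rng.normal(size=(200, 4))  # 4 correlated items

pca = PCA(n_components=2).fit(items)
print("Variance explained by each component:", pca.explained_variance_ratio_)
print("Loadings (components x items):")
print(pca.components_)
```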