61 Cards in this Set

  • Front
  • Back
In the pretest-posttest with control group experimental design, what does it mean if the interaction is significant?
It means that the change in the two groups from pre to post was different - this is what we are interested in.
On a graph, the control group's line would be flat (no change) while the treatment group's depression index dropped from pre to post.
What does analysis of covariance do?
It reduces the error in order to make the difference between group means a significant difference. In other words: an ANCOVA can give us a more powerful test of the hypothesis that all group means are equal. This is because even when group means appear to be different, they may not be significantly different because error (within-group variability) is large.
How does an ANCOVA work?
It uses information provided by a continuous variable, called a covariate, to remove some of the variability due to individual differences from the error term.
What is a covariate?
The covariate is a variable that is correlated with the DV. It can be useful when the magnitude of its correlation with the DV is .3 or greater.
ANCOVA example:
Compare two methods of teaching algebra to 8th graders.
IV: method
DV: final exam score
Data for 10 participants:
Scores vary within methods (within group variability)
Mean scores for two methods: 50, 65
ANCOVA allows us to reduce this error and make the F ratio larger.
In ANCOVA, each score on the DV (Y) is adjusted for the covariate (X), and then an ANOVA is run on the adjusted Y scores. Within-group variability for the adjusted Y scores is smaller than the within-group variability of the original Y scores when the covariate is correlated with the DV.
In ANOVA:
Y = grand mean + group effect + error
In ANCOVA: ?
Y = grand mean + group effect + effect of covariate + error
In ANCOVA, variability due to the covariate is removed from the error term and attributed to the covariate.
In ANCOVA, the Y score is adjusted as follows:
Adjusted Y = Y - b (X - Xbar)
b is the regression coefficient in the regression of Y on X
What does the ANCOVA procedure accomplish?
It changes the Y scores to what they would be if the participants were more similar on the covariate (X).
Ex. using a math aptitude test as the covariate, final exam scores (Y scores) would be adjusted so that they represent the scores the participants would have received if they had been more similar in math aptitude - making them less variable than the actual scores.
How to use ANCOVA:
1. Obtain the regression coefficient, e.g., b = .5
2. Compute the covariate mean, e.g., Xbar = 100
3. Compute the adjusted Y score for each individual using the formula: Adjusted Y = Y - b(X - Xbar)
i.e., plug in each participant's X and Y scores
Ex.
(X = 60, Y = 30)
Adj Y = 30 - .5(60 - 100) = 50
(X = 140, Y = 70)
Adj Y = 70 - .5(140 - 100) = 50
Higher scores get reduced, lower scores get increased = variability is reduced.
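A minimal Python sketch of this adjustment, reusing b = .5 and Xbar = 100 from the example above (the (X, Y) pairs are the illustrative values from the card, not real data):

```python
# Worked version of the steps above, with b = .5 and Xbar = 100
b, x_bar = 0.5, 100.0

def adjust(y, x):
    """Adjusted Y = Y - b * (X - Xbar)."""
    return y - b * (x - x_bar)

print(adjust(30, 60))    # 30 - .5 * (60 - 100)  = 50.0
print(adjust(70, 140))   # 70 - .5 * (140 - 100) = 50.0
```

Both adjusted scores land on 50: the high scorer is pulled down and the low scorer pulled up, which is exactly the variability reduction described above.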
In regression terms,
1. In ANOVA, a specific score Y is predicted from:
2. In ANCOVA, a specific score Y is predicted from:
1. The grand mean and group only
2. The grand mean, group and covariate
Assumptions of ANCOVA (5)
1. All assumptions for regular ANOVA
2. Covariate should be normally distributed
3. The regression of the DV (Y) on the covariate (X) is linear in each group. Check by examining a scatterplot.
4. Homogeneity of regression coefficients. In the regression of Y on X for each group separately, the coefficients (i.e., betas for the populations, b's for the samples) are all equal. This means that on a graph all of the groups' regression lines are parallel, just starting at different heights on the Y axis.
5. The groups do not differ on the covariate, i.e., group means are equal (or not too unequal)
Another ANCOVA example:
Comparing three treatments (diet, exercise and medication) to see which is most effective in reducing blood pressure.
Because we know blood pressure is related to age, we can include age as a covariate. So: the IV is treatment group; the DV is blood pressure; the covariate is age.
Other ANCOVA facts:
1. Contrasts and multiple comparisons can also be run on?
1. ANCOVA adjusted scores
Other ANCOVA facts:
2. ANCOVA can also be used with _____ designs, and there can be more than one _____.
2. Factorial designs; more than one covariate
What is the predictor variable?
The variable from which the prediction is made, e.g., verbal ability
What is the outcome variable?
The variable which is predicted, e.g., improvement in therapy
In computing the correlation coefficient you see the strength and direction of the relationship between the two variables. What are we doing with linear regression?
1. Fitting a straight line to the data.
2. Predicting from a particular value of verbal ability (ex.) to a particular value of improvement in therapy.
Once we have the regression line, what do we know?
For a 1-point increase in X, we know how much Y increases or decreases.
Equation of a straight line:
Y = a + bX
a: the value of Y when X = 0
Slope = b = (Y2 - Y1)/(X2 - X1)
What does the slope tell us?
The change in Y relative to X. More specifically, b = the amount that Y changes when X changes by 1 (ex. 2 = 2/1)
The technique of linear regression is used to find what?
The equation of the line that best fits the data.
What does the method of least squares find and how does it do it?
Using linear regression, we find a and b by using the data collected on the variables X and Y for a number of participants.
We use the method of least squares to find the line.
This method minimizes the sum of the squared vertical distances from the points to the regression line.
The distance between each point (Y) and the line (Y-hat) is squared; then the squares are summed.
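As a sketch of the computation, the least-squares line can be found directly with NumPy (my choice of library; the (X, Y) data are invented for illustration):

```python
import numpy as np

# Hypothetical (X, Y) data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates: b = cov(X, Y) / var(X), a = Ybar - b * Xbar
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

y_hat = a + b * x                       # points on the fitted line (Y-hat)
ss_resid = np.sum((y - y_hat) ** 2)     # the sum of squares least squares minimizes
print(a, b, ss_resid)
```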
What is the regression equation for Y-hat?
Y-hat = a + bX
What are the two ways to find the predicted value of Y?
1. Substitute a particular value of X into the regression equation and solve for Y-hat
2. Draw the regression line on a graph and read the predicted value of Y from the graph
With standard scores, the slope equals what?
The correlation coefficient.
Linear regression is appropriate only when the relationship between X and Y is:
Linear
The higher the correlation between X and Y:
The better the regression line will fit the data. The correlation is an index of fit.
It is also possible to predict X from Y. The resulting regression equation is slightly different because why?
The sums of squares are minimized horizontally rather than vertically.
What is the difference between linear regression and multiple regression?
More than one predictor may be used to predict Y.
Another name for predictors is ____ and there is a relationship between several predictors and one single ____ of interest.
Factors; Criterion
Multiple regression is just a simple and accurate way to:
Combine information about several factors (predictors) to make the best possible predictions.
What is the multiple regression equation?
Y-hat = a + b1X1 + b2X2 + b3X3 + ...
Example equation:
Y-hat = -3 + 5X1 + 4X2
To get a predicted Y score for Carol, we would substitute her scores:
Y-hat = -3 + 5(0) + 4(2) = -3 + 0 + 8 = 5
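Carol's prediction, reproduced as a tiny Python sketch (coefficients taken from the example equation above):

```python
# Coefficients from the example equation: a = -3, b1 = 5, b2 = 4
a, b1, b2 = -3, 5, 4

def predict(x1, x2):
    return a + b1 * x1 + b2 * x2

print(predict(0, 2))   # Carol: -3 + 5*0 + 4*2 = 5
```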
Notation
1. Y-hat
1. Y-hat: the predicted Y score
Notation
2. R-squared
2. R-squared: The multiple coefficient of determination
Notation:
3. R
3. R: the positive square root of R-squared; the multiple correlation coefficient (or coefficient of multiple correlation)
Notation:
4. Y - Y-hat
4. Y - Y-hat: The residual, i.e., how far off the predicted Y is from the actual Y
T or F: Regression coefficients are generally not comparable
True. Only when the predictors all have the same SD can they be compared directly.
What is another name for Standardized Regression Coefficients?
Beta Weights
What is the purpose of Standardized Regression Coefficients?
To compare the strength of predictors - Beta Weights can be compared to find out the relative strengths of the different predictors.
How do Standardized Regression Coefficients allow you to compare the strength of predictors?
Standardizing each of the variables guarantees that all variables have the same SD of 1, and that the intercept in the regression equation is 0.
Indicated by "STD"
Example
Al's regression equation:
STD Y-hat = 0 + .218(STD X1) + .436(STD X2) + .873(STD X3)
To estimate Al's standard score on Y, you add .218 times Al's standard score on X1, .436 times his standard score on X2, and .873 times his standard score on X3.
In this example, we can compare the .436 beta coefficient for X2 with the .218 beta coefficient for X1 and conclude that "in the multiple regression using X1, X2 and X3 to predict Y, the contribution of the variable X2 is twice that of the variable X1."
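A sketch of how beta weights arise, assuming invented data and NumPy as the tool: z-score every variable, then fit by ordinary least squares; the intercept comes out 0 and the slopes are the comparable beta weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # hypothetical X1, X2, X3
y = X @ np.array([1.0, 2.0, 4.0]) + rng.normal(size=100)

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)   # force SD = 1 for every variable

Xz, yz = zscore(X), zscore(y)
design = np.column_stack([np.ones(len(yz)), Xz])  # intercept + standardized IVs
betas, *_ = np.linalg.lstsq(design, yz, rcond=None)
print(betas)  # first entry (intercept) is ~0; the rest are comparable beta weights
```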
What is the best estimate of population R-squared?
Adjusted R-squared based on sample data.
R-squared-adjusted equation:
R-squared-adjusted = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the sample size and k is the number of predictors (see Multiple Regression handout p. 5)
The multiple regression technique is a powerful way to do what?
Predict scores on a dependent variable.
In a small sample, if you use several independent variables, you will automatically obtain a high multiple correlation in the sample, even if what?
There is no relationship in the population.
What does R-squared-adjusted do?
It adjusts for the "ad hoc" aspect of multiple correlation, reducing the estimate, which makes it more conservative and more realistic.
How can you guarantee that your sample R-squared is fairly close to R-squared-adjusted?
Have at least 10 cases in your sample for each independent variable. i.e., if you make "n" equal to at least 10 times "k", the sample R-squared will almost never be unrealistically high - no "over-fitting"
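A short sketch of the shrinkage, using the standard adjusted R-squared formula given above (the R-squared of .60 is an invented sample value):

```python
def adjusted_r_squared(r2, n, k):
    """Standard shrinkage formula: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Few cases per IV: a sample R-squared of .60 shrinks noticeably
print(adjusted_r_squared(0.60, n=20, k=5))   # ~0.46
# n = 10 * k: much less shrinkage
print(adjusted_r_squared(0.60, n=50, k=5))   # ~0.55
```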
How do you use an ANOVA to test hypotheses about R?
After computing sample R, you want to ask, "Is this convincing evidence that the population R is greater than zero?" Or, "Is this convincing evidence that in the population, X1 and X2 are related to Y?"
Null Hyp: Population R = 0
Alt Hyp: Population R not = 0
Obtain the F-ratio for the regression and compare it to a critical value to determine significance.
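A hedged sketch of this test using statsmodels (my choice of library; the data are invented). The fitted OLS results expose the regression F-ratio and its p-value directly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))                       # hypothetical X1 and X2
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=60)

model = sm.OLS(y, sm.add_constant(X)).fit()
# The regression F-ratio tests H0: population R = 0 (all slopes are zero)
print(model.rsquared, model.fvalue, model.f_pvalue)
```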
What is "Part Correlation" or "Semipartial Correlation"?
A correlation used when we are interested only in how much additional explanation we will get if we add another independent variable.
Part Correlation Example:
To assess the incremental validity of a new scale, we ask how much additional predictive power will we gain if we add the scale to an existing test battery.
If we first tried to predict Y using X only, what proportion of the variation in Y could we explain using X alone?
R-squared = (-.160)² = .026; this leaves about .974 of the variation unexplained.
The variable Z can explain all of the remaining variation - Z can explain the remaining .974 of the variation in Y that couldn't be explained by X. So it makes sense that the "Part Correlation" for Z is the square root of .974, or about ±.987
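One way to see the part correlation numerically, as a sketch with invented data and statsmodels: the squared part correlation of Z equals the increase in R-squared when Z is added to the model that already contains X.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)                 # predictor already in the battery
z = rng.normal(size=200)                 # candidate new scale
y = 0.2 * x + 0.8 * z + rng.normal(size=200)

r2_x  = sm.OLS(y, sm.add_constant(x)).fit().rsquared
r2_xz = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit().rsquared

# Squared part (semipartial) correlation of Z = the increment in R-squared
part_r_z = np.sqrt(r2_xz - r2_x)
print(r2_x, r2_xz, part_r_z)
```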
What is Simultaneous (Standard) Multiple Regression
The researcher specifies one set of predictors (IV's) and they are all used to produce the single best multiple regression equation for predicting scores on the criterion (DV).
What is a Hierarchical/Sequential Multiple Regression?
Two or more multiple regressions are run, with an increasing number of IV's.
How is a Hierarchical/Sequential Multiple Regression run?
The order in which IV's are added is determined in advance by the researcher.
A common strategy is to first specify that "nuisance" variables be entered first. Then the researcher can add the actual IV's of interest to see if an increase in prediction (a change in R-squared) is obtained.
Why is a Hierarchical/Sequential Multiple Regression a good way to statistically control for the effect of nuisance variables?
If the IV's produce a strong increase in prediction, after the nuisance variables have already been included in the regression equation, we can be sure that the relationship that is discovered is not really due to the effect of the nuisance variables.
Example: Hierarchical/Sequential Multiple Regression
If we wanted to determine whether stress level and coping will predict illness, once age and SES have been controlled, we would:
1. Run a multiple regression to predict illness from age and SES
2. Then add stress level and coping, and see if the prediction is significantly improved
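A sketch of those two steps in statsmodels (all variables simulated for illustration); compare_f_test gives the F test of whether step 2 improves on step 1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 150
age, ses = rng.normal(size=n), rng.normal(size=n)         # nuisance variables
stress, coping = rng.normal(size=n), rng.normal(size=n)   # IVs of interest
illness = 0.3 * age + 0.4 * stress - 0.3 * coping + rng.normal(size=n)

# Step 1: nuisance variables only
step1 = sm.OLS(illness, sm.add_constant(np.column_stack([age, ses]))).fit()
# Step 2: add the IVs of interest
step2 = sm.OLS(illness,
               sm.add_constant(np.column_stack([age, ses, stress, coping]))).fit()

print(step2.rsquared - step1.rsquared)   # the change in R-squared
print(step2.compare_f_test(step1))       # F test of the improvement: (F, p, df)
```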
What is Stepwise/Statistical Regression?
Two or more multiple regressions are run, with an increasing number of IV's.
How is a Stepwise/Statistical Regression run?
The order in which the IV's are added is determined by statistical criteria.
On each step, the variable is added to the regression equation that contributes most (and significantly) to improving prediction, i.e., produces the largest increase in R-squared.
A variable can also be deleted if it no longer significantly contributes to prediction.
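A minimal forward-selection sketch of this idea (statsmodels again, and the deletion step described above is omitted to keep it short); this is an illustration of the statistical criterion, not a production stepwise routine:

```python
import numpy as np
import statsmodels.api as sm

def forward_select(X, y, alpha=0.05):
    """Naive forward selection: on each step, add the IV that produces the
    largest increase in R-squared, provided the F test of the improvement
    is significant; stop when no candidate qualifies."""
    chosen, remaining = [], list(range(X.shape[1]))
    current = sm.OLS(y, np.ones(len(y))).fit()      # intercept-only start
    while remaining:
        fits = {j: sm.OLS(y, sm.add_constant(X[:, chosen + [j]])).fit()
                for j in remaining}
        best = max(fits, key=lambda j: fits[j].rsquared)
        _, p_value, _ = fits[best].compare_f_test(current)
        if p_value >= alpha:
            break                                    # no significant improvement
        chosen.append(best)
        remaining.remove(best)
        current = fits[best]
    return chosen, current
```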
In multiple regression, what types of IV's can there be?
Interval/ratio or nominal (e.g., religion - nominal IV's must be dummy coded)
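Dummy coding a nominal IV like religion takes one line with pandas (my choice of tool; the categories are made up):

```python
import pandas as pd

religion = pd.Series(["Catholic", "Jewish", "Protestant", "Catholic", "None"])
# drop_first=True drops one reference category, leaving k - 1 dummy IVs
dummies = pd.get_dummies(religion, prefix="religion", drop_first=True)
print(dummies)
```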
What is Multicollinearity (Collinearity)?
If one or more IV's could have been predicted very well using the other IV's, multicollinearity is said to exist, i.e., redundancy or highly correlated predictor variables.
Why is Multicollinearity problematic?
It causes problems in the interpretation of regression coefficients and may make it impossible to carry out the regression.
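A common way to screen for this is the variance inflation factor, sketched here with statsmodels and simulated data in which x2 is nearly a copy of x1:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly redundant with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# A VIF well above 10 is a common warning sign of troublesome collinearity
for i in range(1, X.shape[1]):
    print(f"x{i}: VIF = {variance_inflation_factor(X, i):.1f}")
```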