68 Cards in this Set

  • Front
  • Back
Regression is best used for?
Prediction
The coefficient of determination is?
The percentage of variance in Y accounted for by X. Normally referred to as R squared.
After examining data you see that the error variance is not independent. You have?
Violated an Assumption.
What is the regression equation?
Y = a + bX
What is a mediator?
Explains the relationship between an IV and a DV.
In a path analysis, what is a recursive model?
Causal linkages run one way, with no feedback loops or reciprocal paths.
In a path analysis, what is a fully recursive model?
There is a direct link between each variable and all variables further down the causal chain.
What indices would you look for to find multicollinearity?
The bivariate correlations between all pairs of predictors.
How many dummy codes would you need if you have an IV with 5 levels and you are using categorical variables?
One fewer than the number of categories in the original scale, so 4 dummy codes for 5 levels.
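As a rough illustration of the k - 1 rule, the sketch below dummy-codes a made-up five-level IV. The function name `dummy_code`, the level labels, and the choice of reference category are assumptions for this example only.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per level, skipping the reference level."""
    levels = sorted(set(values))
    kept = [level for level in levels if level != reference]
    return {level: [1 if v == level else 0 for v in values] for level in kept}


# Hypothetical IV with 5 levels; "a" is treated as the reference category.
groups = ["a", "b", "c", "d", "e", "a", "c"]
codes = dummy_code(groups, reference="a")

print(len(codes))   # number of dummy codes: 4
print(codes["c"])   # 1 wherever the case belongs to level "c"
```

Cases in the reference category are identified by having 0 on every dummy code, which is why the fifth column is not needed.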
What is the definition of Multiple Regression?
A linear relationship between a set of predictors and a single criterion. The analysis also determines the best set of predictors for that criterion.
Multiple Correlation Coefficient
Indicates the degree of linear relationship
General Linear Model (GLM)
An assumption that the relationship between pairs of variables is linear and that Y = a + bX can describe the relationship.
Bivariate Regression Correlation (BRC)
The relationship between two variables and their prediction upon each other.
Correlation
Association between two variables that takes on a value between -1 and 1. A correlation is not a causal relationship.
What is a Pearson correlation?
A correlation where both variables are continuous and on at least an interval scale of measurement
What is a beta weight?
Represents the changes in the dependent variable associated with each change of one unit in the independent variable.
What is an Omnibus Test?
An F test. It determines whether any group mean is significantly different from any other group mean.
What is Multicollinearity?
Intercorrelations among predictors in a MRC analysis.
What is an intercept?
The value of the criterion when all predictors are 0. Signified by the a in the equation.
What is the slope?
The rate of change of y as x changes.
Coefficient of multiple determination
The proportion of variance in the criterion that is shared by the combination of predictors in an MR analysis. Written as R squared.
What is a path analysis?
An extension of multiple regression where there is more than one dependent variable. The concern is with the predictive ordering of variables. X causes Y and Y causes
What is an Endogenous Variable?
A variable that is affected by one or more variables pointing to it. One or more straight arrows point towards it.
What is a stepwise multiple regression?
A series of simultaneous MR analyses. At each step a new predictor is either added (F) or subtracted (B). This decision is based on empirical analysis.
b weights
Partial regression coefficients
What is the concept that is carried out by calculating an F-ratio?
By calculating an F-ratio, the numerator gives a single number that describes how big the differences are among all of the sample means (b/w). The variance in the denominator measures the mean differences that would be expected if there is no treatment effect (w/n).
What advantage does the ANOVA have over the t-test?
T tests are limited to situations in which there are only two treatments to compare. The major advantage of ANOVA is that it can be used to compare two or more treatments. *This provides researchers with greater flexibility in designing experiments*
What is a factor? What are levels of the factor?
In analysis of variance, a factor is the variable (independent or quasi-independent) that designates the groups being compared. -Levels are the individual conditions or values that make up a factor (e.g., a study that examined performance under 3 different temperature conditions would have 3 levels of temperature).
What is it called when a study combines two factors?
It is called a two-factor design or a factorial design.
What do the ANOVA and the t-test have in common?
-Both tests use sample data to test hypotheses about population means. -The basic relationship between t statistics and F-ratios can be stated in the equation F = t^2; the variances in the F-ratio are just measures of squared distance. -With only two treatments, the hypotheses for either test are H0: M1 = M2 and H1: M1 is not equal to M2. -The df for the t statistic and the df for the denominator of the F-ratio are identical (N - k). -The distributions of t and F-ratios match perfectly: if the t values are squared, creating all positive values, the result is a positively skewed distribution, which is the F distribution. -However, the F-ratio is based on variance instead of the sample mean difference used for the t-test.
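The F = t^2 relationship can be checked numerically. The sketch below uses two tiny made-up samples and hypothetical helper names (`pooled_t`, `anova_f`); it assumes an independent-measures t-test with pooled variance.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def ss(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(g1, g2):
    """Independent-measures t statistic using the pooled variance."""
    df = len(g1) + len(g2) - 2
    pooled_var = (ss(g1) + ss(g2)) / df
    se = sqrt(pooled_var / len(g1) + pooled_var / len(g2))
    return (mean(g1) - mean(g2)) / se


def anova_f(groups):
    """One-way ANOVA F-ratio: MS between divided by MS within."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(ss(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)   # N - k
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up data for two treatments.
g1 = [1, 2, 3]
g2 = [3, 4, 5]
t = pooled_t(g1, g2)
f = anova_f([g1, g2])
print(round(t ** 2, 6), round(f, 6))   # both round to 6.0
```

Note that both statistics use df = N - k = 4 in the denominator here, matching the card.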
Is the F-value related to the t-score? How?
Yes. In both cases, the numerator of the ratio measures the actual difference obtained from the sample data, and the denominator measures the difference that would be expected if there were no treatment effect. -With both, a large value provides evidence that the sample mean difference is more than would be expected by chance alone.
What does the MSwn represent theoretically and computationally?
-MSwn measures the variance of scores inside each treatment. (This is the denominator of the F-ratio, so the larger the sample variances, the smaller the F-ratio.) -MSwn = SSwn/dfwn
What does the MSbw represent theoretically and computationally?
-This is the numerator of the F-ratio and it measures how much difference exists between the treatment means. (The bigger the mean differences, the bigger the F-ratio.) -MSbw = SSbw/dfbw
What number will the F-value be close to when there is no treatment effect?
It will be close to 1.
What is the basic formula that is used over and over again when performing ANOVA?
The sum of squares. SS
What does the letter k represent in ANOVA?
This is used to identify the number of treatment conditions, the number of levels of the factor.
How do you calculate SStotal?
SStotal = summation of X^2 - (G^2)/N
What are two ways of calculating SSb/w?
1. SSb/w = SStotal- SSw/n 2. SSb/w= summation of (T^2/n) - (G^2/N)
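A minimal sketch of these computational formulas, using made-up scores for three treatments. Here G is the grand total, N the total number of scores, and T each treatment total, as on the cards; the function name `sums_of_squares` is an assumption for the example.

```python
def sums_of_squares(groups):
    """Return (SS total, SS between, SS within) from the computational formulas."""
    scores = [x for g in groups for x in g]
    N = len(scores)
    G = sum(scores)
    ss_total = sum(x * x for x in scores) - G ** 2 / N
    # SS between = summation of (T^2 / n) - G^2 / N
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
    # SS within = SS computed inside each treatment, then added up
    ss_within = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups)
    return ss_total, ss_between, ss_within


# Made-up data: k = 3 treatments, n = 3 scores each.
groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
ss_total, ss_between, ss_within = sums_of_squares(groups)

k = len(groups)
N = sum(len(g) for g in groups)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_ratio = ms_between / ms_within
print(ss_total, ss_between, ss_within, f_ratio)   # 30.0 24.0 6.0 12.0
```

The first formula on the card (SSb/w = SStotal - SSw/n) holds here as well: 30 - 6 = 24.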
Are F values positive or negative?
The F ratio is composed of two variances so F values are always positive. This is b/c variances are always positive.
What is the shape of an F distribution?
The shape depends on the df of the two variances in the F-ratio. -Very large df values will cause nearly all of the F-ratios to be clustered near 1.00, while smaller df values cause the F distribution to be more spread out.
What do you know and not know when your ANOVA reveals a significant difference across groups? What do you do next?
When there is a significant difference across groups, the difference in the sample data is larger than expected by chance. However, you still don't know how large the effect is. -What you would do next is measure the effect size with r^2 (r^2 = SSbw/SStotal). This is also known as the percentage of variance accounted for by the treatment effect. *It is also called η^2 or eta squared.
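A very small sketch of the effect-size calculation. The SS values are made up, and the function name `eta_squared` is an assumption for the example.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of variance accounted for by the treatment effect."""
    return ss_between / ss_total


# Hypothetical ANOVA results: SS between = 24, SS total = 30.
effect = eta_squared(ss_between=24, ss_total=30)
print(effect)   # 0.8, i.e. 80% of the variance is accounted for
```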
What is a limitation of ANOVA's F-ratio?
The F-ratio only tells you that a significant difference exists; it doesn't tell you which means are significantly different and which aren't.
What is the purpose of a Post hoc test?
These are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not.
What is experimentwise alpha level?
This is the overall probability of a Type I error that accumulates over a series of separate hypothesis tests. -This value is greater than the value of alpha used for any one of the individual tests.
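To see how quickly the error rate accumulates, here is a hedged sketch. It assumes the separate tests are independent, in which case the experimentwise rate is 1 - (1 - alpha)^c for c tests; that formula and the function names are not from the card.

```python
def pairwise_comparisons(k):
    """Number of pairwise comparisons among k treatment means."""
    return k * (k - 1) // 2


def experimentwise_alpha(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


c = pairwise_comparisons(3)              # 3 treatments -> 3 comparisons
overall = experimentwise_alpha(0.05, c)
print(c, round(overall, 4))              # 3 0.1426
```

With real pairwise tests on the same data the comparisons are not fully independent, so this should be read as an approximation.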
What is the function of Tukey's HSD Test?
Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance. -If the mean difference exceeds Tukey's HSD, then you can conclude that there is a significant difference between treatments. HSD = q multiplied by sqrt(MSwn/n)
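A sketch of the HSD calculation under stated assumptions: equal n in every treatment, and a q value that you have already looked up in a studentized range table. The q of 3.5, the treatment means, and the function names below are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q, ms_within, n):
    """HSD = q * sqrt(MSwn / n), with n scores per treatment."""
    return q * sqrt(ms_within / n)


def significant_pairs(means, hsd):
    """Return the pairs of treatments whose mean difference exceeds the HSD."""
    return [
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) > hsd
    ]


# Hypothetical treatment means; q is a placeholder, not a table value.
means = {"A": 2.0, "B": 4.0, "C": 7.0}
hsd = tukey_hsd(q=3.5, ms_within=4.0, n=4)   # 3.5 * sqrt(4 / 4) = 3.5
print(hsd)                                   # 3.5
print(significant_pairs(means, hsd))         # [('A', 'C')]
```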
What are the assumptions behind the ANOVA?
1. The observations within each sample must be independent. 2. The populations from which the samples are selected must be normal. 3. The populations from which the samples are selected must have equal variances (homogeneity of variances).
What is the difference between what we want to know for t-tests/ANOVA and correlation and regression?
Correlation and regression are used to measure the relationship between two variables. Not whether one has a direct effect on another.
What are the three characteristics of the relationship between x and y that correlation describes and measures?
1. The direction of the relationship. 2. The form of the relationship. 3. The strength or consistency of the relationship.
What are the uses of correlation?
1. One variable can be used to make accurate predictions about another variable. 2. Could be used to demonstrate the validity of a test. 3. One way to evaluate reliability is to use correlations to determine the relationship between two sets of measurements. When the reliability is high, the correlation between the two measurements should be strong and positive. 4. Can be used for theory verification.
What is the Pearson correlation? What features of a relationship does it represent?
The Pearson correlation measures the degree and direction of the linear relationship between two variables. -It is identified by the letter r. -r represents the covariability of x and y in the numerator and the variability of x and y separately in the denominator. -It can also be represented as r = SP / sqrt(SSx * SSy).
How is the sum of products (SP) calculated? What is its purpose?
It can be calculated by SP = sum of (X - Mx)(Y - My) or SP = sum of XY - (sum of X)(sum of Y)/n. -SP is similar to SS, but instead of measuring variability for a single variable it measures the amount of covariability between two variables.
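Both SP formulas and the Pearson r can be sketched as follows. The five (X, Y) pairs are made up, and the function names are assumptions for the example.

```python
from math import sqrt


def sp_definitional(xs, ys):
    """SP = sum of (X - Mx)(Y - My)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = sum of XY - (sum of X)(sum of Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy); SS is just SP of a variable with itself."""
    ss_x = sp_definitional(xs, xs)
    ss_y = sp_definitional(ys, ys)
    return sp_definitional(xs, ys) / sqrt(ss_x * ss_y)


# Made-up scores.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(sp_definitional(X, Y), sp_computational(X, Y))   # 6.0 6.0
r = pearson_r(X, Y)
print(round(r, 4))                                     # 0.7746
```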
What statistic do we use to describe the strength of a relationship?
We use the value r^2, also known as the coefficient of determination.
In what situations might the r value indicate a strong correlation when one does not exist?
If an outlier is present.
In what situations might the r value indicate no relationship when in fact there is one?
When the data are limited to a restricted range.
If we have an r value from a set of data, we should limit our predictions to what range?
We should not generalize any correlation beyond the range of data represented in the sample.
How do you use correlation in the context of hypothesis testing?
-First you state the null and alternative hypotheses. -Then you find the critical value for your n and alpha in table B.6. -Then you compute your r value. -Then you reject or fail to reject the null hypothesis.
When finding a significant correlation we are limited to what conclusions?
We are limited to concluding that there is a significant positive correlation in the population.
How do you obtain a correlation coefficient when you have categories as one set of variables?
By using the Spearman correlation. -This can be used to measure the consistency of the relationship, independent of its form. To do this, you simply convert the scores to ranks and then use the Pearson correlation formula to measure the linear relationship for the ranked data. *After the X and Y values are ranked, you can use a special formula to determine the Spearman correlation: rs = 1 - 6(sum of D^2) / [n(n^2 - 1)], where D is the difference between the X rank and the Y rank for each individual.
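A minimal sketch of the ranking step and the special formula. It assumes there are no tied scores (the shortcut formula is exact only in that case); the data and function names are made up.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n. Assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """rs = 1 - 6 * (sum of D^2) / (n * (n^2 - 1)), D = rank difference."""
    n = len(xs)
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up scores with no ties.
X = [10, 20, 30, 40, 50]
Y = [1, 3, 2, 5, 4]
rs = spearman_rs(X, Y)
print(round(rs, 4))   # 0.8
```

Here the rank differences are 0, -1, 1, -1, 1, so the sum of D^2 is 4 and rs = 1 - 24/120.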
What correlation coefficient can you use to suppress the effects of outliers? Also what is the function of this?
The point-biserial correlation. -It is used to measure the relationship between two variables in situations where one variable consists of numerical scores but the second has only two values. The second variable is also known as a dichotomous variable.
How is the Point-biserial correlation computed?
The dichotomous variable is first converted to numerical values by assigning a value of zero (0) to one category and a value of one (1) to the other category. Then the regular Pearson correlation formula is used with the converted data.
What is the phi-coefficient used for?
It is used when both variables are dichotomous. It can be computed by assigning a 0 to one category and a 1 to the other for each variable. Then use the regular Pearson formula with the converted scores.
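The point-biserial correlation and the phi-coefficient both come down to 0/1 coding followed by the ordinary Pearson formula, which can be sketched like this. The category labels, scores, and function names are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Regular Pearson formula: r = SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def code_01(labels, one_label):
    """Convert a dichotomous variable to 0/1 scores."""
    return [1 if label == one_label else 0 for label in labels]


# Point-biserial: one dichotomous variable, one numerical variable.
group = ["control", "control", "control", "treated", "treated", "treated"]
score = [1, 2, 3, 3, 4, 5]
r_pb = pearson_r(code_01(group, "treated"), score)

# Phi-coefficient: both variables dichotomous.
sex = ["m", "m", "f", "f"]
passed = ["no", "yes", "yes", "yes"]
phi = pearson_r(code_01(sex, "f"), code_01(passed, "yes"))

print(round(r_pb, 4), round(phi, 4))   # 0.7746 0.5774
```

Which category gets the 1 is arbitrary; swapping the codes only flips the sign of the correlation.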
What is the purpose of linear regression?
To find the best fitting straight line for a set of data.
What is the least squared error solution?
This defines the best fitting line as the one that has the smallest total squared error. -It can be found by taking the difference between the actual Y values and the predicted Y values, which measures the error between the line and the actual data. You then square these errors to get all positive values and then sum them. This gives you the total squared error.
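The idea can be sketched by computing the total squared error for two candidate lines on the same made-up data. The values a = 2.2 and b = 0.6 are the least-squares intercept and slope for this particular data set (from b = SP/SSx and a = My - bMx); the other line is an arbitrary competitor chosen for illustration.

```python
def total_squared_error(xs, ys, a, b):
    """Sum of (actual Y - predicted Y)^2 for the line Y-hat = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Made-up scores.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

best = total_squared_error(X, Y, a=2.2, b=0.6)    # least-squares line
other = total_squared_error(X, Y, a=2.0, b=0.5)   # arbitrary competing line
print(round(best, 4), round(other, 4))            # 2.4 3.75
```

Any line other than the least-squares one gives a larger total for this data.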
What is the formula for predicting y when x is a particular value?
That is the regression equation for Y: Y(hat) = bX + a. -This equation gives the least squared error between the data points and the line.
What does the slope represent?
The slope is b. It is computed by b= SP/SSx
What is the Y-intercept?
This is represented as a. It is computed by a= My - bMx.
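Putting the slope and intercept formulas together, a small sketch on made-up data. The function names `fit_line` and `predict` are assumptions for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) for Y-hat = bX + a using b = SP/SSx and a = My - b*Mx."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx
    return a, b


def predict(x, a, b):
    """Predicted Y for a particular X value."""
    return b * x + a


# Made-up scores: SP = 6, SSx = 10, Mx = 3, My = 4.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = fit_line(X, Y)
print(round(a, 4), round(b, 4))       # 2.2 0.6
print(round(predict(4, a, b), 4))     # 4.6
```

The prediction is made at X = 4, inside the range of the sample data, in keeping with the card about not generalizing beyond that range.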
How do we estimate the error between our regression formula and the actual data?
We use the Standard error of estimate!! -this gives a measure of the standard distance between a regression line and the actual data points. - Standard error of estimate= sqrt(SSresidual/df)
What is a residual?
It is the difference of the actual y values and the predicted ones. -SSresidual measures unpredicted variability. SSresidual= (1-r^2)SSy
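A sketch tying the residual and standard error cards together on made-up data. It assumes df = n - 2, the usual value for a regression with one predictor; the card itself only says "df". All names are hypothetical.

```python
from math import sqrt

# Made-up scores.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mx = sum(X) / n
my = sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

b = sp / ss_x            # slope
a = my - b * mx          # Y-intercept
r_squared = sp ** 2 / (ss_x * ss_y)

# Residual = actual Y - predicted Y, one per data point.
residuals = [y - (b * x + a) for x, y in zip(X, Y)]
ss_residual_direct = sum(e ** 2 for e in residuals)
ss_residual_shortcut = (1 - r_squared) * ss_y

df = n - 2               # assumption: one predictor
std_error_estimate = sqrt(ss_residual_direct / df)

print(round(ss_residual_direct, 4), round(ss_residual_shortcut, 4))   # 2.4 2.4
print(round(std_error_estimate, 4))                                   # 0.8944
```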