30 Cards in this Set

Influence Analysis (definition and 4 types)
One data point may have a significantly larger impact on the obtained statistics than the other data points; a single data point may drastically increase or decrease the regression coefficient, t value, F ratio, or other statistics (that one data point is more influential than the others). 4 types: leverage, Cook's D, DFBETA, standardized DFBETA
Leverage
Examines the influence of x scores on the obtained statistics.
All other things being equal, the more an x score deviates from the mean of the x scores, the larger its leverage.
The maximum leverage score is 1.
Higher leverage = more impact on the obtained statistics.
Shortcoming: leverage is only concerned with x scores; in other words, it is only concerned with the IV.
It only allows relative comparisons; it tells you that a point had an impact, not how much.
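A quick sketch with made-up x scores (not the course data set), using the simple-regression leverage formula h = 1/N + (xi − mean of x)2 / SSx, the same bracket that appears later in the SRESID equation:

```python
import numpy as np

# Hypothetical x scores; leverage grows as a score moves away from the mean of x,
# and the maximum possible leverage is 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # 10.0 sits far from the mean
N = len(x)
ss_x = np.sum((x - x.mean()) ** 2)
leverage = 1.0 / N + (x - x.mean()) ** 2 / ss_x

print(leverage)        # the last point has by far the largest leverage
print(leverage.sum())  # leverages sum to the number of estimated parameters (2 here)
```

Note the values only support relative comparisons: the extreme point clearly has more leverage, but the number alone does not say how much it changed the results.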
Cook's D
Examines the impact of x scores, y scores, or both on the obtained statistics; there is no maximum score.
Cook's D can be above 1, but just as with leverage you are only making a relative comparison.
Although both Cook's D and leverage tell us that a data point has an impact, neither says how much the impact is.
Problem: you have to calculate a Cook's D for every single data point.
Leverage and Cook's D are only measures of relative influence; they don't tell us how much a data point is influencing the results.
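Because Cook's D has no fixed cutoff, a sketch like the following (made-up data; the formula is the standard one combining each point's residual with its leverage) simply ranks the points by influence:

```python
import numpy as np

# Hypothetical data; the standard formula is D_i = (e_i^2 / (p*MSE)) * (h_i / (1-h_i)^2),
# where h_i is the point's leverage and p the number of estimated parameters (a and b).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 12.0])   # the last point is suspect

N, p = len(x), 2
X = np.column_stack([np.ones(N), x])            # design matrix with intercept
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
mse = resid @ resid / (N - p)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages (hat-matrix diagonal)
cooks_d = (resid ** 2 / (p * mse)) * (h / (1 - h) ** 2)

print(cooks_d)   # relative comparison only: the largest value is the most influential
```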
DFBETA
Indicates changes to the y-intercept and to the regression coefficient after deleting a single data point.
First calculate the regression coefficients with all of the data and then without the one data point, to see if there is a difference.
All data: y' = 5.05 + .75x
Deleted x=5, y=10: y' = 5.19 + .68x
Y-intercept (old minus new) = 5.05 − 5.19 = −.14 = DFBETAa (difference between the old a (y-intercept) and the new)
Reg. coef. = .75 − .68 = .07 = DFBETAb (difference between the old and new b (slope/reg. coef.))
OLD minus NEW (new = the fit after deleting the data point). Problem with plain DFBETA: it is difficult to know how large a change is, because the values of the y-intercept and regression coefficient are influenced by the scale of measurement in the study (seconds vs. minutes).
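The delete-one-point procedure can be sketched directly; the data below are hypothetical (the card's own coefficients come from a data set not shown here):

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) for y' = a + b*x via least squares."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Made-up data; the last point is the one we suspect and delete.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.9, 4.2, 4.8, 10.0])

a_all, b_all = fit_line(x, y)
keep = np.arange(len(x)) != 4        # delete the suspect data point
a_del, b_del = fit_line(x[keep], y[keep])

dfbeta_a = a_all - a_del             # OLD minus NEW for the y-intercept
dfbeta_b = b_all - b_del             # OLD minus NEW for the slope
print(dfbeta_a, dfbeta_b)
```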
Standardized DFBETA
Standardizing the DFBETAs is the way to compare across different scales of measurement.

Indicates changes to the y-intercept and to the regression coefficient, expressed in standard-error units, after deleting a single data point.
Simple Linear Regression (how many IVs/DVs; what is the equation; what are the symbols for correlation and variance)
One IV, one DV → linear regression. Prediction uses the equation without the e.
y = a + bx + e
y = score on the DV (y score)
a = y-intercept
b = slope/regression coefficient/beta weight
x = score on the IV (x score)
e = error or residual
1 IV, 1 DV
r = the correlation between the one IV and the one DV
r2 = proportion of variance accounted for
Multiple Linear Regression (# of IVs/DVs; correlation/variance)
Two or more IVs, one DV → multiple regression (2+ DVs is multivariate). Prediction uses the equation without the e.
y = a + b1x1 + b2x2 + b3x3 + … + bkxk + e
y = score on the DV (y score)
a = y-intercept
e = error or residual
b1 = regression coefficient of the first IV
x1 = score on the first IV
b2 = regression coefficient of the second IV
x2 = score on the second IV
b3 = regression coefficient of the third IV
x3 = score on the third IV
bk = regression coefficient of the kth IV
xk = score on the kth IV
2 or more IVs, 1 DV
R = the correlation of all of the IVs together with the DV (total correlation)
R2 = proportion of variance accounted for
Prediction
Multiple Regression equation maximizes prediction by optimally weighting each IV via the slope/regression coefficient/beta weight
It allows us to combine information from 2 or more IVs in the best manner possible
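A minimal sketch of that optimal weighting with hypothetical data (x1, x2, and y are made up): least squares picks a, b1, and b2 so that squared prediction error is minimized, which is why the residuals it leaves behind average to zero.

```python
import numpy as np

# Hypothetical two-IV example: the fitted equation y' = a + b1*x1 + b2*x2
# weights each IV so that squared prediction error is as small as possible.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
x2 = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
y  = np.array([5.0, 9.0, 14.0, 18.0, 25.0, 33.0])

X = np.column_stack([np.ones(len(y)), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

y_pred = a + b1 * x1 + b2 * x2   # prediction uses the equation without e
residual = y - y_pred            # e = actual minus predicted
print(a, b1, b2)
```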
Proportion of Explained Variance
A Multiple Regression Equation (MRE) can be used to determine the total variance accounted for by all of the IVs (R2y.12)
Testing for Statistical Significance
The MRE can be used to test the statistical significance of the overall R2.
It can also be used to determine whether each individual IV is significant.
Relative Importance
The MRE can be used to determine the relative importance of each IV in explaining the DV; in other words, it can tell us which IVs are more important.
b
Unstandardized beta weight: use with raw scores and when the research is applied. It is unstandardized and therefore influenced by the scale of measurement (if one study measured inches and another measured feet, the inches study would yield values 12 times larger). So the size of the beta weight is not meaningful; only its significance level is. One cannot draw conclusions about relative importance from unstandardized beta weights: just because one beta weight is twice as large as another doesn't mean it is twice as important. You also can't compare beta weights across studies, because they may use different scales of measurement.
Fancy B
Standardized beta weight: use with standardized scores and when the research is theoretical. When dealing with standardized scores, the mean of the z scores is 0 and the y-intercept is 0.
For standardized scores:
y' = a + b1x1 + b2x2 (raw scores) becomes
zy' = B1z1 + B2z2
zy' = predicted standard score of y
B1 = standardized beta weight of x1
z1 = standardized score of x1
Used to compare across studies.
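One way to see the Fancy B weights (hypothetical data, not the course data set): convert every variable to z scores and refit; the intercept collapses to 0 and the remaining slopes are the standardized beta weights.

```python
import numpy as np

def zscore(v):
    """Standardize a variable: mean 0, SD 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Made-up data; refitting the regression on z scores yields the standardized
# beta weights, and the y-intercept drops to 0.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
x2 = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
y  = np.array([5.0, 9.0, 14.0, 18.0, 25.0, 33.0])

Z = np.column_stack([np.ones(len(y)), zscore(x1), zscore(x2)])
coef, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)
intercept, B1, B2 = coef
print(intercept, B1, B2)   # intercept is 0 (within rounding) on standardized scores
```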
Testing R2
Use the F equation: F = MSreg / MSres = (SSreg / dfreg) / (SSres / dfres)

dfreg = # of IVs (k)
dfres = n − k − 1

It is testing whether the combination of all the IVs is related to the one DV, i.e., whether all the IVs together explain significant variance in the DV. If F (for R2) is significant, it means that the IVs as a set account for a significant portion of variance in the one DV.
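The F ratio can be sketched as a tiny helper. The SS values below are the ones used in the R2 worked example; the sample size is not stated on that card, so n = 20 is an assumed, illustrative value:

```python
def f_for_r2(ss_reg, ss_total, n, k):
    """F = MS_reg / MS_res = (SS_reg/df_reg) / (SS_res/df_res)."""
    ss_res = ss_total - ss_reg
    df_reg = k             # number of IVs
    df_res = n - k - 1
    return (ss_reg / df_reg) / (ss_res / df_res)

# SS_reg = 101.60 and SS_total = 140.55 come from the cards; n = 20 is assumed.
F = f_for_r2(101.60, 140.55, n=20, k=2)
print(round(F, 2))
```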
Squared Multiple Correlation Coefficient
Indicates the total variance of the DV accounted for by ALL of the IVs.

R2 = SSreg / SStotal

SSreg = b1∑x1y + b2∑x2y = .7046(95.05) + (.519)(58.50) = 101.60
SStotal = 140.55

R2 = SSreg / SStotal = 101.60 / 140.55 = .72

An R2 of .72 means that these 2 IVs explain 72% of the variance in the DV.

R will always range between 0.0 and 1.0. Even with a negative correlation between two variables, you still explain a positive percent of variance. Think of it as two circles: if there is a correlation, there is some overlap between the circles, and the overlapping area is positive regardless of the sign of the correlation. With multiple IVs, even if every correlation were negative, the explained variance would still be positive, because squaring a negative r gives a positive r2. Little r can range from −1 to +1; the sign of a correlation only shows direction, while the squared correlation shows the total percent of variance explained.
Testing Regression coefficient
When testing the regression coefficients, we test whether one regression coefficient is significant after controlling for all other IVs.

(Venn diagram, not shown: the DV circle overlaps both IV circles, and the two IV circles also overlap each other.)
The r between an IV and the DV covers that IV's unique overlap with the DV plus the region all three circles share; the b between an IV and the DV covers only that IV's unique overlap; R2 covers both unique overlaps plus the shared region.

The partial slope of IV1 corresponds to IV1's unique overlap with the DV.
The partial slope of IV2 corresponds to IV2's unique overlap with the DV.
R2 = IV1's unique overlap + the shared region + IV2's unique overlap.
Because the shared overlap is removed from each slope, these are called partial regression coefficients.

The more the IVs are correlated with each other, the more difficult it is to get a significant regression coefficient.
Tests of R2 versus tests of b
R2 = do all the IVs together significantly explain variance in the one DV
b = does one IV significantly explain variance in the one DV after controlling for all other IVs
It is entirely possible to obtain a significant R2 yet not obtain a significant regression coefficient, but you can't have a nonsignificant R2 when each regression coefficient is significant.

Testing b is the same as testing B.
Confidence Intervals
Ex.: it usually takes Amanda 45 minutes to get to school, but sometimes 40 and sometimes 60; she is 95 percent sure it will take her anywhere between 40 and 60 minutes.

C.I. = b ± (tcritical)(Sb)
b = beta weight/regression coefficient/slope
tcritical = critical value of t using a 2-tailed test
Sb = standard error of b (gives the expected range, like the 40 to 60 in Amanda's example)
C.I. = .7046 ± (2.11)(.1752) = .3349 and 1.0743 (95% confident)
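Plugging the card's numbers into the formula (pure arithmetic, no assumptions beyond the values given):

```python
# Reproducing the card's 95% confidence interval: C.I. = b ± t_critical * S_b.
b = 0.7046        # regression coefficient
t_crit = 2.11     # two-tailed critical value of t (from the card)
s_b = 0.1752      # standard error of b

lower = b - t_crit * s_b
upper = b + t_crit * s_b
print(round(lower, 4), round(upper, 4))  # 0.3349 and 1.0743
```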
Violations of Assumptions
1. Measurement Error
a. Measurement errors in the DV
Do not bias the regression coefficient in either direction (they don't consistently increase or decrease it), but they do increase error, which leads to less accurate conclusions.
b. Measurement errors in the IV
Do bias the regression coefficient: they underestimate the true regression coefficient. This underestimation is due to the unreliability of the measurement; the more unreliability, the more underestimation.
If your measure has an alpha of .85, it is not perfect; the more unreliability there is, the more a correction for it will increase the value.
2. Specification Errors – "your model is wrong"
Involve the components and/or the relationships in a proposed theoretical model.
These are: omitting important variables, including unimportant variables, or theorizing that the relationship is linear when it is really curvilinear. (Omitting an important variable would be leaving out smoking as a cause of lung cancer; including an unimportant one would be including blue eyes as a cause of lung cancer.)
Regression Model vs. Correlational Model
Regression Model – x values are fixed. If you originally used a 5-point Likert scale, you need to use the same scale in a replication.
Correlational Model – x values can be random; they don't have to be the same. You can have a 5-point scale and then a 7-point scale. It allows any analysis between 2 variables without the scale-of-measurement restriction.
Outliers (definition; are they good or bad; types)
Definition – something that lies or is situated away from, or is classed differently from, a main or related body. In stats, an outlier is a data point that is markedly different from the general trend in the data. Most people think outliers are bad because they drag down the regression coefficient, but the truth is that outliers can increase or decrease the correlation; they are bad because they result in an inaccurate conclusion, an inaccurate estimate of the true relationship between two variables.
1. Outliers can increase the correlation
Outliers can make it appear that there is more of a linear relationship than there is. If a scatterplot looks like a circle and you add an extreme outlier, the cloud takes on a more oval shape, which means a stronger apparent linear relationship and a higher correlation.
2. Outliers can decrease the correlation
If the data are in a fairly linear (oval) shape and there is an extreme outlier, it may make the cloud more circular, which leads to an inaccurate estimate of the relationship between the 2 variables.
Methodological and behavioral
Methodological Outliers
Due to the researcher's mistakes. These include data entry errors (typing 110 instead of 11), instrument malfunction, inconsistent administrations, and measurement errors.
Behavioral Outliers
Outliers due to the actual performance of the subjects. (If you test the effects of alcohol on baseball hitting and randomly choose people, but one of them is on the Xavier baseball team, then even really drunk he may hit better than the average person; he would be an outlier.) Sometimes the outliers are more interesting: how can someone hit 19 out of 20 baseballs while intoxicated? You may want to modify your model to incorporate studying the outlier.
3 ways of detecting outliers
Standardized Residuals - ZRESID
Studentized Residuals - SRESIDs
Studentized Deleted Residuals - SDRESID
ZRESID
ZRESID = residual / standard deviation of the residuals (computed from all the data points, including the alleged outlier)

The residual is the difference between what we predicted and what we actually got: residual = y − y' = 10 − 8.80, so our residual = 1.20.
ZRESID = 1.20 / 2.446 (from the previous section) = .4906
If |ZRESID| ≥ 2, the point is an outlier; if |ZRESID| < 2, it is not.
Conceptually you would do this for every data point and flag any with |ZRESID| > 2, but in practice you just check the suspicious ones, because it is unrealistic to run it for each data point.
ZRESID adjusts the numerator, not the denominator: it assumes all the residuals have the same variance, so the denominator is computed across all data points (all values of x). The problems are that you must calculate it separately for each suspected outlier, and that it assumes equal residual variance everywhere.
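The arithmetic above as a sketch (all values from the card):

```python
# ZRESID divides a point's residual by the standard deviation of ALL the
# residuals; the card's numbers are residual = 1.20 and SD = 2.446.
residual = 10 - 8.80        # y minus y' for the suspect point
sd_residuals = 2.446        # SD of all residuals (from the previous section)

zresid = residual / sd_residuals
print(round(zresid, 4))     # 0.4906
is_outlier = abs(zresid) >= 2
print(is_outlier)           # well under the |2| cutoff, so not an outlier
```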
SRESID
Studentized residuals look at one value of x at a time (e.g., x = 5).
It makes an adjustment to what we just did:
Sei = Sy.x √(1 − [(1/N) + ((xi − mean of X)2 / SSx)])
Sei = std. deviation of the residuals at a particular value of x
Sy.x = std. deviation of the residuals (all of the data points)
xi = x score of the specific data point (the suspected outlier)
N = sample size
mean of X = mean of the x scores

Sei = 2.446 √(1 − [(1/20) + ((5 − 3.0)2 / 40)]) = 2.2551
SRESID = residual / Sei = 1.20 / 2.2551 = .5321
When x = 5 the denominator is always 2.2551, but for another value of x you would have to compute a different denominator, whereas in ZRESID the denominator is always the same. SRESID is a more sensitive, more precise measure.
To determine if a value is an outlier:
If |SRESID| is greater than the critical value of t, it is an outlier. In this case our N is 20 and our df is 18, so the critical value is 2.101.
Problem: again, the alleged outlier is included in the denominator, and having it in there makes it more difficult to detect the outlier.
SDRESID
A shortcoming of both SRESIDs and ZRESIDs is that the denominator includes the suspected outlier. Including the outlier makes it harder to detect: the outlier adds variance, the sum of squares increases, and you are less likely to flag it.
Ex.: if Sei = 3, then 1.20/3 = .40; if Sei = 2, then 1.20/2 = .60 — a smaller denominator makes the residual stand out more.
So we base the standard deviation on 19 data points (instead of 20) and then make the same adjustment.
If we delete that data point and compute the standard deviation of the residuals from the 19 remaining points, the value is no longer 2.446; instead it is 2.497.
Everything else is the exact same equation as the SRESID; you still use 20 for N.
Sei = 2.497 √(1 − [(1/20) + ((5 − 3.0)2 / 40)]) = 2.302
SDRESID = residual / Sei = 1.20 / 2.302 = .5213
You still compare against the critical value of t to decide whether this is an outlier (df = n − k − 1; look up t at .05, and if the absolute value exceeds the critical value you have an outlier).
This one is the most sensitive because it does not include the outlier.
You use the same look-up steps for SDRESID as for SRESID.
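Both computations in one sketch (all numbers from the cards: residual 1.20, N = 20, xi = 5, mean of x = 3.0, SSx = 40, and residual SDs of 2.446 with the suspect point and 2.497 without it):

```python
import math

# SRESID and SDRESID divide the same residual by a standard error adjusted for
# the point's position in x; SDRESID additionally recomputes the residual SD
# with the suspect point deleted.
residual = 1.20
n, x_i, x_bar, ss_x = 20, 5.0, 3.0, 40.0
adjust = math.sqrt(1 - (1 / n + (x_i - x_bar) ** 2 / ss_x))

se_with = 2.446 * adjust      # SD of residuals from all 20 points -> 2.2551
se_without = 2.497 * adjust   # SD recomputed from the 19 remaining points -> 2.302

sresid = residual / se_with
sdresid = residual / se_without
print(round(sresid, 4), round(sdresid, 4))  # 0.5321 and 0.5213
```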
Summing the detecting outliers
ZRESID = .4906 (residual over the SD of all data points; outlier if the value is greater than |2|)
SRESID = .5321 (residual at each level of x (here x = 5); outlier determined by t critical)
SDRESID = .5213 (deletes the data point under scrutiny; like SRESID but excludes the potential outlier; best, because it is most likely to identify a data point as an outlier when it really is one; outlier determined by t critical)
Random Information
No relationship between x1 and x2:
rx1y = .20 and rx2y = .10
When the two IVs are uncorrelated, the squared correlations simply add: R2 = r2x1y + r2x2y (= .04 + .01 = .05).

When x1 and x2 are related (their r does not equal 0):

(Venn diagram, not shown: the x1 and x2 circles overlap each other as well as the y circle.)
r2x1y = where x1 and y overlap
r2x2y = where x2 and y overlap
Both include the region where x1, x2, and y all overlap, but you should only count that overlap once, so
R2 < r2x1y + r2x2y
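A quick check of the uncorrelated case, using deterministic made-up data chosen so that x1 and x2 are exactly uncorrelated; R2 then equals the sum of the two squared correlations with the DV:

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two variables."""
    return np.corrcoef(a, b)[0, 1]

# x1 and x2 are orthogonal by construction (their correlation is exactly 0).
x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0])
y  = np.array([3.0, 1.0, 0.5, -0.5])

r1, r2 = corr(x1, y), corr(x2, y)

X = np.column_stack([np.ones(4), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ coef
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r_squared, 6), round(r1 ** 2 + r2 ** 2, 6))  # the two values match
```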
Notation
A note regarding Notation
2 variables = R2x1x2 or R2y.12 → 1 DV, 2 IVs (the first 2)
5 variables = R2x1x2x3x4x5 or R2y.12345 → 1 DV, 5 IVs
Only the 1st and 3rd IVs = R2y.13 → 1 DV, first IV and third IV