97 Cards in this Set

• Front
• Back
response variable | y
explanatory variable | x
population regression line | μy = β0 + β1x
sample/model regression line equation | yhat = b0 + b1x
b0 | intercept
b1 | slope
subpopulation | the normal curve of responses around each different value of x
μy | the actual average response for y given a value of x
the standard deviation of the responses is the same for all values of x because | each x value has its own subpopulation on a normal curve, and the model assumes they all share one spread
ei | residual for a given x: observed - predicted, or yi - yhat_i (yhat_i comes from the model regression equation)
s^2 | is the variance (thus s is the SD)
s^2 = | sum((yi - yhat_i)^2) / (n-2); you will not have to calculate this, as it is given in summary fit
in summary fit the "residual standard error" is | the same as "s", the standard deviation
when we estimate the standard error using "s" we move to | the t dist.
how many df does the t dist have | n-2
list 4 different types of confidence intervals for t distributions in simple regression | b1 ± t*SE(b1); b0 ± t*SE(b0); mean response μy ± t*SE(μy) (formula and R code); prediction interval yhat ± t*SE(yhat)
critical t in R | qt(.95, n-2) (the upper 5% point, which fits a 90% two-sided interval; a 95% interval uses qt(.975, n-2))
regular (not R) formula for SE of mean response | s*sqrt(1/n + (xstar - xbar)^2 / sum((xi - xbar)^2))
C level prediction interval SE formula (interval for yhat) | s*sqrt(1 + 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2))
why does a small p value for a large n not mean there is a strong corr. | because a large n helps us estimate the true mean, but it does not reduce scatter
formula for SSE | sum((yi - yhat_i)^2); deviation of the observations from the model
formula for SSM | sum((yhat_i - ybar)^2); deviation of the model from the mean
SST | sum((yi - ybar)^2); total deviation of the observations from the mean
if Ho: b1 = 0 were true | y would not depend linearly on x, so the variation of y about the line, s^2 (MSE), would be about the same as MST
t dist. DFM | 1
t dist. DFE | n-2
t dist. DFT | n-1
mean square | SS/df
list formulas for MSM, MSE, MST | SSM/DFM; SSE/DFE; SST/DFT
r^2 = | SSM/SST
F = (using ANOVA variables) | MSM/MSE
t^2 | F
how to get t | b1/SE(b1) (given in summary fit)
rho (ρ) is what | population correlation (r is the sample correlation)
if ρ = 0 | then there is no linear relation; for jointly normal x and y this means they are independent
sig test for rho | is found using the t value just like for b1, but watch out, as sometimes we only want a one-sided test for rho
in multiple linear regression p is | the number of explanatory variables
in multiple linear regression I (capital i, or j) is | a specific explanatory variable
in multiple linear regression i is | one case for one explanatory variable
in multiple linear regression n is | the total number of values i
in multiple linear regression (one way ANOVA) N is | the total number of n's: the sum of the n's for each explanatory variable
multiple regression format | yhat = b0 + b1xij......
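The simple-regression cards above (SSE, SSM, SST, the mean squares, F = t^2, and the slope interval) can be checked numerically. The following is a minimal sketch in R; the x and y values are made-up toy data and all variable names are illustrative, not taken from the card set.

```r
# Hypothetical toy data (not from the card set)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
n <- length(x)

fit  <- lm(y ~ x)
yhat <- fitted(fit)

# Sums of squares as defined on the cards
sse <- sum((y - yhat)^2)        # observations vs. model
ssm <- sum((yhat - mean(y))^2)  # model vs. mean
sst <- sum((y - mean(y))^2)     # observations vs. mean

# Degrees of freedom and mean squares for simple regression
dfm <- 1
dfe <- n - 2
msm <- ssm / dfm
mse <- sse / dfe
s   <- sqrt(mse)                # the "residual standard error"

f_stat <- msm / mse
r2     <- ssm / sst

# t for the slope: b1 / SE(b1), both read from summary(fit)
coefs <- summary(fit)$coefficients
t_b1  <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# 95% confidence interval for the slope: b1 +/- t* x SE(b1)
t_star <- qt(0.975, dfe)
ci_b1  <- coefs["x", "Estimate"] + c(-1, 1) * t_star * coefs["x", "Std. Error"]
```

Printing `f_stat` next to `t_b1^2`, or `s` next to the residual standard error in `summary(fit)`, should show matching values.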
in multiple linear regression, explain the totals of I, j, p, i, n and N | sum of j/I = p; sum of i = n; sum of n = N; remember N observations and p explanatory variables
sig test for b0 in multiple regression | same bj/SE(bj) given in summary fit, but use pt(t value, df = n-p-1), not just n-2
hyp test for multiple regression: hypotheses | Ho: all slopes = 0; Ha: at least one slope is not 0
to execute a hyp test for multiple regression use the ___ test stat | F
in multiple regression DFM is | p
in multiple regression DFE is | n-p-1
in multiple regression DFT is | n-1 (like in simple regression)
how to get the F stat | MSM/MSE
sqrt(R^2) | the multiple correlation coefficient: the correlation between yi and yhat_i
why does a low p value not imply a large R^2 | a statistically significant result is very different from a large effect of x on y
factor | categorical explanatory variable
again, how to find F | MSM/MSE; t^2 is F
sig test for rho (ρ) involves | Ho: ρ = 0 against Ha: ρ not = 0, using a t test (if ρ = 0 there is no linear relation; for jointly normal x and y they are independent) (sometimes this test will be one sided)
how to find the p value for t | 2*(1-pt(abs(tstat), df)), with df = n-2 for simple regression and n-p-1 when testing one coefficient out of a multiple regression (when testing several coefficients at once use the F test)
what is a one way ANOVA | compares several population means: a single quantitative response variable compared across several populations
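The multiple-regression cards above (DFM = p, DFE = n-p-1, F = MSM/MSE, the two-sided p value for t, and sqrt(R^2)) can be sketched in R as follows. The data are simulated purely for illustration, and the names x1, x2, p_f and p_t are assumptions of this sketch rather than anything defined in the card set.

```r
# Hypothetical simulated data with p = 2 explanatory variables
set.seed(1)
n  <- 30
p  <- 2
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)
sm   <- summary(fit)
yhat <- fitted(fit)

# ANOVA quantities with the multiple-regression degrees of freedom
ssm <- sum((yhat - mean(y))^2)
sse <- sum((y - yhat)^2)
dfm <- p
dfe <- n - p - 1

# F test of Ho: all slopes = 0 (upper-tail area of the F distribution)
f_stat <- (ssm / dfm) / (sse / dfe)
p_f    <- 1 - pf(f_stat, dfm, dfe)

# t test for a single coefficient: bj / SE(bj), two-sided, df = n - p - 1
t_x2 <- sm$coefficients["x2", "Estimate"] / sm$coefficients["x2", "Std. Error"]
p_t  <- 2 * (1 - pt(abs(t_x2), dfe))

# Multiple correlation coefficient: correlation between y and yhat
R <- sqrt(sm$r.squared)
```

The hand-computed `f_stat` and `p_t` should agree with the F-statistic line and the `Pr(>|t|)` column that `summary(fit)` prints.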
2 sample t test formula (both sample sizes the same n) for an individual pair of means | t_ij = sqrt(n/2) * (x1bar - x2bar) / sp, where sp = sqrt(MSE)
2 sample t test formula (both sample sizes the same n) for all means | t_ij = sqrt(n/2) * (x1bar - x2bar) / sp, where sp = sqrt(MSE); with two groups, take t and make it t^2 for F
R code to run an F test for 1 way ANOVA | juice <- data.frame(vector, vector); juice <- stack(juice); names(juice); oneway.test(values ~ ind, data = juice, var.equal = TRUE); fit <- anova(lm(values ~ ind, data = juice))
t test for one way ANOVA | t.test(vector, vector, alternative = c("two.sided"), mu = 0, paired = FALSE, var.equal = TRUE, conf.level = .95)
hypotheses in one way ANOVA tests | Ho: all means are equal; Ha: not all means are equal
in order to assume equal SDs in one way ANOVA | the largest sd must be no more than twice the smallest (no more than 100% larger)
in case sqrt(MSE) is too much trouble for finding sp, the manual way to find sp^2 is | ((n1-1)*s1^2 + (n2-1)*s2^2 + ......) / ((n1-1) + (n2-1) + ......); s can be found in R with the command "sd": sd(vector)
I | number of populations to be compared
one way ANOVA hypotheses | Ho: μ1 = μ2 = μ3 .....; Ha: not all μi are equal
one way ANOVA DFT | N-1
one way ANOVA DFG (group) | I-1
one way ANOVA DFE | N-I
one way ANOVA R^2 | SSG/SST
to find the value of a residual given an x and y | plug x into the model and take the difference between the y that was given and what the model predicts (observed - predicted)
R code for getting to summary fit | set the x and y vectors; fit <- lm(y ~ x) (can be + x2 + x3 ...); summary(fit)
how to find the p value for F | 1 - pf(f, DFM, DFE); DFM and DFE vary depending on whether this is an F test of multiple regression or a 1 way ANOVA test
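The one-way ANOVA cards above (the equal-SD rule of thumb, the pooled variance sp^2, and F = MSG/MSE with DFG = I-1 and DFE = N-I) can be worked by hand and compared with R's own table. This is a minimal sketch; the three groups a, b and c3 are made-up numbers, and the data frame name `juice` is borrowed from the card only for continuity.

```r
# Hypothetical data: three groups of five observations each
a  <- c(10, 12, 11, 13, 14)
b  <- c(15, 14, 16, 17, 13)
c3 <- c(20, 19, 21, 18, 22)
juice <- stack(data.frame(a, b, c3))   # columns: values, ind

I <- nlevels(juice$ind)                # number of populations compared
N <- nrow(juice)                       # total number of observations

group_means <- tapply(juice$values, juice$ind, mean)
group_sds   <- tapply(juice$values, juice$ind, sd)
group_n     <- tapply(juice$values, juice$ind, length)

# Rule of thumb: largest sd no more than twice the smallest
sd_ok <- max(group_sds) / min(group_sds) <= 2

# Pooled variance sp^2, the manual way
sp2 <- sum((group_n - 1) * group_sds^2) / sum(group_n - 1)

# Sums of squares, mean squares and F
ssg <- sum(group_n * (group_means - mean(juice$values))^2)
sse <- sum((group_n - 1) * group_sds^2)
msg <- ssg / (I - 1)
mse <- sse / (N - I)
f_stat  <- msg / mse
p_value <- 1 - pf(f_stat, I - 1, N - I)

# R's own ANOVA table for comparison
tab <- anova(lm(values ~ ind, data = juice))
```

Here `sp2` and `mse` are the same quantity computed two ways, which is why the cards say sp = sqrt(MSE).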
how to get the ANOVA chart in R | anova(fit)
in a hyp test conclusion: | restate the hypotheses; restate the variables
how to conclude if we reject the null in a multiple regression F test | if we reject the null, the data give strong evidence that the slopes are not all zero, so at least one x influences y and the model is useful
1-r^2 | the proportion of the variation in y not explained by x
if the p variables outnumber the N observations | there could be a lot of conflict
in 1 way ANOVA DFG is | I-1
in 1 way ANOVA DFE is | N-I
in 1 way ANOVA DFT is | N-1
in 1 way ANOVA MSG is | SSG/DFG
in 1 way ANOVA MSE is | SSE/DFE
in 1 way ANOVA F is | MSG/MSE
as DF gets bigger for the same groups | the resulting p values get smaller, because less variability is needed for the result to be statistically significant
if asked "what are the df for the ANOVA F stat" | list both numerator and denominator, DFG and DFE, but default to DFE
R code for SE of mean response | vx <- var(x), and s is given in summary fit; se <- s*sqrt((1/length(x)) + ((xstar-xbar)^2/((length(x)-1)*vx)))
SE for prediction interval for yhat | vx <- var(x), and s is given in summary fit; se <- s*sqrt(1 + (1/length(x)) + ((xstar-xbar)^2/((length(x)-1)*vx)))
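The two standard-error cards above can be run end to end and compared against R's built-in `predict()`. This is a minimal sketch: the data and the value `xstar <- 6` are made up for illustration, and the 95% level is an assumption of the example.

```r
# Hypothetical toy data and a hypothetical new x value
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
xstar <- 6

fit  <- lm(y ~ x)
s    <- summary(fit)$sigma     # "residual standard error" from summary fit
n    <- length(x)
xbar <- mean(x)
vx   <- var(x)                 # (n - 1) * vx equals sum((x - xbar)^2)

# Predicted response at xstar
yhat_star <- unname(coef(fit)[1] + coef(fit)[2] * xstar)

# Standard errors as written on the cards
se_mean <- s * sqrt((1 / n) + ((xstar - xbar)^2 / ((n - 1) * vx)))
se_pred <- s * sqrt(1 + (1 / n) + ((xstar - xbar)^2 / ((n - 1) * vx)))

# 95% intervals: estimate +/- t* x SE, with n - 2 degrees of freedom
t_star  <- qt(0.975, n - 2)
ci_mean <- yhat_star + c(-1, 1) * t_star * se_mean   # mean response
pi_pred <- yhat_star + c(-1, 1) * t_star * se_pred   # prediction interval

# Built-in equivalents for comparison
new_x   <- data.frame(x = xstar)
builtin_ci <- predict(fit, new_x, interval = "confidence", level = 0.95)
builtin_pi <- predict(fit, new_x, interval = "prediction", level = 0.95)
```

The prediction interval is always wider than the interval for the mean response, because its standard error carries the extra `1 +` term for the scatter of a single new observation.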