97 Cards in this Set

  • Front
  • Back
response variable
y
explanatory variable
x
population regression line
μy = β0 + β1x
sample/model regression line equation
yhat = b0 + b1x
b0
intercept
b1
slope
subpopulation
all the y values at one particular value of x; each subpopulation is assumed to follow a normal curve centered on the regression line
μy
the true mean response of y at a given value of x
standard deviation of y in each subpopulation
is assumed to be the same (σ) for every value of x, because each subpopulation is a normal curve with that common spread
ei
residual for a given x: observed - predicted, or yi - yhati
(yhati comes from plugging xi into the model regression equation)
s^2
is the variance (thus s is the SD)
s^2 = sum((yi - yhati)^2) / (n - 2)
you will not have to calculate this, as it is given in summary fit
in summary fit the "residual standard error"
is the same as "s", the estimated standard deviation about the regression line
when we estimate the standard error using "s" we move to the t dist. How many df does the t dist have?
n-2
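list 4 different types of confidence intervals for t distributions in simple regression
slope: b1 ± t* SE(b1)
intercept: b0 ± t* SE(b0)
mean response: yhat ± t* SE(mean response) (formula and R code)
prediction interval: yhat ± t* SE(prediction)
critical t in r
qt(1 - (1 - C)/2, n-2) for a level C interval, e.g. qt(.975, n-2) for 95%; qt(.95, n-2) is the t* for a 90% interval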
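The intervals on the card above all have the form estimate ± t* SE. A minimal sketch for the slope, assuming made-up data and a 95% level (the x and y values are hypothetical, for illustration only), compared with R's built-in confint():

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
n <- length(x)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)

b1    <- coefs["x", "Estimate"]
se_b1 <- coefs["x", "Std. Error"]

C     <- 0.95                        # confidence level (assumed here)
tstar <- qt(1 - (1 - C) / 2, n - 2)  # critical t with n - 2 df

# Manual interval: b1 +/- t* SE(b1)
ci_manual  <- c(b1 - tstar * se_b1, b1 + tstar * se_b1)
# Built-in interval for the same coefficient
ci_builtin <- confint(fit, "x", level = C)

print(ci_manual)
print(ci_builtin)
```

The two printed intervals should agree; the same pattern with the "(Intercept)" row gives the interval for b0.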
regular (not r) formula for se of mean response
s * sqrt( 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )
level C prediction interval se formula (interval for a new y at xstar, centered at yhat)
s * sqrt( 1 + 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )
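why does a small p value with a large n not mean there is a strong correlation?
because a large n lets us estimate the true mean response precisely, so even a weak relationship can be statistically significant; a larger sample does not reduce the scatter about the line.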
formula for sse
sum((yi - yhati)^2)
deviation of the observations from the model
formula for ssm
sum((yhati - ybar)^2)
deviation of the model from the mean of y
sst
sum((yi - ybar)^2)
total deviation of the observations from the mean of y (SST = SSM + SSE)
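if Ho: b1=0 were true, the variation of y would not depend on x, and its variance would be estimated by
sy^2, the sample variance of y, which is MST = SST/DFT
simple regression dfm
1
simple regression dfe
n-2
simple regression dft
n-1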
mean square
ss/df
list formulas for msm, mse, mst
ssm/dfm
sse/dfe
sst/dft
r2 =
ssm/sst
f = (using anova variables)
msm/mse
t^2
f
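The sums of squares, mean squares, R^2 and F from the preceding cards can be checked numerically. A minimal sketch for simple regression, assuming made-up data (the x and y values are hypothetical):

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 3.1, 6.0, 7.2, 10.4, 11.1, 14.5, 15.2)
n <- length(x)

fit  <- lm(y ~ x)
yhat <- fitted(fit)

sse <- sum((y - yhat)^2)        # observations vs. model
ssm <- sum((yhat - mean(y))^2)  # model vs. mean of y
sst <- sum((y - mean(y))^2)     # observations vs. mean of y

msm <- ssm / 1                  # DFM = 1 in simple regression
mse <- sse / (n - 2)            # DFE = n - 2
f   <- msm / mse
r2  <- ssm / sst

# t statistic for the slope, as reported by summary(fit)
tval <- summary(fit)$coefficients["x", "t value"]

print(c(SSM = ssm, SSE = sse, SST = sst))
print(c(R2 = r2, F = f, t_squared = tval^2))
```

In the printed output, SSM + SSE should equal SST, and F should equal t^2; sqrt(mse) is the "residual standard error" s.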
how to get t
b1/seb1 (given in summary fit)
rho (ρ) is what
population correlation (r is sample correlation)
if rho (ρ) = 0
then there is no linear relationship; if x and y are jointly normal, they are also independent.
sig test for rho
uses the same t value as the test for b1, but watch out, as sometimes we only want a one-sided test for rho.
in multiple linear regression p is
number of explanatory variables
in multiple linear regression j (sometimes written as capital I) is
the index of a specific explanatory variable (j = 1, ..., p)
in multiple linear regression i is
the index of one case (observation); xij is the value of variable j for case i
in multiple linear regression n is
the total number of cases i
in multiple linear regression (one way anova) N is
the total number of observations: the sum of the n's for each group (level of the explanatory variable)
multiple regression format
yhat = b0 + b1xi1 + b2xi2 + ... + bpxip
in multiple linear regression - explain the totals of j (or I), p, i, n and N
number of j's (or I's) = p
number of i's = n
sum of the n's = N

remember: N observations and p explanatory variables
sig test for bj in multiple regression
same t = bj/SE(bj), given in summary fit, but
the p value uses pt with df = n-p-1, not just n-2
hyp test for multiple regression
hypotheses
Ho: all slopes are 0 (β1 = β2 = ... = βp = 0)
Ha: at least one slope βj is not 0
to execute a hyp test for multiple regression use the ___ test stat
f
in multiple regression DFM is
p
in multiple regression DFE is
n-p-1
in multiple regression DFT is
n-1 ( like in simple regression)
how to compute the f stat
msm/mse
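The multiple regression F test can be assembled from these pieces: DFM = p, DFE = n-p-1, F = MSM/MSE. A minimal sketch, assuming made-up data with p = 2 explanatory variables (x1, x2 and y are hypothetical), compared with the F statistic that summary() reports:

```r
# Hypothetical data, for illustration only
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(5, 3, 6, 2, 7, 1, 8, 4, 9, 2)
y  <- c(4.2, 4.9, 8.1, 7.0, 11.8, 9.1, 15.2, 12.7, 18.1, 14.0)
n  <- length(y)
p  <- 2                          # number of explanatory variables

fit  <- lm(y ~ x1 + x2)
yhat <- fitted(fit)

ssm <- sum((yhat - mean(y))^2)
sse <- sum((y - yhat)^2)

dfm <- p
dfe <- n - p - 1

f <- (ssm / dfm) / (sse / dfe)   # F = MSM / MSE

# Upper-tail area of the F distribution is the p-value
p_value <- pf(f, dfm, dfe, lower.tail = FALSE)

print(c(F = f, DFM = dfm, DFE = dfe, p = p_value))
print(summary(fit)$fstatistic)   # value, numdf, dendf as computed by R
```

The hand-computed F and its degrees of freedom should match the `fstatistic` entry of the summary.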
sqrt( R^2 )
the multiple correlation coefficient R: the correlation between the observed yi and the predicted yhati
why does a low p value not imply a large R^2
statistical significance is not the same as a large effect: with enough data, even a weak effect of x on y gives a small p value.
factor
categorical explanatory variable
again how to find f
msm/mse
t^2 is
f (in simple regression, where DFM = 1)
sig test for rho (ρ) involves
Ho: ρ = 0 vs Ha: ρ not = 0, using a t test (if ρ = 0 there is no linear relationship, and jointly normal x and y are independent) (sometimes this test will be one sided)
how to find p value for t
2*(1 - pt(abs(tstat), df)), with df = n-2 for simple regression and df = n-p-1 when testing one coefficient in a multiple regression
(when testing several coefficients at once, use the F test)
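The p-value recipe for t can be tried on a small example. A minimal sketch, assuming made-up data and simple regression (so df = n-2); the one-sided line assumes the alternative is a positive slope:

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.2, 2.8, 4.9, 4.1, 6.3, 5.2, 7.9, 6.8, 8.1, 9.4)
n <- length(x)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients

tstat <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]  # t = b1 / SE(b1)
df    <- n - 2                                              # n - p - 1 with p = 1

p_two_sided <- 2 * (1 - pt(abs(tstat), df))  # Ha: slope not equal to 0
p_one_sided <- 1 - pt(tstat, df)             # Ha: slope > 0 (assumed direction)

print(c(t = tstat, p_two_sided = p_two_sided, p_one_sided = p_one_sided))
print(coefs["x", "Pr(>|t|)"])                # two-sided p-value from summary(fit)
```

The two-sided value should match the `Pr(>|t|)` column of the summary; for a multiple regression the only change is df = n-p-1.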
what is a one way anova
compares several population means
- a single quantitative response variable compared across several populations.
2 sample t test formula (both sample sizes same n) for comparing one pair of means
tij = sqrt(n/2) * (xibar - xjbar) / sp
where sp = sqrt(mse)
2 sample t test formula (both sample sizes same n) for all means
tij = sqrt(n/2) * (xibar - xjbar) / sp, computed for each pair of means
where sp = sqrt(mse)

with only two groups, t^2 equals the anova f
R code to run an F test for 1 way anova
juice<-data.frame(vector1,vector2)   # one numeric vector per group (placeholders)
juice<-stack(juice)                  # long format with columns "values" and "ind"
names(juice)
oneway.test(values~ind,data=juice,var.equal=TRUE)
fit<-anova(lm(values~ind,data=juice))
t test for one way anova
t.test(vector,vector,alternative=c("two.sided"),mu=0,paired=FALSE,var.equal=TRUE,conf.level=.95)
hypotheses in one way anova tests
Ho: all population means are equal
Ha: not all means are equal
in order to assume equal SD's in one way anova
the largest sd must be no more than twice the smallest
in case sqrt(mse) is too much trouble to find sp, the manual way to find sp^2 is
((n1-1)*s1^2 + (n2-1)*s2^2 + ...) / ((n1-1) + (n2-1) + ...)

each s can be found in r by using the command "sd": sd(vector)
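The pooled variance formula and its link to MSE can be verified directly. A minimal sketch, assuming three made-up samples (g1, g2 and g3 are hypothetical), that also prints the largest-to-smallest SD ratio used for the equal-SD check:

```r
# Hypothetical samples, for illustration only
g1 <- c(10.2, 11.5, 9.8, 10.9, 11.1)
g2 <- c(12.4, 13.0, 11.7, 12.9, 13.3)
g3 <- c(9.1, 8.7, 9.9, 9.4, 8.8)

groups <- list(g1 = g1, g2 = g2, g3 = g3)
n_i <- sapply(groups, length)    # sample size of each group
s_i <- sapply(groups, sd)        # sample SD of each group

# Pooled variance: weighted average of the group variances
sp2 <- sum((n_i - 1) * s_i^2) / sum(n_i - 1)

# MSE from the one-way ANOVA table, for comparison
dat <- stack(data.frame(g1, g2, g3))
tab <- anova(lm(values ~ ind, data = dat))
mse <- tab["Residuals", "Mean Sq"]

print(c(sp2 = sp2, MSE = mse, sp = sqrt(sp2)))
print(max(s_i) / min(s_i))       # ratio of largest to smallest SD
```

The printed sp2 and MSE should be the same number, which is why sp = sqrt(mse).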
I
number of populations to be compared
one way anova hypotheses
Ho: μ1 = μ2 = μ3 = ...
Ha: not all μi are equal
one way anova DFT
N-1
one way anova DFG (group)
I-1
one way anova DFE
N-I
one way anova R^2
SSG/SST
to find the value of a residual given an x or y
plug it into the model and take the difference between what the model predicts and what was given
r code for getting to summary fit
# first set the x and y vectors
fit<-lm(y~x)   # for multiple regression: lm(y~x+x2+x3...)
summary(fit)
how to find p value for f
1 - pf(f, DFM, DFE)   (pf alone gives the area to the left of f)

dfm and dfe vary depending on whether this is an f test of multiple regression or a 1 way ANOVA test (use DFG in place of DFM).
how to get anova chart in r
anova(fit)
in a hyp test conclusion:
restate hypotheses
restate variables
how to conclude, if we reject null in a multiple regression F test
if we reject the null, there is strong evidence that not all of the slopes are zero, so at least one x helps predict y and the model is useful
1-r^2
the proportion of the variation in y not explained by x
if p variables are greater than N observations
the model cannot be estimated properly: DFE = N-p-1 is not positive, so there is not enough data to estimate every slope and the error variance
in 1 way ANOVA DFG is
I-1
in 1 way ANOVA DFE is
N-I
in 1 way ANOVA DFT is
N-1
in 1 way ANOVA MSG is
SSG/DFG
in 1 way ANOVA MSE is
SSE/DFE
in 1 way ANOVA F is
MSG/MSE
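The one-way ANOVA quantities above (DFG, DFE, MSG, MSE, F) can be computed by hand and compared with oneway.test(). A minimal sketch, assuming three made-up samples; the names g1, g2, g3 and n_groups (standing in for I) are hypothetical:

```r
# Hypothetical samples, for illustration only
g1 <- c(10.2, 11.5, 9.8, 10.9, 11.1)
g2 <- c(12.4, 13.0, 11.7, 12.9, 13.3)
g3 <- c(9.1, 8.7, 9.9, 9.4, 8.8)

values <- c(g1, g2, g3)
ind    <- factor(rep(c("g1", "g2", "g3"),
                     times = c(length(g1), length(g2), length(g3))))

n_groups <- nlevels(ind)         # I: number of populations compared
N        <- length(values)       # total number of observations

group_means <- tapply(values, ind, mean)
group_sizes <- tapply(values, ind, length)
grand_mean  <- mean(values)

ssg <- sum(group_sizes * (group_means - grand_mean)^2)  # between groups
sse <- sum((values - ave(values, ind))^2)               # within groups

dfg <- n_groups - 1
dfe <- N - n_groups
msg <- ssg / dfg
mse <- sse / dfe
f   <- msg / mse
p_value <- pf(f, dfg, dfe, lower.tail = FALSE)

print(c(F = f, DFG = dfg, DFE = dfe, p = p_value))
print(oneway.test(values ~ ind, var.equal = TRUE))
```

The F, degrees of freedom and p-value printed first should match the oneway.test() output.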
as DFE gets bigger (more observations) for the same group means and SDs
the resulting p values get smaller, because with more data a smaller difference between the means is enough to be statistically significant.
if asked "what are the df for the anova f stat"
list both the numerator DFG and the denominator DFE; if only one is wanted, default to DFE
R code for SE of mean response
vx<-var(x)   # s is given in summary fit; xbar<-mean(x)
se<-s*sqrt((1/length(x))+((xstar-xbar)^2/((length(x)-1)*vx)))
se for prediction interval yhat
vx<-var(x)   # s is given in summary fit; xbar<-mean(x)
se<-s*sqrt(1+(1/length(x))+((xstar-xbar)^2/((length(x)-1)*vx)))
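Both standard errors can be checked against predict(). A minimal sketch, assuming made-up data, a hypothetical xstar of 4.5 and a 95% level; it relies on the identity (n-1)*var(x) = sum((xi - xbar)^2):

```r
# Hypothetical data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 3.1, 6.0, 7.2, 10.4, 11.1, 14.5, 15.2)
xstar <- 4.5                       # hypothetical x value of interest

fit  <- lm(y ~ x)
s    <- summary(fit)$sigma         # the "residual standard error"
n    <- length(x)
xbar <- mean(x)
vx   <- var(x)                     # (n - 1) * vx equals sum((x - xbar)^2)

se_mean <- s * sqrt(1 / n + (xstar - xbar)^2 / ((n - 1) * vx))
se_pred <- s * sqrt(1 + 1 / n + (xstar - xbar)^2 / ((n - 1) * vx))

new   <- data.frame(x = xstar)
tstar <- qt(0.975, n - 2)          # t* for a 95% interval
yhat  <- predict(fit, new)

ci_mean <- c(yhat - tstar * se_mean, yhat + tstar * se_mean)
pi_new  <- c(yhat - tstar * se_pred, yhat + tstar * se_pred)

print(ci_mean)
print(predict(fit, new, interval = "confidence", level = 0.95))
print(pi_new)
print(predict(fit, new, interval = "prediction", level = 0.95))
```

The hand-built intervals should match the lwr and upr columns from predict(); the prediction interval is always the wider of the two because of the extra 1 under the square root.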