97 Cards in this Set
- Front
- Back
response variable
|
y
|
|
explanatory variable
|
x
|
|
population regression line
|
μy = β0 + β1x
|
|
sample/model regression line equation
|
yhat = b0 + b1x
|
|
b0
|
intercept
|
|
b1
|
slope
|
|
subpopulation
|
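the distribution of responses y at one particular value of x; the model assumes each one is a normal curve centered on the population regression line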
|
|
μy
|
the true (population) mean response y for a given value of x
|
|
standard deviation of y (sigma) at each value of x
|
is assumed to be the same for all values of x, because the responses at each x form a normal subpopulation with the same spread.
|
|
ei
|
residual for a given x: observed - predicted, or yi - yhati
(yhati comes from plugging xi into the model regression equation) |
|
s^2
|
is the variance about the regression line (thus s is the SD)
s^2 = sum((yi - yhati)^2) / (n - 2); you will not have to calculate this, as s is given in summary fit |
|
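A minimal sketch of the s^2 card, assuming made-up data (the vectors `x` and `y` below are illustrative only and are not from the card set). It computes s from the residuals with the n - 2 divisor and compares it with the "residual standard error" stored in the summary.

```r
# Made-up illustration data (an assumption, not from the cards)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.1, 2.4, 5.9, 4.2, 7.5, 5.1, 8.8, 6.9, 10.2, 8.0)

fit <- lm(y ~ x)
n <- length(x)

# s^2 = sum of squared residuals divided by (n - 2)
s2_manual <- sum(residuals(fit)^2) / (n - 2)
s_manual <- sqrt(s2_manual)

# "Residual standard error" as stored by summary()
s_summary <- summary(fit)$sigma

print(c(manual = s_manual, summary = s_summary))
```

The two printed values should agree, which is why the card says s never has to be computed by hand.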
in summary fit the "residual standard error"
|
is the same as "s", the estimated standard deviation of the responses about the regression line
|
|
when we estimate the standard deviation with "s", we move to the t distribution. How many df does that t distribution have?
|
n-2
|
|
list the 4 t-based intervals in simple regression
|
slope: b1 ± t*·SE_b1
intercept: b0 ± t*·SE_b0
mean response: μhat_y ± t*·SE_μhat (formula and R code)
prediction interval: yhat ± t*·SE_yhat |
|
critical t in r
|
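qt(1-(1-C)/2, n-2) for confidence level C, e.g. qt(.975,n-2) for 95% (qt(.95,n-2) is t* for a 90% interval or a one-sided 5% test)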
|
|
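A short sketch of how the critical t feeds into the slope interval, assuming the same kind of made-up data (`x`, `y` are illustrative only). Note that `qt(.95, df)` leaves 5% in the upper tail, which corresponds to a 90% two-sided interval, while `qt(.975, df)` is the value for a 95% interval. The manual interval is compared with R's `confint()`.

```r
# Made-up illustration data (an assumption, not from the cards)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.1, 2.4, 5.9, 4.2, 7.5, 5.1, 8.8, 6.9, 10.2, 8.0)

fit <- lm(y ~ x)
n <- length(x)

t90 <- qt(0.95, df = n - 2)   # t* for a 90% two-sided interval
t95 <- qt(0.975, df = n - 2)  # t* for a 95% two-sided interval

# Slope estimate and its standard error from the summary table
coefs <- summary(fit)$coefficients
b1 <- coefs["x", "Estimate"]
se_b1 <- coefs["x", "Std. Error"]

# b1 +- t* x SE_b1, by hand and with the built-in helper
ci_manual <- c(b1 - t95 * se_b1, b1 + t95 * se_b1)
ci_builtin <- confint(fit, "x", level = 0.95)

print(ci_manual)
print(ci_builtin)
```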
formula (not R code) for the SE of the mean response
|
s * sqrt( 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )
|
|
C-level prediction interval SE formula (interval for a new response y at xstar)
|
s * sqrt( 1 + 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )
|
|
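The two SE formulas can be checked against `predict()`. This is a sketch under stated assumptions: the data and the value `xstar = 4.5` are made up for illustration. It also uses the identity sum((xi - xbar)^2) = (n - 1) * var(x), which is what the R-code cards rely on.

```r
# Made-up illustration data and a hypothetical x* (assumptions)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.1, 2.4, 5.9, 4.2, 7.5, 5.1, 8.8, 6.9, 10.2, 8.0)
xstar <- 4.5

fit <- lm(y ~ x)
n <- length(x)
s <- summary(fit)$sigma          # residual standard error
xbar <- mean(x)
sxx <- sum((x - xbar)^2)         # same as (n - 1) * var(x)

# SE of the mean response and SE for predicting a new observation
se_mean <- s * sqrt(1 / n + (xstar - xbar)^2 / sxx)
se_pred <- s * sqrt(1 + 1 / n + (xstar - xbar)^2 / sxx)

# Built-in equivalents
new <- data.frame(x = xstar)
p_mean <- predict(fit, new, se.fit = TRUE)
tstar <- qt(0.975, df = n - 2)
pi_manual <- as.numeric(p_mean$fit) + c(-1, 1) * tstar * se_pred
pi_builtin <- predict(fit, new, interval = "prediction", level = 0.95)

print(c(se_mean = se_mean, se_pred = se_pred))
print(pi_builtin)
```

The prediction SE is always larger than the mean-response SE because of the extra "1 +" term, which accounts for the scatter of an individual observation.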
why does a small p-value with a large n not mean there is a strong correlation?
|
because a large n makes the slope estimate precise enough to detect even a weak relationship, but it does not reduce the scatter about the line.
|
|
formula for SSE
|
sum((yi - yhati)^2)
deviation of the observations from the model's predictions |
|
formula for SSM
|
sum((yhati - ybar)^2)
deviation of the model's predictions from the mean of y |
|
SST
|
sum((yi - ybar)^2)
total deviation of the observations from the mean of y |
|
if H0: β1 = 0 were true, the variation in y would be estimated by
|
the sample variance of y, s_y^2, which equals MST = SST/DFT
|
|
simple regression DFM (ANOVA table)
|
1
|
|
simple regression DFE (also the df of the t distribution)
|
n-2
|
|
simple regression DFT (ANOVA table)
|
n-1
|
|
mean square
|
ss/df
|
|
list formulas for MSM, MSE, MST
|
MSM = SSM/DFM
MSE = SSE/DFE
MST = SST/DFT |
|
r2 =
|
ssm/sst
|
|
f = (using anova variables)
|
msm/mse
|
|
t^2
|
f
|
|
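The ANOVA cards above fit together in a few lines of R. This is a minimal sketch with made-up data (the vectors are illustrative only): it builds SSE, SSM and SST by hand, then checks that SSM + SSE = SST, that F = MSM/MSE, that r^2 = SSM/SST, and that the slope's t statistic squared equals F.

```r
# Made-up illustration data (an assumption, not from the cards)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.1, 2.4, 5.9, 4.2, 7.5, 5.1, 8.8, 6.9, 10.2, 8.0)

fit <- lm(y ~ x)
n <- length(y)
yhat <- fitted(fit)
ybar <- mean(y)

# Sums of squares
sse <- sum((y - yhat)^2)     # observations vs. model
ssm <- sum((yhat - ybar)^2)  # model vs. mean of y
sst <- sum((y - ybar)^2)     # observations vs. mean of y

# Degrees of freedom in simple regression
dfm <- 1
dfe <- n - 2
dft <- n - 1

# Mean squares, F, and r^2
msm <- ssm / dfm
mse <- sse / dfe
f_manual <- msm / mse
r2_manual <- ssm / sst

# t statistic for the slope, from the summary table
t_slope <- summary(fit)$coefficients["x", "t value"]

print(c(F = f_manual, t_squared = t_slope^2, r2 = r2_manual))
```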
how to get t
|
b1/seb1 (given in summary fit)
|
|
rho (ρ) is what
|
population correlation (r is sample correlation)
|
|
if rho (ρ) = 0
|
then there is no linear relationship between x and y (if x and y are jointly normal, this also means they are independent).
|
|
sig test for rho
|
uses the same t statistic as the test of β1 = 0, but watch out: sometimes we only want a one-sided test for rho.
|
|
in multiple linear regression p is
|
number of explanatory variables
|
|
in multiple linear regression I (capital i or j ) is
|
a specific explanatory variable
|
|
in multiple linear regression i is
|
one case (observation); x_ij is the value of explanatory variable j for case i
|
|
in multiple linear regression n is
|
the total number of cases (the number of values i takes)
|
|
in multiple linear regression (one way anova) N is
|
the total number of observations: the sum of the sample sizes n_i over all groups.
|
|
multiple regression format
|
yhat_i = b0 + b1*x_i1 + b2*x_i2 + ... + bp*x_ip |
|
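in multiple linear regression - explain how j, p, i, n and N relate
|
j = 1, ..., p indexes the explanatory variables (p of them)
i = 1, ..., n indexes the cases; in one-way ANOVA the group sizes n_i add up to N. Remember: N observations and p explanatory variables |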
|
sig test for a coefficient bj in multiple regression
|
same t = bj/SE_bj, given in summary fit, but
use pt(t value, df = n-p-1), not n-2 |
|
hyp test for multiple regression
hypotheses |
H0: β1 = β2 = ... = βp = 0 (all slopes are 0)
Ha: at least one βj is not 0 |
|
to execute a hyp test for multiple regression use the ___ test stat
|
f
|
|
in multiple regression DFM is
|
p
|
|
in multiple regression DFE is
|
n-p-1
|
|
in multiple regression DFT is
|
n-1 ( like in simple regression)
|
|
how to compute the F statistic
|
msm/mse
|
|
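A sketch of the multiple-regression degrees of freedom and F statistic, assuming made-up data with p = 2 explanatory variables (`x1`, `x2` and `y` are illustrative only). It checks DFM = p and DFE = n - p - 1 against what `summary()` reports, and also checks that sqrt(R^2) is the correlation between y and yhat.

```r
# Made-up illustration data with two explanatory variables (assumptions)
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
y  <- c(4.2, 4.9, 8.1, 7.8, 11.9, 16.2, 11.8, 16.1, 17.0, 16.9)

fit <- lm(y ~ x1 + x2)
n <- length(y)
p <- 2                      # number of explanatory variables

# Degrees of freedom
dfm <- p
dfe <- n - p - 1
dft <- n - 1

# Mean squares and F = MSM / MSE
yhat <- fitted(fit)
msm <- sum((yhat - mean(y))^2) / dfm
mse <- sum((y - yhat)^2) / dfe
f_manual <- msm / mse

# Values reported by summary(): F statistic with its numerator/denominator df
fs <- summary(fit)$fstatistic
print(fs)
print(c(F = f_manual, R = sqrt(summary(fit)$r.squared), cor_y_yhat = cor(y, yhat)))
```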
sqrt( R ^ 2 )
|
the multiple correlation coefficient R: the correlation between the observed yi and the predicted yhati
|
|
why does a low p value not imply a large R^2
|
statistical significance is not the same as a large effect: with enough data, even a weak relationship between x and y (small R^2) can give a small p-value.
|
|
factor
|
categorical explanatory variable
|
|
again how to find f
|
msm/mse
|
|
t^2 is
|
f
|
|
sig test for rho (ρ) involves
|
H0: ρ = 0 vs Ha: ρ not = 0, using the t test (if ρ = 0 there is no linear relationship; for jointly normal x and y this also means they are independent) (sometimes this test will be one-sided)
|
|
how to find p value for t
|
2*(1-pt(abs(tstat), df)), with df = n-2 for simple regression and n-p-1 when testing one coefficient in a multiple regression.
(when testing several coefficients at once, use the F test) |
|
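A small check of the two-sided p-value recipe, assuming made-up data (illustrative only). The `abs()` matters: without it a negative t statistic would give a "p-value" above 1. The `lower.tail = FALSE` form is an equivalent way to get the upper-tail area.

```r
# Made-up illustration data (an assumption, not from the cards)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(3.1, 2.4, 5.9, 4.2, 7.5, 5.1, 8.8, 6.9, 10.2, 8.0)

fit <- lm(y ~ x)
n <- length(x)
coefs <- summary(fit)$coefficients

tstat <- coefs["x", "t value"]   # b1 / SE_b1
df <- n - 2                      # simple regression

# Two-sided p-value, written two equivalent ways
p_manual <- 2 * (1 - pt(abs(tstat), df))
p_alt <- 2 * pt(abs(tstat), df, lower.tail = FALSE)

# p-value reported in the summary table
p_summary <- coefs["x", "Pr(>|t|)"]

print(c(manual = p_manual, alt = p_alt, summary = p_summary))
```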
what is a one way anova
|
compares several population means
- a single quantitative response variable compared across several populations (groups). |
|
2 sample t test formula (both sample sizes equal to n) for comparing one pair of means
|
t_ij = sqrt(n/2) * (xbar_i - xbar_j) / sp
where sp = sqrt(MSE) |
|
2 sample t test formula (both sample sizes equal to n) and its link to the F test for all means
|
t_ij = sqrt(n/2) * (xbar_i - xbar_j) / sp
where sp = sqrt(MSE); with only two groups, squaring t gives F (t^2 = F) |
|
R code to run an F test for 1 way anova
|
juice <- data.frame(vector1, vector2)   # one column per group
juice <- stack(juice)   # long format with columns "values" and "ind"
names(juice)
oneway.test(values ~ ind, data = juice, var.equal = TRUE)
fit <- anova(lm(values ~ ind, data = juice)) |
|
t test for one way anova
|
t.test(vector,vector,alternative=c("two.sided"),mu=0,paired=FALSE,var.equal=TRUE,conf.level=.95)
|
|
hypotheses in one way anova tests
|
H0: all population means are equal
Ha: not all population means are equal |
|
in order to assume equal SD's in one way anova
|
rule of thumb: the largest sample SD must be no more than twice the smallest
|
|
in case sqrt(MSE) is too much trouble to find sp, the manual way to find sp^2 is
|
sp^2 = ((n1-1)*s1^2 + (n2-1)*s2^2 + ...) / ((n1-1) + (n2-1) + ...)
each s can be found in R with the command "sd": sd(vector) |
|
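To see that the manual pooled variance really is the MSE, here is a sketch with three made-up groups of unequal size (`g1`, `g2`, `g3` are illustrative names and values, not from the cards).

```r
# Three made-up groups (assumptions for illustration)
g1 <- c(5.1, 4.8, 6.0, 5.5)
g2 <- c(6.9, 7.2, 6.1, 7.8, 7.0)
g3 <- c(5.9, 6.4, 6.8)

n <- c(length(g1), length(g2), length(g3))  # group sizes
s <- c(sd(g1), sd(g2), sd(g3))              # group standard deviations

# Pooled variance: weighted average of the group variances
sp2 <- sum((n - 1) * s^2) / sum(n - 1)

# Same quantity from the one-way ANOVA table (Mean Sq of Residuals = MSE)
dat <- data.frame(
  values = c(g1, g2, g3),
  ind = factor(rep(c("g1", "g2", "g3"), times = n))
)
tab <- anova(lm(values ~ ind, data = dat))
mse <- tab["Residuals", "Mean Sq"]

print(c(sp2 = sp2, mse = mse, sp = sqrt(sp2)))
```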
I
|
number of populations to be compared
|
|
one way anova hypotheses
|
H0: μ1 = μ2 = ... = μI
Ha: not all μi are equal |
|
one way anova DFT
|
N-1
|
|
one way anova DFG
|
group
I-1 |
|
one way anova DFE
|
N-I
|
|
one way anova R^2
|
SSG/SST
|
|
to find the value of a residual for an observed (x, y) pair
|
plug x into the model to get the predicted yhat, then take residual = observed y - predicted yhat
|
|
r code for getting to summary fit
|
set x and y vectors
fit <- lm(y ~ x)   (for multiple regression the formula can be y ~ x + x2 + x3...)
summary(fit) |
|
how to find p value for f
|
1 - pf(f, DFM, DFE)   (pf alone gives the area below f, not the p-value)
DFM and DFE vary depending on whether this is an F test for multiple regression (p and n-p-1) or a one-way ANOVA (DFG = I-1 and DFE = N-I). |
|
how to get anova chart in r
|
anova(fit)
|
|
in a hyp test conclusion:
|
restate the hypotheses
restate the variables in context |
|
how to conclude, if we reject null in a multiple regression F test
|
if we reject the null, the data give strong evidence that not all slopes are zero, so at least one explanatory variable helps predict y and the model is useful
|
|
1-r^2
|
the proportion of the variation in y that is not explained by the regression on x
|
|
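if the number of explanatory variables p is greater than the number of observations
|
the model cannot be fit reliably: there are no degrees of freedom left for error (DFE = n-p-1 would be negative) and the coefficients are not uniquely determined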
|
|
in 1 way ANOVA DFG is
|
I-1
|
|
in 1 way ANOVA DFE is
|
N-I
|
|
in 1 way ANOVA DFT is
|
N-1
|
|
in 1 way ANOVA MSG is
|
SSG/DFG
|
|
in 1 way ANOVA MSE is
|
SSE/DFE
|
|
in 1 way ANOVA F is
|
MSG/MSE
|
|
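The one-way ANOVA cards (DFG, DFE, DFT, MSG, MSE, F) can be put together by hand and checked against `anova()`. This is a sketch assuming three made-up groups (values and labels are illustrative only). The p-value uses the upper tail of the F distribution.

```r
# Made-up data in long format: one response column, one factor (assumptions)
values <- c(5.1, 4.8, 6.0, 5.5,
            6.9, 7.2, 6.1, 7.8, 7.0,
            5.9, 6.4, 6.8)
ind <- factor(rep(c("g1", "g2", "g3"), times = c(4, 5, 3)))

N <- length(values)   # total number of observations
I <- nlevels(ind)     # number of groups

grand <- mean(values)
means <- tapply(values, ind, mean)    # group means
sizes <- tapply(values, ind, length)  # group sizes n_i

# Sums of squares: between groups and within groups
ssg <- sum(sizes * (means - grand)^2)
sse <- sum((values - ave(values, ind))^2)  # ave() gives each case its group mean

# Degrees of freedom, mean squares, F and its p-value
dfg <- I - 1
dfe <- N - I
dft <- N - 1
msg <- ssg / dfg
mse <- sse / dfe
f_manual <- msg / mse
p_manual <- pf(f_manual, dfg, dfe, lower.tail = FALSE)

tab <- anova(lm(values ~ ind))
print(tab)
print(c(F = f_manual, p = p_manual))
```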
as DF gets bigger for the same groups
|
the p-value for the same F statistic gets smaller, because with more error df a smaller ratio of between-group to within-group variation is enough to be statistically significant.
|
|
if asked "what are the df for the ANOVA F stat"
|
list both: numerator DFG and denominator DFE; if only one is wanted, default to DFE
|
|
R code for SE of mean response
|
vx <- var(x) and s is given in summary fit
se <- s*sqrt((1/length(x)) + ((xstar-xbar)^2/((length(x)-1)*vx))) |
|
se for prediction interval yhat
|
vx<-var(x) and s is given in summary fit
se<-s*sqrt(1+(1/length(x))+((xstar-xbar)^2/((length(x)-1)*vx))) |