97 Cards in this Set
 Front
 Back
response variable

y

explanatory variable

x

population regression line

μy = β0 + β1x

sample/model regression line equation

yhat = b0 + b1x

b0

intercept

b1

slope

subpopulation

the distribution of responses y at one fixed value of x; the model assumes it is a normal curve centered at μy

μy

the true (population) mean response of y at a given value of x

standard deviation of y at each value of x (σ)

is assumed to be the same for every value of x, because each subpopulation is a normal curve with the same spread σ.

ei

residual for a given x: observed - predicted, or yi - yhati,
where yhati comes from the model regression equation
s^2

is the variance about the regression line (thus s is the SD)
s^2 = sum((yi - yhati)^2) / (n - 2); you will not have to calculate this by hand, as s is given in summary(fit)
in summary(fit) the "residual standard error"

is the same as "s", the estimated standard deviation of the responses about the regression line

when we estimate the standard error using "s" we move to the t distribution. How many df does that t distribution have?

n - 2

list 4 different types of t confidence intervals in simple regression

slope: b1 ± t* SE_b1
intercept: b0 ± t* SE_b0
mean response: μhat_y ± t* SE_μhat (formula and R code)
prediction interval: yhat ± t* SE_yhat
critical t in r

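qt(.95, n - 2) gives t* for a 90% interval; in general, for confidence level C use qt(1 - (1 - C)/2, n - 2), e.g. qt(.975, n - 2) for 95%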
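A minimal sketch of the slope interval b1 ± t* SE_b1, checked against R's own confint(). The built-in cars data set (dist vs. speed) is only an illustrative assumption, not part of the original cards; for a two-sided level C interval the quantile passed to qt() is 1 - (1 - C)/2.

```r
# Illustration only: built-in 'cars' data (stopping distance vs. speed).
fit <- lm(dist ~ speed, data = cars)
n   <- nrow(cars)

# Coefficient table: columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
b1    <- coefs["speed", "Estimate"]
se_b1 <- coefs["speed", "Std. Error"]

# Critical t for a two-sided 95% interval with n - 2 df
conf_level <- 0.95
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 2)

# Interval by hand and from the built-in helper
ci_by_hand <- c(b1 - t_star * se_b1, b1 + t_star * se_b1)
ci_builtin <- confint(fit, "speed", level = conf_level)

print(ci_by_hand)
print(ci_builtin)
```

The two printed intervals should agree; the same pattern works for the intercept by swapping "speed" for "(Intercept)".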

formula (not R code) for the SE of the mean response

s * sqrt( 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )

SE formula for a level C prediction interval (interval centered at yhat)

s * sqrt( 1 + 1/n + (xstar - xbar)^2 / sum((xi - xbar)^2) )

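why does a small p value with a large n not mean there is a strong correlation?

because a large n lets us estimate the true regression line precisely, so even a weak relationship can be statistically significant; it does not reduce the scatter of the points about the line.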

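formula for SSE

sum((yi - yhati)^2)
deviation of the observations from the model (fitted values)
formula for SSM

sum((yhati - ybar)^2)
deviation of the model (fitted values) from the overall mean ybar
SST

sum((yi - ybar)^2)
total deviation of the observations from the overall mean ybar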
if H0: β1 = 0 were true, the variation of y about the regression line would be the same as

the overall sample variance of y: sy^2, or MST = SST/DFT

DFM (simple regression)

1

DFE (simple regression)

n - 2

DFT (simple regression)

n - 1

mean square

ss/df

list formulas for MSM, MSE, MST

MSM = SSM/DFM
MSE = SSE/DFE
MST = SST/DFT
r^2 =

ssm/sst

f = (using anova variables)

msm/mse

t^2

f
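The ANOVA cards above can be checked numerically. This sketch, which assumes the built-in cars data purely for illustration, computes SSE, SSM and SST by hand and confirms SST = SSM + SSE, r^2 = SSM/SST, F = MSM/MSE and t^2 = F.

```r
# Illustration only: simple regression on the built-in 'cars' data.
fit  <- lm(dist ~ speed, data = cars)
y    <- cars$dist
yhat <- fitted(fit)
n    <- length(y)

# Sums of squares
sse <- sum((y - yhat)^2)        # observations vs. model
ssm <- sum((yhat - mean(y))^2)  # model vs. overall mean
sst <- sum((y - mean(y))^2)     # observations vs. overall mean

# Mean squares = SS / df, with DFM = 1 and DFE = n - 2
msm <- ssm / 1
mse <- sse / (n - 2)

f_stat <- msm / mse
r2     <- ssm / sst
t_stat <- summary(fit)$coefficients["speed", "t value"]

# Upper-tail p value for F (same as 1 - pf(f_stat, 1, n - 2))
p_value <- pf(f_stat, 1, n - 2, lower.tail = FALSE)

print(c(SSM = ssm, SSE = sse, SST = sst))
print(c(F = f_stat, t_squared = t_stat^2, r2 = r2, s = sqrt(mse)))
```

Here sqrt(mse) is the "residual standard error" that summary(fit) reports, and anova(fit) shows the same sums of squares in table form.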

how to get t

b1 / SE_b1 (both given in summary(fit))

rho (ρ) is what

population correlation (r is the sample correlation)

if ρ = 0

then there is no linear relationship between x and y (and if x and y are jointly normal, they are independent).

sig test for rho

uses the same t statistic as the test for b1 (df = n - 2), but watch out: sometimes we want only a one-sided test for ρ.

in multiple linear regression p is

number of explanatory variables

in multiple linear regression I (capital i, or j) is

a specific explanatory variable

in multiple linear regression i is

one case (observation); xij is the value of explanatory variable j for case i

in multiple linear regression n is

the total number of cases i

in multiple linear regression (one way anova) N is

the total number of observations: the sum of the n's over all groups

multiple regression format

yhati = b0 + b1xi1 + b2xi2 + ... + bpxip
in multiple linear regression explain the totals of I, j, p, i, n and N

number of j's (or I's) = p
number of i's = n
sum of the n's = N
remember: N observations and p explanatory variables
sig test for a coefficient (b0 or bj) in multiple regression

same t = bj / SE_bj, given in summary(fit), but
use pt(t value, df = n - p - 1), not just n - 2
hyp test for multiple regression
hypotheses
H0: all slopes = 0 (β1 = β2 = ... = βp = 0)
Ha: at least one slope βj is not = 0
to execute a hyp test for multiple regression use the ___ test stat

f

in multiple regression DFM is

p

in multiple regression DFE is

n - p - 1

in multiple regression DFT is

n - 1 (like in simple regression)

how to compute the F stat

msm/mse

sqrt( R ^ 2 )

the multiple correlation coefficient R: the correlation between the observations yi and the fitted values yhati
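A short check of the multiple regression cards above: the degrees of freedom (DFM = p, DFE = n - p - 1, DFT = n - 1) and the fact that sqrt(R^2) equals the correlation between y and yhat. The built-in mtcars data and the choice of wt and hp as explanatory variables are assumptions made only for this sketch.

```r
# Illustration only: two explanatory variables from the built-in 'mtcars' data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)  # number of cases
p <- 2             # number of explanatory variables (wt, hp)

dfm <- p
dfe <- n - p - 1
dft <- n - 1

# summary() stores the F statistic with its numerator and denominator df
fs <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

# Multiple correlation coefficient two ways
R_from_r2 <- sqrt(summary(fit)$r.squared)
R_from_cor <- cor(mtcars$mpg, fitted(fit))

print(c(DFM = dfm, DFE = dfe, DFT = dft))
print(fs)
print(c(R_from_r2 = R_from_r2, R_from_cor = R_from_cor))
```

Note that dfm + dfe = dft, the same bookkeeping as in simple regression where p = 1.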
why does a low p value not imply a large R^2

statistical significance is not the same as a large effect: with enough data, even a model that explains little of the variation in y can be statistically significant.

factor

categorical explanatory variable

again how to find f

msm/mse

t^2 is

f

sig test for rho (ρ) involves

H0: ρ = 0 vs. Ha: ρ not = 0, using the t test (if ρ = 0 there is no linear relationship between x and y); sometimes this test will be one-sided

how to find p value for t

2*(1 - pt(abs(tstat), df)), with df = n - 2 for simple regression and df = n - p - 1 when testing one coefficient in a multiple regression.
(when testing several coefficients at once, use the F test)
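As a sketch of the two-sided p value for one coefficient, the code below recomputes it from the t statistic and compares it with the Pr(>|t|) column of summary(fit). The mtcars model is an illustrative assumption; the hp coefficient is negative there, which shows why abs() is needed.

```r
# Illustration only: test the 'hp' coefficient in a two-variable model.
fit   <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

t_stat <- coefs["hp", "t value"]   # equals Estimate / Std. Error
df     <- df.residual(fit)         # n - p - 1 = 32 - 2 - 1

# Two-sided p value: double the tail area beyond |t|
p_two_sided <- 2 * (1 - pt(abs(t_stat), df))

# One-sided p value for Ha: slope < 0 is the lower tail area
p_lower <- pt(t_stat, df)

print(c(t = t_stat, df = df))
print(c(two_sided = p_two_sided, from_summary = coefs["hp", "Pr(>|t|)"]))
print(p_lower)
```

For very small p values, 2 * pt(abs(t_stat), df, lower.tail = FALSE) is numerically safer than subtracting from 1.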
what is a one way anova

compares several population means
 a single quantitative response variable compared across several populations (groups).
2 sample t test formula (both sample sizes same n) for an individual pair of means

tij = sqrt(n/2) * (xbari - xbarj) / sp
where sp = sqrt(MSE)
2 sample t test formula (both sample sizes same n) for all means

tij = sqrt(n/2) * (xbari - xbarj) / sp
where sp = sqrt(MSE); with only two groups, t^2 = F
R code to run an F test for 1 way anova

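juice <- data.frame(vector1, vector2)   # one column per group (placeholder names, equal lengths)
juice <- stack(juice)                   # long format: columns "values" and "ind"
names(juice)
oneway.test(values ~ ind, data = juice, var.equal = TRUE)
fit <- anova(lm(values ~ ind, data = juice))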
t test for one way anova

t.test(vector1, vector2, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = TRUE, conf.level = .95)

hypotheses in one way anova tests

H0: all means are equal
Ha: not all means are equal
in order to assume equal SD's in one way anova

the largest SD must be no more than twice the smallest (that is, no more than 100% larger)

in case sqrt(MSE) is too much trouble to find sp, the manual way to find sp^2 is

sp^2 = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2 + ...) / ((n1 - 1) + (n2 - 1) + ...)
each s can be found in R with the command sd(vector)
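The pooled variance formula above can be sketched in R and compared with the MSE from the one-way ANOVA table. The built-in PlantGrowth data (three groups of 10 plants) is assumed here only as an example; it is not part of the original cards.

```r
# Illustration only: built-in 'PlantGrowth' data, weight by group.
sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)      # s for each group
ns  <- tapply(PlantGrowth$weight, PlantGrowth$group, length)  # n for each group

# Rule of thumb for pooling: largest SD at most twice the smallest
sd_ratio <- max(sds) / min(sds)

# Pooled variance by hand
sp2 <- sum((ns - 1) * sds^2) / sum(ns - 1)

# Same quantity from the ANOVA table (MSE = SSE / DFE)
tab <- anova(lm(weight ~ group, data = PlantGrowth))
mse <- tab["Residuals", "Mean Sq"]

# F test assuming equal SDs; its statistic matches MSG / MSE
ow <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

print(sd_ratio)
print(c(sp2 = sp2, MSE = mse, sp = sqrt(sp2)))
print(tab)
```

In the printed table the Df column shows DFG = I - 1 and DFE = N - I for these I = 3 groups and N = 30 observations.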
I

number of populations to be compared

one way anova hypotheses

H0: μ1 = μ2 = μ3 = ... = μI
Ha: not all μi are equal
one way anova DFT

N - 1

one way anova DFG

group
I - 1
one way anova DFE

N - I

one way anova R^2

SSG/SST

to find the value of a residual given an x or y

plug x into the model to get yhat, then take the difference between the observed y and the predicted yhat (observed - predicted)

r code for getting to summary fit

set x and y vectors
fit <- lm(y ~ x)   # for multiple regression: y ~ x1 + x2 + x3 ...
summary(fit)
how to find p value for f

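1 - pf(f, DFM, DFE)   (pf alone gives the area below f; the p value is the area above)
DFM and DFE vary depending on whether this is an F test of a multiple regression or a 1 way ANOVA test (where DFG takes the place of DFM).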
how to get anova chart in r

anova(fit)

in a hyp test conclusion:

restate the hypotheses
restate the variables (in context)
how to conclude, if we reject null in a multiple regression F test

if we reject the null, the data give strong evidence against all slopes being zero, so at least one x helps predict y and the model is useful

1 - r^2

the fraction of the variation in y not explained by x

if p variables are greater than N observations

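there could be a lot of conflict: there are not enough observations to estimate all the coefficients (DFE = n - p - 1 would be negative)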

in 1 way ANOVA DFG is

I - 1

in 1 way ANOVA DFE is

N - I

in 1 way ANOVA DFT is

N - 1

in 1 way ANOVA MSG is

SSG/DFG

in 1 way ANOVA MSE is

SSE/DFE

in 1 way ANOVA F is

MSG/MSE

as DF gets bigger for the same groups

the resulting p values get smaller, because with more data less difference between the groups is needed for the result to be statistically significant.

if asked "what are the df for the ANOVA F stat"

list both: numerator DFG and denominator DFE; if only one is wanted, default to DFE

R code for SE of mean response

vx <- var(x)   # s is given in summary(fit); xbar <- mean(x)
se <- s * sqrt((1/length(x)) + ((xstar - xbar)^2 / ((length(x) - 1) * vx)))
se for prediction interval yhat

vx <- var(x)   # s is given in summary(fit); xbar <- mean(x)
se <- s * sqrt(1 + (1/length(x)) + ((xstar - xbar)^2 / ((length(x) - 1) * vx)))
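A runnable sketch of both standard errors, checked against predict(). The cars data and the value xstar = 15 are hypothetical choices for illustration only; the key identity is sum((xi - xbar)^2) = (n - 1) * var(x).

```r
# Illustration only: built-in 'cars' data; xstar is a hypothetical x value.
x <- cars$speed
y <- cars$dist
fit <- lm(y ~ x)

s     <- summary(fit)$sigma   # "residual standard error" from summary(fit)
n     <- length(x)
xbar  <- mean(x)
vx    <- var(x)               # (n - 1) * vx = sum((x - xbar)^2)
xstar <- 15

# SE of the mean response and SE for predicting one new observation
se_mean <- s * sqrt(1/n + (xstar - xbar)^2 / ((n - 1) * vx))
se_pred <- s * sqrt(1 + 1/n + (xstar - xbar)^2 / ((n - 1) * vx))

# Built-in equivalents
new_x <- data.frame(x = xstar)
pr    <- predict(fit, newdata = new_x, se.fit = TRUE)
pi_builtin <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# 95% prediction interval by hand: yhat ± t* SE
t_star     <- qt(0.975, n - 2)
pi_by_hand <- unname(pr$fit) + c(-1, 1) * t_star * se_pred

print(c(se_mean = se_mean, se_fit = unname(pr$se.fit), se_pred = se_pred))
print(pi_by_hand)
print(pi_builtin)
```

The prediction SE is always larger than the mean-response SE because of the extra 1 under the square root, which accounts for the scatter of a single new observation.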