60 Cards in this Set
 Front
 Back
regression predicts what?

the value of the DV from the IDV (a directional relationship; regression alone does not prove causation)


correlation shows what?

association, which is not causation


Regression formulas
SST (total sum of squares) = ?
SST = SSR + SSE = sum of (y - y bar)^2


Regression
sum of squares 
S = sum of the squared vertical deviations from the proposed line
S = e1^2 + e2^2 + e3^2 + ...

Regression
the best straight line is the one that?
minimizes S
the smallest S gives the best line

A least squares regression selects the line with?

the lowest sum of squared errors
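
A minimal sketch of the least-squares cards above. The data values are invented for illustration, and `fit_line` and `sum_squared_errors` are hypothetical helper names. It computes the closed-form slope and intercept for one IDV, then shows that a slightly different line has a larger S:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """S = e1^2 + e2^2 + ... for a proposed line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (invented for this example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best_s = sum_squared_errors(xs, ys, b0, b1)

# A line with a slightly different slope has a larger S.
other_s = sum_squared_errors(xs, ys, b0, b1 + 0.1)
```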


Regression
The coefficient of determination is? The higher its value, the more?
R^2
higher value = more accurate fit (more of the variation in the DV is explained)

R^2 measures what?

strength of the relationship between the IDV and DV


Regression
Standard Error equation
Square root of (SSE / (N - K - 1))
N = # of observations in sample, K = number of IDVs
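
A rough worked example of the sums of squares, R^2, and the standard error from the cards above. The data are invented, and the standard error here assumes the common N - K - 1 degrees of freedom (one extra is used by the intercept); treat that denominator as an assumption rather than the card's exact convention.

```python
import math

# Illustrative data (invented for this example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)  # N = number of observations
k = 1        # K = number of independent variables

# Least-squares slope and intercept for one predictor.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
preds = [intercept + slope * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
ssr = sum((p - y_bar) ** 2 for p in preds)          # explained by the line
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # left over (error)

r_squared = ssr / sst

# Assumption: N - K - 1 degrees of freedom.
std_error = math.sqrt(sse / (n - k - 1))
```

With these numbers SST splits into SSR + SSE, and R^2 is the share of SST that the line explains.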

Regression
Interpolation 
prediction using value of IDV within observed range
uncontroversial 

Regression
Extrapolation 
prediction using value of IDV outside observed range
should be avoided, if possible

Simple Linear Regression
IDV AKA? DV AKA?
IDV = predictor variable
DV = response variable

Sig Testing in Regression
explain the diff tests involved
F test = judges whether the explanatory (independent) variables, taken together, adequately describe the outcome variable
t test = applies to an individual IDV (explanatory variable); says whether this particular variable has an effect on the outcome, holding the others the same
R^2 = measures the strength of the relationship between the IDVs and the DV

in simple linear regression
R^2=? 
r^2
r = correlation coefficient, R^2 = coefficient of determination
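
A small check of the R^2 = r^2 card, using invented data. It computes Pearson's r directly and R^2 from the fitted line (as 1 - SSE/SST); with a single IDV the two should agree up to rounding.

```python
import math

# Illustrative data (invented for this example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # this is also SST

# Pearson correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)

# R^2 from the fitted least-squares line: 1 - SSE/SST.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy
```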

Multiple Linear Regression
E = ? B0, B1, ..., Bn = ?
E = statistical error in the population (the residual in the sample)
B's = unknown parameters

Multiple Regression
3 ways to do it, what are they? 
1) Backward multiple regression
2) Forward multiple regression
3) Mix of backward/forward multiple regression

backward multiple regression
AKA? How to do it? How to evaluate the individual relationships?
AKA reverse elimination
drop the least significant variables one at a time, until left with only significant variables
the t test evaluates each individual relationship

forward multiple regression

pick the IDV that explains the most variation in the DV, then the next, one at a time, until no remaining variable significantly explains variation


mix of backward and forward

do forward selection first, but drop variables which become no longer significant after introduction of new variables
not used much 

Multiple Regression
Dummy Variable AKA? What does it do?
indicator variable
introduces qualitative (categorical) information into the model
takes the value 0 or 1 to indicate the absence or presence of a category
female = 1 when the patient is female, female = 0 when the patient is male
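
A tiny sketch of the dummy (indicator) coding described on the card above. The patient records and the `female_dummy` helper are hypothetical, made up only to show the 0/1 coding.

```python
# Hypothetical patient records (invented for this example).
patients = [
    {"id": 1, "sex": "female"},
    {"id": 2, "sex": "male"},
    {"id": 3, "sex": "female"},
]


def female_dummy(sex):
    """Return 1 when the patient is female, 0 when the patient is male."""
    return 1 if sex == "female" else 0


# The 0/1 column that would enter the regression as an IDV.
female = [female_dummy(p["sex"]) for p in patients]
```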

Assumptions for Multiple Linear Regression

normal distribution
variance about the regression line is the same for all values of the explanatory variables
explanatory variables (IDVs) are not correlated

Nonlinear functions can be fit as?

regressions
logarithmic, exponential 

what is multicollinearity

problem in interpretation of regression coefficients when IDV are correlated


what is collinearity

exists when IDV are correlated


detecting multicollinearity

checking the correlation coefficient matrix
F test significant while many t tests are insignificant
VIF > 5

what is VIF

variance inflation factor
quantifies the severity of multicollinearity in ordinary least squares regression
gives how much the variance of an estimated regression coefficient is increased because of collinearity
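
A hedged sketch of a VIF calculation. The card does not state the formula; this uses the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one IDV on the others. With only two IDVs that R^2 is simply the squared correlation between them, which is the simplification assumed here. The data and the function names are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two IDVs, R^2 is r^2 between them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Two strongly correlated IDVs (invented values).
age = [30, 40, 50, 60, 70]
systolic_bp = [118, 125, 133, 139, 150]
vif_high = vif_two_predictors(age, systolic_bp)

# Two uncorrelated IDVs (invented values).
dose = [1, 2, 3, 4, 5]
visits = [2, 1, 3, 1, 2]
vif_low = vif_two_predictors(dose, visits)
```

Here `vif_high` lands far above the VIF > 5 rule of thumb, while `vif_low` sits at the minimum possible value of 1.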

how to correct for multicollinearity

pick IDVs with low collinearity
use stepwise regression, where you put the most correlated variable into the equation first, then the next most
order of entry of the variables matters here

what is adjusted R^2

adjusts for the inflation in R^2 caused by the number of variables in the equation
as sample size increases beyond 20 cases per variable, the adjustment is less needed
basically go with adjusted R^2 unless sample size > number of IDVs * 20
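
A short illustration of why the adjustment matters less in large samples. The card does not give the formula; this assumes the usual one, adjusted R^2 = 1 - (1 - R^2)(N - 1)/(N - K - 1), and the R^2 and sample sizes are made-up inputs.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables.

    Assumes the common formula 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Same R^2 and one IDV, but very different sample sizes (invented inputs).
adj_small = adjusted_r_squared(0.6, n=5, k=1)    # far fewer than 20 cases per IDV
adj_large = adjusted_r_squared(0.6, n=100, k=1)  # well over 20 cases per IDV
```

With 5 cases the adjustment pulls R^2 down noticeably; with 100 cases the adjusted value is almost the same as the raw R^2.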

logistic regression
define, AKA?
multivariate regression that uses maximum likelihood estimation to model the relationship between a CATEGORICAL dependent variable (Y) and multiple IDVs (X)
AKA logit analysis

Logistic transformation

event occurrence (no, yes)
-> probability (0 to 1)
-> odds (0 to +inf): odds = P/(1 - P) for each value
-> log of the odds: log(0) = -inf and log(+inf) = +inf, so the range is -inf to +inf
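
The chain on the card above (probability, then odds, then log odds) can be sketched in a few lines. The function names are hypothetical, and `inverse_logit` is added only to show that the transformation can be undone.

```python
import math


def odds(p):
    """Odds = P / (1 - P), for a probability strictly between 0 and 1."""
    return p / (1 - p)


def logit(p):
    """Log of the odds; maps (0, 1) onto (-inf, +inf)."""
    return math.log(odds(p))


def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))


even_odds = odds(0.5)              # probability 0.5 means odds of 1
centre = logit(0.5)                # log(1) = 0
low, high = logit(0.1), logit(0.9)
round_trip = inverse_logit(logit(0.8))
```

Probabilities below 0.5 give negative log odds and probabilities above 0.5 give positive log odds, which is what lets a linear model work on an unbounded scale.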

types of logistic regression

simple
multiple
multinomial

simple logistic regression
application? DV and IDV?
relationship between a single IDV (continuous or categorical) and a single DV, usually a binary variable
example: DV: yes/no, IDV: yes/no

Multiple Logistic Regression
application? DV and IDV?
relationship between 2 or more IDVs (continuous or categorical) and a single DV, usually binary
DV: stroke (yes/no), IDVs: age, HTN, diabetes, gender

Multinomial Logistic Regression
application? IDV and DV?
relationship between 2 or more IDVs (continuous or categorical) and a single CATEGORICAL DV with more than 2 possible choices
DV: HTN (bad/mod/ok), IDVs: age, race, meds, gender
AKA polytomous logistic regression

define
Odds, Odds ratio (OR)
Odds: ratio of the probability of success to the probability of failure = P/(1 - P)
Odds ratio (OR): ratio of the odds that an event occurs in one group to the odds that it occurs in another group
OR = Odds(grp1)/Odds(grp2)

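What does each of these mean?
OR = 1, OR > 1, OR < 1
OR = 1: no association
OR > 1: positive association (odds of the event are higher in grp1 than in grp2)
OR < 1: negative association (odds of the event are lower in grp1 than in grp2)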

Relative Risk
define AKA 
ratio of the probability that a member of the exposed group develops the disease to the probability that a member of the unexposed group develops the same disease
RR = P(disease | exposed) / P(disease | unexposed)
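
A worked example of the odds ratio and relative risk cards above, using an invented 2x2 table (the counts are not from any study). It shows that OR and RR are computed differently and generally give different numbers for the same table.

```python
# Invented 2x2 table: counts of people with and without the disease.
exposed_disease, exposed_healthy = 20, 80
unexposed_disease, unexposed_healthy = 10, 90

# Probability of disease in each group.
p_exposed = exposed_disease / (exposed_disease + exposed_healthy)
p_unexposed = unexposed_disease / (unexposed_disease + unexposed_healthy)

# Relative risk: ratio of the two probabilities.
relative_risk = p_exposed / p_unexposed

# Odds = P / (1 - P) in each group; the odds ratio is their ratio.
odds_exposed = p_exposed / (1 - p_exposed)
odds_unexposed = p_unexposed / (1 - p_unexposed)
odds_ratio = odds_exposed / odds_unexposed
```

Both values are above 1 here, so both point to a positive association between exposure and disease in this made-up table.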

parametric stats
what class of stats? assumptions? 
it's inferential
assumptions:
normally distributed data
estimation of at least 1 parameter
at least 1 interval-level measure
mean = mode = median

example of parametric stats

Pearson correlation coefficient (r)
unpaired/paired t test
ANOVA
Regression

Non parametric stats
assumptions? 
distribution-free stats
second class of inferential stats
data are usually nominal or ordinal but CAN be continuous (interval + ratio)

nonparametric stats
examples? 
chi-square
sign test
Fisher exact test

sign test
define 
tests the equality of the medians of 2 comparative groups
simplest nonparametric test; used for a quick look at data in place of a parametric test

Sign Test
data? assumption? computation?
data: paired data (DV)
assumption: each paired difference is meaningful
computation: 1) take the difference of each pair 2) discard any zeros 3) apply the sign test 4) hypothesis testing
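
The computation steps on the card above can be sketched as follows. The before/after readings are invented, `sign_test` is a hypothetical helper, and the two-sided p-value uses the exact binomial distribution with probability 0.5, which is one common way to "apply the sign test".

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired data.

    Returns (number of positive differences, number of nonzero pairs, p-value).
    """
    # 1) difference of each pair
    diffs = [a - b for b, a in zip(before, after)]
    # 2) discard any zeros
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    # 3) under H0, positive and negative signs are equally likely (p = 0.5)
    k = min(n_pos, n - n_pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # 4) two-sided p-value, capped at 1
    p_value = min(1.0, 2 * tail)
    return n_pos, n, p_value


# Invented paired readings (e.g. a measurement before and after treatment).
before = [140, 150, 135, 160, 145, 155, 150, 148]
after = [132, 141, 135, 150, 147, 146, 139, 140]

n_pos, n_used, p_value = sign_test(before, after)
```

One pair has a zero difference and is dropped, leaving 7 pairs with 1 positive sign.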

chi square stats
define 
nonparametric test to see if a relationship exists between categorical variables
e.g., gender, insurance type, satisfaction

chi square tests what?

"goodness of fit" between the observations and a theoretical distribution
takes values from 0 to infinity

chi square hypothesis

H0: data follow the specified distribution
Ha: data do not follow the specified distribution

chi square
data? assumption? computation? 
DV: nominal or ordinal
assumption: data are independent random samples from the population
computation: 1) construct an R*C contingency table 2) compute the expected values 3) apply the formula 4) hypothesis testing

Chi Square
fe = ?
fe = (fr * fc)/N
fr = row total, fc = column total, N = # of subjects
df = (C - 1) * (R - 1)
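
A minimal sketch of the chi-square computation for an R*C table, with invented counts. The expected frequencies follow the fe formula on the card; the statistic itself is assumed to be the usual sum of (observed - expected)^2 / expected, which the cards refer to only as "the formula".

```python
# Invented 2x2 contingency table of observed counts (rows x columns).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # N = number of subjects

# Expected frequency for each cell: fe = (row total * column total) / N.
expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]

# Chi-square statistic: sum over cells of (fo - fe)^2 / fe.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (C - 1) * (R - 1).
df = (len(col_totals) - 1) * (len(row_totals) - 1)
```

The statistic is then compared against a chi-square distribution with `df` degrees of freedom for the hypothesis test.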

main diff in descriptive and inferential stats?

inferential has hypothesis testing


define correlation

interrelationship between 2 CONTINUOUS variables (interval and ratio)


Assumptions of correlation

1) normality: normal distribution
2) linearity: linear relationship
3) homoscedasticity: error (residual) variance in the model is identically distributed

what statistic to test homoscedasticity?

F test


correlation coefficient
what does it indicate? ranges from?
r
direction + strength of the correlation
ranges from -1.0 to +1.0

pearson correlation refers to?
spearman correlation refers to?
pearson: simple correlation (continuous variables)
spearman: alternative to pearson for continuous variables that are not normally distributed (nonparametric)

what is type 3 error

correctly rejecting the null, but for the wrong reason


correlation
null hypothesis? alt? how to test?
H0: the correlation between the two variables is 0 (uncorrelated)
Ha: the correlation is nonzero
to test whether r is significantly different from 0, we use the t test for Pearson's r

correlation
how to calc df 
degrees of freedom = N - 2
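
A sketch of the t test for Pearson's r with N - 2 degrees of freedom, on invented data. The cards do not spell out the statistic; this assumes the standard form t = r * sqrt(N - 2) / sqrt(1 - r^2).

```python
import math

# Illustrative paired data (invented for this example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# Degrees of freedom for the test of H0: correlation = 0.
df = n - 2

# Assumed standard test statistic for Pearson's r.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
```

The resulting `t_stat` would be compared with a t distribution on `df` degrees of freedom.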


factors that influence the correlation

1) correlation coefficient (r): the closer r is to -1 or +1, the greater the chance of significance
2) sample size: larger samples, greater chance of significance
3) linearity: correlation only exists in a linear relationship

correlation CANNOT be equated with?

causation


correlation coefficient ranges

0 to 0.2: very low
0.2 to 0.4: low
0.4 to 0.6: moderate
0.6 to 0.8: highly moderate
0.8 to 1: very high
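
The ranges above can be turned into a small lookup. `describe_strength` is a hypothetical helper, and putting each boundary value (0.2, 0.4, ...) into the higher band is an assumption, since the card does not say which band the cut points belong to.

```python
def describe_strength(r):
    """Map a correlation coefficient to the card's verbal label."""
    size = abs(r)  # strength ignores the direction (sign) of r
    if size > 1:
        raise ValueError("r must be between -1 and +1")
    # Assumption: a boundary value falls into the higher band.
    if size < 0.2:
        return "very low"
    if size < 0.4:
        return "low"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "highly moderate"
    return "very high"


labels = [describe_strength(r) for r in (0.1, -0.5, 0.95)]
```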