60 Cards in this Set
- Front
- Back
regression predicts what?
|
causation
|
|
correlation shows what?
|
association, which is not causation
|
|
Regression formulas
SST (total sum of squares) = ? |
SST = SSR + SSE = sum of (y - ȳ)^2
|
|
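A minimal sketch of the SST = SSR + SSE identity from the card above, using made-up data and a numpy least-squares fit:

```python
# Sketch of the SST = SSR + SSE decomposition for a least-squares fit.
# The data values are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Least-squares line: slope b1 and intercept b0.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares

# The identity SST = SSR + SSE holds for OLS with an intercept.
assert abs(sst - (ssr + sse)) < 1e-9
```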
Regression
sum of squares |
S = sum of all squared vertical deviations from the proposed line
S = e1^2 + e2^2 + e3^2 + ... |
|
Regression
the best straight line is the one that? |
minimizes S
the smallest S gives the best line |
|
A least squares regression selects line with?
|
the lowest sum of squared errors
|
|
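The "minimizes S" idea on the cards above can be sketched numerically: compute the closed-form least-squares line, then check that nudging its slope or intercept only makes S larger. Data values are made up.

```python
# Minimal sketch: the least-squares line minimizes S, the sum of
# squared vertical deviations. Made-up data for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 3.0, 4.5, 5.0])

# Closed-form OLS: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def s(slope, intercept):
    """Sum of squared vertical deviations from a candidate line."""
    e = y - (intercept + slope * x)
    return np.sum(e ** 2)

best = s(b1, b0)
# Any nearby line has a larger (or equal) S than the least-squares line.
assert all(best <= s(b1 + d, b0) for d in (-0.1, 0.1))
assert all(best <= s(b1, b0 + d) for d in (-0.1, 0.1))
```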
Regression
The coefficient of determination is? the higher its value, the more? |
R^2
higher value = more accurate |
|
R^2 measures what?
|
the strength of the relationship between the IDV(s) and the DV
|
|
Regression
Standard Error equation |
SE = square root (SSE / (N - K))
N = # of observations in the sample, K = # of IDVs |
|
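A sketch of the standard-error formula from the card, on made-up data. Note that texts vary on the denominator: the version here divides by N minus the number of fitted parameters (intercept plus slope), which matches the card's N - K when K counts all parameters.

```python
# Standard error of the estimate: sqrt(SSE / (N - K)).
# Made-up data; K here counts fitted parameters (intercept + slope),
# an assumption since texts define K differently.
import math

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

b1, b0 = np.polyfit(x, y, 1)
sse = float(np.sum((y - (b0 + b1 * x)) ** 2))

n, k = len(y), 2  # 2 fitted parameters: intercept and slope
se = math.sqrt(sse / (n - k))
```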
Regression
Interpolation |
prediction using value of IDV within observed range
uncontroversial |
|
Regression
Extrapolation |
prediction using a value of the IDV outside the observed range
should be avoided, if possible |
|
Simple Linear Regression
IDV AKA? DV AKA? |
IDV = predictor variable
DV = response variable |
|
Sig Testings in Regression
explain the diff tests involved |
F-test: judges whether the explanatory (independent) variables adequately describe the outcome variable
t-test: applies to an individual IDV (explanatory variable); says whether that particular variable has an effect on the outcome, holding the others the same
R^2: measures the strength of the relationship between the IDVs and the DV |
|
in simple linear regression
R^2=? |
r^2
r = correlation coefficient; R^2 = coefficient of determination |
|
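The R^2 = r^2 identity for simple linear regression can be checked directly with made-up data:

```python
# Check that in simple linear regression, R^2 (coefficient of
# determination) equals the square of r (Pearson correlation).
# Data values are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - sse / sst

assert abs(r2 - r ** 2) < 1e-9
```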
Multiple Linear Regression
E = ? B0, B1, ..., Bn = ? |
E = statistical error in the population
(residual in the sample); the B's = unknown parameters |
|
Multiple Regression
3 ways to do it, what are they? |
1) backward multiple regression
2) forward multiple regression
3) mix of backward/forward multiple regression |
|
backward multiple regression
AKA? How to do it? how to evaluate individual relationship |
AKA backward (reverse) elimination
drop the least significant variable one at a time, until only significant variables are left; a t test evaluates each individual relationship |
|
forward multiple regression
|
pick the IDV that explains the most variation in the DV, then the next one, one at a time, until no remaining variable significantly explains variation
|
|
mixed of backwards and forward
|
do forward selection first, but drop variables that are no longer significant after new variables are introduced
not used much |
|
Multiple Regression
Dummy Variable aka? what does it do? |
indicator variable
introduces qualitative (categorical) info by assigning values of 0 or 1 to indicate the absence or presence of a category; e.g., female = 1 when the patient is female, female = 0 when the patient is male |
|
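The female = 1 / female = 0 coding on the card can be sketched as follows; the record layout and field names are illustrative, not from any particular dataset.

```python
# Dummy (indicator) variable coding as on the card: female = 1 when
# the patient is female, 0 otherwise. Field names are made up.
patients = [
    {"id": 1, "sex": "female"},
    {"id": 2, "sex": "male"},
    {"id": 3, "sex": "female"},
]

for p in patients:
    # One 0/1 column encodes the presence/absence of the category.
    p["female"] = 1 if p["sex"] == "female" else 0

assert [p["female"] for p in patients] == [1, 0, 1]
```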
Assumptions for Multiple Linear Regressions
|
- normal distribution
- the variance about the regression line is the same for all values of the explanatory variables
- the explanatory variables (IDVs) are not correlated |
|
Nonlinear functions can be fit as?
|
regressions
-logarithmic, exponential |
|
what is multicollinearity
|
a problem in the interpretation of regression coefficients when the IDVs are correlated
|
|
what is collinearity
|
exists when IDV are correlated
|
|
detecting multicollinearity
|
checking the correlation coefficient matrix
F-test significant with many insignificant t tests; VIF > 5 |
|
what is VIF
|
variance inflation factor
quantifies the severity of multicollinearity in ordinary least squares regression; gives how much the variance of an estimated regression coefficient is increased because of collinearity |
|
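A minimal sketch of the VIF computation, under the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing IDV j on the other IDVs; the data are synthetic, built so one pair of IDVs is deliberately collinear.

```python
# Variance inflation factor sketch: regress each IDV on the others
# and take VIF_j = 1 / (1 - R_j^2). VIF > 5 flags collinearity.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1 -> collinear
x3 = rng.normal(size=n)              # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress X[:, j] on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

assert vif(X, 0) > 5.0   # x1 is collinear with x2
assert vif(X, 2) < 5.0   # x3 is not
```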
how to correct for multicollinearity
|
pick IDVs with low collinearity
use stepwise regression, where you put the most correlated variable in the equation first, then the next most correlated; the order of entry of variables matters here |
|
what is adjusted R^2
|
adjusts for the inflation in R^2 caused by the number of variables in the equation
as sample size increases (> 20 cases per variable), the adjustment is less needed; basically go with adjusted R^2 unless sample size > 20 x # of IDVs |
|
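A sketch of the usual adjusted-R^2 formula, illustrating the card's point that the adjustment shrinks as cases per variable grow; the specific n, k, and R^2 values are made up.

```python
# Adjusted R^2 penalizes R^2 for the number of IDVs k relative to
# the sample size n: adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
def adjusted_r2(r2, n, k):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# With few cases per variable the adjustment is large...
small = adjusted_r2(0.80, n=15, k=5)
# ...and with many cases per variable (n > 20 * k) it is small.
large = adjusted_r2(0.80, n=200, k=5)

assert small < large < 0.80
```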
logistic regression
define AKA |
multivariate regression that uses maximum likelihood estimation to examine the relationship between a CATEGORICAL dependent variable (Y) and multiple IDVs (X)
AKA logit analysis |
|
Logistic transformation
|
event occurrence (NO, YES)
---> PROBABILITY (0 ... 1) ---> ODDS (0 ... +inf) ---> LOG ODDS (-inf ... +inf)
compute P/(1-P) for each value to get the odds, then take log(odds): log(0) -> -inf and log(+inf) -> +inf |
|
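The probability -> odds -> log-odds chain on the card can be sketched as a small function:

```python
# The logistic transformation chain from the card:
# probability (0..1) -> odds (0..+inf) -> log-odds / logit (-inf..+inf).
import math

def logit(p):
    odds = p / (1.0 - p)      # odds = P / (1 - P)
    return math.log(odds)     # natural log maps (0, inf) to (-inf, inf)

assert logit(0.5) == 0.0            # even odds -> logit 0
assert logit(0.9) > 0 > logit(0.1)  # symmetric around p = 0.5
assert abs(logit(0.9) + logit(0.1)) < 1e-12
```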
types of logistic regression
|
simple
multiple
multinomial |
|
simple logistic regression
application? DV? IDV? |
relationship between a single IDV (continuous or categorical) and a single DV, usually a binary variable
example: DV: yes/no; IDV: yes/no |
|
Multiple Logistic Regression
application? DV? IDV? |
relationship between 2 or more IDVs (continuous or categorical) and a single DV, usually binary
DV: stroke (yes/no); IDVs: age, HTN, diabetes, gender |
|
Multinomial Logistic Regression
application? IDV? DV? |
relationship between 2 or more IDVs (continuous or categorical) and a single CATEGORICAL DV with more than 2 possible choices
DV: HTN (bad/moderate/ok); IDVs: age, race, meds, gender; AKA polytomous logistic regression |
|
define
Odds Odds ratio(OR) |
Odds: ratio of the probability of success to the probability of failure = P/(1-P)
Odds Ratio (OR): ratio of the odds that an event occurs in one group to the odds in another group; OR = Odds_grp1 / Odds_grp2 |
|
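A sketch of the odds and odds-ratio definitions on a made-up 2x2 table; the counts are purely illustrative.

```python
# Odds and odds ratio from a made-up 2x2 table:
#                event   no event
#   group 1        30         70
#   group 2        10         90
def odds(p):
    return p / (1.0 - p)   # odds = P / (1 - P)

p1 = 30 / 100   # P(event | group 1)
p2 = 10 / 100   # P(event | group 2)

odds_ratio = odds(p1) / odds(p2)   # OR = Odds_grp1 / Odds_grp2

# Equivalent cross-product form for a 2x2 table: (30*90)/(70*10)
assert abs(odds_ratio - (30 * 90) / (70 * 10)) < 1e-9
assert odds_ratio > 1   # event is more likely in group 1
```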
What do each of these mean
OR=1 OR>1 OR<1 |
OR = 1: no association
OR > 1: positive association (odds higher in group 1)
OR < 1: negative association (odds lower in group 1) |
|
Relative Risk
define AKA |
AKA risk ratio
the probability that a member of the exposed group will develop the disease, relative to the probability that a member of the unexposed group develops the same disease
RR = P(Dis_exp) / P(Dis_unexp) |
|
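The RR formula from the card, on made-up cohort counts:

```python
# Relative risk: probability of disease in the exposed group over the
# probability in the unexposed group. Counts are made up.
exposed_disease, exposed_total = 20, 100
unexposed_disease, unexposed_total = 5, 100

p_exp = exposed_disease / exposed_total        # P(Dis | exposed)
p_unexp = unexposed_disease / unexposed_total  # P(Dis | unexposed)

rr = p_exp / p_unexp   # RR = P(Dis_exp) / P(Dis_unexp)

# Exposure quadruples the risk in this made-up data.
assert abs(rr - 4.0) < 1e-9
```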
parametric stats
what class of stats? assumptions? |
it is inferential
- normally distributed data
- estimation of at least 1 parameter
- at least 1 interval measure
- mean = mode = median |
|
example of parametric stats
|
Pearson correlation coefficient (r)
unpaired/paired t test, ANOVA, regression |
|
Non parametric stats
assumptions? |
distribution-free stats
second class of inferential stats; usually nominal or ordinal data, but CAN be continuous (interval + ratio) |
|
nonparametric stats
examples? |
chi-square
sign test, Fisher exact test |
|
sign test
define |
tests the equality of the medians of 2 comparative groups
the simplest nonparametric test; used for a quick look at data before parametric testing |
|
Sign Test
data? assumption computation? |
paired data (DV)
assumption: each paired difference is meaningful
computation: 1) take the difference of each pair 2) discard any zeros 3) apply the sign test 4) hypothesis testing |
|
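The four computation steps above can be sketched with an exact binomial calculation (under H0, + and - signs are equally likely); the paired measurements are made up.

```python
# Sketch of the sign test on paired data: difference the pairs, drop
# zeros, then test whether + and - signs are equally likely
# (binomial with p = 0.5 under H0). Data values are made up.
from math import comb

before = [12, 15, 11, 14, 13, 16, 10, 15, 14, 12]
after  = [10, 13, 11, 12, 12, 14,  9, 13, 12, 11]

diffs = [b - a for b, a in zip(before, after)]
signs = [d for d in diffs if d != 0]       # discard zero differences
n_plus = sum(1 for d in signs if d > 0)
n = len(signs)

# Two-sided exact p-value: P(a sign count as or more extreme).
k = min(n_plus, n - n_plus)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
```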
chi square stats
define |
nonparametric test to see if a relationship exists between categorical variables
e.g., gender, insurance type, satisfaction |
|
chi square tests?
|
"goodness of fit" between an observation and a theoretical distribution
values of 0 to infinity |
|
chi square hypothesis
|
H0-data follow specified distribution
Ha: does not follow |
|
chi square
data? assumption? computation? |
DV: nominal or ordinal
assumption: data are independent random samples from the population
computation: construct an R x C contingency table, compute the expected values, apply the formula, then hypothesis testing |
|
Chi Square
fe= |
(fr * fc) / N
r = row, c = column, N = # of subjects; df = (C - 1) * (R - 1) |
|
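The fe and df formulas on the card can be sketched on a made-up 2x2 contingency table:

```python
# Expected cell frequency for an R x C table: fe = (fr * fc) / N,
# with df = (R - 1) * (C - 1). Made-up 2x2 counts
# (rows = gender, columns = satisfied yes/no).
observed = [
    [30, 20],   # row 1 total: 50
    [10, 40],   # row 2 total: 50
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2) for j in range(2)
)
df = (2 - 1) * (2 - 1)

assert expected[0][0] == 20.0   # (50 * 40) / 100
```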
main diff in descriptive and inferential stats?
|
inferential has hypothesis testing
|
|
define correlation
|
interrelationship between 2 CONTINUOUS variables (interval and ratio)
|
|
Assumptions of correlation
|
1) normality: normal distribution
2) linearity: linear relationship
3) homoscedasticity: the error (residual) variances in the model are identically distributed |
|
what statistic tests homoscedasticity?
|
F-test
|
|
correlation coefficient
what does it indicate ranges from? |
r
direction + strength of the correlation; ranges from -1.0 to +1.0 |
|
pearson correlation refers to
spearman correlation refers to |
simply correlation (continuous variables)
Spearman: an alternative to Pearson for continuous variables that are not normally distributed (nonparametric) |
|
what is type 3 error
|
correctly rejecting the null, but for the wrong reason
|
|
correlation
null hypothesis? alt? how to test |
H0: the correlation between the two is 0 (uncorrelated)
Ha: the correlation is nonzero
to test whether r is significantly different from 0, we use a t test for Pearson's r |
|
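The t test for Pearson's r can be sketched with the standard formula t = r * sqrt(N - 2) / sqrt(1 - r^2); the r and N values here are made up, and the critical value quoted is the usual two-tailed 0.05 cutoff at df = 25.

```python
# t test for Pearson's r with df = N - 2:
# t = r * sqrt(N - 2) / sqrt(1 - r^2). Example values are made up.
import math

def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r = 0.6 with N = 27 observations, so df = 25.
t = t_for_r(0.6, 27)

# |t| = 3.75 exceeds the two-tailed 0.05 critical value (~2.06 at
# df = 25), so H0 (correlation is zero) would be rejected.
assert t > 2.06
```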
correlation
how to calc df |
degrees of freedom = N - 2
|
|
factors that influence the correlation
|
1) correlation coefficient (r): the closer r is to -1 or +1, the greater the chance of significance
2) sample size: larger samples, greater chance of significance
3) linearity: correlation only exists in a linear relationship |
|
correlation CANNOT be equated with?
|
causation
|
|
correlation co-efficient ranges
|
0-.2: very low
.2-.4: low
.4-.6: moderate
.6-.8: highly moderate
.8-1: very high |