63 Cards in this Set
- Front
- Back
Quantitative data- Continuous
|
Measurements: numerical values on a continuous scale
- Temperature, height, weight, blood pressure (BP) |
|
Quantitative- Discrete
|
- Counts
- Levels |
|
Sample vs. Population
|
Sample: a subset of a population, intended to represent it
Population:
- Target population: the group you are trying to represent with data
- Study population: the group you pull your sample from |
|
Probability distributions
|
- Shape of the probabilities of a random variable taking on specific values
- Examples: normal, chi-square, binomial
- Common distributions are defined by PARAMETERS |
|
Types of distributions
|
- Symmetric
- Skewed (right, left)
- Discrete (bars) |
|
Outliers
|
Can have a big effect on the mean!
|
|
What are the five quartiles?
|
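Q1 = 0% (minimum)
Q2 = 25%
Q3 = 50% (median)
Q4 = 75%
Q5 = 100% (maximum)
Note: this labeling is specific to this deck. The standard convention has three quartiles (Q1 = 25%, Q2 = 50%, Q3 = 75%), which together with the minimum and maximum form the five-number summary. |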
|
What is the min, median, max in quartiles?
|
Q1 = min
Q3 = median
Q5 = max |
|
What is the range?
|
Q5 - Q1
|
|
What is the IQR (in terms of quartiles)
|
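IQR = Q4 - Q2 in this deck's labeling (75th percentile minus 25th percentile); can be preferred to the range because it is less affected by outliers
- Spans the middle 50% of the data |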
|
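As a minimal sketch of the summary cards above, the percentile points, range, and IQR can be computed with Python's standard library. The data values are made up for illustration, and the `inclusive` quantile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample, not taken from the flashcards
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut points at 25%, 50%, and 75% (the deck's Q2, Q3, Q4)
p25, p50, p75 = statistics.quantiles(data, n=4, method="inclusive")
p0, p100 = min(data), max(data)   # 0% and 100% points (the deck's Q1, Q5)

data_range = p100 - p0            # range = max - min
iqr = p75 - p25                   # spread of the middle 50% of the data

print(p0, p25, p50, p75, p100)
print("range:", data_range, "IQR:", iqr)
```

A single extreme value would change the range but would usually leave the IQR almost unchanged.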
Qualitative data- 3 types
|
Nominal: no order to the categories
Ordinal: inherent order to the categories
Dichotomous: everyone falls into one of two categories |
|
Example nominal data
|
Zip code
Race
Gender |
|
Example ordinal data
|
Stage of disease
Grade in school
- Can be described by medians and modes |
|
Example of dichotomous data
|
gender
unmarried/ married |
|
what is variance?
|
- spread of data
|
|
How do you calculate variance?
|
- Mean of the squared deviations of the variable from its own mean
- Standard deviation = square root of the variance
- The range and the IQR are other ways to describe spread |
|
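The definition above can be written out directly. This is a sketch on made-up numbers; it uses the population form (divide by n), whereas a sample variance divides by n - 1.

```python
import math
import statistics

def population_variance(values):
    """Mean of the squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
var = population_variance(data)
sd = math.sqrt(var)               # standard deviation is the square root of the variance

print("variance:", var, "SD:", sd)
# The sample variance divides by n - 1 instead of n, so it is slightly larger
print("sample variance:", statistics.variance(data))
```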
What is two number summary
|
- median + IQR
- mean + SD |
|
Coefficient of variation
|
100 * (standard deviation/ mean)
|
|
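A short sketch of the formula above. The two data sets are hypothetical, and the sample standard deviation is an assumption here, since the card does not say which SD to use.

```python
import statistics

def coefficient_of_variation(values):
    """100 * (standard deviation / mean), expressed as a percentage."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

heights_cm = [160, 165, 170, 175, 180]   # hypothetical measurements
weights_kg = [55, 65, 70, 80, 95]        # hypothetical measurements

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 1), round(cv_weights, 1))
```

Because the result is unit-free, it allows the spread of variables measured on different scales to be compared.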
Covariance
|
- how much they vary together
|
|
What does high correlation and covariance indicate?
|
- dependence between variables
|
|
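To make the two cards above concrete, here is a sketch of sample covariance and the Pearson correlation it leads to. The function names and data are illustrative, and the n - 1 divisor is an assumption, as the cards give no formula.

```python
import math

def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

def pearson_correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

x = [1, 2, 3, 4, 5]      # hypothetical data
y = [2, 4, 6, 8, 10]     # y rises exactly in step with x

cov_xy = sample_covariance(x, y)
r_xy = pearson_correlation(x, y)
print("covariance:", cov_xy, "correlation:", r_xy)
```

Covariance depends on the units of both variables, while correlation does not, which is why correlation is easier to judge as "high" or "low".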
what are other names for a normal distribution
|
- Bell curve
- Bell shaped
- Gaussian distribution |
|
Describing data with two numbers
|
Normal distribution: mean, SD
Right skewed: median, range
Not continuous: group percentages |
|
Box and Whisker plot
|
Lower/upper boundaries of the box: quartiles
Dividing line: median
Tails (whiskers) show how far the data extend from the quartiles |
|
What is a parameter? How is it useful?
|
- A parameter is a true value that summarizes a characteristic of a population
- Need to estimate parameters of a population using data |
|
What are statistics used for?
|
- Statistics are used to estimate parameters
|
|
Accurate and valid methods- bias?
|
no bias
|
|
Precision, Reliable, Repeatable describe...
|
method with low variability
|
|
What is a confidence interval?
|
- Range of values that a parameter could plausibly take on
- With 95% confidence, the parameter is between the lower bound (LB) and the upper bound (UB) |
|
Concept behind the confidence interval-
|
If you sampled the population 100 times with the same data structure and built an interval each time, about 95 of those intervals would contain the true value
|
|
Standard deviation vs. standard error
|
SD describes the spread of the data
SE describes the uncertainty of an estimated parameter; it depends on sample size and shrinks as the sample grows |
|
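The repeated-sampling idea behind the confidence interval, and the role of the standard error in it, can be checked with a small simulation. This is a sketch under stated assumptions: the population values are invented, the data are drawn from a normal distribution, and the interval uses the normal multiplier 1.96 rather than a t quantile.

```python
import math
import random
import statistics

random.seed(0)                      # fixed seed so the sketch is repeatable
TRUE_MEAN, TRUE_SD = 100.0, 15.0    # hypothetical population parameters
N, REPEATS = 50, 1000               # sample size and number of repeated samples

def confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the mean: estimate +/- z * standard error."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # SE = SD / sqrt(n)
    return mean - z * se, mean + z * se

hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    lower, upper = confidence_interval(sample)
    hits += lower <= TRUE_MEAN <= upper   # did this interval capture the truth?

coverage = hits / REPEATS
print(f"{coverage:.1%} of intervals contained the true mean")
```

The printed share should land near 95%. Raising `N` narrows each interval because the standard error shrinks, but the share of intervals that capture the true mean stays about the same.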
Simple linear regression
|
Y = B0 + B1 * X + E
|
|
What does B0 represent
|
B0 is the intercept
|
|
What does B1 represent
|
B1 is the slope for X: the linear relationship that X has with Y (the change in Y per one-unit increase in X)
|
|
What does epsilon represent?
What is the goal of modeling? |
Epsilon (E) is the error term
- The goal in modeling is to minimize error
- Terms in the model should explain away the error
- The more significant variables you find to explain the outcome, the smaller the error term will be |
|
What is the use of the linear regression?
|
- Study the impact of X on Y
- Forecast the value of Y for a given X
- Subtract the effect of X from Y using the model to create Y*; Y* is now adjusted for X |
|
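The regression cards above can be tied together in a minimal sketch: estimate B0 and B1 by ordinary least squares for the usual form Y = B0 + B1 * X + E, then use the fit to forecast and to adjust. The data and the function name are hypothetical, and least squares is assumed as the fitting method because the cards do not name one.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (b0, b1) for Y = B0 + B1 * X + E."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1

x = [1, 2, 3, 4, 5]                  # hypothetical predictor values
y = [2.9, 5.1, 7.0, 9.2, 10.8]       # hypothetical outcomes

b0, b1 = fit_simple_regression(x, y)

# Residuals are the estimated error terms E
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

forecast = b0 + b1 * 6               # forecast Y for a new X = 6

# Y* = Y - b1 * X removes the estimated effect of X (Y adjusted for X)
y_adjusted = [yi - b1 * xi for xi, yi in zip(x, y)]

print("b0:", round(b0, 3), "b1:", round(b1, 3), "forecast:", round(forecast, 3))
```

After adjustment, the values in `y_adjusted` scatter around the intercept with no remaining linear trend in X.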
What to look for in a graph-
Estimate, S.E., P value |
Estimate: positive or negative? Clinically large?
SE: large?
P value: less than 0.05? If so, X is a significant predictor of Y.
These are called parameter tests. |
|
P values significance in regression
|
- General linear hypothesis of whether the model has ANY significant predictors
--Does the model explain any significant variability in the outcome? |
|
Interaction
|
- Cannot interpret either effect alone (e.g., age and insurance); must interpret both through the interaction
- B1 and B12 are related and must be considered together |
|
ADJUSTED values
|
Subtract the effect of the Xi's from Y using the linear model to create Y*; Y* is now ADJUSTED for the Xi's
|
|
Analysis of variance
With two Wi's- what do you call this |
- Categorical variables that may explain variability in Y are called factors
- With W1 and W2: two-factor ANOVA |
|
With single factor ANOVA, and two levels, what do we want to conclude?
|
- Does Y differ based on whether you are in group A or group B?
- Example: is BMI (continuous outcome Y) different for men and women (categorical explanatory variable with two groups)? |
|
What is an indicator variable
|
- An indicator variable, I (X), can only take on two values, (0, 1)
- Where I(X) = 1 if X is true, and I(X) = 0 if X is not true |
|
Simple regression model
Y = B0 + B1 * I(W1 = B) + E |
Mean for A = B0
Mean for B = B0 + B1
- Because of the indicator variable, all observations take on the value 0 or 1: 1 is group B, 0 is group A
- E are individual differences (errors) from the group mean |
|
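A quick numeric check of the card above: fitting the line by least squares on a 0/1 indicator gives an intercept equal to the group A mean and a slope equal to the difference between the group means. The outcome values are invented for illustration.

```python
from statistics import mean

group_a = [20, 22, 24]   # hypothetical outcomes for group A
group_b = [25, 27, 29]   # hypothetical outcomes for group B

# Indicator I(W1 = B): 0 for group A, 1 for group B
indicator = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

# Least-squares slope and intercept with the indicator as the only predictor
mean_i, mean_y = mean(indicator), mean(y)
b1 = (sum((i - mean_i) * (yi - mean_y) for i, yi in zip(indicator, y))
      / sum((i - mean_i) ** 2 for i in indicator))
b0 = mean_y - b1 * mean_i

print("B0:", b0, "(group A mean:", mean(group_a), ")")
print("B0 + B1:", b0 + b1, "(group B mean:", mean(group_b), ")")
```

Testing whether B1 differs from 0 is therefore the same question as asking whether the two group means differ.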
When modeling a continuous outcome variable, you may use the terms:
|
- Regression for continuous predictors only
- ANOVA for grouping or categorical variables
- ANCOVA for continuous AND categorical variables in the model |
|
What does a chi square test show?
|
If two variables are associated
|
|
Define degrees of freedom given R= rows and C=columns
|
Degrees of freedom = (R-1)(C-1)
|
|
What is the degrees of freedom?
|
- The chi-square distribution is defined by its degrees of freedom, which is also its mean
- Under the null hypothesis, the test statistic should be close to (R-1)(C-1) |
|
Comments on Chi-Square, cell size in 2x2
|
- Sample sizes in each cell must be large enough for the test to be accurate
- As a rule of thumb, each expected cell count should be at least 5 for the approximation to the chi-square distribution to be close enough |
|
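The chi-square cards above can be sketched as one small function that returns the Pearson statistic, the degrees of freedom (R-1)(C-1), and the expected counts used for the cell-size check. The table is hypothetical, and no continuity correction is applied, so software that applies one by default for 2x2 tables may report a different statistic.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Expected count under no association: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

observed = [[10, 20], [30, 40]]      # hypothetical 2x2 counts
stat, df, expected = chi_square(observed)
smallest_expected = min(min(row) for row in expected)

print("statistic:", round(stat, 4), "df:", df)
print("smallest expected count:", smallest_expected)
```

Here the statistic is close to the degrees of freedom, which is what the null hypothesis of no association would predict, and every expected count clears the rule-of-thumb threshold.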
What are cross tabs? Frequency tables?
|
- Categorical variables: birth year, gender, race, diagnosis, zip code, school yr
- Cross tabs= Group sizes |
|
What is stratification?
|
- Separating the analysis based on categorical variables (like tabular data)
|
|
Odds ratio and relative risk
|
two statistics used and calculated from 2x2 tables
|
|
What is risk?
|
- Number of diseased divided by the number at risk
|
|
If relative risk (RR) >1, <1, =1
|
>1: higher likelihood of disease if exposed
<1: lower likelihood of disease if exposed
=1: likelihood of disease does not depend on exposure |
|
Odds Ratio
|
- Odds are a measure of likelihood of something happening, as compared to the likelihood that it does not
|
|
Odds ratio >1, <1, =1
|
>1: odds of disease are higher if you are exposed
<1: odds of disease are lower if you are exposed
=1: odds of disease are the same whether or not you were exposed |
|
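Both statistics come from the same 2x2 table, as a short sketch shows. The cell labels a, b, c, d and the counts are assumptions made for illustration, not taken from the cards.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a, b = exposed with / without disease
    c, d = unexposed with / without disease
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a / b) / (c / d)

# Hypothetical table: 30 of 100 exposed and 10 of 100 unexposed are diseased
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)

print("relative risk:", round(rr, 3))
print("odds ratio:", round(odds, 3))
```

Both values are above 1 here, so both point to a higher likelihood of disease among the exposed; the odds ratio is further from 1 than the relative risk because the disease is not rare in this made-up table.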
What is the purpose of Fisher's Exact test?
|
- Tests null hypothesis of NO ASSOCIATION in a 2x2 table
|
|
Output/ Result Reporting- PARAMETERS
|
- Not simply slope parameters anymore
- Using a link function of natural log
- Monotone: increasing function
- Positive and negative signs still mean the same thing (negative estimates indicate negative relationships) |
|
For RR> 1, Parameter >0
|
- HIGHER likelihood of outcome with an increase in the variable
|
|
For RR<1, Parameter<0
|
- LOWER likelihood of outcome with an increase in the variable
|
|
RR=1, Parameter estimate = 0
|
- would indicate that likelihood of outcome does not depend on variable
|
|
Confidence interval- look to see if 0 or 1 is in it
|
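- If 1 is in the interval for RR, the effect is not statistically significant
- If 0 is in the interval for the parameter estimate, the effect is not statistically significant |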
|
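The link between the parameter scale and the RR scale can be sketched as follows, assuming the natural-log link mentioned in the reporting card, so that RR = exp(parameter). The estimate and standard error are hypothetical, and the interval uses the normal multiplier 1.96.

```python
import math

def rate_ratio_with_ci(estimate, se, z=1.96):
    """Back-transform a log-scale estimate and its 95% CI to the ratio scale."""
    lower, upper = estimate - z * se, estimate + z * se
    return math.exp(estimate), (math.exp(lower), math.exp(upper))

estimate, se = 0.405, 0.15          # hypothetical model output on the log scale
rr, (rr_low, rr_high) = rate_ratio_with_ci(estimate, se)

# The two checks on the card are equivalent because exp(0) = 1 and exp is increasing
param_ci_excludes_0 = not (estimate - 1.96 * se <= 0 <= estimate + 1.96 * se)
rr_ci_excludes_1 = not (rr_low <= 1 <= rr_high)

print("RR:", round(rr, 2), "95% CI:", (round(rr_low, 2), round(rr_high, 2)))
print(param_ci_excludes_0, rr_ci_excludes_1)
```

A positive estimate maps to RR above 1, a negative estimate to RR below 1, and an estimate of exactly 0 to RR = 1.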
In tabular (categorical data) what are outcome possibilities
|
- counts on tables
- measures on tables |
|
Parameters in these models- what are they?
|
- NOT slopes
- Transformed rate ratios that act multiplicatively on the OUTCOME |