63 Cards in this Set

  • Front
  • Back
Quantitative data- Continuous
Measurement- numerical values
- Temperature, height, weight, BP
Quantitative- Discrete
- Counts
- Levels
Sample vs. Population
Sample: attempted representation of a population, a subset
Population:
- Target population- group you are trying to represent with data
- Study population- where you pull your sample from
Probability distributions
- Shape of the probabilities of a random variable taking on specific values
--normal, chi square, binomial
- Common distributions are defined by PARAMETERS
Types of distributions
- Symmetric
- Skewed (right, left)
- Discrete (bars)
Outliers
Can have a big effect on the mean!
What are the five quartiles? (five-number summary; in standard usage Q1, Q2, Q3 are the 25th, 50th, and 75th percentiles)
Q1= 0%
Q2 = 25%
Q3 = 50%
Q4 = 75%
Q5 = 100%
What is the min, median, max in quartiles?
Q1 = min
Q3 = median
Q5 = max
What is the range?
Q5 - Q1
What is the IQR (in terms of quartiles)
IQR = Q4 - Q2; often preferred to the range because it is less affected by outliers
- middle 50% data
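The quartile cards above can be checked numerically. This is a minimal sketch using Python's standard library; the data values are made up, and the names q1 through q5 follow the labeling used on these cards (minimum through maximum), not the standard Q1-Q3 convention.

```python
import statistics

def five_number_summary(values):
    """Return [min, 25th percentile, median, 75th percentile, max]."""
    data = sorted(values)
    # "inclusive" is one of the two interpolation methods the stdlib offers
    p25, p50, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return [data[0], p25, p50, p75, data[-1]]

# Hypothetical data; 30 is a deliberate outlier
data = [2, 4, 4, 5, 7, 9, 10, 12, 30]
q1, q2, q3, q4, q5 = five_number_summary(data)  # card labels: Q1=min ... Q5=max

data_range = q5 - q1   # range = Q5 - Q1
iqr = q4 - q2          # IQR = Q4 - Q2, the middle 50% of the data

print("five-number summary:", q1, q2, q3, q4, q5)
print("range:", data_range, "IQR:", iqr)
```

The outlier stretches the range but leaves the IQR untouched, which is why the IQR may be preferred.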
Qualitative data- 3 types
Nominal- no order to the categories
Ordinal- inherent order to categories
Dichotomous- everyone falls in one category or the other
Example nominal data
Zip code
Race
gender
Example ordinal data
Stage of disease
grade in school

- can be described by medians and modes
Example of dichotomous data
gender
unmarried/ married
What is variance?
- spread of data
How do you calculate variance?
- mean of the squared deviation of the variable from its own mean
- Standard deviation = square root of the variance
- Range and IQR are other ways to describe the spread of the data
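As a worked example of the variance definition, the sketch below computes the mean of the squared deviations by hand and takes its square root to get the standard deviation. The data are hypothetical; the sample version (dividing by n - 1) is shown alongside as an assumption about which formula a course might use.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = sum(data) / len(data)

# Population variance: mean of the squared deviations from the mean
pop_var = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation is the square root of the variance
pop_sd = math.sqrt(pop_var)

# Sample variance divides by n - 1 instead of n
sample_var = statistics.variance(data)

print("mean:", mean)
print("population variance:", pop_var, "SD:", pop_sd)
print("sample variance:", sample_var)
```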
What is a two-number summary?
- median + IQR
- mean + SD
Coefficient of variation
100 * (standard deviation/ mean)
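A short sketch of the coefficient of variation formula, assuming the sample standard deviation is used. The two data sets are invented to show that the CV allows spread to be compared across variables measured in different units.

```python
import statistics

def coefficient_of_variation(values):
    """CV = 100 * (standard deviation / mean), expressed as a percentage."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical data in different units
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 65, 70, 80, 95]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print("CV height (%):", round(cv_height, 2))
print("CV weight (%):", round(cv_weight, 2))
```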
Covariance
- how much two variables vary together
What does high correlation and covariance indicate?
- dependence between variables
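Covariance and correlation can be sketched by hand as follows. The function names and the data are illustrative assumptions; the sample (n - 1) form is used, and correlation is the covariance rescaled by both standard deviations.

```python
import math

def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data where y is exactly 2 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_xy = sample_covariance(x, y)
corr_xy = correlation(x, y)
print("covariance:", cov_xy, "correlation:", corr_xy)
```

A correlation near 1 or -1 signals strong linear dependence; covariance carries the same sign but depends on the units of the variables.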
What are other names for a normal distribution?
- bell curve
- bell shaped
- Gaussian distribution
Describing data with two numbers
Normal distribution- Mean, SD
Right skewed- median, range
Not continuous- group percentages
Box and Whisker plot
Lower/Upper boundaries- quartiles
Dividing line- median
Tails show how far data extends from quartiles
What is a parameter? How is it useful?
- A parameter is a true value that summarizes a characteristic of a population
- Need to estimate parameters of a population using data
What are statistics used for?
- Statistics are used to estimate parameters
Accurate and valid methods- bias?
no bias
Precision, Reliable, Repeatable describe...
method with low variability
What is a confidence interval?
- Range of values that a parameter could take on
- With 95% confidence, the parameter is between LB (lower bound) and UB (upper bound)
Concept behind the confidence interval-
If you sampled the population 100 times with the same data structure and built an interval each time, about 95 of those intervals would contain the true value
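The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch: the population (normal with mean 50 and SD 10), the sample size, and the number of repetitions are all assumptions chosen for the demonstration, and the interval uses a normal approximation.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed "true" population and study design (hypothetical values)
TRUE_MEAN, TRUE_SD, N, REPS = 50.0, 10.0, 50, 2000
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # Does this sample's 95% interval contain the true mean?
    if m - z * se <= TRUE_MEAN <= m + z * se:
        covered += 1

coverage = covered / REPS
print("fraction of intervals containing the true mean:", coverage)
```

The printed fraction should land near 0.95, which is what the 95% refers to.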
Standard deviation vs. standard error
SD describes spread of data
SE describes the precision of an estimated parameter; it depends on sample size and shrinks as the sample grows
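To make the SD versus SE distinction concrete, the sketch below computes both for a sample mean and builds a 95% interval from the SE. The data are hypothetical, and the interval uses a normal approximation; with a sample this small a t-based interval would usually be more appropriate.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)   # spread of the individual data points
se = sd / n ** 0.5            # precision of the estimated mean

# Normal-approximation 95% interval (an assumption, see the note above)
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = mean - z * se, mean + z * se

print("SD:", round(sd, 3), "SE:", round(se, 3))
print("95% CI for the mean:", round(lower, 3), "to", round(upper, 3))
```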
Simple linear regression
Y = B0 + B1 * X + E
What does B0 represent
B0 is the intercept
What does B1 represent
B1 is the slope of X, the linear relationship that X has with Y
What does epsilon represent?
What is the goal of modeling?
Epsilon is the error term
- The goal in modeling is to minimize error
- Terms in the model should explain away the error
- The more significant variables you find to explain outcome, the smaller the error term will be
What is the use of the linear regression?
- Study impact of X on Y
- Forecast the value of Y for a given X
- Subtract effect of X from Y using this model to create Y* and Y* is now adjusted for X
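The three uses listed above can be sketched with an ordinary least squares fit done by hand. The data and the function name fit_simple_regression are illustrative assumptions; b0 and b1 play the roles of B0 and B1 on the cards.

```python
def fit_simple_regression(xs, ys):
    """Least squares estimates for Y = B0 + B1 * X + E."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope: change in Y per unit change in X
    b0 = my - b1 * mx    # intercept: expected Y when X = 0
    return b0, b1

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(x, y)

# Forecast Y for a new X
forecast_at_6 = b0 + b1 * 6

# Adjusted outcome: subtract the estimated effect of X from Y
y_star = [yi - b1 * xi for xi, yi in zip(x, y)]

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("forecast at X = 6:", round(forecast_at_6, 3))
print("Y* (adjusted for X):", [round(v, 3) for v in y_star])
```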
What to look for in a graph-
Estimate
S.E.
P Value
Estimate- positive or negative, clinically large?
SE- large?
P Value- less than 0.05? If so, it is a significant predictor of Y. These are called parameter tests.
P values significance in regression
- General linear hypothesis of whether the model has ANY significant predictors
--Does the model explain any significant variability in the outcome?
Interaction
- Cannot interpret either effect alone (age and insurance), must interpret both through interaction
- B1 and B12 are related, and must be considered together
ADJUSTED values
Subtract the effect of the Xi's from Y using the linear model to create Y*; Y* is now ADJUSTED for the Xi's
Analysis of variance
With two Wi's- what do you call this
- Categorical variables that may explain variability in Y- factors
- W1, W2
- Two factor ANOVA
With single factor ANOVA, and two levels, what do we want to conclude?
- Does Y differ, based on whether you are in group A or B?
- Is BMI (continuous outcome Y) different for men and women (a categorical explanatory variable with two groups)?
What is an indicator variable
- An indicator variable, I (X), can only take on two values, (0, 1)
- Where I(X) = 1 if X is true, and I(X) = 0 if X is not true
Simple regression model
Y = B0 + B1 * I(W1=B) + E
Mean for A: B0
Mean for B = B0+ B1
Because of the indicator variable, every observation is coded 0 or 1
- 1 is group B, 0 is group A
- E are individual differences (errors) from the mean
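A small check of the indicator-variable model: when the only predictor is a 0/1 indicator, the least squares intercept equals the mean of group A and the slope equals the difference between the group means. The BMI-like values below are invented for illustration.

```python
# Hypothetical outcome values for two groups
group_a = [22.0, 24.0, 26.0]
group_b = [27.0, 29.0, 31.0]

# Indicator coding: I(W1 = B) is 0 for group A and 1 for group B
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

# Least squares fit of Y = B0 + B1 * I(W1 = B) + E
mx = sum(x) / len(x)
my = sum(y) / len(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print("B0:", b0, "mean of A:", mean_a)
print("B0 + B1:", b0 + b1, "mean of B:", mean_b)

# E: individual differences from each observation's group mean
errors = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print("errors:", errors)
```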
When modeling a continuous outcome variable, you may use the terms:
- Regression for continuous predictors only
- ANOVA for grouping or categorical variables
- ANCOVA for continuous AND categorical variables in the model
What does a chi square test show?
If two variables are associated
Define degrees of freedom given R= rows and C=columns
Degrees of freedom = (R-1)(C-1)
What is the degrees of freedom?
- The chi-square distribution is defined by its degrees of freedom, which is also its mean
- Under the null hypothesis, the test statistic should be close to (R-1)(C-1)
Comments on Chi-Square, cell size in 2x2
- Sample sizes in each cell must be large enough for the test to be accurate
- Rule of thumb: each expected cell count should be at least 5 for the chi-square approximation to be close enough
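The chi-square cards can be tied together in one sketch: expected counts come from the row and column totals, the statistic sums (observed - expected)^2 / expected over the cells, and the degrees of freedom are (R-1)(C-1). The table and the function name are hypothetical; no p-value is computed here.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under no association: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Hypothetical 2x2 table: rows = exposure, columns = outcome
table = [[20, 30],
         [30, 20]]

chi2, df, expected = chi_square_statistic(table)
cells_large_enough = all(e >= 5 for row in expected for e in row)

print("chi-square:", chi2, "df:", df)
print("expected counts:", expected)
print("all expected counts at least 5:", cells_large_enough)
```

A statistic well above the degrees of freedom suggests the two variables are associated.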
What are cross tabs? Frequency tables?
- Categorical variables: birth year, gender, race, diagnosis, zip code, school yr
- Cross tabs= Group sizes
What is stratification?
- Separating the analysis based on categorical variables (like tabular data)
Odds ratio and relative risk
two statistics used and calculated from 2x2 tables
What is risk?
- Number of diseased divided by the number at risk
If relative risk (RR) >1, <1, =1
>1, higher likelihood of disease if exposed
<1, lower likelihood if you are exposed
=1, indicate that likelihood of disease does not depend on exposure
Odds Ratio
- Odds are a measure of likelihood of something happening, as compared to the likelihood that it does not
Odds ratio >1, <1, =1
>1, likelihood of disease if you are exposed is higher
<1, likelihood of disease if you are exposed is lower
=1, likelihood of disease is the same whether or not you were exposed
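Both statistics can be computed from the four cells of a 2x2 table. The counts below are hypothetical, and the cell names a, b, c, d are a common convention assumed here rather than taken from the cards.

```python
# Hypothetical 2x2 table
#                diseased   not diseased
# exposed           a=30        b=70
# unexposed         c=10        d=90
a, b, c, d = 30, 70, 10, 90

# Risk = number diseased / number at risk, within each exposure group
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed

# Odds = likelihood of disease relative to likelihood of no disease
odds_exposed = a / b
odds_unexposed = c / d
odds_ratio = odds_exposed / odds_unexposed  # same as (a * d) / (b * c)

print("relative risk:", round(relative_risk, 3))
print("odds ratio:", round(odds_ratio, 3))
```

Both values are above 1 here, so disease is more likely among the exposed in this made-up table.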
What is the purpose of Fisher's Exact test?
- Tests null hypothesis of NO ASSOCIATION in a 2x2 table
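A minimal sketch of how Fisher's exact test can be computed for a 2x2 table, assuming the common two-sided rule of summing the probabilities of all tables (with the same margins) that are no more likely than the observed one. The function name and the counts are illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Small tolerance guards against floating-point ties
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small table where cell counts are too low for chi-square
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print("two-sided p-value:", round(p_value, 4))
```

A large p-value means the data are consistent with the null hypothesis of no association.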
Output/ Result Reporting- PARAMETERS
- not simply slope parameters anymore
- using a link function of natural log
- Monotone: increasing function
- Positive and negative signs still mean the same thing (negative estimates indicate negative relationships)
For RR> 1, Parameter >0
- HIGHER likelihood of outcome with an increase
For RR<1, Parameter<0
- lower likelihood of outcome with an increase
RR=1, Parameter estimate = 0
- would indicate that likelihood of outcome does not depend on variable
Confidence interval- look to see if 0 or 1 is in it
- if 1 is in the interval for RR, the effect is not significant
- if 0 is in the interval for the parameter estimate, the effect is not significant
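With a natural log link, exponentiating a parameter estimate gives the rate ratio, so the two interval checks above are the same check on different scales. The estimate and standard error below are hypothetical numbers, and the interval uses a normal approximation.

```python
import math
from statistics import NormalDist

# Hypothetical model output on the log scale
beta, se = 0.405, 0.2

z = NormalDist().inv_cdf(0.975)  # about 1.96
beta_lo, beta_hi = beta - z * se, beta + z * se

# The log link is monotone increasing, so exp() preserves order:
# parameter > 0 corresponds to RR > 1, parameter = 0 to RR = 1
rr = math.exp(beta)
rr_lo, rr_hi = math.exp(beta_lo), math.exp(beta_hi)

zero_in_beta_ci = beta_lo <= 0 <= beta_hi
one_in_rr_ci = rr_lo <= 1 <= rr_hi

print("parameter:", beta, "CI:", round(beta_lo, 3), round(beta_hi, 3))
print("RR:", round(rr, 3), "CI:", round(rr_lo, 3), round(rr_hi, 3))
print("0 in parameter CI:", zero_in_beta_ci, "| 1 in RR CI:", one_in_rr_ci)
```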
In tabular (categorical data) what are outcome possibilities
- counts on tables
- measures on tables
Parameters in these models- what are they?
- NOT slopes
- Transformed rate ratios that act multiplicatively on the OUTCOME