Descriptive statistics
describe, organize, summarize data Ex: avg. cholesterol values
Inferential statistics
make inferences based on data
Ex: estimating population cholesterol values from a RANDOM sample of students
a population
observations or measurements of ALL subjects in the group of interest
a sample
a subset of the population: measurements of only SOME selected subjects
another word for observation
element (X)
simple random sample
every element has equal prob. of being included
Ex: drawing from a stack of 52 playing cards
How do you eliminate biased samples?
True randomization ... stratified random samples
stratified random sample
population divided into homogenous groups or strata (age, ethnicity, gender)
cluster sample
based on geographic areas
used when too expensive to draw simple random sample
systematic sample
ex: select every 5th student
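The three sampling schemes above can be sketched in Python. This is a minimal illustration, not a survey-grade implementation; the function names, the student IDs, and the `stratum_of` callback are all hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Every element has an equal chance of being drawn (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Select every `step`-th element, beginning at index `start`."""
    return population[start::step]

def stratified_sample(population, stratum_of, k_per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

students = list(range(1, 21))  # hypothetical student IDs 1..20
every_fifth = systematic_sample(students, step=5, start=4)  # [5, 10, 15, 20]
```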
Facts about probability
cannot be negative
expressed as decimals
lie between 0 and 1
1 - p = probability the event won't occur
When do you use the addition rule?
when the events are mutually exclusive
When do you use the multiplication rule?
When 2 or more events are independent and both could happen at the same time
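A small worked example of both rules, using a 52-card deck and a fair coin as illustrative assumptions. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

# Addition rule (mutually exclusive events): P(A or B) = P(A) + P(B)
p_king = Fraction(4, 52)
p_queen = Fraction(4, 52)
p_king_or_queen = p_king + p_queen  # 8/52 = 2/13

# Multiplication rule (independent events): P(A and B) = P(A) * P(B)
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # 1/4

# Complement: 1 - p is the probability the event does not occur
p_not_king = 1 - p_king  # 12/13
```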
Binomial distribution is normally used...
to describe the inheritance of genetic disease: gives the probability (p) of events with ONLY 2 possible outcomes
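As a sketch of how the binomial distribution applies to inheritance, assume two carrier parents of an autosomal recessive disease, so each child is affected with p = 0.25. The function name `binomial_pmf` and the family size are illustrative choices.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent two-outcome trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability that exactly 2 of 4 children are affected when p = 0.25
p_two_of_four = binomial_pmf(2, 4, 0.25)  # 6 * 0.0625 * 0.5625 = 0.2109375

# The probabilities over all possible counts (0..4) add up to 1
total = sum(binomial_pmf(k, 4, 0.25) for k in range(5))
```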
Nominal data
organized into qualitative groups: male/female
Ordinal data
data organized in ranking order
DOES NOT provide info on size of INTERVAL b/n data
Interval data
data organized into meaningful order with meaningful intervals in b/n
EX: centigrade (Celsius) scale
Ratio data
data organized with meaningful intervals AND a true (absolute) zero point
Ex: Kelvin scale, seconds, days, pulses/min
cumulative frequency distribution
% of elements lying within and below each class interval
Ogive curve
S-shaped curve for cumulative frequency
What do centile ranks tell us?
% of observations that fall below any particular score
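A centile rank can be computed directly from that definition. The scores below are made up for illustration.

```python
def centile_rank(scores, x):
    """Percentage of observations that fall below the score x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores
rank_of_80 = centile_rank(scores, 80)  # 5 of 10 scores are below 80 -> 50.0
```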
What is the shape of the normal distribution curve?
symmetrical bell shaped
Gaussian distribution
How can you tell whether a skewed distribution is positive or negative?
by the location of the tail of the curve: tail to the right = positive skew, tail to the left = negative skew
Ex of central tendencies
mean, median, mode
Central tendency that occurs with the greatest frequency
mode
What is the median when you have an odd # of elements?
middle number
What is the median when you have an even # of elements?
avg of two middle scores
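The three measures can be checked with Python's standard `statistics` module. The data sets are invented; the last one shows how a single extreme score moves the mean but not the median.

```python
import statistics

odd_scores = [2, 3, 3, 5, 7]
mean_odd = statistics.mean(odd_scores)      # 20 / 5 = 4
median_odd = statistics.median(odd_scores)  # middle number = 3
mode_odd = statistics.mode(odd_scores)      # most frequent = 3

even_scores = [2, 3, 5, 7]
median_even = statistics.median(even_scores)  # avg of 3 and 5 = 4.0

with_outlier = [2, 3, 3, 5, 70]
mean_outlier = statistics.mean(with_outlier)      # 83 / 5 = 16.6
median_outlier = statistics.median(with_outlier)  # still 3
```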
VERY sensitive to extreme scores
mean
3 measures of variability
range, variance, standard deviation
Difference b/n lowest and highest scores
range
How do you determine the deviation score?
difference between elements and the mean
* the sum of the deviation scores for all elements is 0
Can you use deviation scores to differentiate b/n 2 different normal distributions?
What is the variance of a distribution
the mean of the squares of all the deviation scores.
1. find deviation scores
2. square each deviation score
3. obtain mean
How do you find the standard deviation?
it is the square root of the variance.
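The steps above translate directly into code. This sketch divides by N, as the notes describe (a population variance); a sample estimate would usually divide by n - 1 instead. The data set is an illustrative assumption.

```python
from math import sqrt

def deviation_scores(xs):
    """Difference between each element and the mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def variance(xs):
    """Mean of the squared deviation scores (divides by N)."""
    return sum(d * d for d in deviation_scores(xs)) / len(xs)

def standard_deviation(xs):
    """Square root of the variance."""
    return sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]       # mean = 5
devs = deviation_scores(data)         # sums to 0
var = variance(data)                  # 32 / 8 = 4.0
sd = standard_deviation(data)         # 2.0
```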
What % of the distribution falls within +/- 1 s.d. of the mean?
approx. 68%
What % of the distribution falls within +/- 2 s.d. of the mean?
approx. 95%
What % of the distribution falls within +/- 3 s.d. of the mean?
approx. 99.7%
if element lies above the mean, it will have a ____ z score?
positive
if element lies below the mean, it will have a ____ z score?
negative
how to find z score
subtract mean from element and divide by s.d.
What is used for finding the probability that a random element will have a score above or below a mean of the distribution?
z score
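A minimal sketch of the z-score calculation and the normal-curve areas it gives access to, using `statistics.NormalDist` from the standard library (Python 3.8+). The score, mean, and s.d. are invented values.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Subtract the mean from the element and divide by the s.d."""
    return (x - mean) / sd

std_normal = NormalDist()  # mean 0, s.d. 1

z = z_score(130, mean=100, sd=15)  # 2.0, positive because 130 is above the mean
p_below = std_normal.cdf(z)        # proportion of the distribution below the score
p_above = 1 - p_below              # proportion above the score

def share_within(k):
    """Proportion of a normal distribution within +/- k s.d. of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)
```

Calling `share_within(1)`, `share_within(2)`, and `share_within(3)` gives roughly 0.68, 0.95, and 0.997.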
What type of statistics makes conclusions about a population?
inferential statistics
When plotted, what type of distribution will you see when plotting the random sampling distribution of means?
normal distribution even though the shape of the pop. distribution is rectangular
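This can be checked with a small simulation: draw repeated samples from a rectangular (uniform) population and look at the distribution of their means. The sample size, number of repeats, and seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(draw, n, repeats, seed=0):
    """Means of `repeats` random samples, each of size n, drawn with draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]

# Rectangular population: uniform on [0, 1), mean 0.5, s.d. sqrt(1/12)
means = sample_means(lambda rng: rng.random(), n=30, repeats=2000)

mean_of_means = statistics.fmean(means)        # close to the population mean 0.5
sd_of_means = statistics.pstdev(means)         # close to the standard error below
expected_se = (1 / 12) ** 0.5 / 30 ** 0.5      # pop. s.d. / sqrt(n)

# For a normal curve about 68% of values lie within 1 s.d. of the centre
share_within_1 = sum(abs(m - 0.5) <= expected_se for m in means) / len(means)
```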
Are confidence limits 1 tail or 2 tails?
ALWAYS 2 tails b/c you need a max and min value
What is CI (confidence interval)?
the diff. b/n the upper and lower confidence limits
CI decreases in proportion to what?
the square root of the sample size (to halve the CI, the sample size must be increased 4 fold)
How is precision proportional to the sample size?
precision is proportional to the square root of n
(precision)^2 is proportional to n
As you increase sample size, what happens to the width of the CI?
width narrows
To double precision, how must you change the sample size?
multiply sample size by 4 to double precision
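The square-root relationship can be seen by computing a 95% confidence interval at two sample sizes. This sketch assumes a known s.d. and uses the normal (z) critical value; the sample mean and s.d. are invented numbers.

```python
from math import sqrt
from statistics import NormalDist

def ci_for_mean(sample_mean, sd, n, confidence=0.95):
    """Two-tailed confidence limits for a mean, assuming a known s.d. (z based)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = sd / sqrt(n)                               # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se

lo25, hi25 = ci_for_mean(200, sd=20, n=25)     # width about 15.68
lo100, hi100 = ci_for_mean(200, sd=20, n=100)  # 4x the sample size

width_ratio = (hi25 - lo25) / (hi100 - lo100)  # about 2: the CI is halved
```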
When something is precise, is it scattered or clustered?
clustered
When something is inaccurate, is it biased or unbiased?
biased
When are statistics precise?
when they are immune from random variation
precision is shown by the ____while accuracy is shown by the ______ b/n the mean of the random sampling distributions of means and the true pop. mean.
precision is shown by the width of the distributions (inversely)
accuracy is shown by the distance (inversely)
Do you use the z score or t score when making inferences about means that are based on estimates of pop. parameters?
t score
For any given proportion of the distribution, is z constant? t?
z is constant while t is NOT constant b/c it depends on the size of the sample
When are z and t similar?
when the sample size is larger than 100
What do t values depend on?
degrees of freedom (df=n-1)
when do you use 1 tail? 2 tails?
1 tail is directional (improves, impairs)
2 tails is NON-directional (affects)
What is alpha in hypothesis testing?
decision criterion/significance level: the probability cutoff below which the difference b/n the sample mean and the hypothesized pop. mean is attributed to a real effect rather than to chance
What is the Central Limit Theorem?
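States that the random sampling distribution of means approaches a normal distribution as sample size increases, whatever the shape of the pop. distribution, and that its mean equals the true pop. mean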
What is the conventional level of alpha?
0.05
If the probability that the sample mean could have come from the hypothesized pop is less than or equal to 0.05 what happens to the null hyp?
it is rejected
What is the range of acceptance?
the middle 95% of the distribution
What are the limits of the area of rejection defined by?
the critical values
What does it mean in terms of significance when the null hyp is rejected?
that the diff. b/n the sample mean and the hypothesized mean is statistically significant b/c it was rejected at the 0.05 level
What does it mean in terms of significance when the null hyp is accepted?
that the diff. b/n the sample mean and the hyp population mean failed to reach statistical significance (NOT significant)
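The decision rule can be sketched as a two-tailed z test. This assumes the population s.d. is known (otherwise a t test would be used); the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, hypothesized_mean, sd, n, alpha=0.05):
    """Return (z, p_value, reject_null) for a two-tailed one-sample z test."""
    std_normal = NormalDist()
    se = sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, p_value, abs(z) >= critical

# Sample mean 210 vs. hypothesized mean 200: z = 2.5, beyond the critical value
z, p, reject = two_tailed_z_test(210, 200, sd=20, n=25)

# Sample mean 205: z = 1.25, inside the range of acceptance
z2, p2, reject2 = two_tailed_z_test(205, 200, sd=20, n=25)
```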
What does significant at p<0.05 mean?
if the null hyp were true, a diff. this large would arise by chance less than 5% of the time, so the diff. is called statistically significant
What is a type I error?
rejecting the null when it is true (false positive)
What is a type I error also referred to as?
alpha error
What is a type II error?
accepting (failing to reject) the null hyp when it is actually false (false negative)
A type II error is also known as?
beta error
1-beta is what?
power of the test
what does power of the test tell us?
the probability that a false null hyp will be rejected, i.e., that a real effect will be detected
A test is reqd to have a power of at least what to be acceptable?
What are the 2 errors of hypothesis testing?
alpha error: rejecting a null that is actually true
beta error: accepting a null that is actually false
what is the correlation b/n power of test and alpha?
power increases as alpha increases
are 1 or 2 tailed tests more powerful?
1-tailed tests are more powerful b/c all of alpha sits in one tail, so the critical value is less extreme
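A rough sketch of how power (1 - beta) responds to alpha and to the number of tails, for a z test with known s.d. and a true effect in the positive direction. The effect size, s.d., and n are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_z_test(effect, sd, n, alpha=0.05, tails=2):
    """Power to detect a true mean shift of `effect` (> 0) with a z test."""
    std_normal = NormalDist()
    shift = effect / (sd / sqrt(n))  # true shift in standard-error units
    if tails == 1:
        crit = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf(crit - shift)
    crit = std_normal.inv_cdf(1 - alpha / 2)
    # Upper rejection region plus the (tiny) lower rejection region
    return (1 - std_normal.cdf(crit - shift)) + std_normal.cdf(-crit - shift)

one_tailed = power_z_test(5, sd=20, n=64, tails=1)   # roughly 0.64
two_tailed = power_z_test(5, sd=20, n=64, tails=2)   # roughly 0.52
looser_alpha = power_z_test(5, sd=20, n=64, alpha=0.10)
beta_two_tailed = 1 - two_tailed                     # type II error rate
```

Under these assumptions the 1-tailed test has more power than the 2-tailed test, and raising alpha from 0.05 to 0.10 raises power.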
What is the chi square test for?
a test of proportions: testing hypothesis for nominal scale data
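As an illustration, the chi-square statistic for a 2x2 table of counts can be computed by hand and compared with the critical value for 1 degree of freedom at alpha = 0.05 (3.841). The table layout and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

# Hypothetical counts: rows = treated / untreated, columns = improved / not improved
stat = chi_square_2x2(30, 20, 15, 35)      # about 9.09
significant = stat > CRITICAL_DF1_ALPHA_05
```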
What's the difference b/n experimental studies and nonexperimental?
experimental: give drug to experimental group and compare effect w/ control group that hasn't taken drug
nonexperimental: observe the effect of drugs by comparing 2 groups who have ALREADY taken the drugs and who have not taken any--No ethical issues
Purpose of clinical trials?
used to evaluate the effects of treatments and to isolate one factor by holding all other factors constant
What is the best method of assignment in RCCT trials?
randomization to reduce the selection bias so that any difference that appears b/n 2 groups at the end of the study can be attributed to TX
What is the diff. b/n double-blind studies and single-blind studies?
double-blind: neither subjects nor investigators know
single-blind: only the subject is unaware of the assignment; less effective b/c the investigators' expectations can still influence the results
In RCCT, how are the effects of confounding variables reduced?
by matching: compare a 48 yr old male w/ a 42 yr old male (not 48 vs. 21) to see the effect of the drug, so that confounding variables such as age and gender are eliminated
Explain crossover designs.
a repeated measures design b/c the measurements are repeated w/n each patient at different times
Ex: patient A: drug 1st month, washout 2nd month, placebo 3rd month
What study is the first method used to study a particular, new disease-also called exploratory studies?
Descriptive studies: most powerful method to study new disease
What studies fall under the category: nonexperimental studies?
descriptive studies
analytical studies
cohort studies
case-control studies
case-series studies
prevalence surveys
What do analytical studies do?
aim to test hypothesis or to prove explanations about a disease after observing the particular disease
What happens in a cohort study?
group of young, healthy people selected and observed for an extended period (15+yrs) and followed forward from a particular point in time
Cohort= prospective
Advantages to cohort study?
no ethical problems
lifestyle and health management are the choice of individuals, not investigators
establish absolute risk
NO recall bias (exposure is recorded before the outcome occurs)
Disadvantages to cohort study?
impractical for RARE diseases
Describe a case control study.
comparison is made b/n the cases (who have disease) and the controls (who do not)
Retrospective b/c they start w/ the outcome and then look back into the past
Advantages of case-control study?
quick and cheap
important for RARE diseases
need few subjects
can establish multiple potential causes
Disadvantages of case-control study?
highest degree of recall bias
loss in info of risk factors if people die
cannot prove a cause-effect relationship
Describe case-series studies.
Reports or presentations of a disease.
NO Controls
DO NOT follow up
used to present new info about patients with RARE disease and develop new Ho.
Describe prevalence surveys "community surveys"
surveys of a WHOLE population
SINGLE examination of pop at a PARTICULAR point in time
DO NOT follow up
AKA: cross-sectional studies
Advantages of prevalence surveys?
b/c of community study, info on wide range of disease and characteristics can be gathered and used for hypothesis
Disadvantages of prevalence study?
not usable for ACUTE diseases
loss of subjects leaving from the community
Eqn for incidence rate
(# new cases of disease)/(total # people at risk) per unit time
eqn for prevalence rate
# EXISTING cases of disease (old + new)/total # people in the pop. at a PARTICULAR time
when can use prevalence rate?
for stable chronic condition: hypertension, diabetes
NOT for ACUTE DISEASE: appendicitis, pulm embolism
eqn for prevalence?
annual incidence rate x average duration (yrs)
eqn for mortality rate?
(total # deaths)/(total # people at risk) per unit time
What happens to prevalence if you increase incidence?
bathtub: increase incidence increase prevalence
eqn for case fatality rate?
# deaths due to disease/total # cases

DOES NOT depend on time
eqn for attack rate?
# people contracting disease/ total # people at risk

(at 1 time, 1 incidence)
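The rate equations above can be written as small helper functions. The function names and the community figures are invented for illustration; `prevalence_from_incidence` is the steady-state approximation (incidence x duration) and suits stable chronic conditions only.

```python
def incidence_rate(new_cases, at_risk, years=1.0):
    """New cases per person at risk per year."""
    return new_cases / (at_risk * years)

def prevalence(existing_cases, population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / population

def prevalence_from_incidence(annual_incidence, avg_duration_years):
    """Approximate prevalence = annual incidence rate x average duration."""
    return annual_incidence * avg_duration_years

def case_fatality_rate(deaths_from_disease, total_cases):
    """Proportion of cases that die of the disease (no time unit)."""
    return deaths_from_disease / total_cases

def attack_rate(people_contracting, at_risk):
    """Proportion of people at risk who contract the disease in one outbreak."""
    return people_contracting / at_risk

inc = incidence_rate(50, at_risk=10_000, years=1)  # 0.005 per person-year
prev = prevalence_from_incidence(inc, 4)           # about 0.02 if the disease lasts 4 yrs
cfr = case_fatality_rate(5, 50)                    # 0.1
ar = attack_rate(30, 120)                          # 0.25
```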
absolute risk =
incidence rate of disease
Which study is the best for determining the effect of a risk factor?
cohort study: identify the risk factor of the disease
relative risk =
incidence among those EXPOSED to risk factor/incidence among those not exposed to risk factor
absolute risk reduction:
control incidence - experimental incidence
number needed to treat (NNT) =
1/absolute risk reduction
cost estimate eqn=
cost per month x months x NNT
attributable risk =
incidence exposed - incidence non-exposed
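A worked sketch of the risk equations above, with the number needed to treat (NNT) taken as 1/absolute risk reduction. All incidences and costs are made-up numbers, and the function names are illustrative.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    return incidence_exposed / incidence_unexposed

def attributable_risk(incidence_exposed, incidence_unexposed):
    return incidence_exposed - incidence_unexposed

def absolute_risk_reduction(control_incidence, experimental_incidence):
    return control_incidence - experimental_incidence

def number_needed_to_treat(arr):
    """Patients who must be treated to prevent one event: 1 / ARR."""
    return 1 / arr

def cost_estimate(cost_per_month, months, nnt):
    """Cost of treating enough patients to prevent one event."""
    return cost_per_month * months * nnt

rr = relative_risk(0.30, 0.10)              # about 3: exposed have 3x the incidence
ar = attributable_risk(0.30, 0.10)          # about 0.20
arr = absolute_risk_reduction(0.10, 0.06)   # about 0.04
nnt = number_needed_to_treat(arr)           # about 25
cost = cost_estimate(50, 12, nnt)           # about 15,000
```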
What kind of study will be used to find the odds ratio?
case control
eqn for odds ratio?
(A x D) / (B x C)
What does it mean when the odds ratio = 1
risk factor is NOT related to disease
What does it mean when the odds ratio is LESS than 1?
the risk factor may be a protective factor against disease
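A minimal sketch of the odds ratio calculation. The notes do not define A, B, C, and D, so this assumes the conventional 2x2 layout: A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls. The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (A x D) / (B x C) for an assumed conventional 2x2 table.

    a: cases exposed      b: controls exposed
    c: cases not exposed  d: controls not exposed
    """
    return (a * d) / (b * c)

def interpret(or_value):
    if or_value > 1:
        return "risk factor associated with disease"
    if or_value < 1:
        return "possible protective factor"
    return "risk factor not related to disease"

harmful = odds_ratio(40, 20, 10, 30)     # 1200 / 200 = 6.0
unrelated = odds_ratio(20, 20, 30, 30)   # 600 / 600 = 1.0
protective = odds_ratio(10, 30, 40, 20)  # 200 / 1200, below 1
```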