255 Cards in this Set
 Front
 Back
What is epidemiology?

The study of health and illness in human populations
The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control health problems
How are disease occurrences determined?

Disease occurrences are determined by factors that can be identified and measured.
Modification of these factors is an effective way to prevent disease or to increase survival (public health).
What do descriptive studies do?

To evaluate and describe distribution and trends in health and disease
To compare subgroups
To provide the basis for planning health services and prevention activities
To generate hypotheses
To calculate measures of disease frequency
What do analytic studies do?

To elucidate the determinants of disease
To test specific etiologic (or preventive) hypotheses
To suggest potential for health promotion or disease prevention
What are the design issues in observational studies?

Population (type and how observed)
Method of data collection
Unit of analysis
Selection of study participants
Type of outcome measure
What are the design issues with population?

How is the population observed?
Cross-sectional or longitudinal
How are they selected? Who do they represent?
Internal and external validity
What are the design issues with methods of data collection?

Primary vs Secondary
Ways to measure exposure and disease status 
What are the design issues with unit of analysis?

Individuals
Groups 
What are the 4 levels of subject selection?

Levels of subject selection:
Target Population: the population to which results can be applied
Source Population: the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn
Eligible Population: the population of subjects eligible to participate
Study Participants: those people who contribute data to the study
What is the direction of subject selection?

Downward: first the target pop > source pop > eligible pop > study participants

What is the direction of 'participant selection' application of results?

Upward: target pop < source pop < eligible pop < study participants

Describe target population

population to which results can be applied

Describe source population

the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn

Describe eligible population

the population of subjects eligible to participate

Describe study participants

those people who contribute data to the study

What are the design issues with regard to outcome measures for cross-sectional vs. longitudinal studies?

A longitudinal study is a correlational research study that involves repeated observations of the same items over long periods of time, often many decades. It is a type of observational study.
Unlike cross-sectional studies, longitudinal studies track the same people, and therefore the differences observed in those people are less likely to be the result of cultural differences across generations. Because of this benefit, longitudinal studies make observing changes more accurate, and they are applied in various fields. In medicine, the design is used to uncover predictors of certain diseases. In advertising, the design is used to identify the changes that advertising has produced in the attitudes and behaviors of those within the target audience who have seen the advertising campaign.
What are the design issues with regard to outcome measures for continuous vs. categorical data?

There are three primary measurement scales: continuous (quantitative, scale), ordinal (semi-quantitative, ranked), and categorical (qualitative, nominal). To simplify matters, we may treat ordinal outcomes as either continuous or categorical, based on judgement and depending on whether distributional assumptions can be met.
Quantitative information consists of both quantitative data (the numbers) and categorical data (the labels that tell us what the numbers measure). Qualitative variables are those that express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on. The values of a qualitative variable do not imply a numerical ordering: values of the variable "religion" differ qualitatively, and no ordering of religions is implied. Qualitative variables are sometimes referred to as categorical variables. Quantitative variables are those measured in terms of numbers; some examples are height, weight, and shoe size.
What are the design issues with regard to outcome measures for prevalence vs. incident data?

• Prevalence: frequency of existing cases
• Incidence: frequency of new cases
• New cases are called incident cases.
• Existing cases are called prevalent cases.
What are the differences between cross-sectional vs. case-control studies?

Cross-sectional studies (also known as cross-sectional analysis) form a class of research methods that involve observation of all of a population, or a representative subset, at a defined time.
They differ from case-control studies in that they aim to provide data on the entire population under study, whereas case-control studies typically include only individuals with a specific characteristic, with a sample, often a tiny minority, of the rest of the population. Both are a type of observational study. Unlike case-control studies, they can be used to describe absolute risks and not only relative risks. They may be used to describe some feature of the population, such as the prevalence of an illness, or they may support inferences of cause and effect. Longitudinal studies differ from both in making a series of observations more than once on members of the study population over a period of time.
Is epidemiology qualitative or quantitative?

quantitative

How do we go about measuring the distribution and determinants of disease?

By calculating measures of disease frequency and association

Why do we use Measures of Disease Frequency?

Used to enumerate the occurrence of disease (or whatever your outcome is) in a specified population in a specified period of time.
The frequency of disease can be measured for either incident or prevalent cases. The frequency can be expressed either as a count (an absolute measure) or as a relative measure.
Why do we use Measures of Association?

Reflect the strength of the statistical relationship between exposure status and disease occurrence (examples: relative risk, odds ratio)
Reflect the extra number of cases attributable to or prevented by the exposure in a particular population during a given time (example: attributable risk)
Compare the disease frequency between 2 or more groups at different exposure levels
How is count data useful?

Counts can be of incident or prevalent cases
Important to health planners
Used in monitoring the occurrence of disease outbreaks
This is generally the job of state and county health departments and of federal agencies like the CDC.
Example of incidence and prevalence measurement issues from the WHO TB program

Often we are obliged to measure the burden of disease indirectly. The number of TB notifications varies with the incidence of disease, but unless we know the proportion of incident cases that are notified, it is hard to determine the incidence from the notification rate. Measuring the prevalence of disease is easier than measuring incidence, since we only need to carry out one survey. Then if we know the average duration of disease, that is to say the time between developing disease and being cured or dying, we can make an estimate of incidence from prevalence. And of course the prevalence of infectious disease is important in its own right, since it is this that determines transmission. Another way in which we might try to get an estimate of incidence is through the case-fatality rate. If we know the proportion of people who will die after they have developed TB, and if we have an independent measure of the total number of people who die of TB, through vital registration data, for example, we can combine these to estimate the incidence of TB.

Between prevalence and incidence, which measure is more suitable for measuring the burden of disease?

Prevalence is dependent on both incidence and duration of the disease after onset; duration is determined by either survival (for fatal diseases) or recovery (for non-fatal diseases).
In a population in a steady-state situation, the relationship between prevalence and disease incidence and duration can be expressed as: PP / (1 − PP) = I × D, where PP is the point prevalence, I the incidence, and D the duration. The difference between I and PP depends on the duration of the disease and the magnitude of PP. When PP is relatively low (0.05 or less), (1 − PP) is almost equal to 1, and therefore PP ≈ I × D.
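The steady-state relation above can be sketched in Python. This is a minimal illustration, not part of the deck; the incidence rate and duration used below are hypothetical values chosen to satisfy the rare-disease condition (PP at or below about 0.05):

```python
def point_prevalence(incidence_rate, duration):
    """Exact steady-state point prevalence, solving PP/(1-PP) = I*D for PP."""
    id_product = incidence_rate * duration
    return id_product / (1 + id_product)

def point_prevalence_approx(incidence_rate, duration):
    """Rare-disease shortcut: PP ~= I*D when PP is 0.05 or less."""
    return incidence_rate * duration

# Hypothetical disease: 10 new cases per 1000 person-years, mean duration 2 years
exact = point_prevalence(0.01, 2)          # about 0.0196
approx = point_prevalence_approx(0.01, 2)  # 0.02
```

Note how close the two values are at low prevalence, which is exactly the card's point about (1 − PP) being nearly 1.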
Between prevalence and incidence, which measure is more suitable for measuring the burden of disease?
Part 2 
The prevalence ratio is the effect measure of interest when we are interested in the public health burden of disease, although in this situation the absolute prevalence and the prevalence difference are usually of more interest. However, when we are interested in disease etiology, the POR a) estimates the incidence rate ratio with fewer assumptions than are required for the prevalence ratio; b) can be estimated using the same methods as for the odds ratio in case-control studies, namely the Mantel-Haenszel method and logistic regression; and c) provides practical, analytical, and theoretical consistency between analyses of a prevalence study and those of a prevalence case-control study based on the same study population. For these reasons, the POR will continue to be one of the standard methods for analyzing prevalence studies and prevalence case-control studies.
From "Effect measures in prevalence studies," Environmental Health Perspectives, July 2004, by Neil Pearce
Between prevalence and incidence, which measure is more suitable for use in etiologic studies?

When studying the etiology of a disease, it is better to analyse incidence rather than prevalence, since prevalence mixes in the duration of a condition, rather than providing a pure measure of risk.
Etiologic research focuses on the factors that influence the change in status 
How do you measure incidence?

1. Cumulative Incidence (risk, incidence proportion)
Probability of a disease-free individual developing disease over a specified time period, conditional on not dying from another cause
Probability of dying (any cause) over a specified time period is then an unconditional probability
Range 0 to 1, no dimensions
2. Incidence Density (Rate)
Instantaneous occurrence of new cases of disease at a point in time, per unit time, relative to the size of the population at risk
Range 0 to infinity, has dimensions
For either: number of new events in a defined population over a specified period of time / population at risk for the event over that period
What are the five ways to measure cumulative incidence?

1. Directly, using the Simple Cumulative Method
2. Estimated using the Life Table (actuarial) approach
3. Estimated by the Kaplan-Meier (survival) method
4. Approximated by the Incidence Density Method
5. Cox Regression (Proportional Hazards Model)
When do you use Simple Cumulative Method?

Data requirements:
No losses to follow-up (for any reason, including study termination or loss from competing outcomes)
Generally, the period of risk is short (but this is not mandatory)
Assumes a closed population: all individuals included in the calculation are followed up for the entire period of study
What is the Simple Cumulative Method Formula?

CI(t0, t) = I / N′0
where I = number of events in (t0, t); N′0 = number disease-free at t0; t0 = start of follow-up; t = end of follow-up
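The Simple Cumulative Method formula can be sketched in one line of Python. The function name is mine, and the worked numbers reuse the deck's malaria example (60 cases among 1000 disease-free Peace Corps volunteers over one year):

```python
def simple_cumulative_incidence(new_cases, disease_free_at_start):
    """CI(t0, t) = I / N'0.

    Valid only for a closed population: no losses to follow-up,
    everyone followed for the entire period."""
    return new_cases / disease_free_at_start

ci = simple_cumulative_incidence(60, 1000)  # 0.06, i.e. 6% one-year risk
```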
What is a Life Table?

A systematic record of a group's mortality or morbidity experience over the study period, which is broken up into specific time intervals.
What are the basic steps of a Life Table?

Probabilities of survival and of cumulative incidence (CI) are calculated for each interval
Overall cumulative survival is calculated for the study period
Lets you calculate CI over intervals
What is the Life Table Method used for?

To examine the distribution of time between 2 events

When is the Life Table Method used?

When there are losses to follow-up (so the simple method will not work)
When there is an extended period of risk
When exact times of disease occurrence and withdrawal are unknown
When the rate of disease varies over the follow-up period
When you are interested in knowing interval-based risk
What is the Classic Life Table's formula to calculate the probability of an event over 1 interval, (x,m)?

mqx = mdx / [lx − (0.5) mcx]
where:
mqx = interval-based probability of the event
mdx = number of events occurring in that interval
lx = population at risk at the beginning of the interval; lx − 0.5·mcx is called the effective number at risk
mcx = number of losses (censored observations) occurring in that interval
x = time at the beginning of the interval
m = time at the end of the interval, i.e. the length of the interval
What is the formula for Probability of SURVIVAL over 1 interval (x,m)?

mpx = 1 − mqx
(1 minus the probability of an event)
Probability of survival over >1 interval (x,m) is obtained by calculating the joint probability of no event =

kpx = (1px)(1px+1)···(1px+k−1), the product of the conditional survival probabilities of the successive intervals
For example, 3p0 = (1p0)(1p1)(1p2)
Probability of event over >1 interval (x,m):

kqx = 1 − (1px)(1px+1)···(1px+k−1)
For example, 3q0 = 1 − (1p0)(1p1)(1p2)
What is the first step of the Life table method?

Create a synthetic fixed cohort from the open cohort.
Plot subject number on the y-axis and follow-up months on the x-axis.
Examples:
What is the risk of dying in interval 2? 
2q1 = 1 / [4 − (0.5)(1)] = 0.286
where x = 1, lx = 4, mdx = 1, mcx = 1
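The life-table interval probability can be sketched in Python. This is a minimal illustration (function names are mine), reusing the worked example's numbers: 1 death among 4 at risk with 1 censored in the interval:

```python
def interval_event_prob(events, at_risk_start, censored):
    """Classic life-table probability for one interval:
    mqx = mdx / (lx - 0.5 * mcx).

    The actuarial adjustment assumes censored subjects were
    at risk for half the interval, on average."""
    return events / (at_risk_start - 0.5 * censored)

def interval_survival_prob(events, at_risk_start, censored):
    """mpx = 1 - mqx."""
    return 1 - interval_event_prob(events, at_risk_start, censored)

q = interval_event_prob(1, 4, 1)  # 1 / 3.5, about 0.286, matching the card
```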
What are the characteristics of the Life Table Method?

Does not require complete follow-up, i.e. it handles censoring.
Does not require the exact times at which events and withdrawals occurred.
The follow-up period of interest must be split into successive intervals (these don't have to be equal).
What are the assumptions of the Life Table Method?

The rate of disease among withdrawals (after loss) is the same as the rate for those who remain under observation
Independence between withdrawal and survival
Disease and withdrawals occur midway through the interval, on average
Participants survive at risk for the entire study period, i.e. risk is conditional on survival
No secular trend in risk
How is the Kaplan-Meier method similar to the life table approach?

Handles loss to follow-up
Independence between withdrawal and survival
Lack of secular trends
How is the Kaplan-Meier method different from the life table approach?

No need to categorize follow-up time into intervals; intervals are based on the exact time when the event occurs
Risk is estimated for the follow-up time corresponding to each case occurrence
Primary distinction: no assumption about uniformity of withdrawals
Kaplan-Meier Method: Formulas
Conditional probability of an event at time i
qi = di / ni
where di = number of events occurring at time i; ni = number of individuals under observation at time i
Kaplan-Meier Method: Formulas
Conditional probability of survival at time i
pi = 1 − qi

Kaplan-Meier Method: Formulas
Cumulative probability of survival at time i
Si = (p1)(p2)···(pi)

Kaplan-Meier Method: Formulas
Cumulative probability of an event at time i
1 − Si
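The four Kaplan-Meier formulas chain together naturally in code. Here is a minimal sketch (function name and the two-event dataset are mine, chosen for illustration): at each event time the running survival is multiplied by the conditional survival p_i = 1 − d_i/n_i:

```python
def kaplan_meier_survival(risk_sets):
    """Kaplan-Meier cumulative survival S_i: the running product of
    conditional survival probabilities p_i = 1 - d_i / n_i.

    risk_sets: list of (time, n_at_risk, n_events) tuples in time order.
    Returns a list of (time, S_i) pairs; cumulative risk is 1 - S_i."""
    surv = 1.0
    curve = []
    for t, n, d in risk_sets:
        surv *= 1 - d / n   # p_i = 1 - q_i
        curve.append((t, surv))
    return curve

# Hypothetical data: 1 event among 10 at month 2, then 2 events among 8 at month 5
curve = kaplan_meier_survival([(2, 10, 1), (5, 8, 2)])
# S(2) = 0.9; S(5) = 0.9 * 0.75 = 0.675; cumulative risk at month 5 = 0.325
```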

Describe the Kaplan-Meier Method: Survival Curve

It is common to plot the cumulative survival (Si) or risk for the entire follow-up period, called a "survival curve".
Reflects the distribution of time to occurrence of all outcome events observed during follow-up, with Si on the y-axis and time on the x-axis.
What does the Kaplan-Meier curve show?

The complements of the Kaplan-Meier estimated cumulative survival probabilities show
the CUMULATIVE RISK of recurrence throughout the study period.
What does the Incidence Density Method rely on?

Relies on the mathematical relationship between rate and risk in a time period where the RATE IS CONSTANT OVER THE TIME PERIOD.
It approximates cumulative incidence (risk) using incidence rates. 
In the Incidence Density Method, what is the relationship between Risk and Rate?

In the simplest situation (introduced in Epid 603), when the risk is low (say < 0.1) and the rate is constant over the time period, the relationship is simple:
CI ≈ Incidence RATE × specified time period
In summary:
The population is decreasing each year as people die, but the formula does not take this reduction into account; therefore, we have overestimated the death rate.
For a constant rate, the relationship is exponential, not linear.
Although the formula provides a risk that is the same throughout the follow-up period, there is actually an exponential reduction in the size of the cohort.
If the 1-year risk is 10%, the 5-year risk is not 50% (it's something less).
What other formula can you use to calculate CI when the time period is short?

When the time is short enough that the rate is still constant over the time period:
CI(t0, t) = 1 − e^(−ID × Δ)
where ID = the rate over the time period; Δ = the elapsed time between t0 and t
What do we use when the rate is not constant over the entire period of interest?

The age/time-stratified density method:
CI(t0, t) = 1 − e^(−Σj IDj × Δj)
where Δj = the duration of the jth interval; IDj = Ij/PTj = the estimated ID for the jth interval
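Both the constant-rate and the stratified exponential formulas can be sketched in a few lines of Python (function names are mine; the rate of 0.01 per year over 3 years is borrowed from the worked Risk Ratio example later in this deck):

```python
import math

def ci_constant_rate(rate, elapsed_time):
    """Incidence Density Method, constant rate: CI = 1 - exp(-ID * delta)."""
    return 1 - math.exp(-rate * elapsed_time)

def ci_stratified(rates_and_durations):
    """Age/time-stratified version: CI = 1 - exp(-sum(ID_j * delta_j))."""
    return 1 - math.exp(-sum(r * d for r, d in rates_and_durations))

ci = ci_constant_rate(0.01, 3)               # about 0.0296, slightly below 0.03
ci2 = ci_stratified([(0.01, 1), (0.01, 2)])  # same total exposure, split into strata
```

Note that the 3-year risk (0.0296) is slightly less than the naive rate × time product (0.03), which is the card's point about the exponential shrinkage of the cohort.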
What are the assumptions of the Incidence Density Method?

Each age-specific ID is constant over the entire age interval for which that rate is estimated
Each age-specific ID is constant over calendar time (no secular trends) during the relevant time frame
Describe cumulative incidence

A measure of risk: the probability of an event in a specified time period/interval
Used in predicting risk; a measure of disease frequency
Method of calculation depends on:
the length of the underlying time period of risk
the features of the cohort
If you have a fixed cohort, with a short period of risk, and few losses to follow-up, which Cumulative Incidence method do you use?

Simple Cumulative
Fixed cohort: when the exposure groups in a cohort study are defined at the start of follow-up, with no movement of individuals between exposure groups, they are called fixed cohorts. If no losses occur from a fixed cohort, it satisfies the closed-population definition; in this situation the unconditional risks and average survival times can be measured directly.
When you have a fixed or dynamic cohort, with a long period of risk, and losses to follow-up,
which Cumulative Incidence method do you use?
Life Table:
Classical or Kaplan-Meier
When you have a dynamic cohort, with usually long period of risk, and rates are available, which Cumulative Incidence method do you use?

Incidence Density
Dynamic cohort: the terms open or dynamic population describe a population in which the person-time experience can accrue from a changing roster of individuals. Closed cohorts add no new people over time and lose members only to death; an open population may gain members over time, or lose members who are still alive through emigration.
What is risk?

A risk (cumulative incidence) is a probability with no unit (although the element of time is implicit). Its range is 0 to 1.

What is rate?

A rate is a ratio whose denominator is usually person-time at risk, and it has a unit. The range is 0 to infinity.

What is the formula for incidence density?

Incidence Density (ID) = I / PT (in a specified population during a specified period of time)
where I = number of incident cases occurring during the time period; PT = amount of person-time experienced by the population at risk during follow-up
Incidence Density RATE does what?

Measures the force (velocity) of disease occurrence.
The instantaneous occurrence of new cases of disease at a point in time, per unit time, relative to the size of the population at risk.
The text makes a distinction between incidence density (denominator based on individual data) and incidence rate (denominator based on aggregate data); in reality, these terms are used interchangeably.
Since we cannot measure the instantaneous rate, we estimate an average rate for a given period.
What are the assumptions for Incidence Density RATE?

One unit of person-time is equivalent to any other unit.
ID obscures any fluctuation in rates over time; it ASSUMES A STEADY STATE of disease occurrence, so it represents an AVERAGE RATE.
If rates fluctuate over time, ID may vary for the same person-time followed, based on the actual (calendar) time of follow-up.
How do you calculate person time?

Depends on how much the investigator knows about individual follow-up times:
1. If the duration of follow-up (from entry until withdrawal) is known for each individual, ID can be calculated directly.
2. If the duration of individual follow-up is not known, ID cannot be calculated directly, but it can be estimated.
If the duration of follow-up (from entry until withdrawal) is known for each individual, how do you calculate person-time?

PT = Σ Δti
where Δti = duration of follow-up for individual i
If the duration of individual follow-up is not known and you use the average population as the denominator (aggregate data), how do you calculate person-time?

(1) PT = N′ × Δt
where N′ = estimated size of the population at risk (often census or vital statistics data); Δt = duration of observed follow-up. You must assume that the population is stable (constant size and age distribution). Used to calculate rates from vital statistics data.
(2) PT = average population × Δt, where average population = (initial population + final population) / 2. Used to determine the average population of a defined cohort.
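Both person-time calculations, and the rate they feed into, can be sketched in Python. The function names and the example numbers (three follow-up durations; a population shrinking from 1000 to 900 over 3 years) are mine, for illustration only:

```python
def person_time_individual(followup_years):
    """PT = sum of each individual's follow-up duration (delta t_i).
    Use when individual follow-up times are known."""
    return sum(followup_years)

def person_time_aggregate(initial_pop, final_pop, duration):
    """PT = average population * duration, with
    average population = (initial + final) / 2.
    Assumes a stable population; used with aggregate/vital-statistics data."""
    return (initial_pop + final_pop) / 2 * duration

def incidence_density(cases, person_time):
    """ID = I / PT."""
    return cases / person_time

pt = person_time_individual([2.0, 3.5, 1.0])  # 6.5 person-years
pt2 = person_time_aggregate(1000, 900, 3)     # 950 * 3 = 2850 person-years
rate = incidence_density(13, pt2)             # cases per person-year
```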
Incidence Density Assumptions

Independence between censoring and survival
Lack of secular trends
ID from individual data ≈ ID from aggregate data when withdrawals and events occur uniformly (more likely with a large sample and short intervals)
Risk of the event is approximately constant over time during the interval; when this is not true, the follow-up can be divided into smaller intervals and ID calculated for each interval
Total Population Vs. Population at Risk
When calculating all causes of mortality, what is the TP/PAR relationship? 
TP=PAR
Incidence Density using TP in calculating person time = true rate 
Total Population Vs. Population at Risk
When the disease circumstance is low prevalence , what is the TP/PAR relationship? 
TP approx = PAR
Incidence Density using TP in calculating person time approximates the true rate 
Total Population Vs. Population at Risk
When the disease circumstance is high prevalence , what is the TP/PAR relationship? 
TP>PAR
Incidence Density using TP in calculating person time is less than the true rate 
Total Population Vs. Population at Risk
When the disease circumstance is an infectious disease where immunity is common, what is the TP/PAR relationship? 
TP>PAR
Incidence Density using TP in calculating person time is less than the true rate 
What are the odds?

Another measure of disease frequency
The ratio of the probability of the event to the probability of no event
Can be calculated for incidence or prevalence
Odds ≈ the proportion when the proportion (either CI or prevalence) is < 0.1
Odds with incidence probabilities =

q / (1 – q)
Where q = probability of the event 
Prevalence odds =

Prev / ( 1 – prevalence)

If the proportion of smokers in a population = 20%, then the odds of being a smoker =

0.20 / (1 − 0.20) = 0.25
Another way to express this is 1:4 odds of being a smoker. Although there's nothing wrong with expressing disease occurrence in terms of the odds, it is not often done in epidemiology.
Describe the relationship between risk and odds

(Recall: odds ≈ the proportion when the proportion, either CI or prevalence, is < 0.1)
When risk = 0.80, odds = 4.0
When risk = 0.50, odds = 1.0
When risk = 0.20, odds = 0.25
When risk = 0.10, odds = 0.11
When risk = 0.01, odds = 0.0101
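The risk-to-odds conversion is a one-liner; this short sketch (function name mine) reproduces the table above and makes the rare-outcome approximation visible:

```python
def odds(p):
    """Odds = p / (1 - p) for a proportion p (a CI or a prevalence)."""
    return p / (1 - p)

# Reproduce the card's table; note odds ~= p only when p < 0.1
for risk in (0.80, 0.50, 0.20, 0.10, 0.01):
    print(risk, round(odds(risk), 4))
```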
When do you calculate and/or report Cumulative Incidence (risk)
or Incidence Density (rate) 
There is no element of time in CI (it is a dimensionless probability), but there must be an element of time in ID (its unit is 1/time).

What are the Measures of Association (Effect) for?

Used to determine whether an association exists between an outcome and a study factor
Reflects the strength of the STATISTICAL RELATIONSHIP between the study factor and the disease
Involves a DIRECT COMPARISON OF FREQUENCY MEASURES for different values or categories of the study factor
Involves a comparison group which is ARBITRARY AND SET BY THE INVESTIGATOR (usually the unexposed or least exposed)
How can we determine whether a GI illness was associated with food B, via a relative difference method?

calculate the RATIO of the attack rate between exposed and unexposed [ attack rate E / attack rate UE]

How can we determine whether a GI illness was associated with food B, via an absolute difference?

SUBTRACT the risk in the unexposed from the risk in the exposed
[attack rate E − attack rate UE]
Types of Measures of Association Used in Analytic Epidemiologic Studies
Based on Relative Differences (Ratio Measures) 
Cumulative Incidence Ratio (CIR, Risk Ratio)
Incidence Density Ratio (IDR, Rate Ratio)
Odds Ratio (OR)
Types of Measures of Association Used in Analytic Epidemiologic Studies
Based on Absolute Differences 
Attributable risk in the exposed (ARe, %ARe)
Population attributable risk (PAR, %PAR)
Mean differences (continuous outcomes)
Do absolute or relative differences search for causes?

Usually, Relative differences via relative risk/rate or relative odds

Do absolute or relative differences search for determinants? Provide an example.

Absolute difference: mean differences (continuous outcomes)

Do absolute or relative differences search for primary prevention impact (search for causes)?

Absolute difference: via attributable risk in the exposed

Do absolute or relative differences search for primary prevention impact?

Absolute difference: population attributable risk

Do absolute or relative differences search for the impact of an intervention on recurrences, case fatality, etc.?

Absolute difference: efficacy

What are the types of measures based on relative differences?

Ratio Measures (Relative Risk)
Cumulative Incidence Ratio (Risk Ratio)
Incidence Density Ratio (Rate Ratio)
Odds Ratio
When looking at a cohort study with count data (i.e., a cohort study with dichotomous exposure categories and with all subjects followed for a fixed period of time), what are the assumptions for the Cumulative Incidence Ratio (Risk Ratio)?

The distribution of the CIR, (0, +∞), is not symmetric and not normal.
A log transformation is usually applied to the CIR. If ln(CIR) has an approximately normal distribution, the mathematical characteristics of this distribution can be used to construct a confidence interval.
What is the formula for Cumulative Incidence Ratio (Risk Ratio)

CIR = CIe / CIne =
[a/(a+b)] / [c/(c+d)]
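The 2x2-table formula above can be sketched directly in Python (function name mine). The worked numbers come from this deck's malaria example: 60 cases among 1000 exposed and 10 cases among 1000 unexposed:

```python
def cumulative_incidence_ratio(a, b, c, d):
    """CIR = [a/(a+b)] / [c/(c+d)] for a 2x2 table, where
    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a / (a + b)) / (c / (c + d))

# Malaria example: CI(E) = 60/1000 = 0.06, CI(UE) = 10/1000 = 0.01
cir = cumulative_incidence_ratio(60, 940, 10, 990)  # 0.06 / 0.01 = 6.0
```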
When calculating CIR: Cohort study with count data, what are you doing?

A comparison of risk estimates
Generally calculated from cohort studies based on internal comparison of the cumulative incidence (risk) of the exposed and unexposed groups 
What is the definition of cumulative incidence / the cumulative incidence rate?

The number or proportion of a group (cohort) of people who experience the onset of a healthrelated event during a specified time interval; this interval is generally the same for all members of the group, but, as in lifetime incidence, it may vary from person to person without reference to age.

What is the definition of the cumulative incidence ratio?

The ratio of the cumulative incidence rate in the exposed to that in the unexposed.

How do you interpret the CIR?
Example: What is the association between taking antimalarial pills (E) and the development of malaria (D) among Peace Corps volunteers in Kenya, 1997? (1 year of follow-up, no losses, CI by the simple method) CI(E) = 60/1000 = 0.06; CI(UE) = 10/1000 = 0.01; CIR = 0.06 / 0.01 = 6.0 
Over the one year of the study, the risk (CI) of developing malaria among those who took their antimalarial pills was 6.0 times that of those who didn't take their antimalarial pills.

When looking at an Incidence Density Ratio (Rate Ratio)
in a cohort study with person-time data, what are you looking at? 
A ratio comparison of 2 average rates
Typically calculated from a cohort study drawn from a single defined population, either fixed or dynamic
An internal comparison of the incidence densities (rates) of the exposed and unexposed groups
What is the formula for Incidence Density Ratio (Rate Ratio)

IDR = IDe / IDne = (Ae/Te) / (Ane/Tne)
where, over the period (t0, t): Ae = diseased and exposed; Ane = diseased and unexposed; Te = population-time in the exposed; Tne = population-time in the unexposed
How do you interpret the Incidence Density Ratio (Rate Ratio)?
Example: Maternal alcohol consumption (E) and the development of fetal alcohol syndrome (D) in babies born in Oslo, Norway, 1990-1993: IDR = (60/6000) / (30/15000) = 5.0 
Mothers who consume alcohol during pregnancy are 5 times as likely to have a child with Fetal Alcohol Syndrome as those who do not consume alcohol during pregnancy.

What is the formula for Incidence Density Ratio (Rate Ratio), if we assume the rates are constant for the study period?

CI(0,3) = 1 − e^(−ID × 3)
For the exposed: CI(0,3) = 1 − e^(−0.01 × 3) = 0.0296. For the unexposed: CI(0,3) = 1 − e^(−0.002 × 3) = 0.006. Risk Ratio = 0.0296 / 0.006 = 4.93 
What is the Odds Ratio?

Calculated primarily from case-control studies, but also used for cohort studies (because it's easy to calculate with logistic regression)
Same interpretation as the CIR or IDR, with 1.0 being the null value
Can calculate either the exposure odds ratio or the disease odds ratio: these are mathematically equivalent
Some synonyms: probability relative odds, risk relative odds
The OR is a valid measure of association, but is often used to approximate the CIR/IDR in case-control studies
We'll cover this in greater detail in our lecture on case-control studies
What is the ratio of the odds of developing disease?

odds of disease among the exposed / odds of disease among the nonexposed
OR disease = OR exposure 
What is the OR exposure?

odds of exposure among diseased / odds of exposure among the nondiseased

How do you calculate the exposure OR?

odds of exposure in diseased/odds of exposure in nondiseased
= (a:c)/(b:d) = ad/bc 
How do you calculate the disease OR?

(a:b)/(c:d) = ad/bc

OR example
Case control study of the association between aspirin consumption (E) and the development of a stomach ulcer (D) What is the OR dis and OR exp? How do you interpret this? 
ORdis = (150/76) / (78/69) = 1.75
The odds of developing a stomach ulcer among those who consume aspirin are 1.75 times the odds among those who don't consume aspirin. ORexp = (150/78) / (76/69) = 1.75. The odds of having consumed aspirin among those with a stomach ulcer are 1.75 times the odds among those without a stomach ulcer. 
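A sketch showing that the disease OR and exposure OR are the same cross-product ad/bc, using the aspirin/ulcer numbers from this card. The 2x2 layout assumed: a = aspirin with ulcer, b = aspirin without ulcer, c = no aspirin with ulcer, d = no aspirin without ulcer.

```python
def odds_ratios(a, b, c, d):
    disease_or = (a / b) / (c / d)   # (a:b)/(c:d)
    exposure_or = (a / c) / (b / d)  # (a:c)/(b:d)
    cross_product = a * d / (b * c)  # ad/bc
    return disease_or, exposure_or, cross_product

odds_ratios(150, 76, 78, 69)  # all three ~1.75
```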
Compare the formulas for disease incidence risk to probability odds of disease

Disease Incidence Risk:
q+ = a/(a+b); q- = c/(c+d). Probability Odds of Disease: q+/(1 - q+) = a/b; q-/(1 - q-) = c/d 
OR vs RR in a Cohort Study with Count Data
OR is a valid measure of association, but is often used to approximate the CIR/IDR in casecontrol studies – Why? 
Because for many it’s easier to interpret
Because it is impossible to calculate the RR with certain designs (casecontrol) It’s easy to adjust an OR for confounding and can be derived from modeling (logistic regression) OR (event) is the exact reciprocal of the OR (nonevent) 
What is the OR ~ RR: The General Rule?

OR is a good approximation of the CIR/IDR when disease is rare in the population
In a case-control study, if controls are selected to represent the total population (rather than just non-cases), then OR ~ CIR/IDR without regard for disease prevalence in the population (Miettinen, 1976). Under certain sampling schemes, OR ~ CIR/IDR more directly, without regard for disease prevalence 
Why is the OR is a valid measure of association, but is often used to approximate the CIR/IDR in casecontrol studies?

Odds ratio (event) is the exact reciprocal of the OR (nonevent)
Example: Assume E = Female, D = Dead. Alive F = 308, Alive M = 142, Dead F = 154, Dead M = 709. OR (dead) = (154/308) / (709/142) = 0.1. The reciprocal would be OR (alive) = 1/0.1 = (308/154) / (142/709) = 10.0 
OR ~ CIR, cont’d

Odds Ratio is biased away from the null (in both directions)
Recall, OR = (q+/(1 - q+)) / (q-/(1 - q-)) and RR = q+/q-. So (1 - q-)/(1 - q+) defines the built-in bias between RR & OR. When disease is rare, this bias is negligible 
Example of built-in bias

Example (data from text tables 3-3 and 3-4):
OR = RR x 'built-in bias'. If RR = 6.0, CI UE = .0030 and CI E = .0180: OR = 6.0 x [(1 - .0030) / (1 - .0180)] = 6.09. If RR = 6.0, CI UE = .0705 and CI E = .2529: OR = 6.0 x [(1 - .0705) / (1 - .2529)] = 7.46. If RR = 3.59, CI UE = .0705 and CI E = .2529: OR = 3.59 x [(1 - .0705) / (1 - .2529)] = 4.46 
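The built-in bias relationship can be sketched directly; this reproduces the card's worked examples under the formula OR = RR x (1 - q_unexposed)/(1 - q_exposed). The function name is mine.

```python
def or_from_rr(rr, ci_unexposed, ci_exposed):
    """OR = RR * (1 - q_ne) / (1 - q_e): the 'built-in bias' factor."""
    return rr * (1 - ci_unexposed) / (1 - ci_exposed)

or_from_rr(6.0, 0.0030, 0.0180)  # ~6.09 (rare disease: bias negligible)
or_from_rr(6.0, 0.0705, 0.2529)  # ~7.46 (common disease: bias larger)
or_from_rr(0.2529 / 0.0705, 0.0705, 0.2529)  # ~4.46 (RR = 3.59 at full precision)
```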
Both compare the likelihood of an event between two groups.
Example: Who’s more likely to die, men or women? Would you use OR or RR and are they the same? 
Alive F=308
Alive M = 142, Dead F = 154, Dead M = 709. Importance of structuring data in a 2x2 table. Assume E = Female, D = Dead: RR = (154/462) / (709/851) = 0.4. Assume E = Male, D = Dead: RR = (709/851) / (154/462) = 2.5. Assume E = Female, D = Alive: RR = (308/462) / (142/851) = 3.99. OR = RR?? Assume E = Male, D = Dead: OR = 10.0. Assume E = Dead, D = Male: OR = 10.0 (recall, OR exp = OR dis). Assume E = Female, D = Dead: OR = 0.1 
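A sketch of the mortality 2x2 from this card, computed both ways (helper names are mine). Because death is common here, the OR (0.1) is far from the RR (0.4), illustrating why OR ~ RR only holds for rare outcomes.

```python
# 2x2 table: rows are exposure groups; "exposed" below means female.
#            Dead   Alive   Total
# Female      154     308     462
# Male        709     142     851

def risk_ratio(d_exp, alive_exp, d_unexp, alive_unexp):
    return (d_exp / (d_exp + alive_exp)) / (d_unexp / (d_unexp + alive_unexp))

def odds_ratio(d_exp, alive_exp, d_unexp, alive_unexp):
    return (d_exp * alive_unexp) / (alive_exp * d_unexp)

risk_ratio(154, 308, 709, 142)  # ~0.4
odds_ratio(154, 308, 709, 142)  # ~0.1, the reciprocal of the male OR (10.0)
```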
What are the Strengths of Ratio Measures?

All ratio measures have the same reference point, a null value of 1.0 (no association)
Comparisons across studies of different designs can be made, because OR ~ CIR/IDR IDR ~ CIR The strength of the association between a study factor and outcome is one element (an important one) used to assess causality 
What are the Limitations of Ratio Measures?

May be deceptive in addressing the impact of the risk factor in assessing an individual’s risk due to an exposure
Example: Risk of lung cancer and CHD for smokers and nonsmokers. A) Risks: lung cancer, smokers = 0.06; lung cancer, nonsmokers = 0.01; CHD, smokers = 0.4; CHD, nonsmokers = 0.2. B) Lung cancer: CIR for smoking = 6.0, excess risk of smoking = 0.05. CHD: CIR for smoking = 2.0, excess risk of smoking = 0.2. Note: the CIR for lung cancer is considerably higher than that for CHD, but smoking has a greater absolute impact on CHD, as demonstrated by the excess risk measure. 
What are the Measures of Association: Absolute Differences (Difference Measures, Attributable Risk)?

These are measures of association between an exposure and outcome based on the absolute difference between two risk/rate estimates.
Difference measures are calculated by subtracting the frequency estimates of the reference group from the comparable estimate of the exposure group. 
Measures of Absolute Difference
What are the Types of difference measures? 
Attributable Risk (risk difference, excess fraction, etiologic fraction)
Attributable Risk in the Exposed (called Incidence Density Difference if using rates). Percent Attributable Risk in the Exposed. Levin's Population Attributable Risk 
What is the Attributable Risk in the Exposed?

Difference between risk estimates of different exposure levels and a reference level
The excess, above background, associated with the exposure under study Theoretically, the absolute excess incidence that would be prevented by eliminating the exposure. ARexp has the same unit as the incidence measure (dimensionless if CI; time1 if ID). 
What is the formula for Attributable Risk in the Exposed?

ARexp = CIe - CIne
or IDD = IDe - IDne. ARexp has the same unit as the incidence measure (dimensionless if CI; time^-1 if ID). 
Attributable Risk in the Exposed, Example
How do you interpret ARexp? 
Excess risk of developing malaria attributable to the exposure among Peace Corps volunteers in Kenya, 1997 (exposed = taking antimalarial pills)
ARexp = (60/1000) - (10/1000) = 0.05. A 5% excess risk of developing malaria among those who take antimalarial medication versus those who do not take antimalarial medication. 
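The attributable-risk subtraction above, sketched with the card's numbers (function name is mine):

```python
def attributable_risk_exposed(ci_exposed, ci_unexposed):
    """ARexp = CIe - CIne: absolute excess risk in the exposed."""
    return ci_exposed - ci_unexposed

attributable_risk_exposed(60 / 1000, 10 / 1000)  # 0.05
```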
What is the Percent Attributable Risk in the Exposed?

The percent of the total risk in the exposed group attributable to the exposure, IF CAUSALITY HAS BEEN ESTABLISHED.

What are the formulas for Percent Attributable Risk in the Exposed?

Formulas:
%ARexp = ((CIe - CIne)/CIe) x 100. %ARexp = ((CIR* - 1.0)/CIR*) x 100 (*can use IDR or OR). %ARexp = ((IDe - IDne)/IDe) x 100 
What is Percent ARexp?

%ARexp is equivalent to percent efficacy when assessing an intervention such as a vaccine.
The group receiving the intervention is considered “nonexposed”, which has a lower incidence of the disease that is targeted by the vaccine. 
Percent Attributable Risk in the Exposed, Example
What is the percent of the total risk of developing malaria among those who take antimalarial medication attributable to the medication? (Peace Corps volunteers, 1997.) %ARe = [(60/1000) - (10/1000)] / (60/1000) x 100 = 83.3%. How do you interpret this? 
83.3% of the total risk in the group taking antimalarial medication is attributable to taking the medication, conditional on the establishment of causality.
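The two equivalent routes to %ARexp, from the risks directly or from the ratio measure; helper names are mine, numbers are the malaria card's (CIR = 0.060/0.010 = 6.0).

```python
def pct_ar_exposed(ci_exposed, ci_unexposed):
    """%ARexp = ((CIe - CIne)/CIe) * 100."""
    return (ci_exposed - ci_unexposed) / ci_exposed * 100

def pct_ar_from_ratio(ratio):
    """Equivalent form: ((RR - 1)/RR) * 100 (can use CIR, IDR, or OR)."""
    return (ratio - 1.0) / ratio * 100

pct_ar_exposed(60 / 1000, 10 / 1000)  # ~83.3
pct_ar_from_ratio(6.0)                # ~83.3
```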

What is Levin’s Population Attributable Risk%?

The excess risk in the total population attributable to the exposure if a causal relationship is assumed.
Population attributable risk is a function of 2 parameters, prevalence of exposure in the population & the magnitude of the increase in incidence associated with the exposure. It is population specific, therefore, frequency of exposure is taken into account Cannot project the results from one population to another if the frequency of exposure differs between the populations 
What are the formulas for Levin’s Population Attributable Risk%?

Pop AR% = ((CIpop - CIne)/CIpop) x 100
Pop AR% = (pe(CIR* - 1) / (pe(CIR* - 1) + 1)) x 100. pe: prevalence of exposure in the population. *Can use IDR or OR 
When does CIpop ~ CIe?

When pe is high
Therefore, Pop AR% ~ ARexp when pe is high 
When does CIpop ~ CIne?

When pe is low

% PAR: Example
Given the following data, what is percent of the total risk of CHD in the population attributable to being a current smoker? RISK of CHDexp = 0.321 RISK of CHDnonexp = 0.162 RISK of CHD Total Pop = 0.197 How do you interpret this? 
%PAR = ((.197 - .162)/.197) x 100 = 17.8%
17.8% of the total risk of CHD in the population is attributable to being a current smoker. 
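Both %PAR formulas from the cards, sketched in Python. The first call uses the CHD/smoking example; the pe and CIR in the second call are made-up illustration values, not from the card.

```python
def pct_par_from_pop(ci_pop, ci_unexposed):
    """%PAR = ((CIpop - CIne)/CIpop) * 100."""
    return (ci_pop - ci_unexposed) / ci_pop * 100

def pct_par_levin(p_exposure, ratio):
    """Levin's formula: pe(RR - 1) / (pe(RR - 1) + 1) * 100."""
    excess = p_exposure * (ratio - 1)
    return excess / (excess + 1) * 100

pct_par_from_pop(0.197, 0.162)  # ~17.8 (CHD / current-smoking card)
pct_par_levin(0.25, 2.0)        # ~20.0 with illustrative pe = 0.25, CIR = 2.0
```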
What are the General Use of Difference Measures?

Reflect the excess number of cases attributable to the exposure
Reflect the expected effect of changing the distribution of one or more risk factors in a particular population Useful from a public health perspective – when attributable risk is high, the risk factor is of importance to the health of the community 
Which measure of association should you calculate from a study?
Ratio versus Difference Measures. Example: British Physicians' Cohort Study, death rate per 100,000 person-years (LC = lung cancer). Rates: LC/smokers = 166; LC/nonsmokers = 7; CHD/smokers = 599; CHD/nonsmokers = 422. Versus: LC IDR = 23.7, LC excess rate = 159, LC %PAR = 96%; CHD IDR = 1.4, CHD excess rate = 177, CHD %PAR = 29% 
The impact of smoking in terms of absolute excess rate is about the same for both Lung Cancer and CHD
From a public health perspective on prevention, although the IDR for lung cancer > IDR for CHD, the absolute excess is the same 
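The ratio-versus-difference contrast above in a few lines, using the British physicians' rates from the card (per 100,000 person-years):

```python
lc_smokers, lc_nonsmokers = 166, 7
chd_smokers, chd_nonsmokers = 599, 422

lc_idr = lc_smokers / lc_nonsmokers        # ~23.7
lc_excess = lc_smokers - lc_nonsmokers     # 159 per 100,000 PY
chd_idr = chd_smokers / chd_nonsmokers     # ~1.4
chd_excess = chd_smokers - chd_nonsmokers  # 177 per 100,000 PY
# Lung cancer has the far larger IDR, but the absolute excess
# rates (159 vs. 177) are about the same.
```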
Relative versus Absolute Differences

Relative differences are used most often when evaluating the DETERMINANTS of disease because they represent the magnitude of the association. This information is critical in the determination of causality.
ONCE CAUSALITY IS ASSUMED, absolute differences are more important measures from the perspective of public health administration and policy. 
Incidence of Diabetes Among Dependents of the U.S. Military Forces Admitted to U.S. Army Treatment Facilities, 19711991
Objective: To determine the national incidence of diabetes in children by studying a group representing all parts of the country: the dependent children of U.S. military personnel. Research Design and Methods: Dates of admission, diagnosis of diabetes, age, and gender were collected for all 522,326 children, age 21 or younger, of active-duty military personnel admitted to U.S. Army treatment facilities during fiscal years 1971-1991. Incidence rates were expressed as cases per 100,000 person-years. Results: A total of 2,308 cases of diabetes were diagnosed in 14.3 million person-years of follow-up over the 21 years. The overall incidence rate of diabetes in this population is 16.2 (95% CI 15.5-16.9). For 1987-1991, the age-specific rates were 8.1 (0-4 years), 15.9 (5-9 years), 25.6 (10-14 years), 23.9 (15-19 years), and 23.4 (20-21 years). What type of study design? What is the descriptive group? What was their measure of frequency? Is this study generalizable? What are the 2 key problems with this study? 
What type of study? Cohort – longitudinal study
Descriptive group: single-group cohort. Calculated incidence. Possibly limited generalizability. No case definition (type 1 vs. type 2 diabetes). Underestimate of type 1 diabetes, because parents probably would not be in the military if they had a history of type 1 diabetes 
What is the Epidemiologic definition of “cohort”?

Cohort: A group of individuals that share a common characteristic
Example: Birth cohort: individuals born in the same period (often year) Exposure cohort: individuals sharing a common exposure (often an occupational exposure such as asbestos, etc.) The observation of a cohort(s) over time to measure a stated outcome. 
What are the two primary purposes of cohort studies?

Descriptive (measures of frequency)
To describe the incidence of an outcome over time or to describe the natural history of disease Analytic (measures of association) To analyze the relation between occurrence of outcomes and risk factors (or predictive factors) 
Cohort Studies: distinguishing features
What is the Directionality? 
Forward; incident cases
Recruited healthy, but at risk for the disease of concern. Then broken into groups by exposure status; follow them forward and measure disease status. 
When do you do prospective, longitudinal cohort studies?

sufficient EVIDENCE has accumulated from other studies to indicate a prospective cohort study is warranted
a NEW AGENT is introduced and requires monitoring for its possible association with adverse health outcomes (Levaquin, Vioxx). TEMPORAL association is unclear from a case-control study. Impressive results are obtained from a case-control study (either positive or negative) and issues of VALIDITY (selection or information bias) are evident or are a concern 
Is Mold Exposure a Risk Factor for Asthma?
What is wrong with this statement: A remarkably consistent association between home dampness and respiratory symptoms and asthma has been observed in a large number of studies conducted across many geographic regions . In a recent review of 61 studies, it was concluded that dampness was a significant risk factor for airway effects such as cough, wheeze, and asthma, with odds ratios ranging from 1.4 to 2.2. Positive associations have been shown in infants, children, and adults, and some evidence for doseresponse relations has also been demonstrated . 
Although it has been concluded that the evidence for a causal association between dampness and respiratory morbidity is strong, this evidence is based mainly on crosssectional studies and
prevalence casecontrol studies; few prospective studies have been conducted 
Major Concerns for Cohort studies?

Selecting a cohort (sampling frame, sampling and recruitment, external vs internal validity)
Exposure assessment. Follow-up. Considered the gold standard among observational studies: less prone to information bias because not relying on recall. Randomized controlled trials are the overall gold standard, due to randomization. 
For Population Based Cohort Studies, how do you select the cohort?

Any welldefined population (geographic, occupational, membership in HMOs)
Typically evaluate multiple hypotheses. Primary justification: external validity. Tend to be very large; participants are chosen from a geographic, occupational, or HMO-membership population to answer a specific question. Numbers needed depend on the prevalence of the exposure and outcome (rare vs. common) 
What are the advantages of Population Based Cohort Studies?

Advantages
Estimation of distribution and prevalence of relevant exposure variables Calculation of risk factor trends over time Strong internal validity Strong external validity (primary justification) 
What are the disadvantages of Population Based Cohort Studies?

Disadvantages
Expensive, logistically complicated Often associated with large loss to followup If exposure is rare, inefficient 
Selecting the cohort, cont.
What is a sampling frame? 
Total population
Probability samples of the population The extent to which a cohort sample is representative of the total reference population depends on the completeness of the population frame available to sample from as well as participation rates. please refer to Delnevo et al article as an example. Similar to the concept of the source population 
Selecting the cohort, cont.
Population Based Cohort Studies : What are the External Validity Issues and Considerations ? 
Depends on eligibility criteria for inclusion
Initial response of the sample Stability of the cohort on followup. Requires variability of exposure and outcome levels Susceptibility of the population to the risk factor 
Selecting the cohort, cont.
What are the key points on Population Subgroups: special/exposed groups? 
May ensure the cohort has exposure of interest
Less likely to be lost to follow-up because of lower mobility (military, occupational cohort). May have relevant information on exposure & potential confounders in the medical and employment records. Reduced ability to generalize results to the general population. Access to data may be limited. Generally, smaller sample sizes needed 
What are the key issues when Selecting the Cohort: The nonexposed?

EXTERNAL COMPARISON GROUPS: chosen from a different source population
often general population controls from area are used must be susceptible to the same selective factors as the E group less costly if data already available sometimes called SMR studies 
What are the Levels of subject selection in selecting a cohort?

Target Population: population to which results can be applied
Source Population: the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn. Eligible Population: the population of subjects eligible to participate. Study Participants: those people who contribute data to the study 
In which direction do you choose subjects?

Downward

In which direction do you apply the results?

upward

What are the important issues with cohort exposure?

Definition of exposure: intensity, duration
Induction period Latency Changing exposures Allocation of persontime in the above examples Categorizing exposure 
Cohort Exposure: important issues
What is the Induction period? 
Duration of time that it takes from exposure to onset of disease
Time during which exposure occurs ≠ time at risk Example: radiation exposure and leukemia, ~3.5 years. Exposure is classified as high, medium and low based on the amount of time working in a job where you are exposed to radiation ≠ PT at risk. What do you do with the persontime that accrues prior to the induction time? Only include time at risk of the outcome in the denominator of the rate. 
Cohort Exposure: important issues
What is the latency period? 
Duration of time from disease initiation to disease detection.
Very relevant when considering covariates such as detection bias, etc 
Cohort Exposure: important issues
What is changing exposures? 
Changing exposures:
Calculation of rates (as opposed to risks) allows flexibility in the analysis of cohort data. An individual can contribute followup to several different exposurespecific rates, depending on details of the study hypothesis The definition of the exposure group corresponds to the definition of PT eligible for each level of exposure. How to handle changing exposures depends also on cross over effects. If the effect of being exposed is cumulative, you can not change exposure groups when exposure ceases. 
Cohort Exposure: important issues
What is categorizing exposures? 
Categorizing exposures:
Often there is no guidance on appropriate categories of exposure, or the line between exposed and unexposed is not defined. No problem if your data are continuous. If you want to calculate rates directly, you must observe populations within categories of exposure. Common error: apportioning to exposed person-time the unexposed time of a person who eventually becomes exposed. Example: an occupational study where exposure is categorized according to duration of employment in a particular job, with the highest exposure category being at least 20 years of employment. If a worker has been employed 30 years, it is a mistake to assign all 30 years of that employee's person-time to the 20+ category, because they only reached that exposure level in the last 10 years; only that time frame is relevant to 20+ years of exposure. 
Cohort Exposure: important issues
What is the measurement issue? 
Is Mold Exposure a Risk Factor for Asthma?:
It is not clear whether molds are merely markers of dampness or are causally related to the symptoms associated with dampness. Assessment of exposure to molds is often done by questionnaire. It is unknown to what extent questionnaire reports of mold growth correlate with exposure to relevant mold components . 
Cohort Exposure: important issues
What is the measurement issue continued... 
Is Mold Exposure a Risk Factor for Asthma?:
If mold exposure is quantified: perhaps the most important problem, one that has rarely been acknowledged in the studies published to date, is that air sampling for mold for more than 15 minutes is often impossible, and air concentrations usually vary a great deal over time. The few studies that have included repeated exposure measurements of mold have shown considerable temporal variation in concentrations, even over very short periods of time. Variability in isolated genera was even more substantial. 
What are the issues with cohort f/u?

Systematic, standardized data collection procedures on all or a sample of cohort members regardless of exposure status to avoid surveillance bias.
Data collection relies on primary and secondary collection procedures: linking of established databases (professional, government, etc.). Controlling loss to follow-up is key: baseline recruitment strategy; key identifiers for searching; adequate information to analyze differences in participation rates. Downside: a longer questionnaire or interview may inhibit recruitment. Options: mail survey, telephone survey, or both; monetary incentives, etc. 
What are the three types of cohort studies?

Concurrent
Retrospective Mixed Design 
What are the advantages of Cohort studies?

Advantages:
Study many different outcomes with exposure of interest Temporal relation not in doubt, therefore, preferred for causal inference Less prone to selection bias as D status does not influence selection of subjects with respect to exposure Repeated exposure data can be collected Provides information on changing risks over time Modification of risks by increasing age 
What are the disadvantages of Cohort studies?

Disadvantages:
Costly, resource intensive (manpower, time and money) Loss to followup of study subjects Inefficient for studying diseases with long latency Inefficient for studying rare diseases  example : If ID = 5 per 100,000 per year and sample size calculations say you need 100 cases, given an initial cohort of 40,000 subjects followup would have to continue for 50 years “Study effects” Changing exposure Withdrawals/loss to followup Basic design allows only 1 risk factor to be studied 
What is a casecontrol study?

An alternative to the cohort study for assessing exposuredisease relationships
Subjects chosen based on disease status and assessed for previous exposure Exposure data may be measured at the time of the study or gathered from existing data Analysis by the odds ratio (as an estimate of the relative risk) Particularly susceptible to certain types of bias which dictates design characteristics But optimized speed and efficiency. 
What are the Primary design issues?

selection of cases and controls
collection of accurate exposure data control of extraneous factors 
Rationale for CaseControl Study

Used to answer the same research question as in cohort studies:
Is the rate/risk of disease among the exposed different than that among the non-exposed? If yes, in what direction and by how much? Used as AN EFFICIENT VERSION of a cohort study. Used to estimate the IDR/CIR with the OR 
How is a c/c study efficient?

Only need a percentage of the controls, but the percentage needs to be the same in the exposed and non-exposed (sampling fraction)

For valid casecontrol studies…

Cases must be representative of all cases in the source population – the same ones who would be considered cases if a cohort study was done.
Controls selected so that their exposure distribution reflects the exposure distribution among the person time in the source population, i.e. the same “source cohort (population)” as the cases. Both cases and controls must be selected independent of exposure status 
What is a Source Population?

The Source Population is:
The source of subjects for a particular study Defined by the participant selection methods of your study. 
Selection of Cases

Clearly define the source population
Establish strict diagnostic criteria for case definition, independent of exposure (cases really cases) Either incident or prevalent cases, but incident are ideal Can be selected crosssectionally (at a point in time) or longitudinally – longitudinally is a better choice Can use all cases within the population or a sample of the population 
Selection of Controls

Without a well defined source population, it is difficult or impossible to select unbiased controls.
Is critical and can be difficult Controls must come from the same source population that gives rise to the cases Controls must have the same exposure distribution as in the source of the cases Chosen independent of exposure status, i.e. the same sampling rate for exposed and unexposed controls If sample size is large enough, problems due to sampling error are avoided 
What is the goal of selecting cases and controls?

Goal is to choose cases and controls so that their proportion with the risk factor (E) in the study does not vary much more than sampling error from the source population.

What are the Sampling Strategies to Select Controls?

When selecting the controls we want to minimize selection bias and maximize the potential for the OR ~ the RR
If a disease is rare, all sampling strategies will give the same result (OR ~ IDR/CIR) If disease is common, different sampling strategies will give different results 
What are the Types of Sampling strategies?

Types of Sampling strategies:
Traditional (cumulative) sampling Casebased casecontrol study (traditional): Density Sampling  Casecontrol study within a cohort (hybrid, ambidirectional): 
Casebased casecontrol study (traditional)?

Casebased casecontrol study (traditional):
cases and controls are selected at a given point in time from a hypothetical cohort (i.e. at the end of followup). 
What are the two Casecontrol study within a cohort (hybrid, ambidirectional)?

Case cohort study:
Nested casecontrol study: 
Nested casecontrol study: controls selected at time when each case occurs (incidence density sampling)?

Nested casecontrol study: controls selected at time when each case occurs (incidence density sampling).

Case cohort study: controls are selected from the baseline cohort?

Case cohort study: controls are selected from the baseline cohort.

Describe CaseBased (Traditional):Cumulative Sampling

Typically, cases identified as diagnosed during study period from a stated source population
Controls (non-cases) identified from the same source population from among the non-cases at the end of the study period (cumulative sampling). Exposure to the risk factor of interest is measured/gathered. OR is calculated as an estimate of the IDR/CIR. Primarily used only when the disease is rare; otherwise the OR doesn't estimate the IDR/CIR. With this method of selection, controls do not represent the total source population from which cases come, only the non-cases (although they do still come from the same source population). 
Issues with CaseBased (Traditional):Cumulative Sampling

Selection bias may occur when cases and noncases are not selected from the same source population, or populations with similar relevant characteristics.
Selection bias may occur if loss to follow that happens before the study groups are selected affect their comparability. 
What is the Bias in a casebased casecontrol study with a crosssectional ascertainment?

only cases with long survival are included.

Selection bias in a casebased casecontrol study ?

A crosssectional ascertainment identifies primarily prevalent cases, that is, those with the longest survival. Cases and controls who die before they can be included in the study may have a different exposure experience compared with the rest of the source population.
It is preferable to ascertain cases concurrently, i.e. to identify and obtain exposure information from cases as soon as possible after diagnosis. Same rules apply to controls. 
Casecontrol Studies within a Cohort?

Controls may be selected from the baseline cohort, i.e. “casecohort” design.
Controls may be selected from individuals at risk at time each case occurs, i.e. “nested casecontrol” design. Likelihood of selection bias diminished with either approach compared to casebased study design. 
Case-control studies within a defined cohort: Case-Cohort?

CC study conducted within the framework of existing, defined cohort, which becomes the SOURCE POPULATION
Cases are selected from the cohort (all or a sample) as they develop Controls are selected by random sample of the total cohort at BASELINE Controls have potential to become a case OR ~ CIR, no rarity assumption needed Selection bias is reduced due to control selection within the source population 
Studies within a defined cohort: Nested CaseControl?

Within framework of existing, defined cohort, the source population
Controls are a random sample of the cohort (noncases) at the time the case occurs Called INCIDENCE DENSITY SAMPLING OR RISK SET SAMPLING Matching on duration of followup Controls have the potential to become a case OR ~ IDR, no rarity assumption needed 
Incidence Density Sampling?

When a case occurs, a control (noncase) is selected (controls selected longitudinally)
“Matches” control to case based on time Controls have the potential to later become a case Ensures that controls represent the source population from which cases come Rare disease assumption not needed, OR ~ IDR/CIR for both common and rare diseases using this strategy 
EXAMPLES of Nested Casecontrol study

Example:
Levels of Maternal Serum AFP in Pregnant Women and Subsequent Breast Cancer Risk (AJE 1998;148:719-727). Univ. of Ca. Berkeley Child Health & Development Studies (CHDS), 1959-1994. Cohort of 12,552 pregnant women. Follow-up conducted using license records from the department of motor vehicles and review of death certificates. Nested design. Cases: women in the CHDS cohort who developed breast cancer, identified through the California Cancer Registry. Controls: members of the cohort who had not been diagnosed at that point in time with breast cancer. Exposure assessment: frozen sera accrued between 1959-1966. Data analysis: logistic regression 
What are the Advantages of CaseControl Studies within a Cohort?

The estimated exposure odds ratio is a statistically unbiased estimate of the relative risk since cases are included in the sampling frame for the selection of controls.
Efficient when need additional information (particularly detailed exposure information) that are not available for the entire cohort. 
What is the Measure of Association for a CaseControl study?

OR dis = OR exp
There is a builtin bias away from the null OR can approximate the RR in specific situations 
Rationale for CaseControl Study?

Used to answer the same research question as in cohort studies:
Is the rate/risk of disease among the exposed different than that among the non-exposed? If yes, in what direction and by how much? USED AS AN EFFICIENT VERSION OF A COHORT STUDY. USED TO ESTIMATE IDR/CIR WITH THE OR 
How does the OR ~ RR?

Only used when you want to estimate RR
Rare = disease < 0.10 Most diseases are rare If controls are selected to represent the source population 
In a case cohort study OR ~ ?

CIR

In a nested casecontrol study OR ~ ?

IDR

Primary design concerns with the casecontrol design?

Selection Bias –
Information Bias – 
Selection Bias ?

can occur when cases and controls are not selected from the same source population
When selective survival occurs 
Information Bias ?

can occur when there is bias in the measurement of exposure resulting in misclassification since exposure is ascertained after disease has occurred.

Strengths of CaseControl Design?

Less expensive and time consuming than cohort design
Good for studying the etiology of rare diseases Good for studying diseases with long latency periods Possible to study many different exposures with respect to outcome of interest 
Weaknesses of CaseControl Design?

Causal inference less clear (temporal ambiguity)
Often cannot estimate the frequency of disease in a population Insufficient for studying rare exposures Particularly susceptible to both selection and information biases 
Ecologic Study?

Involves the comparison of GROUPLEVEL variables rather than comparison of INDIVIDUALlevel data.
A study that includes ecologic-level (as opposed to individual-level) data. They can be any design (cohort, case-control). Often a geographically defined variable is used (e.g. a country, a census tract, etc.). We know the marginal frequencies of exposure and disease. We do not know the joint distribution of exposure and disease 
Three general parameters of a study?

Data collection: levels of measurement
Levels of analysis: common level to which data on all variables are reduced and analyzed
Interpretation: target levels of inference 
Data collection: levels of measurement
? 
Aggregate: means or proportions
Environmental: physical characteristics of a place, with an individual analogue (e.g., hours of sunlight)
Global: attributes of groups or places with no individual analogue (e.g., healthcare system, law, population density) 
Levels of analysis: common level to which data on all variables are reduced and analyzed?

Individual level
Group level
Multilevel 
Interpretation: target levels of inference?

Biologic: inference stated in terms of individual risks
Ecologic: inference stated in terms of group rates
Contextual: use of individual-level together with group-level data to separate the effects of the two on each other 
Classification of Ecologic Study Designs?

Subject grouping:
By place: multiple-group design (e.g., Meat Consumption and Colon Cancer: A Multi-National Study)
By time: time-trend design
By place and time: mixed design (e.g., Is hard drinking water a protective factor against CVD mortality? The absolute change in CVD mortality rate between 1948 and 1964 in 83 towns.) 
Rationale for ecologic studies?

Low cost and convenience
Measurement limitations of individual-level studies
Design limitations of individual-level studies (e.g., limited variability in exposure)
Interest in ecologic effects 
What is the ecosocial theory?

Ecosocial analyses of disease distribution, population health, and health inequities.
(Krieger N. Am J Public Health. 2008;98:221–230. doi:10.2105/AJPH.2007.111278) 
Effect Estimation?

Not able to calculate rates or risks directly due to lack of information
Regress group-specific disease rates on group-specific exposure rates
Linear, logistic, multilevel analysis (contextual analysis, hierarchical regression), etc. 
Disadvantages of Ecologic Studies?

Ecologic bias
Temporal ambiguity: uncertainty about exposure preceding disease
Collinearity: covariates more highly correlated at the group level
Confounder control: adjusting for confounders may increase bias 
Ecological Fallacy?

The inability of the ecologic inference to accurately estimate the biologic effect at the individual level.
Classic example: an ecologic study assessing the relation between religion and suicide rates in Prussian communities in the late 19th century (Durkheim, 1951). He found a correlation between being Protestant and suicide rates.
Conclusion: Being Protestant is a risk factor for suicide.
Ecologic fallacy: It is possible that most of the suicides within the communities were committed by Catholics, who were a minority and more socially isolated. 
Disadvantages of Ecologic Studies, cont’d?

Within-group misclassification
Non-differential misclassification leads to bias away from the null (vs. toward the null in individual-level analysis)
Lack of adequate data: data may be crude, incomplete, or unreliable
Incomparable data across groups/countries
Migration across groups: can cause ecologic bias 
Uses of Ecologic Studies?

Generating individual-level etiologic hypotheses
Appropriate when broad social or cultural factors are of interest
Alternative to collecting sensitive or expensive data from individuals
Testing the impact of group-wide interventions
"A comprehensive theoretical model of causality (one that considers all factors influencing the occurrence of disease) often requires taking into account the role of upstream and ecological factors (including environmental, sociopolitical, and cultural) in the causal chain." 
Questions That Should Be Considered in Evaluating the Quality of an Ecological Study?

Would it be practical to conduct alternative ways of studying the same question (e.g., cohort study, randomized controlled trial), or was the ecological study the only alternative?
Are the subjects in the ecological study representative of the group, place, or population of interest?
Were the exposure and outcome variables measured and defined in the same or a similar way across the different populations or groups being studied?
Have data been collected on important confounding variables that might also explain the exposure–outcome relationship, and have they been statistically adjusted for? If data are not available on key factors, is it reasonable to assume that their prevalence is similar in the different groups or populations being compared?
Is the identified ecological relationship between the exposure and outcomes biologically plausible and consistent with what is already known about a given topic at an individual-subject level?
What is the strength of the quantitative and statistical associations between the exposure and the outcome? The stronger the associations, the greater the likelihood of a true causal relationship.
Have the investigators interpreted their data with appropriate caveats? Did they acknowledge the possibility of an ecological fallacy? Were alternative explanations for the association between exposure and outcomes considered?
Have the study data been collected at multiple levels (e.g., individual, physician, hospital, community, or country)? If yes, was multilevel modeling considered or used for analyzing the data? 
Sampling?

Sampling is almost always used when conducting surveys and when conducting epidemiologic studies

Sampling in Public Health Research

The National Center for Health Statistics (NCHS) is the principal health statistics agency in the US (http://www.cdc.gov/nchs/about.htm).
Examples of NCHS surveys: the National Health and Nutrition Examination Survey; the National Health Interview Survey
Health agencies conduct surveys to assess the health status of populations, guide the allocation of resources, and evaluate health policies. 
Sampling in Epidemiologic Studies

Sampling is often discussed in the context of survey research.
But sampling is an important element of all epidemiologic study designs:
Selecting the sample for a cross-sectional study
Selecting subjects in a cohort study
Selecting cases and controls in a case-control study
Assigning subjects to study groups in a clinical trial 
Basic Sampling Concepts
Populations? 
Target population: the population to which you want to apply the findings of your study (e.g., the total U.S. population, or all children in the U.S.)
Source (survey) population: the population that is practical to include (e.g., the civilian non-institutionalized U.S. population).
The differences between the target and survey populations need to be identified, justified, and documented. The consequences of the differences should also be assessed in terms of:
Internal validity (selection bias)
External validity 
Basic Sampling Concepts
Sampling Frame? 
An instrument that includes all units of the survey population (e.g., lists, maps)
Inclusion and exclusion criteria must be stated: who is eligible to be in the sample population.
Incomplete or inaccurate lists can be a problem, called coverage error. 
Basic Sampling Concepts
Type of Sampling ? 
Probability sampling:
Each member of the survey population has a chance of being selected. A random method of selection is used. The probability of selection is known.
Non-probability sampling (e.g., convenience sampling, snowball sampling):
Limited ability to extrapolate from your sample to a larger population. 
Basic Sampling Concepts
Types of Probability Sampling? 
Simple random sampling
Systematic sampling
Sampling with stratification
Cluster and multistage sampling 
Basic Sampling Concepts
Simple Random Sampling (SRS)? 
SRS: any of the possible subsets of n distinct elements from the population of N is equally likely to be chosen.
Therefore, every element in the population has the same probability of being selected.
However, not all sampling schemes in which every element has the same probability of being selected are SRS. 
Basic Sampling Concepts
SRS with and without Replacement? 
SRS without replacement offers better precision than SRS with replacement.
If the size of the population is large relative to the sample, the difference is minimal.
In large-scale surveys, SRS is not used very often due to inconvenience. 
Basic Sampling Concepts
Sample Estimate and Population Parameter? 
If SRS was used, the sample mean is an unbiased estimator of the population mean.
Variance of the sample mean = (1 − n/N) × (sample variance / n)
For large populations, the sample size (n) rather than the sampling fraction (n/N) affects the precision of survey results more. 
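The variance formula above can be checked numerically. A minimal sketch (the variance value is hypothetical) showing that the finite population correction (1 − n/N) matters only when n is a large fraction of N:

```python
# Variance of the sample mean under SRS without replacement:
#   Var(xbar) = (1 - n/N) * s^2 / n
# The (1 - n/N) factor is the finite population correction (FPC).

def var_sample_mean(s2, n, N):
    """Variance of the sample mean with finite population correction."""
    return (1 - n / N) * s2 / n

s2 = 100.0  # hypothetical sample variance

# Same n = 100 drawn from a small vs. a very large population:
small_pop = var_sample_mean(s2, 100, 200)        # n/N = 0.5 -> big correction
large_pop = var_sample_mean(s2, 100, 1_000_000)  # n/N ~ 0 -> FPC ~ 1

print(small_pop)  # 0.5
print(large_pop)  # ~1.0 -- precision driven by n, not by n/N
```

For the large population the FPC is essentially 1, so only the sample size n drives the precision, as the card states.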
Systematic Sampling

Systematic sampling involves taking every kth element after a random start.
Each element in the population has the same chance of being selected, but the probabilities of different sets of elements being included in the sample are not all equal.
If we were to select 3 individuals out of 10, taking every third person after a random start (1, 2, 3, or 4), what is the probability that both #1 and #2 are selected? What is the probability that both #1 and #4 are selected?
For systematic sampling to work well, it is necessary to assume that the list has an approximately random order.
If there is a pattern with respect to the variables of interest and the sampling interval coincides with the pattern, systematic sampling will not result in a good sample. 
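The card's question can be answered by enumerating the four possible systematic samples (assuming, as stated, starts 1-4 with a sampling interval of 3):

```python
# Enumerate every possible systematic sample of 3 from a population of 10,
# taking every 3rd element after a random start of 1, 2, 3, or 4.
starts = [1, 2, 3, 4]
samples = [set(range(s, s + 9, 3)) for s in starts]
# start 1 -> {1,4,7}; start 2 -> {2,5,8}; start 3 -> {3,6,9}; start 4 -> {4,7,10}

# Each start is equally likely (probability 1/4), so joint inclusion
# probabilities follow from counting the samples containing both elements.
p_1_and_2 = sum(1 for s in samples if {1, 2} <= s) / len(samples)
p_1_and_4 = sum(1 for s in samples if {1, 4} <= s) / len(samples)

print(p_1_and_2)  # 0.0  -- #1 and #2 can never appear in the same sample
print(p_1_and_4)  # 0.25 -- only the start-at-1 sample contains both
```

This illustrates the point above: every element has inclusion probability 1/4 individually, yet pairs of elements do not all have equal joint probabilities, so this is not SRS.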
Stratified Sampling
Stratification? 
First, classify the population into subpopulations (strata) based on existing information (e.g., grades in a school), and then select separate samples from each of the strata using SRS or other methods.
If the strata sample sizes are proportional to the strata population sizes (i.e., a uniform sampling fraction is used), it is known as PROPORTIONATE STRATIFICATION; otherwise, it is DISPROPORTIONATE STRATIFICATION. 
Stratified Sampling
Example of Sampling with Stratification 
Variable of interest: # of hours/day of TV viewing among high school students.
We could take a 10% sample using SRS, or classify all students by grade (9th–12th) and then take an SRS sample from each grade (stratum). 
Stratified Sampling
Proportionate Stratification? 
Compared with an SRS of the same size, a proportionate stratification sample with SRS in each of the strata will have a similar or smaller variance.
The gain in precision is large if the within-strata variation is small and/or the between-strata variation is large. 
Stratified Sampling
Disproportionate Stratification? 
Sometimes used to allocate a sufficient sample size to specific strata so that separate estimates of adequate precision will be available for those strata.
The stratum that is given a higher sampling fraction usually has a relatively small size (e.g., minority groups in national surveys) or a relatively high variance in terms of the variable of interest.
Disproportionate stratification can result in a higher variance of the sample mean than an SRS of the same size. 
Proportionate Stratification – Example: Environmental Condition of N.O. Homes

Target population: all homes in New Orleans (except for specific excluded neighborhoods)
Exclusions: FEMA trailers and unoccupied housing
Total study sample size = 100
Issues: the sampling frame and the sampling strategy. 
Proportionate Stratification – Example: Environmental Condition of N.O. Homes, cont’d

Stratify by neighborhood (defined by planning district), weighted by occupancy rates.
Calculated as follows: the proportion of overnight occupancy per neighborhood (Rapid Population Estimate data) X the number of pre-Katrina housing units per neighborhood (2000 Census). That number was used to determine the proportion of total housing per neighborhood currently occupied in New Orleans.
The sample size per neighborhood was weighted: total occupied housing units in the planning district / total occupied housing units in New Orleans.
For example, 56% of the French Quarter (FQ) was occupied (RPE). That translates to ~3,267 occupied units (.56 x 5,881 total units, 2000 Census). Total occupied units in New Orleans = 64,481 (RPE). So the FQ has 5% of the total occupied housing units, and 5% of the sample came from the French Quarter.
Logic: the repopulation of N.O. would eventually be a function of both pre-Katrina occupancy patterns and the amount of devastation. 
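The French Quarter allocation above can be sketched as proportionate allocation in code. Only the FQ and citywide totals come from the card; entries for any other planning district would be placeholders:

```python
# Proportionate stratification: each district's sample size is proportional
# to its share of occupied housing units. Figures from the card: FQ ~3,267
# occupied units, 64,481 occupied units citywide, total sample of 100.
citywide_occupied = 64_481
total_sample = 100

occupied_units = {
    "French Quarter": 3_267,
    # ... the other planning districts would be listed here ...
}

for district, units in occupied_units.items():
    share = units / citywide_occupied          # district's share of occupied housing
    n_district = round(share * total_sample)   # proportionate sample size
    print(district, round(share, 3), n_district)
# French Quarter: share ~0.051 -> 5 of the 100 sampled homes
```

The uniform sampling fraction (100 / 64,481) applied to each district's occupied-unit count reproduces the 5-home FQ allocation described above.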
Cluster Sampling?

Groups of elements can be considered clusters (e.g., different classes in a school, different communities in a city). Often only a sample of the clusters is included in a study or survey.

Cluster and Multistage Sampling
Two-stage cluster sampling? 
Two-stage cluster sampling: only a sample of the elements in the selected clusters is included.

Multistage Sampling

Multistage cluster sampling: a hierarchy of clusters is used; this is sometimes loosely described as cluster sampling.

Difference between Strata and Clusters?

In sampling with stratification, all strata are included in the final sample. We would like strata that are internally homogeneous and externally heterogeneous.
In cluster sampling, only a sample of the clusters is included in the final sample. We would like each cluster to be as heterogeneous as possible.
A major assumption of cluster sampling is that each cluster represents the whole. 
Precision in Cluster Sampling?

Compared with an SRS of the same size, cluster sampling often (although not always) leads to a loss in precision.
The justification for cluster sampling is the reduced cost (time and money). 
When to use sample size calculations?

1. BEFORE SUBMITTING A RESEARCH PROPOSAL, to estimate the approximate sample size needed to test the relevant hypotheses
2. When a study is underway, to evaluate whether the planned sample size is still adequate, given any new information
3. After a study is completed, power and sample size calculations can help interpret your results (i.e., a statistically insignificant finding may be due to limited power) 
Sample Size?

Sample size:
Calculated to choose the adequate number of subjects needed to detect a certain magnitude of effect with minimal statistical error
Numerous formulas exist for calculating sample size 
Power?

Power:
The probability of identifying an effect when one truly exists (i.e., rejecting the null hypothesis when it is actually false, reducing type II error) 
What is the probability of accepting Ho when Ho is true?

1 − alpha

When do you have a type one error?
What is this called? 
Reject Ho when Ho is true
Alpha (α) 
When do you have a type II error?
What is this called? 
Accept Ho when Ho is false
Beta (β) 
Parameters Needed (Sample Size)?

1. Type I error (alpha): the standard is 0.05.
2. Power (1 − beta): the ability to detect a difference if one truly exists. The standard is 80%.
3. Proportion with the factor (prevalence or incidence):
The proportion of the baseline population exposed to the factor of interest (case-control study) with dichotomous outcomes.
The proportion of the baseline population that has the disease of interest (cohort or intervention studies) with dichotomous outcomes.
4. Magnitude of effect you want to detect: the difference in outcome rates between the two groups, the RR, or the OR. 
Where to get these estimates (prevalence, incidence and detectable magnitude of effect?

Previous studies
Pilot studies
If none is available, the investigator uses best judgment to provide a range of values or the most conservative estimates. 
Sample Size Example: Cohort studies, equal allocation?

n = ((p1q1 + p2q2) × (Zα + Zβ)²) / (p1 − p2)²
where:
n = # of subjects in each group
p1 = frequency of outcome in group 1
p2 = frequency of outcome in group 2
q1 = 1 − p1
q2 = 1 − p2
Solving for power: Zβ = ((p1 − p2) × √n / √(p1q1 + p2q2)) − Zα 
Example #1: Cohort study
We wish to test whether breast cancer (D) rate is increased in oral contraceptive (E) users; the 10year cumulative incidence estimate in unexposed women is 0.01 (1%) Significance level set at 0.05, twosided Power set at 90% We wish to be able to detect a doubling of risk to 0.02 How many participants will be needed? 
We’ll need 3,098 subjects in each group,
for a total of 6,196 participants. 
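A sketch of the calculation for Example #1 using the formula from the earlier card (z-values are the standard normal quantiles; a small rounding difference from the card's 3,098 is expected):

```python
import math

def n_per_group(p1, p2, z_alpha, z_beta):
    """n = (p1*q1 + p2*q2) * (Za + Zb)^2 / (p1 - p2)^2, per group."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q1 + p2 * q2) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2

# Two-sided alpha = 0.05 -> z = 1.96; power 90% -> z = 1.2816
# p1 = 0.02 (exposed, doubled risk), p2 = 0.01 (unexposed)
n = math.ceil(n_per_group(p1=0.02, p2=0.01, z_alpha=1.96, z_beta=1.2816))
print(n)  # ~3,100 per group (card reports 3,098), so ~6,200 total
```

The small gap from the card's 3,098 comes from how many decimal places of the z-values are carried through the arithmetic.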
Sample Size: Case-Control Study?

Given the proportion of controls exposed (p2) and the predicted odds ratio (OR), the proportion of cases exposed (p1) is given by:
p1 = (p2 × OR) / (1 + p2 × (OR − 1)) 
Sample Size: Case-Control Study
Example: Case-control study
We wish to be able to detect a doubling of risk (OR = 2) associated with a factor that is present in 10% of the normal population from which the controls will be drawn.
Power = 80%
Significance level of 0.05, two-sided
How many cases and controls will we need? 
First, calculate the frequency of exposure in the cases using the frequency in controls and the odds ratio: p1 = (0.10 × 2) / (1 + 0.10 × (2 − 1)) ≈ 0.18.
Second, use these to calculate n. We’ll need approximately 280 cases and 280 controls. 
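A sketch of the two-step case-control calculation, using the same equal-allocation formula as the cohort card (alpha = 0.05 two-sided so Zα = 1.96; power 80% so Zβ = 0.8416):

```python
import math

# Step 1: exposure frequency in cases from the controls' exposure and the OR
p2 = 0.10                              # exposure in controls
OR = 2.0
p1 = (p2 * OR) / (1 + p2 * (OR - 1))   # exposure in cases, ~0.182

# Step 2: plug p1 and p2 into the equal-allocation sample-size formula
q1, q2 = 1 - p1, 1 - p2
z_alpha, z_beta = 1.96, 0.8416
n = math.ceil((p1 * q1 + p2 * q2) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2)
print(round(p1, 3), n)  # ~0.182 exposed among cases; 280 cases and 280 controls
```

Note the difference in exposure proportions (0.182 vs. 0.10) plays the role that the difference in outcome rates played in the cohort formula.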
Sample size needed increases with:
Does alpha (α) increase or decrease? 
Decreasing alpha (α)

Sample size needed increases with:
Does β error increase or decrease? 
Decreasing β error (increasing power)

Sample size needed increases with:
Does clinical significance (e.g., treatment difference you are trying to detect) increase or decrease? 
Decreasing clinical significance (e.g., treatment difference you are trying to detect)

Sample size needed increases with:
Does variability in the observed data increase or decrease? 
Increasing variability in the observed data
