255 Cards in this Set

  • Front
  • Back
what is epidemiology?
The study of health and illness in human populations

The study of the distribution and determinants of health-related states or events in specified populations and the application of this study to control health problems
How are disease occurrences determined?
Disease occurrences are determined by factors that can be identified and measured

modification of these factors is an effective way to prevent disease or to increase survival (public health)
What do descriptive studies do?
To evaluate and describe distribution and trends in health and disease

To compare subgroups

To provide the basis for planning health services and prevention activities

To generate hypotheses

To calculate measures of disease frequency
What do analytic studies do?
To elucidate the determinants of disease

To test specific etiologic (or preventative) hypotheses

To suggest potential for health promotion or disease prevention
What are the design issues in observational studies?
Population (type and how observed)

Method of data collection

Unit of analysis

Selection of study participants

Type of outcome measure
What are the design issues with population?
How is the population observed
- Cross-sectional or longitudinal

How are they selected?

Who do they represent?
- Internal and external validity
What are the design issues with methods of data collection?
Primary vs Secondary

Ways to measure exposure and disease status
What are the design issues with unit of analysis?
Individuals
Groups
What are the 4 levels of subject selection?
Levels of subject selection:
Target Population: population to which results can be applied

Source Population: the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn

Eligible Population: the population of subjects eligible to participate

Study Participants: those people who contribute data to the study
What is the direction of subject selection?
Downward: first target pop -->source pop --> eligible pop --> study participants
What is the direction of 'participant selection' application of results?
Upward: target pop <--source pop <-- eligible pop <-- study participants
Describe target population
population to which results can be applied
Describe source population
the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn
Describe eligible population
the population of subjects eligible to participate
Describe study participants
those people who contribute data to the study
What are the design issues with regard to outcome measures for cross-sectional vs. longitudinal studies?
A longitudinal study is a correlational research study that involves repeated observations of the same items over long periods of time — often many decades. It is a type of observational study.

Unlike cross-sectional studies, longitudinal studies track the same people, and therefore the differences observed in those people are less likely to be the result of cultural differences across generations. Because of this benefit, longitudinal studies make observing changes more accurate and they are applied in various other fields. In medicine, the design is used to uncover predictors of certain diseases. In advertising, the design is used to identify the changes that advertising has produced in the attitudes and behaviors of those within the target audience who have seen the advertising campaign.
What are the design issues with regard to outcome measures for continuous vs. categorical data?
There are three primary measurement scales: continuous (quantitative, scale), ordinal (semi-quantitative, ranked), and categorical (qualitative, nominal). However, to simplify matters, we may treat ordinal outcomes as either continuous or categorical, based on judgement and depending on whether distributional assumptions can be met.

quantitative information consists of both quantitative data - the numbers - and categorical data - the labels that tell us what the numbers measure.

Qualitative variables are those that express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on. The values of a qualitative variable do not imply a numerical ordering. Values of the variable “religion” differ qualitatively; no ordering of religions is implied. Qualitative variables are sometimes referred to as categorical variables. Values on qualitative variables do not imply order; they are simply categories. Quantitative variables are those variables that are measured in terms of numbers. Some examples of quantitative variables are height, weight, and shoe size.
What are the design issues with regard to outcome measures for prevalence vs. incident data?
• Prevalence: frequency of existing cases
• Incidence: frequency of new cases
• New cases are called incident cases.
• Existing cases are called prevalent cases.
What are the differences between cross-sectional vs. case-control studies?
Cross-sectional studies (also known as Cross-sectional analysis) form a class of research methods that involve observation of all of a population, or a representative subset, at a defined time.

They differ from case-control studies in that they aim to provide data on the entire population under study, whereas case-control studies typically include only individuals with a specific characteristic, with a sample, often a tiny minority, of the rest of the population.

Both are a type of observational study. Unlike case-control studies, they can be used to describe absolute risks and not only relative risks. They may be used to describe some feature of the population, such as prevalence of an illness, or they may support inferences of cause and effect.

Longitudinal studies differ from both in making a series of observations more than once on members of the study population over a period of time.
Is epidemiology qualitative or quantitative?
quantitative
how do we go about measuring the distribution and determinants of disease?
By calculating measures of disease frequency and association
Why do we use Measures of Disease Frequency ?
Used to enumerate the occurrence of disease (or whatever your outcome is) in a specified population in a specified period of time.

The frequency of disease can be measured for either incident or prevalent cases.

The frequency can be expressed as either a count (an absolute measure) or as a relative measure
Why do we use Measures of Association?
Reflect the strength of the statistical relationship between exposure status and disease occurrence. (example: relative risk, odds ratio)

Reflect the extra number of cases attributable to or prevented by the exposure in a particular population during a given time. (example: attributable risk)

Compares the disease frequency between 2 or more groups at different exposure levels.
How is count data useful?
Counts can be of incident or prevalent cases

Important to health planners

Used in monitoring the occurrence of disease outbreaks

This is generally the job of state and county health departments and federal agencies like the CDC.
We can count incident OR prevalent cases.
example of I and P measurement issues from WHO TB program
Often we are obliged to measure the burden of disease indirectly. The number of TB notifications varies with the incidence of disease, but unless we know the proportion of incident cases that are notified it is hard to determine the incidence from the notification rate. Measuring the prevalence of disease is easier than measuring incidence since we only need to carry out one survey. Then if we know the average duration of disease, that is to say the time between developing disease and being cured or dying, we can make an estimate of incidence from prevalence. And of course the prevalence of infectious disease is important in its own right since it is this that determines transmission. Another way in which we might try to get an estimate of incidence is through the case-fatality rate. If we know the proportion of people who will die after they have developed TB and if we have an independent measure of the total number of people who die of TB, through vital registration data, for example, we can combine these to estimate the incidence of TB.
Between prevalence and incidence, which measure is more suitable for measuring the burden of disease?
prevalence is dependent on both incidence and duration of the disease after onset - duration is determined by either survival for fatal disease or recovery for nonfatal diseases.

In a population in a steady-state situation, the relationship between point prevalence (PP), disease incidence (I), and duration (D) can be expressed as:
PP / (1 - PP) = I * D

The difference between I and PP depends on the duration of the disease and the magnitude of PP.

When PP is relatively low (0.05 or less), then (1 - PP) is almost equal to 1, therefore

PP = I * D (approximately)
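
The steady-state relationship above can be checked numerically. The following is a minimal sketch, assuming a hypothetical incidence rate of 0.005 per person-year and a mean duration of 4 years (neither value comes from the cards); the function names are illustrative only.

```python
def prevalence_steady_state(incidence_rate, mean_duration):
    """Solve PP / (1 - PP) = I * D for PP (steady-state assumption)."""
    prevalence_odds = incidence_rate * mean_duration
    return prevalence_odds / (1.0 + prevalence_odds)


def prevalence_rare_disease(incidence_rate, mean_duration):
    """Rare-disease shortcut: PP is approximately I * D when PP is small."""
    return incidence_rate * mean_duration


# Hypothetical inputs: 0.005 cases per person-year, 4 years mean duration.
exact = prevalence_steady_state(0.005, 4.0)
approx = prevalence_rare_disease(0.005, 4.0)
print(round(exact, 4), round(approx, 4))
```

With a prevalence this low the two answers differ only in the fourth decimal place, which is why the shortcut is acceptable when PP is 0.05 or less.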
Between prevalence and incidence, which measure is more suitable for measuring the burden of disease?

Part 2
the prevalence ratio is the effect measure of interest when we are interested in the public health burden of disease, although in this situation the absolute prevalence and the prevalence difference are usually of more interest. However, when we are interested in disease etiology, the POR a) estimates the incidence rate ratio with fewer assumptions than are required for the prevalence ratio; b) can be estimated using the same methods as for the odds ratio in case-control studies, namely, the Mantel-Haenszel method and logistic regression; and c) provides practical, analytical, and theoretical consistency between analyses of a prevalence study and those of a prevalence case-control study based on the same study population. For these reasons, the POR will continue to be one of the standard methods for analyzing prevalence studies and prevalence case-control studies.

Effect measures in prevalence studies

Environmental Health Perspectives, July, 2004 by Neil Pearce
Between prevalence and incidence, which measure is more suitable for use in etiologic studies?
When studying the etiology of a disease, it is better to analyse incidence rather than prevalence, since prevalence mixes in the duration of a condition, rather than providing a pure measure of risk.

Etiologic research focuses on the factors that influence the change in status
How do you measure incidence?
1. Cumulative Incidence (risk, incidence proportion)
Probability of a disease free individual developing disease over a specified time period, conditional on not dying from another cause

When the outcome is death from any cause over a specified time period, it is an unconditional probability

Range 0 to 1, no dimensions

2. Incidence Density (Rate)

Instantaneous occurrence of new cases of disease at a point in time, per unit time, relative to the size of the population at risk

Range 0 to infinity, has dimensions

For either formula:
Number of new events in a defined population over a specified period of time/Population at risk for the event over the specified period of time
What are the five ways to measure cumulative incidence?
1. Directly using the Simple Cumulative Method
2. Estimated by using the Life Table approach (Actuarial)
3. Estimated by the Kaplan Meier Method (survival)
4. Approximated by the Incidence Density Method
5. Cox Regression (Proportional Hazards Model)
When do you use Simple Cumulative Method?
Data Requirement:
No losses to follow-up (for any reason including study termination or loss from competing outcomes)

Generally, period of risk is short - But this is not mandatory

Assumes a closed population - All individuals included in the calculation are followed up for the entire period of study
What is the Simple Cumulative Method Formula?
CI(t0, t) = I/N’0

I = # events (t0, t) ;
N’0 = # disease free at t0;
t0 = start of follow up;
t = end of follow up
What is a Life Table?
- A systematic record of a group’s mortality or morbidity experience,

- over the study period which is broken up into specific time intervals.
What are the basic steps of a Life Table?
- Probabilities of survival and of cumulative incidence (CI) are calculated for each interval
- Overall cumulative survival is calculated for the study period
- Lets you calculate CI over intervals
What is the Life Table Method used for?
To examine the distribution of time between 2 events
When is the Life Table Method used?
When there are losses to follow-up ( so the simple method will not work)

When there is an extended period of risk

When exact times of disease occurrence and withdrawal are unknown

When the rate of disease varies over the follow-up period

When you are interested in knowing interval-based risk
What is the Classic Life Table's formula to calculate the probability of an event over 1 interval, (x,m)?
mqx = mdx / (lx - 0.5 * mcx)

mqx = interval-based probability of the event

mdx = number of events occurring in that interval

lx = population at risk at the beginning of the interval; the denominator (lx - 0.5 * mcx) is called the effective number at risk

mcx = number of losses (censored observations) occurring in that interval

x= time at the beginning of the interval

m= time at the end of the interval, length of time
What is the formula for Probability of SURVIVAL over 1 interval (x,m)?
mpx = 1- mqx
(1 – probability of an event)
Probability of survival over >1 interval (x,m) is obtained by calculating the joint probability of no event =
mpx = product of the survival probabilities of each interval

For example,
3p0 = (1p0) (2p1) (3p2)
Probability of event over >1 interval (x,m):
mqx = 1 - product of the survival probabilities of each interval

For example,
3q0 = 1 - (1p0) (2p1) (3p2)
What is the first step of the Life table method?
Create a synthetic fixed cohort from the open cohort

Subject No in the y-axis
Follow-up Months in the X-axis
Examples:

What is the risk of dying in interval 2?
2q1 = 1 / (4 - (0.5)(1)) = 0.286

x=1
lx=4
mdx=1
mcx=1
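
The interval calculation above can be sketched in a few lines. This is an illustration of the classic (actuarial) formula, not a full life-table implementation; the function names and the two-interval list passed to `cumulative_incidence` are hypothetical.

```python
def interval_event_probability(events, at_risk_start, censored):
    """mqx = mdx / (lx - 0.5 * mcx): censored subjects count as half at risk."""
    effective_at_risk = at_risk_start - 0.5 * censored
    return events / effective_at_risk


def cumulative_incidence(intervals):
    """Chain interval survival probabilities: CI = 1 - product of (1 - mqx).

    intervals: list of (events, at_risk_start, censored) tuples in time order.
    """
    survival = 1.0
    for events, at_risk_start, censored in intervals:
        survival *= 1.0 - interval_event_probability(events, at_risk_start, censored)
    return 1.0 - survival


# The card's interval 2: lx = 4, mdx = 1, mcx = 1.
q_interval_2 = interval_event_probability(events=1, at_risk_start=4, censored=1)
print(round(q_interval_2, 3))  # 0.286

# Hypothetical two-interval cohort, for illustration only.
two_interval_risk = cumulative_incidence([(1, 10, 0), (1, 4, 1)])
print(round(two_interval_risk, 3))
```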
What are the characteristics of the Life Table Method?
Does not require complete follow-up, i.e. it handles censoring.

Does not require the exact times at which events and withdrawals occurred.

Follow-up period of interest must be split into successive intervals (these don’t have to be equal).
What are the assumptions of the Life Table Method?
The rate of disease among withdrawals (after they are lost) is the same as the rate for those who remain under observation

Independence between withdrawal and survival

Disease and withdrawals occur midway through the interval, on average

Participants survive at risk for entire study period, i.e. risk is conditional on survival

No secular trend in risk
How is Kaplan-Meier's method similar to the life table approach?
Handles loss to follow-up

Independence between withdrawal and survival

Lack of secular trends
How is Kaplan-Meier's method different from the life table approach?
No need to categorize follow-up time into intervals; intervals are based on the exact time when the event occurs

Risk is estimated for the follow-up time corresponding to each case occurrence

Primary distinction: no assumption about uniformity of withdrawals
Kaplan-Meier Method: Formulas

Conditional probability of event at time i
qi = di /ni

di = number of events occurring at time i

ni = number of individuals under observation at time i
Kaplan-Meier Method: Formulas

Conditional probability of survival at time i
pi = 1 – qi
Kaplan-Meier Method: Formulas

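Cumulative probability of survival at time i
Si = (p1) (p2) ... (pi), the product of the conditional survival probabilities at every event time up to and including i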
Kaplan-Meier Method: Formulas

Cumulative probability of event at time i
1 - Si
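
The four Kaplan-Meier formulas can be combined into one short routine. This is a minimal sketch, assuming the usual convention that subjects censored at an event time are still counted as under observation at that time; the function name and the five-subject data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(event_time, S_i)] using q_i = d_i / n_i and S_i = product of p_i.

    times:  follow-up time for each subject
    events: True if the subject had the event, False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        d_i = 0          # events at this time
        leaving = 0      # events plus censored subjects at this time
        while i < len(data) and data[i][0] == current_time:
            d_i += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if d_i > 0:
            survival *= 1.0 - d_i / n_at_risk   # p_i = 1 - q_i
            curve.append((current_time, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for five subjects; False marks a withdrawal.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
for t, s in curve:
    print(t, round(s, 2), round(1 - s, 2))  # time, S_i, cumulative risk
```

Plotting the second column against time gives the survival curve; the third column is its complement, the cumulative risk.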
Describe the Kaplan-Meier Method: Survival Curve
It is common to plot the cumulative survival (Si) or risk for the entire follow-up period – called a “survival curve”
Reflects the distribution of time to occurrence of all outcome events observed during the follow-up

Si in the y-axis
Time in the x-axis
What does the Kaplan-Meier curve show?
The complements of the Kaplan-Meier estimated cumulative survival probabilities show
the CUMULATIVE RISK of recurrence throughout the study period.
What does the incidence Density Method rely on?
Relies on the mathematical relationship between rate and risk in a time period where the RATE IS CONSTANT OVER THE TIME PERIOD.

It approximates cumulative incidence (risk) using incidence rates.
In the Incidence Density Method, what is the relationship between Risk and Rate?
The simplest situation (introduced in Epid 603), when the risk is low, say <.1, and the rate is constant over the time period, the relationship is simple:

CI (approx =) Incidence RATE x specified time period

In summary:
The population is decreasing each year as people die,

But the formula does not take this reduction into account,

therefore,

We have overestimated the death rate.

For a constant rate, the relationship is exponential, not linear.

Although the formula provides a risk that is the same throughout the follow-up period, there is actually an exponential reduction in the size of the cohort

If the 1-year risk is 10%, the 5-year risk is not 50%, (it’s something less)
What is another formula to use to calculate ID when time is short?
when the time is short enough that the rate is still constant over the time period:

CI(t0, t) = 1 - e^(-ID * delta)

ID = the incidence rate over the time period

delta = the elapsed time (age) between the t0 and time t
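
A short numerical check shows why the linear shortcut overstates risk over longer periods. This is a sketch only, assuming a hypothetical constant rate of 0.10 per person-year; the function name is illustrative.

```python
import math


def risk_from_rate(incidence_rate, elapsed_time):
    """CI(t0, t) = 1 - e^(-ID * delta), valid when the rate is constant."""
    return 1.0 - math.exp(-incidence_rate * elapsed_time)


rate = 0.10    # hypothetical: 0.10 events per person-year
years = 5

linear_risk = rate * years                       # rate x time shortcut
exponential_risk = risk_from_rate(rate, years)   # accounts for the shrinking cohort

print(round(linear_risk, 3), round(exponential_risk, 3))
```

The exponential form gives a 5-year risk of roughly 39%, noticeably below the 50% implied by multiplying the rate by the time period.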
What do we use when the rate is not constant over the entire period of interest?
the age/time stratified density method

CI(t0, t) = 1 - e^(-sum over j of (IDj * Delta j))

Delta j = the duration of the jth interval
IDj = Ij/PTj =the estimated ID for the jth interval
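
The stratified version sums rate times duration across intervals before exponentiating. The sketch below assumes two hypothetical 5-year age strata; the counts and person-time are invented for illustration.

```python
import math


def stratified_risk(strata):
    """CI(t0, t) = 1 - e^(-sum of IDj * Delta j), with IDj = Ij / PTj.

    strata: list of (cases, person_time, duration) tuples, one per interval.
    """
    cumulative_hazard = sum(
        (cases / person_time) * duration
        for cases, person_time, duration in strata
    )
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical strata: (incident cases, person-years, interval length in years).
age_strata = [(10, 1000, 5), (30, 1500, 5)]
ten_year_risk = stratified_risk(age_strata)
print(round(ten_year_risk, 4))
```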
What are the assumptions of the Incidence Density Method?
Each age-specific ID is constant over the entire age interval for which that rate is estimated

Each age-specific ID is constant over calendar time (no secular trends) during relevant time frame
Describe cumulative incidence
A measure of risk – the probability of an event in a specified time period/interval

Used in predicting risk, a measure of disease frequency

Method of calculation depends on :

Length of the underlying time period of “risk”

Features of the cohort
If you have a fixed cohort, with a short period of risk, with few losses to follow-up, which Cumulative Incidence method do you use?
Simple Cumulative

Fixed cohort: when the exposure groups in a cohort study are defined at the start of follow-up, with no movement of individuals between exposure groups, they are called fixed cohorts. If no losses occur from a fixed cohort, it satisfies the closed population definition. In this situation the unconditional risks and average survival times can be measured directly.
When you have a fixed or dynamic cohort, with a long period of risk, and losses to follow up,
which Cumulative Incidence method do you use?
Life Table :
Classical or Kaplan-Meier
When you have a dynamic cohort, with usually long period of risk, and rates are available, which Cumulative Incidence method do you use?
Incidence Density

Dynamic cohort - the terms open or dynamic population describe a population in which the person-time experience can accrue from a changing roster of individuals.

Closed cohorts add no new people over time and lose members only to death; open populations may gain members over time, or lose members who are still alive through emigration.
What is risk?
A risk (cumulative incidence) is a probability with no unit (although the element of time is implicit). Its range is 0 to 1.
What is rate?
A rate is a ratio where the denominator is usually person-time at risk and has a unit. The range is 0 - infinity.
What is the formula for incidence density?
Incidence Density (ID) = I / PT (in a specified population during a specified period of time)

I = # of incident cases occurring during the time period

PT =amount of person time experienced by the population at risk during follow-up
Incidence Density RATE does what?
Measures the force (velocity) of disease occurrence change

Instantaneous occurrence of new cases of disease at a point in time, per unit time, relative to the size of the population at risk.

The text makes a distinction between incidence density (denominator is based on individual data) and incidence rate (denominator based on aggregate data). In reality, these terms are used interchangeably.

Since we cannot measure the instantaneous rate, we estimate an average rate for a given period.
What are the assumptions for Incidence Density RATE?
One unit of person-time is equivalent to any other unit.

ID obscures any fluctuation in rates over time, ASSUMES A STEADY STATE of disease occurrence so it represents an AVERAGE RATE.

IF RATES FLUCTUATE OVER TIME, ID MAY VARY for the same person-time followed, based on the actual – calendar - time of follow-up
How do you calculate person time?
Depends on how much the investigator knows about individual follow-up times

1. If duration of follow-up (from entry until withdrawal) is known for each individual, ID can be calculated directly
2. If the duration of individual follow up is not known ID can not be calculated directly but it can be estimated.
If duration of follow-up (from entry until withdrawal) is known for each individual, how do you calculate person time?
PT = sum of (delta) ti

(delta) ti = duration of follow-up for individual i
If the duration of individual follow-up is not known and the average population is used as the denominator (aggregate data), how do you calculate person time?
(1) PT = N’((delta) t)

N’ = estimated size of population at risk (often census data or vital statistics data)
(delta)t = duration of observed follow-up

Must assume that the population is stable (constant size and age distribution)
Used to calculate rates from vital statistics data

(2) N’ (average population) = (initial population + final population) / 2, which is then multiplied by (delta)t to give PT

Used to determine the average population of a defined cohort
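
Both ways of building the person-time denominator can be sketched side by side. The follow-up times, case counts, and population sizes below are hypothetical, and the function names are illustrative only.

```python
def incidence_density_individual(cases, follow_up_times):
    """ID = I / PT, where PT is the sum of each individual's follow-up time."""
    person_time = sum(follow_up_times)
    return cases / person_time


def incidence_density_aggregate(cases, initial_pop, final_pop, duration):
    """ID = I / PT, where PT = average population x duration of follow-up."""
    average_pop = (initial_pop + final_pop) / 2
    return cases / (average_pop * duration)


# Hypothetical individual data: 5 people, 12 person-years in total, 3 cases.
id_individual = incidence_density_individual(3, [2.0, 3.5, 1.0, 4.0, 1.5])

# Hypothetical aggregate data: 50 cases over 2 years in a shrinking cohort.
id_aggregate = incidence_density_aggregate(50, 10_000, 9_000, 2)

print(round(id_individual, 3), round(id_aggregate, 5))
```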
Incidence Density Assumptions
Independence between censoring and survival

Lack of secular trends

ID individual data ~ ID aggregate data when withdrawals and events occur uniformly (more likely to occur with large sample and short intervals)

Risk of event is approximately constant over time during the interval; when this is not true, the follow-up can be divided into smaller intervals and ID calculated for each interval.
Total Population Vs. Population at Risk

When calculating all causes of mortality, what is the TP/PAR relationship?
TP=PAR

Incidence Density using TP in calculating person time
= true rate
Total Population Vs. Population at Risk

When the disease circumstance is low prevalence , what is the TP/PAR relationship?
TP approx = PAR

Incidence Density using TP in calculating person time approximates the true rate
Total Population Vs. Population at Risk

When the disease circumstance is high prevalence , what is the TP/PAR relationship?
TP>PAR

Incidence Density using TP in calculating person time is less than the true rate
Total Population Vs. Population at Risk

When the disease circumstance - Infectious Disease where Immunity is Common, what is the TP/PAR relationship?
TP>PAR

Incidence Density using TP in calculating person time is less than the true rate
What is the Odds?
Another measure of disease frequency

The ratio of the probability of the event / probability of no event

Can be calculated for incidence or prevalence
Odds ~ a proportion when the proportion (either CI or prevalence) is <0.1
Odds with incidence probabilities =
q / (1 – q)
Where q = probability of the event
Prevalence odds =
Prev / ( 1 – prevalence)
If the proportion of smokers in a population = 20% then, the odds of being a smoker is =
.20 / (1-.20) = 0.25

Another way to express the odds is 1:4 odds of being a smoker

Although there’s nothing wrong about expressing disease occurrence in terms of the odds, it is not often done in epidemiology
Describe the relationship between risk and odds
(Recall: odds ~ proportion when the proportion - either the CI or prevalence - is <0.1)

When risk = 0.80 --> odds = 4

When risk = 0.50 --> odds = 1.0

When risk = .20 --> odds = .25

When risk = .10 --> odds = .11

When risk = 0.01 --> odds = 0.0101
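
The risk-to-odds table above follows directly from odds = q / (1 - q). A minimal sketch that reproduces it (the function name is illustrative):

```python
def odds(probability):
    """Odds = probability of the event / probability of no event."""
    return probability / (1.0 - probability)


risks = (0.80, 0.50, 0.20, 0.10, 0.01)
table = [(risk, round(odds(risk), 4)) for risk in risks]
for risk, value in table:
    print(risk, value)
```

The gap between risk and odds shrinks as risk falls, which is the basis for treating odds as roughly equal to a proportion when the proportion is below 0.1.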
When do you calculate and/or report Cumulative Incidence (risk)
or Incidence Density (rate)
CI has no time unit (the time period is implicit and must be stated alongside it), whereas ID carries an element of time in its units (per unit of person-time).
What are the Measures of Association (Effect) for?
Used to determine whether an association exists between an outcome and a study factor

Reflects the strength of the STATISTICAL RELATIONSHIP between the study factor and the disease

Involves a DIRECT COMPARISON OF FREQUENCY MEASURES for different values or categories of the study factor

Involves a comparison group which is ARBITRARY AND SET BY THE INVESTIGATOR (usually considered the unexposed or least exposed)
How can we determine whether a GI illness was associated with food B, via a relative difference method?
calculate the RATIO of the attack rate between exposed and unexposed [ attack rate E / attack rate UE]
How can we determine whether a GI illness was associated with food B, via an absolute difference?
SUBTRACT the risk in the unexposed from the risk in the exposed
[ attack rate E - attack rate UE]
Types of Measures of Association Used in Analytic Epidemiologic Studies

Based on Relative Differences (Ratio Measures)
Cumulative Incidence Ratio ( CIR, Risk Ratio)

Incidence Density Ratio (IDR, Rate Ratio)

Odds Ratio (OR)
Types of Measures of Association Used in Analytic Epidemiologic Studies

Based on Absolute Differences
Attributable risk in the exposed (AR E, % AR E )

Population attributable risk (PAR, % PAR)

Mean differences (continuous outcomes)
Do absolute or relative differences search for causes?
Usually, relative differences via relative risk/rate or relative odds
Do absolute or relative differences search for determinants? Provide an example.
absolute difference - mean differences (continuous outcomes)
Do absolute or relative differences search for primary prevention impact; search for causes?
absolute difference - via attributable risk in exposed
Do absolute or relative differences search for primary prevention impact?
absolute difference - Population attributable risk
Do absolute or relative differences search for impact of intervention on recurrences, case fatality, etc.?
absolute difference - efficacy
What are the types of measures based on relative differences?
Ratio Measures (Relative Risk)

Cumulative Incidence Ratio (Risk Ratio)

Incidence Density Ratio (Rate Ratio)

Odds Ratio
When looking at Cohort Study with Count Data, ie. A cohort study with dichotomous exposure categories and with all subjects followed for a fixed period of time, for Cumulative Incidence Ratio (Risk Ratio), what are the assumptions?
The distribution of CIR, (0, +∞), is not symmetric and not normal.

A log transformation is usually applied to CIR.

If Ln(CIR) has an approximately normal distribution, the mathematical characteristics of this distribution can be used to construct a confidence interval.
What is the formula for Cumulative Incidence Ratio (Risk Ratio)
CIR = CIe/CIne =

[a/(a+b)] / [c/(c+d)]
When calculating CIR: Cohort study with count data, what are you doing?
A comparison of risk estimates
Generally calculated from cohort studies based on internal comparison of the cumulative incidence (risk) of the exposed and unexposed groups
What is the definition of cumulative incidence/cumulative incidence rate?
The number or proportion of a group (cohort) of people who experience the onset of a health-related event during a specified time interval; this interval is generally the same for all members of the group, but, as in lifetime incidence, it may vary from person to person without reference to age.
What is the definition of the cumulative incidence ratio?
the ratio of the cumulative incidence rate in the exposed to that in the unexposed.
How do you interpret the CIR?

Example:
What is the association between taking anti-malarial pills (E) and the development of malaria (D) among Peace Corps volunteers in Kenya, 1997? (1 year of follow-up, no losses, CI by the simple method)

CI(E) = 60/1000 = 0.06
CI (UE) = 10/1000 =.01
CIR = 0.06 / .01 = 6.0
Over the one year of the study, the CI of developing malaria among those who took their anti-malaria pills was 6.0 times the CI among those who didn't take their anti-malaria pills.
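
The malaria example can be reproduced from its 2x2 counts (a = 60, b = 940, c = 10, d = 990). The sketch below also adds a confidence interval built on the log scale, as the assumptions card describes; the specific large-sample standard error formula and the 1.96 multiplier are assumptions not stated in the cards.

```python
import math


def cumulative_incidence_ratio(a, b, c, d, z=1.96):
    """CIR = [a/(a+b)] / [c/(c+d)] with a log-scale confidence interval.

    a, b: exposed with / without disease; c, d: unexposed with / without disease.
    The standard error of ln(CIR) is the usual large-sample approximation
    (an assumption here, not taken from the cards).
    """
    cir = (a / (a + b)) / (c / (c + d))
    se_log_cir = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(cir) - z * se_log_cir)
    upper = math.exp(math.log(cir) + z * se_log_cir)
    return cir, lower, upper


cir, lower, upper = cumulative_incidence_ratio(60, 940, 10, 990)
print(round(cir, 2), round(lower, 2), round(upper, 2))
```

Because the interval is computed on ln(CIR) and then exponentiated, it is asymmetric around the point estimate, which matches the non-symmetric distribution of the CIR.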
When looking at Incidence Density Ratio (Rate Ratio)
in a Cohort Study with Person time Data, what are you looking at?
Ratio comparison of 2 average rates

Typically calculated from a cohort study drawn from a single defined population, either fixed or dynamic

An internal comparison of incidence densities (rates) of the exposed and unexposed groups
What is the formula for Incidence Density Ratio (Rate Ratio)
IDR = IDe/IDne = (Ae/Te)/(Ane/Tne)

Pop-time (t0, t)

Ae= Disease and Exposed
Ane= Diseased and unexposed
Te=Pop-time in exposed
Tne= Pop-time in unexposed
How do you interpret the Incidence Density Ratio (Rate Ratio)?

Example:
Maternal alcohol consumption (E) and the development of fetal alcohol syndrome (D) in babies born in Oslo, Norway, 1990-1993

IDR = (60/6000) / (30/15000) = 0.01 / 0.002 = 5.0
Mothers who consume alcohol during pregnancy are 5 times as likely to have a child with Fetal Alcohol Syndrome as those who do not consume alcohol during pregnancy.
What is the formula for Incidence Density Ratio (Rate Ratio), if we assume the rates are constant for the study period?
CI (0-3) = 1 - e^(-ID x 3)

For exposed: CI (0-3) = 1 - e^(-0.01 x 3) = 0.0296

Unexposed: CI (0-3) = 1 - e^(-0.002 x 3) = 0.006

Risk Ratio = 0.0296 / 0.006 = 4.93
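
The rate ratio and the derived risk ratio can be computed together from the Oslo example's figures (60 cases in 6000 units of person-time versus 30 in 15000). This is a sketch assuming constant rates over the 3-year period, as the card does; variable names are illustrative.

```python
import math

# Incidence densities from the example's counts and person-time.
id_exposed = 60 / 6000        # 0.01
id_unexposed = 30 / 15000     # 0.002
rate_ratio = id_exposed / id_unexposed

# Convert each constant rate to a 3-year risk: CI = 1 - e^(-ID x 3).
follow_up_years = 3
ci_exposed = 1 - math.exp(-id_exposed * follow_up_years)
ci_unexposed = 1 - math.exp(-id_unexposed * follow_up_years)
risk_ratio = ci_exposed / ci_unexposed

print(round(rate_ratio, 2), round(ci_exposed, 4), round(ci_unexposed, 4))
print(round(risk_ratio, 2))
```

Carrying full precision through the division gives a risk ratio of about 4.94; the card's 4.93 comes from dividing the already-rounded risks. Either way the risk ratio sits slightly below the rate ratio of 5.0.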
What is the Odds Ratio?
Calculated primarily from case-control studies, but also used for cohort studies (because it’s easy to calculate with logistic regression)

Same interpretation as CIR or IDR, with 1.0 being the null value

Can calculate either the exposure odds ratio or the disease odds ratio – these are mathematically equivalent.**

Some synonyms: probability relative odds, risk relative odds

OR is a valid measure of association, but is often used to approximate the CIR/IDR in case-control studies
We’ll cover this in greater detail in our lecture on case-control studies
What is the Ratio of the odds of developing disease?
odds of disease among the exposed / odds of disease among the non-exposed

OR disease = OR exposure
What is the OR exposure?
odds of exposure among diseased / odds of exposure among the non-diseased
How do you calculate the exposure OR?
odds of exposure in diseased / odds of exposure in non-diseased
= (a:c)/(b:d) = ad/bc
How do you calculate the disease OR?
(a:b)/(c:d) = ad/bc
OR example

Case control study of the association between aspirin consumption (E) and the development of a stomach ulcer (D)

What is the OR dis and OR exp?

How do you interpret this?
ORdis = (150/76)/(78/69) = 1.75
The odds of developing a stomach ulcer among those who consume aspirin are 1.75 times the odds among those who don't consume aspirin.

ORexp = (150/78) / (76/69) = 1.75
The odds of aspirin consumption among those with a stomach ulcer are 1.75 times the odds among those without a stomach ulcer.
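
The equivalence of the two odds ratios can be verified directly. This is a small Python sketch; the cell layout (a = exposed and diseased, b = exposed and non-diseased, c = unexposed and diseased, d = unexposed and non-diseased) is inferred from the arithmetic on this card, and the function names are illustrative.

```python
def disease_odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds of disease in the unexposed."""
    return (a / b) / (c / d)

def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure in the diseased divided by odds of exposure in the non-diseased."""
    return (a / c) / (b / d)

def cross_product_ratio(a, b, c, d):
    """The shortcut ad/bc."""
    return (a * d) / (b * c)

# Aspirin (E) and stomach ulcer (D) counts, as used on the card
a, b, c, d = 150, 76, 78, 69

or_dis = disease_odds_ratio(a, b, c, d)
or_exp = exposure_odds_ratio(a, b, c, d)
or_cross = cross_product_ratio(a, b, c, d)

print(f"ORdis = {or_dis:.2f}, ORexp = {or_exp:.2f}, ad/bc = {or_cross:.2f}")
```

All three expressions reduce to the same cross-product, which is why a case-control study (which can only observe exposure odds) still yields the disease odds ratio.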
Compare the formulas for disease incidence risk to probability odds of disease
Disease Incidence Risk:
q+ = a/(a+b)
q- = c/(c+d)

Probability Odds of Disease
q+ / (1 - q+) = a/b

q- / (1 - q-) = c/d
OR vs RR in a Cohort Study with Count Data

OR is a valid measure of association, but is often used to approximate the CIR/IDR in case-control studies – Why?
Because for many it’s easier to interpret

Because it is impossible to calculate the RR with certain designs (case-control)

It’s easy to adjust an OR for confounding and can be derived from modeling (logistic regression)

OR (event) is the exact reciprocal of the OR (nonevent)
What is the OR ~ RR: The General Rule?
OR is a good approximation of the CIR/IDR when disease is rare in the population

In a case-control study, if controls are selected to represent the total population (rather than just non-cases), then OR ~ CIR/IDR without regard for disease prevalence in the population. (Miettinen, 1976)

Under certain sampling schemes, OR ~ CIR/IDR more directly, without regard for disease prevalence
Why is the OR a valid measure of association that is often used to approximate the CIR/IDR in case-control studies?
Odds ratio (event) is the exact reciprocal of the OR (nonevent)

Example:
Assume E = Female, D = Dead
Alive F=308
Alive M = 142
Dead F = 154
Dead M = 709

OR = 0.1

The reciprocal would be OR (alive) = 1/0.1 = 10.0

(308/154) / (142/709) = 10.0
OR ~ CIR, cont’d
Odds Ratio is biased away from the null (in both directions)

Recall,

OR = [q+ / (1 - q+)] / [q- / (1 - q-)]

RR = q+ / q-

So, (1 - q-) / (1 - q+) defines the built-in bias between RR & OR

When disease is rare, this bias is negligible
Example of Built-in bias
Example (data from text tables 3-3 and 3-4):

OR = RR x ‘built in bias’

If, RR = 6.0 and CI UE = .0030 and CI E = .0180
OR = 6.0 x [(1-.0030) / (1-.0180)] = 6.09

If, RR = 6.0 and CI UE = .0705 and CI E = .2529
OR = 6.0 x [(1- .0705) / (1-.2529)] = 7.46

If, RR= 3.59 and CI UE = .0705 and CI E = .2529
OR = 3.59 x [(1- .0705) / (1-.2529)] = 4.46
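
The relation OR = RR x built-in bias can be sketched in Python as follows. The function name is illustrative; the two pairs of cumulative incidences are the ones quoted on this card, and the RR here is always computed from the risks themselves.

```python
def rr_bias_or(ci_exposed, ci_unexposed):
    """Return (RR, built-in bias factor, OR) for two cumulative incidences."""
    rr = ci_exposed / ci_unexposed
    bias = (1 - ci_unexposed) / (1 - ci_exposed)
    return rr, bias, rr * bias

# (CI exposed, CI unexposed): a rare outcome, then a common outcome
for ci_e, ci_ue in [(0.0180, 0.0030), (0.2529, 0.0705)]:
    rr, bias, odds_ratio = rr_bias_or(ci_e, ci_ue)
    print(f"CI E = {ci_e}, CI UE = {ci_ue}: RR = {rr:.2f}, bias = {bias:.3f}, OR = {odds_ratio:.2f}")
```

With the rare outcome the bias factor is close to 1 and the OR barely exceeds the RR; with the common outcome the same formula pushes the OR well away from the RR.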
Both compare the likelihood of an event between two groups.
Example:
Who’s more likely to die, men or women?

Would you use OR or RR and are they the same?
Alive F=308
Alive M = 142
Dead F = 154
Dead M = 709

Importance of structuring data in a 2x2 table

Assume: E = Female, D = Dead
RR = (154/462) / (709/851) = 0.4

Assume E = Male, D = Dead
RR = (709/851) / (154/462) = 2.5

Assume E = Female, D = Alive
RR = (308/462) / (142/851) = 3.99

OR = RR??
Assume E = Male, D = Dead - OR = 10.0

Assume E = Dead, D = Male - OR = 10.0
Recall, OR exp = OR dis
Assume E = Female, D = Dead - OR = 0.1
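
The point of this card, that the OR is symmetric under relabeling while the RR is not, can be checked with a short Python sketch. The counts are the ones given above; the helper names are illustrative.

```python
def risk_ratio(a, b, c, d):
    """RR for a 2x2 table: a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR for the same 2x2 layout (cross-product ratio)."""
    return (a * d) / (b * c)

dead_f, alive_f = 154, 308
dead_m, alive_m = 709, 142

# E = Female, outcome = Dead
rr_dead = risk_ratio(dead_f, alive_f, dead_m, alive_m)
or_dead = odds_ratio(dead_f, alive_f, dead_m, alive_m)

# E = Female, outcome = Alive (the non-event)
rr_alive = risk_ratio(alive_f, dead_f, alive_m, dead_m)
or_alive = odds_ratio(alive_f, dead_f, alive_m, dead_m)

print(f"Dead:  RR = {rr_dead:.2f}, OR = {or_dead:.2f}")
print(f"Alive: RR = {rr_alive:.2f}, OR = {or_alive:.2f}")
print(f"OR(dead) x OR(alive) = {or_dead * or_alive:.2f}")
print(f"RR(dead) x RR(alive) = {rr_dead * rr_alive:.2f}")
```

The two ORs multiply to exactly 1, so switching event and non-event just inverts the OR. The two RRs do not, which is why the way the 2x2 table is set up matters for the RR.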
What are the Strengths of Ratio Measures?
All ratio measures have the same reference point, a null value of 1.0 (no association)

Comparisons across studies of different designs can be made, because
OR ~ CIR/IDR
IDR ~ CIR

The strength of the association between a study factor and outcome is one element (an important one) used to assess causality
What are the Limitations of Ratio Measures?
May be deceptive in addressing the impact of the risk factor in assessing an individual’s risk due to an exposure

Example: Risk of lung cancer and CHD for smokers and non-smokers

A)
Lung Cancer & Smoker 0.06
Lung Cancer & Nonsmoker 0.01
CHD & Smoker 0.4
CHD & Nonsmoker 0.2

B)
Lung Cancer & CIR smoking 6.0
Lung Cancer & Excess Risk smoking 0.05
CHD & CIR smoking 2.0
CHD & Excess Risk smoking 0.2

Note: CIR for Lung Cancer considerably higher than that for CHD, but smoking has a greater absolute impact on CHD as demonstrated by the excess risk measure.
What are the Measures of Association: Absolute Differences (Difference Measures, Attributable Risk)?
These are measures of association between an exposure and outcome based on the absolute difference between two risk/rate estimates.

Difference measures are calculated by subtracting the frequency estimates of the reference group from the comparable estimate of the exposure group.
Measures of Absolute Difference

What are the Types of difference measures?
Attributable Risk (risk difference, excess fraction, etiologic fraction)
Attributable Risk in Exposed ( called Incidence Density Difference if using rates)

Percent Attributable Risk in the Exposed

Levin’s Population Attributable Risk
What is the Attributable Risk in the Exposed?
Difference between risk estimates of different exposure levels and a reference level

The excess, above background, associated with the exposure under study

Theoretically, the absolute excess incidence that would be prevented by eliminating the exposure.

ARexp has the same unit as the incidence measure (dimensionless if CI; time^-1 if ID).
What is the formula for Attributable Risk in the Exposed?
ARe =CIe - CIne

or

IDD = IDe - IDne
ARexp has the same unit as the incidence measure (dimensionless if CI; time^-1 if ID).
Attributable Risk in the Exposed, Example

How do you interpret ARexp?
Excess risk of taking anti-malarial pills attributed to the development of malaria among Peace Corps volunteers in Kenya, 1997

ARexp = (60/1000) – (10/1000) = 0.05

5% excess risk of developing malaria among those who take anti-malarial medication versus those who do not take anti-malarial medication.
What is the Percent Attributable Risk in the Exposed?
The percent of the total risk in the exposed group attributable to the exposure, IF CAUSALITY HAS BEEN ESTABLISHED.
What are the formulas for Percent Attributable Risk in the Exposed?
Formulas:
%ARexp = ((CIe - CIne) / CIe) * 100
%ARexp = ((CIR* – 1.0)/CIR*) * 100
*Can use IDR or OR

%ARexp = ((IDe - IDne)/IDe) *100
What is Percent ARexp?
%ARexp is equivalent to percent efficacy when assessing an intervention such as a vaccine.
The group receiving the intervention is considered “nonexposed”, which has a lower incidence of the disease that is targeted by the vaccine.
Percent Attributable Risk in the Exposed, Example

What is the percent of the total risk of developing malaria among those who take anti-malarial medication attributable to the medication? (Peace Corps Volunteers, 1997)

%ARe = [((60/1000) - (10/1000)) / (60/1000)] x 100

%ARe = 83.3%

How do you interpret this?
83.3% of the total risk in the group taking anti-malarial medication is attributable to taking the medication, conditional on the establishment of causality.
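
The difference measures for the malaria example can be reproduced with a few lines of Python. This is a sketch with illustrative function names; the risks 60/1000 and 10/1000 are the ones used on the cards above.

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """ARexp: absolute excess risk in the exposed."""
    return ci_exposed - ci_unexposed

def percent_attributable_risk(ci_exposed, ci_unexposed):
    """%ARexp: share of the exposed group's risk that is in excess of background."""
    return (ci_exposed - ci_unexposed) / ci_exposed * 100

def percent_ar_from_ratio(ratio):
    """%ARexp computed from a ratio measure (CIR, IDR or OR)."""
    return (ratio - 1.0) / ratio * 100

ci_e, ci_ne = 60 / 1000, 10 / 1000

ar = attributable_risk(ci_e, ci_ne)
pct_ar = percent_attributable_risk(ci_e, ci_ne)
pct_ar_ratio = percent_ar_from_ratio(ci_e / ci_ne)

print(f"ARexp = {ar:.2f}")
print(f"%ARexp = {pct_ar:.1f}% (from risks), {pct_ar_ratio:.1f}% (from CIR)")
```

Both versions of the %ARexp formula give the same answer, since (CIe - CIne)/CIe is algebraically equal to (CIR - 1)/CIR.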
What is Levin’s Population Attributable Risk%?
The excess risk in the total population attributable to the exposure if a causal relationship is assumed.

Population attributable risk is a function of 2 parameters, prevalence of exposure in the population & the magnitude of the increase in incidence associated with the exposure.

It is population specific, therefore, frequency of exposure is taken into account

Cannot project the results from one population to another if the frequency of exposure differs between the populations
What are the formulas for Levin’s Population Attributable Risk%?
Pop AR% = ((CIpop - CIne)/CIpop) x 100

PopAR% = [pe(CIR* - 1) / (pe(CIR* - 1) + 1)] x 100

pe: prevalence of exposure in the population

* Can use IDR or OR
When does CIpop ~ CIe?
When pe is high

Therefore, Pop AR% ~ %ARexp when pe is high
When does CIpop ~ CIne?
When pe is low
% PAR: Example
Given the following data, what is percent of the total risk of CHD in the population attributable to being a current smoker?

RISK of CHDexp = 0.321
RISK of CHDnonexp = 0.162
RISK of CHD Total Pop = 0.197

How do you interpret this?
%PAR = ((.197 - 0.162) / .197) x 100 = 17.8%

17.8% of the total risk of CHD in the population is attributable to being a current smoker.
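
Both Pop AR% formulas can be checked against this example. This is a Python sketch under one loudly stated assumption: the card does not give the prevalence of exposure, so `pe` is back-calculated here from CIpop = pe x CIe + (1 - pe) x CIne. Function names are illustrative.

```python
def par_percent_from_risks(ci_pop, ci_unexposed):
    """Pop AR% = ((CIpop - CIne) / CIpop) x 100."""
    return (ci_pop - ci_unexposed) / ci_pop * 100

def par_percent_levin(pe, ratio):
    """Levin's formula: pe(ratio - 1) / (pe(ratio - 1) + 1) x 100."""
    excess = pe * (ratio - 1.0)
    return excess / (excess + 1.0) * 100

ci_e, ci_ne, ci_pop = 0.321, 0.162, 0.197

# Assumption: pe is not on the card; it is derived from the three risks.
pe = (ci_pop - ci_ne) / (ci_e - ci_ne)
cir = ci_e / ci_ne

print(f"Implied prevalence of smoking pe = {pe:.3f}")
print(f"Pop AR% from risks = {par_percent_from_risks(ci_pop, ci_ne):.1f}%")
print(f"Pop AR% from Levin = {par_percent_levin(pe, cir):.1f}%")
```

The two formulas agree because the population risk is just the exposure-weighted average of the two group risks.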
What are the General Uses of Difference Measures?
Reflect the excess number of cases attributable to the exposure

Reflect the expected effect of changing the distribution of one or more risk factors in a particular population

Useful from a public health perspective – when attributable risk is high, the risk factor is of importance to the health of the community
Which measure of association should you calculate from a study?

Ratio Versus Difference Measures

Example: British Physician’s Cohort Study, Death rate per 100,000 person years
LC=Lung cancer

LC/Smoker = 166
LC/NSmo = 7
CHD/Smo = 599
CHD/NonSmo = 422

vs

LC/IDR = 23.7
LC/Excess Rate = 159
LC %ARexp = 96%
CHD/IDR = 1.4
CHD/ExRate = 177
CHD %ARexp = 29%
The impact of smoking in terms of absolute excess rate is about the same for both Lung Cancer and CHD
From a public health perspective on prevention, although the IDR for lung cancer > IDR for CHD, the absolute excess is the same
Relative versus Absolute Differences
Relative differences are used most often when evaluating the DETERMINANTS of disease because they represent the magnitude of the association. This information is critical in the determination of causality.

ONCE CAUSALITY IS ASSUMED, absolute differences are more important measures from the perspective of public health administration and policy.
Incidence of Diabetes Among Dependents of the U.S. Military Forces Admitted to U.S. Army Treatment Facilities, 1971-1991

Objective
To determine the national incidence of diabetes in children by studying a group representing all parts of the country: the dependent children of U.S. Military personnel.

Research Design and Methods
Dates of admission, diagnosis of diabetes, age, and gender were collected for all 522,326 children, age 21 or younger, of active duty military personnel admitted to US Army treatment facilities during fiscal years 1971-1991. Incidence rates were expressed as cases per 100,000 person-years.

Results -
A total of 2308 cases of diabetes were diagnosed in 14.3 million person-years of follow-up over the 21 years. The overall incidence rate of diabetes in this population is 16.2 (95% CI 15.5 - 16.91). For 1987-1991, the age-specific rates were 8.1 (0-4 years), 15.9 (5-9 years), 25.6 (10-14 years), 23.9 (15-19 years), and 23.4 (20-21 years).

What type of study design?
What is the descriptive group?
What was their measure of frequency?
Is this study generalizable?
What are the 2 key problems with this study?
What type of study? Cohort (longitudinal study)
Descriptive group: single-group cohort
Calculated incidence
Possibly limited generalizability
No case definition (DMI or DMII)
Underestimate of DMI, because parents with a history of DMI are probably not in the military
What is the Epidemiologic definition of “cohort”?
Cohort: A group of individuals that share a common characteristic

Example:
Birth cohort: individuals born in the same period (often year)

Exposure cohort: individuals sharing a common exposure (often an occupational exposure such as asbestos, etc.)

The observation of a cohort(s) over time to measure a stated outcome.
What are the two primary purposes of cohort studies?
Descriptive (measures of frequency)
To describe the incidence of an outcome over time or to describe the natural history of disease
Analytic (measures of association)

To analyze the relation between occurrence of outcomes and risk factors (or predictive factors)
Cohort Studies: distinguishing features

What is the Directionality?
Forward, incident cases

Subjects are recruited while healthy but at risk for the disease of concern.
They are then classified by exposure status and followed forward to measure disease status.
When do you do prospective, longitudinal cohort studies?
sufficient EVIDENCE has accumulated from other studies to indicate a prospective cohort study is warranted

a NEW AGENT is introduced and requires monitoring for its possible association with adverse health outcomes (Levaquin, Vioxx)

TEMPORAL association is unclear from a case-control study

impressive results are obtained from a c-c study (either positive or negative) and issues of VALIDITY (selection or information bias) are evident or are a concern
Is Mold Exposure a Risk Factor for Asthma?

What is wrong with this statement:
A remarkably consistent association between home dampness and respiratory symptoms and asthma has been observed in a large number of studies conducted across many geographic regions.
In a recent review of 61 studies, it was concluded that dampness was a significant risk factor for airway effects such as cough, wheeze, and asthma, with odds ratios ranging from 1.4 to 2.2.
Positive associations have been shown in infants, children, and adults, and some evidence for dose-response relations has also been demonstrated.
Although it has been concluded that the evidence for a causal association between dampness and respiratory morbidity is strong, this evidence is based mainly on cross-sectional studies and

prevalence case-control studies;

few prospective studies have been conducted
Major Concerns for Cohort studies?
Selecting a cohort (sampling frame, sampling and recruitment, external vs internal validity)

Exposure assessment

Follow-up

Considered gold standard in observational studies
Less prone to information bias because they do not rely on recall
Randomized controlled trials are the overall gold standard, due to the randomization.
For Population Based Cohort Studies, how do you select the cohort?
Any well-defined population (geographic, occupational, membership in HMOs)

Typically evaluate multiple hypotheses

Primary Justification : external validity

Tend to be very large (geographic, occupational, HMO membership), choosing people from the population to answer a specific question
Numbers needed depend on the prevalence (rare vs. common)
What are the advantages of Population Based Cohort Studies?
Advantages
Estimation of distribution and prevalence of relevant exposure variables

Calculation of risk factor trends over time
Strong internal validity

Strong external validity (primary justification)
What are the disadvantages of Population Based Cohort Studies?
Disadvantages
Expensive, logistically complicated

Often associated with large loss to follow-up

If exposure is rare, inefficient
Selecting the cohort, cont.

What is a sampling frame?
Total population
Probability samples of the population


The extent to which a cohort sample is representative of the total reference population depends on the completeness of the population frame available to sample from as well as participation rates.
Please refer to the Delnevo et al. article as an example.

Similar to the concept of the source population
Selecting the cohort, cont.
Population Based Cohort Studies :

What are the External Validity Issues and Considerations ?
Depends on eligibility criteria for inclusion

Initial response of the sample

Stability of the cohort on follow-up.

Requires variability of exposure and outcome levels

Susceptibility of the population to the risk factor
Selecting the cohort, cont.

What are the key points on Population Subgroups: special/exposed groups?
May ensure the cohort has exposure of interest

Less likely to be lost to follow-up because of lower mobility (military, occupational cohort)

May have relevant information on exposure & potential confounders in the medical and employment records

Reduced ability to generalize results to the general population

Access to data may be limited

Generally, smaller sample size needed
What are the key issues when Selecting the Cohort: The non-exposed?
EXTERNAL COMPARISON GROUPS: chosen from a different source population
often general population controls from area are used

must be susceptible to the same selective factors as the E group

less costly if data already available

sometimes called SMR studies
What are the Levels of subject selection in selecting a cohort?
Target Population: population to which results can be applied

Source Population: the population, defined in general terms and enumerated if possible, from which eligible subjects are drawn

Eligible Population: the population of subjects eligible to participate

Study Participants: those people who contribute data to the study
In which direction do you choose subjects?
Downward
In which direction do you apply the results?
upward
What are the important issues with cohort exposure?
Definition of exposure: intensity, duration

Induction period

Latency

Changing exposures

Allocation of person-time in the above examples

Categorizing exposure
Cohort Exposure: important issues

What is the Induction period?
Duration of time that it takes from exposure to onset of disease

Time during which exposure occurs ≠ time at risk

Example: radiation exposure and leukemia, ~3.5 years.

Exposure is classified as high, medium and low based on the amount of time working in a job where you are exposed to radiation ≠ PT at risk.

What do you do with the person-time that accrues prior to the induction time?

Only include time at risk of the outcome in the denominator of the rate.
Cohort Exposure: important issues

What is the latency period?
Duration of time from disease initiation to disease detection.

Very relevant when considering covariates such as detection bias, etc
Cohort Exposure: important issues

What is changing exposures?
Changing exposures:
Calculation of rates (as opposed to risks) allows flexibility in the analysis of cohort data.

An individual can contribute follow-up to several different exposure-specific rates, depending on details of the study hypothesis

The definition of the exposure group corresponds to the definition of PT eligible for each level of exposure.

How to handle changing exposures also depends on crossover effects. If the effect of being exposed is cumulative, you cannot change exposure groups when exposure ceases.
Cohort Exposure: important issues

What is categorizing exposures?
Categorizing exposures:
Often there is no guidance on appropriate categories of exposure.

Or the line between exposed and unexposed is not defined

No problem if your data are continuous

If you want to calculate rates directly, you must observe populations within categories of exposure.

Common error: apportioning to exposed person-time (PT) the unexposed time of a person who eventually becomes exposed.

Example: an occupational study where exposure is categorized according to duration of employment in a particular job. The highest exposure category is at least 20 years of employment. If a worker has been employed 30 years, it is a mistake to assign all of that employee's person-time to the 20+ years category, because they only reached that exposure level in the last 10 years. That is the only time frame relevant to 20+ years of exposure.
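
The allocation rule in the occupational example can be sketched in Python. This is a minimal illustration, not a full person-time routine: the function name and the cutpoint lists are assumptions, and it ignores induction periods and time at risk after leaving the job.

```python
def allocate_person_time(years_employed, cutpoints):
    """Split one worker's person-years across duration-of-employment categories.

    `cutpoints` holds the lower bound of each category in ascending order,
    e.g. [0, 20] means the categories 0-<20 years and 20+ years.
    """
    allocation = {}
    for i, lower in enumerate(cutpoints):
        upper = cutpoints[i + 1] if i + 1 < len(cutpoints) else float("inf")
        # Time spent while cumulative employment was inside [lower, upper)
        time_in_category = max(0.0, float(min(years_employed, upper) - lower))
        label = f"{lower}+" if upper == float("inf") else f"{lower}-<{upper}"
        allocation[label] = time_in_category
    return allocation

# The 30-year employee from the card: 20 person-years below the threshold,
# and only the last 10 person-years in the 20+ category.
print(allocate_person_time(30, [0, 20]))
```

Assigning all 30 years to the 20+ category would inflate that category's denominator with person-time during which the worker could not yet have had 20 years of exposure.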
Cohort Exposure: important issues

What is the measurement issue?
Is Mold Exposure a Risk Factor for Asthma?:

It is not clear whether molds are merely markers of dampness or are causally related to the symptoms associated with dampness.

Assessment of exposure to molds is often done by questionnaire.

It is unknown to what extent questionnaire reports of mold growth correlate with exposure to relevant mold components.
Cohort Exposure: important issues

What is the measurement issue continued...
Is Mold Exposure a Risk Factor for Asthma?:
If mold exposure is quantified –

Perhaps the most important problem, one that has rarely been acknowledged in the studies published to date, is that air sampling for mold for more than 15 minutes is often impossible, and air concentrations usually vary a great deal over time.

The few studies that have included repeated exposure measurements of mold have shown considerable temporal variation in concentrations, even over very short periods of time.

Variability in isolated genera was even more substantial.
What are the issues with cohort f/u?
Systematic, standardized data collection procedures on all or a sample of cohort members regardless of exposure status to avoid surveillance bias.

Data collection relies on primary and secondary collection procedures
- linking of established databases: professional, government, etc.

Controlling loss to follow-up is key.
- baseline recruitment strategy; key identifiers for searching, adequate info to analyze differences in participation rates.
Downside - longer questionnaire, interview may inhibit recruitment
- mail, telephone survey, - both, monetary incentives, etc.
What are the three types of cohort studies?
Concurrent
Retrospective
Mixed Design
What are the advantages of Cohort studies?
Advantages:

Study many different outcomes with exposure of interest

Temporal relation not in doubt, therefore, preferred for causal inference

Less prone to selection bias as D status does not influence selection of subjects with respect to exposure

Repeated exposure data can be collected

Provides information on changing risks over time

Modification of risks by increasing age
What are the disadvantages of Cohort studies?
Disadvantages:

Costly, resource intensive (manpower, time and money)

Loss to follow-up of study subjects

Inefficient for studying diseases with long latency

Inefficient for studying rare diseases
- Example: if ID = 5 per 100,000 per year and sample size calculations say you need 100 cases, then with an initial cohort of 40,000 subjects, follow-up would have to continue for 50 years

“Study effects”

Changing exposure

Withdrawals/loss to follow-up

Basic design allows only 1 risk factor to be studied
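
The rare-disease example in the list above is simple arithmetic, sketched here in Python. The function name is illustrative, and the calculation assumes a constant rate and a cohort whose size stays roughly constant (no losses, negligible depletion by cases).

```python
def years_of_follow_up(cases_needed, incidence_density, cohort_size):
    """Years needed to accrue `cases_needed` cases at a constant rate in a cohort of fixed size."""
    expected_cases_per_year = incidence_density * cohort_size
    return cases_needed / expected_cases_per_year

# ID = 5 per 100,000 per year, 100 cases needed, cohort of 40,000
years = years_of_follow_up(100, 5 / 100_000, 40_000)
print(f"Expected cases per year: {5 / 100_000 * 40_000:.1f}")
print(f"Years of follow-up needed: {years:.0f}")
```

At about 2 expected cases per year, reaching 100 cases takes about 50 years, which is the inefficiency the card describes.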
What is a case-control study?
An alternative to the cohort study for assessing exposure-disease relationships

Subjects chosen based on disease status and assessed for previous exposure

Exposure data may be measured at the time of the study or gathered from existing data

Analysis by the odds ratio (as an estimate of the relative risk)

Particularly susceptible to certain types of bias which dictates design characteristics

But optimizes speed and efficiency.
What are the Primary design issues?
selection of cases and controls
collection of accurate exposure data

control of extraneous factors
Rationale for Case-Control Study
Used to answer the same research question as in cohort studies:
Is the rate/risk of disease among the exposed different than that among the non-exposed? If yes, in what direction and by how much?

Used as AN EFFICIENT VERSION of a cohort study

Used to estimate the IDR/CIR with the OR
How is a c/c study efficient?
Only need a percentage of the controls, but the percentage needs to be the same in the E and nE (Sampling Fraction)
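
Why the sampling fraction must be equal in the exposed and non-exposed can be shown with a small Python sketch. The cohort counts and the fractions below are hypothetical, chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR: a, b = exposed cases/controls; c, d = unexposed cases/controls."""
    return (a * d) / (b * c)

# Hypothetical full cohort (illustrative counts, not from the source)
cases_e, cases_ne = 60, 30
noncases_e, noncases_ne = 5940, 14970

full_or = odds_ratio(cases_e, noncases_e, cases_ne, noncases_ne)

# Same sampling fraction (10%) of non-cases in both exposure groups
equal_or = odds_ratio(cases_e, noncases_e * 0.10, cases_ne, noncases_ne * 0.10)

# Unequal fractions: 10% of exposed non-cases but only 5% of unexposed non-cases
unequal_or = odds_ratio(cases_e, noncases_e * 0.10, cases_ne, noncases_ne * 0.05)

print(f"Full cohort OR     = {full_or:.2f}")
print(f"Equal fractions OR = {equal_or:.2f}")
print(f"Unequal fractions  = {unequal_or:.2f}")
```

With equal fractions the sampling fraction cancels out of ad/bc, so the OR from a tenth of the controls matches the full-cohort OR. With unequal fractions the ratio of the fractions multiplies straight into the OR.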
For valid case-control studies…
Cases must be representative of all cases in the source population – the same ones who would be considered cases if a cohort study was done.

Controls selected so that their exposure distribution reflects the exposure distribution among the person time in the source population,
i.e. the same “source cohort (population)” as the cases.

Both cases and controls must be selected independent of exposure status
What is a Source Population?
The Source Population is:

The source of subjects for a particular study
Defined by the participant selection methods of your study.
Selection of Cases
Clearly define the source population

Establish strict diagnostic criteria for case definition, independent of exposure (cases really cases)

Either incident or prevalent cases, but incident are ideal

Can be selected cross-sectionally (at a point in time) or longitudinally – longitudinally is a better choice

Can use all cases within the population or a sample of the population
Selection of Controls
Without a well defined source population, it is difficult or impossible to select unbiased controls.

Is critical and can be difficult

Controls must come from the same source population that gives rise to the cases

Controls must have the same exposure distribution as in the source of the cases

Chosen independent of exposure status, i.e. the same sampling rate for exposed and unexposed controls

If sample size is large enough, problems due to sampling error are avoided
What is the goal of selecting cases and controls?
Goal is to choose cases and controls so that their proportion with the risk factor (E) in the study does not vary much more than sampling error from the source population.
What are the Sampling Strategies to Select Controls?
When selecting the controls we want to minimize selection bias and maximize the potential for the OR ~ the RR

If a disease is rare, all sampling strategies will give the same result (OR ~ IDR/CIR)

If disease is common, different sampling strategies will give different results
What are the Types of Sampling strategies?
Types of Sampling strategies:
Traditional (cumulative) sampling -Case-based case-control study (traditional):

Density Sampling - Case-control study within a cohort (hybrid, ambidirectional):
Case-based case-control study (traditional)?
Case-based case-control study (traditional):
cases and controls are selected at a given point in time from a hypothetical cohort (i.e. at the end of follow-up).
What are the two types of case-control study within a cohort (hybrid, ambidirectional)?
Case cohort study:

Nested case-control study:
Nested case-control study: controls selected at time when each case occurs (incidence density sampling)?
Nested case-control study: controls selected at time when each case occurs (incidence density sampling).
Case cohort study: controls are selected from the baseline cohort?
Case cohort study: controls are selected from the baseline cohort.
Describe Case-Based (Traditional): Cumulative Sampling
Typically, cases identified as diagnosed during study period from a stated source population

Controls (non-cases) identified from the same source population from among the non-cases at the end of the study period (cumulative sampling).

Exposure to the risk factor of interest is measured/gathered

OR is calculated as an estimate of the IDR/CIR

Selecting controls from those disease-free at the end of the observation period during which cases are identified.

Primarily used only when the disease is rare, otherwise OR doesn’t estimate the IDR/CIR

Controls selected with this method do not represent the source population from which cases come; they represent only non-cases (although they do still come from the same source population).
Issues with Case-Based (Traditional): Cumulative Sampling
Selection bias may occur when cases and non-cases are not selected from the same source population, or populations with similar relevant characteristics.

Selection bias may occur if loss to follow-up that happens before the study groups are selected affects their comparability.
What is the Bias in a case-based case-control study with a cross-sectional ascertainment?
only cases with long survival are included.
Selection bias in a case-based case-control study ?
A cross-sectional ascertainment identifies primarily prevalent cases, that is, those with the longest survival. Cases and controls who die before they can be included in the study may have a different exposure experience compared with the rest of the source population.

It is preferable to ascertain cases concurrently, i.e. to identify and obtain exposure information from cases as soon as possible after diagnosis. Same rules apply to controls.
Case-control Studies within a Cohort?
Controls may be selected from the baseline cohort, i.e. “case-cohort” design.

Controls may be selected from individuals at risk at time each case occurs, i.e. “nested case-control” design.

Likelihood of selection bias diminished with either approach compared to case-based study design.
Case-control studies within a defined cohort: Case Cohort?
C-C study conducted within the framework of existing, defined cohort, which becomes the SOURCE POPULATION

Cases are selected from the cohort (all or a sample) as they develop

Controls are selected by random sample of the total cohort at BASELINE

Controls have potential to become a case

OR ~ CIR, no rarity assumption needed

Selection bias is reduced due to control selection within the source population
Studies within a defined cohort: Nested Case-Control?
Within framework of existing, defined cohort, the source population

Controls are a random sample of the cohort (non-cases) at the time the case occurs

Called INCIDENCE DENSITY SAMPLING OR RISK SET SAMPLING

Matching on duration of follow-up

Controls have the potential to become a case

OR ~ IDR, no rarity assumption needed
Incidence Density Sampling?
When a case occurs, a control (non-case) is selected (controls selected longitudinally)

“Matches” control to case based on time

Controls have the potential to later become a case

Ensures that controls represent the source population from which cases come

Rare disease assumption not needed, OR ~ IDR/CIR for both common and rare diseases using this strategy
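
Incidence density (risk-set) sampling can be sketched in Python as follows. This is a toy illustration under stated assumptions: everyone enters follow-up at time 0, each person has a single `exit_time`, and the record layout, function name and cohort data are hypothetical.

```python
import random

def incidence_density_sample(cohort, controls_per_case=1, seed=0):
    """For each case, draw controls from those still at risk at the case's onset time.

    `cohort` is a list of dicts with 'id', 'exit_time' and 'is_case'
    (True if the exit is disease onset, False if censoring).
    """
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p["is_case"]), key=lambda p: p["exit_time"])
    matched_sets = []
    for case in cases:
        # Risk set: still under follow-up and disease-free when this case occurs.
        # It may include people who become cases later on.
        risk_set = [p for p in cohort
                    if p["id"] != case["id"] and p["exit_time"] > case["exit_time"]]
        k = min(controls_per_case, len(risk_set))
        matched_sets.append((case, rng.sample(risk_set, k)))
    return matched_sets

# Hypothetical cohort: times in years since baseline
cohort = [
    {"id": 1, "exit_time": 2.0, "is_case": True},
    {"id": 2, "exit_time": 5.0, "is_case": True},
    {"id": 3, "exit_time": 3.0, "is_case": False},
    {"id": 4, "exit_time": 8.0, "is_case": False},
    {"id": 5, "exit_time": 1.0, "is_case": False},
]

for case, controls in incidence_density_sample(cohort):
    print(f"case {case['id']} at t={case['exit_time']}: controls {[c['id'] for c in controls]}")
```

Note that person 2, who becomes a case at year 5, is eligible as a control for the case at year 2. That is the sense in which controls have the potential to later become a case.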
EXAMPLES of Nested Case-control study
Example:
Levels of Maternal Serum AFP in Pregnant Women and Subsequent Breast Cancer Risk (AJE 1998;148:719-727)
Univ. of Ca. Berkeley Child Health & Development Studies (CHDS)
1959-1994
Cohort of 12,552 pregnant women
Follow-up conducted by using license records from the department of motor vehicles, and review of death certificates
Nested design
Cases: women in the CHDS cohort who developed breast cancer, identified through the California Cancer Registry
Controls were members of the cohort who had not been diagnosed at that point in time with breast cancer
Exposure assessment: Frozen sera accrued between 1959-1966
Data analysis: logistic regression
What are the Advantages of Case-Control Studies within a Cohort?
The estimated exposure odds ratio is a statistically unbiased estimate of the relative risk since cases are included in the sampling frame for the selection of controls.

Efficient when you need additional information (particularly detailed exposure information) that is not available for the entire cohort.
What is the Measure of Association for a Case-Control study?
OR dis = OR exp
There is a built-in bias away from the null
OR can approximate the RR in specific situations
Rationale for Case-Control Study?
Used to answer the same research question as in cohort studies:
- Is the rate/risk of disease among the exposed different than that among the non-exposed? If yes, in what direction and by how much?

USED AS AN EFFICIENT VERSION OF A COHORT STUDY

USED TO ESTIMATE IDR/CIR WITH THE OR
How does the OR ~ RR?
Only used when you want to estimate RR

Rare = disease < 0.10

Most diseases are rare

If controls are selected to represent the source population
In a case cohort study OR ~ ?
CIR
In a nested case-control study OR ~ ?
IDR
Primary design concerns with the case-control design?
Selection Bias –
Information Bias –
Selection Bias ?
can occur when cases and controls are not selected from the same source population
When selective survival occurs
Information Bias ?
can occur when there is bias in the measurement of exposure resulting in misclassification since exposure is ascertained after disease has occurred.
Strengths of Case-Control Design?
Less expensive and time consuming than cohort design

Good for studying the etiology of rare diseases

Good for studying diseases with long latency periods

Possible to study many different exposures with respect to outcome of interest
Weaknesses of Case-Control Design?
Causal inference less clear (temporal ambiguity)

Often cannot estimate the frequency of disease in a population

Inefficient for studying rare exposures

Particularly susceptible to both selection and information biases
Ecologic Study?
Involves the comparison of GROUP-LEVEL variables rather than comparison of INDIVIDUAL-level data.

A study that includes ecologic-level (as opposed to individual-level) data.

They can be any design (cohort, case-control)
- Often a geographically defined variable is used (eg: a country, a census tract, etc)

We know the marginal frequencies of exposure and disease

We do not know the joint distribution of exposure and disease
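
A small sketch of what "marginals known, joint unknown" means. The group below is hypothetical: 1,000 people, 300 exposed and 100 diseased. Several joint distributions fit those same margins, and each implies a different individual-level risk ratio.

```python
def rr_from_joint(n_total, n_exposed, n_diseased, exposed_cases):
    """Risk ratio implied by one possible joint distribution,
    given fixed marginal totals for exposure and disease."""
    unexposed_cases = n_diseased - exposed_cases
    risk_exposed = exposed_cases / n_exposed
    risk_unexposed = unexposed_cases / (n_total - n_exposed)
    return risk_exposed / risk_unexposed


# Hypothetical margins for a single group.
N_TOTAL, N_EXPOSED, N_DISEASED = 1000, 300, 100

# Three candidate counts of exposed cases, all consistent with the margins.
rr_by_joint = {k: rr_from_joint(N_TOTAL, N_EXPOSED, N_DISEASED, k) for k in (10, 30, 60)}

for exposed_cases, rr in rr_by_joint.items():
    print(f"exposed cases = {exposed_cases:2d} -> RR = {rr:.2f}")
```

The same group-level data are compatible with a protective effect, no effect, or a harmful effect at the individual level.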
Three general parameters of a study?
Data collection: levels of measurement

Levels of analysis: common level for which data on all variables are reduced and analyzed

Interpretation: target levels of inference
Data collection: levels of measurement
?
Aggregate: means or proportions

Environmental: physical characteristics of place/individual analogue; e.g. hours of sunlight

Global: attributes of groups or places/no individual analogue; e.g. healthcare system, law, population density
Levels of analysis: common level for which data on all variables are reduced and analyzed?
Individual level

Group level

Multilevel
Interpretation: target levels of inference?
Biologic: inference stated in terms of individual risks

Ecologic: inference stated in terms of group rates

Contextual: use of both individual level with group level data to separate the effects of the 2 on each other.
Classification of Ecologic Study Designs?
Subject grouping:

- By place: multiple-group design
Example: Meat Consumption and Colon Cancer: A Multi-National Study

- By time: time-trend design

- By place and time: mixed design
Example: Is hard drinking water a protective factor for CVD mortality?
- The absolute change in CVD mortality rate between 1948 and 1964 in 83 towns.
Rationale for ecologic studies?
Low cost and convenience

Measurement limitations of individual-level studies

Design limitations of individual-level studies; i.e. limited variability in exposure

Interest in ecologic effects
What is the ecosocial theory?
ecosocial analyses of disease distribution, population health, and health inequities.

(Krieger N. Am J Public Health. 2008;98:221–230. doi:10.2105/AJPH.2007.111278)
Effect Estimation?
Not able to calculate rate or risks directly due to lack of information

Regress group-specific disease rates on group specific exposure rates
- Linear, logistic, multilevel analysis (contextual analysis, hierarchical regression), etc.
Disadvantages of Ecologic Studies?
Ecologic bias

Temporal ambiguity - uncertainty about exposure preceding disease

Collinearity - covariates more highly correlated at group level

Confounder control - Adjusting for confounders may increase bias
Ecological Fallacy?
The inability of the ecologic inference to accurately estimate the biologic effect at the individual level.

Classic example:
Ecologic study assessing the relation between religion and suicide rates in Prussian communities in the late 19th century (Durkheim 1951).

He found that there was a correlation between being Protestant and suicide rates

Conclusion: Being Protestant is a risk factor for suicide

Ecologic Fallacy: It is possible that most of the suicides w/in the communities were committed by Catholics, who were a minority and more socially isolated.
Disadvantages of Ecologic Studies, cont’d?
Within-group misclassification
- In ecologic analysis, nondifferential misclassification of exposure within groups usually biases away from the null, whereas in individual-level analysis it usually biases toward the null

Lack of adequate data
- Data may be crude, incomplete, or unreliable
- Data may not be comparable across groups/countries

Migration across groups
- Can cause ecologic bias
Uses of Ecologic Studies?
Generating individual-level etiologic hypotheses

Appropriate when broad social or cultural factors are of interest

Alternative to collecting sensitive or expensive data from individuals

Testing impact of group wide interventions

“Comprehensive theoretical model of causality – one that considers all factors influencing the occurrence of disease – often requires taking into account the role of upstream and ecological factors (including environmental, sociopolitical and cultural) in the causal chain.”
Questions That Should Be Considered in Evaluating the Quality of an Ecological Study?
Would it be practical to conduct alternative ways of studying the same question (eg, cohort study, randomized control trial) or was the ecological study the only alternative?

Are the subjects in the ecological study representative of the group, place, or population of interest?

Were the exposure and outcome variables measured and defined in the same or a similar way across the different populations or groups that are being studied?

Have data been collected on important confounding variables that might also explain the exposure–outcome relationship and have they been statistically adjusted for? If data are not available on key factors, is it reasonable to assume that their prevalence is similar in the different groups or populations being compared?

Is the identified ecological relationship between the exposure and outcomes biologically plausible and consistent with what is already known about a given topic at an individual-subject level?

What is the strength of the quantitative and statistical associations between the exposure and the outcome? The stronger the associations, the greater the likelihood of a true causal relationship.

Have the investigators interpreted their data with appropriate caveats? Did they acknowledge the possibility of an ecological fallacy? Were alternative explanations for the association between exposure and outcomes considered by the investigators?

Have the study data been collected at multiple levels (eg, individual, physician, hospital, community, or country)? If yes, was multilevel modeling considered or used for analyzing the data?
Sampling?
Sampling is almost always used when conducting surveys and when conducting epidemiologic studies
Sampling in Public Health Research
National Center for Health Statistics (NCHS) is the principal health statistics agency in the US (http://www.cdc.gov/nchs/about.htm).

Examples of NCHS surveys:
The National Health and Nutrition Examination Survey
The National Health Interview Survey

Health agencies conduct surveys to
assess health status of populations;
guide the allocation of resources; and
evaluate health policies.
Sampling in Epidemiologic Studies
Sampling is often discussed in the context of survey research.

But sampling is an important element of all epidemiologic study designs:
Selecting the sample for a cross-sectional study
Selecting subjects in a cohort study
Selecting cases and controls in a case-control study
Assigning subjects to study groups in a clinical trial
Basic Sampling Concepts

Populations?
Target population: the population to which you want to apply the findings of your study (e.g. the total U.S. population, or all children in the U.S.)


Source (Survey) population: the population that is practical to include (e.g. the civilian non-institutionalized US population).


The differences between the target and survey populations need to be identified, justified, and documented.

The consequences of the differences should also be assessed in terms of
-Internal validity (selection bias)
-External validity
Basic Sampling Concepts

Sampling Frame?
Instrument that includes all units of the survey population – e.g. Lists, maps

Inclusion and exclusion criteria must be stated: who is eligible to be in the sample population.

Incomplete or inaccurate lists can be a problem – called Coverage Error.
Basic Sampling Concepts

Type of Sampling ?
Probability sampling
Each member of the survey population has a chance to be selected.
A random method of selection is used.
The probability of selection is known.

Non-probability sampling (e.g. convenience sampling, snowball sampling)
Limited ability to extrapolate from your sample to a larger population.
Basic Sampling Concepts

Types of Probability Sampling?
Simple random sampling

Systematic sampling

Sampling with stratification

Cluster and multistage sampling
Basic Sampling Concepts

Simple Random Sampling (SRS)?
SRS: any of the possible subsets of n distinct elements from the population of N is equally likely to be chosen.

Therefore, every element in the population has the same probability of being selected.

However, not all sampling schemes in which every element has the same probability of being selected are SRS.
Basic Sampling Concepts

SRS with and without Replacement?
SRS without replacement offers a better precision than SRS with replacement.

If the size of the population is large relative to the sample, the difference is minimal.

In large-scale surveys, SRS is not used very often due to inconvenience.
Basic Sampling Concepts

Sample Estimate and Population Parameter?
If SRS was used, sample mean is an unbiased estimator of the population mean.

Variance of the sample mean = (1 - n/N)*(sample variance / n)

For large populations, sample size (n) rather than the sampling fraction (n/N) affects the precision of survey results more.
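
A short sketch of the variance formula above, using a hypothetical sample of six values. It compares a small population, where the sampling fraction n/N matters, with a very large one, where it barely does.

```python
import statistics


def srs_mean_variance(sample, population_size):
    """Estimated variance of the sample mean under SRS without replacement:
    (1 - n/N) * (sample variance / n)."""
    n = len(sample)
    sample_variance = statistics.variance(sample)  # uses the n - 1 denominator
    finite_population_correction = 1 - n / population_size
    return finite_population_correction * sample_variance / n


# Hypothetical data, for illustration only.
sample = [2, 4, 4, 5, 7, 8]

var_small_pop = srs_mean_variance(sample, population_size=60)         # n/N = 0.10
var_large_pop = srs_mean_variance(sample, population_size=6_000_000)  # n/N is negligible

print(f"N = 60:        variance of mean = {var_small_pop:.4f}")
print(f"N = 6,000,000: variance of mean = {var_large_pop:.4f}")
```

For the large population the factor (1 - n/N) is essentially 1, so precision depends almost entirely on n.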
Systematic Sampling
Systematic sampling involves taking every Kth element after a random start.

Each element in the population has the same chance of being selected, but the probabilities of different sets of elements being included in the sample are not all equal.

If we were to select 3 individuals out of 10 and take every third person after a random start (1, 2, 3, or 4),
what is the probability that both #1 and #2 are selected?
What is the probability that both #1 and #4 are selected?

For systematic sampling to behave like a random sample, it is necessary to assume that the list has an approximately random order.

If there is a pattern with respect to the variables of interest and the sampling interval coincides with the pattern, systematic sampling will not result in a good sample.
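
The two questions on this card can be checked by listing every possible sample. This sketch assumes, as the card states, 10 people numbered 1 to 10, an interval of 3, and four equally likely starts (1, 2, 3, or 4).

```python
from fractions import Fraction

INTERVAL = 3
STARTS = [1, 2, 3, 4]  # each start is assumed equally likely

# Every sample the scheme can produce: start, start + 3, start + 6.
samples = [[s, s + INTERVAL, s + 2 * INTERVAL] for s in STARTS]


def prob_all_selected(*people):
    """Probability that all the listed people land in the same sample."""
    hits = sum(all(p in sample for p in people) for sample in samples)
    return Fraction(hits, len(samples))


print(samples)                  # [[1, 4, 7], [2, 5, 8], [3, 6, 9], [4, 7, 10]]
print(prob_all_selected(1, 2))  # 0
print(prob_all_selected(1, 4))  # 1/4
```

Persons #1 and #2 can never be sampled together, while under SRS every pair would have the same chance. Note that in this particular setup (10 is not a multiple of 3, and four starts are allowed) some people also appear in more than one possible sample, e.g. #4.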
Stratified Sampling


Stratification?
First, classify the population into subpopulations (strata), based on existing information (e.g. grades in a school), and then select separate samples from each of the strata using SRS or other methods.

If the strata sample sizes are proportional to the strata population sizes (i.e. a uniform sampling fraction is used), it is known as PROPORTIONATE STRATIFICATION; otherwise, it is considered DISPROPORTIONATE STRATIFICATION
Stratified Sampling

Example of Sampling with Stratification
Variable of interest: # of hours/day of TV viewing in high school students.

We could take a 10% sample using SRS, or classify all students based on grades (9th – 12th) and then take an SRS from each grade (stratum).
Stratified Sampling

Proportionate Stratification?
Compared with SRS of the same size, a proportionate stratification sample with SRS in each of the strata will have a similar or smaller variance.

The gain in precision is large if the within-strata variation is small and/or the between-strata variation is large.
Stratified Sampling

Disproportionate Stratification?
Sometimes used to allocate a sufficient sample size to specific strata so that separate estimates of adequate precision will be available for those strata.

The stratum that is given a higher sampling fraction usually has
a relatively small size (e.g. minority groups in national surveys); or
a relatively high variance in terms of the variable of interest.

Disproportionate stratification can result in a higher variance of the sample mean than an SRS of the same size.
Proportionate Stratification – Example Environmental Condition of N.O. Homes
The target population : all homes in New Orleans (except for specific excluded neighborhoods)

Exclusion: FEMA trailers and unoccupied housing

Total study sample size = 100

Issues: sampling frame, and the sampling strategy.
Proportionate Stratification – Example Environmental Condition of N.O. Homes, cont’d
Stratify by neighborhood (defined by planning district), weighted by occupancy rates
Calculated as follows: proportion of overnight occupancy per neighborhood (Rapid Population Estimate [RPE] data) x the number of pre-Katrina housing units (2000 Census) per neighborhood.
That number was used to determine the proportion of all currently occupied housing in New Orleans that falls in each neighborhood.
The sample size per neighborhood was weighted: total occupied housing units in planning district / total occupied housing units in New Orleans

For example, 56% of French Quarter (FQ) was occupied (RPE). That translates to ~ 3,267 (.56 x 5881 total occupied units 2000 Census). Total occupied units in New Orleans = 64,481 (RPE). So, FQ has 5% of total occupied HU. So, 5% of the sample came from the French Quarter.

Logic: The repopulation of N.O. would eventually be a function of both pre-Katrina occupancy patterns and the amount of devastation
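
The French Quarter arithmetic can be reproduced as a quick check. This sketch uses only the figures quoted on the card; the variable names are illustrative.

```python
TOTAL_SAMPLE = 100       # total study sample size from the card

fq_occupied = 3267       # card's estimate of occupied French Quarter units
city_occupied = 64481    # card's total of occupied units in New Orleans (RPE)

# Proportionate stratification: the stratum's share of the sample
# equals its share of the occupied housing units.
fq_share = fq_occupied / city_occupied
fq_sample_size = round(TOTAL_SAMPLE * fq_share)

print(f"French Quarter share of occupied units: {fq_share:.1%}")  # 5.1%
print(f"Homes sampled from the French Quarter: {fq_sample_size}") # 5
```

The same calculation would be repeated for each planning district so that the stratum sample sizes add up to roughly 100.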
Cluster Sampling?
Groups of elements can be considered clusters (e.g. different classes in a school, different communities in a city). Often only a sample of the clusters is included in a study or survey.
Cluster and Multistage Sampling

Two step cluster sampling?
Two-stage cluster sampling: Only a sample of the elements in the selected clusters is included.
Multistage Sampling
Multistage cluster sampling: A hierarchy of clusters is used – this is sometimes loosely described as cluster sampling.
Difference between Strata and Clusters?
In sampling with stratification, all strata would be included in the final sample. We would like to have strata that are internally homogeneous and externally heterogeneous.

In cluster sampling, only a sample of the clusters will be included in the final sample. We would like each cluster to be as heterogeneous as possible.

Major assumption of cluster sampling is that the cluster represents the whole.
Precision in Cluster Sampling?
Compared with an SRS of the same size, cluster sampling often (although not all the time) leads to a loss in precision.

The justification for cluster sampling is the reduced cost (time and money).
When to use sample size calculations?
1. BEFORE SUBMITTING A RESEARCH PROPOSAL, to estimate the approximate sample size needed to test relevant hypotheses

2. When a study is underway, to evaluate whether the planned sample size is still adequate, given any new information

3. After a study is completed, power and sample size calculations can be conducted to help interpret your results (i.e. a statistically nonsignificant finding may be due to limited power)
Sample Size?
Sample Size:
Calculated to choose the adequate number of subjects needed to detect a certain magnitude of effect with minimal statistical error
Numerous formulas exist for calculating sample size
Power?
Power:
Probability of identifying an effect when one truly exists (ie rejecting the null hypothesis when it is actually false, reducing type II error)
What is the probability of accepting Ho when Ho is true, in terms of alpha?
1-alpha
When do you have a type one error?

What is this called?
Reject Ho when Ho is true

alpha
When do you have a type II error?

What is this called?
Accept Ho when Ho is false

Beta
Parameters Needed (Sample Size)?
1. Type I error (alpha):
The standard is 0.05.

2. Power (1-Beta):
The ability to detect a difference if one truly exists. The standard is 80%.

3. Proportion with factor (prevalence or incidence)
Proportion of baseline population exposed to the factor of interest (C-C study) with dichotomous outcomes.
Proportion of baseline population that has the disease of interest (cohort or intervention studies) with dichotomous outcomes.

4. Magnitude of effect you want to detect:
The difference in outcome rates between the two groups, the RR or OR.
Where to get these estimates (prevalence, incidence and detectable magnitude of effect)?
Previous studies

Pilot studies

If none is available, investigator uses best judgment to provide a range of values or the most conservative estimates.
Sample Size Example: Cohort studies, equal allocation?
n = ((p1*q1 + p2*q2) * (Za + Zb)^2) / (p1 - p2)^2

where,
n = # subjects in each group
p1 = frequency of outcome in group 1
p2 = frequency of outcome in group 2
q1 = 1 - p1
q2 = 1 - p2

Zb = ((p1 - p2) * sqrt(n)) / sqrt(p1*q1 + p2*q2) - Za
Example #1: Cohort study

We wish to test whether breast cancer (D) rate is increased in oral contraceptive (E) users; the 10-year cumulative incidence estimate in unexposed women is 0.01 (1%)

Significance level set at 0.05, two-sided

Power set at 90%

We wish to be able to detect a doubling of risk to 0.02

How many participants will be needed?
we’ll need 3,098 subjects in each group.

For a total of 6,196 participants
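
A sketch of this calculation using the equal-allocation formula n = (p1*q1 + p2*q2) * (Za + Zb)^2 / (p1 - p2)^2. It assumes Za is the two-sided normal critical value and Zb is the normal quantile for the chosen power; the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed in each group for comparing two proportions
    (equal allocation, two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # 1.28 for power = 0.90
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q1 + p2 * q2) * (z_a + z_b) ** 2 / (p1 - p2) ** 2


# Inputs from the card: risk 0.01 in unexposed, 0.02 in exposed,
# alpha = 0.05 two-sided, power = 90%.
n_cohort = n_per_group(0.01, 0.02, alpha=0.05, power=0.90)
print(math.ceil(n_cohort))  # about 3,100 per group
```

This gives roughly 3,100 per group, in line with the 3,098 quoted on the card. Small differences of this size typically come from rounding the z-values or from using a slightly different variant of the formula.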
Sample Size: Case-Control Study?
Given the proportion of controls exposed (p2) and the odds ratio predicted (OR), the proportion of cases exposed (p1) is given by:

p1 = (p2*OR) / (1 + p2*(OR - 1))
Sample Size: Case-Control Study

Example: Case-control study

We wish to be able to detect a doubling of risk (OR=2) associated with a factor which is present in 10% of the normal population from which the controls will be drawn.

Power = 80%

Significance level of 0.05, two-sided

How many cases and controls will we need?
First, calculate the frequency of exposure in the cases using the frequency in controls and the odds ratio:

Second, use these to calculate n:

So, we’ll need approximately 280 cases and 280 controls.
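
The two steps can be sketched as follows. This assumes the same equal-allocation formula used on the cohort sample size card, with p1 and p2 now standing for the exposure proportions in cases and controls. The function names are illustrative.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p2, odds_ratio):
    """Step 1: proportion of cases exposed, from the proportion of
    controls exposed (p2) and the odds ratio to be detected."""
    return (p2 * odds_ratio) / (1 + p2 * (odds_ratio - 1))


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Step 2: cases (and controls) needed, equal allocation, two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q1 + p2 * q2) * (z_a + z_b) ** 2 / (p1 - p2) ** 2


# Inputs from the card: 10% of controls exposed, OR = 2,
# power = 80%, alpha = 0.05 two-sided.
p2 = 0.10
p1 = exposure_in_cases(p2, odds_ratio=2)
n_case_control = n_per_group(p1, p2, alpha=0.05, power=0.80)

print(f"p1 = {p1:.3f}")                      # p1 = 0.182
print(f"n = {math.ceil(n_case_control)}")    # n = 280
```

Under these assumptions the result agrees with the card's figure of approximately 280 cases and 280 controls.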
Sample size needed increases with:
Does alpha (α) increase or decrease?
Decreasing α (a stricter significance level)
Sample size needed increases with:
Does β error increase or decrease?
Decreasing β error (increasing power)
Sample size needed increases with:
Does clinical significance (e.g., treatment difference you are trying to detect) increase or decrease?
Decreasing clinical significance (e.g., treatment difference you are trying to detect)
Sample size needed increases with:
Does variability in the observed data increase or decrease?
Increasing variability in the observed data