109 Cards in this Set
- Front
- Back
application of statistical principles in medicine, public health, or biology -collecting, summarizing, interpreting info -making inferences that appropriately account for uncertainty |
biostatistics |
|
a collection of all individuals about whom we would like to make a statement |
population |
|
a subset of the population of interest |
sample |
|
-must clearly define research q -must choose approp study design -must select sufficiently large, representative sample -must carefully collect data -must carefully summarize and examine relationships -must quantify certainty -must limit inferences to appropriate pop |
biostatistical analysis approach |
|
types of study designs (2) |
randomized studies observational studies |
|
we intervene and measure a response sometimes called analytic or experimental studies |
randomized studies |
|
we observe a phenomenon sometimes called descriptive, assoc. , nonrandomized, or correlational studies |
observational studies |
|
-detailed report of specific features of case -systematic review of common features of a small number of cases |
case report/series |
|
advantages and disadvantages of case report/series |
-advantages- cost efficient, easy to conduct -disadvantages- no comparison group, no specific research question |
|
-conducted at one point in time |
cross-sectional survey |
|
advantages and disadvantages of cross-sectional survey |
adv- cost efficient, easy to implement, ethical dis- no temporal info, nonresponse bias |
|
a study involving a group of individuals who meet inclusion criteria at the start of a study |
cohort study |
|
(concurrent; longitudinal) the individuals are enrolled and followed going forward in time |
prospective cohort study (concurrent; longitudinal) |
|
the exposure or risk factor is ascertained by looking back in time |
retrospective cohort study (non-concurrent, historical) |
|
advantages and disadvantages of a cohort study |
adv- assess temporal relationships, est and compare incidence of disease (rate at which participants free of disease develop disease) dis- need large numbers for rare outcomes, confounding (distortion of the effect of a risk factor on the outcome by other characteristics) |
|
a study involving individuals with and without outcome of interest |
case-control study |
|
advantages and disadvantages of case-control studies |
adv- cost and time efficient for rare outcomes dis- need careful selection of cases and controls, bias (misclassification, selection, recall) |
|
3 types of biases |
misclassification selection recall |
|
incorrect classification of outcome status |
misclassification |
|
the relationship between exposure status and disease may be different in those who choose to participate as opposed to those who do not |
selection |
|
cases and controls differentially recall exposure status |
recall |
|
types of observational study designs (4) |
case report/series cross-sectional survey cohort study case-control study (&nested case control study) |
|
randomized study designs (2) |
randomized controlled trial or clinical trial crossover trial |
|
experimental study where patients are randomized to receive one of several comparison treatments |
randomized controlled trial |
|
advantages and disadvantages of a randomized controlled trial |
adv- gold standard from stat point of view, causal inference, minimizes bias and confounding dis- expensive, req. extensive monitoring, inclusion criteria can limit generalizability |
|
-each participant is assigned to two or more treatments sequentially -washout period in between treatments |
crossover trial |
|
advantages and disadvantages of a crossover trial |
adv- each participant is their own control dis- carry over effects |
|
-complex relationships among variables that can distort the relationship between a risk factor and the outcome |
confounding variables |
|
proportion of participants with disease at a particular point in time |
prevalence |
|
point prevalence equation |
$\dfrac{\text{number of persons with disease}}{\text{number of persons examined at baseline}}$ |
|
likelihood of developing disease among persons free of disease who are at risk of developing disease |
incidence |
|
cumulative incidence equation |
$\dfrac{\text{number of persons who develop disease during a specified period}}{\text{number of persons at risk at baseline}}$ |
|
incidence rate equation |
$\dfrac{\text{number of persons who develop disease during a specified period}}{\text{sum of the lengths of time during which persons are disease-free}}$ |
|
-cumulative incidence requires complete follow-up on all participants -person-time data is used to take full advantage of available info in incidence rate -incidence rate is often expressed as an integer per multiple of participants over a specified time |
computing incidence |
|
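The three frequency measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the cohort of five people, and the counts are all hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def point_prevalence(n_with_disease, n_examined):
    """Proportion of examined persons who have the disease at baseline."""
    return n_with_disease / n_examined


def cumulative_incidence(n_new_cases, n_at_risk):
    """New cases during the period divided by persons at risk at baseline."""
    return n_new_cases / n_at_risk


def incidence_rate(n_new_cases, disease_free_times):
    """New cases divided by the total disease-free follow-up time (person-time)."""
    return n_new_cases / sum(disease_free_times)


# Hypothetical cohort: 5 persons at risk, followed for up to 10 years.
# Each entry is the number of years that person remained disease-free.
followup_years = [10, 10, 4, 7, 10]
new_cases = 2  # the persons with 4 and 7 disease-free years developed disease

prevalence = point_prevalence(30, 200)                    # 30 of 200 examined
ci = cumulative_incidence(new_cases, len(followup_years))  # 2 of 5 at risk
rate = incidence_rate(new_cases, followup_years)           # 2 per 41 person-years
rate_per_1000 = rate * 1000                                # per 1,000 person-years

print(prevalence, ci, round(rate_per_1000, 1))
```

Multiplying the rate by 1,000 is one way to express it "per multiple of participants over a specified time", as the card describes.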
risk difference, excess risk equation = |
prevalence(exposed) - prevalence(unexposed) |
|
population attributable risk equation = |
$\dfrac{\text{prevalence(overall)} - \text{prevalence(unexposed)}}{\text{prevalence(overall)}}$ |
|
relative risk equation = |
$\dfrac{\text{prevalence(exposed)}}{\text{prevalence(unexposed)}}$ |
|
odds ratio equation = |
$\dfrac{\text{prevalence(exposed)} / (1 - \text{prevalence(exposed)})}{\text{prevalence(unexposed)} / (1 - \text{prevalence(unexposed)})}$ |
|
not possible to estimate relative risk in case-control studies; can est. odds ratio bc of its invariance property |
relative risks and odds ratios |
|
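As a rough sketch, the four comparison measures defined above can be computed directly from the prevalence in each group. The function name and the counts (100 exposed with 20 diseased, 300 unexposed with 30 diseased) are hypothetical, made up for this example.

```python
def risk_measures(p_exposed, p_unexposed, p_overall):
    """Return the four comparison measures from the cards above."""
    return {
        "risk_difference": p_exposed - p_unexposed,
        "population_attributable_risk": (p_overall - p_unexposed) / p_overall,
        "relative_risk": p_exposed / p_unexposed,
        "odds_ratio": (p_exposed / (1 - p_exposed))
        / (p_unexposed / (1 - p_unexposed)),
    }


# Hypothetical counts: 20 of 100 exposed and 30 of 300 unexposed have disease.
p_exp = 20 / 100
p_unexp = 30 / 300
p_all = (20 + 30) / (100 + 300)

measures = risk_measures(p_exp, p_unexp, p_all)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```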
two types of odds ratios in epidemiology |
disease odds ratios exposure odds ratios |
|
The odds of getting a disease, B, if an exposure, A, is present divided by the odds of getting disease B if the exposure A is not present |
disease odds ratio |
|
The odds of having been exposed to A among those with disease B divided by the odds of having been exposed to A among those without disease B |
exposure odds ratio |
|
OR as est of RR |
odds ratio will approximate relative risk when disease under study is rare, usually defined as a prevalence or cumulative incidence less than 10%. for this reason, the interpretation of an odds ratio is often taken to be identical to that of a relative risk when the prevalence or cumulative incidence is low |
|
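A small numeric check can make the rare-disease approximation concrete. The sketch below uses made-up prevalences: in both scenarios the relative risk is 2, but the odds ratio only stays close to 2 when the disease is rare.

```python
def relative_risk(p_exposed, p_unexposed):
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical rare disease: prevalence well under 10% in both groups.
rr_rare = relative_risk(0.02, 0.01)
or_rare = odds_ratio(0.02, 0.01)

# Hypothetical common disease: same relative risk, much higher prevalence.
rr_common = relative_risk(0.6, 0.3)
or_common = odds_ratio(0.6, 0.3)

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```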
scales of measurement (4) |
nominal scale ordinal scale interval scale ratio scale |
|
not really scales, but labels (numbers or text) used to differentiate objects; ex. SSN, gender, racial groups, first generation immigrants |
nominal scale |
|
uses numbers to put objects in order; no info other than more or less is available; ex. ranking, level of education |
ordinal scale |
|
contains ordinal info, but distance between scale units is always the same; ex. temperature, IQ score |
interval scale |
|
contains interval info, but there's a TRUE zero point on the scale -this zero point is necessary for ratio statements to have meaning -it's not valid to have a measure below zero -ex. height, weight, temperature in kelvin |
ratio scale |
|
variable types -for nominal scale -for ordinal scale -for interval and ratio scales |
nominal - dichotomous, categorical ordinal - ordinal interval/ratio- continuous(or measurement) |
|
variables have 2 possible responses (ex. yes/no) -ex gender, current smoker, and CVD -approp. forms: frequency distribution table, histogram |
dichotomous variable |
|
variables have more than 2 responses that are unordered -consider all responses -ex. race -some approp. forms: frequency distribution table, histogram |
categorical |
|
variables have more than 2 responses that are ordered -consider all responses -ex. level of education -some approp. forms: freq. distribution table, histogram |
ordinal |
|
variables assume, in theory, any value between a theoretical minimum and maximum (unlimited # of distinct responses) -consider avg. and how wide the range of values is -approp. forms: text (with descriptive stats: mean and standard deviation), histogram -aka measurement/quantitative -ex. height & diastolic blood pressure |
continuous (or measurement) |
|
-distinct ordered options of the ordinal variable are shown on x axis -y axis can be relative freq. or freq. -label axes approp. |
histogram |
|
central or typical value of a data distribution -mean -median -mode |
central tendency |
|
the extent to which data points vary from each other -standard deviation -range |
variability |
|
other measures (2) |
quartiles interquartile range |
|
50% of values above and below |
median (aka 2nd quartile) |
|
most freq value in data |
mode |
|
average squared deviations from mean |
variance |
|
square root of the sample variance; more commonly used than the variance |
standard deviation |
|
max value - min value |
range |
|
interquartile range (IQR) = |
Q3 - Q1 |
|
how to get sample variance (4 steps) |
1. deviations = subtract mean from each value 2. sq. deviations = square each deviation 3. sum of sq. deviations = add sq. deviations 4. variance = divide sum of sq. deviations by (n-1) |
|
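The four steps can be followed one at a time in code. This is a minimal sketch with a made-up data set of five values; the function name is illustrative only.

```python
import math


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]       # step 1: subtract the mean
    squared = [d ** 2 for d in deviations]        # step 2: square each deviation
    sum_of_squares = sum(squared)                 # step 3: add them up
    return sum_of_squares / (n - 1)               # step 4: divide by n - 1


# Hypothetical data: mean is 6, squared deviations are 4, 4, 0, 1, 1.
data = [4, 8, 6, 5, 7]
variance = sample_variance(data)
sd = math.sqrt(variance)  # standard deviation is the square root of the variance

print(variance, round(sd, 3))
```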
data value such that 25% of the observations are below it |
first quartile (q1) |
|
data value such that 75% of the observations are below it |
third quartile (q3) |
|
-when there are no outliers, the sample mean and standard deviation summarize central tendency/ location and variability -when there are outliers, the median and IQR summarize central tendency/ location and variability |
summarizing central tendency and variability |
|
extreme values in data: below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) |
outliers |
|
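The outlier rule can be sketched with Python's standard `statistics` module. One assumption to flag: several conventions exist for computing quartiles, and this example picks the `"inclusive"` method of `statistics.quantiles`; a course or textbook may use a different convention and get slightly different quartiles. The data values are made up.

```python
import statistics


def outlier_fences(values):
    """Return (q1, q3, lower_fence, upper_fence) using the 1.5 * IQR rule."""
    # Assumption: quartiles via the "inclusive" method; other methods differ.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical data with one extreme value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 40]
q1, q3, lower, upper = outlier_fences(data)
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3, lower, upper, outliers)
```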
meaning of lines (top to bottom or left to right) -maximum -third quartile -median -first quartile -minimum shaded box represents 50% of data |
box and whisker plots |
|
N |
size of population |
|
n |
size of sample |
|
μ |
mean of population |
|
m or X̄ |
mean of sample |
|
σ |
standard deviation of population |
|
sd or s |
standard deviation of sample |
|
σ² |
variance in population |
|
s² |
variance in sample |
|
Z |
z score in population |
|
z |
z score in sample |
|
P(A) |
probability of A |
|
P(A | B) |
probability of A given B |
|
Bayes theorem |
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$ |
|
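Bayes theorem is easy to check numerically. The sketch below uses a hypothetical screening test; the prevalence, sensitivity, and false positive rate are invented for illustration, and P(B) is obtained by adding the two ways a positive result can occur.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical screening example: A = has disease, B = tests positive.
p_disease = 0.01              # P(A)
p_pos_given_disease = 0.90    # P(B | A)
p_pos_given_healthy = 0.05    # P(B | not A)

# P(B): positives among the diseased plus positives among the healthy.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_pos)
print(round(p_disease_given_pos, 3))
```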
-model for discrete outcome -process or experiment has 2 poss outcomes: success and failure -replications of process are independent -P(success) is constant for each replication |
binomial distribution |
|
notations for binomial distribution: n, p, x |
n = # of times process is replicated; p = P(success); x = # of successes of interest, 0 ≤ x ≤ n |
|
binomial distribution equation = |
$P(x \text{ successes}) = \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$ |
|
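A minimal sketch of the binomial formula, using `math.comb` from the standard library for the n! / (x!(n-x)!) term. The function name and the example values (10 replications, P(success) = 0.5) are chosen only for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent replications."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical example: 10 replications, P(success) = 0.5, exactly 3 successes.
prob_3_of_10 = binomial_pmf(3, 10, 0.5)

# The probabilities over all possible x (0 through n) should add up to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))

print(prob_3_of_10, total)
```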
model for continuous outcome; aka Gaussian distribution |
normal distribution |
|
graph with a single peak; for the normal distribution, mean = median = mode |
unimodal |
|
graph when cut into 2 halves by mean/median/mode line |
symmetrical |
|
graph when it comes close to (but never touches) zero at the far left and right end (tails) |
asymptotic |
|
where: x is a raw score µ is the mean σ is the standard deviation z is the standardized score of x, whose unit is standard deviation |
standardization: z scores |
|
Z score equation = |
$z = \dfrac{x - \mu}{\sigma}$, i.e. (raw score - mean) / standard deviation |
|
value that holds a specified percentage of the distribution below it |
percentile |
|
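Standardization and percentiles can be tied together in a short sketch. It assumes the variable is normally distributed, which is what lets `statistics.NormalDist` turn a z score into a percentile; the raw score, mean, and standard deviation are made-up values.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


# Hypothetical values: raw score 130, mean 100, standard deviation 15.
z = z_score(130, 100, 15)

# Assuming normality, the proportion of the distribution below this score
# is the standard normal cumulative probability at z.
proportion_below = NormalDist().cdf(z)

print(z, round(proportion_below, 4))
```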
-enumerate all members of pop. (N); this list is the sampling frame -select n individuals at random -random selection: each has the same probability of being selected |
simple random sample |
|
-determine sampling interval N/n -randomly select an integer k from 1 to N/n -select the kth individual and every (N/n)th individual after that |
systematic sample |
|
-organize pop into mutually exclusive strata -each stratum has a common characteristic -select individuals at random within each stratum |
stratified sample |
|
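The three probability sampling schemes above can be sketched with Python's `random` module. Everything here is illustrative: the population of 100 numbered individuals, the odd/even strata, the fixed seed, and the function names are assumptions made for the example, and the systematic sampler assumes N is a multiple of n.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every individual in the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start k in 1..N/n, then take every (N/n)th individual."""
    interval = len(frame) // n
    k = rng.randint(1, interval)  # 1-based starting position
    return [frame[i] for i in range(k - 1, len(frame), interval)][:n]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Group the frame into strata, then sample at random within each one."""
    strata = {}
    for item in frame:
        strata.setdefault(stratum_of(item), []).append(item)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}


rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical frame: individuals 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(
    population, lambda i: "even" if i % 2 == 0 else "odd", 5, rng)

print(srs, sys_sample, strat, sep="\n")
```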
non-probability sampling (3) |
convenience quota snowball |
|
nonprobability sample not for inference making |
convenience |
|
select a predetermined number of individuals from groups of interest |
quota sample |
|
select someone who meets the criteria ask them to recommend others who they may know |
snowball sampling |
|
Sickle-cell disease is a painful disorder of the red blood cells that in the United States affects mostly African-Americans. To investigate whether the drug hydroxyurea can reduce the pain associated with sickle-cell disease, a study by the National Institutes of Health gave the drug to 150 sickle-cell sufferers and a placebo to another 150. The researchers then counted the number of episodes of pain reported by each subject. What is the response (or "outcome") variable in this study? |
# episodes of pain |
|
describe the diff between observational study and experimental/interventional |
-experiment- treatments are deliberately imposed upon individuals and their responses are observed -observational- study w/o any attempt to influence responses; indiv. are observed and variables of interest are measured |
|
There is an Ebola epidemic in West Africa. In one of the quarantine units Dr. Wilkerson decides to pull the medical records of 5 patients with the Ebola virus and write a report on their symptoms. What type of study is this? |
case series |
|
A researcher is studying the relationship between sugar consumption and weight gain. The researcher randomly assigned twelve volunteers to one of two groups. The first group of five participants was put on a diet low in sugar and the second group of the remaining seven participants received 10% of their calories from sugar. After 8 weeks, weight gain was recorded from each participant. What type of study is this? |
randomized clinical trial |
|
A group of college students believes that herbal tea has remarkable restorative powers. To test its theory, the group makes weekly visits to a local nursing home, visiting with residents, talking with them, and serving them herbal tea. After several months, many of the residents are more cheerful and healthy. What is the explanatory variable (or "predictor") of interest in this study? |
the herbal tea |
|
An investigator wants to assess whether smoking is a risk factor for pancreatic cancer. Electronic medical records at a local hospital will be used to identify fifty patients with pancreatic cancer. One hundred patients who are similar but free of pancreatic cancer will also be selected. Each participant’s medical record will be analyzed for smoking history. Identify the type of study proposed: |
case-control |
|
An investigator wants to assess whether the use of a specific medication given to infants born prematurely is associated with developmental delay. Fifty infants who were given the medication and fifty comparison infants who were also born prematurely but not given the medication will be selected for the analysis. Each infant will undergo extensive testing at age 2 for various aspects of development. Identify the type of study proposed: |
cohort |
|
An investigator wants to assess the association between caffeine consumption and Body Mass Index (BMI). A study is planned to include seventy participants. Each participant will be surveyed with regard to their daily caffeine consumption. In addition, the weight and height of each participant will be measured. Identify the type of study design that was proposed: |
cross-sectional survey |
|
Dr. Peterson wants to conduct a study on the incidence of HIV in people aged 20-50 in New York City over the next 5 years. He comes to you because he needs help with his study design. He only wants to include people who are 20-50, and they cannot have HIV. Based on this information, which study design is most appropriate? |
prospective cohort study |
|
A study is planned to compare two weight loss programs in patients who are obese. The first program is based on restricted caloric intake and the second is based on specific food combinations. The study will involve twenty participants and each participant will follow each program. The programs will be assigned in random order (i.e., some participants will first follow the restricted calorie diet and then follow the food combination diet, while others will first follow the food combination diet and then follow the restricted calorie diet). The number of pounds lost will be compared between diets. Identify the type of study design proposed: |
randomized cross-over |