105 Cards in this Set
- Front
- Back
Questions to ask yourself about a study.
|
How generalizable are the findings?
What degree of clinical severity is being looked at? How well matched are the cases and controls? How are the scores distributed in the study population? |
|
How do you look at the data in a study?
|
Determine what the data look like.
What trends are shown? What are the outliers (mathematically unusual scores)? What statistical tests are most appropriate? What is the typical value, central tendency, or average response? How much spread, or variability, is there in the values? Highly variable = difficult to predict. Less variable = more predictable responses. |
|
What are the main ways to display one variable, frequency distribution data?
|
Categorical Data: bar chart, pie chart
Continuous Data: histogram, dot plot, box plot, stem-and-leaf plot |
|
What are the main ways to display two variable, relationship data?
|
Categorical Data - Segmented bar chart
Continuous Data - Scatter plot |
|
What are the possible shapes of frequency distribution?
|
Modality - how many peaks does the curve have? Unimodal = 1, bimodal = 2, multimodal = >2
Is the curve symmetrical or skewed? The tail points in the direction of the skew. |
|
Arithmetic mean (average)
|
The sum of the values divided by the number of values.
|
|
Median
|
middle value, 50th percentile
|
|
Mode
|
the most common value
|
|
Geometric mean
|
antilog of the mean of the log data (for skewed data)
|
|
Weighted mean
|
when certain values carry ‘more weight’ or importance than others
|
|
If the mean = the median
|
the distribution is symmetrical
|
|
If the mean does not equal the median
|
the distribution is not symmetrical
|
|
What are the measures of central tendency?
|
Mean
Median Mode Geometric mean Weighted mean |
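The five measures listed above can be compared on one small data set. The sketch below is illustrative only: the sample values and the weights are made up, and the geometric mean is computed directly from its definition (antilog of the mean of the logged data).

```python
import math
import statistics

# Hypothetical right-skewed sample (values invented for illustration).
values = [2, 3, 3, 4, 5, 8, 24]

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # middle value (50th percentile)
mode = statistics.mode(values)        # most common value

# Geometric mean: antilog of the mean of the logged data.
log_values = [math.log(v) for v in values]
geometric_mean = math.exp(statistics.mean(log_values))

# Weighted mean: each value counts in proportion to its weight.
# These weights are arbitrary placeholders that down-weight the outlier.
weights = [1, 1, 1, 1, 1, 1, 0.5]
weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(mean, median, mode, geometric_mean, weighted_mean)
```

In this sample the single large value pulls the mean (7) above the median (4), which is the pattern expected for right-skewed data.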
|
What are the advantages of using the mean as a measure of central tendency?
|
Uses all data
Algebraically defined |
|
What are the advantages of using the median as a measure of central tendency?
|
Not distorted by outliers
Not distorted if skewed |
|
What are the advantages of using the mode as a measure of central tendency?
|
Easy if categorical data
|
|
What are the advantages of using the Geometric Mean
as a measure of central tendency? |
Transformed, same pros as mean
Good for skewed data |
|
What are the advantages of using the Weighted Mean
as a measure of central tendency? |
Same pros as mean
Relative importance Algebraically defined |
|
What are the disadvantages of using the mean as a measure of central tendency?
|
Distorted by outliers
Distorted if skewed |
|
What are the disadvantages of using the median as a measure of central tendency?
|
Ignores most data
Not algebraic |
|
What are the disadvantages of using the mode as a measure of central tendency?
|
Ignores most data
Not algebraic |
|
What are the disadvantages of using the geometric mean as a measure of central tendency?
|
Only good if log transformation yields symmetrical distribution
|
|
What are the disadvantages of using the weighted mean as a measure of central tendency?
|
Need good estimates of weights
|
|
Measures of variability
|
Range (minimum to maximum)
Percentiles – the value of x that has n% of the observations below it is the nth percentile
Interquartile range – the range between the 25th and the 75th percentiles, i.e. the central 50%
Variance – average squared deviation from the mean: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard deviation – square root of the variance: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ |
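As a rough illustration of these measures, the sketch below computes each one for a made-up sample. The variance uses the n - 1 denominator from the card. The quartiles come from `statistics.quantiles`, whose default interpolation method is only one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical sample (values invented for illustration).
values = [2, 4, 4, 5, 7, 9, 11, 14]

# Range: distance between the smallest and largest observations.
value_range = max(values) - min(values)

# Quartile cut points (25th, 50th, 75th percentiles) and the IQR.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Variance and standard deviation, written out from the formulas.
n = len(values)
x_bar = sum(values) / n
variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
sd = variance ** 0.5

print(value_range, iqr, variance, sd)
```

The hand-written variance and standard deviation should agree with `statistics.variance` and `statistics.stdev`, which use the same n - 1 denominator.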
|
What are the advantages of using range to determine variability?
|
Easy to determine
|
|
What are the disadvantages of using range to determine variability?
|
Uses only 2 data pts.
Distorted by outliers Increases with sample size |
|
What are the advantages of using Inter-Quartile Range
to determine variability? |
Unaffected by outliers
Independent of sample size Good for skewed data |
|
What are the advantages of using Variance to determine variability?
|
Uses every data pt.
Algebraic |
|
What are the advantages of using Standard Deviation
to determine variability? |
Same as variance
Units of measure = units of raw data Easy to interpret |
|
What are the disadvantages of using Inter-Quartile Range
to determine variability? |
Clumsy to calculate
Not good for small samples Uses only 2 data pts. Not algebraic |
|
What are the disadvantages of using Variance to determine variability?
|
Units of measure = square of the raw data units
Sensitive to outliers Not good for skewed data |
|
What are the disadvantages of using Standard Deviation
to determine variability? |
Sensitive to outliers
Not good for skewed data |
|
Don't forget to look at your slide for the stem and leaf plot
|
Sept 18th Lecture
|
|
Don't forget to look at your slide for the Box and whiskers plot
|
Sept 18th Lecture
|
|
Why do we care about central tendency and variation?
|
* Answers what is the typical value - is it what we expect, how does it compare to other groups?
* Will help us understand how confident we are in our point estimate * Provides guidance for what statistical test is most appropriate |
|
Refer to Mossad article on Zinc Lozenges Table 1
How generalizable are the findings for a 70 year old Asian female patient? |
These findings are not generalizable to a 70 year old Asian woman.
|
|
Refer to Weaver article on Sleep Apnea Table 2
What is the degree of clinical severity in the patients in this study? |
Comparing the normal range to the range of scores exhibited by the study participants, there is a large variance in the degree of severity.
|
|
Refer to Valuck article on B12 deficiency Table
How matched are the case and controls? Difference in history of acid suppression therapy usage? |
There is a good match between the percentages of male/female, age, and therapy.
|
|
Refer to Perez article on Antidepressants Table 1
How is the HAMD, a measure of depression severity, distributed in the study population? |
The curve is skewed to the left.
|
|
What are the four main ways to measure risk?
|
* Absolute Risk Reduction (ARR)
* Number Needed to Treat (NNT) * Relative Risk (RR) * Odds Ratio (OR) |
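This set does not state the formulas for these four measures, so the sketch below assumes the usual textbook definitions: ARR = CER - EER, NNT = 1/ARR, RR = EER/CER, and OR = odds of the event in the treated group divided by the odds in the control group. The counts are hypothetical.

```python
# Hypothetical 2x2 table (counts invented for illustration).
treated_events, treated_total = 15, 100
control_events, control_total = 25, 100

eer = treated_events / treated_total   # experimental event rate
cer = control_events / control_total   # control event rate

arr = cer - eer      # absolute risk reduction
nnt = 1 / arr        # number needed to treat
rr = eer / cer       # relative risk

# Odds = events / non-events within each group.
odds_treated = treated_events / (treated_total - treated_events)
odds_control = control_events / (control_total - control_events)
odds_ratio = odds_treated / odds_control

print(arr, nnt, rr, odds_ratio)
```

With these made-up counts the treatment lowers the event rate from 25% to 15%, so roughly 10 patients would need to be treated to prevent one event.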
|
Calculating Absolute Risk Reduction
|
|
|
Calculate Number Needed to Treat
|
|
|
Calculate Relative Risk
|
|
|
Calculate Relative Risk
|
|
|
Calculate Odds Ratio
|
|
|
Bias
|
A systematic difference between the results obtained by a study and the true state of affairs
|
|
Blinding (definition, single, double)
|
When the patients, clinicians, and assessors of response to treatment are unaware of the treatment allocation (double-blind), or when the patient is aware of the treatment received but the assessor of the response is not (single-blind). Also called masking.
|
|
Blocking
|
Also called stratification. Grouping experimental units that share similar characteristics into a homogeneous block or stratum.
|
|
Cohort
|
A group of individuals, all without the outcome of interest (e.g. disease), is followed (usually prospectively) to study the effect on future outcomes of exposure to a risk factor.
|
|
Controls (control group)
|
A term used in comparative studies, e.g. clinical trials, to denote a comparison group. This group of individuals either does not have the disease or is not receiving the treatment.
|
|
Experimental study
|
The investigator intervenes in some way to affect the outcome
|
|
Geometric mean
|
A measure of location for data whose distribution is skewed to the right; it is the antilog of the arithmetic mean of the log data
|
|
Incidence
|
The number of new cases of a disease in a defined period of time divided by the number of individuals susceptible at the start or midpoint of the period
|
|
Inclusion/Exclusion criteria
|
A definition of which patients are to be recruited
|
|
Intention-to-treat
|
All patients in the clinical trial are analysed in the groups to which they were originally assigned
|
|
Interquartile Range (IQR)
|
The difference between the 25th and the 75th percentiles; it contains the central 50% of the ordered observations
|
|
Longitudinal study
|
Follows individuals over a period of time
|
|
Matching
|
A process of creating (usually) pairs of individuals who are similar in respect to variables that may influence the response of interest
|
|
Mean
|
A measure of location obtained by dividing the sum of the observations by the number of observations
|
|
Measures of Risk
|
EER, CER, RR, OR, ARR, RRR, and NNT
|
|
Median
|
A measure of location that is the middle value of the ordered observations
|
|
Mode
|
The value of a single variable that occurs most frequently in a data set
|
|
Nominal
|
A categorical variable whose categories have no natural ordering
|
|
Numerical – Continuous
|
A numerical variable in which there is no limitation on the values that the variable can take other than the limit imposed by the accuracy of the measuring technique
|
|
Numerical – Discrete
|
A numerical variable that can only take integer values
|
|
Observational study
|
The investigator does nothing to affect the outcome
|
|
Ordinal
|
A categorical variable whose categories are ordered in some way
|
|
Prevalence
|
The number or proportion of individuals with a disease at a given point in time (point prevalence) or within a defined interval (period prevalence)
|
|
Primary endpoint
|
The outcome that most accurately reflects the benefit of a new therapy in a clinical trial
|
|
Randomization (random allocation)
|
Patients are allocated in a random manner
|
|
Range
|
The difference between the smallest and largest observations
|
|
Risk Factor (exposure)
|
A determinant that affects the incidence of a particular outcome, e.g. disease
|
|
Secondary endpoint
|
The outcome(s) in a clinical trial that are not of primary importance
|
|
Standard deviation
|
A measure of spread equal to the square root of the variance
|
|
Surrogate endpoint
|
An endpoint measure that is highly correlated with the endpoint of interest but which can be measured more easily, quickly or cheaply than the endpoint
|
|
Variable
|
Any quantity that varies
|
|
Weighted mean
|
A modification of the arithmetic mean, obtained by attaching weights to each value of the variable in the data set
|
|
t-distribution
|
* the parameter that characterizes the t-distribution is the degrees of freedom, so we can draw the probability density function if we know the equation of the t-distribution and its degrees of freedom
* its shape is similar to that of the standard normal distribution, but it is more spread out with longer tails
* its shape approaches normality as the degrees of freedom increase
* it is particularly useful for calculating confidence intervals for, and testing hypotheses about, one or two means |
|
the chi-squared (χ²) distribution
|
* it is a right-skewed distribution taking positive values
* it is characterized by its degrees of freedom
* its shape depends on the degrees of freedom
* it becomes more symmetrical and approaches normality as the degrees of freedom increase
* it is particularly useful for analyzing categorical data |
|
the F-distribution
|
* it is skewed to the right
* it is defined by a ratio; the distribution of the ratio of two estimated variances calculated from normal data approximates the F-distribution
* the two parameters that characterize it are the degrees of freedom of the numerator and of the denominator
* the F-distribution is particularly useful for comparing two variances, and for comparing more than two means using the analysis of variance |
|
the lognormal distribution
|
* it is the probability distribution of a random variable whose log follows the normal distribution
* it is highly skewed to the right
* if taking the log of the raw data (which are skewed to the right) gives an empirical distribution that is nearly normal, the data have a lognormal distribution
* many variables in medicine follow a lognormal distribution
* the properties of the normal distribution can be used to make inferences about these variables after transforming the data by taking logs of the raw data
* if a data set has a lognormal distribution, use the geometric mean as a summary measure of location |
|
binomial distribution
|
* in a given situation there are only two outcomes, success and failure
* two parameters describe the binomial distribution: n, the number of individuals in the sample, and π, the true probability of success for each individual
* its mean is nπ
* its variance is nπ(1-π)
* when n is small, the distribution is skewed to the right if π < 0.5 and to the left if π > 0.5
* the distribution becomes more symmetrical as the sample size increases and approximates the normal distribution if both nπ and n(1-π) are > 5
* use the properties of the binomial distribution when making inferences about proportions
* the normal approximation to the binomial distribution is often used when analyzing proportions |
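The mean and variance stated above can be checked numerically. This is a minimal sketch with hypothetical values of n and π (named `p_success` in the code); it builds the binomial probabilities from the standard formula and compares the resulting mean and variance with nπ and nπ(1-π).

```python
from math import comb

# Hypothetical sample size and success probability.
n = 10
p_success = 0.3

# Probability of exactly k successes among n independent individuals.
pmf = [comb(n, k) * p_success**k * (1 - p_success) ** (n - k)
       for k in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print(mean, n * p_success)
print(variance, n * p_success * (1 - p_success))
```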
|
the Poisson distribution
|
* the Poisson random variable is the count of the number of events that occur independently and randomly in time or space at some average rate, μ
* the parameter that describes the Poisson distribution is the mean, or the average rate
* the mean equals the variance in the Poisson distribution
* it is a right-skewed distribution if the mean is small, but it becomes more symmetrical as the mean increases, when it approximates the normal distribution |
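The claim that the mean equals the variance can be illustrated with a short calculation. The sketch below assumes a hypothetical rate μ = 2.5 and truncates the count at 59, beyond which the probabilities are negligible for a rate this small.

```python
from math import exp, factorial

mu = 2.5   # hypothetical average rate

# P(X = k) for k = 0..59 from the Poisson formula exp(-mu) * mu^k / k!.
pmf = [exp(-mu) * mu**k / factorial(k) for k in range(60)]

# Mean and variance computed directly from the probabilities.
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print(mean, variance)
```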
|
Why apply transformations to our raw data?
|
The observations in our investigation may not comply with the requirements of the intended statistical analysis:
* a variable may not be normally distributed (a distributional requirement for many different analyses)
* the spread of the observations in each of a number of groups may be different (constant variance is an assumption about a parameter in the comparison of means using the unpaired t-test and analysis of variance)
* two variables may not be linearly related (linearity is an assumption of many regression analyses)
It is helpful to transform our raw data to satisfy the assumptions underlying the proposed statistical techniques. |
|
typical transformations
|
* logarithmic transformation, z = log y
* square root transformation, z = √y
* reciprocal transformation, z = 1/y
* square transformation, z = y²
* logit (logistic) transformation, z = ln(p/(1-p)) |
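Each of these transformations is a one-line calculation. The sketch below applies them to made-up values; it assumes natural logs for the logarithmic transformation, positive observations for the log, square root, and reciprocal, and proportions strictly between 0 and 1 for the logit.

```python
import math

# Hypothetical positive observations and proportions.
y_values = [1.0, 4.0, 9.0, 16.0]
proportions = [0.1, 0.5, 0.9]

log_z = [math.log(y) for y in y_values]          # z = log y (natural log here)
sqrt_z = [math.sqrt(y) for y in y_values]        # z = square root of y
reciprocal_z = [1 / y for y in y_values]         # z = 1/y
square_z = [y ** 2 for y in y_values]            # z = y squared
logit_z = [math.log(p / (1 - p)) for p in proportions]  # z = ln(p/(1-p))

print(log_z, sqrt_z, reciprocal_z, square_z, logit_z)
```

Note that the logit maps a proportion of 0.5 to 0 and is symmetric around it, so 0.1 and 0.9 give values of equal size and opposite sign.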
|
Logarithmic transformation, z=log y
|
The effects of Logarithmic transformation are normalizing, linearizing, and variance stabilizing
|
|
Primary study
|
collect de novo (new) data to answer a specific question in a population (Example: single clinical trial)
|
|
Secondary study
|
attempt to combine or “synthesize” results (ie, existing data) from 2 or more primary studies, to generate a global/overall answer to a question (Example: meta-analysis of several clinical trials)
|
|
Probability
|
Measures the chance of a given event occurring. It is a measure of uncertainty. It is a value from 0 to 1: 0 means that the event cannot occur, and 1 means that the event must occur.
|
|
The probability of the complementary event (the event not occurring) is …
|
1 - the probability of the event occurring
|
|
Approaches to calculating probability
|
Subjective, frequentist, and a priori
|
|
Subjective approach to calculating probability
|
our personal degree of belief that the event will occur
|
|
Frequentist
|
the proportion of times the event would occur if we were to repeat the experiment a large number of times (tossing a coin)
|
|
A priori
|
based on a theoretical model called the probability distribution, which describes the probabilities of all possible outcomes of the experiment.
|
|
The rules of probability
|
The addition rule and the multiplication rule
|
|
The addition rule
|
If two events A and B are mutually exclusive then the probability that either one or the other will occur is equal to the sum of their probabilities. Prob (A or B) = Prob (A) + Prob (B)
|
|
The multiplication rule
|
If two events A and B are independent, then the probability that both events occur is equal to the product of their probabilities. Prob (A and B) = Prob (A) × Prob (B)
|
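The addition rule, the multiplication rule, and the complementary-event rule can all be checked by enumerating a small sample space. The sketch below uses two fair dice as a hypothetical example and exact fractions, so the equalities hold without rounding; the helper `prob` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Addition rule: "first die is 1" and "first die is 2" are mutually exclusive.
p_one = prob(lambda o: o[0] == 1)
p_two = prob(lambda o: o[0] == 2)
p_one_or_two = prob(lambda o: o[0] in (1, 2))

# Multiplication rule: the two dice are independent of each other.
p_first_six = prob(lambda o: o[0] == 6)
p_second_six = prob(lambda o: o[1] == 6)
p_both_six = prob(lambda o: o == (6, 6))

# Complementary event: the first die is not a six.
p_not_first_six = prob(lambda o: o[0] != 6)

print(p_one_or_two, p_both_six, p_not_first_six)
```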
|
Random variable
|
a quantity that can take any one of a set of mutually exclusive values with a given probability
|
|
Probability distribution
|
Shows the probabilities of all possible values of the random variable. It is a theoretical distribution that is expressed mathematically, and it has a mean and a variance that are analogous to those of an empirical distribution.
|
|
Normal (Gaussian) distribution
|
One of the most important distributions in statistics. Its probability density function is:
* completely described by two parameters, the mean and the variance
* bell-shaped (unimodal)
* symmetrical about its mean
* shifted to the right if the mean is increased and to the left if it is decreased (assuming constant variance)
* flattened as the variance is increased and more peaked as the variance is decreased (for a fixed mean)
The mean and the median of a Normal distribution are equal.
The probability that a Normally distributed random variable, x, with mean μ and standard deviation SD, lies between:
* (μ - SD) and (μ + SD) is 0.68
* (μ - 1.96SD) and (μ + 1.96SD) is 0.95
* (μ - 2.58SD) and (μ + 2.58SD) is 0.99
These intervals may be used to define reference intervals.
|
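The three interval probabilities quoted for the Normal distribution can be reproduced with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 below are hypothetical; the probabilities do not depend on that choice.

```python
from statistics import NormalDist

# Hypothetical reference population: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

def prob_within(k):
    """Probability of lying within k standard deviations of the mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

p_1sd = prob_within(1)
p_196sd = prob_within(1.96)
p_258sd = prob_within(2.58)

print(round(p_1sd, 2), round(p_196sd, 2), round(p_258sd, 2))
```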
|
The standard normal distribution
|
The standard normal distribution has a mean of 0 and a variance of 1. If the random variable, x, has a normal distribution with mean μ and variance σ², then the standardized normal deviate (SND), z = (x-μ)/σ, is a random variable that has a standard normal distribution
|
|
Z Scores
|
A "Z-score" is a standardized score showing how many standard deviations a subject's score is from the mean.
z = (x-μ)/σ, where x = the raw score, μ = the mean, and σ = the standard deviation |
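A Z-score is a single subtraction and division. The sketch below assumes a hypothetical scale with mean 100 and standard deviation 15; the function name `z_score` is chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical scale: mean 100, standard deviation 15.
above = z_score(130, 100, 15)    # two standard deviations above the mean
at_mean = z_score(100, 100, 15)  # exactly at the mean
below = z_score(85, 100, 15)     # one standard deviation below the mean

print(above, at_mean, below)
```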
|
Perez article objective
|
Depression is a serious health problem that affects over 5% of the population. Antidepressants, SSRIs in particular, are not effective in over a third of the patients with this condition. In addition, traditional antidepressants have a slow onset of action. This study analyzed the effect of adding pindolol, a serotonin receptor and beta-adrenoceptor antagonist, to a fluoxetine antidepressant treatment.
|
|
Perez article study type
|
Experimental study
Randomized, double-blind, clinical trial |
|
Perez article outcome/findings
|
More patients responded favorably to treatment with pindolol and fluoxetine than to treatment with placebo and fluoxetine. However, there was no reduction in the onset of action.
|