97 Cards in this Set
- Front
- Back
- 3rd side (hint)
Name four measures of central tendency.
|
Mean, median, mode, geometric mean.
|
|
|
Dependent and independent variables are associated with which of either outcome or predictor?
|
Dependent variables represent the outcome variable; independent variables represent the predictor. For example, the dependent variable is cholesterol level and the independent variable is the drug.
|
|
|
A variable with no inherent order; examples are race and gender.
|
Nominal variables.
|
|
|
Variables with a natural order but not evenly spaced; examples are stages of cancer and Likert scales.
|
Ordinal variables.
|
|
|
Variables which are evenly spaced but have no absolute zero. Examples are the temperature scales in Fahrenheit or Celsius.
|
Interval variables.
|
|
|
Variables in which a true zero point represents total absence of the variable; examples are age, annual income, and temperature in Kelvin.
|
Ratio variables.
|
|
|
Name two types of categorical or discrete variables.
|
Nominal and ordinal.
|
|
|
Name two types of continuous variables.
|
Interval and ratio
|
|
|
A frequency distribution, bar graph and pie chart are ways to display which of either categorical or continuous variables?
|
Frequency distributions, bar graphs, and pie charts are ways to display categorical variables.
|
|
|
What are the two types of measures that are used on continuous variables?
|
Measures of central tendency and measures of spread/variation/dispersion.
|
|
|
The arithmetic mean is commonly known as an average. What is the measure of a mean sensitive to?
|
The arithmetic mean is sensitive to extreme values.
|
|
|
The median is the middle value of an odd number of sorted data. In the following even number of data what is the median? 4 4 6 9 12 16
|
Take the mean of the two middle numbers. (6+9)/2 = 7.5.
|
|
|
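A minimal sketch of the mean, median, and mode cards above, using Python's standard `statistics` module on the six values from the median card. The second list, with one extreme value swapped in, is a hypothetical addition to show why the mean is called sensitive and the median is not.

```python
import statistics

data = [4, 4, 6, 9, 12, 16]  # the values from the median card

mean = statistics.mean(data)      # arithmetic mean: sum / count
median = statistics.median(data)  # even count: mean of the two middle values
mode = statistics.mode(data)      # most frequently occurring value

# Hypothetical variation: replace 16 with an extreme value of 160.
with_outlier = [4, 4, 6, 9, 12, 160]
mean_outlier = statistics.mean(with_outlier)      # pulled far upward
median_outlier = statistics.median(with_outlier)  # unchanged

print(mean, median, mode)            # 8.5 7.5 4
print(mean_outlier, median_outlier)  # 32.5 7.5
```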
The median is not sensitive to what?
|
Extreme values.
|
|
|
The mode is the most commonly occurring value and is more resistant to:
|
Extreme values.
|
|
|
A bimodal distribution has how many modes?
|
Two.
|
|
|
If you take the log of each value, find the mean of all those log values, and then take the antilog of that mean, you have calculated what?
|
The geometric mean. It is used to describe a skewed distribution and can only be used for positive values.
|
|
|
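The log-based recipe for the geometric mean can be sketched as below. The three values are hypothetical, chosen so the result is easy to check by hand; natural logs are assumed, though any base works as long as the antilog uses the same base.

```python
import math

values = [1, 10, 100]  # hypothetical positive, right-skewed values

# Step 1: take the log of each value and average the logs.
log_mean = sum(math.log(v) for v in values) / len(values)

# Step 2: take the antilog of that average to get back to the original scale.
geometric_mean = math.exp(log_mean)

arithmetic_mean = sum(values) / len(values)

print(round(geometric_mean, 6))  # 10.0
print(arithmetic_mean)           # 37.0
```

The geometric mean (10) sits much closer to the bulk of the data than the arithmetic mean (37), which is why it is preferred for skewed distributions.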
In a right or positive skewed distribution the tail is on the right or left? And the mean is greater or less than the median?
|
The tail is on the right and the mean is greater than the median. The opposite is true for left or negative skew.
|
|
|
Name the five measures of spread/variation/dispersion.
|
Range, interquartile range, variance, standard deviation, coefficient of variation.
|
|
|
Of the five measures of dispersion, which one is the difference between the minimum and maximum values and provides no information about scatter within the series?
|
Range.
|
|
|
Of the five measures of dispersion, which one examines the difference between individual data points and the mean?
|
Variance.
|
|
|
Of the five measures of dispersion, which one is found by first finding the median, then finding the medians of the lower and upper halves (the first and third quartiles), and taking the difference between them?
|
Interquartile range.
|
|
|
Of the five measures of dispersion, which one is the most useful and is the square root of the variance?
|
Standard deviation.
|
|
|
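A short sketch of the range, interquartile range, variance, and standard deviation cards above. The data set is hypothetical, and the quartiles are taken as the medians of the lower and upper halves, which is only one of several quartile conventions in use.

```python
import statistics

data = sorted([4, 4, 6, 9, 12, 16])  # hypothetical sample

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Interquartile range: median of the upper half minus median of the lower half.
half = len(data) // 2
q1 = statistics.median(data[:half])
q3 = statistics.median(data[-half:])
iqr = q3 - q1

# Sample variance (divides by n - 1) and its square root, the standard deviation.
variance = statistics.variance(data)
sd = statistics.stdev(data)

print(data_range, iqr, round(variance, 2), round(sd, 2))  # 12 8 23.1 4.81
```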
Of the five measures of dispersion, which one is the ratio of the SD to the mean, times 100?
|
Coefficient of Variation. It is used to compare the spread in two sets of data, for instance weight and height. If 50 people have a mean weight of 150 lbs and an SD of 28 lbs, and the same people have a mean height of 66 in and an SD of 6 in, then 28/150 x 100 = 18.7% is the CV for weight and 6/66 x 100 = 9.1% is the CV for height. This tells you that the spread of weight is greater than the spread of height among these people.
|
|
|
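The weight and height example from the card above can be checked with a few lines of code. The function name `coefficient_of_variation` is just an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Return the SD as a percentage of the mean."""
    return sd / mean * 100

cv_weight = coefficient_of_variation(28, 150)  # weight: SD 28 lbs, mean 150 lbs
cv_height = coefficient_of_variation(6, 66)    # height: SD 6 in, mean 66 in

print(round(cv_weight, 1), round(cv_height, 1))  # 18.7 9.1
```

Because the CV is unitless, it allows the spread of pounds and inches to be compared directly.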
In a histogram, what is represented on the Y axis?
|
Frequencies. The data bins are on the X axis.
|
|
|
On what diagram can you represent the mean, median, range, standard deviation, and interquartile range?
|
Box plot
|
|
|
The extremes of the box represent:
|
Interquartile range
|
|
|
The line through the box represents:
|
Median
|
|
|
The plus symbol in the box represents:
|
Mean
|
|
|
The whiskers can variously represent:
|
The range of the data, the standard deviation, or the 9th and 91st percentiles.
|
|
|
What is the symbol for the population number, mean, and standard deviation?
|
N, mu, sigma
|
|
|
What is the symbol for the sample number, mean, standard deviation?
|
n, x-bar, s or SD
|
|
|
What do the following features describe: 1) every unit in the population had the same probability of being selected 2) chance alone determines whether a particular unit in the population is selected for the sample 3) allows estimation or inference of the nature of the population without having to observe the entire population.
|
Random sample.
|
|
|
Name four types of random samples
|
Simple, stratified, cluster, systematic. SSCS
|
|
|
Describe a simple random sampling
|
Each unit n in the population N has an equal chance of being selected.
|
|
|
Describe a stratified random sampling
|
Population is divided into groups (by gender, location, age) and a random sample is taken from each group.
|
|
|
Describe a cluster random sampling
|
A random sampling of groups is made and everyone in each selected group is in the sample. For instance, random schools are chosen in a state and all the children in each chosen school are sampled.
|
|
|
Describe systematic random sampling
|
A method such as choosing every 10th member, or a given day of the week, is used to select from the population.
|
|
|
What is the common name for the bell-shaped (Gaussian) probability distribution?
|
Normal distribution.
|
|
|
In a standard normal distribution what is the value of mu (population mean) and sigma (population standard deviation)?
|
mu = 0, and sigma = 1
|
|
|
The total area under the curve of a normal distribution equals what value?
|
1
|
|
|
What three measures equal each other in a normal distribution
|
Mean = median = mode
|
|
|
What value has changed when the normal distribution shifts either left or right?
|
A change in the mean will shift the normal distribution either left or right.
|
|
|
What values change when the normal distribution becomes flatter or more peaked?
|
A change in the standard deviation
|
|
|
What does the area under the curve represent?
|
Probability
|
|
|
Two standard deviations on either side of the mean represent what percentage of the distribution?
|
95.45% (0.9545)
|
|
|
Exactly 95% of the distribution falls within plus or minus ____ standard deviations of the mean?
|
1.96
|
|
|
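The 95.45% and 1.96 figures in the two cards above can be reproduced from the standard normal distribution. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `central_area` is made up for illustration.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def central_area(k):
    """Area under the curve within plus or minus k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_2_sd = central_area(2)      # about 0.9545
within_1_96_sd = central_area(1.96)  # about 0.95

# Going the other way: the cut point that leaves 2.5% in the upper tail.
cut_point = z.inv_cdf(0.975)       # about 1.96

print(round(within_2_sd, 4), round(within_1_96_sd, 4), round(cut_point, 2))
```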
A ___ table gives the probability of a random outcome variable (or one random sample value) being greater or less than a chosen number or cut point.
|
Z. The probability of getting the value, Z, is the area in one or both tails and is derived from the body of the Z-table (standard normal distribution)
|
|
|
Describe how a Z-score is used.
|
A z-score reflects how many standard deviations above or below the population mean a raw score is.
For instance, a group of patients has a mean blood pressure of mu = 120 and a standard deviation of sigma = 10. What is the probability of a randomly selected patient having a BP greater than x = 135? You must convert this normal distribution to the standard normal distribution and obtain a z-score using z = (x - mu)/sigma. In this case the z-score is (135 - 120)/10 = 1.5, which indicates that the value 135 is 1.5 standard deviations above the mean. From the Z table, the probability of getting this value or greater is 0.0668.
|
|
|
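The blood pressure example above can be worked in code instead of looking up a printed Z table. This sketch assumes `statistics.NormalDist` (Python 3.8 or later) as a stand-in for the table.

```python
from statistics import NormalDist

mu, sigma, x = 120, 10, 135  # values from the blood pressure example

# Convert the raw score to a z-score.
z = (x - mu) / sigma

# Upper-tail area: probability of a value this large or larger.
p_greater = 1 - NormalDist().cdf(z)

print(z, round(p_greater, 4))  # 1.5 0.0668
```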
What does the following statement define: means of multiple samples from a population tend to have a normal distribution regardless of how the data are distributed. This tendency increases as the number of observations in the samples increases.
|
The central limit theorem
|
|
|
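A small simulation can illustrate the central limit theorem. This is only a sketch under stated assumptions: the population is a right-skewed exponential distribution with mean 1 and SD 1, the sample size of 50 and the 2000 repetitions are arbitrary choices, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1, SD 1)."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to SD / sqrt(n)
expected_spread = 1 / n ** 0.5

print(round(center, 3), round(spread, 3), round(expected_spread, 3))
```

Even though the individual values are strongly skewed, the sample means cluster around the population mean with a spread close to SD / sqrt(n).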
What is the difference between standard deviation and standard error of the mean
|
Use standard deviation when you're interested in showing the scatter of the data. Use the standard error of the mean when you want to show how well you know the mean (precision of the mean)
|
|
|
What is the formula for standard error of the mean
|
SEM = SD / sqrt(n)
|
|
|
What is the formula for calculating the 95% confidence interval
|
Xbar +/- 1.96 x (SD / sqrt(n))
|
|
|
Often one wants to increase the precision of an estimate by narrowing the confidence interval. What three things can you change in the formula to increase precision (narrow the CI)?
|
Decrease the confidence level (that is, increase alpha, for example from a 95% to a 90% CI), decrease the standard deviation, or increase the n.
|
|
|
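The SEM and 95% confidence interval formulas from the cards above can be sketched as follows. The mean of 120 and SD of 10 reuse the blood pressure numbers from earlier in the set; the sample sizes of 25 and 100 are hypothetical.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def ci95(xbar, sd, n):
    """95% confidence interval: xbar +/- 1.96 * SEM."""
    margin = 1.96 * sem(sd, n)
    return xbar - margin, xbar + margin

low_25, high_25 = ci95(120, 10, 25)     # hypothetical n = 25
low_100, high_100 = ci95(120, 10, 100)  # hypothetical n = 100

print(round(low_25, 2), round(high_25, 2))    # 116.08 123.92
print(round(low_100, 2), round(high_100, 2))  # 118.04 121.96
```

Quadrupling n halves the SEM, so the interval for n = 100 is half as wide as the interval for n = 25.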
The different types of probability distributions are divided into two groups: probability distributions for discrete data and probability distributions for continuous data. What are two common types of probability distributions for discrete data and five common types for continuous data?
|
Discrete: binomial, Poisson. Continuous: normal, standard normal (Z dist.), Student's T (T dist.), F distribution, and Chi^2.
|
|
|
A classic example for a binomial distribution is a coin flip. Each trial must result in:
|
one of two possible mutually exclusive outcomes: success or failure. Furthermore, the events must be independent of each other.
|
|
|
A Poisson distribution is for a. continuous or b. discrete (categorical) variables?
|
b. Poisson is for discrete variables. With Poisson, think counts, think rare discrete events, think mean = variance.
|
|
|
What is the shape of the T distribution?
|
It is like the normal distribution, only flatter, with heavier tails.
|
|
|
What happens to the shape of the T distribution as the degrees of freedom increase?
|
The distribution approaches the normal (Z) distribution. The curve becomes less spread out and more peaked to match the normal distribution.
|
|
|
In hypothesis testing the groups that can be compared are two different means or two different:
|
You typically test sameness or difference of means or proportions: mu1 = mu2 or p1 = p2.
|
|
|
The null hypothesis states that there is no difference between the two groups compared. If we fail to reject the null hypothesis (often loosely called accepting it), then any difference between the groups is attributed to:
|
Chance alone
|
|
|
The alternative hypothesis states that there is a difference between the two groups compared and that factors other than chance alone account for the:
|
Difference
|
|
|
If the p-value is greater than 0.05 then you fail to reject the null hypothesis and conclude that there is:
|
No statistically significant difference between the groups
|
|
|
If the p-value is less than 0.05 then you reject the null hypothesis and conclude that the difference is:
|
Due to some other factor and not from chance alone
|
|
|
An alpha error is associated with which type of error
|
Type I error
|
|
|
A beta error is associated with which type of error
|
Type II.
|
|
|
If you fail to reject the null, or your p-value is greater than 0.05 and you stated that there was no difference when there actually was a difference between the groups which type of error had been made?
|
This is a beta or type II error and is usually associated with a low-power study, for example when your n is too small (sts).
|
|
|
If you reject the null and your p-value is less than 0.05 and you state that there is a difference when there truly is no difference between the groups which type of error had been made?
|
This is the type I or alpha error
|
|
|
Power is equal to one minus beta. Beta is the probability of failing to reject a false null hypothesis (an incorrect decision). One minus beta is the probability of rejecting a false null hypothesis, that is, accepting a true alternative hypothesis (a correct decision).
|
Power is the ability of the study to detect the true difference of specified size that actually exists
|
|
|
In a good study your alpha and beta are as small as possible.
|
A good alpha value is 0.05; a good beta value is 0.20, so power is equal to 0.80.
|
|
|
The chi-squared distribution is skewed to the right and approaches a symmetric distribution as the degrees of freedom increase.
|
The shape of the F distribution for ANOVA depends on the degrees of freedom and the number of groups being compared.
|
|
|
When using confidence intervals for inference, remember that if the 95% confidence interval does not include the null value (0 for a difference, 1 for a ratio), then reject the null hypothesis.
|
If the 95% confidence interval does include the null value (0 for a difference, 1 for a ratio), then you cannot reject the null hypothesis.
|
|
|
What is the difference between parametric and non-parametric tests
|
Parametric statistics assume that data have come from a known type of probability distribution and make inferences about the parameters of the population distribution. Non-parametric statistics do not depend on a known distribution or known parameter (such as mean or SD). These tests are generally more complicated but can minimize the effect of outliers and of assumptions, such as the central limit theorem, that may not apply to your data.
|
|
|
HYPOTHESIS TESTING. 3 quick associations:
|
1) Continuous data ~ Means and SDs ~ T-tests, ANOVA, Pearson's correlation, linear regression.
2) Categorical data ~ ratios and proportions ~ Chi^2, Fisher's exact, logistic regression.
3) Non-Parametric tests (do not follow the normal or other probability distribution; "distribution free"; based on ranks) ~ Wilcoxon Rank Sum, Wilcoxon Signed Rank, Kruskal-Wallis, Spearman's Correlation.
|
Other associations:
Two Sample t-test ~ Wilcoxon Rank Sum
Paired t-test ~ Wilcoxon Signed Rank (WSR, "Pair of Wizards")
ANOVA ~ Kruskal-Wallis
|
|
When choosing a hypothesis test statistic, what are 4 factors to consider? (Look for these features in the board question stems.)
|
TYPE OF DATA VARIABLE: 1. Continuous 2. Categorical 3. Survival.
PURPOSE OF THE TEST: 1. Comparing means 2. Comparing proportions 3. Comparing survival curves.
DISTRIBUTION OF THE OUTCOME: 1. Normal or key distribution (parametric method) 2. Non-normal distribution (nonparametric method).
INDEPENDENT OR DEPENDENT OUTCOME: 1. Independent outcome: each patient contributes an outcome or is tested only once. 2. Dependent outcome: each patient contributes more than one outcome or is tested more than once.
|
|
|
TEST OF HYPOTHESIS:
CONTINUOUS DATA: T-DIST: (Parametric method) ONE-SAMPLE T-TEST.
|
PURPOSE: Compare the sample mean to a known population mean.
DISTRIBUTION: T-distribution, which approaches the normal distribution when the sample size is >30.
NULL HYPOTHESIS: Sample mean = population mean.
INTERPRETATION: When the calculated T-value is greater than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom), then the value is out in the tail and you reject the null and say that there is a difference.
|
|
|
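A minimal sketch of the one-sample t-test decision described above. The ten sample values and the population mean of 5.0 are hypothetical. The critical value 2.262 is the standard two-tailed table value for alpha = 0.05 with 9 degrees of freedom, hard-coded here because the Python standard library has no t-distribution.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 5.9, 5.2]  # hypothetical data
pop_mean = 5.0  # hypothetical known population mean

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample SD (divides by n - 1)

# t = (sample mean - population mean) / standard error of the mean
t_value = (xbar - pop_mean) / (s / math.sqrt(n))

T_CRITICAL = 2.262  # table value: two-tailed, alpha = 0.05, df = n - 1 = 9
reject_null = abs(t_value) > T_CRITICAL

print(round(t_value, 2), reject_null)  # 4.33 True
```

The calculated value is out in the tail beyond the table value, so in this made-up example the null hypothesis would be rejected.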
TEST OF HYPOTHESIS:
CONTINUOUS DATA: T-DIST: (Parametric method) TWO-SAMPLE T-TEST.
|
PURPOSE: Compare the sample means of two different populations for sameness or difference.
DISTRIBUTION: T-distribution, which approaches the normal distribution when the sample size is >30.
NULL HYPOTHESIS: Group 1 mean = group 2 mean.
INTERPRETATION: When the calculated T-value is greater than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom), then the value is out in the tail and you reject the null and say that there is a difference.
|
|
|
TEST OF HYPOTHESIS:
CONTINUOUS DATA: T-DIST: (Parametric method) PAIRED T-TEST.
|
PURPOSE: Compare the mean of one population measured before and after exposure to a variable (example: drug treatment).
DISTRIBUTION: T-distribution, which approaches the normal distribution when the sample size is >30.
NULL HYPOTHESIS: Pop mean before = pop mean after.
INTERPRETATION: When the calculated T-value is less than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom), then the value is less than your critical value and under the body of the curve, not out in the tail. So, you would fail to reject the null and say that there is no difference.
|
|
|
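The paired t-test can be sketched as a one-sample test on the before-minus-after differences. The eight before/after readings are hypothetical, and the critical value 2.365 is the standard two-tailed table value for alpha = 0.05 with 7 degrees of freedom.

```python
import math
import statistics

before = [140, 152, 138, 145, 160, 155, 148, 150]  # hypothetical readings
after = [135, 150, 136, 140, 158, 150, 147, 146]   # same subjects after treatment

# Work with the within-subject differences; the null says their mean is 0.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

t_value = mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL = 2.365  # table value: two-tailed, alpha = 0.05, df = n - 1 = 7
reject_null = abs(t_value) > T_CRITICAL

print(mean_diff, round(t_value, 2), reject_null)  # 3.25 5.51 True
```

Here the calculated value exceeds the table value, so the null is rejected; had it been smaller, the card's "fail to reject" interpretation would apply.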
TEST OF HYPOTHESIS:
CONTINUOUS DATA: F-DIST: (Parametric method) ANOVA. - Comparison of three or more groups (like the two-sample T-test except with more groups)
|
PURPOSE: Compare means of three or more populations.
DISTRIBUTION: F-distribution.
NULL HYPOTHESIS: All means equal each other. The alternative hypothesis is that at least one mean is different from another.
INTERPRETATION: Determines if there is at least one difference between means. Follow up with post hoc tests pairing population means to discover which means differ. Use the F-statistic to make the decision.
|
|
|
How is ANCOVA different from ANOVA?
|
Its purpose is the same as ANOVA except that it can adjust for covariates like age, smoking, race, etc.
|
|
|
TEST OF CORRELATION:
CONTINUOUS DATA: NORMAL DIST: (Parametric method) PEARSON'S CORRELATION
|
PURPOSE: Quantify the association between two different continuous variables.
DATA: Used when both X and Y variables have normally distributed histograms.
DISTRIBUTION: Normal distribution.
INTERPRETATION: The goal of correlation analysis is to understand the nature (positive or negative) and the strength (high r value or low r value) of the association. Two variables, X and Y, are plotted against each other. Correlation is denoted as r. The range of r is -1 to 1.
0.75 to 1 is a high positive correlation. 0.50 to 0.75 is a moderately high positive correlation. 0.25 to 0.50 is a moderately low positive correlation. 0 to 0.25 is a low positive correlation.
Correlation does not equal causation. Example: drowning deaths are most frequent during the months when ice cream sales are at their highest. That does not mean that eating ice cream makes you more likely to get cramps and drown while swimming, or that when people drown the mourners console themselves by eating ice cream.
|
|
|
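Pearson's r can be computed directly from its definition, as in this sketch. The function name `pearson_r` and the five data points are hypothetical and only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of X and Y scaled by their own variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

A value of about 0.77 falls in the "high positive correlation" band from the card, and it says nothing about whether X causes Y.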
NON-PARAMETRIC TEST PROPERTIES
|
DATA: Continuous data.
DISTRIBUTION: No distribution assumption.
USEFUL FOR: Small sample sizes, data that are not normally distributed, ordinal data.
NOTE: Nonparametric tests are less powerful than parametric tests.
|
|
|
TEST OF HYPOTHESIS:
CONTINUOUS DATA: NON-NORMAL DIST: (Non-Parametric method) WILCOXON RANK SUM TEST
|
PURPOSE: Rank all the outcomes in order from lowest to highest. Then compare the sum of the ranks in group 1 vs the sum of the ranks in group 2.
COMPARED TO: Two-sample T-test.
DISTRIBUTION: Non-normally distributed.
HYPOTHESIS: Determine if one group is greater than another (one-tailed) or if the groups are different (two-tailed).
INTERPRETATION: If the two rank sums differ by more than would be expected by chance, the p-value will be low and you can reject the null if it is less than 0.05.
|
|
|
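The ranking step of the Wilcoxon rank sum test can be sketched as below. The two groups are hypothetical and deliberately contain no tied values, so each value gets a unique rank; turning the rank sums into a p-value needs a table or a statistics library and is not shown.

```python
group1 = [12, 15, 11, 18]  # hypothetical outcomes, group 1
group2 = [22, 25, 17, 30]  # hypothetical outcomes, group 2

# Rank all outcomes together from lowest (rank 1) to highest.
# This simple lookup assumes there are no tied values.
combined = sorted(group1 + group2)
rank_of = {value: position + 1 for position, value in enumerate(combined)}

rank_sum_1 = sum(rank_of[v] for v in group1)
rank_sum_2 = sum(rank_of[v] for v in group2)

print(rank_sum_1, rank_sum_2)  # 11 25
```

The two sums always add up to n(n + 1)/2 for n total observations (here 36), which is a quick check on the ranking.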
TEST OF HYPOTHESIS:
CONTINUOUS DATA: NON-NORMAL DIST: (Non-Parametric method) WILCOXON SIGNED RANK TEST
|
PURPOSE: Compare one population before and after an intervention. Sum the ranks of all positive differences and of all negative differences, and compare the positive sum to the negative sum.
COMPARED TO: Paired T-test (Mnemonic: Pair of McNemar Wizards WSR).
DISTRIBUTION: Non-normally distributed.
HYPOTHESIS: Null hypothesis is that there is no difference between measured outcomes before and after intervention.
INTERPRETATION: If the positive sum is very different from the negative sum, then the p-value will be low and you can reject the null if it is less than 0.05.
|
|
|
TEST OF HYPOTHESIS:
CONTINUOUS DATA: NON-NORMAL DIST: (Non-Parametric method) SPEARMAN'S CORRELATION
|
PURPOSE: Quantify the association between two different continuous variables when X and Y are not normally distributed variables.
DISTRIBUTION: Non-normal distribution (non-parametric).
INTERPRETATION: The goal of correlation analysis is to understand the nature (positive or negative) and the strength (high r value or low r value) of the association. Two variables, X and Y, are plotted against each other. Correlation is denoted as r. The range of r is -1 to 1.
0.75 to 1 is a high positive correlation. 0.50 to 0.75 is a moderately high positive correlation. 0.25 to 0.50 is a moderately low positive correlation. 0 to 0.25 is a low positive correlation.
Correlation does not equal causation.
|
|
|
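Spearman's correlation can be sketched as Pearson's correlation applied to ranks rather than raw values. The helper names (`ranks`, `pearson_r`, `spearman_r`) and the data are hypothetical; tied values receive their average rank.

```python
import math

def ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25000]  # hypothetical: always increasing, with one extreme value

print(round(spearman_r(x, y), 4), round(pearson_r(x, y), 4))
```

Because Y always rises with X, the rank-based Spearman value is 1, while the extreme value drags the Pearson value well below 1.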
CONTINUOUS DATA:
LINEAR REGRESSION
|
PURPOSE: Predict outcome, Y (dependent var), from predictor, X (independent var).
TYPES: Simple linear: predicts the outcome from one independent var. Multiple linear: predicts the outcome from multiple independent vars.
|
|
|
What is the linear regression equation?
|
Y = B0 + B1(X1) (the 1 in X1 is a subscript)
B0 = y-intercept
B1 = slope
X1 = independent var (covariate)
Y = dependent var
This is the same line as Y = mX + B, where m = slope, X = independent var (covariate), and B = y-intercept.
In multiple linear regression you add additional terms: + B2(X2) + B3(X3)
|
|
|
The errors or residuals of a fitted line are:
|
the vertical distances between the plotted points and the fitted line
|
|
|
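The regression equation and residuals from the cards above can be sketched with ordinary least squares for a single predictor. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of B0 (y-intercept) and B1 (slope) for Y = B0 + B1 * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcomes

b0, b1 = fit_line(x, y)

# Residual: observed Y minus the Y predicted by the fitted line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(b0, 2), round(b1, 2))        # 2.2 0.6
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least-squares line with an intercept, the residuals sum to zero, which is a handy sanity check.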
TEST OF HYPOTHESIS:
CATEGORICAL DATA: (dichotomous) DISTRIBUTION: CHI2 (Non-Parametric method) CHI SQUARED TEST
|
PURPOSE: Comparison of proportions of two or more samples.
NULL HYPOTHESIS: Proportion 1 is equal to proportion 2.
INTERPRETATION: Calculate the chi squared test value. Compare it to the chi squared critical value for p of 0.05. If it is larger than the critical value (in the tail), then reject the null and conclude there is a difference between proportions. If it is smaller than the critical value, then fail to reject the null and conclude there is no difference between proportions.
|
|
|
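A sketch of the chi squared calculation for a 2 x 2 table of counts. The counts are hypothetical, no continuity correction is applied, and 3.841 is the standard table critical value for p = 0.05 with 1 degree of freedom.

```python
def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are treatments, columns are success / failure.
table = [[30, 20],
         [15, 35]]

stat = chi_squared(table)
CRITICAL_VALUE = 3.841  # table value for p = 0.05, df = 1
reject_null = stat > CRITICAL_VALUE

print(round(stat, 2), reject_null)  # 9.09 True
```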
TEST OF HYPOTHESIS:
CATEGORICAL DATA: (dichotomous) DISTRIBUTION: CHI2 (Non-Parametric method) McNEMAR'S TEST (a chi squared test for matched pairs)
|
PURPOSE: It is applied to 2 x 2 contingency tables with a dichotomous trait, with MATCHED PAIRS of subjects, to determine if the proportion of success from treatment 1 is different from the proportion of success from treatment 2. It is unique in that it is used for paired subjects (in this way it is similar to the paired T-test and Wilcoxon Signed Rank test): "Pair of McNemar Wizards".
NULL HYPOTHESIS: Proportion of success from treatment 1 is equal to proportion of success from treatment 2.
INTERPRETATION: Calculate McNemar's chi squared test value. Compare it to the critical value for p of 0.05. If it is larger than the critical value (in the tail), then reject the null and conclude that there is a difference between the proportions. Key is that it can be used with paired subjects, such as before and after treatment.
|
|
|
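McNemar's statistic uses only the discordant pairs, that is, the pairs whose outcome differs between the two treatments. This sketch uses the uncorrected form (b - c)^2 / (b + c); the counts are hypothetical, and 3.841 is the standard chi squared critical value for p = 0.05 with 1 degree of freedom.

```python
def mcnemar_statistic(b, c):
    """Uncorrected McNemar chi squared from the two discordant-pair counts."""
    return (b - c) ** 2 / (b + c)

# Hypothetical matched pairs:
# b = success on treatment 1 only, c = success on treatment 2 only.
b, c = 15, 5

stat = mcnemar_statistic(b, c)
CRITICAL_VALUE = 3.841  # table value for p = 0.05, df = 1
reject_null = stat > CRITICAL_VALUE

print(stat, reject_null)  # 5.0 True
```

Pairs that succeed on both treatments or fail on both do not enter the calculation, which is what distinguishes this test from the ordinary chi squared test.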
Which is used for continuous and which is used for categorical data: T-Test, WSR, Chi squared, ANOVA, Logistic regression, McNemar's?
|
Continuous data: T-Test, WSR, ANOVA
Categorical data: Chi squared, Logistic regression, McNemar's |
|
|
CATEGORICAL DATA: (dichotomous or discrete)
LOGISTIC REGRESSION
|
PURPOSE: Predict odds of success from predictors, X, the independent variables.
MNEMONIC: Y var, Outcome, Dependent - Predictor, Independent, X var (YODa PIX).
Key is to associate Odds ratios with lOgistic regression.
|
|
|
SURVIVAL ANALYSIS
|
PURPOSE: Predict survival time or time to disease during a specific time period (for example, up to 5 years) from independent variables.
OUTCOME: Survival time or time to develop disease.
FEATURES: Each subject has a known entry point, but this point can vary between subjects. Each subject has a well-defined end-point (death, acquired disease, lived beyond 10 y). Expressed as survival time after some event (infection, surgery, treatment). Units are person-years of observation before the end-point. The data are neither continuous nor categorical data suitable for standard statistical tests. Other tests are also not applicable because subjects are censored, lost to follow-up, or drop out.
|
|
|
Kaplan-Meier Life table
|
Be familiar with interpreting the graph. Hashes equal a censored subject; drop-offs equal a death. There are many tests to compare two or more survival curves; test names have Mantel, Cox, Wilcoxon, Gehan, and Peto in them.
|
|
|
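To make the drop-offs and censoring marks on a Kaplan-Meier graph concrete, here is a minimal sketch of how the curve is built. The function name `kaplan_meier` and the five subjects are hypothetical; a subject censored at a given time is counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times  : follow-up time for each subject
    events : 1 if the subject died at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            # Each death causes a drop-off; censored subjects only shrink the risk set.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

times = [1, 2, 3, 4, 5]   # hypothetical follow-up times (years)
events = [1, 0, 1, 0, 1]  # 1 = death, 0 = censored

curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (3, 0.533), (5, 0.0)]
```

The curve only steps down at years 1, 3, and 5 (the deaths); the censored subjects at years 2 and 4 would appear as hash marks on the flat segments.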
Cox Proportional Hazard Regression
|
Combines survival curve methods with logistic regression.
Can adjust for other factors. |
|
|
Log Rank Test
|
Example interpretation:
Breast cancer survival rate is significantly different between subjects with and without lymph node metastasis.
|
|
|
Meta Analyses
|
Forest Plots (plot with horizontal confidence interval bars). A small CI suggests a large number of subjects.
Funnel Plots (if pyramid-shaped, then no publication bias). Axes are std error and log odds ratio.
|
|