97 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
Name four measures of central tendency.
Mean, median, mode, geometric mean.
Dependent and independent variables are associated with which of either outcome or predictor?
Dependent variables represent the outcome; independent variables represent the predictor. For example, the dependent variable is the cholesterol level and the independent variable is the drug.
A variable with no inherent order. Examples are race and gender.
Nominal variables.
Variables with a natural order that are not evenly spaced. Examples are stages of cancer and Likert scales.
Ordinal variables.
Variables that are evenly spaced but have no absolute zero. Examples are the temperature scales in Fahrenheit or Celsius.
Interval variables.
Variables in which a true zero point represents total absence of the variable. Examples are age, annual income, and temperature in Kelvin.
Ratio variables.
Name two types of categorical or discrete variables.
Nominal and ordinal.
Name two types of continuous variables.
Interval and ratio
A frequency distribution, bar graph and pie chart are ways to display which of either categorical or continuous variables?
Frequency distributions, bar graphs, and pie charts are ways to display categorical variables.
What are the two types of measures that are used on continuous variables?
Measures of central tendency and measures of spread/variation/dispersion.
The arithmetic mean is commonly known as an average. What is the measure of a mean sensitive to?
The arithmetic mean is sensitive to extreme values.
The median is the middle value of an odd number of sorted data. In the following even number of data what is the median? 4 4 6 9 12 16
Take the mean of the two middle numbers. (6+9)/2 = 7.5.
The median is not sensitive to what?
Extreme values.
The mode is the most commonly occurring value and is more resistant to:
Extreme values.
A bimodal distribution has how many modes?
Two.
If you take the log of each value, find the mean of all those log values, and then take the antilog (exponentiate) of that mean, you have calculated what?
The geometric mean. It is used to describe a skewed distribution and can only be used for positive values.
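A minimal sketch of the log-based calculation, using only the Python standard library. The data values are hypothetical and chosen so the result is easy to check by hand; note that the mean of the logs must be exponentiated to get back to the original scale.

```python
import math

def geometric_mean(values):
    """Exponentiate the mean of the natural logs; defined only for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is defined only for positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

data = [1, 10, 100]                     # hypothetical right-skewed data
geo = geometric_mean(data)              # about 10
arith = sum(data) / len(data)           # 37.0, pulled up by the extreme value
print(geo, arith)
```

The arithmetic mean is pulled toward the extreme value, while the geometric mean stays near the middle of the skewed data.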
In a right or positive skewed distribution the tail is on the right or left? And the mean is greater or less than the median?
The tail is on the right and the mean is greater than the median. The opposite is true for left or negative skew.
Name the five measures of spread/variation/dispersion.
Range, interquartile range, variance, standard deviation, coefficient of variation.
Of the five measures of dispersion, which one is the difference between the minimum and maximum values and provides no information about scatter within the series?
Range.
Of the five measures of dispersion, which one examines the difference between individual data points and the mean?
Variance.
Of the five measures of dispersion, which one is found by first finding the median, then finding the medians of the lower and upper halves (the first and third quartiles), and taking the difference between them?
Interquartile range.
Of the five measures of dispersion, which one is the most useful and is the square root of the variance?
Standard deviation.
Of the five measures of dispersion, which one is the ratio of the SD to the mean times 100?
Coefficient of variation. It is used to compare the spread in two sets of data, for instance weight and height. If 50 people have a mean weight of 150 lb with an SD of 28 lb, and the same people have a mean height of 66 in with an SD of 6 in, then the CV for weight is 28/150 x 100 = 18.7% and the CV for height is 6/66 x 100 = 9.1%. This tells you that the relative spread of weight is greater than the relative spread of height among these people.
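The weight and height example can be checked with a short sketch. The function name is illustrative; the numbers are the ones given in the card.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

cv_weight = coefficient_of_variation(28, 150)   # weight: SD 28 lb, mean 150 lb
cv_height = coefficient_of_variation(6, 66)     # height: SD 6 in, mean 66 in
print(f"weight CV = {cv_weight:.1f}%, height CV = {cv_height:.1f}%")
```

Because the CV is unitless, it allows a comparison between measurements in pounds and inches that the raw SDs do not.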
In a histogram, what is represented on the Y axis?
Frequencies. The data bins are on the X axis.
On what diagram can you represent the mean, median, range, standard deviation, and interquartile range?
Box plot
The extremes of the box represent:
Interquartile range
The line through the box represents:
Median
The plus symbol in the box represents:
Mean
The whiskers can variously represent:
The range of the data, the standard deviation, or the 9th and 91st percentiles.
What is the symbol for the population number, mean, and standard deviation?
N, mu, sigma
What is the symbol for the sample number, mean, standard deviation?
n, x-bar, s or SD
What do the following features describe: 1) every unit in the population had the same probability of being selected 2) chance alone determines whether a particular unit in the population is selected for the sample 3) allows estimation or inference of the nature of the population without having to observe the entire population.
Random sample.
Name four types of random samples
Simple, stratified, cluster, systematic. SSCS
Describe a simple random sampling
Each unit n in the population N has an equal chance of being selected.
Describe a stratified random sampling
Population is divided into groups (by gender, location, age) and a random sample is taken from each group.
Describe a cluster random sampling
A random sample of groups is drawn and everyone in each selected group is in the sample. For instance, random schools are chosen in a state and all the children in each chosen school are sampled.
Describe systematic random sampling
A method such as every 10th member or day of the week is used to select from the population.
What is the most familiar example of a continuous probability distribution?
Normal distribution.
In a standard normal distribution what is the value of mu (population mean) and sigma (population standard deviation)?
mu = 0, and sigma = 1
The total area under the curve of a normal distribution equals what value?
1
What three measures equal each other in a normal distribution
Mean = median = mode
What value has changed when the normal distribution shifts either left or right?
A change in the mean will shift the normal distribution either left or right.
What values change when the normal distribution becomes flatter or more peaked?
A change in the standard deviation
What does the area under the curve represent?
Probability
Two standard deviations on either side of the mean cover what percentage of the distribution?
95.45% (a proportion of 0.9545).
Exactly 95% of the distribution falls within plus or minus ____ standard deviations of the mean?
1.96
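The 95.45% and 1.96 figures can be reproduced without a table. This sketch assumes a normal distribution and uses the error function from Python's standard library; the function name is illustrative.

```python
import math

def area_within(k):
    """Area under a normal curve within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k), 4))
```

The loop shows that 2 SD covers slightly more than 95%, which is why the exact 95% cut point is 1.96 rather than 2.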
A ___ table gives the probability of a random outcome variable (or one random sample value) being greater or less than a chosen number or cut point.
Z. The probability of getting the value, Z, is the area in one or both tails and is derived from the body of the Z-table (standard normal distribution)
Describe how a Z-score is used.
A z-score reflects how many standard deviations above or below the population mean a raw score is.

For instance, suppose a group of patients has a mean blood pressure of mu = 120 and a standard deviation of sigma = 10. What is the probability of a randomly selected patient having a BP greater than x = 135? You must convert this normal distribution to the standard normal distribution and obtain a z-score using z = (x - mu)/sigma.

In this case the z-score is (135 - 120)/10 = 1.5, which indicates that the value 135 is 1.5 standard deviations above the mean. From the Z table, the probability of getting this value or greater is 0.0668.
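The blood pressure example can be worked in code instead of a Z table. This is a sketch that assumes the values are normally distributed; the helper names are illustrative, and the upper-tail area is computed with the complementary error function.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def upper_tail(z):
    """P(Z > z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_score(135, mu=120, sigma=10)
p = upper_tail(z)
print(z, round(p, 4))
```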
What does the following statement define: means of multiple samples from a population tend to have a normal distribution regardless of how the data are distributed. This tendency increases as the number of observations in the samples increase
The central limit theorem
What is the difference between standard deviation and standard error of the mean?
Use standard deviation when you're interested in showing the scatter of the data. Use the standard error of the mean when you want to show how well you know the mean (precision of the mean).
What is the formula for standard error of the mean?
SEM = SD / sqrt(n)
What is the formula for calculating the 95% confidence interval?
Xbar +/- 1.96 x (SD / sqrt(n))
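A short sketch of the SEM and 95% confidence interval formulas. The sample mean, SD, and sample sizes are hypothetical values chosen for illustration.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def ci95(xbar, sd, n):
    """95% confidence interval: xbar +/- 1.96 * SEM."""
    margin = 1.96 * sem(sd, n)
    return xbar - margin, xbar + margin

low, high = ci95(xbar=120, sd=10, n=25)        # SEM = 2, margin = 3.92
low_big, high_big = ci95(xbar=120, sd=10, n=100)
print((low, high), (low_big, high_big))
```

Quadrupling n from 25 to 100 halves the SEM, so the interval is half as wide.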
Often one wants to increase the precision of an estimate by narrowing the confidence interval. What three things can you change in the formula to increase precision, that is, to narrow the CI?
Lower the confidence level (that is, raise alpha, which shrinks the 1.96 multiplier), decrease the standard deviation, or increase the n.
The different types of probability distributions are divided into two groups. There are probability distributions for discrete data and probability distributions for continuous data. What are two common types of probability distributions for discrete data and five common types for continuous data?
Discrete: binomial, Poisson. Continuous: normal, standard normal (Z dist.), Student's T (T dist.), F distribution and Chi^2.
A classic example for a binomial distribution is a coin flip. Each trial must result in:
one of two possible mutually exclusive outcomes: success or failure. Furthermore, the events must be independent of each other.
A Poisson distribution is for a. continuous or b. categorical (discrete) variables?
b. Poisson is for discrete count data. With Poisson, think counts, think rare discrete events, think mean = variance.
What is the shape of the T distribution?
It is like the normal distribution, only flatter with heavier tails.
What happens to the shape of the T distribution as the degrees of freedom increase?
The distribution approaches the normal (Z) distribution: the curve becomes less spread out and more peaked.
In hypothesis testing the groups that can be compared are two different means or two different:
You typically test sameness or difference of Means or Proportions. mu = mu or p = p.
The null hypothesis states that there is no difference between the two groups compared. If we fail to reject the null hypothesis (loosely, "accept" it), then any difference between the groups is attributed to:
Chance alone
The alternative hypothesis states that there is a difference between the two groups compared and that factors other than chance alone account for the:
Difference
If the P value is greater than 0.05, then you fail to reject the null hypothesis and conclude that there is:
No statistically significant difference between the groups
If the P value is less than 0.05, then you reject the null hypothesis and conclude that the difference is
Due to some other factor and not from chance alone
An alpha error is associated with which type of error
Type I error
A beta error is associated with which type of error
Type II.
If you fail to reject the null, or your p-value is greater than 0.05 and you stated that there was no difference when there actually was a difference between the groups which type of error had been made?
This is a beta or type II error and is usually associated with a low-power study or an n that is too small (sts).
If you reject the null and your p-value is less than 0.05 and you state that there is a difference when there truly is no difference between the groups which type of error had been made?
This is the type I or alpha error
Power is equal to one minus beta. Beta is the probability of failing to reject the null hypothesis when the alternative hypothesis is true (an incorrect decision). One minus beta is the probability of rejecting the null hypothesis when the alternative hypothesis is true (a correct decision).
Power is the ability of the study to detect the true difference of specified size that actually exists
In a good study, your alpha and beta are as small as possible.
A good alpha value is 0.05; a good beta value is 0.20, which corresponds to a power of 0.80.
deleted question
The chi-squared distribution is skewed to the right and approaches a symmetric distribution as the degrees of freedom increase.
The shape of the F distribution used in ANOVA depends on the degrees of freedom and the number of groups being compared.
When using confidence intervals for inference, remember that if the 95% confidence interval does not include the null value (1 for ratios such as an odds ratio or relative risk, 0 for a difference), then you reject the null hypothesis.
If the 95% confidence interval does include the null value, then you cannot reject the null hypothesis.
What is the difference between parametric and non-parametric tests?
Parametric statistics assume that the data have come from a known type of probability distribution and make inferences about the parameters of the population distribution. Non-parametric statistics do not depend on a known distribution or known parameter (such as mean or SD). These tests are generally more complicated but can minimize the effect of outliers and avoid assumptions, such as those behind the central limit theorem, that may not apply to your data.
HYPOTHESIS TESTING. 3 quick associations:
1) Continuous data ~ Means and SDs ~ T-tests, ANOVA, Pearson's correlation, linear regression.

2) Categorical data ~ ratios and proportions ~ Chi^2, Fisher's exact, logistic regression.

3) Non-Parametric tests (do not follow the normal or other probability distribution; "distribution free"; based on ranks) ~ Wilcoxon Rank Sum, Wilcoxon Signed Rank, Kruskal-Wallis, Spearman's Correlation
Other associations:
Two Sample t-test ~ Wilcoxon Rank Sum
Paired t-test ~ Wilcoxon Signed Rank (WSR, "Pair of Wizards")
ANOVA ~ Kruskal-Wallis
When choosing a hypothesis test statistic what are 4 factors to consider: (look for these features in the board question stems)
TYPE OF DATA VARIABLE:
1. Continuous
2. Categorical
3. Survival.

PURPOSE OF THE TEST:
1. Comparing means
2. Comparing proportions
3. Comparing survival curves

DISTRIBUTION OF THE OUTCOME
1. Normal or other known distribution (parametric method)
2. Non-normal distribution (nonparametric method)

INDEPENDENT OR DEPENDENT OUTCOME
1. Independent outcome: each patient contributes an outcome or is tested only once.
2. Dependent outcome: each patient contributes more than one outcome or is tested more than once.
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
T-DIST: (Parametric method)
ONE-SAMPLE T-TEST.
PURPOSE:
Compare sample mean to known population mean.

DISTRIBUTION:
T-distribution, which approaches the normal distribution when the sample size is >30

NULL HYPOTHESIS:
Sample mean = Pop mean

INTERPRETATION:
When calculated T-value is greater than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom) then the value is out in the tail and you reject the null and say that there is a difference.
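A minimal sketch of the one-sample T-test decision. The sample and the population mean are hypothetical; the critical value 2.776 is the standard two-tailed table value for alpha = 0.05 with 4 degrees of freedom (n - 1).

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """t = (sample mean - population mean) / (SD / sqrt(n))."""
    n = len(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - pop_mean) / sem

sample = [12, 14, 16, 18, 20]      # hypothetical measurements, mean 16
t_value = one_sample_t(sample, pop_mean=12)
t_critical = 2.776                 # table value: two-tailed, alpha 0.05, df = 4
reject_null = abs(t_value) > t_critical
print(round(t_value, 3), reject_null)
```

Here the calculated value (about 2.83) is larger than the table value, so it falls in the tail and the null is rejected.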
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
T-DIST: (Parametric method)
TWO-SAMPLE T-TEST.
PURPOSE:
Compare the sample means of two different populations for sameness or difference.

DISTRIBUTION:
T-distribution, which approaches the normal distribution when the sample size is >30

NULL HYPOTHESIS:
group 1 mean = group 2 mean

INTERPRETATION:
When calculated T-value is greater than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom) then the value is out in the tail and you reject the null and say that there is a difference.
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
T-DIST: (Parametric method)
PAIRED T-TEST.
PURPOSE:
Compare mean of one population measured before and after exposure to a variable (example: drug treatment)

DISTRIBUTION:
T-distribution, which approaches the normal distribution when the sample size is >30

NULL HYPOTHESIS:
Pop mean before = Pop mean after

INTERPRETATION:
When calculated T-value is less than the value in the table (the value associated with 0.05 and the appropriate degrees of freedom) then the value is less than your critical value and under the curve, not out in the tail. So, you would fail to reject the null and say that there is no difference.
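A sketch of the paired T-test, which is a one-sample test on the before-minus-after differences. The readings are hypothetical; 3.182 is the standard two-tailed table value for alpha = 0.05 with 3 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for the mean of the paired differences against zero."""
    diffs = [b - a for b, a in zip(before, after)]
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / sem

before = [140, 150, 160, 155]      # hypothetical readings before treatment
after = [136, 152, 157, 156]       # same subjects after treatment
t_value = paired_t(before, after)
t_critical = 3.182                 # table value: two-tailed, alpha 0.05, df = 3
reject_null = abs(t_value) > t_critical
print(round(t_value, 3), reject_null)
```

The calculated value (about 0.68) is below the table value, so this is the fail-to-reject case the card describes.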
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
F-DIST: (Parametric method)
ANOVA. - Comparison of three or more groups (like the two-sample T-test except with more groups)
PURPOSE:
Compare means of three or more populations

DISTRIBUTION:
F-distribution (its shape depends on the degrees of freedom)

NULL HYPOTHESIS:
All means equal each other. The alternative hypothesis is that at least one mean is different than another.

INTERPRETATION:
Determines if there is at least one difference between means. Follow up with post hoc tests pairing population means to discover which means differ. Use the F-statistic to make the decision.
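A sketch of how the one-way ANOVA F-statistic is built from between-group and within-group variation. The three groups are hypothetical; 5.14 is the standard table value for F with 2 and 6 degrees of freedom at alpha = 0.05.

```python
def anova_f(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data, group means 2, 3, 6
f_value = anova_f(groups)                    # 13 for these data
f_critical = 5.14                            # table value: F(2, 6), alpha 0.05
print(round(f_value, 2), f_value > f_critical)
```

A significant F only says that at least one mean differs; it does not say which one, which is why post hoc tests follow.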
How is ANCOVA different from ANOVA?
Its purpose is the same as ANOVA except that it can adjust for covariates like age, smoking, race etc.
TEST OF CORRELATION:
CONTINUOUS DATA:
NL DIST: (Parametric method)
PEARSON'S CORRELATION
PURPOSE:
Quantify the association between two different continuous variables

DATA:
Used when both X and Y variables have normally distributed histograms.

DISTRIBUTION:
Nl distribution

INTERPRETATION:
The goal of correlation analysis is to understand the nature (positive or negative) and the strength (high r value or low r value) of the association.

Two variables, X and Y, are plotted against each other.

Correlation is denoted as r.

The range of r is -1 to 1.

Example: drowning deaths are most frequent during the months when ice cream sales are at their highest.

0.75 to 1 is a high positive correlation.
0.50 to 0.75 is a moderately high positive correlation.
0.25 to 0.50 is a moderately low positive correlation.
0 to 0.25 is a low positive correlation.

Correlation does not equal causation.

It would be a mistake to conclude that eating ice cream makes you more likely to get a cramp and drown while swimming, OR that when people drown the mourners are more likely to console themselves by eating ice cream.
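A minimal sketch of how Pearson's r is computed from paired observations. The data are hypothetical and the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of X and Y divided by the product of their spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]               # hypothetical X values
ys = [2, 4, 5, 4, 5]               # hypothetical Y values
r = pearson_r(xs, ys)
print(round(r, 4))
```

For these data r is about 0.77. The sign gives the nature of the association and the magnitude gives its strength; neither says anything about cause.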
NON-PARAMETRIC TEST PROPERTIES
DATA:
Continuous data.

DISTRIBUTION:
no distribution assumption

USEFUL FOR:
small sample sizes,
non-normally distributed data, ordinal data.
Note: nonparametric tests are less powerful than parametric tests.
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
NON-NORMAL DIST: (Non-Parametric method) WILCOXON RANK SUM TEST
PURPOSE:
Rank all the outcomes in order from lowest to highest. Then compare the sum of the ranks in group 1 vs the sum of the ranks in group 2.

COMPARED TO:
Two sample T-test

DISTRIBUTION:
Non-normally distributed

HYPOTHESIS:
Determine if one group is greater than another (one-tailed) or if the groups are different (two-tailed).

INTERPRETATION:
If the rank sum of group 1 is very different from the rank sum of group 2, then the p-value will be low, and you can reject the null if it is less than 0.05.
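The ranking step of the Wilcoxon rank sum test can be sketched as follows. This only computes the two rank sums (tied values share their average rank); turning the sums into a p-value needs a table or a statistics package and is not shown. The data and function name are hypothetical.

```python
def rank_sums(group1, group2):
    """Rank the pooled outcomes from lowest to highest and sum the ranks per group."""
    pooled = sorted(group1 + group2)
    rank_of = {}
    for value in set(pooled):
        # 1-based positions of this value in the sorted pool; ties share the average
        positions = [i + 1 for i, v in enumerate(pooled) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    return sum(rank_of[v] for v in group1), sum(rank_of[v] for v in group2)

group1 = [1.1, 2.3, 3.0]           # hypothetical outcomes, group 1
group2 = [4.2, 5.5, 6.1, 7.0]      # hypothetical outcomes, group 2
sum1, sum2 = rank_sums(group1, group2)
print(sum1, sum2)
```

Here every value in group 1 ranks below every value in group 2, so group 1 receives the smallest possible rank sum for its size.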
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
NON-NORMAL DIST: (Non-Parametric method) WILCOXON SIGNED RANK TEST
PURPOSE:
Compare one population before and after an intervention. Sum the ranks between all positive and all negative differences and compare the positive difference to the negative difference.

COMPARED TO:
Paired T-test
(Mnemonic: Pair of McNemar Wizards WSR)

DISTRIBUTION:
Non-normally distributed

HYPOTHESIS:
Null hypothesis is that there is no difference between measured outcomes before and after intervention.

INTERPRETATION:
If the positive sum is very different from the negative sum, then the p-value will be low, and you can reject the null if it is less than 0.05.
TEST OF HYPOTHESIS:
CONTINUOUS DATA:
NON-NORMAL DIST: (Non-Parametric method)
SPEARMAN'S CORRELATION
PURPOSE:
Quantify the association between two different continuous variables when X and Y are not normally distributed variables.

DISTRIBUTION:
Non-normal distribution - (non-parametric)

INTERPRETATION:
The goal of correlation analysis is to understand the nature (positive or negative) and the strength (high r value or low r value) of the association.

Two variables, X and Y, are plotted against each other.

Correlation is denoted as r (often written r_s or rho for Spearman's).

The range of r is -1 to 1.

0.75 to 1 is a high positive correlation.
0.50 to 0.75 is a moderately high positive correlation.
0.25 to 0.50 is a moderately low positive correlation.
0 to 0.25 is a low positive correlation.

Correlation does not equal causation.
CONTINUOUS DATA:
LINEAR REGRESSION
PURPOSE:
Predict outcome, Y (dependent var) from predictor, X, (independent var)

TYPES:
Simple linear - predicts outcome from one independent var

Multiple linear - predicts outcome from multiple independent var.
What is the linear regression equation:
Y = B0 + B1(X1) (1 is subscript)
Y = dependent var
B0 = y-intercept
B1 = slope
X1 = independent var (covariate)

This is the same line as Y = mX + B:
Y = dependent var
m = slope
X = independent var (covariate)
B = y-intercept

In multiple linear regression you add additional terms: + B2(X2) + B3(X3)
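A sketch of fitting the simple linear regression line by ordinary least squares, which is the usual fitting method but is an assumption here since the card does not name one. The data are hypothetical; B1 is the slope and B0 the y-intercept, as in the equation above.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # y-intercept
    return b0, b1

xs = [1, 2, 3, 4, 5]               # hypothetical predictor values
ys = [2, 4, 5, 4, 5]               # hypothetical outcomes
b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 6            # predicted outcome at X = 6
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

The residuals list holds the vertical distance from each plotted point to the fitted line; with least squares these always sum to zero.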
The errors or residuals of a fitted line is:
the vertical distance between the plotted point and the fitted line
TEST OF HYPOTHESIS:
CATEGORICAL DATA: (dichotomous)
DISTRIBUTION: CHI2 (Non-Parametric method)
CHI SQUARED TEST
PURPOSE:
Comparison of proportions of two or more samples

NULL HYPOTHESIS:
proportion 1 is equal to proportion 2

INTERPRETATION:
Calculate chi squared test value
Compare to chi squared critical value for p of 0.05.
If it is larger than the critical value (in the tail), then reject the null and conclude that there is a difference between the proportions. If it is smaller, fail to reject the null.
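A sketch of the chi-squared calculation for a 2 x 2 table, without a continuity correction (an assumption; some texts apply one). The counts are hypothetical; 3.841 is the standard critical value for 1 degree of freedom at p = 0.05.

```python
def chi_squared_2x2(table):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# hypothetical counts: rows = treatment 1 / treatment 2, columns = success / failure
table = [[20, 30], [30, 20]]
chi2 = chi_squared_2x2(table)      # every expected count is 25, so chi2 = 4.0
critical = 3.841                   # table value: df = 1, p = 0.05
print(chi2, chi2 > critical)
```

A test value above the critical value lies in the tail, so the null of equal proportions is rejected.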
TEST OF HYPOTHESIS:
CATEGORICAL DATA: (dichotomous)
DISTRIBUTION: CHI2 (Non-Parametric method)
McNEMAR'S TEST
PURPOSE:
It is applied to 2 × 2 contingency tables with a dichotomous trait, with MATCHED PAIRS of subjects, to determine if the proportion of success from treatment 1 is different from the proportion of success from treatment 2. It is unique in that it is used for paired subjects (in this way it is similar to the paired T-test and Wilcoxon Signed Rank test). Mnemonic: "Pair of McNemar Wizards".


NULL HYPOTHESIS:
proportion of success from treatment 1 is equal to proportion of success from treatment 2

INTERPRETATION:
Calculate McNemar's chi squared test value
Compare to critical value for p of 0.05.
If it is larger than the critical value (in the tail), then reject the null and conclude that there is a difference between the proportions.

Key is that it can be used with paired subjects such as before and after treatment
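A sketch of McNemar's statistic, which uses only the discordant pairs (pairs where the two treatments gave different results). This is the uncorrected form; a continuity-corrected version also exists. The counts are hypothetical; 3.841 is the standard chi-squared critical value for 1 degree of freedom at p = 0.05.

```python
def mcnemar_chi2(b, c):
    """McNemar's chi-squared from the two discordant-pair counts: (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c)

b = 15    # hypothetical pairs: success on treatment 1, failure on treatment 2
c = 5     # hypothetical pairs: failure on treatment 1, success on treatment 2
chi2 = mcnemar_chi2(b, c)          # (15 - 5)^2 / 20 = 5.0
critical = 3.841                   # table value: df = 1, p = 0.05
print(chi2, chi2 > critical)
```

Pairs where both treatments succeeded or both failed carry no information about a difference, so they do not enter the statistic.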
Which is used for continuous and which is used for categorical data: T-Test, WSR, Chi squared, ANOVA, Logistic regression, McNemar's?
Continuous data: T-Test, WSR, ANOVA

Categorical data: Chi squared, Logistic regression, McNemar's
CATEGORICAL DATA: (dichotomous or discrete)
LOGISTIC REGRESSION
PURPOSE:
Predict odds of success from predictors, X, independent variables.

MNEMONIC:
Y var, Outcome, Dependent - Predictor, Independent, X var
(YODa PIX)

Key is to associate Odds ratios with lOgistic regression
SURVIVAL ANALYSIS
PURPOSE:
Predict survival time or time to disease during a specific time period (to 5 years) from independent variables.

OUTCOME:
Survival time or time to develop disease

FEATURES:
Each subject has known entry point but this point can vary between subjects.
Each subject has a well-defined end-point (death, acquired disease, lived beyond 10y).
Expressed as survival time after some event (infection, surgery, treatment).
Units are person-years of observation before end-point.
The outcome is neither simply continuous nor categorical, so standard statistical tests are not suitable.
Other tests are also not applicable because subjects may be censored, lost to follow-up, or drop out.
Kaplan-Meier Life table
Be familiar with interpreting the graph. Hashes equal a censored subject, drop-offs equal a death. There are many tests to compare two or more survival curves. Test names have Mantel, Cox, Wilcoxon, Gehan, and Peto in them.
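To see where the drop-offs on the graph come from, here is a minimal sketch of the Kaplan-Meier estimate. The follow-up data are hypothetical: each subject is a (time, event) pair, where event True means death and False means censored.

```python
def kaplan_meier(observations):
    """Return [(time, survival)] with one step down at each time a death occurs."""
    survival = 1.0
    curve = []
    death_times = sorted({time for time, event in observations if event})
    for t in death_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, event in observations if time == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# hypothetical follow-up in years; subjects at years 2 and 5 are censored
observations = [(1, True), (2, False), (3, True), (4, True), (5, False)]
curve = kaplan_meier(observations)
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects never produce a drop-off themselves, but they leave the at-risk group, which makes each later death a larger step.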
Cox Proportional Hazard Regression
Combines survival curve methods with logistic regression.

Can adjust for other factors.
Log Rank Test
Example interpretation:
Breast cancer survival is significantly different between subjects with and without lymph node metastasis.
Meta Analyses
Forest Plots (plot with horizontal confidence interval bars). A small CI suggests a large number of subjects.

Funnel Plots (if pyramid shaped, then no publication bias). Axes are std error and log odds ratio.