82 Cards in this Set
- Front
- Back
- 3rd side (hint)
Design |
Plan how to obtain the data |
|
|
Description |
Summarize the data with graphs and numerical summaries |
|
|
Inference |
Use data from a random and representative sample to draw conclusions about the population of interest |
|
|
Parameter |
Numerical summary of a population |
|
|
Statistic |
Numerical summary of a sample |
|
|
Subjects |
Persons, animals, or objects in our study/experiment |
|
|
Variables |
The characteristics that we measure on each subject |
|
|
Population |
All subjects of interest |
|
|
Sample |
Subjects for whom we have data |
|
|
Random Sampling |
Each member of the population has the same chance of being included in the sample (so the sample tends to be representative of the population) |
|
|
Categorical Variables |
Summarize with counts and percentages. Graphs: bar charts and pie charts |
|
|
Quantitative Variables |
Take on numerical values. Graphs: dotplots, histograms, stemplots, and box plots. Measures of center: mean, median, mode. Measures of spread: range, IQR, standard deviation |
|
|
Discrete Quantitative |
Take only a finite list of possible outcomes, such as a count (ex: year, absences) |
|
|
Continuous Quantitative |
Have an infinite list of possible values that form an interval (ex: exam scores) |
|
|
Dotplot |
Back (Definition) |
|
|
Stemplot |
Back (Definition) |
|
|
Histogram |
Back (Definition) |
|
|
Boxplot |
Back (Definition) |
|
|
Bar Chart |
Back (Definition) |
|
|
Pie Chart |
Back (Definition) |
|
|
Histogram Shapes |
Back (Definition) |
|
|
Mean |
The average of all the observations |
|
|
Median |
Observation “right in the middle” of the ordered data: M |
|
|
Mode |
Most frequently occurring value |
|
|
Range |
Range = maximum - minimum |
|
|
Variance |
Average squared deviation from the mean: s^2 |
|
|
Standard Deviation |
Square root of the variance: s |
|
|
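The mean, range, variance, and standard deviation cards can be tied together with a small worked example. This is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical, and it assumes s^2 is the sample variance, which divides the sum of squared deviations by n - 1 rather than n.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # average of all the observations
data_range = max(data) - min(data)      # range = maximum - minimum
s2 = statistics.variance(data)          # sample variance (divides by n - 1)
s = statistics.stdev(data)              # square root of the sample variance

# The same variance computed by hand from the squared deviations.
s2_by_hand = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(mean, data_range, s2, s)
```

Here the squared deviations sum to 32, so s^2 = 32/7 and s is its square root.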
Empirical Rule |
In any bell-shaped and symmetric distribution you will find approximately: 68% of the observations within 1 sdev of the mean; 95% of the observations within 2 sdev of the mean; 99.7% of the observations within 3 sdev of the mean |
|
|
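The three percentages in the empirical rule can be checked against an exact normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `prob_within` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.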
Quartiles |
Divide the data set into four quarters |
|
|
IQR |
Measures the spread of the central 50% of the data. IQR = Q3 - Q1 |
|
|
Five number summary |
Minimum, Q1, median, Q3, maximum |
|
|
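A short sketch of the five-number summary and the IQR, using hypothetical data. Textbooks and software use slightly different conventions for locating quartiles. This example assumes the "inclusive" method of `statistics.quantiles`, so results for other data sets may differ a little from a hand calculation.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # spread of the central 50% of the data

print(five_number, iqr)
```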
Z-Score |
Fill in…. |
|
|
Scatterplot |
Back (Definition) |
|
|
Explanatory Variable |
X axis |
|
|
Response Variable |
Y axis |
|
|
DOTS (Scatterplot) |
Direction, Outliers, Trend, Strength |
|
|
Correlation |
-The direction and strength of the straight-line relationship between x and y -Represented by r -r is always between -1 and 1 (no units) -Interpretation: strong/weak, positive/negative -Outliers can have a strong effect on r |
|
|
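To make the correlation card concrete, here is a minimal sketch that computes r directly from the deviations of x and y from their means. The function name and the data are hypothetical. The second call shows how a single outlier can change r drastically.

```python
import math

def correlation(xs, ys):
    """Correlation r between paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]            # points fall exactly on a rising line
ys_outlier = [2, 4, 6, 8, -10]   # same data with the last point replaced by an outlier

print(correlation(xs, ys))          # close to 1
print(correlation(xs, ys_outlier))  # negative: one outlier reverses the sign
```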
Regression Equation |
y(hat) = a+bx |
|
|
B (slope of regression line) |
Average change in y for one unit change in x |
|
|
A (intercept of regression line) |
Y-intercept: expected value of y when x=0. Only interpret if x=0 makes sense and is close to values of x observed in data |
|
|
Residuals |
Prediction errors for each observation. Residual = y (observed y) - y(hat) (predicted y) |
|
|
Least Squares Method |
Finds the line that minimizes the sum of the squared residuals |
|
|
R^2 |
R^2 = (r)^2. Proportion of the variability in y that is explained by the regression on x |
|
|
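The regression cards (equation, slope, intercept, residuals, least squares, R^2) fit together in one small calculation. The sketch below uses the standard closed-form least-squares formulas for a and b with hypothetical data. The function name `least_squares` is made up for this illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y(hat) = a + b*x that minimizes
    the sum of the squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

mean_y = sum(ys) / len(ys)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_residual / ss_total  # proportion of variability explained

print(a, b, r_squared)
```

For this data the fitted line is y(hat) = 2.2 + 0.6x, the residuals sum to zero, and R^2 is 0.6.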
Cautions |
Influential outlier: a point whose x value is far away from the rest. Correlation (or association) does not imply causation. Extrapolation: extending the application to an unknown situation by assuming that existing trends will continue or similar methods will be applicable. Simpson’s Paradox: a lurking variable can reverse the association between two categorical variables in a contingency table |
|
|
Contingency Tables |
-Both explanatory and response variables are categorical -Display counts (frequencies) on the table -Compute % to determine association |
|
|
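"Compute % to determine association" can be illustrated by converting each row of counts into percentages and comparing the rows. The table below is entirely hypothetical, as is the helper name `row_percentages`.

```python
# Hypothetical counts: rows are the explanatory variable (group),
# columns are the response variable (outcome).
table = {
    "treatment": {"improved": 30, "not improved": 20},
    "placebo": {"improved": 15, "not improved": 35},
}

def row_percentages(counts_by_group):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {
            outcome: 100 * count / total for outcome, count in counts.items()
        }
    return result

pct = row_percentages(table)
print(pct["treatment"]["improved"], pct["placebo"]["improved"])
```

If the percentages differ noticeably between rows (here 60% versus 30% improved), the two variables are associated in the sample.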
Experiments |
The researcher assigns subjects to certain experimental treatments |
|
|
Observational studies |
The researcher does nothing to the subjects except observe x and y |
|
|
Experimental unit |
Subjects involved in the experiment |
|
|
Treatments |
Experimental conditions given to the subjects |
|
|
Random Phenomenon |
Individual outcomes are unpredictable, but a distinct predictable pattern emerges after many outcomes |
|
|
Probability |
Long-run relative frequency |
|
|
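The idea of probability as a long-run relative frequency can be seen in a simulation. This is a minimal sketch that flips a simulated fair coin many times. The function name and the seed are arbitrary choices for the illustration.

```python
import random

def relative_frequency(n_flips, seed=0):
    """Proportion of heads in n_flips simulated flips of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few flips the proportion can be far from 0.5. As the number of flips grows, it tends to settle near the probability 0.5.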
Independent Trials |
The outcome of one trial is not affected by the outcome of another |
|
|
Sample Space |
The set of all possible outcomes |
|
|
Event |
An outcome or group of outcomes, a subset of the sample space |
|
|
Biased samples |
Systematically favor certain outcomes, not representative of the population of interest |
|
|
Margin of error |
Approximately 1/square root of n, where n is the sample size |
|
|
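A quick sketch of the 1/square root of n approximation, assuming n is the sample size of a random sample. The function name is hypothetical. The example shows that quadrupling the sample size only halves the margin of error.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error, 1 / sqrt(n), for a sample of size n."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1600):
    print(n, approx_margin_of_error(n))  # 0.1, 0.05, 0.025
```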
Placebo |
Dummy treatment |
|
|
Blind study |
Subjects do not know if they receive treatment or placebo |
|
|
Double blind study |
Neither the subjects nor those in contact with the subjects know who gets the treatment or the placebo |
|
|
Randomization |
Use a mechanical method to select subjects and assign them to treatments |
|
|
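One possible way to carry out randomization in software is to shuffle the subjects and deal them out to the treatments in turn. This is only a sketch: the function name, subject IDs, and treatment labels are all hypothetical, and a real study would follow its own protocol.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly split subjects into near-equal groups, one per treatment."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the researcher, decides the order
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# 20 hypothetical subject IDs split between two hypothetical treatments.
groups = assign_treatments(range(1, 21), ["drug", "placebo"], seed=1)
print(groups)
```

Each treatment ends up with 10 experimental units, which is the replication for this design.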
Replication |
Number of experimental units that get each treatment |
|
|
Cross-sectional studies |
Sample surveys that just want to take a snapshot of the population at the current time |
|
|
Case-control studies |
Retrospective studies (backward looking) in which we match each case (positive outcome) with a control (negative outcome) and then ask questions about the explanatory variable |
|
|
Prospective studies |
Forward looking and follow subjects into the future |
|
|
Complement of an event |
The rest of the sample space, written as A^c. P(A^c) = 1 - P(A) |
|
|
Disjoint events A and B |
Events that cannot occur together. P(A or B) = P(A) + P(B) |
|
|
Conditional probability |
P(A|B) = P(A and B) / P(B) |
|
|
Independent events A and B |
If two events are independent, knowledge about one event tells us nothing about the other event. Definition: P(A|B) = P(A). Multiplication rule: P(A and B) = P(A) x P(B) |
|
|
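Conditional probability and independence can be checked by listing a sample space and counting outcomes. The sketch below assumes two fair six-sided dice, with every one of the 36 outcomes equally likely. The events A and B are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favorable outcomes / all outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

def event_a(outcome):
    return outcome[0] == 6      # A: first die shows a 6

def event_b(outcome):
    return outcome[1] % 2 == 0  # B: second die is even

p_a = prob(event_a)
p_b = prob(event_b)
p_a_and_b = prob(lambda outcome: event_a(outcome) and event_b(outcome))
p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A and B) / P(B)

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

Here P(A|B) equals P(A) = 1/6 and P(A and B) equals P(A) x P(B) = 1/12, so A and B are independent.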
Discrete Random Variables |
Finite number of possible values. Prob distribution: list, graph or formula with all possible values of X and their probabilities. Population mean: u(mu) = sum of xP(x) |
|
|
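As a worked instance of the population mean formula, the sketch below stores a probability distribution as a list of values with their probabilities and sums x times P(x). It assumes X is the roll of one fair six-sided die.

```python
from fractions import Fraction

# Probability distribution of X = result of one fair die roll (assumed example).
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid probability distribution must have probabilities that sum to 1.
total_probability = sum(distribution.values())

# Population mean: sum of x * P(x) over all possible values.
mu = sum(x * p for x, p in distribution.items())

print(total_probability, mu)  # 1 and 7/2
```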
Continuous Random Variables |
-Infinite number of possible values -Probabilities are areas under a density curve (smooth) with a total area of 1 -Assign probabilities to intervals, not individual values of X |
|
|
Normal Probability Distributions |
-Bell-shaped curves, indexed by their mean and standard deviation -Follow the empirical rule |
|
|
Binomial distribution |
-Each of n trials can have two possible outcomes: success or failure -Probability of success for each trial is the same: p -Trials are independent -Binomial Random Variable X counts the number of successes -Mean: u(mu) = np -Standard deviation: square root of np(1-p) |
|
|
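The binomial mean and standard deviation formulas can be checked against the full probability distribution. The card does not give the probability formula itself, so the sketch below brings in the standard binomial formula P(X = k) = C(n, k) p^k (1-p)^(n-k) as an assumption. The values n = 10 and p = 0.5 are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 fair coin flips, success = heads

mean = n * p                    # mu = np
sd = math.sqrt(n * p * (1 - p)) # square root of np(1-p)

# Full distribution: probabilities for 0, 1, ..., n successes.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean_from_pmf = sum(k * prob for k, prob in enumerate(pmf))

print(mean, sd, sum(pmf), mean_from_pmf)
```

The probabilities sum to 1, and the mean computed from the distribution matches np.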
P(at least one) |
1-P(none) |
|
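A small sketch of the "at least one" rule. It assumes n independent trials that each have the same probability p of the event, so that P(none) = (1 - p)^n. The function name and the dice example are hypothetical.

```python
from fractions import Fraction

def prob_at_least_one(p, n):
    """P(at least one occurrence in n independent trials) = 1 - P(none)."""
    p_none = (1 - p) ** n
    return 1 - p_none

# Chance of at least one six in 4 rolls of a fair die.
result = prob_at_least_one(Fraction(1, 6), 4)
print(result, float(result))  # 671/1296, about 0.518
```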