81 Cards in this Set

  • Front
  • Back
The four parts of statistics
Statistics is the study of data analysis: defining the problem, collecting the data, analyzing and summarizing the data, and drawing inferences from the data.
Distribution
A list of the possible values of a variable together with how often each value occurs
Types of data
Quantitative and categorical
Left Skewed
The tail is on the left, mean is less than the median.
Right Skewed
Tail on the right, the mean is greater than the median
Histogram and boxplot
A right skewed histogram will match with a boxplot that has a long right whisker
Outliers
A data point that is quite a bit removed from the rest of the data
Mean
A measure of center. The "balance point". Computed by adding numbers and dividing by how many there are
Median
A measure of center that cuts the data in half. Order the data and find the middle observation. Outliers have little or no effect on the median.
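A small sketch of how one outlier pulls the mean but barely moves the median. The scores are made-up illustration values, not data from this card set.

```python
from statistics import mean, median

# Hypothetical test scores (illustrative only).
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [200]  # add one extreme value

mean_before = mean(scores)
median_before = median(scores)
mean_after = mean(scores_with_outlier)      # pulled toward the outlier
median_after = median(scores_with_outlier)  # moves only slightly

print("without outlier:", mean_before, median_before)
print("with outlier:   ", round(mean_after, 2), median_after)
```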
Five number summary
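Minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The quartiles split the ordered data into four parts, each holding about 25 percent of the observations.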
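One possible way to compute a five-number summary with Python's standard library. The function name is hypothetical, and the "inclusive" quartile method is an assumption; textbooks and software use several quartile conventions that can give slightly different Q1 and Q3.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # n=4 asks for the three cut points that split the data into quarters.
    # The "inclusive" method is one convention among several.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```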
Standard Deviation
Measures the variability of the data about the mean; in a sense, it is like the average distance of the data from the mean.
Effects of outliers on standard deviation.
Since outliers affect the mean and deviations from the mean are used to compute standard deviation, then outliers can make the standard deviation bigger than it should be.
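A minimal sketch of the sample standard deviation computed from deviations about the mean, showing how a single outlier inflates it. The data and the function name are hypothetical.

```python
from math import sqrt

def sample_standard_deviation(data):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(data)
    center = sum(data) / n
    squared_deviations = [(x - center) ** 2 for x in data]
    return sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical scores (illustrative only).
scores = [70, 72, 75, 78, 80]
sd_without = sample_standard_deviation(scores)
sd_with = sample_standard_deviation(scores + [200])  # one outlier added

print(round(sd_without, 2), round(sd_with, 2))
```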
When to use the five-number summary versus the mean and standard deviation
Use the five-number summary when outliers or strong skewness are present; otherwise the mean and standard deviation are appropriate.
When can a normal distribution be used to model a data set?
The Normal distribution can be used whenever the shape of the histogram of the data resembles the normal curve.
How do you obtain a proportion or a probability for a value from a normal curve?
First, convert the value to a z-score and look the z-score up in Table A. The table gives the "less than" proportion, so if you want the "greater than" proportion, subtract the table value from one.
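A short sketch of the same lookup done in code instead of with a printed table. The mean, standard deviation, and value are hypothetical; `NormalDist().cdf` plays the role of the table.

```python
from statistics import NormalDist

# Hypothetical population values chosen for illustration.
mu, sigma = 100.0, 15.0
value = 130.0

z = (value - mu) / sigma        # standardize the value
p_less = NormalDist().cdf(z)    # area to the left, as a z table reports
p_greater = 1.0 - p_less        # "greater than" area: subtract from one

print(z, round(p_less, 4), round(p_greater, 4))
```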
68-95-99.7 rule
In a normal distribution, about 68% of the observations are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three.
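The rule can be checked against the standard normal curve. This is only a sketch; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
```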
Z-score
The z-score tells how many standard deviations a value is from the mean. If a score from a test has a z score of 1.8 that means the score is 1.8 standard deviations above the mean.
Explanatory Variable
The variable we manipulate or test in an experiment; it is used to explain or predict changes in the response (x).
Response Variable
The variable we want to predict (y). The explanatory variable (x) is the variable we use to do the predicting.
Correlation
Correlation is a measure of the linear relationship between the x and y variables. r is the symbol for the correlation coefficient.
Correlation (r)
r is always between -1 and 1. Values close to zero indicate little or no linear relationship. Values close to -1 show a strong negative relationship; values close to +1 show a strong positive one. r has no unit of measure. r only measures the strength of linear relationships.
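A sketch of r computed from its definition, with a check that changing the units of x leaves r unchanged. The data and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from sums of products of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
r_in_minutes = pearson_r([60 * h for h in hours], scores)  # same r, new units

print(round(r, 4), round(r_in_minutes, 4))
```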
Least squares regression line
The line obtained by minimizing the sum of the squared residuals.
Purpose of regression equations.
Regression equations are used to model relationships between quantitative variables and also for prediction.
Residual
Observed y minus predicted y
Slope in a least squares regression line.
The slope tells us the average increase (or decrease, if negative) in y for each one-unit increase in x.
Roles of x and y in regression
X is the explanatory and y is the response variable
r-squared
Tells the percentage of total variation in the y's that can be explained by the x's
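The least squares line, residual, slope, and r-squared cards can be tied together in one small sketch. The data and function name are hypothetical, and the formulas are the standard textbook ones rather than anything specific to this card set.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)       # variation left unexplained
sst = sum((y - y_bar) ** 2 for y in ys)    # total variation in the y's
r_squared = 1 - sse / sst

print(round(intercept, 2), round(slope, 2), round(r_squared, 2))
```

Here each one-unit increase in x goes with an average increase of about 0.6 in y, and about 60% of the variation in y is explained by x.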
Residual Plots
1) Uniform scatter (what you want to see) 2) Outliers (check normality) 3) Megaphone shape (equal variance violated) 4) A smile or frown (nonlinearity)
Necessary Conditions for testing Significance using slope
Normality of the residuals, constant variance, independent observations, and a linear relationship.
Tests on slope
Ho: B = 0 versus Ha: B does not equal 0.
1. Under Ho, no linear relationship exists between x and y.
2. Under Ha, a significant linear relationship exists.
s in regression output
The s in regression output measures the standard deviation of the observed y's about the regression line.
Extrapolation
Using an x value outside the range of the observed x's that were used to obtain the regression equation in order to predict a y value.
CI vs PI
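A prediction interval (PI) for an individual y value is wider than a confidence interval (CI) for the mean y at the same x.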
Lurking Variable
Affects the relationship between the response variable and explanatory variable but is not part of the study. They are dangerous because they can suggest relationships that do not really exist.
Marginal Distribution
Row (or column) total divided by the table total.
Conditional distribution
Cell count divided by the row total.
* If the conditional distributions are equal, we say the variables are NOT related.
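A sketch of both calculations on a made-up two-way table. The group and answer labels and the counts are hypothetical.

```python
# Hypothetical two-way table: rows are groups, columns are answers.
counts = {
    "group A": {"yes": 30, "no": 20},
    "group B": {"yes": 10, "no": 40},
}

table_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of the row variable: row total / table total.
marginal = {
    name: sum(row.values()) / table_total for name, row in counts.items()
}

# Conditional distribution within each row: cell count / row total.
conditional = {
    name: {col: cell / sum(row.values()) for col, cell in row.items()}
    for name, row in counts.items()
}

print(marginal)
print(conditional)
```

In this made-up table the conditional distributions differ between the two rows (60% "yes" versus 20% "yes"), which suggests the variables are related.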
Voluntary response sample
Samples obtained by having respondents call in or write in voluntarily
*biased, not probability sample
Convenience Sample
Researchers contact subjects that are convenient to contact.
*biased, not probability sample
Population of interest vs. sample
Population: the entire group of interest.
Sample: the subgroup of individuals from the population about which the researcher actually obtains information.
Response Variable
The observation recorded (measured) on each individual.
Bias
The amount by which sample results systematically differ from what they should be. Bias can be reduced by taking probability samples, using careful wording on survey questions, etc.
SRS
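Simple random sample: sampling directly from the entire population so that every possible sample of size n has an equal chance of being selected.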
Stratified sampling
Sampling from within groups (strata) of a population, or sampling from within different populations.
Multistage
First sampling groups and then sampling from within those groups.
SRS & Stratified & Multistage
All are probability samples, where every member of the population has a known non-zero chance of being selected. An SRS gives each possible sample of size n an equal chance of being selected, and a stratified sample gives each member of a stratum an equal chance of being selected within that stratum.
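A rough sketch of the three designs using `random.sample`. The population, strata, classroom names, and sample sizes are all hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed for a repeatable illustration

# SRS: draw directly from the whole population.
population = [f"student_{i}" for i in range(100)]
srs = rng.sample(population, 10)

# Stratified: draw a separate random sample within each stratum.
strata = {
    "freshmen": [f"fr_{i}" for i in range(60)],
    "seniors": [f"sr_{i}" for i in range(40)],
}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Multistage: first sample groups, then sample within the chosen groups.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{i}" for i in range(20)] for r in range(8)
}
chosen_rooms = rng.sample(sorted(classrooms), 3)
multistage = {room: rng.sample(classrooms[room], 4) for room in chosen_rooms}

print(srs)
print(stratified)
print(multistage)
```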
Cautions of surveying
Undercoverage, non-response, response bias, wording of questions
Methods to reduce bias
Using probability samples, avoiding undercoverage, reducing non response and avoiding poor wording on questions.
Observational Study
Studies where information is gathered on the population but no treatment is imposed on the subjects.
Sample Surveys
Sample surveys are observational studies, not experiments.
Advantage of experiment over observational study.
You can establish causation
Control group
Those that receive the placebo
Completely randomized experiment
An experiment where all experimental units are allocated at random among the treatment groups
Replication
having more than one experimental unit per treatment
Double blind
Neither the subjects nor the diagnostician know who is receiving the treatment or the placebo.
Matched Pairs
Taking two measurements on each individual
Completely random
Two completely separate groups. In an experiment, if individuals are randomly allocated to treatments, you have a completely randomized design. If instead each individual receives both treatments and the order in which the treatments are applied is randomized, you have a matched pairs design.
Sampling Distribution
A list of the possible values of the statistic together with the probability of each value. A collection of the statistic's values from all possible samples.
Increase precision
By increasing the sample size
x-bar
sample mean
P hat
sample proportion
When is p-hat approximately normal?
When np is greater than 10 and n(1 - p) is greater than 10.
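The condition is easy to check in code. This sketch uses the card's "greater than 10" cutoff; the function name and the example values of n and p are hypothetical, and some textbooks use a slightly different threshold.

```python
def phat_is_approximately_normal(n, p, threshold=10):
    """Check np > threshold and n(1 - p) > threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes > threshold and expected_failures > threshold

print(phat_is_approximately_normal(200, 0.30))  # 60 and 140 expected
print(phat_is_approximately_normal(100, 0.05))  # only about 5 expected successes
```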
Inferential Statistics
Using information from a sample to draw inferences about a population.
Two most used types of inferential statistics
Confidence interval and test of hypothesis
Confidence Interval
Gives a range of plausible values for the parameter being estimated.
What does 95% confidence mean?
The procedure provides intervals containing the parameter value for 95% of all samples. Confidence is in the procedure not the interval.
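A simulation sketch of "confidence is in the procedure": many samples are drawn from a population with a known mean, and the share of intervals that capture it is counted. Everything here is an assumption made for illustration: a normal population, a known sigma (so a z interval with 1.96 is used), and the particular values of mu, sigma, n, and the seed.

```python
import random
from math import sqrt

rng = random.Random(1)  # fixed seed for a repeatable illustration

# Hypothetical population and design values.
mu, sigma, n, trials = 50.0, 10.0, 25, 2000
z_star = 1.96                          # critical value for 95% confidence
margin = z_star * sigma / sqrt(n)      # margin of error with sigma known

hits = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / trials
print(coverage)  # should land close to 0.95
```

Any single interval either contains the parameter or does not; the 95% describes how often the procedure succeeds across samples.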
p-value
The probability of getting a test statistic as extreme or more extreme than the value actually observed if the null hypothesis were true.
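A minimal sketch of a two-sided p-value, assuming the test statistic follows a standard normal distribution when the null hypothesis is true. The function name is hypothetical; t and chi-square tests would use their own distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. both tails combined."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail

print(round(two_sided_p_value(1.96), 3))
print(round(two_sided_p_value(0.5), 3))
```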
alpha
The level of significance, probability of rejecting the null hypothesis when it is true or the largest risk the researcher is willing to take in rejecting a true null hypothesis.
multiple analyses
Two or more tests of significance performed on the same data. This inflates the Type I error rate.
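A quick sketch of how fast the inflation grows, under the simplifying assumption that the tests are independent and every null hypothesis is true. The function name is hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false rejection among independent tests."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```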
Power
Power is the probability of rejecting a false null hypothesis (1 - beta). Increasing alpha decreases beta and increases power. Increasing the sample size decreases beta and increases power.
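A sketch of both effects for one simple case: a one-sided z test of Ho: mu = mu0 against Ha: mu > mu0 with sigma known. The function name and all numbers are hypothetical; `delta` is the assumed true difference between the mean and mu0.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha):
    """Power of a one-sided z test when the true mean is mu0 + delta."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)    # rejection cutoff on the z scale
    shift = delta * sqrt(n) / sigma    # how far the truth sits from Ho
    return 1 - z.cdf(critical - shift)

base = power_one_sided_z(delta=5, sigma=10, n=25, alpha=0.05)
higher_alpha = power_one_sided_z(delta=5, sigma=10, n=25, alpha=0.10)
larger_n = power_one_sided_z(delta=5, sigma=10, n=50, alpha=0.05)

print(round(base, 3), round(higher_alpha, 3), round(larger_n, 3))
```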
T distribution vs. Normal distribution
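The t distribution is more spread out than the normal distribution, with heavier tails; it gets closer to the normal as the degrees of freedom increase.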
First rule in data analysis
plot the data
When are t procedures robust
When there are no outliers or strong skewness, especially in small samples.
Chi-Square Hypothesis
Ho: no relationship
Ha: relationship
Expected Count
Column total times row total divided by table total
When is using Chi-square appropriate
Expected counts are greater than 5.
DF for two way table
(r-1)(c-1)
What does Chi-Square test?
Measures the amount of discrepancy between the observed counts and the expected counts where expected counts are computed assuming the null hypothesis of no relationship is true.
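The expected count, appropriateness, degrees of freedom, and test statistic cards can be worked through on one made-up table. The observed counts are hypothetical, and the statistic shown is the usual sum of (observed - expected) squared over expected.

```python
# Hypothetical observed counts for a 2 x 2 table (illustrative only).
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count = row total * column total / table total.
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Discrepancy between observed and expected counts.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)
counts_large_enough = all(e > 5 for row in expected for e in row)

print(expected, round(chi_square, 2), df, counts_large_enough)
```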
Anova Hypothesis
Ho: All means equal
Ha: at least one mean differs
When do you use ANOVA
When the explanatory variable is categorical and the response variable is quantitative OR the problem says you are comparing three or more means.
When do you use Chi-square
When both variables are categorical.
What symbols to use for means
mu for means, p symbols for proportions, beta for slope in regression.