Definition:
the science of collecting, organizing, and interpreting data 
statistics


Enumerative or Analytical?
we have a finite population 
enumerative study


Enumerative or Analytical?
we have an infinite/conceptual population 
analytical study


Parameters or statistics?
numerical summary of a population 
parameters


Parameters or statistics?
numerical summary of a sample 
statistics


Definition:
drawing conclusions from the information based on a sample 
statistical inference


Classification of variables:
categorical/qualitative or quantitative? places an individual into one of several groups or categories 
categorical/qualitative variable


Classification of variables:
categorical/qualitative or quantitative? a number 
quantitative variable


Discrete and continuous variables are a subcategory of _________ variables.

quantitative


Discrete or continuous variables?
has either a finite number of possible values or a countable number of values 
discrete variable


Discrete or continuous variables?
has infinitely many possible values that cannot be counted (takes values in an interval or on a continuum) 
continuous variable


Observational or experimental study?
investigator's role is basically passive; no attempt is made to manipulate or influence the variables of interest 
observational study


Observational or experimental study?
investigator's role is active; variables are manipulated and the study environment is regulated 
experimental study


Observational or experimental study?
treatments are applied to experimental units to try to determine the effects of the treatment on the response variable 
experimental study


Definition:
 the goal is to obtain individuals in such a way that accurate information may be obtained about the population 
sampling


Name the 5 possible sampling biases.

Selection Bias
Measurement Bias
Response Bias
Nonresponse Bias
Question Bias

How do we avoid selection bias?

Simple random sample (SRS)


Definition:
a sample of size n chosen from a population by random selection, so that every possible group of n individuals is equally likely to be the sample 
Simple random sample
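
As a rough illustration of the definition above, a simple random sample can be drawn with Python's standard `random.sample`, which selects without replacement. The population of 100 labeled individuals, the helper name `simple_random_sample`, and the seed are assumptions made only for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every group of size n is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    return rng.sample(population, n)

# Hypothetical population: individuals labeled 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```

Because the selection is left to chance rather than to the investigator or the individuals themselves, this kind of draw is what guards against selection bias.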


Definition:
individuals are chosen because they are easy to obtain; the most common example is when individuals are self-selected 
Convenience sample


These are requirements of ________.
intervals must be non-overlapping; intervals must be contiguous; intervals must be of equal width 
Histograms
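
One way to see how the three interval requirements fit together is to build the bins from a starting point and a single width, which makes them non-overlapping, contiguous, and equal in width by construction. The function name `histogram_counts` and the data values are hypothetical and serve only as a sketch.

```python
def histogram_counts(data, start, width, bins):
    """Count observations in the intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for x in data:
        index = int((x - start) // width)  # which interval x falls into
        if not 0 <= index < bins:
            raise ValueError(f"{x} lies outside the histogram range")
        counts[index] += 1
    return counts

# Hypothetical data, binned into [0, 2.5), [2.5, 5), [5, 7.5), [7.5, 10).
data = [1, 2, 2, 3, 5, 7, 8, 9, 9.5]
counts = histogram_counts(data, start=0, width=2.5, bins=4)
print(counts)  # [3, 1, 2, 3]
```

Each observation lands in exactly one interval, so the counts add up to the number of observations.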


Matching:
easy to calculate; easy to work with algebraically; highly affected by outliers; not resistant to extreme observations 
Mean


Matching:
more resistant to a few extreme observations; robust 
Median


Matching:
where the peak is; the most frequent value in the data; possible to have more than one; important for categorical data 
Mode


Given a right-skewed histogram, list the mean, median, and mode in order from smallest to largest.

mode < median < mean


Given a left-skewed histogram, list the mean, median, and mode in order from smallest to largest.

mean < median < mode
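
The two orderings above can be checked on a small example. The data below are invented for illustration, and the ordering shown is the typical pattern for skewed data rather than a guarantee for every dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]
# Reflecting the data reverses the direction of the skew.
left_skewed = [21 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
# right-skewed: mode 2, median 3.0, mean 5.1
# left-skewed:  mode 19, median 18.0, mean 15.9
```

The long tail pulls the mean the furthest, the median less, and leaves the mode at the peak.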


Name the effects of adding a constant to all data points on the mean, median, variance, and standard deviation.

 add the same constant to the mean
 add the same constant to the median
 variance and standard deviation remain the same

Name the effects of multiplying all data points by a constant on the mean, median, variance, and standard deviation.

 multiply the mean, median, and mode by the same constant
 variance will be the constant squared x the original variance (s^2*const^2)
 standard deviation will be the absolute value of the constant x the original st. dev. (s*|const|)
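
A quick numerical check of the two cards on adding and multiplying by a constant, using Python's `statistics` module (sample variance and sample standard deviation). The data and the constants 3 and -2 are arbitrary choices for this sketch; a negative multiplier is used to show that the standard deviation scales by the size of the constant, not its sign.

```python
from statistics import mean, median, variance, stdev

data = [2, 4, 4, 5, 10]             # hypothetical data: mean 5, median 4, variance 9, st. dev. 3
shifted = [x + 3 for x in data]     # add the constant 3 to every point
scaled = [-2 * x for x in data]     # multiply every point by the constant -2

for name, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(name, mean(values), median(values), variance(values), stdev(values))
# shifted: mean 8, median 7, variance 9, st. dev. 3
# scaled:  mean -10, median -8, variance 36, st. dev. 6
```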

Name the effects of adding to the maximum data point on the mean, median, variance, and standard deviation.

 increases mean
 median stays the same (when there are more than two data points)
 variance and std. dev. will increase
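
The effect of enlarging only the maximum can be sketched the same way. The data and the new maximum of 25 are made up for illustration.

```python
from statistics import mean, median, variance, stdev

data = [2, 4, 4, 5, 10]       # hypothetical data
bumped = [2, 4, 4, 5, 25]     # same data with the maximum raised from 10 to 25

print("mean:", mean(data), "->", mean(bumped))              # 5 -> 8
print("median:", median(data), "->", median(bumped))        # 4 -> 4
print("variance:", variance(data), "->", variance(bumped))  # 9 -> 91.5
print("st. dev.:", stdev(data), "->", stdev(bumped))
```

The middle observation is untouched, which is why the median is described as resistant while the mean and the measures of spread are not.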

The Empirical Rule only applies to what type of histogram?

symmetric, unimodal, bell-shaped


According to the Empirical Rule, roughly _____ % of data will fall between x - s and x + s (x = mean, s = st. dev)

68


According to the Empirical Rule, roughly _____ % of data will fall between x - 2s and x + 2s (x = mean, s = st. dev)

95


According to the Empirical Rule, roughly _____ % of data will fall between x - 3s and x + 3s (x = mean, s = st. dev)

99.7
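
The 68, 95, and 99.7 figures match what an exactly normal (bell-shaped) distribution gives. As a sketch under that normality assumption, the coverage within k standard deviations of the mean is erf(k / sqrt(2)); the helper name `normal_coverage` is introduced here only for illustration.

```python
import math

def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} st. dev.: {100 * normal_coverage(k):.1f}%")
# within 1 st. dev.: 68.3%
# within 2 st. dev.: 95.4%
# within 3 st. dev.: 99.7%
```

Real data that are only roughly bell-shaped will match these percentages only approximately.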


Definition:
observations that lie outside the overall pattern of a distribution 
outliers


Definition:
 median of the observations that are less than the overall median 
Q1


Definition:
 median of the observations that are greater than the overall median 
Q3


The interquartile range (IQR) is given by:

Q3 - Q1


The upper fence is given by:

Q3 + 1.5*IQR


The lower fence is given by:

Q1 - 1.5*IQR


Anything above the upper fence or below the lower fence is considered a(n):

outlier


What points are included in the 5 number summary?

min, Q1, Q2 (the median), Q3, max
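
The quartile, IQR, fence, and five-number-summary cards can be tied together in one short sketch. It takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data (leaving out the middle observation when n is odd), which is one common convention; software packages may use slightly different quartile rules. The function name and the data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # observations below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]    # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data with one extreme value
minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print((minimum, q1, q2, q3, maximum))   # (1, 3.5, 6, 8.5, 30)
print(iqr, lower_fence, upper_fence)    # 5.0 -4.0 16.0
print(outliers)                         # [30]
```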


Definition:
 variable that is monitored as characterizing system performance/behavior 
response variable


Definition:
 variable over which an investigator exercises power, choosing a setting(s) for use in the study 
supervised (managed) variable


Name the 2 kinds of supervised variables.

Controlled, experimental variables


Definition:
 supervised variable with one setting (held constant) 
controlled variable


Definition:
 supervised variable with more than one setting 
experimental variable


Definition:
 categorical variables whose effect on the response variable we want to investigate (ex: brand) 
factors


Definition:
 the values of a factor (ex: brand A, brand B) 
levels


Definition:
 a combination of the values (levels) of each factor 
treatment


Name the 4 ways to deal with experimental error.

controlled variables
randomization
blocking
replication

Definition:
 variables kept constant across experimental units 
controlled variables


Definition:
 chance determines the assignment of treatments; does not reduce experimental error, but averages out the effects of lurking variables over treatments 
randomization


Definition:
 groups (blocks) of experimental units chosen to be fairly homogeneous; controls for differences between blocks because all the outcomes within a block are affected similarly 
blocking


Definition:
 multiple experimental units per treatment (not just multiple measurements of the same experimental units, which only captures measurement error); to be able to see trends vs. "flukes"; to quantify the amount of experimental error 
replication
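
Randomization, blocking, and replication can be combined in a single assignment plan. The sketch below is only one possible way to do it: within each block, units are shuffled by chance and then dealt to the treatments in equal numbers, so every treatment is replicated inside every block. The block labels, unit labels, and function name are hypothetical.

```python
import random
from collections import Counter

def randomized_block_design(blocks, treatments, seed=None):
    """Within each block, assign treatments to units by chance, in equal numbers."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError("each block needs a multiple of the number of treatments")
        shuffled = list(units)
        rng.shuffle(shuffled)                    # randomization: chance decides the order
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of fairly homogeneous units.
blocks = {
    "batch 1": ["u1", "u2", "u3", "u4"],
    "batch 2": ["u5", "u6", "u7", "u8"],
}
treatments = ["A", "B"]
plan = randomized_block_design(blocks, treatments, seed=0)
counts = Counter(plan.values())
print(plan)
print(counts)   # each (block, treatment) pair is assigned to 2 units
```

Having two units per treatment in each block is the replication; it is what makes it possible to estimate experimental error rather than only measurement error.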


What are the goals of using controlled variables?

 to keep the effects of a controlled variable from affecting conclusions about treatment effects
 to reduce experimental error 

What are the goals of using replication?

 to generalize to a larger population
 to allow us to quantify experimental error 

Definition:
 experimental design with 2 or more categorical experimental variables (factors) 
Factorial Design


3x4
Give the number of factors and each factor's corresponding number of levels. Give the number of treatments. 
2 factors.
Factor 1 has 3 levels, factor 2 has 4 levels. 3x4 = 12 treatments 
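
The 12 treatments of a 3x4 design are simply every pairing of a level of the first factor with a level of the second. A minimal sketch with `itertools.product`, where the factor names and level labels are invented for illustration:

```python
from itertools import product

# Hypothetical factors: one with 3 levels, one with 4 levels.
factors = {
    "brand": ["A", "B", "C"],
    "temperature": ["low", "medium", "high", "very high"],
}

# A treatment is one combination of a level from each factor.
treatments = list(product(*factors.values()))
for treatment in treatments:
    print(treatment)
print(len(treatments))   # 12
```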

Definition:
 an effect attributable to combinations of variables above and beyond what can be predicted from the variables considered individually 
Interaction Effects
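
For a 2x2 design, one simple way to picture an interaction is to compare the effect of one factor at each level of the other. If the two factors acted independently, those effects would be equal and their difference would be zero. The cell means below are made-up numbers used only for this sketch.

```python
# Hypothetical mean responses for each treatment (brand, temperature).
cell_means = {
    ("A", "low"): 10, ("A", "high"): 14,
    ("B", "low"): 12, ("B", "high"): 22,
}

def temperature_effect(brand):
    """Change in mean response from low to high temperature for one brand."""
    return cell_means[(brand, "high")] - cell_means[(brand, "low")]

effect_a = temperature_effect("A")   # 14 - 10 = 4
effect_b = temperature_effect("B")   # 22 - 12 = 10
interaction = effect_b - effect_a    # 6: the temperature effect depends on the brand
print(effect_a, effect_b, interaction)
```

A nonzero difference like this is what cannot be predicted from the two factors considered one at a time.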


True or False?
The standard deviation can be negative. 
False


True or False?
The median is less sensitive than the mean to outliers. 
True


A television station is interested in predicting whether or not voters in its listening area are in favor of federal funding for abortions. It asks its viewers to phone in and indicate whether they are in favor of or opposed to this. Of the 2241 viewers who phoned in, 1574 (70.24%) were opposed to federal funding for abortions. The number 70.24% is
A. a statistic B. a parameter C. a sample D. a population 
A. a statistic


A survey records many variables of interest to the researchers conducting the survey. Below are some of the variables from a survey conducted by the U.S. Postal Service. Which of the following variables is categorical?
A. country of residence B. number of people, both adults and children, living in the household C. total household income, before taxes, in 1993 D. age of respondent 
A. country of residence


In order to determine if smoking causes cancer, researchers surveyed a large sample of adults. For each adult they recorded whether the person had smoked regularly at any period in their life and whether the person had cancer. They then compared the proportion of cancer cases in those who had smoked regularly at some time in their lives with the proportion of cases in those who had never smoked regularly at any point in their lives. The researchers found there was a higher proportion of cancer cases among those who had smoked regularly than among those who had never smoked regularly. This is
a. an observational study b. a designed experiment c. a block design d. a controlled study 
a. an observational study


Sickle-cell disease is a painful disorder of the red blood cells that affects mostly African Americans in the United States. To investigate whether the drug hydroxyurea can reduce the pain associated with sickle-cell disease, a study by the National Institute of Health gave the drug to 150 sickle-cell sufferers and a placebo to another 150. The researchers then counted the number of episodes of pain reported by each subject. The response variable is
A. the drug hydroxyurea B. the number of episodes of pain C. the presence of sicklecell disease D. the number of red blood cells 
B. the number of episodes of pain


What are the goals of using randomization in an experiment?

To protect conclusions about treatment effects from the systematic effects of extraneous/lurking variables
