What are samples?

One kind of observational study


What are sample surveys?

Sample surveys, which select a part of the population of interest to represent the whole, are one type of observational study.


what is an observational study?

observes individuals and measures variables of interest but does not attempt to influence the responses. The purpose of an observational study is to describe some group or situation


what is an experiment?

deliberately imposes some treatment on individuals in order to observe their responses. The purpose of an experiment is to study whether the treatment causes a change in the response


what is confounding?

Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other


_____________ of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables

Observational studies


what is a population?

The population in a statistical study is the entire group of individuals about which we want information


what is a sample?

is a part of the population from which we actually collect information. We use a sample to draw conclusions about the entire population


what must be specified in a proper sample survey?

what population we want to describe
what we want to measure (give exact definitions of our variables) 

what is a sampling design?

is a specific method for choosing a sample from the population


what is a convenience sample?

A sample selected by taking the members of the population that are easiest to reach.


what is a bias?

the design of a statistical study is biased if it systematically favors certain outcomes.


what is a voluntary response sample?

consists of people who choose themselves by responding to a broad appeal. Voluntary response samples are biased because people with strong opinions are most likely to respond.


what is a simple random sample? (SRS)

A simple random sample of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.


What is a table of random digits?

is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with two properties


what are the two properties of a table of random digits?

Each entry in the table is equally likely to be any of the 10 digits 0 through 9.
The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.

How do you use table B to choose an SRS?

Step 1. label. Give each member of the population a numerical label of the same length.
Step 2. Table. To choose an SRS, read from Table B successive groups of digits of the length you used as labels. Your sample contains the individuals whose labels you find in the table. 

what is a probability sample?

is a sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has.


what is a stratified random sample?

first classify the population into groups of similar individuals called strata, then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
(Multistage samples) 
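
A minimal Python sketch of the stratified procedure described on this card: draw a separate SRS inside each stratum, then combine them. The roster, the stratum sizes, and the function name `stratified_sample` are hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine the results.

    strata: dict mapping stratum name -> list of individuals
    sizes:  dict mapping stratum name -> SRS size for that stratum
    """
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # random.sample draws without replacement, so every subset of the
        # requested size within the stratum is equally likely (an SRS).
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical class roster: 25 males and 30 females.
strata = {
    "male": [f"M{i:02d}" for i in range(1, 26)],
    "female": [f"F{i:02d}" for i in range(1, 31)],
}
sample = stratified_sample(strata, {"male": 5, "female": 6}, seed=1)
print(len(sample), sample)
```

Note that the combined sample is not itself an SRS of the whole class, because samples with, say, 11 males can never occur.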

what is undercoverage?

occurs when some groups in the population are left out of the process of choosing the sample.


what is nonresponse?

occurs when an individual chosen for the sample cannot be contacted or refuses to participate.


what is a response bias?

occurs when respondents give inaccurate answers, for example by lying or otherwise not telling the truth


____________ give more accurate results than smaller samples

Larger random samples
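
One way to see this claim is a small simulation: take many random samples of two different sizes and compare how much the sample proportion varies. This is only a sketch; the true proportion `p = 0.5`, the sample sizes, and the number of repetitions are assumptions made for illustration.

```python
import random
import statistics

def spread_of_sample_proportions(n, p=0.5, repetitions=500, seed=0):
    """Standard deviation of the sample proportion over repeated samples of size n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        # Each sampled individual says "yes" with probability p
        # (a stand-in for sampling from a very large population).
        yes = sum(rng.random() < p for _ in range(n))
        proportions.append(yes / n)
    return statistics.stdev(proportions)

small = spread_of_sample_proportions(50)
large = spread_of_sample_proportions(500)
print(f"n=50: {small:.3f}   n=500: {large:.3f}")
```

The larger samples give proportions that cluster more tightly around the true value. A larger sample reduces chance variation only; it does not remove bias from a poor sampling design.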


what are subjects?

The individuals studied in an experiment, particularly when they are people.


what are factors?

 The explanatory variables in an experiment.


what is treatment?

is any specific experimental condition applied to the subjects. If an experiment has several factors, a treatment is a combination of specific values of each factor.


what is a randomized comparative experiment?

An experiment that uses both comparison of two or more treatments and chance assignment of subjects to treatments


what is a Completely randomized experimental design?

In a completely randomized experimental design, all the subjects are allocated at random among all the treatments
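
A short Python sketch of a completely randomized design: shuffle all the subjects, then deal them out among the treatments. The subject labels, the treatment names, and the function name `completely_randomized` are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among all treatments, in equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # impersonal chance decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 13)]  # 12 hypothetical subjects
groups = completely_randomized(subjects, ["drug", "placebo"], seed=42)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```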


What are the principles of experimental design?

The basic principles of statistical design of experiments are
Control the effects of lurking variables on the response, most simply by comparing two or more treatments.
Randomize: use impersonal chance to assign subjects to treatments.
Use enough subjects in each group to reduce chance variation in the results.

what is statistically significant?

An observed effect so large that it would rarely occur by chance


The logic of a randomized comparative experiment depends on ...

our ability to treat all the subjects identically in every way except for the actual treatments being compared.


what is a placebo?

dummy treatment


what is a placebo effect?

The response to a dummy treatment


what is a double-blind experiment?

neither the subjects nor the people who interact with them know which treatment each subject is receiving


The most serious potential weakness of experiments is lack of realism:

the subjects or treatments or setting of an experiment may not realistically duplicate the conditions we really want to study.


________________of an experiment cannot tell us how far the results will generalize.

Statistical analysis


what is a block?

A block is a group of individuals that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
In a block design, the random assignment of individuals to treatment is carried out separately within each block. 
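
The block design on this card can be sketched in Python as follows: the random assignment is carried out separately inside each block. The blocks (men and women), the treatment labels, and the function name `block_design` are hypothetical.

```python
import random

def block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    blocks: dict mapping block name -> list of subjects in that block
    Returns a dict mapping subject -> (block name, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomize only among members of this block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks formed before the experiment.
blocks = {
    "men": ["M1", "M2", "M3", "M4"],
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
}
assignment = block_design(blocks, ["A", "B"], seed=7)
for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```

Because the shuffle happens within each block, every treatment group receives the same number of men and the same number of women.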

A ___________ combines the idea of creating equivalent treatment groups by matching with the principle of forming treatment groups at random. _______ are another form of control.

block design
blocks 

The design of an experiment describes the choice of treatments and the manner ...

in which the subjects are assigned to the treatments.


what are matched pairs?

are a common form of blocking for comparing just two treatments.
The simplest form of control is comparison 

what is randomization?

uses chance to assign subjects to the treatments, which creates treatment groups that are similar before the treatments are applied.


Which of the following is true?
(a) Points that lie below the least-squares regression line will have positive residuals. (b) The sum of the least-squares residuals is zero. (c) A regression line should be used to make predictions for x values that lie far outside the range of the data you collected. (d) r^2 can take any value between -1 and +1.
(b) The sum of the least-squares residuals is zero.
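
A quick numerical check of answer (b), as a sketch: fit a least-squares line to a small made-up data set and add up the residuals (observed y minus predicted y). The data values and the function name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("sum of residuals is zero up to rounding:", abs(sum(residuals)) < 1e-9)
```

The same residual definition shows why (a) is false: a point below the line has an observed y smaller than its predicted y, so its residual is negative.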


Two variables are said to be negatively associated if
a) above average values of one variable tend to accompany below average values of the other b) below average values of one variable tend to accompany below average values of the other c) above average values of one variable tend to accompany above average values of the other d) below average values of one variable can be accompanied by either above or below average values of the other 
a) above average values of one variable tend to accompany below average values of the other


You want to determine if there is a relationship between the lengths of feet in centimeters and the lengths of thigh bones in centimeters. You take 23 measurements on junior males at UNM, plot a scatterplot, and find that the correlation, r, is .67. The correlation would have units
a) centimeters b) inches c) centimeters times centimeters d) none of the above 
d) none of the above


A lurking variable is
a) a variable that is not among the variables studied but that affects the response variable b) the true cause of a response c) any variable that produces a large residual d) the true variable that is explained by the explanatory variable 
a) a variable that is not among the variables studied but that affects the response variable


When possible, the best way to establish that an observed association is the result of a cause-and-effect relation is by means of
a) the least squares regression line b) a well designed experiment c) the correlation coefficient d) the square of the correlation coefficient 
b) a well designed experiment


A study to determine whether or not kicking a football filled with helium traveled farther than one filled with air found that, while the football filled with helium went, on average, farther than the one filled with air, the difference was not statistically significant. The response variable in this study
a) is the gas, air or helium, with which the football is filled b) does not exist without the statistical significance c) is the number of kickers d) is the distance the footballs travelled 
d) is the distance the footballs travelled


In a linear regression problem,
a) The distinction between the response variable and the explanatory variable is not important, and either variable may go on the X or Y axis. b) The distinction between the response variable and the explanatory variable is not important but the correlation coefficient does depend upon which one is the response variable. c) The distinction between the response variable and the explanatory variable is important and the response variable should be plotted on the X-axis. d) The distinction between the response variable and the explanatory variable is important and the response variable should be plotted on the Y-axis.
d) The distinction between the response variable and the explanatory variable is important and the response variable should be plotted on the Y-axis.


Once you have the regression equation for regressing the Y variable on the X variable, if you start at any point on the regression line and move a distance sx along the X axis, the distance you move along the Y axis is given by
a) rsy b) sy c) r d) rsx 
a) rsy
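
The answer follows from the slope formula for the least-squares line. A short worked derivation, where $\Delta x$ and $\Delta y$ are notation introduced here for the moves along the X and Y axes:

```latex
b = r\,\frac{s_y}{s_x}
\qquad\Longrightarrow\qquad
\Delta y = b\,\Delta x = r\,\frac{s_y}{s_x}\cdot s_x = r\,s_y
\quad\text{when } \Delta x = s_x .
```

In words: a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y.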


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 Assume that the standard deviations aaa = 13.5 and bbb = 17.9. Then the value of b, the slope in the regression equation, is a) .540 b) .949 c).677 d) .758 
b) .949


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 Assume that aaa and bbb were such that you calculated the slope of the regression line, b, to be 1.00. Then the intercept of the regression line, a, would be a) 47.2 b) 47.2 c) 6 d) .60 
(c) 6
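
The arithmetic behind this card and the slope card before it, written out from the summary statistics given in the question (means 103 and 109, r = 0.716, and the assumed standard deviations 13.5 and 17.9):

```latex
b = r\,\frac{s_y}{s_x} = 0.716 \times \frac{17.9}{13.5} \approx 0.949,
\qquad
a = \bar{y} - b\,\bar{x} = 109 - (1.00)(103) = 6 .
```

The intercept calculation uses the slope of 1.00 stated in this card, not the 0.949 from the previous one.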


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 If the regression equation turned out to be yhat = .8X + 26.6, what would be the predicted value for the Mortality ratio in a country where the Smoking ratio was 110. a) 114.6 b) 110 c) 103 d) 109 
(a) 114.6


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 If the regression equation turned out to be yhat = .8X + 26.6, and one of the countries had a Smoking rate of 100 and a Mortality rate of 100, the residual for this country would be a) -100 b) -6.6 c) 6.6 d) 100
(b) -6.6


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 If the regression equation turned out to be yhat = .8X + 26.6, and the equation predicted a Mortality Rate of 113, what would have been the Smoking Rate for that country? a) 98 b) 100 c) 108 d)117 
(c) 108
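
The three cards that use yhat = .8X + 26.6 can be checked with a few lines of Python. This is a sketch; the function names are made up for illustration.

```python
A, B = 26.6, 0.8  # intercept and slope from the cards: yhat = .8X + 26.6

def predict(x):
    """Predicted Mortality ratio for a given Smoking ratio."""
    return A + B * x

def residual(x, y):
    """Observed minus predicted."""
    return y - predict(x)

def x_for_prediction(y_hat):
    """Smoking ratio at which the line predicts y_hat."""
    return (y_hat - A) / B

print(round(predict(110), 1))           # 114.6
print(round(residual(100, 100), 1))     # -6.6
print(round(x_for_prediction(113), 1))  # 108.0
```

The residual is negative because the observed Mortality ratio (100) lies below the predicted value (106.6).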


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 aaa Mortality 109 bbb r = 0.716 What percent of the total variation in the Mortality Ratio is accounted for by the linear regression on the Smoking Ratio? a) 71.6% b) 84.6% c) .716% d) 51.3% 
(d) 51.3%


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 17.2 Mortality 109 26.1 r = 0.716 (3 points) What is the slope of the regression line to predict Mortality rate from Smoking rate? 
1.09
slope = r(Sy/Sx)

Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 17.2 Mortality 109 26.1 r = 0.716 (4 points) What is the equation of the leastsquares regression line for predicting Mortality rate from Smoking rate? (If you don't know, put in some equation so you can answer questions c), d), and e)). 
yhat = -3.27 + 1.09x (using the rounded slope: a = 109 - 1.09(103) = -3.27)
a = ybar - b(xbar), yhat = a + bx

Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 17.2 Mortality 109 26.1 r = 0.716 (3 points) Use the equation above to predict the relative mortality rate for an occupational group with a relative smoking rate of 110. 
calculate solution


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 17.2 Mortality 109 26.1 r = 0.716 (3 points) What relative smoking rate would have predicted a relative mortality rate of 100? 
calculate solution


Government statisticians conducted a study of the relationship between smoking and lung cancer in 25 occupational groups. “Smoking” is a ratio that is 100 if men in an occupation are exactly average in their smoking, below 100 if they smoke less than average, and above 100 if they smoke more than average. “Mortality” is also measured relative to the entire population of men of the same ages as those studied, and is greater or less than 100 when there are more or fewer deaths from lung cancer than would be expected based on the experience of all English men.
Summary Statistics Variable Mean Std Dev Smoking 103 17.2 Mortality 109 26.1 r = 0.716 (2 points) One of the countries in the sample study had values Smoking Rate = 123, Mortality Rate = 115. What is the residual for this country?
calculate solution
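
A sketch of how the open-ended parts of this problem could be computed directly from the summary statistics (means 103 and 109, standard deviations 17.2 and 26.1, r = 0.716). The variable and function names are made up for illustration. The slope is carried unrounded, so the results can differ slightly from hand calculations that use the rounded slope 1.09.

```python
r = 0.716
x_bar, s_x = 103, 17.2  # Smoking: mean and standard deviation
y_bar, s_y = 109, 26.1  # Mortality: mean and standard deviation

b = r * s_y / s_x      # slope of the least-squares line
a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)

def predict(x):
    """Predicted relative mortality rate for relative smoking rate x."""
    return a + b * x

print(f"slope b = {b:.2f}")
print(f"intercept a = {a:.2f}")
print(f"predicted mortality at smoking 110: {predict(110):.1f}")
print(f"smoking rate that predicts mortality 100: {(100 - a) / b:.1f}")
print(f"residual at (123, 115): {115 - predict(123):.1f}")
```

The residual is observed minus predicted, so it is negative when the observed mortality falls below the line.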


1) A news release for a diet products company reports: “There's good news for the 65 million Americans currently on a diet.” Its own study showed that people who lose weight can keep it off. The sample was 20 graduates of the company’s program who endorsed the program in commercials. The results of the sample are probably
a. biased, overstating the effectiveness of the diet. b. biased, understating the effectiveness of the diet c. unbiased since the people in the sample are nationally recognized individuals. d. unbiased but they could be more accurate. A larger sample size should be used. 
a. biased, overstating the effectiveness of the diet.


We wish to choose a simple random sample of size 3 from the following employees: 1. Bechhofer 2. Brown 3. Ito 4. Kesten 5. Montoya 6. O'hara 7. Taylor 8. Wald 9. Vargas. To do this, we will use the numerical labels attached to the names above and the following list of random digits, reading the list from the left to the right, starting at the beginning of the list.
11793 20495 05912 11384 44982 20051 27498 12009 45287 71753 70707 84533 2) The simple random sample of employees is a) 117 b) Bechhofer, Bechhofer again, and Taylor c) Bechhofer, Taylor, Vargas d) Kesten, Montoya, Taylor
c) Bechhofer, Taylor, Vargas
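
The label-and-read procedure can be sketched in Python using the employee roster and random digits from this set. The function name `srs_from_digits` is made up for illustration. Groups of digits that match no label, or that repeat a label already chosen, are skipped.

```python
def srs_from_digits(labels, digits, size):
    """Choose an SRS by reading successive groups of digits.

    labels: dict mapping label string -> individual (all labels the same length)
    digits: string of random digits (spaces are ignored)
    size:   number of individuals to select
    """
    width = len(next(iter(labels)))
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        group = stream[i:i + width]
        # Skip groups that are not labels and labels already selected.
        if group in labels and labels[group] not in chosen:
            chosen.append(labels[group])
        if len(chosen) == size:
            break
    return chosen

employees = {
    "1": "Bechhofer", "2": "Brown", "3": "Ito",
    "4": "Kesten", "5": "Montoya", "6": "O'hara",
    "7": "Taylor", "8": "Wald", "9": "Vargas",
}
digits = "11793 20495 05912 11384 44982 20051 27498 12009 45287 71753 70707 84533"

print(srs_from_digits(employees, digits, 3))  # ['Bechhofer', 'Taylor', 'Vargas']
```

The digits read 1, 1, 7, 9: the second 1 repeats Bechhofer and is skipped, so the sample is Bechhofer, Taylor, Vargas.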


3) Which of the following statements is true if we want to use a list of random digits to select a simple random sample?
a) If we use another list of random digits to select the sample, we would get the same result as that obtained with the list actually used. b) If we use another list of random digits to select the sample, we would get a completely different sample than that obtained with the list actually used. c) If we use another list of random digits to select the sample, we would get, at most, one name in common with that obtained with the list actually used. d) If we use another list of random digits to select the sample, the result obtained with the list actually used would be just as likely to be selected as any other set of three names.
d) If we use another list of random digits to select the sample, the result obtained with the list actually used would be just as likely to be selected as any other set of three names.

4) Which of the following is an example of simple random sampling to select a sample of size 100 from a population of 10000 individuals?
a) Use the table of random digits to select a sample of 100 which contains 50 males and 50 females. b) From a 100-page telephone book with 100 individuals on each page, use the table of random digits to select a number between 1 and 100, then select the individual with that number from each page of the telephone book. c) Select a sample of size 100 from the population by labeling every individual with a number between 0000 and 9999 and use the table of random digits to choose the 100 individuals. d) All of the above are examples of simple random sampling.
c) Select a sample of size 100 from the population by labeling every individual with a number between 0000 and 9999 and use the table of random digits to choose the 100 individuals.

5) A Stat 145 class has 55 students of which 25 are males and 30 are females. I select a random sample of 5 males and 6 females. The two samples are combined to give an overall sample of 11 students. The overall sample is
a) a simple random sample b) a stratified random sample c) a multistage sample d) a simple random sample involving blocking. 
b) a stratified random sample


6) A Senator wants to know what the voters of her state think of proposed legislation on gun control. She selects a simple random sample of 2500 voters and mails them a questionnaire on the subject. Her staff reports that 448 questionnaires have been returned, of which 343 support the legislation. The population is
a) the 448 letters received b) the voters in the state c) the 343 letters supporting the legislation d) the 2500 voters receiving the questionnaire
b) the voters in the state


USE THE FOLLOWING TO ANSWER QUESTIONS 7 and 8
We wish to choose a simple random sample of size 3 from the following employees : 1. Bechhofer 4. Kesten 7. Taylor 2. Brown 5. Montoya 8. Wald 3. Ito 6. O'hara 9. Vargas To do this, we will use the numerical labels attached to the names above and the following list of random digits, reading the list from the left to the right, starting at the beginning of the list. 11793 20495 05912 11384 44982 20051 27498 12009 45287 71753 70707 84533 8) An experiment in which all the subjects are allocated at random among all the treatments is called a a) matched pairs design b) completely randomized design c) double blind experiment d) blocking design 
b) completely randomized design


7) An experiment was conducted by some students to explore the nature of the relationship between a person's heart rate (measured in beats per minute) and the frequency at which that person stepped up and down on steps of various heights. Three rates of stepping and three different step heights were used. A subject performed the activity (stepping at one of the three stepping rates at one of the three heights) for three minutes. Heart rate was then measured at the end of the period. In this experiment, the numbers of response variables, factors and treatments (in that order) are
a) 2, 6, 3 b) 1, 2, 3 c) 1, 2, 9 d) 2, 9, 1
c) 1, 2, 9


9) A researcher conducts a study to investigate the effects of exercise and diet on mood. The factors in the study are
a) whether randomization and placebos were used b) whether the experiment was double blind c) the number of subjects d) exercise and diet
d) exercise and diet


when a question uses the words "total variation", "linear regression", or just plain "regression", what type of equation should be used?

r^2
