128 Cards in this Set
 Front
 Back
inference

the process of learning about a population by studying a sample


sample regression

estimates the association between x and y in the entire population


regression line

an estimate from a sample trying to describe the true regression line from the population


observational study

a statistical study in which the subjects are not modified (just observed) so that researchers can measure and record certain characteristics


experiment (experimental study)

A statistical study in which a "treatment" is applied to the subjects (i.e. they are modified) and researchers measure the effect of the treatment


lurking variable (confounding variable)

other variables that may influence the response that are not studied


explanatory variable

variable that explains or causes the differences in another variable ("x" or independent variable)


response variable

variable that is thought to depend on the value of the explanatory variable ("y" or dependent variable)


study question

the question about the population that the study is attempting to answer


population

the complete set of all individuals/objects the study is attempting to answer a question about, the whole group of individuals we are interested in


study subjects

the individuals actually measured in the study (i.e. the selected sample of individuals/objects from the population)


treatment

what the researcher does/gives to some or all of the study subjects; the factor whose effect is under study; also called the explanatory variable


response variable

the quantity or characteristic that is measured to determine the treatment effect


control group

group of subjects that has the same sources of variability as those receiving the treatment but does NOT receive the treatment; sometimes called the placebo group


confounding factor

any factor other than the experimental treatment that can affect the response variable in the experiment


completely randomized design

a design in which the treatments in the experiment are randomly assigned to the experimental units without using matched pairs or blocks
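
A minimal sketch of random assignment in a completely randomized design, assuming two equal-sized groups. The function name and the subject IDs are made up for illustration.

```python
import random

def completely_randomized_assignment(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)       # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject IDs 1..10
groups = completely_randomized_assignment(range(1, 11), seed=1)
print(groups)
```

No matching or blocking is used here: every subject has the same chance of landing in either group.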


researchers

people who make measurements


single blinding

subject doesn't know if he/she is in the treatment or control group


double blinding

neither RESEARCHERS nor SUBJECTS know which participants are assigned to the control group and which to the treatment group


matched pair design

takes two measurements on each subject


blocking design

extension of completely randomized design
 put similar subjects into blocks, expect the blocks to differ with respect to the response variable then do a completely randomized experiment within each block 

block

a group of subjects that are similar in some way


"blocks" refers to ...

individuals


"experimental units" refers to...

repeated time periods in which the blocks receive the varying treatments


scatter plot

used to compare variables
must measure two variables on a common individual (an individual can be a person, place, or even time) then plot the two variables 

positive association

this type of association occurs when the value of one variable tends to increase as the value of the other variable increases


negative association

this type of association occurs when the value of one variable tends to decrease as the value of the other variable tends to increase


nonlinear association

this type of association occurs when the relationship between two variables follows a curve rather than a straight line


correlation

a number that indicates the strength and the direction of a straight-line relationship between two quantitative variables


strength of correlation

determined by the absolute value of the correlation, indicates the overall closeness of the points to a straight line


direction of the correlation

determined by the sign of the correlation


magnitude of r

absolute value of r, indicates the strength of the relationship


r = 1 or r = -1

indicates that there is a perfect linear relationship and all data points fall on a straight line


squared correlation, r²

this is the proportion of variation in the response variable that is explained by the explanatory variable. It is always between 0 and 1.
Referring to a correlation 

r

correlation coefficient, used to measure linear relationship between x and y
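
A small sketch of how r and r² might be computed by hand, assuming the usual Pearson formula. The function name and the data (hours studied vs. test score) are invented for illustration.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]
r = correlation(hours, score)
print("r   =", round(r, 3))        # the sign gives the direction, |r| the strength
print("r^2 =", round(r ** 2, 3))   # proportion of variation in y explained by x
```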


the line of best fit

this estimates the average value of y when you know x; an individual's value will vary around the predicted value
 can be used to give a prediction of a value of y, given a specific value of x 
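
A sketch of fitting a line of best fit, assuming the ordinary least-squares method (the card does not name a method). The function names and data are hypothetical.

```python
def best_fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_value, slope, intercept):
    """Predicted (average) y for a given x."""
    return intercept + slope * x_value

# Hypothetical data
slope, intercept = best_fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(5, slope, intercept))
```

The prediction is the estimated average y at that x; individual values are expected to scatter around it.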

randomization test

a test on two groups when paired data is NOT available
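
One way a randomization test could be sketched, assuming the statistic is the absolute difference in group means and that random reshuffles stand in for all possible regroupings. The function name, the number of reshuffles, and the data are illustrative choices.

```python
import random

def randomization_test(group_a, group_b, reps=2000, seed=0):
    """Approximate two-sided p-value for a difference in means between two unpaired groups."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                       # reassign group labels at random
        new_a, new_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed - 1e-12:              # at least as extreme as what was observed
            extreme += 1
    return extreme / reps

# Hypothetical data
print(randomization_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```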


sampling frame

a list of all individuals in the population


in hypothesis testing, population parameter =

null value


null hypothesis

the statement being tested
a statement that describes some aspect of the statistical behavior of a set of data; this statement is treated as valid unless the actual behavior of the data contradicts this assumption 

null value

the specific # the parameter equals if the null hypothesis is true
 value of population parameter being tested in the null hypothesis 

alternative hypothesis

 a statement that something is happening
 researchers want to prove this; it may be a statement that the assumed status quo is false, or that there is a relationship, or that there is a difference 

two types of alternative hypothesis

one-sided test, two-sided test


one-sided test

when Ha specifies a single direction


two-sided test

when Ha includes values in both directions


p-value

the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming Ho is true


level of significance

(α) is the border line for deciding that the p-value is low enough to justify choosing the alternative hypothesis


hypothesis testing about paired differences

matched pairs design


matched pairs design

taking two measures on the same subject to see if there is a difference between the two measurements


paired t-test

a one-sample t-test used on the sample of differences to examine whether the sample mean difference is significantly different from 0
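
A sketch of the paired t statistic, assuming the standard form t = (mean difference) / (s_d / √n) with n - 1 degrees of freedom. The function name and the before/after data are hypothetical; turning t into a p-value would need a t table or a statistics library, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for the differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)         # SE of the mean difference
    return mean(diffs) / standard_error, n - 1

# Hypothetical measurements on the same four subjects
t, df = paired_t_statistic([10, 12, 9, 11], [12, 15, 10, 14])
print(round(t, 3), df)
```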


sampling distribution

describes the possible values the statistic might have when random samples are taken from a population
the distribution of statistics ("xbar" or "p hat") for all possible samples from the same population of a given sample size (n) 

statistical inference

gives us methods for drawing conclusions about a population based on data from samples


confidence interval

an interval of values computed from sample data that is likely to include the true population parameter


standard error

is the estimated standard deviation of the sampling distribution of the statistic


confidence level

proportion of samples for which the confidence interval will capture the true parameters, % of time we expect the procedure to work, determines how frequently the observed interval contains the parameter


standard error of sample mean

s/√n, where (s) is the sample standard deviation and n is the sample size
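
A sketch of the standard error of the sample mean, s/√n, and a rough confidence interval built from it. The interval assumes a normal (z) multiplier, which is only a large-sample approximation; with small samples a t multiplier is customary. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def standard_error(sample):
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def approx_confidence_interval(sample, level=0.95):
    """Mean +/- z * SE, using a normal multiplier (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    margin = z * standard_error(sample)
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error(sample), 4))
print(approx_confidence_interval(sample))
```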


statistic

a number that summarizes some characteristic of the sample data, computed from the sample values, a known value that varies from sample to sample


is the distribution of possible values of the statistic for repeated samples of the same size taken from the same population

sampling distribution


mean of a sampling distribution

the average of all possible values of the statistic for repeated samples of the same size from a population


the standard deviation(SD) of a sampling distribution

measures the average distance of the possible values of the statistic from the mean of the sampling distribution, roughly speaking


there is a difference between N and n!
N= n= 
n= sample size (number of values in one sample/subgroup)
N= number of samples (number of subgroups) 
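
A simulation sketch of a sampling distribution, using n for the size of each sample and N for the number of samples as on this card. The population (rolls of a fair six-sided die) and the function name are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, N, seed=0):
    """Draw N samples of size n from fair die rolls; return the N sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(N):
        sample = [rng.randint(1, 6) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

sample_means = simulate_sample_means(n=25, N=5000, seed=1)
print("mean of the sample means:", round(mean(sample_means), 3))   # near 3.5, the die's mean
print("SD of the sample means:  ", round(stdev(sample_means), 3))  # near sigma / sqrt(n)
```

The list of sample means approximates the sampling distribution of x-bar; its spread shrinks as n grows.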

Law of Large Numbers (LLN)

as you average more observations, sample mean settles down at population mean
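
A quick simulation sketch of the Law of Large Numbers, assuming rolls of a fair six-sided die (population mean 3.5). The function name and seed are arbitrary.

```python
import random

def running_means(values):
    """Mean of the first 1, 2, 3, ... observations."""
    total = 0
    out = []
    for count, value in enumerate(values, start=1):
        total += value
        out.append(total / count)
    return out

rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(rolls)
for n in (10, 100, 1_000, 100_000):
    print(n, round(means[n - 1], 3))   # tends to settle near 3.5 as n grows
```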


graphs used for categorical variables

1. pie chart
2. bar graph 

graphic representations for quantitative variables

1. histogram
2. stem-and-leaf plot
3. box plot 

standard deviation

a value that measures the variability (spread) of data.


density curve

the outline of the histogram which approximates the overall pattern of a distribution
1. It's always on or above the horizontal axis
2. It has an area of exactly 1 underneath it 

standard normal distribution

this is a normal distribution with a mean of 0 and a standard deviation of 1
all other normal distributions are compared to this 

z-score

(a standardized value) that is the distance between a specified value and the mean, measured in number of standard deviations
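
The definition above amounts to z = (value - mean) / standard deviation. A tiny sketch, with a hypothetical function name and made-up numbers:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10
print(z_score(85, 70, 10))   # a score above the mean gives a positive z
print(z_score(55, 70, 10))   # a score below the mean gives a negative z
```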


observation (individual)

an individual or the value of a single measurement


variable

a characteristic that can differ from one individual to the next


categorical variables

the observational units are being divided into categories, there is no special ordering of the categories


ordinal variables

the observational units are being divided into categories which have an order
basically a categorical variable with ordered categories 

quantitative variables

variables that take numerical values
 you should be able to do mathematical operations with these numbers such as adding, multiplying, etc. (A social security number would not be one of these) 

graphs for quantitative variables

1. Histogram
2. Stem-and-Leaf Plot
3. Dot Plot 

Pie Chart

each slice of a pie corresponds to a category and the size of the angle of the slice shows the percentage of the individuals in the corresponding category


Bar Graph

each category is presented as a bar
 the height of the bar represents the number (or percentage) of individuals in the corresponding category 

range

highest value minus the lowest value


histogram

bar graph for a quantitative variable; the range of possible values is broken into intervals


frequency

actual number of individuals who fall into each interval (of a histogram)


relative frequency

proportion or percentage that are in an interval (of a histogram)


stem and leaf plot

every individual data value is shown


dot plot

display a dot for each observation along a number line


distribution

the overall pattern of how often the possible values occur


shape of a distribution

shows how values are distributed in a distribution


center

location, average, mean and median measure this


outlier

unusual values that do not fit with the rest of the pattern
(may be due to data entry errors or may be actual unusual values) 

symmetric distribution

one half of the distribution is the mirror image of the other (bell shape)


bimodal distributions

has two peaks which can be caused by two or more groups of values in the sample


multimodal distribution

distribution with several peaks


median

the middle number of the data when it is ordered, 50% of the data is above it and 50% of the data is below it


two measures of the center

mean and median


symmetric distribution
(mean ? median) 
mean = median


right skewed distribution
(mean ? median) 
mean>median
mean is greater than median 

left skewed distribution
(mean ? median) 
mean<median
mean is less than median 
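
A small check of the mean/median comparisons on the three cards above, using made-up data sets. The function name is hypothetical, and the pattern is a rule of thumb rather than a guarantee for every data set.

```python
from statistics import mean, median

def compare_center(data):
    """Return '>', '<' or '=' for how the mean compares with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return ">"
    if m < med:
        return "<"
    return "="

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 20]        # long tail of large values
left_skewed = [1, 17, 18, 18, 19, 19, 20]    # long tail of small values

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(name, ": mean", compare_center(data), "median")
```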

First Quartile (Q1)

25% of the data is at or below this number


Third Quartile (Q3)

75% of the data is at or below this number


InterQuartile Range (IQR)

A value describing the spread over approximately the middle 50% of the data


the five number summary includes

1) maximum
2) minimum
3) Q1
4) median
5) Q3 

boxplot

a graphical representation of the 5 number summary


1.5*IQR Rule

an outlier is any value that lies more than one and a half times the length of the box (the IQR) below Q1 or above Q3
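
A sketch of the five number summary and the 1.5*IQR rule. Textbooks differ on how quartiles are computed; this uses the "inclusive" method of Python's `statistics.quantiles`, which is an assumption of the example. The function names and data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1                        # length of the box in a boxplot
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one unusual value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(data))
print(iqr_outliers(data))
```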


variance

measures the average squared distance of the individual values from the mean
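
A sketch of the sample variance, assuming the usual n - 1 divisor, with the standard deviation as its square root. The function name and data are made up; the result is compared with Python's built-in `statistics.variance`.

```python
from math import sqrt
from statistics import variance

def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    centre = sum(data) / n
    return sum((x - centre) ** 2 for x in data) / (n - 1)

# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_variance(data), 4))          # variance
print(round(sqrt(sample_variance(data)), 4))    # standard deviation
print(round(variance(data), 4))                 # built-in, for comparison
```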


strata

subgroups of the population which might have different responses to the question of interest


stratified sample

is a collection of samples taken in each stratum of the population


cluster samples

sampling technique used when natural groups are evident in a statistical population


systematic samples

select every kth individual from the sampling frame
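
A sketch of three ways to draw a sample from a sampling frame: simple random, systematic (every kth individual), and stratified. The function names, the frame of 100 IDs, and the two strata are invented; the stratified version assumes the same number is drawn from each stratum.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, start=0):
    """Take every kth individual from the frame, beginning at index `start`."""
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)
frame = list(range(100))                         # hypothetical list of 100 IDs
strata = {"first_half": frame[:50], "second_half": frame[50:]}

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, k=10, start=3))
print(stratified_sample(strata, 3, rng))
```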


under coverage

sampling frame does not include all the population


over coverage

sampling frame includes individuals who are not in the population being examined


data entry errors

person recording the data makes mistakes


question wording error

the set up of the question can have a big influence on the answers


definition of statistics

a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty


individuals

the objects described by the data set
(each student in the class is an observational unit or individual) 

variables

characteristics of the individuals
(max speed, sex of the students, height, time of sleep) 

sample

subgroup of the population examined to measure the variables and gather information


parameter

a number that describes a characteristic of the population. It is mostly a summary of a population. Its value is unknown.


statistic

summary of a sample, the value of this is usually known


census

taken to measure ALL individuals in the population


selection bias

this method of selection of participants favors a particular outcome


non response bias

some part of the individuals in the sample cannot be reached or do not respond, this creates a bias because respondents may differ in meaningful ways from nonrespondents.


response bias

participants give incorrect information


response rate

the proportion of the sample that responded to the question


nonresponse rate

the proportion of the sample that didn't respond to the question


convenience samples

investigators choose individuals that are easy to reach


volunteer response samples

individuals decide whether to answer the questions or not


simple random sample

a sample chosen so that every possible sample of the same size has an equal chance of being selected


statistical significance

a result is unlikely to have occurred just by chance


practical significance

the difference from the claimed value we observe is actually meaningful


numbers in "stem" column of stem and leaf plot

first digit of each number in the data set


numbers in "leaf" column of stem and leaf plot

contains only the last digit of the # regardless of whether it falls before or after the decimal point
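
A sketch of building a stem and leaf plot, assuming non-negative two-digit whole numbers so that the stem is the tens digit and the leaf is the last digit. The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data
data = [27, 12, 34, 15, 21, 27]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Every individual data value stays visible: each leaf printed next to a stem is one observation.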
