90 Cards in this Set
- Front
- Back
alternative hypothesis
|
a statement about the value of a parameter that is either "less than" or "greater than" or "not equal to" a hypothesized number or another parameter; the hypothesis the researcher usually wants to prove or verify
|
|
Analysis of Variance
|
ANOVA; a procedure used to test equality of three or more means
|
|
approximate sampling distribution
|
the distribution of the x bar values obtained from repeatedly taking SRSs of the same size from the same population
|
|
approximate t test
|
a test for comparing the means of two independent samples or two treatments where the test statistic has an approximate t distribution. This is the preferred two sample t test, but it requires statistical software
|
|
association
|
for quantitative data, large values of one variable tend to occur with large (or small) values of another variable. For categorical data, certain responses for one variable tend to occur with certain responses of the other variable.
|
|
association vs. causation
|
we can only argue causation from association if the results having significant association are from an experiment
|
|
bar graph
|
a graphical representation of categorical data. Names of each category are listed on the x axis and a bar that has height representing the frequency or percentage in that category is placed over each category name.
|
|
bias
|
a condition that occurs when the design of a study systematically favors certain outcomes
|
|
bivariate data
|
two measurements are made on each unit
|
|
block
|
a group of experimental units sharing some common characteristic. In a randomized complete block design, random allocation of treatments is carried out separately within each group.
|
|
boxplot
|
A plot of data that incorporates the maximum observation, the minimum observation, the first quartile, the second quartile (median) and the third quartile
|
|
categorical (or qualitative) variable
|
a variable that can be classified into groups or categories, such as gender and religion
|
|
causation
|
changes in the explanatory variable directly affect the response variable. experiments are needed to verify causation
|
|
census
|
the enumeration of every unit in a population
|
|
center
|
a summary number about which observations tend to cluster. measures of center include the mean and the median
|
|
center line
|
the middle line on a control chart. its value is the target value of the mean when the process is in control
|
|
Central Limit Theorem
|
CLT; the name of the theorem stating that the sampling distribution of the sample mean (x bar) is approximately normal whenever the sample is large and random
|
|
chi-square distribution
|
the theoretical distribution that models the test statistic for doing chi-square tests
|
|
chi-square test statistic
|
a test statistic computed from data that has an approximate chi-square distribution
|
|
claimed parameter value
|
the value of the parameter as given in the null hypothesis
|
|
collapsed table
|
a contingency table where the counts in two (or more) rows or two (or more) columns have been added to form a single row or column
|
|
completely randomized design
|
an experimental design where all experimental units are assigned at random to treatments
|
|
comparison study
|
a study that compares only active treatments to determine which works best
|
|
conditions
|
the basic premises that must be checked before using a statistical procedure
|
|
conditional distribution
|
the distribution of one variable restricted to a single row (or column) of another variable in a two way table. A conditional distribution is found by dividing the values in the row (or column) by the row (or column) total.
|
|
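A minimal sketch of the conditional distribution calculation described above, assuming the row (or column) is given as a list of counts. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def conditional_distribution(counts):
    """Divide each count in one row (or column) by that row's (or column's) total."""
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical row of a two way table: counts of three responses within one group.
row = [20, 30, 50]
print(conditional_distribution(row))  # proportions that sum to 1
```

Multiplying each proportion by 100 gives the conditional percentages for that row (or column).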
conditional percentage
|
in a contingency table, the percentage of a category in a row (or column) found by dividing the appropriate cell count by the row (or column) total.
|
|
confidence interval
|
an estimate of the value of a parameter in interval form with an associated level of confidence. it gives a list of plausible values for the parameter based on the value of the statistic
|
|
confidence level
|
the percentage of all possible samples for which the confidence intervals will contain the parameter being estimated; selected subjectively by the researcher
|
|
confounding
|
a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable
|
|
conservative t test
|
a test for comparing the means from two independent samples or two treatments where the degrees of freedom are taken to be the minimum of (n_1 - 1) and (n_2 - 1). The approximate t test is recommended when using statistical software
|
|
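A short sketch of the conservative t test calculation, assuming summary statistics (means, standard deviations, sample sizes) are already available. The card only specifies the degrees of freedom; the unpooled standard error used for the t statistic here is the standard two sample form and is an assumption about what the course intends. The function name and example numbers are hypothetical.

```python
import math


def conservative_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, conservative degrees of freedom) for two independent samples."""
    # Unpooled standard error of the difference between the two sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    # Conservative choice: the smaller of n1 - 1 and n2 - 1.
    df = min(n1 - 1, n2 - 1)
    return t, df


t_stat, df = conservative_t(10, 2, 16, 8, 2, 25)
print(t_stat, df)
```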
control treatment
|
a treatment where no experimental condition is applied to the units in order to determine whether the active treatments affect the response. This enables the researcher to "control" for lurking variables
|
|
control chart
|
a chart having a center line and upper and lower control limits used to determine whether a process is in control or out of control
|
|
control limits
|
lines on either side of the center line computed using μ − 3σ/sqrt(n) and μ + 3σ/sqrt(n). A sample mean outside these bounds signals the process is out of control.
|
|
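As an illustration, the usual three-sigma control limits μ − 3σ/sqrt(n) and μ + 3σ/sqrt(n) can be computed as in the sketch below. It assumes the in-control mean μ, the process standard deviation σ and the sample size n are known; the function names and numbers are hypothetical.

```python
import math


def control_limits(mu, sigma, n):
    """Return (lower, upper) control limits for a chart of sample means."""
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin


def out_of_control(sample_mean, mu, sigma, n):
    """True when a sample mean falls outside the control limits."""
    lower, upper = control_limits(mu, sigma, n)
    return sample_mean < lower or sample_mean > upper


print(control_limits(100, 10, 25))       # (94.0, 106.0)
print(out_of_control(107, 100, 10, 25))  # True
```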
convenience sample
|
a sample type where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost always biased
|
|
correlation coefficient
|
a measure of the strength of the linear relationship between two quantitative variables
|
|
data
|
information collected on individuals
|
|
degrees of freedom
|
a characteristic of the t-distribution (and other distributions like F and chi-square) indicating the amount of information available in the data.
|
|
density curve
|
a mathematical model used to describe the overall pattern of the distribution of a random variable
|
|
deviation
|
the difference (distance) between an observation and the mean of all the observations in a data set, or the difference between an observation and the corresponding regression model estimate.
|
|
direction of relationship
|
a characteristic of data in a scatterplot that is identified as either a positive or negative association
|
|
distribution
|
a list of the possible values of a variable together with the frequency (or probability) of each value
|
|
dotplot
|
a one dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis
|
|
double blind study
|
an experiment where neither the subjects nor the diagnosticians know which treatment is administered to whom
|
|
equal variance
|
(equal standard deviation) variances for each of the treatment groups or samples in ANOVA are all equal. In regression, the variances of the ys at each x are all assumed to be equal
|
|
estimate of a parameter
|
a single value or a range of values used to estimate a parameter
|
|
expected count
|
an estimate of how many observations should be in a cell of a two way table if there were no association between the row and column variables
|
|
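A minimal sketch of how expected counts are commonly computed for a two way table: (row total × column total) / table total for each cell. The chi-square statistic shown alongside uses the standard sum of (observed − expected)² / expected. The function names and the example table are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell under no association between rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


observed = [[10, 20], [30, 40]]  # hypothetical 2 x 2 table of counts
print(expected_counts(observed))
print(chi_square_statistic(observed))
```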
experiment
|
a study where treatments are deliberately imposed on the individuals in the study before data is gathered in order to observe their responses to the treatment
|
|
explained variation
|
the amount of total variation in the ys that is accounted for by a regression model; it is equal to Σ(ŷ − ȳ)², where ŷ is the value predicted by the model and ȳ is the mean of the ys
|
|
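The sketch below works through explained variation for a simple least squares line, assuming one explanatory variable. It computes the sum of squared differences between each predicted y and the mean of the ys, then divides by the total variation to get r squared. Function names and data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def explained_variation(xs, ys):
    """Sum of squared differences between predicted ys and the mean of the ys."""
    intercept, slope = least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    return sum((intercept + slope * x - y_bar) ** 2 for x in xs)


def r_squared(xs, ys):
    """Explained variation as a fraction of total variation in the ys."""
    y_bar = sum(ys) / len(ys)
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    return explained_variation(xs, ys) / total_variation


xs, ys = [1, 2, 3, 4], [2, 4, 5, 7]
print(explained_variation(xs, ys), r_squared(xs, ys))
```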
explanatory variable
|
a variable that may or may not explain the outcomes (responses) of a study, also called independent or predictor variable
|
|
extrapolation
|
using a model to predict a y value for an x value that is outside the range of observed xs. Extrapolation is dangerous and strongly discouraged because the relationship between x and y may be different outside the range of observed xs.
|
|
factor
|
a term synonymous with explanatory variable
|
|
fail to reject H_o
|
the appropriate statistical conclusion in hypothesis testing when the p-value is greater than alpha; equivalently, conclude that "there is not enough evidence to believe H_a"
|
|
failure
|
any category that is not of primary interest in a qualitative data set
|
|
f distribution
|
the distribution that models the ratio of two variance estimates; used in ANOVA for obtaining the p-value for testing equality of three or more means
|
|
five number summary
|
these five values: minimum, Q1, median, Q3, maximum; the preferred numerical summary when data are very skewed or outliers are present.
|
|
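A small sketch of a five number summary using Python's standard library. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, which may not match the rule used in a particular course. The function name and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # Assumption: "inclusive" quartiles; other conventions can give different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(data)
print(minimum, q1, median, q3, maximum)
print("interquartile range:", q3 - q1)
```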
follow-up analysis
|
the analysis performed on data after an overall test on the equality of multiple means or the equality of multiple proportions is found to be significant. It determines which means or which proportions differ from which
|
|
form of relationship
|
a description of data in a scatterplot indicating whether the data have a linear relationship, a curved relationship or no relationship
|
|
f test statistic
|
a test statistic that has an F distribution
|
|
histogram
|
a graphical display of a quantitative data set; data are grouped into intervals (usually of equal width) and a bar is drawn over each interval having height proportional to the frequency (or percentage) of values in the interval. Values of the variable are given on the x axis and frequencies (or percentages) are given on the y axis. Histograms are examined to determine shape, center and spread
|
|
in control
|
a process functioning within acceptable limits
|
|
independent sample
|
SRSs collected separately from each of two (or more) disjoint populations; matched pairs data are considered to be dependent samples
|
|
individual
|
each object or unit described in a data set
|
|
inference
|
using results from a sample statistic value to draw conclusions about the population parameter
|
|
influential point
|
an observation that substantially alters the fitted regression equation
|
|
interaction
|
a situation that occurs in an experiment when the effect of one explanatory variable on the response variable is not the same across all levels of another explanatory variable
|
|
interquartile range
|
The difference between Q3 and Q1 (i.e., Q3 - Q1); the length of the box in a boxplot.
|
|
interviewer bias
|
bias introduced into survey results by body language, voice intonation, gender, race, etc. of an interviewer
|
|
lack of realism
|
a weakness in an experiment where the setting of the experiment does not realistically duplicate the conditions we really want to study.
|
|
law of large numbers
|
the fact that the average (x bar) of observed values in a sample will tend to get closer and closer to the true mean as the sample size increases
|
|
least squares regression line
|
the line that minimizes the sum of squared residuals
|
|
left skewed
|
a density curve where the left side of the distribution extends in a long tail (mean<median)
|
|
left-tailed alternative hypothesis
|
an alternative hypothesis that states the parameter value is less than some number or the parameter from another treatment or population
|
|
location measure
|
a summary number that tells the location (typically the center) of a data set on the number line
|
|
lower tailed alternative hypothesis
|
another name for left tailed
|
|
lurking variable
|
a variable that the researcher is not necessarily interested in studying but which affects the relationship between the explanatory variable and the response variable
|
|
marginal distribution
|
the distribution of only one variable in a two way table by using counts found by summing over the categories of the other variable
|
|
marginal percentage
|
the percentage for a row (or column) total in a two way table found by dividing the row (or column) total by the table total
|
|
multiple analyses
|
performing two or more tests of significance on the same data set. this inflates the overall alpha for the tests
|
|
multi-stage sample
|
a type of sample from a population that has groups and sub-groups
|
|
observed effect
|
the difference between the observed value of the statistic and the hypothesized value of the corresponding parameter
|
|
observed statistic
|
the value of the statistic computed from the data
|
|
pooled sample proportion
|
The value used for p̂ when computing the z test statistic for comparing two sample proportions. To compute, add the number of successes in both samples and divide by the sum of the two sample sizes.
|
|
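A minimal sketch of the pooled sample proportion: successes from both samples added together, divided by the sum of the two sample sizes. The z statistic shown with it uses the standard pooled standard error, which is an assumption about the intended formula. Names and numbers are hypothetical.

```python
import math


def pooled_proportion(x1, n1, x2, n2):
    """Total successes in both samples divided by the total sample size."""
    return (x1 + x2) / (n1 + n2)


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing two sample proportions using the pooled proportion."""
    p_pool = pooled_proportion(x1, n1, x2, n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


print(pooled_proportion(30, 100, 50, 100))  # 0.4
print(two_proportion_z(30, 100, 50, 100))
```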
power (1-β)
|
the probability of rejecting a false null hypothesis
|
|
r squared
|
the percentage of total variation in y, the response variable, that is accounted for by the regression of y on x
|
|
robust
|
a statistical procedure that is insensitive to moderate deviations from an assumption upon which it is based; e.g. t procedures give p values and confidence levels that are very close to correct even when the data are not normally distributed
|
|
significance level
|
(α): Probability of a Type I error, i.e. probability of rejecting a true null hypothesis; the largest risk of rejecting a true null hypothesis that a researcher is willing to take
|
|
Simpson's paradox
|
A condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the studied variables. (airline example with weather)
|
|
t distribution
|
a distribution specified by degrees of freedom used to model test statistics for the sample mean, differences between sample means, etc. where sigma is unknown.
|
|
test of homogeneity
|
A chi-square test on data collected from independent SRSs from each of several populations. The null hypothesis states that the proportions for each of the categories of the response variable are the same for all populations.
|
|
test of independence
|
A chi-square test on data collected from a single SRS with two categorical measurements on each individual. The null hypothesis states that there is no relationship between the two categorical variables.
|