90 Cards in this Set

  • Front
  • Back
alternative hypothesis
a statement about the value of a parameter that is either "less than" or "greater than" or "not equal to" a hypothesized number or another parameter; the hypothesis the researcher usually wants to prove or verify
Analysis of Variance
ANOVA; a procedure used to test equality of three or more means
approximate sampling distribution
the distribution of the x bar values obtained from repeatedly taking SRSs of the same size from the same population
approximate t test
a test for comparing the means of two independent samples or two treatments where the test statistic has an approximate t distribution. this is the preferred two sample t test, but requires statistical software
association
for quantitative data, large values of one variable tend to occur with large (or small) values of another variable. For categorical data, certain responses for one variable tend to occur with certain responses of the other variable.
association vs. causation
we can only argue causation from association if the results having significant association are from an experiment
bar graph
a graphical representation of categorical data. Names of each category are listed on the x axis and a bar that has height representing the frequency or percentage in that category is placed over each category name.
bias
a condition that occurs when the design of a study systematically favors certain outcomes
bivariate data
two measurements are made on each unit
block
a group of experimental units sharing some common characteristic. In a randomized complete block design, random allocation of treatments is carried out separately within each group.
boxplot
A plot of data that incorporates the maximum observation, the minimum observation, the first quartile, the second quartile (median) and the third quartile
categorical (or qualitative) variable
a variable that can be classified into groups or categories, such as gender and religion
causation
changes in the explanatory variable directly affect the response variable. experiments are needed to verify causation
census
the enumeration of every unit in a population
center
a summary number about which observations tend to cluster. measures of center include the mean and the median
center line
the middle line on a control chart. its value is the target value of the mean when the process is in control
Central Limit Theorem
CLT; the theorem stating that the sampling distribution of a statistic such as the sample mean (x bar) is approximately normal whenever the sample is large and random
chi-square distribution
the theoretical distribution that models the test statistic for doing chi-square tests
chi-square test statistic
a test statistic computed from data that has an approximate chi-square distribution
claimed parameter value
the value of the parameter as given in the null hypothesis
collapsed table
a contingency table where the counts in two (or more) rows or two (or more) columns have been added to form a single row or column
completely randomized design
an experimental design where all experimental units are assigned at random to treatments
comparison study
a study that compares only active treatments to determine which works best
conditions
the basic premises that must be checked before using a statistical procedure
conditional distribution
the distribution of one variable restricted to a single row (or column) of another variable in a two way table. A conditional distribution is found by dividing the values in the row (or column) by the row (or column) total.
conditional percentage
in a contingency table, the percentage of a category in a row (or column) found by dividing the appropriate cell count by the row (or column) total.
confidence interval
an estimate of the value of a parameter in interval form with an associated level of confidence. it gives a list of plausible values for the parameter based on the value of the statistic
confidence level
the percentage of all possible samples for which the confidence intervals will contain the parameter being estimated; selected subjectively by the researcher
confounding
a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable
conservative t test
a test for comparing the means from two independent samples or two treatments where the degrees of freedom are taken to be the minimum of (n_1 - 1) and (n_2 - 1). The approximate t test is recommended when using statistical software
control treatment
a treatment where no experimental condition is applied to the units in order to determine whether the active treatments affect the response. This enables the researcher to "control" for lurking variables
control chart
a chart having a center line and upper and lower control limits used to determine whether a process is in control or out of control
control limits
lines on either side of the center line computed using μ − 3σ/sqrt(n) and μ + 3σ/sqrt(n). A sample mean outside these bounds signals the process is out of control.
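As an illustration of the control limits card, here is a minimal sketch assuming the standard three-sigma limits μ ± 3σ/sqrt(n) for a chart of sample means. The function names and the example numbers are hypothetical, chosen only for this illustration.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower, center, upper) for a chart of sample means.

    mu and sigma are the in-control process mean and standard deviation;
    n is the size of each sample. Assumes three-sigma limits.
    """
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu, mu + margin

def out_of_control(sample_mean, mu, sigma, n):
    """True when a sample mean falls outside the control limits."""
    lower, _, upper = control_limits(mu, sigma, n)
    return sample_mean < lower or sample_mean > upper

# Hypothetical process: target mean 50, standard deviation 4, samples of 16.
lower, center, upper = control_limits(mu=50.0, sigma=4.0, n=16)
print(lower, center, upper)                     # 47.0 50.0 53.0
print(out_of_control(53.5, 50.0, 4.0, 16))      # True
```

Note that only the 3σ term is divided by sqrt(n); the center line μ is not.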
convenience sample
a sample type where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost always biased
correlation coefficient
a measure of the strength of the linear relationship between two quantitative variables
data
information collected on individuals
degrees of freedom
a characteristic of the t distribution (and other distributions like F and chi-square) indicating the amount of information available in the data.
density curve
a mathematical model used to describe the overall pattern of the distribution of a random variable
deviation
the difference (distance) between an observation and the mean of all the observations in a data set, or the difference between an observation and the corresponding regression model estimate.
direction of relationship
a characteristic of data in a scatterplot that is identified as either a positive or negative association
distribution
a list of the possible values of a variable together with the frequency (or probability) of each value
dotplot
a one dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis
double blind study
an experiment where neither the subjects nor the diagnosticians know which treatment is administered to whom
equal variance
(equal standard deviation) variances for each of the treatment groups or samples in ANOVA are all equal. In regression, the variances of the ys at each x are all assumed to be equal
estimate of a parameter
a single value or a range of values used to estimate a parameter
expected count
an estimate of how many observations should be in a cell of a two way table if there were no association between the row and column variables
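The expected count for a cell is usually computed as (row total × column total) / table total. The sketch below assumes that formula and also sums (observed − expected)² / expected over the cells to get the chi-square test statistic. The function names and the 2×2 table are made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts under no association between rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical two way table of observed counts.
observed = [[30, 10],
            [20, 40]]
print(expected_counts(observed))   # [[20.0, 20.0], [30.0, 30.0]]
print(chi_square_statistic(observed))
```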
experiment
a study where treatments are deliberately imposed on the individuals in the study before data is gathered in order to observe their responses to the treatment
explained variation
the amount of total variation in the ys that is accounted for by a regression model; it is equal to Σ(ŷ − ȳ)²
explanatory variable
a variable that may or may not explain the outcomes (responses) of a study, also called independent or predictor variable
extrapolation
using a model to predict a y value for an x value that is outside the range of observed xs. Extrapolation is dangerous and strongly discouraged because the relationship between x and y may be different outside the range of observed xs.
factor
a term synonymous with explanatory variable
fail to reject H_o
the appropriate statistical conclusion in hypothesis testing when the p-value is greater than alpha; equivalently, conclude that "there is not enough evidence to believe H_a"
failure
any category that is not of primary interest in a qualitative data set
F distribution
the distribution that models the ratio of two variance estimates; used in ANOVA for obtaining the p-value for testing equality of three or more means
five number summary
these five values: minimum, Q1, median, Q3, maximum; the preferred numerical summary when data are very skewed or outliers are present.
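A five number summary can be sketched with Python's standard library. Textbooks and software use slightly different conventions for quartiles, so the "inclusive" method below is one assumption among several, and the data values are made up for illustration.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Quartiles use the 'inclusive' method of statistics.quantiles;
    other conventions can give slightly different Q1 and Q3.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(values)
print(minimum, q1, median, q3, maximum)   # 1 3.0 5.0 7.0 9
print(q3 - q1)                            # interquartile range: 4.0
```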
follow-up analysis
the analysis performed on data after an overall test on the equality of multiple means or the equality of multiple proportions is found to be significant. It determines which means or which proportions differ from which
form of relationship
a description of data in a scatterplot indicating whether the data have a linear relationship, a curved relationship or no relationship
F test statistic
a test statistic that has an F distribution
histogram
a graphical display of a quantitative data set; data are grouped into intervals (usually of equal width) and a bar is drawn over each interval having height proportional to the frequency (or percentage) of values in the interval. Values of the variable are given on the x axis and frequencies (or percentages) are given on the y axis. Histograms are examined to determine shape, center and spread
in control
a process functioning within acceptable limits
independent sample
SRSs collected separately from each of two (or more) disjoint populations; matched pairs data are considered to be dependent samples
individual
each object or unit described in a data set
inference
using results from a sample statistic value to draw conclusions about the population parameter
influential point
an observation that substantially alters the fitted regression equation
interaction
a situation that occurs in an experiment when the effect of one explanatory variable on the response variable is not the same across all levels of another explanatory variable
interquartile range
The difference between Q3 and Q1 (i.e., Q3 - Q1); the length of the box in a boxplot.
interviewer bias
bias introduced into survey results by body language, voice intonation, gender, race, etc. of an interviewer
lack of realism
a weakness in an experiment where the setting of the experiment does not realistically duplicate the conditions we really want to study.
law of large numbers
the fact that the average (x bar) of observed values in a sample will tend to get closer and closer to the true mean as the sample size increases
least squares regression line
the line that minimizes the sum of squared residuals
left skewed
a density curve where the left side of the distribution extends in a long tail (mean<median)
left-tailed alternative hypothesis
an alternative hypothesis that states the parameter value is less than some number or the parameter from another treatment or population
location measure
a summary number that tells the location (typically the center) of a data set on the number line
lower tailed alternative hypothesis
another name for left tailed
lurking variable
a variable that the researcher is not necessarily interested in studying but which affects the relationship between the explanatory variable and the response variable
marginal distribution
the distribution of only one variable in a two way table by using counts found by summing over the categories of the other variable
marginal percentage
the percentage for a row (or column) total in a two way table found by dividing the row (or column) total by the table total
multiple analyses
performing two or more tests of significance on the same data set. this inflates the overall alpha for the tests
multi-stage sample
a type of sample from a population that has groups and sub-groups
observed effect
the difference between the observed value of the statistic and the hypothesized value of the corresponding parameter
observed statistic
the value of the statistic computed from the data
pooled sample proportion
The value used for p̂ when computing the two-sample proportion z test statistic. To compute, add the number of successes in both samples and divide by the sum of the two sample sizes.
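The pooled proportion can be sketched as follows. The z statistic shown assumes the usual two-sample form, where the pooled value is used in the standard error. The function names and the counts are hypothetical.

```python
import math

def pooled_proportion(x1, n1, x2, n2):
    """Successes in both samples divided by the sum of the sample sizes."""
    return (x1 + x2) / (n1 + n2)

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing two proportions, using the pooled value
    in the standard error (assumed form of the test statistic)."""
    p_pool = pooled_proportion(x1, n1, x2, n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Hypothetical data: 45 successes out of 100 versus 35 out of 100.
print(pooled_proportion(45, 100, 35, 100))   # 0.4
print(two_proportion_z(45, 100, 35, 100))    # about 1.44
```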
power (1-β)
the probability of rejecting a false null hypothesis
r squared
the percentage of total variation in y, the response variable, that is accounted for by the regression of y on x
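To make the r squared card concrete, the sketch below fits a least squares line and divides the explained variation, Σ(ŷ − ȳ)², by the total variation, Σ(y − ȳ)². The function names and data points are made up for illustration.

```python
def least_squares(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(xs, ys):
    """Explained variation divided by total variation in y."""
    slope, intercept = least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(least_squares(xs, ys))   # slope about 0.6, intercept about 2.2
print(r_squared(xs, ys))       # about 0.6, i.e. 60% of the variation in y
```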
robust
a statistical procedure that is insensitive to moderate deviations from an assumption upon which it is based; e.g. t procedures give p values and confidence levels that are very close to correct even when the data are not normally distributed
significance level
(α): Probability of a Type I error, i.e. probability of rejecting a true null hypothesis; the largest risk of rejecting a true null hypothesis that a researcher is willing to take
Simpson's paradox
A condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the studied variables. (airline example with weather)
t distribution
a distribution specified by degrees of freedom used to model test statistics for the sample mean, differences between sample means, etc. where sigma is unknown.
test of homogeneity
A chi-square test on data collected from independent SRSs from each of several populations. The null hypothesis states that the proportions for each of the categories of the response variable are the same for all populations.
test of independence
A chi-square test on data collected from a single SRS with two categorical measurements on each individual. The null hypothesis states that there is no relationship between the two categorical variables.