
Statistics

by kumquatc11, Apr. 2008

Subjects: stats


90 Cards in this Set

Front
Back

	alternative hypothesis	a statement about the value of a parameter that is either "less than" or "greater than" or "not equal to" a hypothesized number or another parameter; the hypothesis the researcher usually wants to prove or verify
	Analysis of Variance	ANOVA; a procedure used to test equality of three or more means
	approximate sampling distribution	the distribution of the x bar values obtained from repeatedly taking SRSs of the same size from the same population
	approximate t test	a test for comparing the means of two independent samples or two treatments where the test statistic has an approximate t distribution. this is the preferred two sample t test, but requires statistical software
	association	for quantitative data, large values of one variable tend to occur with large (or small) values of another variable. For categorical data, certain responses for one variable tend to occur with certain responses of the other variable.
	association vs. causation	we can only argue causation from association if the results having significant association are from an experiment
	bar graph	a graphical representation of categorical data. Names of each category are listed on the x axis and a bar that has height representing the frequency or percentage in that category is placed over each category name.
	bias	a condition that occurs when the design of a study systematically favors certain outcomes
	bivariate data	two measurements are made on each unit
	block	a group of experimental units sharing some common characteristic. In a randomized complete block design, random allocation of treatments is carried out separately within each group.
	boxplot	A plot of data that incorporates the maximum observation, the minimum observation, the first quartile, the second quartile (median) and the third quartile
	categorical (or qualitative) variable	a variable that can be classified into groups or categories, such as gender and religion
	causation	changes in the explanatory variable directly affect the response variable. experiments are needed to verify causation
	census	the enumeration of every unit in a population
	center	a summary number about which observations tend to cluster. measures of center include the mean and the median
	center line	the middle line on a control chart. its value is the target value of the mean when the process is in control
	Central Limit Theorem	CLT; the theorem stating that the sampling distribution of the sample mean (x bar) is approximately normal whenever the sample is large and random
	chi-square distribution	the theoretical distribution that models the test statistic for doing chi-square tests
	chi-square test statistic	a test statistic computed from data that has an approximate chi-square distribution
	claimed parameter value	the value of the parameter as given in the null hypothesis
	collapsed table	a contingency table where the counts in two (or more) rows or two (or more) columns have been added to form a single row or column
	completely randomized design	an experimental design where all experimental units are assigned at random to treatments
	comparison study	a study that compares only active treatments to determine which works best
	conditions	the basic premises that must be checked before using a statistical procedure
	conditional distribution	the distribution of one variable restricted to a single row (or column) of another variable in a two way table. A conditional distribution is found by dividing the values in the row (or column) by the row (or column) total.
	conditional percentage	in a contingency table, the percentage of a category in a row (or column) found by dividing the appropriate cell count by the row (or column) total.
	confidence interval	an estimate of the value of a parameter in interval form with an associated level of confidence. it gives a list of plausible values for the parameter based on the value of the statistic
	confidence level	the percentage of all possible samples for which the confidence intervals will contain the parameter being estimated; selected subjectively by the researcher
	confounding	a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable
	conservative t test	a test for comparing the means from two independent samples or two treatments where the degrees of freedom are taken to be the minimum of (n_1 - 1) and (n_2 - 1). The approximate t test is recommended when using statistical software
	control treatment	a treatment where no experimental condition is applied to the units in order to determine whether the active treatments affect the response. This enables the researcher to "control" for lurking variables
	control chart	a chart having a center line and upper and lower control limits used to determine whether a process is in control or out of control
	control limits	lines on either side of the center line computed using μ − 3σ/sqrt(n) and μ + 3σ/sqrt(n). A sample mean outside these bounds signals the process is out of control.
	convenience sample	a sample type where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost always biased
	correlation coefficient	a measure of the strength of the linear relationship between two quantitative variables
	data	information collected on individuals
	degrees of freedom	a characteristic of the t distribution (and other distributions like F and chi-square) indicating the amount of information available in the data.
	density curve	a mathematical model used to describe the overall pattern of the distribution of a random variable
	deviation	the difference (distance) between an observation and the mean of all the observations in a data set, or the difference between an observation and the corresponding regression model estimate.
	direction of relationship	a characteristic of data in a scatterplot that is identified as either a positive or negative association
	distribution	a list of the possible values of a variable together with the frequency (or probability) of each value
	dotplot	a one dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis
	double blind study	an experiment where neither the subjects nor the diagnosticians know which treatment is administered to whom
	equal variance	(equal standard deviation) variances for each of the treatment groups or samples in ANOVA are all equal. In regression, the variances of the ys at each x are all assumed to be equal
	estimate of a parameter	a single value or a range of values used to estimate a parameter
	expected count	an estimate of how many observations should be in a cell of a two way table if there were no association between the row and column variables
	experiment	a study where treatments are deliberately imposed on the individuals in the study before data is gathered in order to observe their responses to the treatment
	explained variation	the amount of total variation in the ys that is accounted for by a regression model; it is equal to Σ(ŷ − ȳ)², where ŷ is the value predicted by the model and ȳ is the mean of the observed ys
	explanatory variable	a variable that may or may not explain the outcomes (responses) of a study, also called independent or predictor variable
	extrapolation	using a model to predict a y value for an x value that is outside the range of observed xs. Extrapolation is dangerous and strongly discouraged because the relationship between x and y may be different outside the range of observed xs.
	factor	a term synonymous with explanatory variable
	fail to reject H_0	the appropriate statistical conclusion in hypothesis testing when the p-value is greater than alpha; equivalently, conclude that "there is not enough evidence to believe H_a"
	failure	any category that is not of primary interest in a qualitative data set
	f distribution	the distribution that models the ratio of two variance estimates; used in ANOVA for obtaining the p-value for testing equality of three or more means
	five number summary	these five values: minimum, Q1, median, Q3, maximum; preferred numerical summary when data are very skewed or outliers are present.
	follow-up analysis	the analysis performed on data after an overall test on the equality of multiple means or the equality of multiple proportions is found to be significant. It determines which means or which proportions differ from which
	form of relationship	a description of data in a scatterplot indicating whether the data have a linear relationship, a curved relationship or no relationship
	f test statistic	a test statistic that has an F distribution
	histogram	a graphical display of a quantitative data set; data are grouped into intervals (usually of equal width) and a bar is drawn over each interval having height proportional to the frequency (or percentage) of values in the interval. Values of the variable are given on the x axis and frequencies (or percentages) are given on the y axis. Histograms are examined to determine shape, center and spread
	in control	a process functioning within acceptable limits
	independent sample	SRSs collected separately from each of two (or more) disjoint populations; matched pairs data are considered to be dependent samples
	individual	each object or unit described in a data set
	inference	using results from a sample statistic value to draw conclusions about the population parameter
	influential point	an observation that substantially alters the fitted regression equation
	interaction	a situation that occurs in an experiment when the effect of one explanatory variable on the response variable is not the same across all levels of another explanatory variable
	interquartile range	The difference between Q3 and Q1 (i.e., Q3 - Q1); the length of the box in a boxplot.
	interviewer bias	bias introduced into survey results by body language, voice intonation, gender, race, etc. of an interviewer
	lack of realism	a weakness in an experiment where the setting of the experiment does not realistically duplicate the conditions we really want to study.
	law of large numbers	the fact that the average (x bar) of observed values in a sample will tend to get closer and closer to the true mean as the sample size increases
	least squares regression line	the line that minimizes the sum of squared residuals
	left skewed	a density curve where the left side of the distribution extends in a long tail (mean<median)
	left-tailed alternative hypothesis	an alternative hypothesis that states the parameter value is less than some number or the parameter from another treatment or population
	location measure	a summary number that tells the location (typically the center) of a data set on the number line
	lower tailed alternative hypothesis	another name for left tailed
	lurking variable	a variable that the researcher is not necessarily interested in studying but which affects the relationship between the explanatory variable and the response variable
	marginal distribution	the distribution of only one variable in a two way table by using counts found by summing over the categories of the other variable
	marginal percentage	the percentage for a row (or column) total in a two way table found by dividing the row (or column) total by the table total
	multiple analyses	performing two or more tests of significance on the same data set. this inflates the overall alpha for the tests
	multi-stage sample	a type of sample from a population that has groups and sub-groups
	observed effect	the difference between the observed value of the statistic and the hypothesized value of the corresponding parameter
	observed statistic	the value of the statistic computed from the data
	pooled sample proportion	The value used for p̂ when computing the two-sample proportion z test statistic. To compute, add the number of successes in both samples and divide by the sum of the two sample sizes.
	power (1-β)	the probability of rejecting a false null hypothesis
	r squared	the percentage of total variation in y, the response variable, that is accounted for by the regression of y on x
	robust	a statistical procedure that is insensitive to moderate deviations from an assumption upon which it is based; e.g. t procedures give p values and confidence levels that are very close to correct even when the data are not normally distributed
	significance level	(α): Probability of a Type I error, i.e. probability of rejecting a true null hypothesis; the largest risk of rejecting a true null hypothesis that a researcher is willing to take
	Simpson's paradox	A condition leading to misinterpretation of the direction of association between two variables caused by ignoring a third variable that is associated with both of the studied variables. (airline example with weather)
	t distribution	a distribution specified by degrees of freedom used to model test statistics for the sample mean, differences between sample means, etc. where sigma is unknown.
	test of homogeneity	A chi-square test on data collected from independent SRSs from each of several populations. The null hypothesis states that the proportions for each of the categories of the response variable are the same for all populations.
	test of independence	A chi-square test on data collected from a single SRS with two categorical measurements on each individual. The null hypothesis states that there is no relationship between the two categorical variables.
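
Several of the cards above describe quantities that are computed directly from data. The following is a minimal Python sketch of four of them: control limits, pooled sample proportion, expected count and the conservative t test degrees of freedom. It is an illustration rather than part of the original set. The function names and the sample numbers are hypothetical, and the expected count uses the standard (row total × column total) / table total rule, which the card itself does not spell out.

```python
import math


def control_limits(mu, sigma, n):
    """Lower and upper control limits for a sample mean: mu -/+ 3*sigma/sqrt(n)."""
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin


def pooled_proportion(successes1, n1, successes2, n2):
    """Successes in both samples divided by the sum of the two sample sizes."""
    return (successes1 + successes2) / (n1 + n2)


def expected_counts(table):
    """Expected cell counts for a two way table under no association.

    Assumes the usual rule: row total * column total / table total.
    `table` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def conservative_df(n1, n2):
    """Degrees of freedom for the conservative t test: min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)


# Made-up numbers, for illustration only.
print(control_limits(100, 10, 25))          # (94.0, 106.0)
print(pooled_proportion(30, 100, 50, 100))  # 0.4
print(expected_counts([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
print(conservative_df(12, 20))              # 11
```

A sample mean of 107 from the first example would fall above the upper limit of 106, which the control limits card treats as a signal that the process is out of control.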
