Stats AP Fall Exam


by jgriffith11, Nov. 2008

Subjects: exam

71 Cards in this Set

Front
Back

	variables	characteristics recorded about each individual; quantitative: values are numbers; categorical: values name the category
	case	individual about whom or which we have data
	frequency table	organizes the counts for each category
	relative frequency table	like a frequency table, but gives percentages instead of counts
	distribution	names the possible categories and tells how frequently each occurs
	area principle	the area occupied by a part of the graph should correspond to the magnitude of the value it represents
	bar chart	displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison
	pie chart	shows the whole group of cases as a circle, sliced into pieces whose sizes are proportional to the fraction of the whole in each category
	contingency table	used when looking at two categorical variables together. shows how the individuals are distributed along each variable
	marginal distribution	the distribution of one of the variables in a contingency table
	conditional distribution	a distribution of one variable for only those individuals satisfying some condition on another variable
	independent	in a contingency table, two variables are independent when the conditional distribution of one variable is the same for all categories of the other
	segmented bar chart	an alternative to a bar chart; each bar represents a "whole" and is divided proportionally into segments for the separate parts
	Simpson's paradox	when averages are taken across different groups, they can appear to contradict the overall averages
	distribution	the bins and the counts in each bin give the distribution for a quantitative variable
	histogram	shows the distribution of a quantitative variable as adjacent bars; the height of each bar is the count of values in its bin
	relative frequency histogram	displays the percentage of cases in each bin instead of the count
	stem-and-leaf displays	contains all the information found in a histogram, satisfies the area principle, and shows the distribution. Preserves the individual data values
	dotplot	a simple display that places a dot for each case in the data
	Describing distribution of a Histogram	unimodal: one main peak; bimodal: two peaks; multimodal: three or more peaks; uniform: all bars are approximately the same height; symmetric: fold along the middle and the sides will match; tails: thinner ends of a distribution; skewed: if one tail stretches out much farther than the other, the histogram is skewed to the side of the longer tail; outliers: stragglers that stand off away from the body of the distribution
	timeplot	shows data that change over time
	center of distribution	median: the middle value, which divides the histogram into two equal areas; in the ordered data it sits at position (n+1)/2, where n is the number of values
	spread	a numerical measure of how much the values vary; reported along with the center when describing a distribution
	range	max-min
	quartiles	split the data at the median and find the medians of those two halves
	interquartile range/IQR	upper quartile - lower quartile (Q3 - Q1)
	percentiles	the lower and upper quartiles are aka the 25th and the 75th percentiles of the data
	the five-number-summary	minimum, Q1, median, Q3, maximum
	mean	add up the values and divide by n; the point at which the histogram would balance; for skewed data, use the median instead
	deviation	how far the data value is from the mean
	standard deviation	1) find the mean (ybar) 2) subtract the mean from each value to get the deviations 3) square each deviation 4) add the squared deviations and divide by n-1 5) take the square root of this number
	z-scores	how many standard deviations a value is from the mean; z = (y - ybar)/s, where s is the standard deviation
	normal model	appropriate for distributions whose shapes are unimodal and roughly symmetric; the standard Normal model has mean = 0 and SD = 1
	standardized value	found by subtracting the mean and dividing by the standard deviation
	68-95-99.7 Rule	In a Normal model, 68% of the values fall within one standard deviation of the mean, 95% fall within two standard deviations of the mean, and 99.7% fall within three standard deviations of the mean
	normal percentile	gives the percentage of values in a standard normal distribution found at that z-score or below
	normal probability plot	if the data are roughly Normal, the plot is roughly a diagonal straight line
	scatterplot	shows the relationship between two quantitative variables measured on the same cases
	direction of scatterplots	runs from upper left to lower right: negative; runs from lower left to upper right: positive
	response and explanatory variable	explanatory is on the x-axis; response is on the y-axis
	correlation	measures the strength and direction of the linear association between two quantitative variables; always between -1 and 1
	how to find the correlation coefficient	1) compute (x - xbar)(y - ybar) for each pair and add up the products 2) divide this sum by the product (n-1) × sx × sy, where sx and sy are the standard deviations of x and y
	predicted value	the value of y-hat found for each x-value in the data. Found by substituting the x-value in the regression equation.
	residual	the difference between the observed value and its associated predicted value y - y-hat= residual
	linear model	y-hat = b0 + b1x, where y-hat is the predicted value, b0 is the y-intercept, and b1 is the slope
	R-squared	the square of the correlation between y and x; the fraction of the variability of y accounted for by the least squares regression on x; indicates how successful the regression is in linearly relating y to x
	least squares	specifies the unique line that minimizes the variance of the residuals or, equivalently, the sum of the squared residuals
	good linear model?	1) the scatterplot looks linear 2) calculate r; if r is close to 1 or -1, the linear association is strong 3) the residual plot shows no pattern
	leverage	points whose x-values are far from the mean of x exert high leverage on a linear model. They pull the line close to themselves, so they can have a large effect on the line
	Influential point	a point that, if omitted, gives a very different model; high-leverage points are often influential, but not always
	random	an event is random if we know what outcomes could happen, but not which particular values did or will happen
	simulation	models random events by using random numbers to specify event outcomes with relative frequencies that correspond to the real-world relative frequencies we are trying to model
	Census Sample	survey of the whole population
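	SRS	simple random sample: a sample of size n in which every set of n elements in the population has an equal chance of selection
	Convenience Sample	consists of individuals who are conveniently available; usually biased
	Systematic Sampling	selects every nth individual from the sampling frame, starting from a randomly chosen individual
	Stratified Sample	divides the population into homogeneous groups (strata), then takes an SRS within each stratum
	Cluster Sampling	divides the population into heterogeneous groups (clusters), takes an SRS of the clusters, and surveys the individuals in the selected clusters
	Multistage Sampling	sampling done in stages, combining several sampling methods
	Forms of Bias	1) Response Bias: anything in the survey design, such as the wording of the question, that influences responses 2) Non-Response Bias: you are selected for the sample, but choose not to respond 3) Voluntary Response: people respond because they have strong feelings 4) Undercoverage: part of the population is left out and hence not represented in the sample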
	Observational Study	no manipulation of factors has been employed
	Retrospective Study	subjects are selected and then their previous conditions or behaviors are determined
	Prospective Study	observational study in which subjects are followed to observe future outcomes
	experiment	manipulates factor levels to create treatments and compare the responses of the subject groups across treatment levels
	Principles of Experimental Design	1) Control: hold constant the sources of variation that we know may have an effect but are not the factors being studied 2) Randomize 3) Replicate 4) Block: reduces variation (not required)
	Statistically Significant	an observed difference is too large for us to believe that it is likely to have occurred naturally
	Control Group	group given the placebo or baseline treatment
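	single-blind	either those who can influence the results or those who evaluate the results do not know which treatment was assigned
	double-blind	both those who can influence the results and those who evaluate the results are blinded
	block	when groups of experimental units are similar, we gather them into blocks. This isolates the variability due to differences between blocks so that we may see the differences caused by the treatments more clearly
	confounded	the levels of one factor are associated with the levels of another, so the effects of the two factors cannot be separated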
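
Worked example for the median, quartiles, five-number-summary, standard deviation, and z-scores cards. This is only a sketch of the steps as the cards describe them; the function names and the data are made up for illustration. One assumption goes beyond the cards: when n is odd, the median is left out of both halves before finding the quartiles (textbooks and software differ on this).

```python
import math

def median(values):
    """Middle value of the ordered data (position (n+1)/2, counting from 1)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    # Even n: position (n+1)/2 falls between two values, so average them.
    return (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Split the data at the median and take the median of each half.
    Assumption: for odd n the median itself is excluded from both halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return median(lower), median(upper)

def five_number_summary(values):
    q1, q3 = quartiles(values)
    return min(values), q1, median(values), q3, max(values)

def standard_deviation(values):
    n = len(values)
    ybar = sum(values) / n                              # 1) find the mean
    deviations = [y - ybar for y in values]             # 2) subtract the mean
    squared = [d ** 2 for d in deviations]              # 3) square each deviation
    return math.sqrt(sum(squared) / (n - 1))            # 4) divide by n-1, 5) square root

def z_score(y, values):
    """How many standard deviations y is from the mean."""
    ybar = sum(values) / len(values)
    return (y - ybar) / standard_deviation(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up data, mean = 5
summary = five_number_summary(data)   # (2, 4.0, 4.5, 6.0, 9)
iqr = summary[3] - summary[1]         # Q3 - Q1 = 2.0
print(summary, iqr, standard_deviation(data), z_score(9, data))
```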
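
Worked example for the correlation, linear model, residual, and R-squared cards. The correlation follows the two steps on the card. The slope and intercept formulas (b1 = r × sy / sx and b0 = ybar - b1 × xbar) are the standard least squares results; they are not stated on the cards, so treat them as an assumption added here. Function names and data are made up.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (divides by n-1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    # 1) (x - xbar)(y - ybar) for each pair, added up
    total = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # 2) divide by (n-1) * sx * sy
    return total / ((n - 1) * sd(xs) * sd(ys))

def least_squares(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x.
    Assumption: uses the standard formulas b1 = r*sy/sx, b0 = ybar - b1*xbar."""
    b1 = correlation(xs, ys) * sd(ys) / sd(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1

def residuals(xs, ys):
    """Observed value minus predicted value (y - y-hat) for each case."""
    b0, b1 = least_squares(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]          # made-up explanatory values
ys = [2, 4, 5, 4, 5]          # made-up response values
r = correlation(xs, ys)
b0, b1 = least_squares(xs, ys)
res = residuals(xs, ys)
r_squared = r ** 2            # fraction of the variability of y accounted for
print(r, b0, b1, r_squared)
```

For this data the residuals add up to zero (up to rounding), and r squared matches 1 minus the sum of squared residuals over the total squared deviation of y, which is the "fraction of variability" reading of R-squared.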
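
Worked example for the SRS, systematic, stratified, and cluster sampling cards. This is a rough sketch using Python's standard random module; the function names, the population of numbered individuals, and the group labels are all hypothetical. The systematic sample assumes a random starting point, and the cluster sample assumes everyone in a chosen cluster is surveyed.

```python
import random

def simple_random_sample(population, n, rng):
    """Every set of n individuals has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Every step-th individual, starting from a random position."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS inside each homogeneous stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters, then keep everyone in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                       # fixed seed so runs repeat
population = list(range(1, 101))             # hypothetical individuals 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

# Hypothetical strata (e.g. grade levels) and clusters (e.g. classrooms).
strata = {"juniors": population[:50], "seniors": population[50:]}
stratified = stratified_sample(strata, 3, rng)

clusters = {f"room{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```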
