Stat Final

by sarahmorales899, Apr. 2010

Subjects: 221 stat

81 Cards in this Set

Front
Back

	The four parts of statistics	Statistics is the study of data analysis: defining the problem, collecting the data, analyzing and summarizing the data, and drawing inferences from the data.
	Distribution	A list of the possible values of a variable together with how often each value occurs
	Types of data	Quantitative and categorical
	Left Skewed	The tail is on the left, mean is less than the median.
	Right Skewed	Tail on the right, the mean is greater than the median
	Histogram and boxplot	A right skewed histogram will match with a boxplot that has a long right whisker
	Outliers	A data point that is quite a bit removed from the rest of the data
	Mean	A measure of center. The "balance point". Computed by adding numbers and dividing by how many there are
	Median	A measure of center that cuts the data in half. Order the data and find the middle observation. Outliers have little or no effect on the median.
	Five number summary	Minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Each quarter of the data holds 25 percent of the observations.
	Standard Deviation	Measures the variability of the data about the mean; in a sense, it is like the average distance of the data from the mean.
	Effects of outliers on standard deviation.	Since outliers affect the mean and deviations from the mean are used to compute standard deviation, then outliers can make the standard deviation bigger than it should be.
	When to use five number summary or mean and standard deviation.	Use the five-number summary in the presence of outliers.
	When can a normal distribution be used to model a data set.	The Normal distribution can be used whenever the shape of the histogram of the data resembles the normal curve.
	How do you obtain a proportion or a probability for a value from a normal curve.	First, convert the value to a z score and look the z score up in table A. Be sure that if you want the "greater than" percentage you subtract the probability given in the table from one.
	68-95-99.7 rule.	68% of the observations are within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three.
	Z-score	The z-score tells how many standard deviations a value is from the mean. If a score from a test has a z score of 1.8 that means the score is 1.8 standard deviations above the mean.
	Explanatory Variable	The variable we are assessing or testing in an experiment.
	Response Variable	The variable we want to predict (y). The explanatory variable is x, the variable we use to do the predicting.
	Correlation	Correlation is a measure of the linear relationship between the x and y variables. r is the symbol for the correlation coefficient.
	Correlation (r)	r is always between -1 and 1. Values close to zero indicate little or no linear relationship. Values close to -1 show a strong negative relationship, values close to +1 a strong positive one. r has no unit of measure. r only measures the strength of linear relationships.
	Least squares regression line	The line obtained by minimizing the sum of the squared residuals.
	Purpose of regression equations.	Regression equations are used to model relationships between quantitative variables and also for prediction.
	Residual	Observed y minus predicted y
	Slope in a least squares regression line.	The slope tells us the average increase (or decrease if negative) in y for each one-unit increase in x.
	Roles of x and y in regression	X is the explanatory and y is the response variable
	r-squared	Tells the percentage of total variation in the y's that can be explained by the x's
	Residual Plots	What to look for: 1) uniform scatter (conditions are met) 2) outliers (normality problem) 3) megaphone shape (unequal variance) 4) a smile or frown (nonlinearity)
	Necessary Conditions for testing Significance using slope	Normality, constant variance, independence, and linearity.
	Tests on slope	Ho: beta = 0 (no linear relationship exists) versus Ha: beta does not equal 0 (a significant linear relationship exists).
	s in regression output	The s in regression output measures the standard deviation of the observed y's about the regression line.
	Extrapolation	Using an x value outside the range of the observed x's used to obtain the regression equation to predict a y value.
	CI vs PI	A prediction interval (PI) is wider than a confidence interval (CI) at the same x value and confidence level.
	Lurking Variable	Affects the relationship between the response variable and explanatory variable but is not part of the study. They are dangerous because they can suggest relationships that do not really exist.
	Marginal Distribution	row total divided by table total
	Conditional distribution	Cell count divided by the row total. * If the conditional distributions are equal, we say the variables are NOT related.
	Voluntary response sample	Samples obtained by having respondents call in or write in voluntarily *biased, not probability sample
	Convenience Sample	Researchers contact subjects that are convenient to contact. *biased, not probability sample
	Population of interest vs. sample	Population: the entire group of interest. Sample: the subgroup of individuals from the population about which the researcher actually obtains information.
	Response Variable	The observation recorded (measured) on each individual.
	Bias	The amount that sample results systematically differ from what they should be. Bias can be eliminated by taking probability samples, using careful wording on survey questions etc.
	SRS	Simple random sample: sampling at random from the entire population.
	Stratified sampling	Sampling from within groups of a population or sampling from within different populations
	Multistage	First sampling groups and then sampling from within those groups.
	SRS & Stratified & Multistage	All are probability samples, where every member of the population has a known non-zero chance of being selected. An SRS gives each possible sample of size n an equal chance of being selected, and a stratified sample gives each member of a stratum an equal chance of being selected within its stratum.
	Cautions of surveying	Undercoverage, non-response, response bias, wording of questions
	Methods to reduce bias	Using probability samples, avoiding undercoverage, reducing non response and avoiding poor wording on questions.
	Observational Study	Studies where information is gathered on the population but nothing is inflicted on the subjects.
	Sample Surveys	Observational studies not experiments.
	Advantage of experiment over observational study.	You can establish causation
	Control group	Those that receive the placebo
	Completely randomized experiment	An experiment where all experimental units are allocated at random among the treatment groups
	Replication	having more than one experimental unit per treatment
	Double blind	Neither the subjects nor the diagnostician know who is receiving the treatment or the placebo.
	Matched Pairs	Taking two measurements on each individual
	Completely random	Two completely separate groups. In an experiment, if individuals are randomly allocated to treatments, you have a completely randomized design. If instead the order in which treatments are applied to each individual is randomized, you have a matched pairs design.
	Sampling Distribution	A list of the possible values of the statistic together with the probability of each value. A collection of the statistic's values from all possible samples.
	Increase precision	by increasing sample size
	x-bar	sample mean
	P hat	sample proportion
	When is p-hat normal	When np is greater than 10 and n(1 - p) is greater than 10.
	Inferential Statistics	Using information from a sample to draw inferences about a population.
	Two most used types of inferential statistics	Confidence interval and test of hypothesis
	Confidence Interval	Gives a range of plausible values of the parameter being estimated by the confidence interval
	95 % confidence means?	The procedure provides intervals containing the parameter value for 95% of all samples. Confidence is in the procedure not the interval.
	p-value	The probability of getting a test statistic as extreme or more extreme than the value actually observed if the null hypothesis were true.
	alpha	The level of significance, probability of rejecting the null hypothesis when it is true or the largest risk the researcher is willing to take in rejecting a true null hypothesis.
	multiple analyses	Two or more tests of significance performed. Inflates the type 1 error rate.
	Power	Increasing alpha decreases beta and increases power. Increasing sample size decreases beta and increases power.
	T distribution vs. Normal distribution	t distribution is more spread out than normal.
	First rule in data analysis	plot the data
	When are t procedures robust	When there are no outliers.
	Chi-Square Hypothesis	Ho: no relationship Ha: relationship
	Expected Count	Column total times row total divided by table total
	When is using Chi-square appropriate	Expected counts are greater than 5.
	DF for two way table	(r-1)(c-1)
	What does Chi-Square test?	Measures the amount of discrepancy between the observed counts and the expected counts where expected counts are computed assuming the null hypothesis of no relationship is true.
	Anova Hypothesis	Ho: All means equal Ha: at least one mean differs
	When do you use ANOVA	When the explanatory variable is categorical and the response variable is quantitative OR the problem says you are comparing three or more means.
	When do you use Chi-square	When both variables are categorical.
	What symbols to use for means	mu for means, p symbols for proportions, beta for slope in regression.
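
The cards on mean, median, and standard deviation say that outliers pull the mean and inflate the standard deviation while leaving the median nearly unchanged. A minimal sketch of that effect, using Python's standard statistics module; the two data sets and the summarize helper are made up for illustration:

```python
from statistics import mean, median, stdev

# Hypothetical data: the second list replaces the largest value with an outlier.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]

def summarize(data):
    """Return the mean, median, and sample standard deviation of data."""
    return {"mean": mean(data), "median": median(data), "sd": stdev(data)}

clean_summary = summarize(clean)
outlier_summary = summarize(with_outlier)

print(clean_summary)
print(outlier_summary)
```

The median is the same for both lists, while the mean and standard deviation change a lot, which is why the five-number summary is preferred when outliers are present.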
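
The z-score and normal curve cards describe converting a value to a z-score, looking it up in a table, and subtracting from one for a "greater than" probability. The sketch below follows those steps, with statistics.NormalDist standing in for table A; the test mean of 70 and standard deviation of 10 are assumed values chosen so the score has the z-score of 1.8 used on the card:

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def prob_less_than(z):
    """Area under the standard normal curve to the left of z (a table lookup)."""
    return NormalDist().cdf(z)

def prob_greater_than(z):
    """Area to the right of z: one minus the table value."""
    return 1.0 - prob_less_than(z)

# Hypothetical test with mean 70 and standard deviation 10; a score of 88.
z = z_score(88, 70, 10)
print(z, prob_less_than(z), prob_greater_than(z))

# The 68-95-99.7 rule: area within 1, 2, and 3 standard deviations of the mean.
within = [prob_less_than(k) - prob_less_than(-k) for k in (1, 2, 3)]
print(within)
```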
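
The regression cards define the least squares line, residuals (observed y minus predicted y), r, and r-squared. One way to compute them from scratch is sketched below; the least_squares function name and the five data points are illustrative assumptions, not part of the card set:

```python
from math import sqrt

def least_squares(xs, ys):
    """Return slope, intercept, and correlation r of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # average change in y per one-unit increase in x
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)           # unitless, always between -1 and 1
    return slope, intercept, r

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
r_squared = r ** 2  # share of the variation in y explained by x

print(slope, intercept, r, r_squared)
print(residuals)
```

Plotting the residuals against x is the usual next step for checking the conditions listed on the residual plot card.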
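
The chi-square cards give the expected count as column total times row total divided by table total, and the degrees of freedom as (r-1)(c-1). A rough sketch of those calculations for a two-way table follows; the chi_square helper and the observed counts are hypothetical, and the statistic is the usual sum of (observed - expected)^2 / expected over all cells:

```python
def chi_square(table):
    """Return expected counts, the chi-square statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell: row total * column total / table total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

# Hypothetical 2x2 table of observed counts.
observed = [[20, 30], [30, 20]]
expected, statistic, df = chi_square(observed)

# Check the card's rule of thumb that every expected count exceeds 5.
counts_ok = all(e > 5 for row in expected for e in row)
print(expected, statistic, df, counts_ok)
```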
