Set the Language

We weren't able to detect the audio language on your flashcards. Please select the correct language below.

Front

Back

Related Flashcards

Flashcards
»
Stats exam 1

Stats Exam 1

by njesski, Jan. 2017

Subjects: exam 1

Favorite

Add to folder

Flag

Shuffle
Toggle On

Toggle Off
Alphabetize
Toggle On

Toggle Off
Front First
Toggle On

Toggle Off
Both Sides
Toggle On

Toggle Off
Read
Toggle On

Toggle Off

Reading...

Front

Card Range To Study

through

Play button

Progress

1/60

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

60 Cards in this Set

Front
Back

	Parameter	is a numerical measurement describing some characteristic of a population - population size of 241,472,385 is a parameter. because it is based on the entire population of all adults in the US population; parameter
	Statistic	is a numerical measurement describing some characteristic of a sample sample; statisitc
	Statistic or a parameter? in a AAA Foundation for Traffic safety survey, 21% of the residents said that they recently texted or e-mailed while driving	Statistic: sample of the population not the whole population
	Sample	is a subcollection of members selected from a population the objective is to use the sample data as a basis for drawing a conclusion about the population of all adults, and methods of statistics are helpful in drawing such conclusions
	Quantitative Data	( or numerical) data consists of numbers representing counts or measurements - the ages in years of survey participents
	Qualitative Data	(or Categorical/attribute) data consists of names or labels that are not numbers representing counts or measurements - party affiliation, jersey numbers
	Statistic or parameter? a study was conducted of all 2223 passengers aboard the titanic when it sank.	Parameter: the whole population of all the passengers, not a sample of some of them
	Simple Random Sample	a sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen
	Random Sample	each member of the population has an equal chance of being selected - computers are often used to generate random telephone numbers
	Systematic Sampling	select some starting point, then select every kth (such as every 50th) element in the population
	Convenience Sampling	use results that are easy to get
	Stratified Sampling	subdivide the population into at least 2 diff subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket) then drawa sample from each subgroup
	Cluster Sampling	divide the population into sections (or clusters), then randomly select some of those clusters, and then choose all members from those selected clusters
	Simple random sample? every 100oth pill is selected and tested	not a simple random sample because every 1000th pill is selected, some samples have no chance to of being selected. for example, two consecutive pills has no chance of being selected and this violates the requirement of a simple random sample
	page 37 #9
	How are graphs able to deceive? Nonzero axis	Nonzero axis: exaggerates the difference and can create a false impression
	How are graphs able to deceive? pictographs	Pictographs: data that are one dimensional in nature (such as budget amts) are often depicted with two or three dimensional objects. pictographs can create flase impressions that grossly distort differences by using these simple principles of basic geometry: (1) when you double each side of a square, the area doesn't merely double; it increases by a factor of four (2) cube sides doubled: area increases by a factor of 8
	Normal Distibution	(1) the frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency (2) the distibution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum
	Relative Frequency Distribution	in which each class frequency is replaced by a relative frequency (or proportion) or a percentage frequency for a class/sum of all frequencies x 100%
	Class boundaries	are the numbers used to seperate the classes, but without gaps created by class limits 50-69 70-89 90-109 110- 129 130-149 49.5,69.5, 89.5, 109.5,129.5,149.5
	Class width	is the difference between two consecutive lower class limits (or two lower class boundaries) in a frequency distribution 50-69 70-89 20 class width
	frequency distribution	shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency of data values in each of them
	a complete list of all 241,424 adults in the US is compiled and every 150th name is selected	Systematic
	a complete list of all 241,424 adults in the US is compiled and 1500 adults are randomly selected from this group	Random
	the US is partitioned into regions with 100 adults in each region. then 15 of those regions are randomly selected	cluster
	The US is partitioned into 150 regions with approximately the same number of adults in each region. then 10 people are randomly selected from each of the 150 regions	stratisfied
	a survey is mailed to 10,000 randomly selected adults, and 1500 response's were used	convenience
	Why construct a frequency distribution?	(1) so that we can summarize large data sets (2) so that we can analyze the data to see the distribution and identify outliers (3) so that we have a basis for constructing graphs
	Cumulative Frequency Distribution	use original frequencies i.e. 2,33,35,7,1 Cumulative, add up like this: 2+33= 35 2+33+35= 70 70+7=77 77+1=78
	the presence of gaps in frequency can suggest....	that the data are from two or more different populations
	Histogram	Is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars corresponds to the frequency values
	the distribution of data is skewed if...	it is not symmetric and extends more to one side than the other
	Skewed to the right	positively skewed: have a longer right tail: more common than data skewed to the left because it's often easier to get exceptionally large values than values that are exceptionally small.
	example of annual incomes and where they are skewed	with annual incomes, impossible to get values below zero, but there are a few people who earn millions or billions each year. Annual incomes tend to be skewed to the right
	skewed to the left	negatively skewed: have a longer left tail
	Scatterplot	or scatter diagram plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis: the pattern of the plotted points is often helpful in determining whether there is a correlation between two variables
	Time-Series Graph	a graph of time-series data which are quantitative data that have been collected at different points in time, such as yearly or monthly
	Dot-Plot	consists of a graph in which each data value is plotted as a point along a horizontal scale of values
	Stemplot	represents quantitative data by separating each value into two parts; the stem (left most digit) and the leaf (rightmost digit) Advantage: (1)we see the distinction of data while keeping the original data values (2) constructing a stemplot is a quick way to sort data, which is required for some statistical procedures (such as finding a median)
	Bar Graph	uses bars of equal width to show frequencies of categories or categorical (qualitative) data Vertical side: represents frequencies or relative frequencies Horizontal: identifies categories of qualitative data
	Principles of Tufte	1. small data sets of values of 20 or fewer; use a table instead of a graph 2. a graph of data should make us focus on the true nature of the data, not on other elements, such as eye catching designs 3. do not distort data; construct a graph to reveal the true nature of the data
	Pie Chart	a graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category
	Mean	of a data set is the measure of center found by adding the data values and dividing the total by the number of data values
	Important properties of Mean	1. sample means drawn from the same population tend to vary less than other measure of center 2. the mean of the data set uses every data value 3. a disadvantage of the mean is that just one outlier can change the value of the mean significantly (it is not a resistant measure of center)
	Σ denotes the sum of a data set x the variable usually used to represent the individual data set n represents the number of data values in a sample N represents the number of data sets in a population x ̃=εx/N is the mean of a set of sample values µ=Σ_x/N is the mean of all values in a population
	Median	"middle value" half of the values in a data set are less than the median and half are greater than the median.
	Mode	of a data set is the value that occurs with the greatest frequency Bimodal: 2 modes Multimodal: more than two modes No mode: when no data value is repeated
	Midrange	of a data set is the measure of center that is the value midway between the maximum and the minimum values in the original data set. Max +Min/2
	round off rules	for the mean, median and midrange, carry one decimal place than is present in the original set of values for the mode, leave the value as is (because values of the mode are the same as some of the original data value)
	Mean for frequency distribution	x ̃=(∑(f ̇⋅x ̇ )/(∑f) Frequency x Class midpoint = F x X Add up all frequencies of Ef and add up all E(FxX) Divide E(FxX) by Ef to get the mean from frequency distribution
	Measures of Variation	1. range 2. standard deviation 3. variance
	Range	of a data set is the difference between the maximum data value and the minimum data value - very sensetive to outliers
	standard deviation	*most commonly used in statistics of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mena.
	Important properties of Standard Deviation	(1) the SD is a measure of how much data values deviate away from the mean (2) the value of the standard deviation, s, is usually positive. it is zero only when all values of the data values are the same number. NEVER negative (3) the value of the standard deviation can increase dramatically with the inclusion of one or more outliers (4) the units of the SD s (such as minutes, feet, pounds) are the same as the units of original data values (5) he sample standard deviation s is a biased estimator of the population standard deviation denoted by σ
	three concepts that can help us determine and interpret values of standard deviation	(1) the range rule of thumb (2) the empirical rule (3) Chebyshev's theorem
	Range rule of thumb	tool for interpreting Standard Deviation: It is based on the principle that for many data sets, the vast majority (such as 95%) of sample values lie within 2 Standard Deviations of the mean. Minimum usual value = (mean) - 2 x (standard deviation) Max usual value = (mean) + 2 x (standard devaition)
	Estimating a value of the standard devaition	s ≈ range/4 with range being the max-min
	Variance	of a set of values is a measure of variation equal to the square of the standard deviation s^2 = sample variance σ ^2 = population variance
	Important properties of Variance	(1) the units of variance are the squares of the units of the original data. if the values are in feet, the varaince will have units of ft^2 (2) the value of variance can increase dramatically with the inclusion of one or more outliers (3) the value of variance is usually positive. it is zero only when all of the data values are the same number (4) the sample variance s^2 is an unbiased estimator of the population variance σ ^2
	don't forget units of measurement after each data!!!!!

Share This Flashcard Set

Set the Language

Related Flashcards

Stats Exam 1

Add to Folders

Upgrade to Cram Premium

Card Range To Study

60 Cards in this Set