60 Cards in this Set

Parameter

is a numerical measurement describing some characteristic of a population


- A population size of 241,472,385 is a parameter because it is based on the entire population of all adults in the US.


population; parameter

Statistic


is a numerical measurement describing some characteristic of a sample


sample; statistic

Statistic or a parameter?


in a AAA Foundation for Traffic Safety survey, 21% of the residents said that they recently texted or e-mailed while driving

Statistic: it is based on a sample of the population, not the whole population

Sample




is a subcollection of members selected from a population


the objective is to use the sample data as a basis for drawing a conclusion about the population of all adults, and methods of statistics are helpful in drawing such conclusions

Quantitative Data

(or numerical) data consists of numbers representing counts or measurements


- the ages in years of survey participants

Qualitative Data

(or Categorical/attribute) data consists of names or labels that are not numbers representing counts or measurements


- party affiliation, jersey numbers

Statistic or parameter?


a study was conducted of all 2223 passengers aboard the Titanic when it sank.


Parameter: the whole population of all the passengers, not a sample of some of them

Simple Random Sample

a sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen

Random Sample

each member of the population has an equal chance of being selected


- computers are often used to generate random telephone numbers

Systematic Sampling

select some starting point, then select every kth (such as every 50th) element in the population

Convenience Sampling

use results that are easy to get

Stratified Sampling

subdivide the population into at least 2 different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket), then draw a sample from each subgroup

Cluster Sampling

divide the population into sections (or clusters), then randomly select some of those clusters, and then choose all members from those selected clusters
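The sampling methods above can be sketched in Python. This is a minimal illustration, assuming a hypothetical population of 1000 numbered members split into 2 strata and 10 clusters of 100; those sizes and the function names are invented for the example. Convenience sampling is left out because it has no procedure: it just uses whatever results are easy to get.

```python
import random

# Hypothetical population: members numbered 0..999
population = list(range(1000))
rng = random.Random(42)  # fixed seed so the draws are repeatable

def simple_random_sample(pop, n):
    # every possible sample of size n has the same chance of being chosen
    return rng.sample(pop, n)

def systematic_sample(pop, k):
    # pick a random starting point among the first k members, then every kth member
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(strata, n_per_stratum):
    # draw a simple random sample from each subgroup (stratum)
    return [x for stratum in strata for x in rng.sample(stratum, n_per_stratum)]

def cluster_sample(clusters, n_clusters):
    # randomly select whole clusters, then keep every member of each selected cluster
    chosen = rng.sample(clusters, n_clusters)
    return [x for cluster in chosen for x in cluster]

# Illustrative groupings (assumed): 2 strata and 10 clusters of 100 members
strata = [population[:500], population[500:]]
clusters = [population[i:i + 100] for i in range(0, 1000, 100)]

print(len(simple_random_sample(population, 50)))  # 50
print(len(systematic_sample(population, 50)))     # 20 (every 50th of 1000)
print(len(stratified_sample(strata, 10)))         # 20 (10 from each stratum)
print(len(cluster_sample(clusters, 3)))           # 300 (3 whole clusters)
```

Note how the systematic sample can never contain two consecutive members, which is exactly why it is not a simple random sample.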

Simple random sample?


every 1000th pill is selected and tested

not a simple random sample because every 1000th pill is selected; some samples have no chance of being selected.


for example, a sample of two consecutive pills has no chance of being selected, and this violates the requirement of a simple random sample

page 37 #9

How are graphs able to deceive?


Nonzero axis

Nonzero axis: exaggerates the difference and can create a false impression



How are graphs able to deceive?


pictographs

Pictographs: data that are one-dimensional in nature (such as budget amounts) are often depicted with two- or three-dimensional objects. Pictographs can create false impressions that grossly distort differences by using these simple principles of basic geometry:


(1) when you double each side of a square, the area doesn't merely double; it increases by a factor of four


(2) when you double each side of a cube, the volume doesn't merely double; it increases by a factor of eight

Normal Distribution

(1) the frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency


(2) the distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum

Relative Frequency Distribution

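in which each class frequency is replaced by a relative frequency (or proportion) or a percentage


$\text{relative frequency} = \frac{\text{frequency for a class}}{\text{sum of all frequencies}}$


$\text{percentage frequency} = \frac{\text{frequency for a class}}{\text{sum of all frequencies}} \times 100\%$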
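A short Python sketch of the relative frequency formula. It reuses the frequencies 2, 33, 35, 7, 1 from the cumulative frequency card purely as sample input; the function names are illustrative.

```python
# Frequencies reused from the cumulative frequency card
frequencies = [2, 33, 35, 7, 1]

def relative_frequencies(freqs):
    total = sum(freqs)  # sum of all frequencies
    # each class frequency divided by the sum of all frequencies
    return [f / total for f in freqs]

def percentage_frequencies(freqs):
    # same proportions expressed as percentages (x 100%)
    return [100 * r for r in relative_frequencies(freqs)]

for f, pct in zip(frequencies, percentage_frequencies(frequencies)):
    print(f"{f:>3}  {pct:5.1f}%")
```

The relative frequencies always add up to 1 (the percentages to 100%), apart from rounding.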

Class boundaries

are the numbers used to separate the classes, but without the gaps created by class limits


class limits: 50-69, 70-89, 90-109, 110-129, 130-149


class boundaries: 49.5, 69.5, 89.5, 109.5, 129.5, 149.5
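The class boundaries above can be computed from the class limits. This sketch assumes at least two classes and that the gap between consecutive classes is the same everywhere (here 70 - 69 = 1); each boundary sits half a gap away from the limits.

```python
# Class limits from the card: (lower, upper) for each class
limits = [(50, 69), (70, 89), (90, 109), (110, 129), (130, 149)]

def class_boundaries(limits):
    # gap between one class's upper limit and the next class's lower limit (assumed constant)
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # each lower limit minus half the gap, then the last upper limit plus half the gap
    return [lo - half for lo, _ in limits] + [limits[-1][1] + half]

print(class_boundaries(limits))  # [49.5, 69.5, 89.5, 109.5, 129.5, 149.5]
```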

Class width

is the difference between two consecutive lower class limits (or two lower class boundaries) in a frequency distribution


50-69, 70-89


class width = 70 - 50 = 20

frequency distribution

shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them

a complete list of all 241,424 adults in the US is compiled and every 150th name is selected
Systematic
a complete list of all 241,424 adults in the US is compiled and 1500 adults are randomly selected from this group
Random
the US is partitioned into regions with 100 adults in each region. Then 15 of those regions are randomly selected
Cluster
The US is partitioned into 150 regions with approximately the same number of adults in each region. Then 10 people are randomly selected from each of the 150 regions
Stratified
a survey is mailed to 10,000 randomly selected adults, and 1500 responses were used
Convenience

Why construct a frequency distribution?

(1) so that we can summarize large data sets


(2) so that we can analyze the data to see the distribution and identify outliers


(3) so that we have a basis for constructing graphs

Cumulative Frequency Distribution

use the original frequencies, e.g. 2, 33, 35, 7, 1


Cumulative: start with the first frequency, then add each new frequency to the running total:


2


2+33 = 35


2+33+35 = 70


70+7 = 77


77+1 = 78
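The running totals above are exactly what Python's `itertools.accumulate` produces. A minimal sketch using the card's frequencies:

```python
from itertools import accumulate

frequencies = [2, 33, 35, 7, 1]  # original frequencies from the card

# each entry is the sum of its own frequency and all frequencies before it
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 35, 70, 77, 78]
```

The last cumulative frequency always equals the total number of data values.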

the presence of gaps in frequency can suggest...
that the data are from two or more different populations

Histogram

Is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values
the distribution of data is skewed if...
it is not symmetric and extends more to one side than the other

Skewed to the right

positively skewed:


have a longer right tail:


more common than data skewed to the left because it's often easier to get exceptionally large values than values that are exceptionally small.

example of annual incomes and where they are skewed
with annual incomes, it is impossible to get values below zero, but a few people earn millions or billions each year. Annual incomes tend to be skewed to the right

skewed to the left

negatively skewed:


have a longer left tail

Scatterplot

or scatter diagram


plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis:


the pattern of the plotted points is often helpful in determining whether there is a correlation between two variables

Time-Series Graph

a graph of time-series data which are quantitative data that have been collected at different points in time, such as yearly or monthly
Dot-Plot

consists of a graph in which each data value is plotted as a point along a horizontal scale of values
Stemplot

represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit)


Advantage:


(1) we can see the distribution of the data while keeping the original data values


(2) constructing a stemplot is a quick way to sort data, which is required for some statistical procedures (such as finding a median)
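A minimal stemplot sketch in Python, assuming non-negative integer data where the leaf is the rightmost digit and the stem is everything before it. The data values are hypothetical. Because the values are sorted first, reading the leaves in order gives the sorted data, which is the advantage noted above.

```python
from collections import defaultdict

def stemplot(values):
    # stem = all digits except the last, leaf = rightmost digit (non-negative integers assumed)
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # include empty stems so gaps in the data stay visible
        leaf_text = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

data = [67, 72, 85, 75, 89, 89, 88, 90, 99, 100]  # hypothetical values
print("\n".join(stemplot(data)))
```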

Bar Graph

uses bars of equal width to show frequencies of categories of categorical (qualitative) data


Vertical side: represents frequencies or relative frequencies


Horizontal: identifies categories of qualitative data

Principles of Tufte

1. for small data sets of 20 values or fewer, use a table instead of a graph


2. a graph of data should make us focus on the true nature of the data, not on other elements, such as eye-catching designs


3. do not distort data; construct a graph to reveal the true nature of the data

Pie Chart

a graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category
Mean

of a data set is the measure of center found by adding the data values and dividing the total by the number of data values
Important properties of Mean

1. sample means drawn from the same population tend to vary less than other measures of center


2. the mean of the data set uses every data value


3. a disadvantage of the mean is that just one outlier can change the value of the mean significantly


(it is not a resistant measure of center)

Σ denotes the sum of a set of data values


x is the variable usually used to represent the individual data values


n represents the number of data values in a sample


N represents the number of data values in a population


$\bar{x} = \frac{\sum x}{n}$ is the mean of a set of sample values


$\mu = \frac{\sum x}{N}$ is the mean of all values in a population
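The mean formula and its lack of resistance to outliers can be checked with a few lines of Python. The data values are hypothetical; the same arithmetic gives the population mean μ when the list holds the whole population.

```python
def mean(values):
    # x̄ = Σx / n  (sum the data values, divide by how many there are)
    return sum(values) / len(values)

data = [2, 3, 4, 5, 6]     # hypothetical sample
print(mean(data))          # 4.0
print(mean(data + [100]))  # 20.0: a single outlier pulls the mean far upward
```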




Median

"middle value" half of the values in a data set are less than the median and half are greater than the median.
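A Python sketch of finding the median. The card does not spell out the even-count case; the standard rule used here is to average the two middle values when the number of data values is even.

```python
def median(values):
    s = sorted(values)  # the data must be sorted first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: mean of the two middle values

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```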

Mode

of a data set is the value that occurs with the greatest frequency


Bimodal: 2 modes


Multimodal: more than two modes


No mode: when no data value is repeated
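A Python sketch of the mode rules above, returning every value tied for the greatest frequency (one value, two for bimodal, more for multimodal) and an empty list when no data value is repeated. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)  # frequency of each distinct value
    top = max(counts.values())
    if top == 1:
        return []  # no data value is repeated: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```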

Midrange

of a data set is the measure of center that is the value midway between the maximum and the minimum values in the original data set.


$\text{midrange} = \frac{\text{max} + \text{min}}{2}$
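The midrange in Python, with hypothetical data. The parentheses matter: the maximum and minimum are added first, and only then is the sum divided by 2.

```python
def midrange(values):
    # add max and min first, then divide by 2
    return (max(values) + min(values)) / 2

print(midrange([3, 8, 10, 15]))  # 9.0
```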

round off rules

for the mean, median, and midrange, carry one more decimal place than is present in the original set of values


for the mode, leave the value as is (because values of the mode are the same as some of the original data values)

Mean for frequency distribution

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$


Multiply each frequency by its class midpoint: f · x


Add up all the frequencies to get Σf, and add up all the products to get Σ(f · x)


Divide Σ(f · x) by Σf to get the mean from the frequency distribution
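The frequency-distribution mean steps as a Python sketch. For illustration only, it pairs the class limits from the class boundaries card with the frequencies 2, 33, 35, 7, 1 from the cumulative frequency card; the two cards do not come from the same data set.

```python
def frequency_mean(midpoints, freqs):
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # Σ(f · x)
    sum_f = sum(freqs)                                      # Σf
    return sum_fx / sum_f

limits = [(50, 69), (70, 89), (90, 109), (110, 129), (130, 149)]
midpoints = [(lo + hi) / 2 for lo, hi in limits]  # class midpoints: 59.5, 79.5, ...
freqs = [2, 33, 35, 7, 1]                         # illustrative pairing (assumed)

# round to one more decimal place than the original values
print(round(frequency_mean(midpoints, freqs), 1))  # 92.3
```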

Measures of Variation

1. range


2. standard deviation


3. variance

Range

of a data set is the difference between the maximum data value and the minimum data value


- very sensitive to outliers
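A quick Python check of the range and its sensitivity to outliers, using hypothetical values. The function is named `data_range` to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    # maximum data value minus minimum data value
    return max(values) - min(values)

print(data_range([4, 5, 6, 7]))      # 3
print(data_range([4, 5, 6, 7, 70]))  # 66: one outlier changes the range dramatically
```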

standard deviation

*most commonly used in statistics


of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mean.

Important properties of Standard Deviation

(1) the SD is a measure of how much data values deviate away from the mean


(2) the value of the standard deviation, s, is usually positive. It is zero only when all of the data values are the same number. It is NEVER negative


(3) the value of the standard deviation can increase dramatically with the inclusion of one or more outliers


(4) the units of the SD s (such as minutes, feet, pounds) are the same as the units of original data values


(5) the sample standard deviation s is a biased estimator of the population standard deviation denoted by σ
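The cards do not state a formula, so this sketch uses the usual sample standard deviation formula, which divides by n - 1: s = √(Σ(x - x̄)² / (n - 1)). The data values are hypothetical. The first call illustrates property (2): s is zero when all values are the same.

```python
import math

def sample_sd(values):
    n = len(values)
    xbar = sum(values) / n
    # s = sqrt( Σ(x - x̄)^2 / (n - 1) ); same units as the data
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

print(sample_sd([5, 5, 5, 5]))              # 0.0: all values identical
print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical data
```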

three concepts that can help us determine and interpret values of standard deviation

(1) the range rule of thumb


(2) the empirical rule


(3) Chebyshev's theorem

Range rule of thumb

tool for interpreting Standard Deviation:


It is based on the principle that for many data sets, the vast majority (such as 95%) of sample values lie within 2 Standard Deviations of the mean.


Minimum usual value = (mean) - 2 x (standard deviation)


Max usual value = (mean) + 2 x (standard deviation)

Estimating a value of the standard deviation
s ≈ range/4, where range = (maximum data value) - (minimum data value)
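The range rule of thumb in Python: one function gives the minimum and maximum "usual" values (mean ± 2 standard deviations), the other gives the rough estimate s ≈ range/4. The inputs below are hypothetical.

```python
def usual_range(mean, sd):
    # values within 2 standard deviations of the mean are considered usual
    return mean - 2 * sd, mean + 2 * sd

def estimate_sd(values):
    # rough estimate: s ≈ (max - min) / 4
    return (max(values) - min(values)) / 4

print(usual_range(100, 15))           # (70, 130)
print(estimate_sd([10, 14, 18, 30]))  # 5.0
```

The estimate is only a quick check on a computed standard deviation, not a replacement for it.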

Variance

of a set of values is a measure of variation equal to the square of the standard deviation


s^2 = sample variance


σ ^2 = population variance

Important properties of Variance

(1) the units of variance are the squares of the units of the original data. If the values are in feet, the variance will have units of ft^2


(2) the value of variance can increase dramatically with the inclusion of one or more outliers


(3) the value of variance is usually positive. it is zero only when all of the data values are the same number


(4) the sample variance s^2 is an unbiased estimator of the population variance σ ^2
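A Python sketch of the sample variance, using the usual n - 1 divisor (the formula is an assumption here; the cards only define variance as the square of the standard deviation). The data are hypothetical heights in feet, so the result is in ft^2.

```python
def sample_variance(values):
    n = len(values)
    xbar = sum(values) / n
    # s^2 = Σ(x - x̄)^2 / (n - 1); units are the square of the data's units
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

heights_ft = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data in feet
print(sample_variance(heights_ft))      # result is in ft^2
```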

Don't forget the units of measurement after each data value!