Statistics

Science of collecting and interpreting data


Population

Complete set of people or things being studied


population parameters

Characteristics of the population


sample

Subset of the population from which data are actually obtained


Sample statistic

Characteristic of the sample found by consolidating or summarizing the raw data


Margin of Error

Describes the range of values likely to contain the population parameter:
from $\text{sample statistic} - \text{margin of error}$ to $\text{sample statistic} + \text{margin of error}$
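
A minimal sketch of the margin-of-error range, assuming a hypothetical poll result of 45% with a margin of error of 3 percentage points. The function name `margin_of_error_range` is illustrative only.

```python
def margin_of_error_range(sample_stat, margin_of_error):
    """Return (low, high): the range likely to contain the population parameter."""
    return (sample_stat - margin_of_error, sample_stat + margin_of_error)

# Hypothetical poll: 45% of the sample favors a candidate, margin of error 3 points
low, high = margin_of_error_range(45, 3)
print(f"{low}% to {high}%")  # prints "42% to 48%"
```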

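Steps in a Statistical Study

1. Identify the goal of the study
2. Choose a sample from the population
3. Collect the raw data from the sample
4. Use the sample data to make inferences about the population
5. Draw conclusions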

census

Collection of data from every member of the population


Representative sample

Sample in which the relevant characteristics of the sample members match those of the general population


Biased

A study is biased if it tends to favor certain results


Simple random sample

Sample chosen in such a way that every sample of the same size has an equal chance of being selected
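
A short sketch using Python's standard `random` module: `Random.sample` draws items without replacement so that every subset of size n is equally likely. The population of 100 students and the seed are hypothetical, chosen only so the example is repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

students = [f"student_{i}" for i in range(1, 101)]  # hypothetical population of 100
picked = simple_random_sample(students, 5, seed=1)
print(picked)
```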


Systematic sampling

Use a simple system to choose the sample, such as selecting every 10th or every 50th member of the population
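
A sketch of systematic sampling, assuming a random starting point within the first k members (a common choice, not stated above) and then every k-th member after it. The list of 500 customers is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # random starting index 0..k-1 (assumption)
    return population[start::k]

customers = list(range(1, 501))       # hypothetical list of 500 customers
sample = systematic_sample(customers, 50, seed=2)
print(len(sample), sample)            # 10 customers, every 50th one
```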


Convenience sampling

Use a sample that happens to be convenient to select


Cluster Sampling

Divide the population into groups (clusters), randomly select some of the clusters, then include all members of each selected cluster in the sample
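
A sketch of cluster sampling: whole clusters are chosen at random, and every member of a chosen cluster is kept. The classroom names and members are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # sorted() gives a list of names
    return {name: clusters[name] for name in chosen}

# Hypothetical population grouped by classroom
classrooms = {
    "room_A": ["Ana", "Ben", "Cy"],
    "room_B": ["Dee", "Eli"],
    "room_C": ["Fay", "Gus", "Hal", "Ivy"],
    "room_D": ["Jo", "Kai"],
}
result = cluster_sample(classrooms, 2, seed=3)
print(result)
```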


stratified sampling

Use when concerned about differences among subgroups (strata) of the population: first identify the strata, then draw a random sample from each individual stratum
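
A sketch of stratified sampling, assuming the same number of members is drawn from each stratum (proportional allocation is another option). Unlike cluster sampling, every stratum is represented, but only some members of each. The class-year strata are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year
by_year = {
    "first_year": [f"F{i}" for i in range(40)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(20)],
}
res = stratified_sample(by_year, 3, seed=4)
print(res)
```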


Subjects

People or objects chosen for the sample


participant

If the subjects are people, they are called participants


Observational Study

Observe or measure characteristics of the subjects, but do not attempt to influence or modify those characteristics


Experiment

Apply some treatment to the subjects and then observe its effects


Meta-analysis

Study of a topic that has already been the subject of many other studies, combining and analyzing the results of those studies together


Treatment group

Group that receives the treatment being tested


control group

In an experiment, the group of subjects who do not receive the treatment being tested


Randomized Experiment

Subjects are assigned to the treatment group or the control group at random, so that each subject has an equal chance of being assigned to either group.
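
A sketch of random assignment, assuming two groups of equal size: shuffling the subjects and splitting the list in half gives each subject the same chance of landing in either group. The subject labels are hypothetical.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle a copy of the subjects, then split in half: (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(10)]   # hypothetical subjects
treatment, control = randomize(subjects, seed=5)
print("treatment:", treatment)
print("control:  ", control)
```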


Placebo

Substance or treatment that lacks the active ingredients of the treatment being tested but is identical in appearance to it


placebo effect

Situation in which patients improve simply because they believe they are receiving a useful treatment


single blind

Participants do not know whether they are members of the treatment group or the control group, but the experimenters do know


double blind

Neither the participants nor the experimenters know who belongs to the treatment group and who belongs to the control group


Case-control study

Observational study in which the participants naturally divide into two or more groups:
* Participants who engage in the behavior under study form the cases
* Participants who do not engage in the behavior are the controls

selection bias

Researchers select their sample in a biased way


Participation bias

Occurs any time participation in a study is voluntary


Self-selected survey

people decide for themselves whether to be included in the survey


Variable

Item that can vary or take on different values


Data Types

Qualitative data
Quantitative data 

Qualitative data

Values that can be placed into nonnumerical categories


quantitative

Values representing counts or measures


continuous

Can take on any value in a given interval


discrete

can take on only particular values and not other values


nominal level

Data that consist of names, labels, or categories; they are qualitative and cannot be ranked or ordered


Ordinal level

Applies to qualitative data that can be arranged in some order (such as low to high)


Interval level

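Applies to quantitative data for which intervals (differences) are meaningful but ratios are not, because there is no true zero (for example, Celsius temperatures)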


Ratio level

Applies to quantitative data for which both intervals and ratios are meaningful, because there is a true zero (for example, weights or Kelvin temperatures)
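
A small arithmetic sketch contrasting the interval and ratio levels, using two hypothetical temperatures: the ratio of Celsius readings is misleading because 0 °C is not a true zero, while Kelvin has a true zero. Differences agree on both scales.

```python
# Celsius is interval level: 0 °C is not "no temperature", so ratios mislead.
c_warm, c_cool = 20.0, 10.0
print(c_warm / c_cool)                 # 2.0, but 20 °C is NOT "twice as hot"

# Kelvin is ratio level: 0 K is a true zero, so ratios are meaningful.
k_warm, k_cool = c_warm + 273.15, c_cool + 273.15
print(round(k_warm / k_cool, 3))       # about 1.035

# Differences (intervals) agree on both scales.
print(c_warm - c_cool, round(k_warm - k_cool, 10))  # 10.0 10.0
```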


Random error

Errors caused by random, inherently unpredictable events in the measurement process


systematic errors

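Errors caused by a problem in the measurement system that affects all measurements in the same way (for example, making them all too high)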
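
A simulation sketch of the two error types, assuming a hypothetical scale that always reads 1.5 kg too high (systematic error) plus normally distributed noise (random error). Averaging many readings shrinks the random error but leaves the systematic error untouched.

```python
import random
import statistics

TRUE_WEIGHT = 70.0   # hypothetical true value (kg)
BIAS = 1.5           # systematic error: scale reads 1.5 kg too high every time
NOISE_SD = 0.4       # random error: unpredictable scatter in each reading

rng = random.Random(6)
readings = [TRUE_WEIGHT + BIAS + rng.gauss(0, NOISE_SD) for _ in range(1000)]

# Averaging many readings shrinks random error but leaves the systematic error.
print(round(statistics.mean(readings) - TRUE_WEIGHT, 2))   # close to 1.5
```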


Absolute error

How far the measured value lies from the true value:
$\text{absolute error} = \text{measured value} - \text{true value}$

Relative error

$\text{relative error} = \frac{\text{measured value} - \text{true value}}{\text{true value}} \times 100\%$
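
The absolute and relative error formulas above, as a sketch with a hypothetical measurement (a scale reads 52 kg for an object whose true weight is 50 kg). Multiplying before dividing keeps the example result exact.

```python
def absolute_error(measured, true):
    """Measured value minus true value (positive means the measurement is too high)."""
    return measured - true

def relative_error(measured, true):
    """Absolute error as a percentage of the true value."""
    return (measured - true) * 100 / true

print(absolute_error(52, 50))   # 2
print(relative_error(52, 50))   # 4.0  (percent)
```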

accuracy

How closely a measurement approximates the true value


Precision

Amount of detail in a measurement


Absolute Difference

$\text{absolute difference} = \text{compared value} - \text{reference value}$


relative difference

$\text{relative difference} = \frac{\text{compared value} - \text{reference value}}{\text{reference value}} \times 100\%$
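
A sketch of absolute and relative difference, assuming hypothetical prices of $120 (compared value) and $100 (reference value).

```python
def absolute_difference(compared, reference):
    return compared - reference

def relative_difference(compared, reference):
    """Difference as a percentage of the reference value."""
    return (compared - reference) * 100 / reference

print(absolute_difference(120, 100))   # 20
print(relative_difference(120, 100))   # 20.0 (percent)
```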

Index numbers

$\text{index number} = \frac{\text{value}}{\text{reference value}} \times 100$
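
A sketch of the index-number formula with hypothetical gas prices ($2.00 in the reference year, $3.00 later). By construction the reference value always has index 100.

```python
def index_number(value, reference_value):
    """Index of a value relative to a reference value (reference = 100)."""
    return value * 100 / reference_value

print(index_number(3.00, 2.00))   # 150.0
print(index_number(2.00, 2.00))   # 100.0 (the reference value always has index 100)
```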

consumer price index

Computed monthly, based on prices in a sample of more than 60,000 goods and services


Frequency tables

1. Categories (or bins) of data values
2. Frequency of each category: the number of data values that fall in it
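
A sketch that builds a frequency table from hypothetical survey answers, using `collections.Counter` from the Python standard library to count how many data values fall in each category.

```python
from collections import Counter

# Hypothetical survey answers (categorical data)
answers = ["A", "B", "A", "C", "B", "A", "D", "B", "A"]

table = Counter(answers)
for category in sorted(table):
    print(category, table[category])   # A 4, B 3, C 1, D 1
```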

bin

Category that groups a range of data values together


relative frequency

Proportion or percentage of the data values that fall in a particular category
$\text{relative frequency} = \frac{\text{frequency in category}}{\text{total frequency}}$

Cumulative Freq

number of data values in that category and all preceding categories
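
A sketch computing relative and cumulative frequencies for a hypothetical table of letter grades. Cumulative frequency assumes the categories have a natural order, so the list is kept in grade order; `itertools.accumulate` produces the running totals.

```python
from itertools import accumulate

# Hypothetical frequency table, with categories in their natural order
categories = ["A", "B", "C", "D", "F"]
frequencies = [5, 8, 4, 2, 1]

total = sum(frequencies)
relative = [f / total for f in frequencies]          # proportions, sum to 1
cumulative = list(accumulate(frequencies))           # this category + all preceding

for row in zip(categories, frequencies, relative, cumulative):
    print(*row)
```

The last cumulative frequency always equals the total number of data values.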


Distribution

The way the values of a variable are spread over all its possible values


Bar graph

Bars whose lengths represent the frequencies (or relative frequencies) of the categories


dotplot

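Similar to a bar graph, but each data value is shown as a dot; the number of dots in a category represents its frequency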


pareto chart

Bar graph with the bars arranged in frequency order (used for nominal-level data)


histogram

Bar graph showing the distribution of quantitative data


Stem-and-leaf plot

Looks like a histogram turned sideways, but instead of bars it shows the individual data values
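
A sketch of a stem-and-leaf plot for non-negative whole numbers, assuming the tens digit is the stem and the ones digit is the leaf. Empty stems are kept so the shape of the distribution stays visible. The test scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a stem-and-leaf plot for non-negative integers: stem = tens, leaf = ones."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

scores = [62, 75, 78, 81, 83, 83, 88, 91, 95, 99]   # hypothetical test scores
print(stem_and_leaf(scores))
# 6 | 2
# 7 | 58
# 8 | 1338
# 9 | 159
```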


line chart

Shows the distribution of quantitative data as a series of dots connected by lines


Mean

Average value: the sum of all data values divided by the number of values


median

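Middle value when the data are sorted in order (the mean of the two middle values if there is an even number of values)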


mode

Most common value
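
A sketch of mean, median, and mode using Python's standard `statistics` module on a small hypothetical data set. The value 7 pulls the mean above the median, which is why the median is often preferred when outliers are present.

```python
import statistics

data = [1, 2, 2, 3, 7]   # hypothetical data set

print(statistics.mean(data))     # 3   (sum 15 / 5 values)
print(statistics.median(data))   # 2   (middle of the sorted values)
print(statistics.mode(data))     # 2   (most common value)
```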


outlier

Data value that is much higher or much lower than almost all other values in the data set


weighted mean

$\text{weighted mean} = \frac{\sum (\text{each data value} \times \text{its weight})}{\sum \text{all weights}}$
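
A sketch of the weighted-mean formula, assuming a hypothetical grade-point average in which each course's grade points are weighted by its credits.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical GPA: grade points weighted by course credits
grade_points = [4, 3, 2]
credits = [3, 4, 1]
print(weighted_mean(grade_points, credits))   # (12 + 12 + 2) / 8 = 3.25
```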

A distribution is Symmetric if

Its left half is a mirror image of its right half


Left Skewed

Values are more spread out on the left side (a long tail to the left)


right skewed

values are more spread out to the right side


Variation

How widely data values are spread out about the center of the data set


lower Quartile

Divides the lowest fourth of the data values from the upper three-fourths


Middle Quartile

The overall median; divides the lower half of the data values from the upper half


Upper Quartile

Divides the lower three-fourths of the data values from the upper fourth


five number summary

Describes a distribution with the following five numbers:
1. Low value (minimum)
2. Lower quartile
3. Median
4. Upper quartile
5. High value (maximum)
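
A sketch of the five-number summary. Quartiles here use a median-of-halves convention (the overall median is left out of both halves when the count is odd); this is an assumption, since textbooks and software use several slightly different quartile rules. It needs at least two values.

```python
import statistics

def five_number_summary(values):
    """Return (low, lower quartile, median, upper quartile, high).

    Quartiles are medians of the lower and upper halves; with an odd count the
    overall median belongs to neither half. Requires at least 2 values.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2 :]
    return (data[0], statistics.median(lower_half), statistics.median(data),
            statistics.median(upper_half), data[-1])

print(five_number_summary([7, 1, 3, 9, 5, 2, 8, 4, 6]))   # (1, 2.5, 5, 7.5, 9)
```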

nth percentile

Divides the bottom n% of the data values from the top $(100 - n)\%$
$\text{percentile of data value} = \frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100$
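
A sketch of the percentile formula: count the data values below the given value, divide by the total count, and multiply by 100. The data set is hypothetical.

```python
def percentile_of(value, data):
    """Percentage of data values that are less than the given value."""
    below = sum(1 for x in data if x < value)
    return below * 100 / len(data)

data = [10, 20, 30, 40, 50]        # hypothetical data set
print(percentile_of(40, data))     # 60.0 -> 40 is at the 60th percentile
```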

normal distribution

Symmetric, bell-shaped distribution with a single peak


Relative Freq and Normal Distribution

The area under the normal distribution curve corresponding to a range of values on the horizontal axis is the relative frequency of those values.
Because the total relative frequency must be 1, the total area under the normal distribution curve must equal 1.
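
A sketch of "area equals relative frequency" using `statistics.NormalDist` (Python 3.8+). The height distribution (mean 170 cm, SD 10 cm) is hypothetical; the area between two values is the difference of the cumulative distribution function at those values.

```python
from statistics import NormalDist

# Hypothetical: adult heights roughly normal with mean 170 cm, SD 10 cm
heights = NormalDist(mu=170, sigma=10)

# Area under the curve between 170 and 190 = relative frequency of that range
share = heights.cdf(190) - heights.cdf(170)
print(round(share, 4))   # about 0.4772

# The total area under the curve is 1
print(heights.cdf(float("inf")) - heights.cdf(float("-inf")))
```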

Conditions for a normal distribution

1. Data values are clustered near the mean, making the distribution single-peaked
2. Values are spread evenly around the mean, making the distribution symmetric
3. Larger deviations from the mean become increasingly rare, producing tapering tails
4. Individual data values result from a combination of many different factors, such as genetic and environmental factors

68%

About 68% of the data values fall within 1 standard deviation of the mean


95%

About 95% of the data values fall within 2 standard deviations of the mean


99.7%

About 99.7% of the data values fall within 3 standard deviations of the mean
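
A sketch that checks the 68%, 95%, and 99.7% figures with the standard normal distribution from Python's `statistics` module (Python 3.8+). The rule's percentages are rounded versions of these areas.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

# Area within k standard deviations of the mean
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, within in coverage.items():
    print(f"within {k} SD: {within:.1%}")   # 68.3%, 95.4%, 99.7%
```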


Standard scores

Number of standard deviations a data value lies above or below the mean:
$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$
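
A sketch of the standard-score formula, assuming a hypothetical exam with mean 75 and standard deviation 5. A positive z is above the mean; a negative z is below it.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(85, 75, 5))   # 2.0  -> 2 SDs above the mean
print(z_score(70, 75, 5))   # -1.0 -> 1 SD below the mean
```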