individuals

are the objects described by a set of data.


variable

any characteristic of an individual.


categorical variable

places an individual into one of several groups or categories. (its distribution is given as the count or percent of individuals in each category)


quantitative variable

takes numerical values for which arithmetic operations such as adding and averaging make sense.


distribution

tells us what values the variable takes and how often it takes these values.
describe a distribution using its center, spread, and shape.

outlier

an individual observation that falls outside the overall pattern of the graph.


symmetric

the left and right sides are mirror images.


skewed to the right

if the right side of the histogram extends much farther out than the left side. (tails off to the right)


skewed to the left

if the left side of the histogram extends much farther out than the right side. (tails off to the left)


Pth percentile

the value such that p percent of the observations fall at or below it.
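
A minimal sketch of checking a percentile, assuming made-up test scores: it computes the percent of observations that fall at or below a given value. The function name `percent_at_or_below` and the data are hypothetical.

```python
def percent_at_or_below(data, value):
    """Percent of observations that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical scores, invented for illustration.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]

# 7 of the 10 scores are at or below 80, so 80 sits at the 70th percentile.
p = percent_at_or_below(scores, 80)
print(p)  # 70.0
```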


relative frequency

(count in the class / total count) × 100%


cumulative frequency

add the counts of the current class and all classes below that level (add classes together as each class increases)


relative cumulative frequency
(ogive graph)
(cumulative frequency of the class / total count) × 100%
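
The three frequency definitions above can be sketched together, assuming a made-up list of class counts ordered from the lowest class to the highest. The variable names are only illustrative.

```python
from itertools import accumulate

# Hypothetical counts per class, lowest class first (invented data).
counts = [2, 5, 8, 4, 1]
total = sum(counts)

# Relative frequency: count in the class / total x 100%.
relative = [100 * c / total for c in counts]

# Cumulative frequency: running total of the counts up to each class.
cumulative = list(accumulate(counts))

# Relative cumulative frequency: cumulative frequency / total x 100%.
relative_cumulative = [100 * c / total for c in cumulative]

print(relative)             # [10.0, 25.0, 40.0, 20.0, 5.0]
print(cumulative)           # [2, 7, 15, 19, 20]
print(relative_cumulative)  # [10.0, 35.0, 75.0, 95.0, 100.0]
```

Plotting `relative_cumulative` against the class boundaries would give the ogive.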


IQR

IQR = Q3 − Q1
where Q3 is the median of all observations to the RIGHT of the overall median and Q1 is the median of all observations to the LEFT of the overall median.

upper outlier cut off

Q3 + (1.5 × IQR)


lower outlier cut off

Q1 − (1.5 × IQR)


5 number summary

minimum, Q1, median, Q3, maximum
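
A short sketch of the quartiles, IQR, outlier cut offs, and 5 number summary, assuming the "median of each half" rule for Q1 and Q3 (software packages often use slightly different quartile rules). The helper `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the halves left and right of the overall median."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations to the left of the median
    upper = xs[half + n % 2:]   # observations to the right (skips the median when n is odd)
    return median(lower), median(upper)

# Hypothetical observations, invented for illustration.
data = [1, 3, 4, 5, 6, 7, 9, 10, 30]

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr
five_number = (min(data), q1, median(data), q3, max(data))
outliers = [x for x in data if x < lower_cutoff or x > upper_cutoff]

print(five_number)                 # (1, 3.5, 6, 9.5, 30)
print(iqr)                         # 6.0
print(lower_cutoff, upper_cutoff)  # -5.5 18.5
print(outliers)                    # [30]
```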


standard deviation

measures the typical distance of the observations from the mean. (the square root of the variance)
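
A sketch of the calculation, assuming the sample standard deviation (dividing by n − 1, which is the usual convention in an introductory course). The data are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)                                # the mean, here 5
squared_devs = [(x - x_bar) ** 2 for x in data]   # squared distances from the mean
variance = sum(squared_devs) / (len(data) - 1)    # sample variance: divide by n - 1
s = sqrt(variance)                                # standard deviation

# The standard library's stdev uses the same n - 1 definition.
print(abs(s - stdev(data)) < 1e-12)  # True
```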


density curve

is always on or above the horizontal axis and has an area of exactly 1 underneath it. THE MEDIAN SPLITS THE AREA UNDER THE CURVE IN HALF.
THE MEAN IS THE POINT AT WHICH THE CURVE WOULD BALANCE. 

response variable

measures an outcome of a study


explanatory variable

attempts to explain the observed outcomes. (independent variable)


scatterplot

shows the relationship between two quantitative variables measured on the same individuals.
look for the overall pattern, its strength and form, and any outliers.

regression line

a straight line that describes how response variable y changes as an explanatory variable x changes.


coefficient of determination

r² (r squared): the fraction of the variation in y that is explained by the regression of y on x.


residuals

the difference between an observed value of the response variable and the value predicted by the regression line. (observed y − predicted y)
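
A minimal sketch tying together the regression line, r², and residuals, assuming the line is fit by least squares (the cards above do not name the fitting method) and using made-up x and y values.

```python
from statistics import mean

# Hypothetical paired observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares regression line: predicted y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Coefficient of determination: the square of the correlation r.
r_squared = sxy ** 2 / (sxx * syy)

print(round(slope, 3), round(intercept, 3))  # 0.6 2.2
print(round(r_squared, 3))                   # 0.6
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a quick check on the arithmetic.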


linear growth

increases by a fixed amount


exponential growth

increases by a fixed percentage
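
A quick sketch contrasting the two kinds of growth, assuming a made-up starting value of 100, a fixed amount of 10 per period, and a fixed percentage of 10% per period.

```python
start = 100  # hypothetical starting value

# Linear growth: add the same fixed amount (10) each period.
linear = [start + 10 * t for t in range(5)]

# Exponential growth: multiply by the same factor (1.10, a 10% increase) each period.
exponential = [start * 1.10 ** t for t in range(5)]

print(linear)  # [100, 110, 120, 130, 140]
print([round(v, 2) for v in exponential])  # approximately 100, 110, 121, 133.1, 146.41
```

Successive differences are constant for linear growth, while successive ratios are constant for exponential growth.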


extrapolation

the use of a regression line for prediction far outside the range of x values used to obtain the line. Such predictions are often not accurate.


lurking variable

a variable that is not among the x and y variables but may influence the interpretation of the relationship between x and y.


are usually too high when applied to individuals

correlations based on averaged data.


Simpson's paradox

the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.
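
A small illustration of the reversal, using entirely made-up success counts for two hypothetical treatments (A and B) in two hypothetical groups ("easy" and "hard" cases).

```python
# Invented (successes, trials) counts for illustration only.
data = {
    "easy": {"A": (9, 10), "B": (80, 100)},
    "hard": {"A": (30, 100), "B": (2, 10)},
}

# Success rate within each group.
within = {
    group: {t: s / n for t, (s, n) in treatments.items()}
    for group, treatments in data.items()
}

# Success rate after combining the groups into a single group.
combined = {}
for t in ("A", "B"):
    successes = sum(data[g][t][0] for g in data)
    trials = sum(data[g][t][1] for g in data)
    combined[t] = successes / trials

print(within)    # A beats B in both groups: 0.9 vs 0.8 and 0.3 vs 0.2
print(combined)  # B beats A overall: about 0.35 for A vs about 0.75 for B
```

The reversal happens here because A was mostly used on hard cases and B mostly on easy ones, so the group acts as a lurking variable.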


observational study

observes individuals and measures variables of interest but does not influence responses.


experiment

deliberately imposes some treatment on individuals in order to observe their responses.


population

the entire group of individuals we want information about.


sample

part of the population that is actually examined.


census

attempts to contact every individual in the entire population.


voluntary response sample

consists of people who choose themselves by responding to a general appeal. BIASED: it especially brings out people with strong opinions, most often negative ones.


convenience sampling

chooses the individuals who are easiest to reach. BIASED


biased

systematically favors certain outcomes.


SRS or simple random sample

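everyone in the population has an equal chance of being chosen. More precisely, every possible sample of the same size has an equal chance of being the sample selected.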
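
A minimal sketch of drawing an SRS, assuming a made-up population of 20 labeled individuals and a sample size of 5. The seed is arbitrary and only makes the draw repeatable.

```python
import random

# Hypothetical population of 20 labeled individuals.
population = [f"student_{i}" for i in range(1, 21)]

# Seeded generator so the example is repeatable (the seed value is arbitrary).
rng = random.Random(42)

# random.sample draws without replacement, so every set of 5
# individuals is equally likely to be the sample.
srs = rng.sample(population, 5)
print(srs)
```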


strata

groups of similar individuals
