62 Cards in this Set


Individuals and Variables

Individuals- objects described by a set of data

e.g. people, animals

Variables- characteristics of individuals that are measured

Observational Study

-observes individuals


-measures variables of interest


-DOES NOT attempt to influence responses

Population

-in a statistical study, the entire group of individuals we want info about

Sample

-part of population


-actually collect info


-Use info from sample to draw conclusions about entire population

Census

-sample survey that attempts to include the entire population in the sample

Experiments

-deliberately imposes some treatment on individuals


-to measure responses


-purpose --> study whether treatment causes change in response

Biased sample design

systematically favors certain outcomes

e.g. convenience sampling, voluntary response sampling

Convenience sampling

-Biased


-selection of individuals easiest to reach

Voluntary response sample

-Biased


-chooses itself by responding to general appeal


-write-in opinion polls


-call-in opinion polls

Simple Random Sample

-lets impersonal chance choose the sample
-n individuals chosen from pop
-every set of n individuals has an equal chance of being selected

Table of Random Digits

long string of digits 0-9




1. Each entry in table is equally likely to be any digit




2. Knowledge of one part of table gives no info about other part

How to choose SRS

1. Label- give each member of pop a numerical label of the same length

2. Software or Table
-read consecutive groups of digits of the appropriate length from a table of random digits
OR
-use software to select labels at random
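The two ways of choosing an SRS above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the choice of labels 0 to N-1, and the digit string in the example are all hypothetical (the string is made up, not taken from a published table). Groups of digits that are not valid labels, and labels already chosen, are skipped.

```python
import random

def srs_from_digit_table(digits: str, pop_size: int, n: int) -> list[int]:
    """Choose n labels from 0..pop_size-1 by reading a string of random
    digits in consecutive groups of fixed length (hypothetical helper)."""
    # Step 1 (Label): every label uses the same number of digits.
    width = len(str(pop_size - 1))
    chosen: list[int] = []
    # Step 2 (Table): read consecutive groups of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not a valid label, and skip repeats.
        if label < pop_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

def srs_with_software(pop_size: int, n: int, seed=None) -> list[int]:
    """Alternative to the table: let software pick n labels at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(pop_size), n))

print(srs_from_digit_table("19223950340575628713", pop_size=30, n=3))
print(srs_with_software(pop_size=30, n=5, seed=1))
```

For a population of 30 labeled 00 to 29, reading the example string in pairs gives 19, 22, 39 (skip), 50 (skip), 34 (skip), 05, so the sample is labels 19, 22 and 05.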

Sources of Error

-Undercoverage


-Nonresponse


-Response bias


-Wording of questions

Undercoverage

some groups in pop left out of process of choosing sample

Nonresponse

Individual chosen for sample


-can't be contacted


-refuses to participate

Response Bias

-systematic pattern of incorrect responses in a sample survey

Parameter

-number that describes pop


-fixed number


-in practice, actual value usually unknown

Statistic

-number describing sample


-value known b/c of sample


-changes for every sample


-used to estimate unknown parameter

Sampling Variability

Value of statistic varies in repeated random sampling

Bias

-consistent, repeated deviation of sample stat from pop parameter in the same direction
-across many samples

Variability

-describes how spread out values of sample stat are


-many samples


-Large Variability = result not repeatable




Good Sampling: Small bias and small variability

How to Reduce Bias

Random Sampling


-values of stat from SRS don't consistently overestimate or underestimate value of parameter

How to Reduce Variability

Use a larger SRS
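A quick simulation can illustrate the bias and variability cards above. This sketch assumes a hypothetical large population in which a proportion p = 0.6 answers "yes", and approximates each SRS by n independent draws (reasonable when the population is much larger than the sample). The sample proportions should center near 0.6 for both sample sizes (random sampling, little bias), while their spread shrinks as n grows.

```python
import random
import statistics

def sample_proportions(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Proportion of 'yes' in each of `reps` simulated samples of size n.

    Assumption: each draw is treated as independent with probability p
    of 'yes', approximating an SRS from a very large population."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        yes = sum(rng.random() < p for _ in range(n))
        props.append(yes / n)
    return props

for n in (100, 1600):
    props = sample_proportions(p=0.6, n=n, reps=500)
    # center: compare with p to check for bias; spread: variability
    print(f"n={n:5d}  center={statistics.mean(props):.3f}  "
          f"spread={statistics.stdev(props):.3f}")
```

Multiplying n by 16 should cut the spread by roughly a factor of 4, since the spread of a sample proportion scales like 1/square root of n.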

Margin of Error

-based on sampling variability --> how much confidence to place in results

-quick method (95% confidence): margin of error ≈ 1/square root of n

-does not cover nonsampling errors
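The 1/square root of n rule is easy to tabulate. This is a minimal sketch of the quick method for a sample proportion at 95% confidence; the function name is hypothetical, and the result says nothing about nonsampling errors.

```python
import math

def quick_margin_of_error(n: int) -> float:
    """Quick method: approximate 95% margin of error for a sample
    proportion from an SRS of size n. Random sampling error only."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1600):
    print(f"n={n:5d}  margin of error ~ {quick_margin_of_error(n):.1%}")
```

Each fourfold increase in sample size halves the margin of error, which is why shrinking it gets expensive quickly.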

Confidence Statements

How much to trust result of one sample




1. Margin of error


--> How close sample stat is to pop parameter




2. Level of confidence


--> What percent of all samples satisfy margin of error

Sampling Errors

-caused by act of taking sample


-cause sample results to be different from results of census

Random Sampling Errors

-Deviation between sample stat and pop parameter


-caused by chance in selecting random sample

Nonsampling Errors

-not related to act of selecting sample from pop


-can be present in census

Sampling Frame

list of individuals from which sample is drawn

Undercoverage

Sampling error




some groups in pop left out of process of choosing sample

Processing Errors

Nonsampling error




mistakes in mechanical tasks

e.g. arithmetic mistakes, errors in entering data into computer

Nonresponse

Nonsampling error




-failure to obtain data from individual selected for sample


-subjects can't be contacted


-refuse to cooperate

How to Deal With Nonsampling Errors

1. Substitute other households for nonresponders




2. Weight responses using stat methods to correct bias


--> increases variability

How to Choose Stratified Random Sample

1. Divide sampling frame into groups of individuals (strata)


2. Take separate SRS in each stratum


3. Combine to make complete sample
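The three steps above can be sketched as follows. This is a minimal illustration, assuming step 1 (dividing the sampling frame into strata) has already been done by the caller; the function name, the stratum names, and the sample sizes are hypothetical.

```python
import random

def stratified_random_sample(strata, sizes, seed=None):
    """strata: stratum name -> list of individuals (step 1, done by caller).
    sizes: stratum name -> how many individuals to draw from that stratum."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # Step 2: a separate SRS within each stratum
        sample.extend(rng.sample(members, sizes[name]))
    # Step 3: combine the separate SRSs into one complete sample
    return sample

# Hypothetical sampling frame already divided into two strata
frame = {
    "undergrad": [f"U{i:02d}" for i in range(40)],
    "grad": [f"G{i:02d}" for i in range(10)],
}
print(stratified_random_sample(frame, {"undergrad": 4, "grad": 1}, seed=7))
```

Drawing within each stratum guarantees that every group is represented in the chosen proportions, which a single SRS of 5 from all 50 would not.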

Advantages of Internet Surveys

-collect large amounts of survey data at lower cost


-large-scale data collection


-multimedia

3 Major Problems with Internet Surveys

-Voluntary response


-Undercoverage


-Nonresponse

Questions to Ask Before Believing Poll (8)

1. Who carried out survey?


2. What was pop?


3. How was sample collected?


4. How large was sample?


5. What was response rate?


6. How were subjects contacted?


7. When was survey conducted?


8. What exact questions were asked?

Data Tables

-dealing with large sets of data




*summarize info*




-clearly labeled


-heading and date


-variables and units


-source

Distribution of Variable

-What values it takes


-How often it takes values

Roundoff Errors

-When table entries are percentages or proportions


-Total may sum to value slightly different from 100% or 1
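A tiny example shows how roundoff error arises. The counts below are hypothetical: three equal categories each round to 33.3%, so the rounded entries add up to 99.9% instead of 100%.

```python
# Hypothetical counts: 10 yes, 10 no, 10 undecided
counts = {"yes": 10, "no": 10, "undecided": 10}
total = sum(counts.values())

# Each entry rounded to one decimal place, as a table would show it
percents = {k: round(100 * v / total, 1) for k, v in counts.items()}
print(percents)

# The rounded entries do not add back to exactly 100
print(round(sum(percents.values()), 1))
```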

Pie Chart

Whole divided into parts

Bar Graph

compare categories

Pictogram

Can be misleading!!

picture width often grows along with height, so the area exaggerates differences

Line Graph

Change over time




-time on horizontal scale


-variable being measured on vertical scale




Look for:


-overall pattern


-trend

Seasonal Variation

-pattern


-repeats at regular intervals of time

Seasonally Adjusted

-expected seasonal variation removed before data published

Histogram

Distribution of quantitative variable




1. Divide range of data into classes of equal width


2. Count number of individuals per class


3. Draw histogram
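The three histogram steps can be sketched in Python. This is a minimal illustration with hypothetical data and function names; it assumes one common convention, where each class includes its left endpoint and excludes its right endpoint, and draws the bars sideways as text.

```python
import math

def histogram_counts(data, start, width, num_classes):
    """Step 2: count observations per class. Class k covers
    [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = math.floor((x - start) / width)
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} is outside the chosen classes")
        counts[k] += 1
    return counts

def text_histogram(counts, start, width):
    """Step 3: a rough text histogram with one row of stars per class."""
    for k, c in enumerate(counts):
        lo = start + k * width
        print(f"{lo:g} to <{lo + width:g} | {'*' * c}")

data = [3, 7, 12, 15, 18, 21, 22, 27, 34]   # hypothetical observations
# Step 1: 4 classes of equal width 10 covering the range 0 to 40
counts = histogram_counts(data, start=0, width=10, num_classes=4)
text_histogram(counts, start=0, width=10)
```

Changing `width` and `num_classes` is an easy way to see the "skyscraper" and "pancake" problems described in the next card.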

Histograms - Number of Classes

No right choice


-Too few --> Skyscraper (tall bars)


-Too many --> Pancake (most classes have one or no observations)

Interpreting Histograms

-Overall patterns


-Deviations from pattern


-Outliers

Outlier

-individual observation in a graph of data that falls outside the overall pattern of the graph

Symmetric and Skewed Distributions


(Histograms)

Symmetric- left and right sides approximately mirror images




Skewed right- right side extends much farther than left




Skewed left- left side extends much farther than right

Stemplot

Graphical display of distributions


-small data sets


-quick to make and present detailed info




aka stem-and-leaf plot

How to Make Stemplot

1. For each observation:


Stem= all digits but last


Leaf= last digit




2. Write stems in vertical column w/ smallest at top


--> vertical line to right of column




3. Write leaves in row to right of its stem


--> increasing order from stem
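The stemplot steps above can be sketched for small sets of non-negative whole numbers. This is a minimal illustration with hypothetical data; keeping stems that have no leaves is a common practice assumed here, not something stated on the card.

```python
from collections import defaultdict

def stemplot(data):
    """Text stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    # Step 1: stem = all digits but the last, leaf = last digit
    for x in data:
        stem, leaf = divmod(x, 10)   # e.g. 47 -> stem 4, leaf 7
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    lines = []
    # Step 2: stems in a vertical column, smallest at top (empty stems kept)
    for stem in range(lo, hi + 1):
        # Step 3: leaves in increasing order, moving out from the stem
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>{width}} | {row}")
    return "\n".join(lines)

print(stemplot([12, 15, 18, 21, 22, 27, 34, 3, 7]))
```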

Stemplot vs. Histogram

Look for overall pattern and outliers

Stemplot:
-displays actual values
-faster to draw
-DOES NOT work well w/ large data sets

Histogram:
-must choose classes

Median

1. Arrange observations in order of size


2. If odd --> Middle number


3. If even --> Avg. of 2 middle numbers
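The median rule translates directly into code. A minimal sketch with a hypothetical function name; Python's standard `statistics.median` follows the same odd/even rule.

```python
def median(data):
    xs = sorted(data)                      # Step 1: arrange in order of size
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                     # Step 2: odd n -> middle number
    return (xs[mid - 1] + xs[mid]) / 2     # Step 3: even n -> avg. of 2 middle numbers

print(median([5, 1, 3]))       # odd number of observations
print(median([4, 1, 3, 2]))    # even number of observations
```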

Quartiles

1. Find Median


2. Q1 --> M of 1st half (not including M)


3. Q3 --> M of 2nd half (not including M)

Five-Number Summary

-Minimum


-Q1


-M


-Q3


-Maximum
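The five-number summary can be computed with the quartile rule from the Quartiles card, where the median is left out of both halves when n is odd. This is a minimal sketch with hypothetical names; statistical software uses several different quartile conventions, so its Q1 and Q3 may differ slightly from these.

```python
def _median_sorted(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]              # 1st half; for odd n the median is left out
    upper = xs[half + n % 2:]      # 2nd half; skips the median when n is odd
    return (xs[0],                 # Minimum
            _median_sorted(lower), # Q1
            _median_sorted(xs),    # M
            _median_sorted(upper), # Q3
            xs[-1])                # Maximum

print(five_number_summary([1, 2, 3, 4, 5, 6, 7]))
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```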

Boxplot

Graph of Five-Number Summary


-central box spans quartiles


-line in box marks median


-lines extend from box out to smallest and largest observations


-less detail


--> side by side comparison of multiple distributions

Mean

average




sum of observations/number of observations
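As a one-line sketch of the formula (hypothetical function name and data):

```python
def mean(data):
    """Sum of observations / number of observations."""
    if not data:
        raise ValueError("mean of empty data")
    return sum(data) / len(data)

print(mean([2, 4, 9]))
```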

Standard Deviation

(s) Measures roughly the avg. distance of observations from mean




1. Find distance of each observation from mean


--> square distances




2. Avg squared distances by dividing sum by n-1


--> variance




3. s is square root of this avg. squared distance
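The three steps for s can be followed literally in code. A minimal sketch with a hypothetical function name; Python's `statistics.stdev` computes the same n - 1 version.

```python
import math

def standard_deviation(data):
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    # Step 1: distance of each observation from the mean, squared
    squared = [(x - xbar) ** 2 for x in data]
    # Step 2: "average" the squared distances by dividing the sum by n - 1
    variance = sum(squared) / (n - 1)
    # Step 3: s is the square root of the variance
    return math.sqrt(variance)

print(standard_deviation([1, 3, 5]))
print(standard_deviation([7, 7, 7]))   # no spread, so s = 0
```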

Properties of Standard Deviation

-measures spread about mean


--> mean= center


-s=0 when no spread


--> all observations same value


-as observations become more spread out around mean


--> s gets larger

Choosing Numerical Descriptions

-mean and S.D. strongly affected by outliers or skewed distribution


-median and quartiles less affected




-Five-Number Summary better than mean and S.D. for skewed distribution or outliers


-mean and S.D. for reasonably symmetric distributions w/o outliers
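A small comparison illustrates why the choice matters. The two data sets below are hypothetical and differ only in their largest value; the sketch uses Python's standard `statistics` module, whose `stdev` divides by n - 1.

```python
import statistics

def describe(data):
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),      # sample SD, divides by n - 1
        "median": statistics.median(data),
    }

symmetric = [1, 2, 3, 4, 5]        # hypothetical data, no outliers
with_outlier = [1, 2, 3, 4, 50]    # same data, largest value now an outlier

for label, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    d = describe(data)
    print(f"{label:>12}: mean={d['mean']:.1f}  sd={d['sd']:.2f}  median={d['median']}")
```

One outlier moves the mean from 3 to 12 and s from about 1.6 to about 21.3, while the median stays at 3.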