62 Cards in this Set
- Front
- Back
Individuals and Variables |
Individuals- objects described by a set of data, e.g. people, animals Variables- what is measured on individuals |
|
Observational Study |
-observes individuals -measures variables of interest -DOES NOT attempt to influence responses |
|
Population |
-statistical study -Entire group of individuals we want info about |
|
Sample |
-part of population -actually collect info -Use info from sample to draw conclusions about entire population |
|
Census |
-sample survey -attempts to include entire population in sample |
|
Experiments |
-deliberately imposes some treatment on individuals -to measure responses -purpose --> study whether treatment causes change in response |
|
Biased sample design |
systematically favors certain outcomes, e.g. convenience sampling, voluntary response |
|
Convenience sampling |
-Biased -selection of individuals easiest to reach |
|
Voluntary response sample |
-Biased -chooses itself by responding to general appeal -write-in opinion polls -call-in opinion polls |
|
Simple Random Sample |
-allows impersonal chance -n individuals from pop -every set of n individuals has equal chance of being selected |
|
Table of Random Digits |
long string of digits 0-9 1. Each entry in table is equally likely to be any digit 2. Knowledge of one part of table gives no info about other part |
|
How to choose SRS |
1. Label- give each member of pop a number label of same length 2. Software or Table- read consecutive groups of digits of appropriate length from table of random digits OR use software to select labels at random |
|
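The two steps on the "How to choose SRS" card can be sketched in Python. This is only an illustration using the software route: the function name `choose_srs`, the `seed` parameter, and the zero-padded labels are assumptions made for the example, not part of the card set.

```python
import random


def choose_srs(population, n, seed=None):
    """Return a simple random sample of n individuals (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: give each member a number label of the same length,
    # e.g. "00" to "99" for a population of 100.
    width = len(str(len(population) - 1))
    labels = {str(i).zfill(width): member for i, member in enumerate(population)}
    # Step 2: let software select n labels at random, without replacement,
    # so every set of n individuals is equally likely.
    chosen = rng.sample(sorted(labels), n)
    return [labels[label] for label in chosen]


students = [f"student_{i}" for i in range(100)]
print(choose_srs(students, 5, seed=1))
```

Passing a seed only makes the draw repeatable for checking; leaving it out gives a fresh sample each run.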
Sources of Error |
-Undercoverage -Nonresponse -Response bias -Wording of questions |
|
Undercoverage |
some groups in pop left out of process of choosing sample |
|
Nonresponse |
Individual chosen for sample -can't be contacted -refuses to participate |
|
Response Bias |
-systematic pattern -incorrect responses -sample survey |
|
Parameter |
-number that describes pop -fixed number -don't know actual value |
|
Statistic |
-number describing sample -value known b/c of sample -changes for every sample -estimate unknown parameter |
|
Sampling Variability |
Value of statistic varies in repeated random sampling |
|
Bias |
-consistent -repeated -deviation of sample stat -many samples |
|
Variability |
-describes how spread out values of sample stat are -many samples -Large Variability = result not repeatable Good Sampling: Small bias and small variability |
|
How to Reduce Bias |
Random Sampling -values of stat from SRS don't consistently overestimate or underestimate value of parameter |
|
How to Reduce Variability |
of SRS... Use larger sample |
|
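The idea that a statistic varies from sample to sample, and that a larger sample reduces that variability, can be checked with a small simulation. This is a rough sketch under assumptions chosen for the example: a population proportion of 0.6, sample sizes of 25 and 400, and the hypothetical helper `sample_proportions`.

```python
import random
import statistics


def sample_proportions(p, n, repetitions, seed=None):
    """Simulate the sample proportion for many random samples of size n."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        # Each individual says "yes" with probability p (the parameter).
        yes_count = sum(rng.random() < p for _ in range(n))
        results.append(yes_count / n)  # the statistic for this sample
    return results


small = sample_proportions(0.6, 25, 500, seed=1)
large = sample_proportions(0.6, 400, 500, seed=1)
# The spread of the statistic across samples should be smaller for n = 400.
print(statistics.stdev(small), statistics.stdev(large))
```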
Margin of Error |
Sampling variability --> confidence of results Quick estimate for 95% confidence: 1/square root of n (n = sample size) Does not cover nonsampling errors |
|
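The 1/square root of n rule from the "Margin of Error" card is easy to compute. The sketch below assumes n is the sample size; the function name `quick_margin_of_error` is made up for the example.

```python
import math


def quick_margin_of_error(n):
    """Rough margin of error for a sample of size n, using 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


for n in (100, 400, 1600):
    # Quadrupling the sample size halves the margin of error.
    print(n, quick_margin_of_error(n))
```

The result is a proportion, so 0.1 corresponds to plus or minus 10 percentage points.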
Confidence Statements |
How much to trust result of one sample 1. Margin of error --> How close sample stat is to pop parameter 2. Level of confidence --> What percent of all samples satisfies margin of error |
|
Sampling Errors |
-caused by act of taking sample -cause sample results to be different from results of census |
|
Random Sampling Errors |
-Deviation between sample stat and pop parameter -caused by chance in selecting random sample |
|
Nonsampling Errors |
-not related to act of selecting sample from pop -can be present in census |
|
Sampling Frame |
list of individuals from which sample is drawn |
|
Undercoverage |
Sampling error some groups in pop left out of process of choosing sample |
|
Processing Errors |
Nonsampling error mistakes in mechanical tasks, e.g. arithmetic, entering data into computer |
|
Nonresponse |
Nonsampling error -failure to obtain data from individual selected for sample -subjects can't be contacted -refuse to cooperate |
|
How to Deal With Nonsampling Errors |
1. Substitute other households for nonresponders 2. Weight responses using stat methods to correct bias --> increases variability |
|
How to Choose Stratified Random Sample |
1. Divide sampling frame into groups of individuals (strata) 2. Take separate SRS in each stratum 3. Combine to make complete sample |
|
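The three steps for a stratified random sample could look like this in Python. This is a minimal sketch: the function `stratified_sample`, the `stratum_of` callback, and the per-stratum sample sizes are all assumptions introduced for illustration.

```python
import random


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Take a separate SRS in each stratum and combine the results."""
    rng = random.Random(seed)
    # Step 1: divide the sampling frame into strata.
    strata = {}
    for individual in frame:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Step 2: take a separate SRS in each stratum.
    # Step 3: combine the SRSs into one complete sample.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample


frame = [("undergrad", i) for i in range(20)] + [("grad", i) for i in range(5)]
print(stratified_sample(frame, lambda person: person[0],
                        {"undergrad": 4, "grad": 2}, seed=0))
```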
Advantages of Internet Surveys |
-collect large amounts of survey data at lower cost -large-scale data collection -multimedia |
|
3 Major Problems with Internet Surveys |
-Voluntary response -Undercoverage -Nonresponse |
|
Questions to Ask Before Believing Poll (8) |
1. Who carried out survey? 2. What was pop? 3. How was sample collected? 4. How large was sample? 5. What was response rate? 6. How subjects contacted? 7. When survey conducted? 8. What exact questions asked? |
|
Data Tables |
-dealing with large sets of data *summarize info* -clearly labeled -heading and date -variables and units -source |
|
Distribution of Variable |
-What values it takes -How often it takes values |
|
Roundoff Errors |
-When table entries are percentages or proportions -Total may sum to value slightly different from 100% or 1 |
|
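A tiny example of a roundoff error, assuming three equal categories and percentages rounded to whole numbers (the helper name `rounded_percentages` is hypothetical):

```python
def rounded_percentages(counts):
    """Convert counts to percentages rounded to whole numbers."""
    total = sum(counts)
    return [round(100 * count / total) for count in counts]


percents = rounded_percentages([1, 1, 1])
# Each category is 33.33...%, which rounds to 33, so the total is 99, not 100.
print(percents, sum(percents))
```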
Pie Chart |
Whole divided into parts |
|
Bar Graph |
compare categories |
|
Pictogram |
Misleading!! width varies |
|
Line Graph |
Change over time -time on horizontal scale -variable being measured on vertical scale Look for: -overall pattern -trend |
|
Seasonal Variation |
-pattern -repeats at regular intervals of time |
|
Seasonally Adjusted |
-expected seasonal variation -removed before data published |
|
Histogram |
Distribution of quantitative variable 1. Divide range of data into classes of equal width 2. Count number of individuals per class 3. Draw histogram |
|
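Steps 1 and 2 of making a histogram (equal-width classes, then counts per class) can be sketched as follows. The function `class_counts` and its parameters are assumptions for the example; each class here includes its lower edge and excludes its upper edge.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in n_classes equal-width classes beginning at start."""
    counts = [0] * n_classes
    for x in data:
        # Class k covers start + k*width up to (not including) start + (k+1)*width.
        index = int((x - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


data = [1, 2, 5, 7, 9, 10, 14]
counts = class_counts(data, start=0, width=5, n_classes=3)
# Step 3, drawn as text: one bar of '#' marks per class.
for k, count in enumerate(counts):
    print(f"{k * 5:>2} to <{(k + 1) * 5:<2} {'#' * count}")
```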
Histograms - Number of Classes |
No right choice -Too few --> Skyscraper (tall bars) -Too many --> Pancake (most classes have one or no observations) |
|
Interpreting Histograms |
-Overall patterns -Deviations from pattern -Outliers |
|
Outlier |
-graph of data -individual observation -falls outside overall pattern of graph |
|
Symmetric and Skewed Distributions (Histograms) |
Symmetric- left and right sides approximately mirror images Skewed right- right side extends much farther than left Skewed left- left side extends much farther than right |
|
Stemplot |
Graphical display of distributions -small data sets -quick to make and present detailed info aka stem-and-leaf plot |
|
How to Make Stemplot |
1. For each observation: Stem= all digits but last Leaf= last digit 2. Write stems in vertical column w/ smallest at top --> vertical line to right of column 3. Write leaves in row to right of its stem --> increasing order from stem |
|
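The stemplot recipe can be sketched in a few lines of Python. This version assumes nonnegative whole-number observations, and the function name `stemplot` is chosen just for the example.

```python
def stemplot(data):
    """Return the lines of a stemplot for nonnegative whole numbers."""
    rows = {}
    for value in sorted(data):
        # Step 1: stem = all digits but the last, leaf = last digit.
        stem, leaf = divmod(value, 10)
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # Step 2: stems in a column, smallest at top, with a vertical line.
    for stem in range(min(rows), max(rows) + 1):
        # Step 3: leaves to the right of their stem, in increasing order.
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


for line in stemplot([12, 15, 21, 9, 33, 30]):
    print(line)
```

Stems with no observations are still listed so that gaps in the distribution stay visible.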
Stemplot vs. Histogram |
Look for overall pattern and outliers Stemplot: -displays actual values -faster to draw -DOES NOT work well w/ large data sets Histogram: -choose classes |
|
Median |
1. Arrange observations in order of size 2. If odd --> Middle number 3. If even --> Avg. of 2 middle numbers |
|
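The median rule translates directly into code. A minimal sketch, with the function name `median` chosen for the example:

```python
def median(observations):
    """Median by the card's rule: sort, then take the middle value(s)."""
    ordered = sorted(observations)  # 1. arrange in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 2. odd count: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # 3. even: avg. of 2 middle numbers


print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```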
Quartiles |
1. Find Median 2. Q1 --> M of 1st half (not including M) 3. Q3 --> M of 2nd half (not including M) |
|
Five-Number Summary |
-Minimum -Q1 -M -Q3 -Maximum |
|
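The quartile rule (median of each half, leaving out the overall median when the count is odd) and the five-number summary can be sketched together. The function names are assumptions for the example, and the sketch needs at least two observations.

```python
def median(values):
    """Middle value of a list, or the average of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum); needs at least 2 observations."""
    ordered = sorted(observations)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7]))
```

Statistical software often uses slightly different quartile conventions, so results from other tools may not match this rule exactly.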
Boxplot |
Graph of Five-Number Summary -central box spans quartiles -line in box marks median -lines extend from box out to smallest and largest -less detail --> side by side comparison of multiple distributions |
|
Mean |
average sum of observations/number of observations |
|
Standard Deviation |
(s) Measures avg. distance of observations from mean 1. Find distance of each observation from mean --> square distances 2. Avg squared distances by dividing sum by n-1 --> variance 3. s is square root of this avg. squared distance |
|
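The three steps for the standard deviation s can be followed line by line in code. A minimal sketch, assuming at least two observations; the function name is chosen for the example.

```python
import math


def standard_deviation(observations):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(observations)
    mean = sum(observations) / n
    # 1. Distance of each observation from the mean, squared.
    squared_distances = [(x - mean) ** 2 for x in observations]
    # 2. Average the squared distances by dividing by n - 1: the variance.
    variance = sum(squared_distances) / (n - 1)
    # 3. s is the square root of the variance.
    return math.sqrt(variance)


print(standard_deviation([1, 2, 3, 4, 5]))
print(standard_deviation([5, 5, 5]))  # no spread, so s is 0
```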
Properties of Standard Deviation |
-measures spread about mean --> mean= center -s=0 when no spread --> all observations same value -as observations become more spread out around mean --> s gets larger |
|
Choosing Numerical Descriptions |
-mean and S.D. strongly affected by outliers or skewed distribution -median and quartiles less affected -Five-Number Summary better than mean and S.D. for skewed distribution or outliers -mean and S.D. for reasonably symmetric distributions w/o outliers |