118 Cards in this Set
- Front
- Back
What is measurement error? |
Assigning an incorrect value, or not actually measuring what you claim to be measuring |
|
What types of measurement error can occur? |
Random vs systematic (aka measurement bias) |
|
What are sources of measurement error? |
Human mistakes or poor operationalization |
|
What is sampling error? |
The difference between 'true' population parameters and point estimates of your indicators - any error resulting from your sample |
|
What is sampling bias? |
Any relevant characteristics that are overrepresented or underrepresented in the sample |
|
What are sources of sampling error? |
Non-probability sampling (selection bias/volunteer bias), a sampling frame that underrepresents part of the population, a low response rate |
|
What can we use to estimate sampling error? |
Inferential statistics (as long as the error is not systematic) |
|
What is an incorrect inference? |
Overgeneralization - generalizing beyond the population of study (stay as close as possible to the indicators); treating correlation as causation - correlation does not mean causation |
|
What is a meaningless calculation? |
Performing calculations that do not make sense |
|
What are the five main dangers of statistics? |
1) Measurement Error 2) Sampling Error 3) Incorrect Inferences 4) Meaningless Calculations 5) Politics Behind Study |
|
What is operationalization? |
Identifying unit of analysis, indicators, and possible values |
|
In the concept of sexism, what is the unit of analysis? |
The individual |
|
In the concept of sexism, what is the indicator? |
Agreement with sexist statement |
|
In the concept of sexism, what are possible values? |
Strongly agree to Strongly Disagree |
|
What can nominal variables do? |
Categorize |
|
What can ordinal variables do? |
Categorize & rank |
|
What can interval variables do? |
Categorize, rank & have equal distance between values |
|
What can ratio variables do? |
Categorize, rank, have equal distance between values & a natural zero point |
|
Why are higher levels of measurement preferred? |
They allow more opportunities for data analysis |
|
What is a dichotomous (or binary) variable? |
A variable that can only assume two values (e.g., yes/no) |
|
What are continuous variables? |
Can assume any value within a range, including non-integer values - more precision is always possible |
|
What is a discrete variable? |
Can only assume a limited number of (typically only integer) values |
|
What is coding? |
Assigning a numerical value to response values |
|
What are the two types of missing data? |
User-missing = data missing because the respondent gave no answer or no useful answer; System-missing = data missing because the question was not asked of the respondent |
|
Why is it important to recode missing data? |
So that they do not taint the analysis and interpretation of the findings |
|
What is the purpose of visualization? |
To present the data in tables and figures and provide initial observation of findings, mistakes and outliers |
|
What is a frequency distribution? |
A tabular display of response categories and the number of people or objects within each category |
|
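As an illustration of the frequency distribution card, here is a minimal sketch that tabulates response categories with Python's `collections.Counter`. The responses are made-up data, and `None` is an assumed placeholder for a missing answer that is dropped before counting.

```python
from collections import Counter

# Hypothetical survey responses; None stands in for a missing answer.
responses = ["agree", "agree", "disagree", "neutral", "agree", None, "disagree"]

# Keep only valid cases so missing data does not distort the table.
valid = [r for r in responses if r is not None]
frequencies = Counter(valid)

# Print category, count, and percentage of valid cases.
for category, count in frequencies.most_common():
    share = 100 * count / len(valid)
    print(f"{category:<10}{count:>3}{share:>7.1f}%")
```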
What are the most common visualizations for univariate statistics? |
Frequency distributions, bar charts and line graphs |
|
What is the difference between bar charts and histograms? |
Histograms are bar charts of continuous variables - there is no space between the bars |
|
What is a measure of central tendency? |
A statistic that answers the question: what is the "average" value? |
|
What is a measure of dispersion? |
How much variation is there between cases? How average are the average values? |
|
What measure of central tendency can be calculated for a nominal variable? |
Mode |
|
What measure of central tendency can be calculated for ordinal variables? |
Median & Mode |
|
What measure of central tendency can be calculated for interval/ratio variables? |
Mean, Median & Mode |
|
What does the mode tell us? |
The value with the highest frequency/most common value |
|
What does the median tell us? |
The value of the middle case |
|
What does the mean tell us? |
The arithmetic average |
|
What is a disadvantage of calculating mode? |
It gives little information when the spread is large |
|
What is a disadvantage of calculating median? |
It is misleading when extreme values are common |
|
What is a disadvantage of calculating mean? |
It is susceptible to outliers |
|
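The three measures of central tendency can be compared on a small example. This sketch uses Python's `statistics` module on made-up income figures, with one deliberately extreme value to show how the mean reacts to an outlier while the mode and median do not.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in thousands); the last value is an outlier.
incomes = [20, 22, 22, 25, 30, 400]

most_common = mode(incomes)    # value with the highest frequency
middle_case = median(incomes)  # value of the middle case
average = mean(incomes)        # arithmetic average

print(most_common, middle_case, average)  # 22 23.5 86.5
```

The single outlier pulls the mean far above every other case, which is the susceptibility the card on the mean refers to.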
What do univariate statistics examine? |
A numerical summary of the values of one variable for the cases under study |
|
What is a measure of dispersion? |
Gives information on the heterogeneity of the cases under investigation. In other words, it gives information about the variability in a set of cases on a specific variable |
|
What does it mean when the measure of dispersion is larger? |
The more the cases differ from each other |
|
What measure of dispersion can be calculated for nominal variables? |
Variation Ratio |
|
What measure of dispersion can be calculated for ordinal variables? |
Range, Quintile Range & Variation Ratio |
|
What measure of dispersion can be calculated for interval/ratio variables? |
Standard Deviation, Quintile Range & Variation Ratio |
|
What does the variation ratio measure? |
The proportion of valid cases that does not have the modal value |
|
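The variation ratio definition above translates directly into code. This is a small sketch on invented data; the function name `variation_ratio` and the use of `None` for missing cases are assumptions made for the example.

```python
from collections import Counter

def variation_ratio(values):
    """Proportion of valid cases that do not have the modal value."""
    valid = [v for v in values if v is not None]  # None marks missing data here
    modal_count = Counter(valid).most_common(1)[0][1]
    return 1 - modal_count / len(valid)

# Hypothetical nominal variable: 3 of the 5 valid cases share the mode "A".
party = ["A", "A", "A", "B", "C", None]
vr = variation_ratio(party)
print(vr)
```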
What does range measure? |
The difference between the highest and lowest value |
|
What does the quintile range measure? |
The range of the middle 60% |
|
When is quintile range more useful than range? |
When the total number of cases is very large, because the range then becomes less and less useful |
|
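To see why the quintile range can be more informative than the range, the sketch below computes both for made-up data with one extreme case. It assumes the quintile range is taken between the 20th and 80th percentiles, obtained from `statistics.quantiles` with `n=5`; other percentile conventions would give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical data: 99 ordinary cases plus one extreme case.
values = list(range(1, 100)) + [1000]

value_range = max(values) - min(values)

# quantiles(..., n=5) returns the four cut points between the five quintiles.
cuts = quantiles(values, n=5)
quintile_range = cuts[-1] - cuts[0]  # spread of the middle 60% of cases

print(value_range, quintile_range)
```

The single extreme case dominates the range, while the quintile range still describes where most cases lie.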
What does standard deviation measure? |
Roughly, the average distance of the cases to the mean (formally, the square root of the average squared distance) |
|
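Standard deviation is often glossed as the average distance to the mean. The sketch below, on invented scores, computes the population standard deviation step by step and compares it with the literal mean absolute distance, which is a related but different number.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)

# Standard deviation: square root of the average squared distance to the mean.
manual_sd = sqrt(mean([(x - m) ** 2 for x in scores]))

# Literal "average distance to the mean" (mean absolute deviation).
mean_abs_dev = mean([abs(x - m) for x in scores])

print(manual_sd, pstdev(scores), mean_abs_dev)  # 2.0 2.0 1.5
```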
What do higher values of measures of dispersion tell us? |
The more diverse the cases are |
|
What can probability theory be used for? |
To determine how likely it is that our sample represents the population |
|
What are inferential statistics? |
Calculations aimed at generalizing findings from our sample to the population |
|
Between what numbers is probability measured? |
0 = certainty it will not occur; 1 = certainty it will occur |
|
What is theoretical probability? |
Probability calculated on the basis of theory |
|
What is empirical probability? |
Probability calculated on the basis of empirical investigation |
|
What does the law of large numbers state? |
That empirical probabilities will become very similar to theoretical probabilities as the number of empirical observations increases |
|
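The law of large numbers can be illustrated with a simulated fair coin, whose theoretical probability of heads is 0.5. This is a sketch only: the seed and flip counts are arbitrary choices, and the exact proportions printed depend on them.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def empirical_heads(n_flips):
    """Empirical probability of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The empirical probability tends to settle near 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, empirical_heads(n))
```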
What is an independent event? |
The outcome of the first event DOES NOT influence the outcome of the second |
|
What is a dependent event? |
the outcome of the first event DOES influence the outcome of the second event |
|
What is a mutually exclusive event? |
Events cannot both occur at the same time |
|
What is a non-mutually exclusive event? |
Events can occur at the same time |
|
What is the basis of inferential statistics? |
Probability Theory |
|
What does a normal distribution look like? |
- The shape of a bell - It is symmetrical - Mode = Median = Mean - The further we move from the mean, the lower the frequency |
|
What three characteristics are crucial for describing the normality of a distribution? |
1) Unimodality 2) Mesokurtosis 3) Symmetry |
|
What is Central Limit Theorem? |
For random sampling with a large sample size, the sampling distribution of the sample mean is approximately a normal distribution |
|
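A rough simulation can make the Central Limit Theorem concrete. The sketch below draws repeated random samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely for illustration) and looks at the distribution of the sample means. The sample size, number of samples, and seed are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
SAMPLE_SIZE = 50
N_SAMPLES = 2000

# Each entry is the mean of one random sample from the skewed population.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)        # should be close to the population mean, 1
sd_of_means = stdev(sample_means)         # empirical standard error
expected_se = 1 / sqrt(SAMPLE_SIZE)       # population SD / sqrt(n)

print(mean_of_means, sd_of_means, expected_se)
```

Even though the population is far from normal, the sample means cluster symmetrically around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.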
In a normal distribution, what % of cases fall within 1, 2, and 3 standard deviations of the mean? |
Within 1 SD: 68%; within 2 SD: 95%; within 3 SD: 99.7% |
|
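The 68/95/99.7 figures are rounded values that can be checked numerically. For a normal distribution, the share of cases within k standard deviations of the mean equals erf(k / sqrt(2)); the helper name `share_within` is made up for this sketch.

```python
from math import erf, sqrt

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {100 * share:.1f}%")
```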
What is a confidence interval? |
Range of values within which we believe the parameter value to fall (e.g., 54.3% to 57.7%) |
|
What is a confidence level? |
Probability that our confidence interval indeed contains the parameter value (e.g., 95%) |
|
How do the confidence level and the confidence interval relate to each other? |
The higher the confidence level, the wider the confidence interval |
|
What is standard error? |
Standard deviation of a sampling distribution |
|
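The relationship between standard error, confidence level, and confidence interval can be sketched for a sample proportion. The example assumes a hypothetical sample of 1,000 respondents with 56% agreement and uses the normal approximation, where the standard error of a proportion is sqrt(p(1 - p) / n). The function name `proportion_ci` is invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, level):
    """Normal-approximation confidence interval for a sample proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)         # standard error of the proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)  # critical value for the chosen level
    return p_hat - z * se, p_hat + z * se

# Hypothetical sample: 56% agreement among 1,000 respondents.
low95, high95 = proportion_ci(0.56, 1000, 0.95)
low99, high99 = proportion_ci(0.56, 1000, 0.99)

print(f"95% CI: {low95:.3f} to {high95:.3f}")
print(f"99% CI: {low99:.3f} to {high99:.3f}")
```

Raising the confidence level from 95% to 99% widens the interval, in line with the card on how the two relate.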
What kind of statistics are confidence intervals and levels used for? |
Univariate statistics |
|
What is statistical significance? |
If we feel that we can safely exclude the possibility that our findings were produced by chance, we can call them statistically significant |
|
How do we express statistical significance? |
As the probability that we would reach the same conclusions under the conditions of randomness |
|
What is the Alternative Hypothesis? |
Expectation we want to investigate |
|
What is the Null Hypothesis? |
There is no effect, no difference, no significant finding |
|
What does the p value tell us about rejecting hypotheses? |
If p is low (conventionally below 0.05), we reject H0 and accept H1. If p is high (above 0.05), we do not feel comfortable rejecting H0, but failing to reject H0 does not mean accepting H0 |
|
What is a type 1 error? |
Rejecting a true null hypothesis (giving wrong info) |
|
What is a type 2 error? |
Failing to reject a false null hypothesis (not giving enough info) |
|
Which type of error do we prefer, type 1 or type 2? |
Type 2 |
|
What must H0 and H1 be in relation to each other? |
Mutually exclusive and collectively exhaustive |
|
If the hypothesized values of H0 DO NOT fall in the calculated confidence interval, we can therefore ______ |
Reject H0 |
|
If the hypothesized values DO fall within the confidence interval, we can therefore _____ |
Not Reject H0 |
|
What does bivariate analysis look at? |
The relationship between two variables |
|
What is covariation? |
As the values on one variable change, the values on the other variable change as well |
|
What five stages does bivariate analysis proceed through? |
1) formulate hypotheses 2) visualize data 3) calculate point estimate 4) conduct test of statistical significance 5) draw conclusions |
|
What is ANOVA? |
Analysis of Variance |
|
What is a measure of association? |
A statistic measuring the strength of the relationship between two variables |
|
What types of visualizations are used for bivariate analysis? |
Boxplots, scatterplots, and crosstabs |
|
Where does the dependent variable go on a cross tab? |
Rows |
|
Where does the independent variable go on a cross tab? |
Columns |
|
What measures of association can be calculated for nominal variables? |
Cramér's V and Lambda |
|
What measures of association can be calculated for ordinal variables? |
Gamma and Tau |
|
What statistical significance test can be calculated for nominal variables? |
Chi Square |
|
What do the values of Cramér's V tell us? |
0 = no relationship at all; 1 = perfect relationship; less than 0.1 = weak; 0.1 - 0.3 = moderate; greater than 0.3 = strong |
|
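Chi-square and Cramér's V can both be computed from a crosstab of counts. The sketch below uses the standard formulas (expected count = row total × column total / N, and V = sqrt(chi-square / (N × (k - 1))) with k the smaller of the number of rows and columns) on an invented 2×2 table. The function names are chosen for this example, and a full significance test would additionally compare chi-square with its critical value for the table's degrees of freedom.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if unrelated
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramér's V: chi-square rescaled to run from 0 to 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical crosstab: DV categories in rows, IV categories in columns.
crosstab = [[30, 10],
            [10, 30]]

print(chi_square(crosstab), cramers_v(crosstab))  # 20.0 0.5
```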
What is a Proportional Reduction in Error (PRE) statistic? |
Expresses the relationship between two (or more) variables by stating how much better we are at guessing the value on the DV if we know the values on the IVs |
|
What is a marginal prediction for Lambda? |
Our best guess of the value on the DV if we did not know anything else about the case: the modal value on the DV |
|
What is a relational prediction for Lambda? |
Our best guess of the value on the DV if we knew the value on the IV: the modal value on the DV for each separate value on the IV |
|
What do PRE statistics compare? |
How much error we make in each prediction, and express the reduction in error as a proportion of the error in the marginal prediction |
|
How are PRE statistics measured? |
On a scale of 0-1. The value shows the proportionate reduction in error. Multiplied by 100%, it tells us by what % our guesses improve once we know the IV, and what % of the variation between cases in their value on the DV can be explained by the IV |
|
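The PRE logic can be sketched with Goodman and Kruskal's lambda, which builds both predictions from modal values. The code assumes the crosstab layout described in the cards (DV in rows, IV in columns); the function name and both tables are invented for illustration.

```python
def pre_lambda(table):
    """Lambda for a crosstab with DV categories in rows and IV categories in columns."""
    n = sum(sum(row) for row in table)
    # Marginal prediction: always guess the modal DV category.
    marginal_errors = n - max(sum(row) for row in table)
    # Relational prediction: guess the modal DV category within each IV column.
    relational_errors = n - sum(max(col) for col in zip(*table))
    return (marginal_errors - relational_errors) / marginal_errors

# Knowing the IV halves the guessing errors (40 errors down to 20).
clear_pattern = [[30, 10],
                 [10, 30]]

# The modal DV category is the same in every column, so lambda is 0
# even though the column distributions differ.
same_mode = [[50, 20],
             [10, 20]]

print(pre_lambda(clear_pattern), pre_lambda(same_mode))  # 0.5 0.0
```

The second table shows the limitation of a statistic based only on modal values: an association that does not change the modal category goes undetected.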
What is the distinct advantage of lambda for measuring the relationship between two nominal variables? |
It is easy to interpret |
|
What is the distinct advantage of Cramers V for measuring the relationship between two nominal variables? |
It is sensitive to associations that do not show up as differences in modal values, which Lambda misses |
|
What does Goodman and Kruskal's Gamma predict? |
Whether high values on the IV tend to be associated with high or low values on the DV |
|
How does Gamma make its prediction? |
By looking at pairs of cases |
|
What are concordant pairs? |
Pairs that suggest a positive relationship |
|
What are discordant pairs? |
Pairs that suggest a negative relationship |
|
What is the problem with Gamma and how is it overcome? |
It ignores ties, so it tends to overestimate the strength of the relationship by excluding some information; this is overcome by using tau |
|
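Counting concordant and discordant pairs is easy to sketch in code. Gamma is (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs and tied pairs are left out. The cases below are invented, with ordinal values coded as integers (IV first, DV second).

```python
from itertools import combinations

def pair_counts(cases):
    """Count concordant, discordant, and tied pairs among (iv, dv) cases."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the two cases the same way
        elif product < 0:
            discordant += 1   # the variables order the two cases in opposite ways
        else:
            tied += 1         # tied on the IV, the DV, or both
    return concordant, discordant, tied

# Hypothetical cases: (IV value, DV value), each coded 1 = low ... 3 = high.
cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]

concordant, discordant, tied = pair_counts(cases)
gamma = (concordant - discordant) / (concordant + discordant)

print(concordant, discordant, tied, gamma)
```

Six of the fifteen pairs are tied and play no role in gamma, which is the information loss the card above describes.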
When do you use tau b? |
When the cross tab is square |
|
When do you use tau c? |
When the cross tab is not square |
|
What is the problem with bivariate analysis? |
It misrepresents reality when it omits an important third variable |
|
What three outcomes can you see from using a control variable? |
1) Replication 2) Explanation and Interpretation 3) Specification |
|
If the measure of association stays the same after introducing the CV, what do we see? |
Replication - the CV and IV are unrelated |
|
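If the measure of association becomes lower after introducing the CV, what do we see? |
Either explanation (the CV causes both the IV and the DV, so the original relationship was confounded/spurious) or interpretation (the IV influences the DV through the CV, so the CV is an intervening variable) |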
|
If the measure of association moves in different directions for different values on the CV, what do we see? |
Specification - the relationship between IV and DV is different for different values on the CV - statistical interaction, the CV is a suppressor |
|
What does Pearson's R tell us? |
How much better we are able to predict values on the DV if we know the values on the IV - used for interval/ratio variables |
|
What are the values of Pearson's R? |
R = -1: perfect negative relationship; R = 0: absence of a relationship; R = 1: perfect positive relationship; R squared = % of variance in the DV explained by the IV |
|
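Pearson's R can be computed by hand from deviations around the two means. The sketch below uses invented interval-level data and the standard formula, the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations; the function name `pearson_r` is chosen for this example.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: IV values and DV values for five cases.
iv = [1, 2, 3, 4, 5]
dv = [2, 4, 5, 4, 5]

r = pearson_r(iv, dv)
print(round(r, 3), round(r ** 2, 3))  # 0.775 0.6
```

Here R squared is 0.6, so in this made-up example 60% of the variance in the DV is accounted for by the IV.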
What is another term for Pearson's R? |
The correlation coefficient |
|
How does regression analysis model the relationship between the IV and the DV? |
As a straight line |
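The straight line in question has the form DV = intercept + slope × IV. As a sketch, assuming ordinary least squares (the usual way of fitting such a line, though the card does not name a method), the slope and intercept can be computed from invented data as follows; the function name `fit_line` is hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one IV and one DV."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx  # the fitted line passes through both means
    return slope, intercept

# Hypothetical data: IV values and DV values for five cases.
iv = [1, 2, 3, 4, 5]
dv = [2, 4, 5, 4, 5]

slope, intercept = fit_line(iv, dv)
predicted = intercept + slope * 6  # predicted DV value for a new case with IV = 6

print(slope, intercept, predicted)
```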