20 Cards in this Set
- Front
- Back
Why use descriptives?
|
- Quick way of summarizing a data set in a way that is comparable to other data sets
- Sometimes there are too many observations to read them all and understand them
- Provide the basis for inferential statistics (chi-sq., t-test, ANOVA, correlation) |
|
All descriptive statistics start with very simple number crunching processes, such as:
|
- Counting the number of observations
- Sorting data into rank order from high to low
- Adding up the sum of observations
- Subtracting some other number from each observation to get the difference
- Dividing by another number |
|
Categorical Distributions
|
- Is it all in one category or evenly spread across the three categories?
- Bar chart, pie chart |
|
Quantitative Variables in Distribution
|
- Which observations were most frequent?
- How does that change as one moves from low to high observations? |
|
Frequency Distributions are observed in...
|
- A histogram
|
|
Probability Distributions are expectations for...
|
- Hypothesis testing
- Idealized curve, mathematical abstraction |
|
Types of quantitative probability distributions:
|
1. Normal distribution
--> Bell-shaped curve
--> Concept: skew (ex. long tail on right is skewed right)
--> Ex) Many biological traits
2. Uniform distribution
--> Similar frequency across all observations
--> Many regular activities
--> Makes a rectangle
3. Gamma distribution
- Long tail in only one direction
- Measuring frequency of events over time
- Ex) Phone data |
|
Central Tendency:
Mean, Median, and Mode |
1. Mean
- quantitative data only
- average
- very sensitive to extreme outliers
- bar charts compare group means
2. Median
- ordinal and quantitative data
- position of 50th percentile/2nd quartile
- not very sensitive to outliers
3. Mode
- nominal, ordinal, and quantitative data
- most frequent observation
- not sensitive to outliers at all |
|
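A minimal sketch of the three measures using Python's standard `statistics` module. The data set is hypothetical, chosen so that one extreme outlier shows how differently the mean and median react.

```python
import statistics

# Hypothetical observations; 40 is a deliberate extreme outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)      # sum of observations / number of observations
median = statistics.median(scores)  # middle value after sorting (50th percentile)
mode = statistics.mode(scores)      # most frequent observation

# Same data without the outlier, to compare sensitivity.
trimmed = scores[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print("mean:", mean, "-> without outlier:", mean_trimmed)
print("median:", median, "-> without outlier:", median_trimmed)
print("mode:", mode)
```

Dropping the outlier moves the mean far more than the median, and the mode does not move at all, which matches the sensitivity notes on the card.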
Spread: (How much is the data spread out?)
Range, Interquartile range, variance, SD, error bars |
1. Range
--> = Max - Min
2. Interquartile Range
--> = 3rd quartile - 1st quartile
3. Variance
--> Take the difference of each observation from the mean and then square it, sum up all of these squared differences, and then divide by the sample size
4. Standard Deviation
--> Square root of the variance
- How dispersed every observation is from the mean
5. Error Bars
--> Lines on graphs showing the spread or uncertainty around a plotted value (ex. mean +/- SD) |
|
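The recipes above can be sketched step by step in Python. The data values are hypothetical, and the quartiles come from `statistics.quantiles` with its default method; quartile conventions differ between textbooks and software, so the IQR may not match a hand calculation exactly.

```python
import math
import statistics

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# 1. Range = max - min
data_range = max(data) - min(data)

# 2. Interquartile range = 3rd quartile - 1st quartile
#    (n=4 returns the three quartile cut points, library default method)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# 3. Variance: squared differences from the mean, summed, divided by N
mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]
variance = sum(squared_diffs) / len(data)

# 4. Standard deviation = square root of the variance
sd = math.sqrt(variance)

print(data_range, iqr, variance, sd)
```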
Population Variance vs. Sample Variance
|
Population Variance
--> Assumes whole pop. in data set and seeing all variance there is to see (SumSquaredDiffs/N)
Sample Variance
--> Assumes there's more variance out there in pop. that's unobserved in your sample, b/c the severe outliers are rare and unlikely to be caught in a small sample (SumSquaredDiffs/[N-1])
--> Gives bigger variance |
|
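As a quick check of the two formulas, the standard `statistics` module offers `pvariance` (divide by N) and `variance` (divide by N-1). The data below are hypothetical.

```python
import statistics

# Hypothetical observations; the sum of squared differences from the mean (5) is 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(sample)  # SumSquaredDiffs / N       = 32 / 8
samp_var = statistics.variance(sample)  # SumSquaredDiffs / (N - 1) = 32 / 7

print("population variance:", pop_var)
print("sample variance:", samp_var)
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as N grows.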
Why use statistical hypothesis tests?
|
- To avoid interpreting random events as things that happen for another reason
- There are ALWAYS differences b/t group means and associations between variables due to chance
- Need a logical way of separating the random associations from the consistent associations (which are the basis for discovering causation) |
|
4 Rules for Causation:
|
1. Cause is associated with effect
--> Association
2. Cause comes before effect
--> Precedence
3. Most simple and clear explanation that fits the available data; no other alternative cause makes more sense
--> Parsimony (no other questions lying around)
4. Plausible and validated mechanism of action
--> Believability |
|
Hypotheses about Association:
Which methods compare quantitative and categorical variables? |
1. Correlation --> Quant. vs. Quant.
2. T-Test, ANOVA --> Categ. vs. Quant.
3. Chi-Sq. --> Categ. vs. Categ. |
|
What is the difference between Parametric Statistics and Non-Parametric Statistics?
|
- Parametric statistics use parameters like mean and variance to model an assumed distribution.
- Non-Parametric statistics don't make these assumptions about the distribution and can be used for ordinal/ranked and nominal data |
|
Null Hypothesis
|
A mathematical model of what we think the random situation is like (when there isn't any causal agent affecting our outcome measure)
|
|
Alternative Hypothesis
|
What exactly is the difference from randomness that we expect?
Ex) Higher rates of this category, different means in this category, linear slope on a scatter plot |
|
Alpha (b/t 0 and 1) is intended to exactly model the probability of type 1 error. How?
|
- Large alpha = high type 1 error
- Small alpha = small type 1 error
- Inverse relationship w/ type 2 error but not easily calculated |
|
When calculating the test, what exactly are you doing?
|
- Summing up the difference from expectation and dividing by the uncertainty
|
|
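One concrete instance of this pattern is the chi-square statistic, which sums the squared difference between observed and expected counts, each divided by the expected count. The sketch below is only an illustration of that one test; the counts and the equal-split null hypothesis are assumptions, and other tests (t-test, ANOVA) scale the difference in their own ways.

```python
# Hypothetical observed counts in three categories.
observed = [30, 50, 40]

# Assumed null hypothesis: observations are evenly spread across categories.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("chi-square statistic:", chi_sq)
```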
What is the p-value?
|
- The probability of seeing at least this much deviation from expectation in the data, given the assumptions
|
|
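A small worked case, assuming a fair coin as the null hypothesis: the one-sided p-value for seeing 8 heads in 10 flips is the probability of 8 or more heads under that assumption. The numbers are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

n_flips = 10        # hypothetical number of coin flips
observed_heads = 8  # hypothetical observed result

# Under the null (fair coin), each of the 2**n sequences is equally likely.
# Count the sequences with at least as many heads as observed.
at_least_as_extreme = sum(
    math.comb(n_flips, k) for k in range(observed_heads, n_flips + 1)
)
p_value = at_least_as_extreme / 2 ** n_flips

print("one-sided p-value:", p_value)
```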
Criterion for rejecting the Null:
|
- Comparing p-value to alpha
- If p < alpha, reject the null (supports alt. hypothesis) |
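The decision rule can be written as a one-line comparison. The helper name `reject_null`, the default alpha of 0.05, and the example p-values are all assumptions for illustration.

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when p < alpha, i.e. the null hypothesis is rejected."""
    return p_value < alpha


# Hypothetical p-values checked against two choices of alpha.
print(reject_null(0.03))              # p < 0.05
print(reject_null(0.08))              # p >= 0.05
print(reject_null(0.08, alpha=0.10))  # same p, larger alpha
```

The same p-value can lead to different decisions under different alphas, so alpha should be chosen before looking at the data.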