20 Cards in this Set

  • Front
  • Back
Why use descriptives?
- Quick way of summarizing a data set in a way that is comparable to other data sets
- Sometimes there are too many observations to read and understand them all
- Provide the basis for inferential statistics (chi-sq., t-test, ANOVA, correlation)
All descriptive statistics start with very simple number crunching processes, such as:
- Counting number of observations
- Sorting data into rank from high to low
- Adding up sum of observations
- Finding the difference between each observation and some other number
- Dividing by another number
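
The processes above can be sketched in a few lines of Python. The observations are made up purely for illustration, and the mean is used here as the "other number" to subtract, which is only one possible choice.

```python
# Hypothetical observations, invented for illustration only.
observations = [4, 8, 15, 16, 23, 42]

count = len(observations)                      # counting the observations
ranked = sorted(observations, reverse=True)    # sorting into rank, high to low
total = sum(observations)                      # adding up the observations
mean = total / count                           # dividing by another number
deviations = [x - mean for x in observations]  # difference from some other number

print(count, ranked, total, mean)
print(deviations)
```

Most of the statistics on the later cards (mean, variance, standard deviation) are combinations of these few steps.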
Categorical Distributions
- Is it all in one category or evenly spread across the three categories?
- Bar chart, pie chart
Quantitative Variables in Distribution
- Which observations were most frequent?
- How does that change as one moves from low to high observations?
Frequency Distributions are observed in...
- A histogram
Probability Distributions are expectations for...
- Hypothesis testing
- Idealized curve, mathematical abstraction
Types of quantitative probability distributions:
1. Normal distribution
--> Bell-shaped curve
--> Concept: skew (ex. a long tail on the right is skewed right); a true normal curve is symmetric, so it has no skew
--> Ex) Many biological traits

2. Uniform distribution
--> Similar frequency across all observations
--> Many regular activities
--> Makes a rectangle

3. Gamma distribution
--> Long tail in only one direction
--> Measuring frequency of events over time
--> Ex) Phone data
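
As a rough illustration of the three shapes, the sketch below draws random samples with Python's standard `random` module and compares the mean with the median. The parameters (mean 0 and SD 1, range 0 to 1, gamma shape 2 and scale 1) and the seed are arbitrary assumptions, not values from the cards.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 10_000

# Arbitrary, assumed parameters for each distribution.
normal_sample = [rng.gauss(0, 1) for _ in range(n)]        # bell-shaped, symmetric
uniform_sample = [rng.uniform(0, 1) for _ in range(n)]     # flat "rectangle"
gamma_sample = [rng.gammavariate(2, 1) for _ in range(n)]  # long tail on the right

for name, sample in [("normal", normal_sample),
                     ("uniform", uniform_sample),
                     ("gamma", gamma_sample)]:
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    # In a right-skewed sample the long tail pulls the mean above the median.
    print(f"{name:8s} mean={mean:.3f} median={median:.3f}")
```

For the symmetric shapes the mean and median should land close together, while for the gamma sample the mean should sit noticeably above the median.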
Central Tendency:
Mean, Median, and Mode
1. Mean
- quantitative data only
- average
- very sensitive to extreme outliers
- bar charts compare group means

2. Median
- ordinal and quantitative data
- position of 50th percentile/2nd quartile
- not very sensitive to outliers

3. Mode
- nominal, ordinal, and quantitative data
- most frequent observation
- not sensitive to outliers at all
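
A small sketch of how the three measures react to an outlier, using Python's standard `statistics` module. The scores are hypothetical.

```python
import statistics

# Hypothetical scores, then the same scores with one extreme outlier added.
scores = [2, 3, 3, 5, 7, 10]
scores_with_outlier = scores + [100]

for label, data in [("original", scores), ("with outlier", scores_with_outlier)]:
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```

In this example the outlier moves the mean from 5 to about 18.6, moves the median only from 4 to 5, and leaves the mode at 3.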
Spread: (How much is the data spread out?)
Range, Interquartile range, variance, SD, error bars
1. Range
--> = Max - Min

2. Interquartile Range
--> = 3rd quartile - 1st quartile

3. Variance
--> Take the difference of each observation from the mean and then square it, sum up all of these squared differences, and then divide by the sample size

4. Standard Deviation
--> Square root of the variance
--> Roughly how far a typical observation lies from the mean

5. Error Bars
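--> Lines on a graph showing the spread or uncertainty around a plotted value (e.g., SD, standard error, or a confidence interval)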
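
The first four measures of spread can be sketched with the standard library as follows. The data are hypothetical, and the variance here divides by the sample size, as the Variance card describes. Quartile conventions differ between textbooks and software, so the interquartile range from `statistics.quantiles` (default method, Python 3.8+) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # Range = Max - Min

# Quartiles using the library's default method; other conventions exist.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                 # Interquartile range = Q3 - Q1

variance = statistics.pvariance(data)         # mean squared difference from the mean
sd = statistics.pstdev(data)                  # square root of the variance

print(data_range, iqr, variance, sd)
```

Error bars are left out of the sketch because they are a plotting convention built on top of these numbers rather than a separate calculation.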
Population Variance vs. Sample Variance
Population Variance
--> Assumes the data set contains the whole population, so all the variance there is to see has been observed (SumSquaredDiffs/N)

Sample Variance
--> Assumes there is more variance in the population than was observed in the sample, because severe outliers are rare and unlikely to be caught in a small sample (SumSquaredDiffs/[N-1])
--> Gives a slightly bigger variance than dividing by N
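
The two formulas differ only in the divisor, which a short sketch makes concrete. The data are hypothetical; `statistics.pvariance` and `statistics.variance` are the standard-library equivalents of dividing by N and by N-1.

```python
import statistics

# Hypothetical sample, invented for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n
sum_squared_diffs = sum((x - mean) ** 2 for x in sample)

population_variance = sum_squared_diffs / n        # SumSquaredDiffs / N
sample_variance = sum_squared_diffs / (n - 1)      # SumSquaredDiffs / (N - 1)

print(population_variance, sample_variance)

# The hand calculations agree with the standard library.
print(statistics.pvariance(sample), statistics.variance(sample))
```

Because the numerator is the same and N-1 is smaller than N, the sample variance is always the larger of the two, and the gap shrinks as N grows.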
Why use statistical hypothesis tests?
- To avoid interpreting random events as things that happen for some other reason
- There are ALWAYS some differences b/t group means and some associations between variables due to chance
- Need a logical way of separating the random associations from the consistent associations (which are the basis for discovering causation)
4 Rules for Causation:
1. Cause is associated with effect
--> Association

2. Cause comes before effect
--> Precedence

3. Most simple and clear explanation that fits the available data; no other alternative cause makes more sense
--> Parsimony (no other questions lying around)

4. Plausible and validated mechanism of action
--> Believability
Hypotheses about Association:
Which methods compare quantitative and categorical variables?
1. Correlation --> Quant. vs. Quant.
2. T-Test, ANOVA --> Categ. vs. Quant.
3. Chi-Sq. --> Categ. vs. Categ.
What is the difference between Parametric Statistics and Non-Parametric Statistics?
- Parametric statistics use parameters like mean and variance to model an assumed distribution.
- Non-Parametric statistics don't make these assumptions about the distribution and can be used for ordinal/ranked and nominal data
Null Hypothesis
A mathematical model of what we think the random situation is like (when there isn't any causal agent affecting our outcome measure)
Alternative Hypothesis
What exactly is the difference from randomness that we expect?
Ex) Higher rates of this category, different means in this category, linear slope on a scatter plot
Alpha (b/t 0 and 1) is intended to exactly model the probability of type 1 error (rejecting a null hypothesis that is actually true). How?
- Large alpha = high probability of type 1 error
- Small alpha = small probability of type 1 error
- Inverse relationship w/ type 2 error, but the type 2 error rate is not easily calculated
When calculating the test, what exactly are you doing?
- Summing up the difference from expectation and dividing by the uncertainty
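
One concrete instance of "difference from expectation divided by uncertainty" is the one-sample t statistic, sketched below. The sample and the expected mean of 5.0 are hypothetical, and other tests (such as chi-square) build their statistics differently in detail.

```python
import math
import statistics

# Hypothetical measurements and a hypothetical null-hypothesis mean.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
expected_mean = 5.0

# Difference from expectation: observed mean minus the mean expected under the null.
difference = statistics.mean(sample) - expected_mean

# Uncertainty: standard error of the mean (sample SD divided by sqrt of N).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

t_statistic = difference / standard_error
print(t_statistic)
```

A larger absolute value of the statistic means the data sit further from expectation relative to how noisy they are.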
What is the p-value?
- The probability of seeing at least this much deviation from expectation in the data, given that the null hypothesis and the test's assumptions hold
Criterion for rejecting the Null:
- Comparing P Value to Alpha
- If p < alpha, reject the null (supports alt. hypothesis)
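
The decision rule can be sketched with a permutation test, a non-parametric technique chosen here only because it needs nothing beyond the standard library; the cards themselves do not prescribe it. The two groups, the alpha of 0.05, the number of shuffles, and the helper name `permutation_p_value` are all assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Under the null hypothesis the group labels are arbitrary, so we shuffle
    them many times and count how often the shuffled difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 13.0, 12.6, 12.9]
group_b = [11.2, 11.9, 11.5, 12.0, 11.1]

alpha = 0.05  # assumed threshold, chosen before looking at the data
p_value = permutation_p_value(group_a, group_b)
reject_null = p_value < alpha

print(f"p = {p_value:.4f}, reject null: {reject_null}")
```

The p-value here is an estimate that depends on the random shuffles, so a different seed may give a slightly different number.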