22 Cards in this Set
- Front
- Back
What are the two types of data you can encounter? |
1) Categorical 2) Continuous |
|
What is referred to as categorical data? |
Data points that fall into distinct categories.
No inherent numerical meaning and, for nominal categories, no inherent order. |
|
What is referred to as continuous data? |
Measurements that can take any numerical value within a range. |
|
What are the measurements for categorical data? (4) |
1) Frequency count 2) Mode 3) Proportion 4) Cross-tabulation (contingency table) |
|
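The first three categorical measures above can be sketched with the Python standard library. This is only an illustration: the `colours` sample is made up, and the variable names are not part of the card set.

```python
from collections import Counter

# Hypothetical sample of one categorical variable (eye colour)
colours = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(colours)              # frequency count per category
mode = counts.most_common(1)[0][0]     # most frequent category
proportions = {c: n / len(colours) for c, n in counts.items()}  # share per category

print(counts)
print(mode)
print(proportions)
```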
What's cross-tabulation? |
Also known as a contingency table, it relates two categorical variables.
One variable forming the rows (e.g. male/female) and one forming the columns (e.g. employed/unemployed). |
|
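A cross-tabulation like the one described above could be built as follows. This is a minimal sketch on invented observations; the `people` list and the table layout are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical observations: one (sex, employment status) pair per person
people = [
    ("male", "employed"), ("female", "employed"), ("male", "unemployed"),
    ("female", "employed"), ("female", "unemployed"), ("male", "employed"),
]

table = Counter(people)                           # joint count for each pair
rows = sorted({sex for sex, _ in people})         # row categories
cols = sorted({status for _, status in people})   # column categories

# Print the contingency table: one row per sex, one column per status
print("".ljust(8) + "".join(c.ljust(12) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[(r, c)]).ljust(12) for c in cols))
```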
What are the measurements for continuous data? (4) |
1) Central Tendency 2) Spread 3) Shape (Skewness) 4) Correlation and Regression |
|
What are the measures of Central Tendency? (3) |
1) Mean 2) Median 3) Mode |
|
What are the measures of Spread? (5) |
1) Range 2) Interquartile Range (IQR) 3) Variance 4) Standard Deviation 5) Coefficient of Variation |
|
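The measures of central tendency and spread listed above can be computed with Python's `statistics` module. The `data` list is hypothetical, and the sketch uses the sample (n - 1) variance; quartile conventions differ between tools, so the IQR may not match other software exactly.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical continuous measurements

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread
value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1
variance = statistics.variance(data)          # sample variance, divides by n - 1
std_dev = statistics.stdev(data)              # sample standard deviation
coeff_var = std_dev / mean                    # coefficient of variation

print(mean, median, mode)
print(value_range, iqr, variance, std_dev, coeff_var)
```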
What are the measures of Shape? (2) |
1) Skewness (symmetry) 2) Kurtosis (flatness) |
|
Moments are statistical ___ used to describe ___ of a ___ distribution.
Why the term moment? |
Measurements, characteristics, probability
A moment of a mathematical function describes its shape and behavior. |
|
Enumerate the moments of statistics (4) |
1) Central Tendency: Mean (mu) 2) Spread: Variance (sigma squared) 3) Symmetry: Skewness (gamma) 4) Flatness: Kurtosis (kappa) |
|
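As a rough illustration of the four moments above, the sketch below computes them from central moments. It assumes the population (divide by n) definitions and the non-excess form of kurtosis; other conventions exist, and the helper names are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third standardized moment (0 for symmetric data)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Fourth standardized moment (non-excess form)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric sample
right_skewed = [1, 1, 1, 2, 10]   # hypothetical sample with a long right tail

print(sum(symmetric) / len(symmetric))   # 1st moment: mean
print(central_moment(symmetric, 2))      # 2nd central moment: variance
print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric))
```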
Null hypothesis (H0) claims that there is NO significant ___ between ___, or NO ___ between ___. The word "null" stands for the ___ of ___. |
Correlation, variables, effect, treatments
Absence, relationship |
|
The p-value is the ___ for data at least as extreme as that observed to ___ if the null hypothesis were ___. You choose a ___ level (alpha) in advance; if ___ < ___, then it's OK to ___ the null hypothesis. |
Probability, occur, true
Significance, p < alpha, reject |
|
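To make the p-value rule concrete, here is a small sketch using an exact one-sided binomial test. The scenario (9 heads in 10 flips, null hypothesis of a fair coin) and the function name are assumptions chosen for illustration.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

alpha = 0.05                        # significance level chosen in advance
p_value = binomial_p_value(9, 10)   # hypothetical result: 9 heads in 10 flips
reject_null = p_value < alpha       # reject H0 only if p < alpha

print(p_value, reject_null)
```

Here the p-value is 11/1024, which is below 0.05, so the null hypothesis of a fair coin would be rejected at that level.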
Correlation is the measure of ___ and ___ of a linear ___ between two ___.
Correlation doesn't imply ___; it simply indicates the ___ to which changes in one variable are ___ with changes in the other. |
Strength, direction, relationship, variables
Causation, degree, associated |
|
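The strength and direction described above are commonly summarized by the Pearson correlation coefficient. The sketch below computes it by hand on invented data; the function name and samples are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sx = sqrt(sum((x - mx) ** 2 for x in xs))                # variation in x
    sy = sqrt(sum((y - my) ** 2 for y in ys))                # variation in y
    return cov / (sx * sy)

x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))   # perfectly increasing linear relationship
print(pearson_r(x, [8, 6, 4, 2]))   # perfectly decreasing linear relationship
```

The sign gives the direction and the magnitude (between 0 and 1) gives the strength; neither says anything about causation.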
Regression is used to ___ the relationship between a ___ variable and one or more ___ variables.
Regression helps ___ the value of a ___ variable based on the values of ___ variables. |
Model, dependent, independent
Predict, dependent, independent |
|
What are the measures of correlation? (2) |
1. Correlation coefficients 2. Scatter plots |
|
What are the measures of regression? (5) |
1. Regression analysis 2. Coefficient of determination 3. Model evaluation 4. Residual analysis 5. Significance testing |
|
Make an example of Regression Analysis |
Regression analysis can help you build a mathematical model that, for instance, relates years of experience (independent variable) to salary (dependent variable) and make predictions for new individuals. E.g. Salary = 1000 + 50 * Years / 4 |
|
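A model of this kind could be fitted with ordinary least squares, which is one common choice and an assumption here, since the card does not name a fitting method. The experience and salary figures below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: years of experience and salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [20, 40, 50, 40, 50]

intercept, slope = fit_line(years, salary)
predicted = intercept + slope * 6   # prediction for a new individual with 6 years

print(intercept, slope, predicted)
```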
Make an example of Coefficient of Determination |
After performing a regression analysis, you calculate an R-squared value of, say, 0.85. This means that 85% of the variability in the dependent variable can be explained by the independent variable, indicating a strong linear relationship between the two. |
|
Make an example of Residual Analysis |
Residual analysis involves plotting the differences between the predicted prices and actual prices to check for patterns or trends. If you notice a pattern in the residuals, it may indicate that your model has systematic errors that need to be addressed. |
|
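The coefficient of determination and the residuals from the two cards above can be computed together. This sketch assumes a simple least-squares line and uses made-up data; in practice the residuals would be plotted, which is omitted here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [20, 40, 50, 40, 50]

intercept, slope = fit_line(xs, ys)
predictions = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]   # actual minus predicted

mean_y = sum(ys) / len(ys)
ss_res = sum(r ** 2 for r in residuals)        # unexplained variation
ss_tot = sum((y - mean_y) ** 2 for y in ys)    # total variation
r_squared = 1 - ss_res / ss_tot

print(residuals)
print(r_squared)
```

An R-squared of 0.6 here means the line explains 60% of the variation; a visible pattern in `residuals` when plotted against `xs` would suggest a systematic error in the model.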
Make an example of Significance Testing |
In a drug trial, for example, you perform a significance test, such as a t-test or ANOVA, to determine whether there is a statistically significant difference between the treatment and control groups. This helps you determine whether the drug has a real effect or whether the results could have occurred by chance. |
|
Make an example of Model Evaluation |
You have trained a model to classify emails as spam or not. To evaluate its performance, you use a dataset of 1,000 emails with known labels. You measure its accuracy, precision, recall, and F1-score to assess how well it classifies emails correctly. |
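The four metrics named above can be derived from counts of true and false positives and negatives. This is a minimal sketch on ten invented labels rather than 1,000 emails, and it assumes at least one actual and one predicted spam email so that no division by zero occurs.

```python
# Hypothetical labels: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # spam correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # legitimate mail flagged as spam
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # spam that was missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # legitimate mail left alone

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```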