47 Cards in this Set
- Front
- Back
Statistics |
analysis and interpretation of data to objectively evaluate the reliability of conclusions based on the data |
|
Goals of Statistics |
Estimate values of important parameters and test hypotheses about those parameters (can assess differences between groups and the relationship between variables) |
|
Descriptive statistics |
Procedures to summarize and organize data: statistics of location (measures of central tendency, i.e. some average value: mean, median, mode); statistics of dispersion, i.e. the spread or variability of the data (variance, standard deviation); and the shape of the frequency distribution |
|
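As a rough illustration of the statistics of location and dispersion named on this card, here is a minimal sketch using Python's standard `statistics` module. The `lengths` data are made up for the example and do not come from the card set.

```python
import statistics

# Hypothetical sample: lengths (cm) of nine individuals.
lengths = [2, 4, 4, 5, 7, 7, 7, 9, 9]

# Statistics of location (central tendency).
mean = statistics.mean(lengths)
median = statistics.median(lengths)
mode = statistics.mode(lengths)

# Statistics of dispersion (sample versions, which divide by n - 1).
variance = statistics.variance(lengths)
std_dev = statistics.stdev(lengths)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, standard deviation={std_dev:.3f}")
```

The sample (n - 1) versions are used here on the assumption that the data are a sample rather than the whole population; `statistics.pvariance` and `statistics.pstdev` are the population counterparts.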
Inferential statistics |
Enables generalized conclusions from data (using sample data to make statements/conclusions/inferences about the whole population). Evaluates two kinds of questions: 1. How reliable are the results? (via setting confidence limits on sample statistics) 2. How probable is it that the results are due to chance alone? (differences between observed and expected) Involves hypothesis testing via statistical tests (t-tests, ANOVA, goodness of fit) to decide whether or not to reject the underlying null hypothesis |
|
Hypothesis testing |
comparing two or more statistics' values to determine if they are the same or different |
|
H0 null hypothesis |
no difference between the observed and expected |
|
HA alternative hypothesis |
a difference between the observed and expected |
|
variable |
characteristic that differs among individuals/sampling units of a population (e.g. length, weight, # of bristles, genotype, N-content, speed) |
|
population |
total set of individuals about which one wishes to draw conclusions (make inferences) |
|
sample |
A collection of observations/measurements (data); a subset of the population (the population is usually too big to sample in full) |
|
parameter |
Also called a parametric statistic or parametric value for an attribute: the true value for the whole population (usually unknown) |
|
sample statistic |
Statistic or estimate: the value for the attribute calculated from your sample. Population parameters are constants, whereas sample estimates will change from one random sample to the next, even though the samples were taken from the same population. |
|
variability |
A feature of the real world; sampling will lead to uncertainty (a sample is not perfectly representative) |
|
sampling error |
Chance difference between a sample estimate and the true parameter for the whole population. Larger samples = smaller sampling error. Statistics involves the measurement and examination of uncertainty; it becomes necessary when observations are variable. |
|
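The claim that larger samples give smaller sampling error can be checked with a small simulation. This is only a sketch under stated assumptions: the population is a made-up set of normally distributed values, and the helper `spread_of_sample_means` is a hypothetical name introduced for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values; its mean is the "true" parameter.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean over repeated random samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(100)
print(f"spread of sample means: n=10 -> {small:.2f}, n=100 -> {large:.2f}")
```

Each sample mean differs from `true_mean` by chance alone; the spread of those differences shrinks as the sample size grows.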
How closely does sample reflect true values? |
Poor data (unrepresentative, pseudoreplicated) can lead to erroneous conclusions. Inadequate data (too small a sample size) lead to inconclusive results. |
|
Statistical considerations can aid in design of experiments to prevent: |
The wrong manner of sampling; too small a sample size; unnecessary effort. How you design your experiment will determine the power of different statistical tests. |
|
Experimental design |
Constructing hypotheses; how to set up the experiment; the type of data to collect; how to collect the data and how much to collect. Goals: to reduce bias and sampling error |
|
Bias |
a systematic discrepancy between estimates and the true population value |
|
random sampling |
Minimizes bias and permits the sampling error to be estimated. A reliable statistic gives estimates that are close to the true value |
|
Stats are based on probabilistic distributions, which assume |
random sampling and independence of samples |
|
Different types of sampling |
Random sampling = best but takes more time (each member of the population has an equal and independent chance of being selected; use a random number generator). Systematic sampling = can be effectively random and is much faster, but you must make sure no trends are in phase with the sampling scheme |
|
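The two sampling schemes on this card might be sketched as follows. The numbered `population` and the sample size `n` are illustrative assumptions, and the systematic scheme shown (random start, then every k-th individual) is one common variant rather than the only one.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical: 100 numbered individuals
n = 10                            # desired sample size

# Random sampling: every individual has an equal, independent chance.
random_sample = random.sample(population, n)

# Systematic sampling: pick a random start, then take every k-th individual.
k = len(population) // n
start = random.randrange(k)
systematic_sample = population[start::k]

print("random:    ", sorted(random_sample))
print("systematic:", systematic_sample)
```

If the population had a trend that repeats every `k` individuals, the systematic sample would hit the same phase of that trend every time, which is the risk the card warns about.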
Inappropriate Sampling |
Sampling of convenience: not likely to be unbiased and independent. Judgment sampling: always biased. Volunteer bias: volunteers are likely to be different from the average member of the population |
|
Numerical variable |
Quantitative, measurement data; continuous vs. discontinuous (measured vs. counted) |
|
Categorical variable Nominal |
Names/describes categories or attributes. Qualitative: cannot be measured and is not numerical, but you can calculate frequencies for each category (either absolute or relative frequencies). Must use different statistical tests for this type of data |
|
Categorical variable Ranked |
An ordinal variable gives information about a data point's rank or order but doesn't tell you the exact numerical values of the data points; less information than the measurement type but still useful. Mostly categorical (life stages, snake bite severity, size classes, socio-economic categories); can be numerical rankings, but you still only know the ranking, not the actual magnitude of a measurement (e.g. numerical score answers on surveys) |
|
Independent vs dependent variables |
Try to predict or explain a response (dependent) variable from an explanatory (independent) variable |
|
Experimental vs. observational study |
Experimental: the researcher assigns treatments randomly to individuals. Observational: the researcher does not assign treatments |
|
Accuracy vs. Precision |
Accuracy: how close to the true value. Precision: repeatability (the number of digits indicates the measurement precision level); round to one decimal place more than the original measurements. Rounding: round up if the next decimal place is greater than or equal to five |
|
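The rounding rule on this card (round up when the next digit is five or more) differs from Python's built-in `round`, which rounds exact ties to the nearest even digit. A minimal sketch of the card's rule using the standard `decimal` module follows; `round_half_up` is a hypothetical helper name, and the sketch assumes non-negative measurements passed as strings.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round a decimal string so that a next digit of 5 or more rounds up."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

# Hypothetical mean computed from measurements recorded to one decimal place,
# reported with one extra decimal place as the card suggests.
reported = round_half_up("12.345", 2)

print(reported)
print(round_half_up("2.25", 1), "vs built-in:", round(2.25, 1))
```

Passing the value as a string avoids binary floating-point surprises, since a literal such as `12.345` is not stored exactly as a float.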
Derived variables |
Calculated from two or more measured variables (ratios, indices, rates, etc.). They lose information and may not be normally distributed |
|
Frequency distributions |
Distribution of the total number of observations for a variable; shows how often each value of a variable occurs in a sample. Absolute frequency: count of how many individuals fit or occur in the category. Relative frequency: proportion of the total data set represented by each category (fraction of occurrences of each value of a variable) |
|
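Absolute and relative frequencies can be tallied in a few lines. The sketch below uses `collections.Counter` on a made-up list of genotypes; the data are purely illustrative.

```python
from collections import Counter

# Hypothetical sample of ten genotypes (a nominal categorical variable).
genotypes = ["AA", "Aa", "Aa", "aa", "Aa", "AA", "Aa", "aa", "Aa", "AA"]

# Absolute frequency: how many individuals fall in each category.
absolute = Counter(genotypes)

# Relative frequency: each count as a proportion of the whole data set.
total = sum(absolute.values())
relative = {category: count / total for category, count in absolute.items()}

for category, count in absolute.most_common():
    print(f"{category}: {count} ({relative[category]:.0%})")
```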
Ways to organize data to show distribution |
List of data, frequency table, or graph/diagram of the frequency distribution. Graphs (one variable): curves, bar graphs, and histograms |
|
Bar graph |
For categorical and discontinuous numerical data; bars do not touch |
|
Histograms |
Continuous numerical data, grouped into bins; bars do touch. The y-axis is frequency density, so you can't just compare bar heights unless all bars are the same width; you must calculate the area of a bar to know its absolute frequency |
|
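To see why bar area rather than bar height carries the absolute frequency, consider a sketch with unequal bin widths. The bins and counts below are invented for the example.

```python
import math

# Hypothetical bins as (lower edge, upper edge, absolute frequency).
bins = [(0, 10, 5), (10, 20, 12), (20, 50, 9)]

for lower, upper, count in bins:
    width = upper - lower
    density = count / width   # bar height on the y-axis
    area = density * width    # bar area recovers the absolute frequency
    assert math.isclose(area, count)
    print(f"[{lower}, {upper}): height={density:.2f}, area={area:.0f}")

densities = [count / (upper - lower) for lower, upper, count in bins]
```

The widest bin holds nine observations yet has the shortest bar, so comparing heights alone would understate it.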
Cumulative frequency |
Sum of all frequencies so far (as move on to higher x values) |
|
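A running total is all that cumulative frequency requires. The sketch below uses `itertools.accumulate` on made-up frequencies for five x values.

```python
from itertools import accumulate

# Hypothetical frequency table: x values and how often each occurs.
values = [1, 2, 3, 4, 5]
frequencies = [2, 5, 8, 4, 1]

# Cumulative frequency: sum of all frequencies up to and including each x value.
cumulative = list(accumulate(frequencies))

for x, freq, cum in zip(values, frequencies, cumulative):
    print(f"x={x}: frequency={freq}, cumulative={cum}")
```

The last cumulative value always equals the total number of observations.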
Dot plots |
Useful if there are few data points, for discontinuous data or continuous data without a wide range of values |
|
Frequency polygon |
Not common. Points are placed where the top of each bar would be, then connected with straight lines; don't try to read data between the points |
|
Stem-and-leaf display |
Stem: leading digits. Leaves: last significant digit (for each data point with that leading digit). Must keep the spacing between digits the same. Not a real graph but a frequency distribution with an appearance like a histogram. It is a ranking, too, so it is easy to find the median; it is easy to make and check and does not lose specific information |
|
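A stem-and-leaf display is simple enough to build by hand or in code. The following is a minimal sketch assuming non-negative integers, with the tens (and higher) digits as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper and the data are made up.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (leading digits) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)

data = [23, 12, 41, 28, 15, 30, 23, 34, 21]  # hypothetical measurements
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Because the leaves are ranked, the median can be read off by counting in from either end, and every original value can be reconstructed from its stem and leaf.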
Grouped histograms |
Comparing numerical variables between groups |
|
Grouping of classes |
Needed when there are too many or too few data points per x-value. Rules of thumb: 10-20 classes is usually good, unless the data set is small or very large; the square root of n, for n total observations; Sturges' rule. Good judgment is needed, rather than strict rules. Continuous data: always grouped; the size of the interval and where the class intervals start can affect the appearance of the graph. If there are too few classes without grouping: for nominal data, it can't be helped; for continuous data, need to measure more precisely |
|
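Two of the rules of thumb on this card can be written as short functions. This sketch assumes the commonly stated form of Sturges' rule, k = 1 + log2(n), rounded up to a whole number of classes; the function names are hypothetical.

```python
import math

def classes_sqrt(n):
    """Square-root rule: about sqrt(n) classes for n observations."""
    return math.ceil(math.sqrt(n))

def classes_sturges(n):
    """Sturges' rule, assumed here as 1 + log2(n), rounded up."""
    return math.ceil(math.log2(n)) + 1

for n in (30, 100, 1000):
    print(f"n={n}: sqrt rule -> {classes_sqrt(n)}, Sturges -> {classes_sturges(n)}")
```

The two rules diverge as n grows, which is one reason the card recommends judgment over strict rules.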
Important characteristics of a distribution |
Location: where the center is. Dispersion: how scattered the data are. Skewness: symmetrical or not |
|
Probability distributions |
Distribution of a variable in the whole population. Theoretical probability distributions are often used to approximate the distribution of a variable in the population from which a sample has been drawn (e.g. the normal distribution) |
|
Mosaic plot |
Bar area represents the relative frequency within each group; the width of each vertical stack is proportional to the number of observations in that group |
|
Two categorical variables |
Contingency table, grouped bar graph, mosaic plot |
|
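A contingency table is just a count of every combination of two categorical variables. The sketch below tallies made-up (group, outcome) pairs with `collections.Counter`; the variable names and data are illustrative only.

```python
from collections import Counter

# Hypothetical observations: (treatment group, outcome) for eight individuals.
observations = [
    ("control", "died"), ("control", "died"), ("control", "survived"),
    ("treated", "survived"), ("treated", "survived"), ("treated", "died"),
    ("treated", "survived"), ("control", "died"),
]

table = Counter(observations)  # one cell per (group, outcome) combination
groups = sorted({group for group, _ in observations})
outcomes = sorted({outcome for _, outcome in observations})

# Print the table with one row per group and one column per outcome.
print("group".ljust(10) + "".join(outcome.ljust(10) for outcome in outcomes))
for group in groups:
    cells = "".join(str(table[(group, outcome)]).ljust(10) for outcome in outcomes)
    print(group.ljust(10) + cells)
```

The same cell counts are what a grouped bar graph or mosaic plot would display graphically.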
Two numerical |
Scatter plot, line graph, maps |
|
Numerical and categorical |
Box plots, dot plots, multiple histograms, cumulative frequency distributions |
|
Guidelines for effective graphs |
1. make graph elements clear (label axes, include units for measurements, simple font, large text, distinguishable graph symbols, colors easy to see) 2. Show actual data points 3. Make patterns easy to see (avoid chart junk and avoid putting too much info in one graph)(draw figures clearly and minimize clutter) 4. Represent magnitudes honestly (baselines of bar graphs and histograms should be zero) |
|
How to make good tables |
1. Make patterns easy to see (no more digits than necessary, not too much data, arrange to aid pattern detection, list categorical data in natural order if applicable or from highest freq to lowest freq) 2. Represent magnitudes honestly 3. Label clearly |