47 Cards in this Set

  • Front
  • Back

Statistics

analysis and interpretation of data to objectively evaluate the reliability of conclusions based on the data

Goals of Statistics

Estimate values of important parameters and test hypotheses about those parameters




(can assess differences between groups and the relationship between variables)

Descriptive statistics

procedures to summarize/organize data


statistics of location (measure of central tendency, some average value [mean, median, mode])


statistics of dispersion, spread, variability of data (variance, standard deviation)


shape of frequency distribution
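
As a rough illustration of the statistics of location and dispersion listed above, the sketch below uses Python's standard `statistics` module. The `lengths` data are made up for the example.

```python
import statistics

# Hypothetical sample: lengths (mm) of nine specimens
lengths = [12, 15, 15, 18, 20, 21, 15, 30, 25]

# Statistics of location (central tendency)
mean = statistics.mean(lengths)
median = statistics.median(lengths)
mode = statistics.mode(lengths)

# Statistics of dispersion (sample versions, n - 1 in the denominator)
variance = statistics.variance(lengths)
std_dev = statistics.stdev(lengths)

print(mean, median, mode, variance, std_dev)
```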

Inferential statistics

enables generalized conclusions from data (using sample data to make statements/conclusions/inferences about whole population)


evaluates two kinds of questions:


1. How reliable are the results (via setting confidence limits to sample stats)


2. How probable is it that the results are due to chance alone? (differences between observed and expected)


Involves hypothesis testing via statistical tests (t-tests, ANOVA, Goodness of Fit) to decide whether or not to reject the underlying null hypothesis
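
A minimal sketch of one such test, a chi-square goodness of fit, comparing observed with expected counts. The coin-flip counts are hypothetical; 3.841 is the tabled chi-square critical value for 1 degree of freedom at alpha = 0.05.

```python
# Hypothetical data: 100 coin flips; H0 says the coin is fair
observed = {"heads": 62, "tails": 38}
expected = {"heads": 50, "tails": 50}

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

# Tabled critical value for df = 1 (two categories minus one), alpha = 0.05
CRITICAL_VALUE = 3.841
reject_h0 = chi_square > CRITICAL_VALUE

print(chi_square, reject_h0)
```

Here the statistic exceeds the critical value, so under these assumed counts the null hypothesis of a fair coin would be rejected.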

Hypothesis testing

comparing two or more statistics' values to determine if they are the same or different

H0 null hypothesis

no difference between the observed and expected

HA alternative hypothesis

a difference between the observed and expected

variable

characteristic that differs among individuals/sampling units of a population (e.g. length, weight, # of bristles, genotype, N-content, speed)

population

total set of individuals about which one wishes to draw conclusions (make inferences)

sample

a collection of observations/measurements (data)


Subset of population (pop. usually too big to sample all)

parameter

parametric statistic


parametric value for an attribute


true value for the whole population (usually unknown)

sample statistic

statistical estimate = the value for the attribute calculated from your sample




Population parameters are constants, whereas sample estimates will change from one random sample to the next, even though the samples were taken from the same population.

variability

inherent in real-world observations


sampling will lead to uncertainty (not perfectly representative)

sampling error

chance difference between a sample estimate and the true parameter for the whole population


larger samples = smaller sampling error


Statistics involve the measurement and examination of uncertainty.


statistics become necessary when observations are variable.
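
The link between sample size and sampling error can be sketched with a small simulation. Everything here is an assumption for illustration: the simulated population, the seed, the sample sizes, and the helper name `spread_of_sample_means`.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 measurements
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)  # the parameter: a constant

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)   # small samples
spread_large = spread_of_sample_means(500)  # large samples

print(true_mean, spread_small, spread_large)
```

The sample means scatter around the same fixed parameter, but the scatter should be much narrower for the larger samples.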

How closely does sample reflect true values?

Poor data (unrepresentative, pseudoreplicated) can lead to erroneous conclusions


Inadequate data (too small sample size) leads to inconclusive results.

Statistical considerations can aid in design of experiments to prevent:

wrong manner of sampling


too small sample size


unnecessary effort


how you design your experiment will determine the power of different statistical tests

Experimental design

constructing hypotheses


how to set up experiment


type of data to collect


how to collect data and how much data to collect


goals- to reduce bias and sampling error

Bias

a systematic discrepancy between estimates and the true population value

random sampling

minimizes bias and permits sampling error estimate


a reliable statistic gives estimates that are close to the true value

Stats are based on probabilistic distributions, which assume

random sampling and independence of samples

Different types of sampling

Random sampling= best but takes more time (each member of pop has equal and independent chance of being selected)


(random number generator)


Systematic sampling= can be effectively random, much faster, but must make sure no trends are in phase with sampling scheme
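
A short sketch contrasting the two schemes, and showing what goes wrong when a trend is in phase with a systematic scheme. The numbered population, the seed, and the cyclic pattern are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical individuals numbered 1..100

# Random sampling: every individual has an equal, independent chance
random_sample = rng.sample(population, 10)

# Systematic sampling: random starting point, then every k-th individual
k = len(population) // 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# Danger: a pattern that repeats every k individuals is in phase with the scheme
cyclic_values = [i % k for i in range(len(population))]
systematic_values = cyclic_values[start::k]  # every sampled value is identical

print(random_sample, systematic_sample, systematic_values)
```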

Inappropriate Sampling

Convenience sampling- not likely to be unbiased and independent


Judgment sampling- always biased


Volunteer bias- volunteers are likely to be different from the average member of the population

Numerical variable

Quantitative, measurement data


continuous vs discontinuous


measured vs. counted

Categorical variable


Nominal

Name/describe categories or attributes


qualitative- cannot be measured, not numerical


but you calculate frequencies for each category (either absolute or relative frequencies)


Must use different statistical tests for this type of data

Categorical variable


Ranked

Ordinal


variable gives information about a data point's rank or order but doesn't tell you exact numerical values of the data points


less information than measurement type but still useful


Mostly categorical (life stages, snake bite severity, size classes, socio-economic categories)


can be numerical rankings but still only know ranking not actual magnitudes of a measurement (numerical score answers on surveys)

Independent vs dependent variables

try to predict or explain a response variable from an explanatory variable

Experimental vs. observational study

Experimental- researcher assigns treatments randomly to individuals


Observational- researcher does not assign treatments

Accuracy vs. Precision

Accuracy- how close to true value


Precision- repeatability (number of digits indicates measurement precision level).


(round to one decimal place more than the original measurements)


Rounding- round up if next decimal place is greater than or equal to five
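
A sketch of the round-half-up rule stated above. Note that Python's built-in `round` does not follow this rule for exact ties (it rounds halves to the nearest even digit), so the example uses the `decimal` module. The helper name `round_half_up` is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string so that a next digit of 5 or more rounds up."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("2.25", 1))  # half-up rule
print(round(2.5))                # built-in round: ties go to the even digit
```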



Derived variables

calculated from two or more measured variables


ratios, indices, rates, etc.


lose information


may not be normally distributed

Frequency distributions

distribution of the total number of observations for a variable


shows how often each value of a variable occurs in a sample


absolute frequency- count of how many individuals fit or occur in the category


relative frequency- proportion of total data set represented by each (fraction of occurrences of each value of a variable)
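
Absolute and relative frequencies can be tallied as in this sketch; the genotype data are hypothetical.

```python
from collections import Counter

# Hypothetical categorical sample: genotypes of ten individuals
genotypes = ["AA", "Aa", "Aa", "aa", "Aa", "AA", "Aa", "aa", "Aa", "AA"]

# Absolute frequency: count of individuals in each category
absolute = Counter(genotypes)

# Relative frequency: proportion of the whole data set in each category
n = len(genotypes)
relative = {genotype: count / n for genotype, count in absolute.items()}

print(absolute, relative)
```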

Ways to organize data to show distribution

list data, frequency table, graph or diagram of frequency distribution


graph (one variable)- curves, bar graphs, and histograms

Bar graph

categorical and discontinuous numerical data


bars do not touch

Histograms

Continuous numerical data, grouped into bins


bars do touch


y-axis is frequency density


can't just compare bar heights unless all bars are same width


must calculate area of bar to know absolute frequency
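
To see why area, not height, carries the frequency when bins differ in width, consider this sketch with made-up bins.

```python
# Hypothetical bins of unequal width: (lower edge, upper edge, absolute frequency)
bins = [(0, 10, 20), (10, 20, 30), (20, 50, 30)]

# Bar height = frequency density = frequency / bin width
densities = [count / (upper - lower) for lower, upper, count in bins]

# Bar area = height * width, which recovers the absolute frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, bins)]

print(densities, areas)
```

The last two bins hold the same number of observations, yet the wider one is drawn only a third as tall.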

Cumulative frequency

Sum of all frequencies so far (as you move to higher x values)
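
A running total is all that is needed; the frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for successive x values (low to high)
frequencies = [2, 5, 8, 4, 1]

# Cumulative frequency: running sum of the frequencies so far
cumulative = list(accumulate(frequencies))

print(cumulative)  # [2, 7, 15, 19, 20]
```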

Dot plots

if few data points for discontinuous data or continuous data without a wide range of values

Frequency polygon

not common


points are placed where the top of each bar would be, then connected with straight lines; don't try to read data between points

Stem-and-leaf display

stem- leading digits


leaves- last significant digit (for each data point with that leading digit)


must keep spacing the same between digits


not real graph


freq. distribution with appearance like histogram


also ranks the data, so the median is easy to find


easy to make and check


does not lose specific information
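
A minimal sketch of building such a display, assuming non-negative whole numbers with the tens digit(s) as the stem and the units digit as the leaf. The function name and data are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the display as a list of text lines, one per stem."""
    stems = defaultdict(list)
    for v in sorted(values):           # sorting ranks the leaves within a stem
        stems[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems too
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

data = [47, 23, 31, 65, 25, 38, 42, 31, 49, 47]  # hypothetical measurements
display = stem_and_leaf(data)
print("\n".join(display))
```

Each leaf takes the same width, so the rows read like the bars of a histogram while every original value stays recoverable.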

Grouped histograms

Comparing numerical variables between groups

Grouping of classes

Needed when there are too many or too few data points per x-value


rules of thumb:


10-20 classes usually good, unless small or very large data set


sq. root of n , for total n observations


Sturges' rule


good judgment needed, rather than strict rules




Continuous data- always grouped


size of interval and where the class intervals start can affect appearance of graph


if there are too few classes even without grouping


for nominal data- can't be helped


for continuous data- need to measure more precisely
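
The two rules of thumb named above can be written out as below. Sturges' rule is taken here in its usual form, k = 1 + log2(n), rounded up; the function names are made up for this sketch, and either result is only a starting point for judgment.

```python
import math

def classes_sqrt(n: int) -> int:
    """Square-root rule: about sqrt(n) classes for n observations."""
    return math.ceil(math.sqrt(n))

def classes_sturges(n: int) -> int:
    """Sturges' rule: 1 + log2(n) classes, rounded up."""
    return math.ceil(1 + math.log2(n))

n = 100  # hypothetical number of observations
print(classes_sqrt(n), classes_sturges(n))
```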

Important characteristics of a distribution

location- where center is


dispersion- how scattered data are


skewness- symmetrical or not

Probability distributions

distribution of a variable in the whole population


theoretical probability distributions are often used to approximate the distribution of a variable in the population from which a sample has been drawn


e.g. normal distribution

Mosaic plot

bar area represents the relative frequency within each group


width of each vertical stack is proportional to the number of observations in that group

Two categorical variables

Contingency table


grouped bar graph


mosaic plot
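
A contingency table is just a cross-tabulation of counts. The sketch below builds one from hypothetical (treatment, outcome) pairs.

```python
from collections import Counter

# Hypothetical observations: (treatment, outcome) for each individual
observations = [
    ("control", "survived"), ("control", "died"), ("control", "died"),
    ("treated", "survived"), ("treated", "survived"), ("treated", "died"),
]

# Count each combination of the two categorical variables
cells = Counter(observations)
rows = sorted({treatment for treatment, _ in observations})
cols = sorted({outcome for _, outcome in observations})

# Rows are treatments, columns are outcomes
table = [[cells[(r, c)] for c in cols] for r in rows]

print(cols)
for r, counts in zip(rows, table):
    print(r, counts)
```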

Two numerical

Scatter plot


Line graph


Maps

Numerical and categorical

box plots


dot plots


multiple histograms


cumulative frequency distributions

Guidelines for effective graphs

1. make graph elements clear (label axes, include units for measurements, simple font, large text, distinguishable graph symbols, colors easy to see)


2. Show actual data points


3. Make patterns easy to see (avoid chart junk and avoid putting too much info in one graph)(draw figures clearly and minimize clutter)


4. Represent magnitudes honestly (baselines of bar graphs and histograms should be zero)

How to make good tables

1. Make patterns easy to see (no more digits than necessary, not too much data, arrange to aid pattern detection, list categorical data in natural order if applicable or from highest freq to lowest freq)


2. Represent magnitudes honestly


3. Label clearly