Data Analysis And Statistics

by katwills, Feb. 2017

Subjects: R, Python, Stats, etc.

25 Cards in this Set

Front
Back

	Name and describe the 2 broad categories of statistics.	Descriptive: summarizes the main features of a data set with measures such as the mean and standard deviation. Inferential: generalizes from observed data to the world at large.
	Define population.	The set of entities about which we make inferences.
	Define population distribution.	The frequency histogram of all possible values of an experimental variable. We are typically interested in inferring the mean (µ) and s.d. (σ) of a population, which characterize its location and spread, respectively.
	Give a rule of thumb for interpreting standard deviation.	For an approximately normal population: about 38% of measurements fall within 0.5 σ of the mean; about 68% fall within 1 σ; about 95% fall within 2 σ; about 99.7% fall within 3 σ. The interquartile range is a more appropriate measure of spread for populations that are not approximately normal.
	Define sample.	A set of data drawn from the population, characterized by the number of data points n, usually denoted X and indexed by a numerical subscript (e.g., X1); larger samples better approximate the population. The sample mean and s.d. are denoted by Xbar and s, respectively.
	Define sampling distribution.	Sample statistics (such as the sample mean) have their own distribution, called the sampling distribution, which is constructed by considering all possible samples of a given size. Sampling distribution parameters are marked with a subscript of the associated sample variable (e.g., µXbar and σXbar are the mean and s.d. of the sample means of all samples).
	Describe the Central Limit Theorem (CLT).	The CLT tells us that the distribution of sample means will become increasingly close to a normal distribution as the sample size increases, regardless of the shape of the population distribution, as long as the frequency of extreme values drops off quickly.
	How are population and sample distribution parameters related according to the CLT? What do these equations tell us?	µXbar = µ and σXbar = σ/√n. These equations tell us that as n increases, the spread of the sample means (σXbar) will decrease (our samples will have more similar means), but σ will not change (sampling has no effect on the population). The 1/√n term tells us that in order to double the precision of our estimate we must collect 4x as much data.
	Define standard error of the mean (s.e.m. or SE-Xbar). Give its equation.	As with the population, the sampling distribution is not directly measurable because we do not have access to all possible samples; the s.e.m. is the measured spread of sample means and is used to estimate σXbar. s.e.m. = s/√n
	Name the 3 common types of error bar.	1. Standard deviation (s.d.) 2. Standard error of the mean (s.e.m.) 3. Confidence interval (CI)
	What does an s.d. error bar tell us?	S.d. error bars inform us about the spread of the population and are therefore useful as predictors of the range of new samples. They reflect the variation in the data and not the error in your measurement.
	What does an s.e.m. error bar tell us?	S.e.m. error bars reflect uncertainty in the mean and its dependency on sample size (s.e.m = s.d./√n); thus, they shrink as we perform more measurements. The idea that “if s.e.m. bars do not overlap the difference is statistically significant” is wrong.
	What does a confidence interval error bar tell us?	A CI error bar is an interval estimate that indicates the reliability of a measurement; when scaled to a specific confidence level, the bar captures the population mean CI% of the time. The CI does not capture the mean of a second sample drawn from the same population with a CI% chance, as CI position and size vary with each sample.
	Define test statistic.	The test statistic is a transformation of the mean to a value determined by the difference of the sample and population means divided by the s.e.m.: D = (xbar - µ)/(sx/√n).
	What does the Student's t statistic tell us? How does it work?	The extent to which the test statistic departs from Normal. This departure is due to the fact that the sample variance (sx^2) tends to underestimate the variance of the null distribution, and this underestimate is worse for small n, where it is more likely that we observe a variance smaller than that of the population. The t distribution accounts for this underestimation by having higher tails than Normal; as n grows, t more closely resembles Normal. If we used the Normal distribution rather than the t distribution, we would overestimate the significance of our finding.
	How do statistical tests work, in general?	Observations are assumed to be from the null distribution (H0) with mean µ0. We reject H0 for values larger than x* with an error rate α. The alternative hypothesis (HA) is the competing scenario with a different mean µA. Values sampled from HA smaller than x* do not trigger rejection of H0 and occur at a rate of β.
	Define p value.	The p value is the probability of sampling another observation from the null distribution that is as far or farther away from µ as our original observation.
	Define Type I error (α).	The probability of falsely rejecting H0 (i.e., false positive).
	Define Type II error (β).	The probability of failing to appropriately reject H0 if the data are drawn from HA (i.e., false negative).
	Define power (sensitivity).	In the context of a typical statistical test, power is defined as the chance of appropriately rejecting H0 if the data are drawn from HA. It is calculated from the area of HA in the H0 rejection region (i.e., greater than x*). Power = 1 - β. Traditionally set to 0.8.
	Define specificity.	The probability of a true negative. Equal to 1 - α.
	Define effect size (d).	The difference between the means of the null and alternative hypotheses, expressed in units of their s.d., σ. d = (µA - µ0)/σ
	Define positive predictive value (PPV).	The fraction of positive results that are correct. PPV = true positive / (true positive + false positive).
	Define noncentrality parameter.	A combination of effect size and sample size, d * √n.
	How can power be increased without altering α?	1. Increase the sample size: this narrows the distribution of sample means, whose spread (σXbar, estimated by the s.e.m.) scales as 1/√n. 2. Increase the size of the effect (d) we want to reliably detect; a larger effect size may be induced with a more extreme experimental treatment.
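
Several of the formulas on these cards can be checked numerically. The following is a minimal Python sketch, not part of the original card set. It uses only the standard library, and the function names (coverage_within, sem, normal_ci, power_one_sided) are hypothetical helpers introduced for illustration. It assumes a one-sided z-test with known σ for the power calculation and Normal quantiles for the confidence interval, which is only a reasonable approximation for large n (for small n the t distribution would be used instead).

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # mean 0, s.d. 1


def coverage_within(k):
    """Fraction of a Normal population lying within k s.d. of the mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def sem(sample):
    """Standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / math.sqrt(len(sample))


def normal_ci(sample, level=0.95):
    """Approximate CI for the mean using Normal quantiles (large-n assumption)."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    half_width = z * sem(sample)
    centre = mean(sample)
    return centre - half_width, centre + half_width


def power_one_sided(d, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test with known sigma.

    d is the effect size (muA - mu0) / sigma; d * sqrt(n) is the
    noncentrality parameter.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # x* in standardized units
    ncp = d * math.sqrt(n)
    return 1 - STD_NORMAL.cdf(z_crit - ncp)


# CLT check: means of samples from a skewed Exponential(1) population
# (mu = 1, sigma = 1) should have mean ~ mu and s.d. ~ sigma / sqrt(n).
random.seed(0)
n = 25
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(4000)
]
observed_spread = stdev(sample_means)
predicted_spread = 1.0 / math.sqrt(n)

if __name__ == "__main__":
    for k in (0.5, 1, 2, 3):
        print(f"within {k} s.d.: {coverage_within(k):.3f}")
    print(f"spread of sample means: {observed_spread:.3f} "
          f"(CLT predicts {predicted_spread:.3f})")
    one_sample = [random.expovariate(1.0) for _ in range(n)]
    print("approximate 95% CI:", normal_ci(one_sample))
    for size in (8, 32, 128):
        print(f"n={size}: power for d=0.5 is {power_one_sided(0.5, size):.3f}")
```

Running the script prints the rule-of-thumb coverages, compares the simulated spread of sample means with σ/√n, and shows power rising with n at fixed α and d, which is the behaviour the last card describes.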
