
34 Cards in this Set


Name and describe the 2 broad categories of statistics.

Descriptive – summarizes the main features of a data set with measures such as the mean and standard deviation


Inferential – generalizes from observed data to the world at large

Define population.

The set of entities about which we make inferences.

Define population distribution.

The frequency histogram of all possible values of an experimental variable.


We are typically interested in inferring the mean (µ) and s.d. (σ) of a population, which characterize its location and spread, respectively.

Give a rule of thumb for interpreting standard deviation.

- ~38% of measurements fall within 0.5 σ


- ~68% of measurements fall within 1 σ


- ~95% of all measurements fall within 2 σ


- ~99.7% of all measurements fall within 3 σ


The interquartile range is more appropriate for populations that are not approximately normal.
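These percentages follow directly from the standard normal CDF; a quick check using only Python's standard library (the general rule, not any particular dataset):

```python
from statistics import NormalDist

def fraction_within(k: float) -> float:
    """Fraction of a normal population within k s.d. of the mean."""
    z = NormalDist()                 # standard normal, mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

# k = 0.5 -> ~0.383, 1 -> ~0.683, 2 -> ~0.954, 3 -> ~0.997
for k in (0.5, 1, 2, 3):
    print(k, round(fraction_within(k), 3))
```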

Define sample.

A set of data drawn from the population, characterized by the number of data points n, usually denoted X and indexed by numerical subscript (X1); larger samples better approximate the population.


The sample mean and s.d. are denoted by Xbar and s, respectively.

Define sampling distribution.

Sample parameters have their own distribution, called the sampling distribution, which is constructed by considering all possible samples of a given size.


Sampling distribution parameters are marked with a subscript of the associated sample variable (e.g., µXbar and σXbar are the mean and s.d. of the sample means of all samples).

Describe the Central Limit Theorem (CLT).

The CLT tells us that the distribution of sample means will become increasingly close to a normal distribution as the sample size increases, regardless of the shape of the population distribution, as long as the frequency of extreme values drops off quickly.

How are population and sample distribution parameters related according to the CLT? What do these equations tell us?

µXbar = µ


σXbar = σ/√n


These equations tell us that as n increases, the spread of the sample means (σXbar) will decrease (our samples will have more similar means), but σ will not change (sampling has no effect on the population).


The 1/√n term tells us that in order to double the precision of our estimate we must collect 4× as much data.
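These relations can be checked by simulation; a minimal stdlib-only sketch drawing many samples from a uniform population (the population choice and numbers are illustrative):

```python
import random
import statistics

random.seed(0)

# Draw many samples of size n from Uniform(0, 1), whose population s.d.
# is sqrt(1/12); the CLT predicts s.d. of the sample means = sigma/sqrt(n).
n, trials = 25, 20000
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(trials)]

sigma = (1 / 12) ** 0.5
predicted = sigma / n ** 0.5          # sigma_xbar = sigma / sqrt(n)
observed = statistics.stdev(sample_means)
```

`observed` agrees with `predicted` to within sampling noise, and the distribution of `sample_means` is close to normal even though the population is flat.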

Define standard error of the mean (s.e.m or SE-Xbar). Give its equation.

As with the population distribution, the sampling distribution is not directly measurable because we do not have access to all possible samples; the s.e.m. is the measured spread of sample means and is used to estimate σXbar.



s.e.m. = s/√n
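A minimal computation with made-up measurements:

```python
import statistics

# Hypothetical measurements; s.e.m. = s / sqrt(n), with s the sample s.d.
data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1]
n = len(data)
s = statistics.stdev(data)     # sample s.d. (n - 1 in the denominator)
sem = s / n ** 0.5
```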

Name the 3 common types of error bar.

1. Standard deviation (s.d.)


2. Standard error of the mean (s.e.m.)


3. Confidence interval (CI)

What does an s.d. error bar tell us?

S.d. error bars inform us about the spread of the population and are therefore useful as predictors of the range of new samples. They reflect the variation in the data and not the error in your measurement.

What does an s.e.m. error bar tell us?

S.e.m. error bars reflect uncertainty in the mean and its dependency on sample size (s.e.m = s.d./√n); thus, they shrink as we perform more measurements.


The idea that “if s.e.m. bars do not overlap the difference is statistically significant” is wrong.

What does a confidence interval error bar tell us?

A CI error bar is an interval estimate that indicates the reliability of a measurement; when scaled to a specific confidence level, the bar captures the population mean CI% of the time. The CI does not capture the mean of a second sample drawn from the same population with CI% probability, as CI position and size vary with each sample.
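A sketch of a large-sample 95% CI for the mean, on invented data (for small n, a t critical value would replace the normal z*):

```python
from statistics import NormalDist, fmean, stdev

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.1]
xbar = fmean(data)
sem = stdev(data) / len(data) ** 0.5
zstar = NormalDist().inv_cdf(0.975)       # ~1.96 for a 95% confidence level
ci = (xbar - zstar * sem, xbar + zstar * sem)
```

Intervals constructed this way capture µ about 95% of the time over repeated sampling; any single interval either contains µ or it does not.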

Define test statistic.

The test statistic is a transformation of the sample mean: the difference between the sample and population means divided by the s.e.m.:


D = (Xbar − µ)/(sx/√n)
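A direct computation on an invented sample, with a hypothesized population mean µ0:

```python
import statistics

mu0 = 5.0                                   # hypothesized population mean
sample = [5.4, 5.1, 5.6, 4.9, 5.3, 5.5]
xbar = statistics.fmean(sample)
sem = statistics.stdev(sample) / len(sample) ** 0.5
D = (xbar - mu0) / sem                      # test statistic
```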

What does the Student's t distribution tell us? How does it work?

The extent to which the test statistic departs from normal. This departure arises because the sample variance (sx²) underestimates the variance of the null distribution, and the underestimate is worse for small n, where we are more likely to observe a variance smaller than that of the population. The t distribution accounts for this underestimation by having heavier tails than the normal; as n grows, t more closely resembles the normal. If we used the normal distribution rather than the t distribution, we would overestimate the significance of our finding.

How do statistical tests work, in general?

Observations are assumed to be from the null distribution (H0) with mean µ0. We reject H0 for values larger than x* with an error rate α. The alternative hypothesis (HA) is the competing scenario with a different mean µA. Values sampled from HA smaller than x* do not trigger rejection of H0 and occur at a rate of β.

Define p value.

The p value is the probability of sampling another observation from the null distribution that is as far or farther away from µ as our original observation.
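Under a normal null this is a two-tailed area; a sketch with invented numbers (in practice the t distribution is used when σ is estimated from a small sample):

```python
from statistics import NormalDist

def p_value(x: float, mu0: float, sigma: float) -> float:
    """Two-sided p: chance of a draw at least as far from mu0 as x."""
    z = abs(x - mu0) / sigma
    return 2 * (1 - NormalDist().cdf(z))

p = p_value(x=102.5, mu0=100.0, sigma=1.0)   # observation 2.5 sigma out
```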

Define Type I error (α).

The probability of falsely rejecting H0 (i.e., false positive).

Define Type II error (β).

The probability of failing to appropriately reject H0 if the data are drawn from HA (i.e., false negative).

Define power (sensitivity).

In the context of a typical statistical test, power is defined as the chance of appropriately rejecting H0 if the data are drawn from HA. It is calculated from the area of HA in the H0 rejection region (i.e., greater than x*).


- Power = 1 − β


- Traditionally set to 0.8.

Define specificity.

The probability of a true negative. Equal to 1 − α.

Define effect size (d).

The difference between the means of the null and alternative hypotheses, expressed in units of their s.d., σ.


- d = (µA − µ0)/σ

Define positive predictive value (PPV).

The fraction of positive results that are correct.


- PPV = true positives / (true positives + false positives)
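PPV can also be written in terms of α, power, and the fraction π of tested effects that are real; note that π is an assumed prior, not something the test itself measures:

```python
def ppv(alpha: float, power: float, pi: float) -> float:
    """Expected fraction of positive calls that are true positives."""
    tp = power * pi           # true positives per hypothesis tested
    fp = alpha * (1 - pi)     # false positives per hypothesis tested
    return tp / (tp + fp)

# With alpha = 0.05, power = 0.8, and only 10% of effects real:
low_prior_ppv = ppv(0.05, 0.8, 0.1)   # 0.08 / 0.125 = 0.64
```

Even a well-powered test produces many false positives when true effects are rare.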

Define noncentrality parameter.

A combination of effect size and sample size, d * √n.
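Under a normal approximation, power at a given α depends only on the noncentrality parameter d·√n; a sketch for a one-sided test:

```python
from statistics import NormalDist

def power(d: float, n: int, alpha: float = 0.05) -> float:
    """One-sided test: area of H_A beyond the null cutoff z_alpha.

    Normal approximation; the alternative is shifted by d * sqrt(n).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)          # cutoff x* in standard units
    return 1 - z.cdf(z_alpha - d * n ** 0.5)
```

For example, power(0.5, 25) ≈ 0.80, i.e., a medium effect needs roughly 25 observations to reach the traditional power of 0.8 under these assumptions.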

How can power be increased without altering α?

- Increase sample size – this decreases the spread of the distribution of sample averages (the s.e.m.) in proportion to 1/√n.


- Increase the size of the effect (d) we want to reliably detect; a larger effect size may be induced with a more extreme experimental treatment.


What sample size is required in order for histograms and box plots to be informative?

Histograms – 30


Boxplots – 5

Describe box plots.

Box plots characterize a sample using the 25th, 50th (median), and 75th percentiles and the interquartile range (IQR = 75th − 25th). A box plot shows a box whose length is the IQR and whose width is arbitrary. A line inside the box shows the median. Whiskers are conventionally extended to the most extreme data point no more than 1.5 × IQR from the edge of the box (Tukey style) or all the way to the minimum and maximum data values (Spear style). Outliers beyond the whiskers may be individually plotted. In some cases a notch is used to show the 95% confidence interval for the median, given by m ± 1.58 × IQR/√n; it is important to note that this is an approximation based on the normal distribution. Lack of overlap between notches generally indicates statistical significance, but overlap does not rule out significance.
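The box-plot quantities can be computed with the standard library; the data are invented, and `method="inclusive"` is one of several quartile conventions:

```python
import statistics

data = sorted([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.6, 4.0, 6.5])

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey-style whiskers: most extreme points within 1.5 x IQR of the box.
lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
whisker_lo = min(x for x in data if x >= lo_fence)
whisker_hi = max(x for x in data if x <= hi_fence)
outliers = [x for x in data if x < lo_fence or x > hi_fence]
```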

What are the advantages of box plots? (3)

1. Require a smaller sample size than other methods, such as histograms, to be informative


2. More readily compared across 3 or more samples


3. Quartiles are insensitive to outliers and preserve information about the center and spread, making them ideal for asymmetric or irregularly shaped population distributions and for samples with extreme outliers

What are the disadvantages of using a test/validation set for estimating model performance? (2)

1. The estimate of the test error rate may be highly variable, depending on which samples end up in the training set (this variability reflects model variance).


2. Models often perform worse when trained on less data, so the test set error rate tends to overestimate the true model error (i.e., it suggests your model is worse than it is).

What advantages does leave-one-out cross validation (LOOCV) have over use of a test set? (2) What is a major disadvantage and how can it be offset?

1. Less bias – LOOCV does not overestimate the test error as much as using a test set does.


2. The lack of randomness in how the data are split means that LOOCV will give the same result each time it is run on the same data.


A major disadvantage of LOOCV is that it is computationally expensive. This can be offset by using k-fold CV instead.
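A sketch of how the index splits differ (pure Python; the model fitting itself is omitted). LOOCV is the special case k = n:

```python
import random

def kfold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(n=100, k=5))     # 5-fold CV: 5 model fits
loocv = list(kfold_indices(n=100, k=100))    # LOOCV: 100 model fits
```

Each observation lands in exactly one test fold, so k-fold CV (k ≈ 5–10) uses all the data while costing far fewer fits than LOOCV.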

Why does k-fold CV often give a more accurate estimate of the model test error rate than LOOCV?

Due to the bias-variance tradeoff. Although k-fold CV performs worse than LOOCV in terms of bias, it performs better in terms of variance and overall shows a better balance between the two.

What is bias?

Bias is a type of model error taken as the difference between the expected (or average) output of the model and the correct value we are trying to predict. It tends to result from an overly simplistic (under-fit) model.

What is variance?

Variance is a type of model error taken as the variability of a model prediction for a given data point. High variance indicates that small changes in the training data have a large impact on the results of the model.

What is bootstrapping?

Resampling the observed data with replacement; the statistics computed on the resamples approximate the sampling distribution of the original statistic.
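A sketch: a bootstrap estimate of the s.e.m. from a single made-up sample, compared with the analytic s/√n:

```python
import random
import statistics

random.seed(1)

data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1]

# Resample the data with replacement many times; the spread of the
# resampled means approximates the sampling distribution of the mean.
boot_means = [statistics.fmean(random.choices(data, k=len(data)))
              for _ in range(5000)]
boot_sem = statistics.stdev(boot_means)

# Analytic comparison: s / sqrt(n)
analytic_sem = statistics.stdev(data) / len(data) ** 0.5
```

The two estimates agree to within Monte Carlo noise; the bootstrap's advantage is that it works the same way for statistics with no simple analytic formula (e.g., the median).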