Exploratory Data Analysis
A method and philosophy of data analysis, developed by John Tukey, designed to uncover information in data without interference from outlying values.
Resistance
An EDA property in which a calculation is not highly affected by outlying data values.
Re-expression
An EDA principle in which the display of data is aided by the use of nonlinear transformations, such as a logarithm or square root.
Residuals
The difference between a measurement and the value of the measurement that is predicted by some mathematical model.
Revelation
The primary goal of EDA in which one can see information carried by one's data.
Glyph
An image that communicates information without words.
Median
An average that is the middle number in an ordered set of data. The median has half the data below it and half above it.
Upper & Lower Hinges
The medians of the upper and lower halves of the data, respectively.
Hinge Spread
An EDA term that is the difference between the upper and lower hinges. The hinge spread is often called the fourth spread.
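
The three cards above lend themselves to a worked example. Below is a minimal Python sketch (the data batch is made up for illustration) that computes the median, the Tukey hinges, and the hinge spread by hand:

```python
# Minimal sketch: median, hinges, and hinge spread for a small
# made-up batch of data, following Tukey's definitions.
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def hinges(xs):
    xs = sorted(xs)
    half = (len(xs) + 1) // 2    # halves include the median when n is odd
    return median(xs[:half]), median(xs[-half:])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
lo, hi = hinges(data)
print("median:", median(data))                    # 12
print("hinges:", lo, hi)                          # 7 and 14
print("hinge spread (fourth spread):", hi - lo)   # 7
```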
Stem-and-Leaf Diagram
An EDA figure that displays a distribution of data.
One-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on one line.
Two-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on two lines. The symbols * and . are used for leaves 0-4 and 5-9, respectively.
Five-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on five lines. The symbols *, t, f, s, and . are used for leaves 0-1, 2-3, 4-5, 6-7, and 8-9, respectively.
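
As a concrete illustration of the stem-and-leaf cards, here is a small Python sketch of a one-line display, using the tens digit as the stem (the data are invented):

```python
# One-line stem-and-leaf display: stems are tens digits, leaves are
# units digits, with all leaves of a stem shown on a single line.
from collections import defaultdict

def stem_and_leaf(xs):
    stems = defaultdict(list)
    for x in sorted(xs):
        stems[x // 10].append(x % 10)
    for stem in sorted(stems):
        print(f"{stem:2d} | {''.join(str(leaf) for leaf in stems[stem])}")

stem_and_leaf([12, 15, 21, 23, 23, 27, 30, 34, 41, 45, 49])
#  1 | 25
#  2 | 1337
#  3 | 04
#  4 | 159
```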
Boxplot
An EDA schematic diagram composed of a box and two lines (whiskers) that shows the distribution of data.
Depth
An EDA term denoting how far in a number is from the highest or lowest number in a batch of data. The median has the greatest depth.
Outlier
An observation that is numerically distant from the rest of the data.
Side-by-Side Stem-and-Leaf Diagram
Two stem-and-leaf diagrams placed next to each other that use a common set of stems.
Location; Central Tendency
Another name for average.
Spread, Variation, Variability
The degree to which numbers differ from the central tendency.
Frequency Histogram
A frequency graph composed of vertical or horizontal rectangles that touch each other.
Polygon
A type of frequency graph constructed by connecting dots with a single line, producing a multi-sided figure.
Standard Deviation
Roughly, the average distance of a datapoint from the mean; computed as the square root of the average squared deviation from the mean.
Bin Width
The width of an interval that is used in constructing a frequency table or histogram.
Letter Value Display
A method of displaying simple statistical parameters including hinges, the statistical median, and upper and lower values.
Trimmed Mean
An average in which a certain portion of numbers are deleted from the highest and lowest ends of an ordered batch of data.
Geometric Mean
An average which is calculated by taking the nth root of the product of n numbers.
Coded Table
A table substituting symbols for each datapoint, based on that point's position in the set; it makes changes over time easy to visualize (e.g., sunspot activity across multiple years, crime rates, etc.).
Arithmetic Mean
An average in which the numbers are added and divided by the number of numbers.
Harmonic Mean
A type of average that is calculated by way of the reciprocals of numbers. Specifically, the harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of a specified set of numbers.
Quadratic Mean (RMS)
An average computed as the square root of the mean of the squared values; used primarily when numbers can be both positive and negative with zero as a reference point.
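
Since the arithmetic, geometric, harmonic, and quadratic means above are all defined by formulas, a quick numeric comparison may help; the batch of positive numbers below is made up (the geometric and harmonic means require positive values):

```python
# Comparing the four classical means on one made-up batch of numbers.
import math

xs = [2.0, 4.0, 8.0]
n = len(xs)

arithmetic = sum(xs) / n                              # 4.667
geometric  = math.prod(xs) ** (1 / n)                 # nth root of product = 4.0
harmonic   = n / sum(1 / x for x in xs)               # reciprocal of mean reciprocal = 3.429
quadratic  = math.sqrt(sum(x * x for x in xs) / n)    # root mean square = 5.292

print(arithmetic, geometric, harmonic, quadratic)
```

For positive numbers these always fall in the order harmonic <= geometric <= arithmetic <= quadratic.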
Tukey Trimean
A weighted average that uses the median and hinges.
Weighted Arithmetic Mean
An arithmetic mean in which all the numbers are weighted differentially.
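
The trimmed mean, Tukey trimean, and weighted arithmetic mean can likewise be sketched in a few lines of Python; the data and weights below are invented, and quartiles stand in for the hinges in the trimean:

```python
import statistics

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]

# 10% trimmed mean: drop the lowest and highest 10% of the ordered batch.
def trimmed_mean(xs, proportion=0.10):
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    return statistics.mean(xs[k:len(xs) - k])

# Tukey trimean: weighted average of the median and the hinges
# (quartiles used here as a stand-in for the hinges).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
trimean = (q1 + 2 * q2 + q3) / 4

# Weighted arithmetic mean: each number weighted differentially.
scores, weights = [80, 90, 70], [0.5, 0.3, 0.2]
weighted = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(trimmed_mean(data), trimean, weighted)
```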
Mode
The most frequently occurring datapoint in a set.
Smoothing
An analytic method in which an underlying relationship between two variables is estimated by transcending the noise in the data.
Grouped data
Frequency data that are displayed in bins or intervals.
Z Score
A standardized score in which the mean of a data set is subtracted from a number and the difference is then divided by the standard deviation. The calculation tells one how far a number is above or below the mean in terms of standard deviations.
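
A short sketch of the z-score calculation on made-up numbers:

```python
# z-score: how many standard deviations a value sits above or below
# the mean of its batch (sample SD used here).
import statistics

data = [10, 12, 9, 14, 15]
mu, sd = statistics.mean(data), statistics.stdev(data)
z_scores = [(x - mu) / sd for x in data]
print([round(z, 2) for z in z_scores])   # [-0.78, 0.0, -1.18, 0.78, 1.18]
```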
Sample Space
The set of all possible outcomes considered in calculating a probability.
Event
Some specified occurrence for calculating a probability.
Normal Distribution
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution, defined on the entire real line. It has a bell-shaped probability density function.
Poisson Distribution
A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time, space, distance, area or volume if these events occur independently with a known average rate.
Standard Normal Deviate
Another name for the z-score; basically, a standardized score.
Bayes Theorem (Rule)
A theorem describing how the conditional probability of each of a set of possible causes for a given observed event can be computed from knowledge of the probability of each cause and the conditional probability of the event given each cause.
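
A worked Bayes' rule example with made-up numbers, in the diagnostic-testing spirit of the sensitivity and specificity cards below: a condition with 1% prevalence and a test with 90% sensitivity and 95% specificity.

```python
# Bayes' rule: P(condition | positive test), from the prior and the
# test's conditional probabilities. All numbers are illustrative.
prior       = 0.01    # P(condition)
sensitivity = 0.90    # P(test+ | condition)
specificity = 0.95    # P(test- | no condition)

p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior  = sensitivity * prior / p_positive
print(round(posterior, 3))   # ~0.154: most positives are false when prevalence is low
```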
Addition Rule
When two events, A and B, are mutually exclusive, the probability that A or B will occur is the sum of the probability of each event such that P(A or B) = P(A) + P(B).
Multiplication Rule
When two events, A and B, are independent, the probability that A AND B will occur is the product of the probability of each event such that P(A and B) = P(A) x P(B).
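
Both rules can be checked numerically with a fair die (a made-up but standard example):

```python
# Addition rule (mutually exclusive events): P(roll a 1 OR a 2).
p_1_or_2 = 1/6 + 1/6          # 1/3

# Multiplication rule (independent events): P(a 6 on each of two rolls).
p_double_6 = (1/6) * (1/6)    # 1/36

print(p_1_or_2, p_double_6)
```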
Sensitivity
The probability of testing positive given that a subject has some condition.
Specificity
The probability of testing negative given that a subject does not have some condition.
Law of Large Numbers
In probability theory, the average of the results obtained from a large number of trials will converge on the expected value and will tend to get closer to it as more trials are performed.
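
A quick simulation makes the law of large numbers concrete; here the running mean of fair-die rolls converges on the expected value of 3.5 (illustrative code):

```python
import random

# The sample mean of die rolls drifts toward E[X] = 3.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```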
Prior Probability
A probability assessed before making reference to certain relevant observations, especially subjectively or on the assumption that all possible outcomes are given the same probability.
Posterior Probability
The statistical probability that a hypothesis is true, calculated in light of relevant observations.
Standard Normal Curve
A normal distribution with a mean of 0 and a standard deviation of 1.
Parameters
A characteristic of a distribution (such as the mean) in a POPULATION.
Statistics
A characteristic of a distribution (such as the mean) in a SAMPLE.
T-scores
Used in testing, a score that reflects one's relative standing in a reference group with a particular mean and standard deviation. (N < 30)
Conditional Probability
The probability of an event given that some condition has been met.
Positive Predictive Value
The probability of having a condition given that a subject tests positive.
Negative Predictive Value
The probability of not having a condition given that a subject tests negative.
Contingency Table
A table constructed with at least two factors that reveals the intersection of all their levels. A contingency table is used in factorial ANOVA and with the chi-square test for association.
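
Several of the cards above (sensitivity, specificity, positive and negative predictive value, contingency table) come together in one 2x2 table; the counts below are made up:

```python
# 2x2 contingency table of test result vs. true condition status.
tp, fn = 90, 10      # condition present: true positives, false negatives
fp, tn = 45, 855     # condition absent:  false positives, true negatives

sensitivity = tp / (tp + fn)   # P(test+ | condition)      = 0.90
specificity = tn / (tn + fp)   # P(test- | no condition)   = 0.95
ppv         = tp / (tp + fp)   # P(condition | test+)      ~ 0.667
npv         = tn / (tn + fn)   # P(no condition | test-)   ~ 0.988
print(sensitivity, specificity, round(ppv, 3), round(npv, 3))
```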
Hypothesis Testing
The formal process of testing whether an individual measurement or statistic (such as a mean) estimates some characteristic of a population.
Alpha Level
The point in a sampling distribution at which one rejects Ho. Alpha is the probability of a Type I error.
Null Hypothesis
In hypothesis testing, the assertion that an individual measurement or statistic (such as a mean) estimates (or points to) a reference population. The null hypothesis is often symbolized as Ho.
Confidence Interval
The interval in which some population parameter (such as a mean) exists with a given level of confidence, such as 95%.
Sampling Distribution
A distribution that is constructed of a statistic (such as a mean) from a very large number of equally sized samples.
Standard Error
The standard deviation of sample means used to form a Sampling Distribution.
Sampling Distribution for the Difference of Proportions
A sampling distribution which is formed by selecting two random samples of equal size from a binary population and then constructing a distribution of the differences of the proportion in each sample. This sampling distribution reveals what the difference between two sample proportions is expected to be when no treatment effects are present.
Sampling Distribution for a Proportion
A distribution that is constructed of sample proportions from a very large number of equally sized samples.
Sampling Distribution for a Mean
A distribution that is constructed of sample means from a very large number of equally sized samples.
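
The sampling-distribution cards above are easy to see by simulation. The sketch below (parameters invented) draws many equally sized samples from a deliberately skewed population, builds the sampling distribution of the mean, and compares its spread to the sigma / sqrt(N) standard-error formula:

```python
import random, statistics

# A skewed (exponential) population, so normality of the sampling
# distribution is not inherited from the population itself.
population = [random.expovariate(1.0) for _ in range(50_000)]

N = 25
means = [statistics.mean(random.sample(population, N)) for _ in range(1_000)]

print(statistics.stdev(means))                     # empirical standard error
print(statistics.pstdev(population) / N ** 0.5)    # sigma / sqrt(N); should be close
```

The same simulation also previews the Central Limit Theorem card below: a histogram of `means` would look roughly normal despite the skewed population.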
Rival Hypothesis
The logical alternative to the null hypothesis.
T Distribution
A bell-shaped sampling distribution used with small samples when the population standard deviation is unknown.
Degrees of Freedom
In statistical analysis, the number of numbers that are free to take on a value without restriction.
Binary Variable
A variable that takes on only two values.
One-Sample Z Test
In hypothesis testing, a z calculation where one tests the hypothesis that a sample mean points to a reference population mean. For this test, the sample size must be at least 30 and the population standard deviation must be known. (N >= 30 and sigma known)
One-Sample T Test
In hypothesis testing, a t calculation where one tests the hypothesis that a sample mean points to a reference population mean. This test is calculated when the sample size falls below 30 and/or the population standard deviation is unknown. (N < 30, and/or sigma unknown)
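
A minimal sketch of both one-sample tests, assuming SciPy is available; the summary numbers and raw data are made up:

```python
import math
from scipy import stats

# One-sample z test by hand (sigma known, N >= 30).
xbar, mu0, sigma, N = 103.2, 100.0, 15.0, 36
z = (xbar - mu0) / (sigma / math.sqrt(N))
p_z = 2 * (1 - stats.norm.cdf(abs(z)))     # two-tailed p value
print(round(z, 2), round(p_z, 4))          # 1.28, ~0.20

# One-sample t test on raw data (sigma unknown, small N).
sample = [12.1, 11.6, 13.0, 12.4, 11.2, 12.8, 13.4, 11.9]
t, p_t = stats.ttest_1samp(sample, popmean=12.0)
print(round(t, 2), round(p_t, 3))
```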
Nominal Scale
A scale in which a number stands for a non-numeric designation (e.g., 1 = Yes, 2 = No; 1 = Do Not Know, 2 = Very Strongly; etc.).
Ordinal Scale
A scale of magnitude in which the distances between two numbers are not necessarily equal. (However, numbers behave as numbers, per Sinacore.)
Interval Scale
A scale of measurement of data according to which the differences between values can be quantified in absolute but not relative terms and for which any zero is merely arbitrary; for instance, dates are measured on an interval scale, since differences can be measured in years but no sense can be given to a ratio of times.
Ratio Scale
A scale of measurement of data which permits the comparison of differences of values; a scale having a fixed zero value. The distances travelled by a projectile, for instance, are measured on a ratio scale, since it makes sense to talk of one projectile travelling twice as far as another.
Student's t-Test for Independent Samples
A calculation that tests the hypothesis that two sample means point to (or estimate) the same population mean. The term "independent" indicates that each subject is in one and only one sample.
Student's t-Test for Related Samples
A calculation that tests the hypothesis that a sample mean of measurement differences points to (or estimates) a population mean of zero. The term "related" indicates that subjects act as their own control or that two different subjects have been matched or linked in some way. Two measurements are collected per subject or pair, and the differences (one measurement subtracted from the other) are averaged to form the sample mean.
Standard Error for the Difference of Means
The standard deviation of the differences in sample means used to form a Sampling Distribution for the Difference of Two Means.
Pooled Variance
The weighted average of the variances of two independent samples, used for estimating the variance of a measurement in the population.
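
The two Student's t tests and the pooled variance can be sketched together, again assuming SciPy; all data are invented:

```python
import statistics
from scipy import stats

a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]

# Independent samples: each subject is in one and only one group.
t_ind, p_ind = stats.ttest_ind(a, b)       # assumes equal variances

# Pooled variance: df-weighted average of the two sample variances.
na, nb = len(a), len(b)
sp2 = ((na - 1) * statistics.variance(a) +
       (nb - 1) * statistics.variance(b)) / (na + nb - 2)

# Related samples: two measurements per subject (e.g., before/after).
pre  = [8.0, 7.5, 9.1, 6.8, 7.9]
post = [7.2, 7.4, 8.0, 6.1, 7.0]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(round(t_ind, 2), round(p_ind, 3), round(sp2, 3),
      round(t_rel, 2), round(p_rel, 3))
```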
Central Limit Theorem
Given a population distribution with a mean µ and standard deviation σ, the sampling distribution of the mean approaches a normal distribution with a mean of µ and a standard deviation (i.e., the standard error of the mean) equal to σ/√N as N, the sample size, increases.

The amazing and counter-intuitive aspect of the central limit theorem (formulated in 1810) is that the sampling distribution of the mean approaches normality no matter what the shape of the original distribution. In addition, for most distributions, a normal distribution is approached quickly as N increases.
Homogeneous / Heterogeneous Variance
Homogeneous variance refers to a variance that is similar among all study groups, while heterogeneous variance implies the opposite.
Satterthwaite Adjustment
The calculation that is done for an independent samples Student t-Test when the homogeneity of variance assumption is violated.
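
In SciPy this adjustment is requested with `equal_var=False` (the Welch/Satterthwaite form of the test); the data below are illustrative:

```python
from scipy import stats

a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 9.9, 1.4, 8.0, 2.1, 6.6]   # visibly more variable group

# Welch/Satterthwaite-adjusted t test for unequal variances.
t, p = stats.ttest_ind(a, b, equal_var=False)
print(round(t, 2), round(p, 3))
```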
One-tailed Test
A test of statistical significance in which the rival hypothesis is stated in one direction.
Two-tailed Test
A test of statistical significance in which the rival hypothesis is not stated in any particular direction.
Non-Parametric (Rank Sum) Tests
A class of tests of hypotheses that make no or few assumptions about the nature of a population distribution.
Mann-Whitney U Test
The non-parametric analog of the Student t-test for independent samples.
Wilcoxon Signed Ranks Test
The non-parametric analog of the Student t-test for related samples.
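
Both non-parametric analogs are available in SciPy; a quick sketch on made-up data:

```python
from scipy import stats

# Mann-Whitney U: independent samples, no normality assumption.
a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
u, p_u = stats.mannwhitneyu(a, b)

# Wilcoxon signed ranks: related samples (paired differences).
pre  = [8.0, 7.5, 9.1, 6.8, 7.9]
post = [7.2, 7.4, 8.0, 6.1, 7.0]
w, p_w = stats.wilcoxon(pre, post)

print(u, round(p_u, 3), w, round(p_w, 3))
```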
Sampling Distribution for the Difference of Two Means
A sampling distribution which is formed by selecting two random samples of equal size from the same population and then constructing a distribution of the differences of the means in each sample. This sampling distribution reveals what the difference between two sample means is expected to be when no treatment effects are present.