82 Cards in this Set
- Front
- Back
Hypothesis testing
|
The formal process of testing whether an individual measurement or statistic (such as a mean) estimates some characteristic of a population.
|
|
Alpha level
|
The point in a sampling distribution at which one rejects Ho. Alpha is the probability of a Type I error.
|
|
Null hypothesis
|
In hypothesis testing, the assertion that an individual measurement or statistic (such as a mean) estimates (or points to) a reference population. The null hypothesis is often symbolized as Ho.
|
|
Confidence interval
|
The interval in which some population parameter (such as a mean) exists with a given level of confidence, such as 95%.
|
|
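As an illustrative sketch (not part of the original card set), a 95% confidence interval for a mean can be computed in Python using the normal approximation (z = 1.96); the sample data are made up:

```python
import math
import statistics

def confidence_interval_95(data):
    """95% CI for the mean using the normal approximation (z = 1.96)."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    return (mean - 1.96 * se, mean + 1.96 * se)

data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
low, high = confidence_interval_95(data)
```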
Sampling distribution
|
A distribution that is constructed of a statistic (such as a mean) from a very large number of equally sized samples.
|
|
Standard error
|
The standard deviation of a statistic (such as a mean) from a sampling distribution.
|
|
Rival hypothesis
|
The logical alternative to the null hypothesis.
|
|
T-distribution
|
A bell-shaped sampling distribution that is derived from small samples when the population standard deviation is unknown.
|
|
Degrees of freedom
|
In statistical analysis, the number of values that are free to vary without restriction.
|
|
One sample z test
|
In hypothesis testing, a z calculation where one tests the hypothesis that a sample mean points to a reference population mean. For this test, the sample size needs to be at least 30 and the population standard deviation is known.
|
|
One sample t test
|
In hypothesis testing, a t calculation where one tests the hypothesis that a sample mean points to a reference population mean. This test is calculated when the sample size falls below 30 and/or the population standard deviation is unknown.
|
|
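As a sketch of the one-sample t test described above (the sample data and reference mean are hypothetical):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1  # t statistic and its degrees of freedom

t, df = one_sample_t([5.1, 4.9, 5.6, 5.2, 5.0, 5.4, 5.3, 4.8], mu0=5.0)
```

The t statistic is then compared against the t-distribution with `df` degrees of freedom at the chosen alpha level.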
Binary variable
|
A variable that takes on only two values.
|
|
EDA
|
A method and philosophy of data analysis begun by John Tukey which is designed to uncover information in data without interference of outlying values.
|
|
Resistance
|
An EDA property in which a calculation is not highly affected by outlying data values.
|
|
Re-expression
|
An EDA principle in which the display of data is aided by the use of nonlinear transformations, such as a logarithm or square root.
|
|
Residuals
|
The difference between a measurement and the value of the measurement that is predicted by some mathematical model.
|
|
Revelation
|
The primary goal of EDA in which one can see information carried by one's data.
|
|
Glyph
|
An image that communicates information without words.
|
|
Median
|
An average that is the middle number in an ordered set of data. The median has half the data below it and half above it.
|
|
Upper and lower hinges
|
An EDA term for the median of the upper/lower half of a batch of data.
|
|
Hinge spread
|
An EDA term that is the difference between the upper and lower hinges. The hinge spread is often called the fourth spread.
|
|
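A minimal sketch of hinges and hinge spread in Python, assuming Tukey's convention that an odd-length batch includes the median in both halves (the data are made up):

```python
import statistics

def hinges(data):
    """Lower and upper hinges: medians of the lower and upper halves of the batch.
    For an odd-length batch, Tukey's convention includes the median in both halves."""
    d = sorted(data)
    n = len(d)
    half = (n + 1) // 2  # include the median in both halves when n is odd
    lower = statistics.median(d[:half])
    upper = statistics.median(d[n - half:])
    return lower, upper

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
lo, hi = hinges(data)
hinge_spread = hi - lo
```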
Stem and leaf diagram
|
An EDA figure that displays a distribution of data.
|
|
One-line display
|
A stem-and-leaf diagram in which the leaves of each stem are shown on one line.
|
|
Two-line display
|
A stem-and-leaf diagram in which the leaves of each stem are shown on two lines. The symbols * and . are used for leaf digits 0-4 and 5-9, respectively.
|
|
Five-line display
|
A stem-and-leaf diagram in which the leaves of each stem are shown on five lines. The symbols *, t, f, s, and . are used for leaf digits 0-1, 2-3, 4-5, 6-7, and 8-9, respectively.
|
|
Box plot
|
An EDA schematic diagram comprised of a box and two lines that show the distribution of data.
|
|
Depth of a number
|
An EDA term to denote how far a number is in from the highest or lowest number in a batch of data. The median has the greatest depth.
|
|
Outlier
|
An observation that is numerically distant from the rest of the data.
|
|
Side by side stem and leaf diagram
|
Two stem-and-leaf diagrams placed next to each other that use a common set of stems.
|
|
Location, central tendency
|
Another name for average.
|
|
Spread, variation
|
The degree to which numbers differ from central tendency.
|
|
Frequency histogram and polygon
|
A frequency graph. The histogram is composed of vertical or horizontal rectangles that touch each other; the polygon connects the midpoints of the intervals with straight line segments.
|
|
Bin width
|
The width of an interval that is used in constructing a frequency table or histogram.
|
|
Letter value display
|
A display in which letters stand in for the summary values of a box plot.
|
|
Grouped data
|
Frequency data that are displayed in bins or intervals.
|
|
Variability
|
The degree to which numbers differ from central tendency.
|
|
Standard deviation
|
A measure of variability computed as the square root of the average of the squared deviations from the mean.
|
|
Trimmed mean
|
An average in which a certain portion of numbers are deleted from the highest and lowest ends of an ordered batch of data.
|
|
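A sketch of a trimmed mean in Python; the 10% trim proportion and the data (with one outlying value) are illustrative:

```python
import statistics

def trimmed_mean(data, proportion=0.1):
    """Mean after deleting `proportion` of values from each end of the ordered batch."""
    d = sorted(data)
    k = int(len(d) * proportion)  # how many values to drop from each end
    return statistics.mean(d[k:len(d) - k])

# The outlying 100 is dropped along with the lowest value
tm = trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], proportion=0.1)
```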
Coded table
|
A table with coded values based on a letter value display.
|
|
Arithmetic mean
|
An average in which the numbers are added and divided by the number of numbers.
|
|
Harmonic mean
|
A type of average that is calculated by way of the reciprocals of numbers. Specifically, the harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of a specified set of numbers.
|
|
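Python's standard library exposes this average directly; a small sketch with made-up speeds over two equal distances:

```python
import statistics

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals
speeds = [40, 60]  # e.g., speeds over two equal distances
hm = statistics.harmonic_mean(speeds)
```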
Quadratic mean (RMS)
|
An average computed as the square root of the mean of the squared numbers; it is used primarily when numbers can be positive or negative with zero as a reference point.
|
|
Tukey trimean
|
A weighted average that uses the median and hinges.
|
|
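A sketch of the trimean in Python, assuming Tukey's hinge convention (the median is included in both halves for an odd-length batch); the data are made up:

```python
import statistics

def trimean(data):
    """Tukey trimean: (lower hinge + 2 * median + upper hinge) / 4."""
    d = sorted(data)
    n = len(d)
    half = (n + 1) // 2  # Tukey's hinges include the median when n is odd
    lower = statistics.median(d[:half])
    upper = statistics.median(d[n - half:])
    return (lower + 2 * statistics.median(d) + upper) / 4

tri = trimean([3, 7, 8, 5, 12, 14, 21, 13, 18])
```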
Weighted arithmetic mean
|
An arithmetic mean in which all the numbers are weighted differentially.
|
|
Smoothing
|
An analytic method in which an underlying relationship between two variables is estimated by averaging out the noise in the data.
|
|
Z score
|
A standardized score in which the mean of a data set is subtracted from a number and the difference is then divided by the standard deviation. The calculation tells one how far a number is above or below the mean in terms of standard deviations.
|
|
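The z-score calculation described above, as a short Python sketch with made-up test scores:

```python
import statistics

def z_score(x, data):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

z = z_score(85, [70, 75, 80, 80, 85, 90, 95, 65])  # mean 80, stdev 10
```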
Sample space
|
The set of all possible outcomes in calculating a probability.
|
|
Event
|
Some specified occurrence for calculating a probability.
|
|
Normal distribution
|
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution, defined on the entire real line. It has a bell-shaped probability density function.
|
|
Poisson distribution
|
A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time, space, distance, area or volume if these events occur independently with a known average rate.
|
|
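The Poisson probability mass function can be written directly from its definition; a sketch with an illustrative rate:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam: e^-lam * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of exactly 2 events when the average rate is 3 per interval
p = poisson_pmf(2, 3.0)
```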
Standard normal deviate
|
A z score of a normally distributed variable in a population.
|
|
Addition rule
|
When two events, A and B, are mutually exclusive, the probability that A or B will occur is the sum of the probability of each event such that P(A or B) = P(A) + P(B).
|
|
Multiplication rule
|
When two events, A and B, are independent, the probability that A AND B will occur is the product of the probability of each event such that P(A and B) = P(A) x P(B).
|
|
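The two rules side by side, with standard dice and coin examples as a sketch:

```python
# Addition rule (mutually exclusive events): P(A or B) = P(A) + P(B)
# e.g., rolling a 1 OR a 2 on one fair die
p_one_or_two = 1/6 + 1/6

# Multiplication rule (independent events): P(A and B) = P(A) * P(B)
# e.g., getting heads on both of two independent fair coin flips
p_two_heads = 1/2 * 1/2
```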
Sensitivity
|
The probability of testing positive given that a subject has some condition.
|
|
Specificity
|
The probability of testing negative given that a subject does not have some condition.
|
|
Law of large numbers
|
In probability theory, the principle that the average of the results obtained from a large number of trials converges on the expected value, tending closer as more trials are performed.
|
|
Prior probability
|
The probability of an event before any other relevant information is known (e.g., the probability of having TB before any other information is taken into account).
|
|
Posterior probability
|
A conditional probability. The probability of an event after knowledge of some relevant information is taken into account (e.g., the probability of having TB given a + Mantoux test).
|
|
Standard normal curve
|
A normal distribution with a mean of 0 and a standard deviation of 1.
|
|
Parameters
|
A characteristic of a distribution (such as the mean) in a population.
|
|
Statistics
|
A characteristic of a distribution (such as the mean) in a sample.
|
|
T-scores
|
Used in testing, a score that reflects one's relative standing in a reference group with a particular mean and standard deviation.
|
|
Conditional probability
|
The probability of an event given that some condition has been met.
|
|
Positive predictive value
|
The probability of having a condition given that a subject tests positive.
|
|
Negative predictive value
|
The probability of not having a condition given that a subject tests negative.
|
|
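Sensitivity, specificity, prevalence (the prior probability), and the predictive values (posterior probabilities) connect through Bayes' rule; a sketch with hypothetical test characteristics:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from test characteristics and the prior (prevalence) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(condition | test positive)
    npv = true_neg / (true_neg + false_neg)   # P(no condition | test negative)
    return ppv, npv

# Hypothetical test: 90% sensitive, 95% specific, 2% prevalence
ppv, npv = predictive_values(0.90, 0.95, 0.02)
```

Note how a low prevalence drags the PPV well below the sensitivity even for an accurate test.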
Contingency table
|
A table constructed with at least two factors that reveal the intersection of all levels. A contingency table is used in factorial ANOVA and with the chi-square test for association.
|
|
Nominal scale
|
A classification of data in which the numbers represent categories (e.g., 1 = Male, 2 = Female, etc.).
|
|
Ordinal scale
|
A classification of data in which the numbers represent a variable only in terms of order (e.g., 1 = Low, 2 = Medium, 3 = High).
|
|
Interval scale
|
A classification of data in which the numbers represent a variable that is understood to have equal intervals of amount. In an interval scale, "0" is relative in that it does not mean the absence of quantity. The most common interval scales are temperature in Fahrenheit and Celsius.
|
|
Ratio scale
|
A classification of data in which the numbers represent a variable that is understood to have equal intervals of amount and a true zero point. Examples of ratio scales are time, distance, and density.
|
|
Student's t-test for independent samples
|
A calculation that tests the hypothesis that two sample means point to (or estimate) the same population mean. The term "independent" indicates that each subject is in one and only one sample.
|
|
Student's t-test for related samples
|
A calculation that tests the hypothesis that a sample mean of measurement differences points to (or estimates) a population mean of zero. The term "related" indicates that subjects act as their own control or that two different subjects have been matched or linked in some way. Two measurements are collected per subject or pair and the sample mean is computed by subtracting one from the other.
|
|
Sampling distribution for the difference of two sample means
|
A sampling distribution which is formed by selecting two random samples of equal size from the same population and then constructing a distribution of the differences of the means in each sample. This sampling distribution reveals what the difference between two sample means is expected to be when no treatment effects are present.
|
|
Standard error for the difference of means
|
The standard deviation of the differences in sample means used to form a Sampling Distribution for the Difference of Two Means.
|
|
Pooled variance
|
The average of the variances of two independent samples for estimating the variance of a measurement in a population.
|
|
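The pooled variance weights each sample's variance by its degrees of freedom; a sketch with made-up samples:

```python
import statistics

def pooled_variance(a, b):
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)

vp = pooled_variance([4, 6, 8, 10], [5, 7, 9])
```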
Homogeneous variance
|
The assumption that the variability of measurements is similar among all study groups.
|
|
Satterthwaite adjustment
|
The calculation that is done for an independent samples Student t-Test when the homogeneity of variance assumption is violated.
|
|
One-tail test
|
A test of statistical significance in which the rival hypothesis is stated in one direction.
|
|
Two-tail test
|
A test of statistical significance in which the rival hypothesis is not stated in any particular direction.
|
|
Non-parametric (rank sum) tests
|
A class of tests of hypotheses that make no or few assumptions about the nature of a population distribution.
|
|
Mann-Whitney U test
|
The non-parametric analog of the Student t test for independent samples.
|
|
Rank sum tests
|
An alternate name for those non-parametric tests that rank the data and then analyze the ranks.
|