Exploratory Data Analysis
A method and philosophy of data analysis, developed by John Tukey, designed to uncover information in data without interference from outlying values.
Resistance
An EDA property in which a calculation is not highly affected by outlying data values.
Re-expression
An EDA principle in which the display of data is aided by the use of nonlinear transformations, such as a logarithm or square root.
Residuals
The difference between a measurement and the value of the measurement that is predicted by some mathematical model.
Revelation
The primary goal of EDA in which one can see information carried by one's data.
Glyph
An image that communicates information without words.
Median
An average that is the middle number in an ordered set of data. The median has half the data below it and half above it.
Upper & Lower Hinges
The medians of the upper and lower halves of the data, respectively.
Hinge Spread
An EDA term that is the difference between the upper and lower hinges. The hinge spread is often called the fourth spread.
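
The three cards above lend themselves to a worked example. Below is a minimal Python sketch (the data batch is made up for illustration) that computes the median, the Tukey hinges, and the hinge spread by hand:

```python
# Minimal sketch: median, hinges, and hinge spread for a small
# made-up batch of data, following Tukey's definitions.
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def hinges(xs):
    xs = sorted(xs)
    half = (len(xs) + 1) // 2    # halves include the median when n is odd
    return median(xs[:half]), median(xs[-half:])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
lo, hi = hinges(data)
print("median:", median(data))                    # 12
print("hinges:", lo, hi)                          # 7 and 14
print("hinge spread (fourth spread):", hi - lo)   # 7
```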
Stem-and-Leaf Diagram
An EDA figure that displays a distribution of data.
One-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on one line.
Two-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on two lines. The symbols * and . are used for leaves 0-4 and 5-9, respectively.
Five-Line Summary
A stem-and-leaf diagram in which the leaves of each stem are shown on five lines. The symbols *, t, f, s, and . are used for leaves 0-1, 2-3, 4-5, 6-7, and 8-9, respectively.
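
As a concrete illustration of the stem-and-leaf cards, here is a small Python sketch of a one-line display, using the tens digit as the stem (the data are invented):

```python
# One-line stem-and-leaf display: stems are tens digits, leaves are
# units digits, with all leaves of a stem shown on a single line.
from collections import defaultdict

def stem_and_leaf(xs):
    stems = defaultdict(list)
    for x in sorted(xs):
        stems[x // 10].append(x % 10)
    for stem in sorted(stems):
        print(f"{stem:2d} | {''.join(str(leaf) for leaf in stems[stem])}")

stem_and_leaf([12, 15, 21, 23, 23, 27, 30, 34, 41, 45, 49])
#  1 | 25
#  2 | 1337
#  3 | 04
#  4 | 159
```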
Boxplot
An EDA schematic diagram composed of a box and two lines (whiskers) that shows the distribution of data.
Depth
An EDA term denoting how far in a number is from the highest or lowest number in a batch of data. The median has the greatest depth.
Outlier
An observation that is numerically distant from the rest of the data.
Side-by-Side Stem-and-Leaf Diagram
Two stem-and-leaf diagrams placed next to each other that use a common set of stems.
Location; Central Tendency
Another name for average.
Spread, Variation, Variability
The degree to which numbers differ from the central tendency.
Frequency Histogram
A frequency graph composed of vertical or horizontal rectangles that touch each other.
Polygon
A type of frequency graph constructed by connecting dots with a single line, producing a multi-sided figure.
Standard Deviation
Roughly, the average distance of a datapoint from the mean; computed as the square root of the average squared deviation from the mean.
Bin Width
The width of an interval that is used in constructing a frequency table or histogram.
Letter Value Display
A method of displaying simple statistical parameters including hinges, the statistical median, and upper and lower values.
Trimmed Mean
An average in which a certain portion of numbers are deleted from the highest and lowest ends of an ordered batch of data.
Geometric Mean
An average which is calculated by taking the nth root of the product of n numbers.
Coded Table
A table substituting symbols for each datapoint, based on that point's position in the set; it makes changes over time easy to visualize (e.g., sunspot activity across multiple years, crime rates, etc.).
Arithmetic Mean
An average in which the numbers are added and divided by the number of numbers.
Harmonic Mean
A type of average that is calculated by way of the reciprocals of numbers. Specifically, the harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of a specified set of numbers.
Quadratic Mean (RMS)
An average computed as the square root of the mean of the squared values; used primarily when numbers can be both positive and negative with zero as a reference point.
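
Since the arithmetic, geometric, harmonic, and quadratic means above are all defined by formulas, a quick numeric comparison may help; the batch of positive numbers below is made up (the geometric and harmonic means require positive values):

```python
# Comparing the four classical means on one made-up batch of numbers.
import math

xs = [2.0, 4.0, 8.0]
n = len(xs)

arithmetic = sum(xs) / n                              # 4.667
geometric  = math.prod(xs) ** (1 / n)                 # nth root of product = 4.0
harmonic   = n / sum(1 / x for x in xs)               # reciprocal of mean reciprocal = 3.429
quadratic  = math.sqrt(sum(x * x for x in xs) / n)    # root mean square = 5.292

print(arithmetic, geometric, harmonic, quadratic)
```

For positive numbers these always fall in the order harmonic <= geometric <= arithmetic <= quadratic.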
Tukey Trimean
A weighted average that uses the median and hinges.
Weighted Arithmetic Mean
An arithmetic mean in which all the numbers are weighted differentially.
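
The trimmed mean, Tukey trimean, and weighted arithmetic mean can likewise be sketched in a few lines of Python; the data and weights below are invented, and quartiles stand in for the hinges in the trimean:

```python
import statistics

data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 40]

# 10% trimmed mean: drop the lowest and highest 10% of the ordered batch.
def trimmed_mean(xs, proportion=0.10):
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    return statistics.mean(xs[k:len(xs) - k])

# Tukey trimean: weighted average of the median and the hinges
# (quartiles used here as a stand-in for the hinges).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
trimean = (q1 + 2 * q2 + q3) / 4

# Weighted arithmetic mean: each number weighted differentially.
scores, weights = [80, 90, 70], [0.5, 0.3, 0.2]
weighted = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(trimmed_mean(data), trimean, weighted)
```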
Mode
The most frequently occurring datapoint in a set.
Smoothing
An analytic method in which an underlying relationship between two variables is estimated by transcending the noise in the data.
Grouped data
Frequency data that are displayed in bins or intervals.
Z Score
A standardized score in which the mean of a data set is subtracted from a number and the difference is then divided by the standard deviation. The calculation tells one how far a number is above or below the mean in terms of standard deviations.
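
A short sketch of the z-score calculation on made-up numbers:

```python
# z-score: how many standard deviations a value sits above or below
# the mean of its batch (sample SD used here).
import statistics

data = [10, 12, 9, 14, 15]
mu, sd = statistics.mean(data), statistics.stdev(data)
z_scores = [(x - mu) / sd for x in data]
print([round(z, 2) for z in z_scores])   # [-0.78, 0.0, -1.18, 0.78, 1.18]
```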
Sample Space
The set of all possible outcomes considered in calculating a probability.
Event
Some specified occurrence for calculating a probability.
Normal Distribution
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution, defined on the entire real line. It has a bell-shaped probability density function.
Poisson Distribution
A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time, space, distance, area or volume if these events occur independently with a known average rate.
Standard Normal Deviate
Another name for the z-score; basically, a standardized score.
Bayes Theorem (Rule)
A theorem describing how the conditional probability of each of a set of possible causes for a given observed event can be computed from knowledge of the probability of each cause and the conditional probability of the event given each cause.
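
A worked Bayes' rule example with made-up numbers, in the diagnostic-testing spirit of the sensitivity and specificity cards below: a condition with 1% prevalence and a test with 90% sensitivity and 95% specificity.

```python
# Bayes' rule: P(condition | positive test), from the prior and the
# test's conditional probabilities. All numbers are illustrative.
prior       = 0.01    # P(condition)
sensitivity = 0.90    # P(test+ | condition)
specificity = 0.95    # P(test- | no condition)

p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior  = sensitivity * prior / p_positive
print(round(posterior, 3))   # ~0.154: most positives are false when prevalence is low
```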
Addition Rule
When two events, A and B, are mutually exclusive, the probability that A or B will occur is the sum of the probability of each event such that P(A or B) = P(A) + P(B).
Multiplication Rule
When two events, A and B, are independent, the probability that A AND B will occur is the product of the probability of each event such that P(A and B) = P(A) x P(B).
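
Both rules can be checked numerically with a fair die (a made-up but standard example):

```python
# Addition rule (mutually exclusive events): P(roll a 1 OR a 2).
p_1_or_2 = 1/6 + 1/6          # 1/3

# Multiplication rule (independent events): P(a 6 on each of two rolls).
p_double_6 = (1/6) * (1/6)    # 1/36

print(p_1_or_2, p_double_6)
```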
Sensitivity
The probability of testing positive given that a subject has some condition.
Specificity
The probability of testing negative given that a subject does not have some condition.
Law of Large Numbers
In probability theory, the average of the results obtained from a large number of trials will converge on the expected value and will tend to get closer to it as more trials are performed.
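
A quick simulation makes the law of large numbers concrete; here the running mean of fair-die rolls converges on the expected value of 3.5 (illustrative code):

```python
import random

# The sample mean of die rolls drifts toward E[X] = 3.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```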
Prior Probability
A probability assessed before making reference to certain relevant observations, especially subjectively or on the assumption that all possible outcomes are given the same probability.
Posterior Probability
The statistical probability that a hypothesis is true, calculated in light of relevant observations.
Standard Normal Curve
A normal distribution with a mean of 0 and a standard deviation of 1.
Parameters
A characteristic of a distribution (such as the mean) in a POPULATION.
Statistics
A characteristic of a distribution (such as the mean) in a SAMPLE.
T-scores
Used in testing, a score that reflects one's relative standing in a reference group with a particular mean and standard deviation. (N < 30)
Conditional Probability
The probability of an event given that some condition has been met.
Positive Predictive Value
The probability of having a condition given that a subject tests positive.
Negative Predictive Value
The probability of not having a condition given that a subject tests negative.
Contingency Table
A table constructed with at least two factors that reveals the intersection of all their levels. A contingency table is used in factorial ANOVA and with the chi-square test for association.
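
Several of the cards above (sensitivity, specificity, positive and negative predictive value, contingency table) come together in one 2x2 table; the counts below are made up:

```python
# 2x2 contingency table of test result vs. true condition status.
tp, fn = 90, 10      # condition present: true positives, false negatives
fp, tn = 45, 855     # condition absent:  false positives, true negatives

sensitivity = tp / (tp + fn)   # P(test+ | condition)      = 0.90
specificity = tn / (tn + fp)   # P(test- | no condition)   = 0.95
ppv         = tp / (tp + fp)   # P(condition | test+)      ~ 0.667
npv         = tn / (tn + fn)   # P(no condition | test-)   ~ 0.988
print(sensitivity, specificity, round(ppv, 3), round(npv, 3))
```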
Hypothesis Testing
The formal process of testing whether an individual measurement or statistic (such as a mean) estimates some characteristic of a population.
Alpha Level
The point in a sampling distribution at which one rejects Ho. Alpha is the probability of a Type I error.
Null Hypothesis
In hypothesis testing, the assertion that an individual measurement or statistic (such as a mean) estimates (or points to) a reference population. The null hypothesis is often symbolized as Ho.
Confidence Interval
The interval in which some population parameter (such as a mean) exists with a given level of confidence, such as 95%.
Sampling Distribution
A distribution that is constructed of a statistic (such as a mean) from a very large number of equally sized samples.
Standard Error
The standard deviation of sample means used to form a Sampling Distribution.
Sampling Distribution for the Difference of Proportions
A sampling distribution which is formed by selecting two random samples of equal size from a binary population and then constructing a distribution of the differences of the proportion in each sample. This sampling distribution reveals what the difference between two sample proportions is expected to be when no treatment effects are present.
Sampling Distribution for a Proportion
A distribution that is constructed of sample proportions from a very large number of equally sized samples.
Sampling Distribution for a Mean
A distribution that is constructed of sample means from a very large number of equally sized samples.
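
The sampling-distribution cards above are easy to see by simulation. The sketch below (parameters invented) draws many equally sized samples from a deliberately skewed population, builds the sampling distribution of the mean, and compares its spread to the sigma / sqrt(N) standard-error formula:

```python
import random, statistics

# A skewed (exponential) population, so normality of the sampling
# distribution is not inherited from the population itself.
population = [random.expovariate(1.0) for _ in range(50_000)]

N = 25
means = [statistics.mean(random.sample(population, N)) for _ in range(1_000)]

print(statistics.stdev(means))                     # empirical standard error
print(statistics.pstdev(population) / N ** 0.5)    # sigma / sqrt(N); should be close
```

The same simulation also previews the Central Limit Theorem card below: a histogram of `means` would look roughly normal despite the skewed population.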
Rival Hypothesis
The logical alternative to the null hypothesis.
T Distribution
A bell-shaped sampling distribution used with small samples when the population standard deviation is unknown.
Degrees of Freedom
In statistical analysis, the number of numbers that are free to take on a value without restriction.
Binary Variable
A variable that takes on only two values.
One-Sample Z Test
In hypothesis testing, a z calculation where one tests the hypothesis that a sample mean points to a reference population mean. For this test, the sample size must be at least 30 and the population standard deviation must be known. (N >= 30 and sigma known)
One-Sample T Test
In hypothesis testing, a t calculation where one tests the hypothesis that a sample mean points to a reference population mean. This test is calculated when the sample size falls below 30 and/or the population standard deviation is unknown. (N < 30, and/or sigma unknown)
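
A minimal sketch of both one-sample tests, assuming SciPy is available; the summary numbers and raw data are made up:

```python
import math
from scipy import stats

# One-sample z test by hand (sigma known, N >= 30).
xbar, mu0, sigma, N = 103.2, 100.0, 15.0, 36
z = (xbar - mu0) / (sigma / math.sqrt(N))
p_z = 2 * (1 - stats.norm.cdf(abs(z)))     # two-tailed p value
print(round(z, 2), round(p_z, 4))          # 1.28, ~0.20

# One-sample t test on raw data (sigma unknown, small N).
sample = [12.1, 11.6, 13.0, 12.4, 11.2, 12.8, 13.4, 11.9]
t, p_t = stats.ttest_1samp(sample, popmean=12.0)
print(round(t, 2), round(p_t, 3))
```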
Nominal Scale
A scale in which a number stands for a non-numeric designation (e.g., 1 = Yes, 2 = No; 1 = Do Not Know, 2 = Very Strongly; etc.).
Ordinal Scale
A scale of magnitude in which the distances between two numbers are not necessarily equal. (However, numbers behave as numbers, per Sinacore.)
Interval Scale
A scale of measurement of data according to which the differences between values can be quantified in absolute but not relative terms and for which any zero is merely arbitrary; for instance, dates are measured on an interval scale, since differences can be measured in years but no sense can be given to a ratio of times.
Ratio Scale
A scale of measurement of data which permits the comparison of differences of values; a scale having a fixed zero value. The distances travelled by a projectile, for instance, are measured on a ratio scale, since it makes sense to talk of one projectile travelling twice as far as another.
Student's t-Test for Independent Samples
A calculation that tests the hypothesis that two sample means point to (or estimate) the same population mean. The term "independent" indicates that each subject is in one and only one sample.
Student's t-Test for Related Samples
A calculation that tests the hypothesis that a sample mean of measurement differences points to (or estimates) a population mean of zero. The term "related" indicates that subjects act as their own control or that two different subjects have been matched or linked in some way. Two measurements are collected per subject or pair, and the differences (one measurement subtracted from the other) are averaged to form the sample mean.
Standard Error for the Difference of Means
The standard deviation of the differences in sample means used to form a Sampling Distribution for the Difference of Two Means.
Pooled Variance
The weighted average of the variances of two independent samples, used for estimating the variance of a measurement in the population.
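
The two Student's t tests and the pooled variance can be sketched together, again assuming SciPy; all data are invented:

```python
import statistics
from scipy import stats

a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]

# Independent samples: each subject is in one and only one group.
t_ind, p_ind = stats.ttest_ind(a, b)       # assumes equal variances

# Pooled variance: df-weighted average of the two sample variances.
na, nb = len(a), len(b)
sp2 = ((na - 1) * statistics.variance(a) +
       (nb - 1) * statistics.variance(b)) / (na + nb - 2)

# Related samples: two measurements per subject (e.g., before/after).
pre  = [8.0, 7.5, 9.1, 6.8, 7.9]
post = [7.2, 7.4, 8.0, 6.1, 7.0]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(round(t_ind, 2), round(p_ind, 3), round(sp2, 3),
      round(t_rel, 2), round(p_rel, 3))
```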
Central Limit Theorem
Given a population distribution with a mean µ and standard deviation σ, the sampling distribution of the mean approaches a normal distribution with a mean of µ and a standard deviation (i.e., the standard error of the mean) equal to σ/√N as N, the sample size, increases.

The amazing and counter-intuitive aspect of the central limit theorem (formulated in 1810) is that the sampling distribution of the mean approaches normality no matter what the shape of the original distribution. In addition, for most distributions, a normal distribution is approached quickly as N increases.
Homogeneous / Heterogeneous Variance
Homogeneous variance refers to a variance that is similar among all study groups, while heterogeneous variance implies the opposite.
Satterthwaite Adjustment
The calculation that is done for an independent samples Student t-Test when the homogeneity of variance assumption is violated.
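
In SciPy this adjustment is requested with `equal_var=False` (the Welch/Satterthwaite form of the test); the data below are illustrative:

```python
from scipy import stats

a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 9.9, 1.4, 8.0, 2.1, 6.6]   # visibly more variable group

# Welch/Satterthwaite-adjusted t test for unequal variances.
t, p = stats.ttest_ind(a, b, equal_var=False)
print(round(t, 2), round(p, 3))
```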
One-tailed Test
A test of statistical significance in which the rival hypothesis is stated in one direction.
Two-tailed Test
A test of statistical significance in which the rival hypothesis is not stated in any particular direction.
Non-Parametric (Rank Sum) Tests
A class of tests of hypotheses that make no or few assumptions about the nature of a population distribution.
Mann-Whitney U Test
The non-parametric analog of the Student t-test for independent samples.
Wilcoxon Signed Ranks Test
The non-parametric analog of the Student t-test for related samples.
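
Both non-parametric analogs are available in SciPy; a quick sketch on made-up data:

```python
from scipy import stats

# Mann-Whitney U: independent samples, no normality assumption.
a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
u, p_u = stats.mannwhitneyu(a, b)

# Wilcoxon signed ranks: related samples (paired differences).
pre  = [8.0, 7.5, 9.1, 6.8, 7.9]
post = [7.2, 7.4, 8.0, 6.1, 7.0]
w, p_w = stats.wilcoxon(pre, post)

print(u, round(p_u, 3), w, round(p_w, 3))
```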
Sampling Distribution for the Difference of Two Means
A sampling distribution which is formed by selecting two random samples of equal size from the same population and then constructing a distribution of the differences of the means in each sample. This sampling distribution reveals what the difference between two sample means is expected to be when no treatment effects are present.