61 Cards in this Set
- Front
- Back
Data |
Recorded values whether numbers or labels, together with their context |
|
Data Table |
An arrangement of data in which each row represents a case and each column represents a variable |
|
Context |
The context ideally tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed |
|
Case |
An individual about whom or which we have data |
|
Respondent |
Someone who answers, or responds to, a survey |
|
Subject |
A human experimental unit. Also called a participant |
|
Participant |
A human experimental unit. Also called a subject |
|
Experimental Unit |
An individual in a study for which or for whom data values are recorded. Human experimental units are usually called subjects or participants |
|
Record |
Information about an individual in a database |
|
Sample |
A subset of a population, examined in the hope of learning about the population |
|
Population |
The entire group of individuals or instances about whom we hope to learn |
|
Variable |
A variable holds information about the same characteristic for many cases |
|
Categorical Variable |
A variable that names categories with words or numerals |
|
Nominal Variable |
The term "nominal" can be applied to a variable whose values are used only to name categories |
|
Quantitative Variable |
A variable in which the numbers are values of measured quantities with units |
|
Units |
A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams |
|
Identifier Variable |
A categorical variable that records a unique value for each case, used to name or identify it |
|
Ordinal Variable |
The term "ordinal" can be applied to a variable whose categorical values possess some kind of order |
|
Frequency Table |
A frequency table lists the categories in a categorical variable and gives the count (or percentage) of observations for each category |
|
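A minimal Python sketch of a frequency table, assuming the categorical variable is stored as a plain list of labels. The helper name `frequency_table` and the survey responses are hypothetical. Each category gets its count and its percentage of all observations.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percentage of all observations)."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() lists categories from most to least frequent
    return {category: (count, 100 * count / total)
            for category, count in counts.most_common()}

# Hypothetical responses to a single survey question
responses = ["Yes", "No", "Yes", "Unsure", "Yes", "No"]
for category, (count, pct) in frequency_table(responses).items():
    print(f"{category:<8}{count:>4}{pct:>8.1f}%")
```

The percentage column is the relative frequency of each category, so it describes the same distribution as the counts.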
Distribution |
The distribution of a variable gives the possible values of the variable and the relative frequency of each value |
|
Area Principle |
In a statistical display, each value should be represented by the same amount of area |
|
Bar Chart |
Bar charts show one bar for each category of a categorical variable; the area of each bar represents the count of observations in that category |
|
Pie Chart |
Pie charts show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category |
|
Categorical Data Condition |
The methods in this chapter are appropriate for displaying and describing categorical data. Be careful not to use them with quantitative data |
|
Contingency Table |
A contingency table displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of the other |
|
Marginal Distribution |
In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins of the table |
|
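A minimal sketch of building a contingency table and reading off its marginal distributions, assuming each case is recorded as a (row category, column category) pair. The function names and the data are hypothetical. Summing across each row gives one margin; summing down each column gives the other.

```python
from collections import Counter

def contingency_table(pairs):
    """Count the cases in each (row category, column category) cell."""
    return Counter(pairs)

def marginal_distributions(table):
    """Total the cell counts for each row category and each column category."""
    row_totals, col_totals = Counter(), Counter()
    for (row, col), count in table.items():
        row_totals[row] += count
        col_totals[col] += count
    return dict(row_totals), dict(col_totals)

# Hypothetical cases recorded on two categorical variables: (group, answer)
cases = [("A", "Yes"), ("A", "No"), ("B", "No"), ("B", "No"),
         ("B", "Yes"), ("A", "Yes"), ("B", "No")]

table = contingency_table(cases)
row_totals, col_totals = marginal_distributions(table)
grand_total = sum(table.values())
print("Row margin:", row_totals)
print("Column margin:", col_totals)
print("Column margin (%):",
      {c: round(100 * n / grand_total, 1) for c, n in col_totals.items()})
```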
Conditional Distribution |
The distribution of a variable when we restrict the cases considered (the "who") to only a smaller group of individuals is called the conditional distribution |
|
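A minimal sketch of a conditional distribution, again assuming cases are stored as (row category, column category) pairs. The helper `conditional_distribution` and the data are hypothetical. Only the cases in the chosen row category are kept, and percentages are taken within that smaller group.

```python
from collections import Counter

def conditional_distribution(pairs, given_row):
    """Percentage of each column category among only the cases in given_row."""
    subset = [col for row, col in pairs if row == given_row]
    if not subset:
        raise ValueError(f"no cases in row category {given_row!r}")
    counts = Counter(subset)
    return {col: 100 * count / len(subset) for col, count in counts.items()}

# Hypothetical cases: (group, answer)
cases = [("A", "Yes"), ("A", "No"), ("B", "No"), ("B", "No"),
         ("B", "Yes"), ("A", "Yes"), ("B", "No")]
print(conditional_distribution(cases, "A"))
print(conditional_distribution(cases, "B"))
```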
Independence |
Variables are said to be independent if the conditional distribution of one variable is the same for each category of the other. We'll show how to check for independence in a later chapter |
|
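A rough descriptive check of the definition above, assuming (row, column) pairs as input: compute the conditional distribution of the column variable within every row category and see whether they all match. This is only a sketch of the definition with hypothetical names and data, not the formal test for independence mentioned on the card; with real sample data the distributions will rarely match exactly.

```python
from collections import Counter

def conditional_distributions(pairs):
    """Map each row category to the proportion of each column category within it."""
    by_row = {}
    for row, col in pairs:
        by_row.setdefault(row, Counter())[col] += 1
    columns = {col for _, col in pairs}
    return {
        row: {col: counts[col] / sum(counts.values()) for col in columns}
        for row, counts in by_row.items()
    }

def same_conditional_distributions(pairs, tol=1e-9):
    """True if every row category has (within tol) the same conditional distribution."""
    dists = list(conditional_distributions(pairs).values())
    first = dists[0]
    return all(abs(d[col] - first[col]) <= tol for d in dists[1:] for col in first)

# Hypothetical data: both groups split 50/50, then groups that split differently
balanced = [("A", "Yes"), ("A", "No"), ("B", "Yes"), ("B", "No"), ("B", "Yes"), ("B", "No")]
unbalanced = [("A", "Yes"), ("A", "Yes"), ("A", "No"), ("B", "No"), ("B", "No"), ("B", "Yes")]
print(same_conditional_distributions(balanced))
print(same_conditional_distributions(unbalanced))
```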
Segmented Bar Chart |
A segmented bar chart displays the conditional distribution of a categorical variable within each category of another variable |
|
Simpson's Paradox |
When averages are taken across different groups, they can appear to contradict the overall averages. This is known as "Simpson's Paradox" |
|
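A small numerical illustration with made-up counts (not real data): option A has the higher success rate within each group, yet the lower rate once the groups are combined. The names `group_rates` and `overall_rate` are hypothetical helpers.

```python
# Hypothetical counts: (successes, attempts) for each option within each group
results = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}

def group_rates(groups):
    """Success rate computed separately within each group."""
    return {g: s / n for g, (s, n) in groups.items()}

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    total_s = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    return total_s / total_n

for option, groups in results.items():
    print(option, group_rates(groups), "overall:", round(overall_rate(groups), 3))
```

The reversal happens because most of A's attempts fall in group 2, where success is harder for both options, while most of B's attempts fall in the easier group 1.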
Distribution |
The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values falling into each bin |
|
Histogram |
A histogram uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency of values falling in its bin |
|
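A minimal sketch of the binning behind a histogram, assuming integer-valued data and bins of the form [left, left + width). The helper `bin_counts` and the data are hypothetical. Empty bins are kept with a count of 0 so that gaps in the distribution stay visible in the text histogram.

```python
import math

def bin_counts(values, width):
    """Count values in equal-width bins [left, left + width); returns (left, count) pairs."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    start = math.floor(min(values) / width) * width
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        # a value exactly on a boundary goes into the bin to its right
        counts[math.floor((v - start) / width)] += 1
    return [(start + i * width, c) for i, c in enumerate(counts)]

# Hypothetical data; the empty 30-40 bin shows up as a gap
data = [12, 15, 17, 21, 22, 24, 25, 41]
for left, count in bin_counts(data, 10):
    print(f"{left:>3}-{left + 10:<3} {'#' * count}")
```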
Gap |
A region of the distribution where there are no values |
|
Stem-and-Leaf Display |
A display that shows quantitative data values in a way that sketches the distribution of the data. It's best described in detail by example |
|
Dotplot |
A dotplot graphs a dot for each case against a single axis |
|
Shape |
To describe the shape of a distribution, look for single vs. multiple modes, symmetry vs. skewness, outliers and gaps |
|
Mode |
A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed |
|
Unimodal |
Having one mode. This is a useful term for describing the shape of a histogram when it's generally mound-shaped |
|
Bimodal |
Distributions with two modes |
|
Multimodal |
Distributions with more than two modes |
|
Uniform |
A distribution that doesn't appear to have any mode and in which all the bars of its histogram are approximately the same height |
|
Symmetric |
A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other |
|
Tails |
The parts of a distribution that typically trail off on either side. Distributions can be characterized as having long tails or short tails |
|
Skewed |
A distribution is skewed if it's not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left when the longer tail stretches to the left, and skewed right when it goes to the right |
|
Outliers |
Outliers are extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there's no obvious way to tell. Don't delete outliers automatically; you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them. Boxplots display points more than 1.5 IQR from either end of the box individually, but this is just a rule of thumb and not a definition of what an outlier is |
|
Center |
The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of center include the mean and median |
|
Median |
The median is the middle value, with half of the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR |
|
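A direct translation of the median rule into Python: sort the data, take the middle value when n is odd, and average the two middle values when n is even. The name `median_of` is hypothetical; Python's standard `statistics.median` follows the same rule.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

print(median_of([7, 1, 3]))      # 3
print(median_of([7, 1, 3, 10]))  # 5.0
```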
Spread |
A numerical summary of how tightly the values are clustered around the center. Measures of spread include the IQR and standard deviation |
|
Range |
The difference between the highest and lowest values in a data set (maximum minus minimum) |
|
Quartile |
The lower quartile is the value with a quarter of the data below it. The upper quartile has three quarters of the data below it. The median and quartiles divide the data into four parts with equal numbers of data values |
|
Percentile |
The ith percentile is the number that falls above i% of the data |
|
Interquartile Range |
The IQR is the difference between the first and third quartiles. It is usually reported along with the median |
|
5-Number Summary |
The 5-number summary of a distribution reports the minimum value, Q1, the median, Q3, and the maximum value |
|
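A minimal sketch of the 5-number summary. It assumes one common quartile convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when n is odd. Other software uses slightly different conventions, so quartiles can differ a little between tools. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([7, 1, 3, 9, 5, 11, 2])
print(summary)
print("IQR =", summary[3] - summary[1])
```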
Boxplot |
A boxplot displays the 5-number summary as a central box with whiskers that extend to the nonoutlying data values. Boxplots are particularly effective for comparing groups and for displaying possible outliers |
|
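A sketch of the pieces a boxplot draws, using the 1.5 IQR rule of thumb from the Outliers card. Quartiles come from the standard library's `statistics.quantiles(..., method="inclusive")`, which is one of several quartile conventions, so the box edges may differ slightly from hand calculations. The helper `boxplot_parts` and the data are hypothetical.

```python
from statistics import quantiles

def boxplot_parts(values):
    """Box, whisker ends, 1.5 IQR fences, and points plotted individually as outliers."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "box": (q1, med, q3),
        "whiskers": (min(inside), max(inside)),  # extend to the nonoutlying extremes
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }

print(boxplot_parts([2, 3, 4, 5, 6, 7, 30]))
```

The whiskers stop at the most extreme values inside the fences, not at the fences themselves.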
Mean |
The mean is found by summing all the data values and dividing by the count: $\bar{y} = \frac{\sum y}{n}$. It is usually paired with the standard deviation |
|
Resistant |
A calculated summary is said to be resistant if outliers have only a small effect on it |
|
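A quick comparison showing why the median is resistant and the mean is not. The data are hypothetical, with one value changed as if mistyped (16 entered as 160); the standard library's `statistics.mean` and `statistics.median` do the arithmetic.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 16]
with_outlier = data[:-1] + [160]  # hypothetical entry error: 16 recorded as 160

print("mean:", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```

One extreme value moves the mean from 13.2 to 42, while the median stays at 13.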
Variance |
The variance is the sum of squared deviations from the mean divided by the count minus 1. It is useful in calculations later in the book |
|
Standard deviation |
The standard deviation is the square root of the variance |
|
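A minimal sketch of the variance and standard deviation as defined on the two cards above, dividing by n - 1. The function names and data are hypothetical; the results should agree with the standard library's `statistics.variance` and `statistics.stdev`, which use the same n - 1 denominator.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    ybar = sum(values) / n
    return sum((y - ybar) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(sample_variance(data))     # 32 / 7
print(sample_sd(data))
```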
Comparing Distributions |
When comparing the distributions of several groups using histograms or stem-and-leaf displays, consider their shape, center, and spread |
|
Comparing Boxplots |
When comparing groups with boxplots, compare the shapes, the medians, and the IQRs, and check for possible outliers |
|
Timeplot |
A timeplot displays data that changes over time. Often successive values are connected with lines to show trends more clearly. Sometimes a smooth curve is added to the plot to help show long-term patterns and trends |
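One simple way to produce a smooth curve for a timeplot is a centered moving average; the card does not specify a smoother, so this is only an illustration, and textbooks and software often use other methods. The function name, window size, and series are hypothetical. Each smoothed value is the average of the observation at time i and its neighbours on either side.

```python
def centered_moving_average(values, window=3):
    """(time index, average of the window centered at that index) for each full window."""
    if window < 1 or window % 2 == 0 or window > len(values):
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    return [
        (i, sum(values[i - half:i + half + 1]) / window)
        for i in range(half, len(values) - half)
    ]

series = [3, 5, 4, 6, 8, 7, 9]  # hypothetical values at times 0..6
print(centered_moving_average(series))
```

The smoothed series is shorter than the original because the first and last observations have no full window around them.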