61 Cards in this Set

  • Front
  • Back

Data

Recorded values whether numbers or labels, together with their context

Data Table

An arrangement of data in which each row represents a case and each column represents a variable

Context

The context ideally tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed

Case

An individual about whom or which we have data

Respondent

Someone who answers, or responds to, a survey

Subject

A human experimental unit. Also called a participant

Participant

A human experimental unit. Also called a subject

Experimental Unit

An individual in a study for which or for whom data values are recorded. Human experimental units are usually called subjects or participants

Record

Information about an individual in a database

Sample

A subset of a population, examined in hope of learning about the population

Population

The entire group of individuals or instances about whom we hope to learn

Variable

A variable holds information about the same characteristic for many cases

Categorical Variable

A variable that names categories with words or numerals

Nominal Variable

The term "nominal" can be applied to a variable whose values are used only to name categories

Quantitative Variable

A variable in which the numbers are values of measured quantities with units

Units

A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams

Identifier Variable

A categorical variable that records a unique value for each case, used to name or identify it

Ordinal Variable

The term "ordinal" can be applied to a variable whose categorical values possess some kind of order

Frequency Table

A frequency table lists the categories in a categorical variable and gives the count (or percentage) of observations for each category
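
As a rough illustration of the definition above, the following Python sketch builds a frequency table with counts and percentages. The data values and the helper name `frequency_table` are invented for this example and do not come from the card set.

```python
from collections import Counter

# Hypothetical categorical data: one ticket class per case (invented for illustration).
ticket_class = ["First", "Second", "Third", "Third", "Crew", "Crew", "Crew", "Third"]

def frequency_table(values):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_table(ticket_class)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<8}{count:>3}{percent:>8.1f}%")
```

Each row pairs a category with its count and its share of all cases, so the percentages add up to 100.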

Distribution

The distribution of a variable gives the possible values of the variable and the relative frequency of each value

Area Principle

In a statistical display, each value should be represented by the same amount of area

Bar Chart

Bar charts show a bar whose area represents the count of observations for each category of a categorical variable

Pie Chart

Pie charts show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category

Categorical Data Condition

The methods in this chapter are appropriate for displaying and describing categorical data. Be careful not to use them with quantitative data

Contingency Table

A contingency table displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of the other

Marginal Distribution

In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins of the table

Conditional Distribution

The distribution of a variable restricting the who to consider only a smaller group of individuals is called the conditional distribution
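
The contingency table, marginal distribution, and conditional distribution described in these cards can be sketched in Python as follows. The cases and the function name `conditional_distribution` are assumptions made up for illustration only.

```python
from collections import Counter

# Hypothetical cases: (class, survival) pairs invented for illustration.
cases = [
    ("First", "Alive"), ("First", "Alive"), ("First", "Dead"),
    ("Third", "Alive"), ("Third", "Dead"), ("Third", "Dead"), ("Third", "Dead"),
]

# Contingency table: one count for every (class, survival) cell.
cells = Counter(cases)

# Marginal distribution of survival: totals taken over all classes.
marginal_survival = Counter(outcome for _, outcome in cases)

def conditional_distribution(cases, given_class):
    """Proportions of each survival outcome among cases in one class only."""
    outcomes = [outcome for cls, outcome in cases if cls == given_class]
    counts = Counter(outcomes)
    return {outcome: n / len(outcomes) for outcome, n in counts.items()}

first = conditional_distribution(cases, "First")
third = conditional_distribution(cases, "Third")
print(cells)
print(marginal_survival)
print(first, third)
```

The marginal counts ignore class entirely, while each conditional distribution restricts attention to one class before computing proportions.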

Independence

Variables are said to be independent if the conditional distribution of one variable is the same for each category of the other. We'll show how to check for independence in a later chapter

Segmented Bar Chart

A segmented bar chart displays the conditional distribution of a categorical variable within each category of another variable

Simpson's Paradox

When averages are taken across different groups, they can appear to contradict the overall averages. This is known as "Simpson's Paradox"
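
A small numeric sketch may make the reversal easier to see. The on-time flight counts below are invented for illustration, and the pilot labels and helper names are hypothetical.

```python
# Hypothetical on-time flight counts, invented to illustrate the reversal:
# (on-time flights, total flights) for each pilot under each condition.
records = {
    "Pilot A": {"day": (90, 100), "night": (10, 20)},
    "Pilot B": {"day": (19, 20), "night": (75, 100)},
}

def rate(on_time, total):
    """On-time proportion within a single group."""
    return on_time / total

def overall_rate(by_condition):
    """On-time proportion after pooling all groups together."""
    on_time = sum(o for o, _ in by_condition.values())
    total = sum(t for _, t in by_condition.values())
    return on_time / total

for pilot, by_condition in records.items():
    parts = ", ".join(f"{c}: {rate(*v):.1%}" for c, v in by_condition.items())
    print(f"{pilot}  {parts}, overall: {overall_rate(by_condition):.1%}")
```

In this made-up data, Pilot B has the higher on-time rate both by day and by night, yet Pilot A has the higher overall rate, because the two pilots fly very different mixes of day and night flights.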

Distribution

The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values falling into each bin

Histogram

A histogram uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency of values falling in each bin
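
The binning behind a histogram can be sketched in Python as below. This is only one possible approach: the function name `histogram_counts`, the half-open bin convention, and the sample data are all assumptions for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins [start + k*width, start + (k+1)*width).

    Values that fall outside the covered range are ignored in this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

data = [1, 2, 2, 3, 7, 8, 8, 9, 14]   # invented values
width = 5
counts = histogram_counts(data, start=0, width=width, n_bins=3)

# Crude text histogram: one star per value in the bin.
for k, c in enumerate(counts):
    print(f"[{k * width:>2}, {(k + 1) * width:>2})  {'*' * c}")
```

A bin with a count of zero would show up as an empty row, which is how a gap appears in the display.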

Gap

A region of the distribution where there are no values

Stem-and-Leaf Display

A display that shows quantitative data values in a way that sketches the distribution of the data. It's best described in detail by example

Dotplot

A dotplot graphs a dot for each case against a single axis

Shape

To describe the shape of a distribution, look for single vs. multiple modes, symmetry vs. skewness, outliers and gaps

Mode

A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed

Unimodal

Having one mode. This is a useful term for describing the shape of a histogram when it's generally mound-shaped

Bimodal

Distributions with two modes

Multimodal

Distributions with more than two modes

Uniform

A distribution that doesn't appear to have any mode and in which all the bars of its histogram are approximately the same height

Symmetric

A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other

Tails

The parts of a distribution that typically trail off on either side. Distributions can be characterized as having long tails or short tails

Skewed

A distribution is skewed if it's not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left when the longer tail stretches to the left, and skewed right when it goes to the right

Outliers

Outliers are extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there's no obvious way to tell. Don't delete outliers automatically - you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them. Boxplots display points more than 1.5 IQR from either end of the box individually, but this is just a rule-of-thumb and not a definition of what is an outlier

Center

The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of center include the mean and median

Median

The median is the middle value, with half of the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR
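
The rule above translates directly into a few lines of Python. This is a minimal sketch that assumes a non-empty list of numbers; the function name is chosen for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))      # odd n: the single middle value
print(median([4, 1, 3, 2]))   # even n: average of the two middle values
```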

Spread

A numerical summary of how tightly the values are clustered around the center. Measures of spread include the IQR and standard deviation

Range

The difference between the lowest and highest values in a data set

Quartile

The lower quartile is the value with a quarter of the data below it. The upper quartile has three quarters of data below it. The median and quartiles divide data into four parts with equal numbers of data values

Percentile

The ith percentile is the number that falls above i% of the data
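
One way to read this definition is to ask what percentage of the data a given value falls above. The sketch below does that with a hypothetical helper `percent_below`; software packages use several slightly different percentile conventions, so treat it as an illustration rather than a standard.

```python
def percent_below(values, x):
    """Percentage of data values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

data = list(range(1, 21))   # invented data: the integers 1 through 20
print(percent_below(data, 16))
```

Here 15 of the 20 values lie below 16, so 16 falls above 75% of the data and sits at about the 75th percentile under this reading.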

Interquartile Range

The IQR is the difference between the first and third quartiles. It is usually reported along with the median

5-Number Summary

The 5-number summary of a distribution reports the minimum value, Q1, the median, Q3, and the maximum value
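
The sketch below computes a 5-number summary and the IQR in Python. It assumes one particular quartile convention (the medians of the lower and upper halves, leaving out the middle value when n is odd); other conventions give slightly different quartiles. The data and function name are invented for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value so both halves have equal size.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
data_range = maximum - minimum
print(minimum, q1, med, q3, maximum, iqr, data_range)
```

The sketch needs at least two data values, since each half must contain something to take a median of.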

Boxplot

A boxplot displays the 5-number summary as a central box with whiskers that extend to the nonoutlying data values. Boxplots are particularly effective for comparing groups and for displaying possible outliers
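
The pieces of a boxplot can be computed as in the following sketch, which applies the 1.5 IQR rule of thumb mentioned in the Outliers card. The quartile convention (medians of the two halves), the helper names, and the data are assumptions for illustration.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (one of several conventions)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)

def boxplot_parts(values):
    """Whisker ends and individually plotted points under the 1.5 IQR rule of thumb."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Whiskers reach the most extreme values that are not flagged.
    return {"whiskers": (min(inside), max(inside)), "outliers": outliers}

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]   # invented data with one extreme value
parts = boxplot_parts(data)
print(parts)
```

Points flagged this way are only candidates for a closer look; the rule does not decide whether a value is really an outlier.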

Mean

The mean is found by summing all the data and dividing by the count: mean = (sum of the values) / n. It is usually paired with the standard deviation

Resistant

A calculated summary is said to be resistant if outliers have only a small effect on it
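
A quick comparison shows what resistance means in practice. In this sketch, built on invented data, one value is replaced by an extreme one and the shift in the mean is compared with the shift in the median.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]              # invented data
with_outlier = data[:-1] + [60]     # replace the largest value with an extreme one

mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print(f"mean moved by {mean_shift:.1f}, median moved by {median_shift}")
```

The median does not move at all here, while the mean is pulled far toward the extreme value, which is why the median is described as resistant and the mean is not.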

Variance

The variance is the sum of squared deviations from the mean, divided by the count minus 1. It is useful in calculations later in the book

Standard deviation

The standard deviation is the square root of the variance
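
The two definitions above can be written out step by step in Python. This is a minimal sketch on invented data, with the results compared against the standard library's `statistics.variance` and `statistics.stdev`, which use the same count-minus-1 divisor.

```python
import math
import statistics

def variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # invented data with mean 5
print(variance(data), statistics.variance(data))
print(standard_deviation(data), statistics.stdev(data))
```

For this data the squared deviations sum to 32, so the variance is 32/7. At least two values are needed, because the divisor is the count minus 1.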

Comparing Distributions

When comparing the distributions of several groups using histograms or stem-and-leaf displays, consider their shape, center, and spread

Comparing Boxplots

When comparing groups with boxplots, compare the shapes, compare the medians, compare the IQRs, and check for possible outliers

Timeplot

A timeplot displays data that changes over time. Often successive values are connected with lines to show trends more clearly. Sometimes a smooth curve is added to the plot to help show long-term patterns and trends