31 Cards in this Set
 Front
 Back
labeled scatterplot

device for including information from a categorical variable into a scatterplot; assigns different labels to the dots in a scatterplot


scatterplot

the simplest graph for displaying two quantitative variables simultaneously; it uses a vertical axis for one of the variables and a horizontal axis for the other. A dot is placed for each observational pair at the intersection of its two values.


response variable

the variable to be predicted; the convention is to place this variable on the vertical axis


explanatory variable

the variable to do the predicting; the convention is to place this variable on the horizontal axis


positively associated

if larger values of one variable tend to occur with larger values of the other variable


negatively associated

if larger values of one variable tend to occur with smaller values of the other


correlation coefficient

a measure of the linear relationship between two quantitative variables
has to be between -1 and +1. It can equal one of those values when the observations form a perfectly straight line.
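
As an illustration of the bounds described on this card, here is a minimal sketch that computes the correlation coefficient by hand. The function name `correlation` and the data points are hypothetical, chosen so that the points lie exactly on a straight line.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: both sets of points fall on a perfectly straight line
r_up = correlation([1, 2, 3, 4], [3, 5, 7, 9])    # rising line
r_down = correlation([1, 2, 3, 4], [9, 7, 5, 3])  # falling line
```

With a rising line the result sits at the upper bound, and with a falling line at the lower bound; scattered data would give a value strictly between the two.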

difference between association and causation

two variables may be strongly associated without a cause-and-effect relationship existing between them. Often the explanation is that both variables are related to a third variable not being measured; this variable is often called a lurking or confounding variable.


least squares regression

a technique for modeling the relationship between two quantitative variables


least squares

a criterion that says to choose the line that minimizes the sum of squared vertical distances from the points to the line
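
The criterion can be sketched in a few lines. The helper names and the data below are hypothetical; the slope and intercept use the standard closed-form least squares solution, and the comparison line is an arbitrary alternative chosen only to show that it has a larger sum of squared vertical distances.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
sse_best = sum_squared_errors(xs, ys, intercept, slope)
# An arbitrary competing line: it should do no better than the fitted one
sse_other = sum_squared_errors(xs, ys, 2.0, 0.7)
```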


prediction

one can use the regression line to predict the value of the y-variable for a given value of the x-variable simply by plugging that value of x into the equation of the regression line; this amounts to finding the y-value of the point on the regression line corresponding to the x-value of interest


extrapolation

trying to predict y for values of x beyond those contained in the data
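
A minimal sketch of prediction and of flagging extrapolation, assuming a regression line has already been fitted. The line, the observed x-values, and the function names are hypothetical.

```python
def predict(intercept, slope, x):
    """Plug x into the regression equation y = intercept + slope * x."""
    return intercept + slope * x

def is_extrapolation(observed_xs, x):
    """True when x lies outside the range of x-values in the data."""
    return x < min(observed_xs) or x > max(observed_xs)

# Hypothetical fitted line and observed x-values, for illustration only
intercept, slope = 2.2, 0.6
observed_xs = [1, 2, 3, 4, 5]

inside = predict(intercept, slope, 3)     # x = 3 is within the data
outside = predict(intercept, slope, 12)   # x = 12 is beyond the data
flag_inside = is_extrapolation(observed_xs, 3)
flag_outside = is_extrapolation(observed_xs, 12)
```

The arithmetic is the same in both cases; the flag only signals that the second prediction relies on the line holding beyond the observed range.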


fit

the part that is explained by the model


residual

the "leftover" part that is either the result of chance variation or of variables not measured


fitted value

the y-value that the regression line would predict for the x-value of that observation


residual

the difference between the actual y-value and the fitted value ŷ; measures the vertical distance from the observed y-value to the regression line
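
Fitted values and residuals can be sketched as follows. The data are hypothetical, and the intercept and slope are the least squares values for these particular points.

```python
# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for the data above: y = 2.2 + 0.6 * x
intercept, slope = 2.2, 0.6

# Fitted value: the y-value the line predicts at each observed x-value
fitted = [intercept + slope * x for x in xs]

# Residual: actual y-value minus fitted value (signed vertical distance)
residuals = [y - f for y, f in zip(ys, fitted)]
```

A positive residual means the point lies above the line and a negative one means it lies below; for a least squares line the residuals sum to zero up to rounding.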


proportion of variability

the square of the correlation coefficient, written r^2; it gives the proportion of the variability in the y-variable that is explained by the least squares line, so it provides a measure of how closely the points fall to the line and thus also an indication of how confident one can be of predictions made with the line
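
As a sketch of why r^2 reads as a proportion, the snippet below computes it two ways on hypothetical data: by squaring the correlation coefficient, and as one minus the ratio of leftover (residual) variation to total variation in y. The variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y

# Route 1: square the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: 1 - (variation left in the residuals) / (total variation)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - sse / syy
```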


residual plots

can be used to indicate when the linear relationship is not a satisfactory model for describing the relationship between two variables


transformation

can be used to create a linear relationship between variables


outliers

observations with large (in absolute value) residuals; outliers fall far from the regression line, not following the pattern of the relationship apparent in the others


influential observation

one whose removal would substantially affect the regression line


side-by-side stemplot

a common set of stems is used in the middle of the display, with leaves for each category branching out in either direction, one to the left and one to the right
the convention is to order the leaves from the middle out toward either side

statistical tendency

pertains to average or typical cases but not necessarily to individual cases


outlier

observation lying more than 1.5 times the interquartile range away from the nearer quartile


modified boxplot

treats outliers differently by marking them with a special symbol (*) and then extending the boxplot's "whiskers" only to the most extreme non-outlying value
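
The 1.5 × IQR rule and the whisker ends of a modified boxplot can be sketched as below. Textbooks differ in how they compute quartiles; using `statistics.quantiles` with `method="inclusive"` is an assumption made here, and the function name and data are hypothetical.

```python
import statistics

def split_outliers(data):
    """Return (outliers, (low_whisker, high_whisker)) using the 1.5 * IQR rule."""
    # Quartile convention is an assumption; other conventions shift the fences
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Whiskers stop at the most extreme non-outlying values
    return outliers, (min(inside), max(inside))

# Hypothetical data with one unusually large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
outliers, whiskers = split_outliers(data)
```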


two-way table

classifies each person according to two variables
in a 2 x 3 table, the first number represents the number of categories of the row variable, and the second number represents the number of categories of the column variable; the explanatory variable should be in columns and the response variable in rows

marginal and conditional distributions

marginal distributions: distributions of one variable alone, ignoring the other variable
conditional distributions: distributions of one variable for given categories of the other variable
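
A minimal sketch of both kinds of distribution for a two-way table. The counts, category labels, and function names are hypothetical; the table is stored by category of the explanatory variable, with the response counts inside each.

```python
# Hypothetical counts: explanatory variable (group) -> response counts
counts = {
    "treatment": {"improved": 30, "not improved": 10},
    "control": {"improved": 20, "not improved": 20},
}

def conditional_distributions(counts):
    """Distribution of the response within each explanatory category."""
    result = {}
    for group, column in counts.items():
        total = sum(column.values())
        result[group] = {outcome: c / total for outcome, c in column.items()}
    return result

def marginal_distribution(counts):
    """Distribution of the response with the explanatory variable ignored."""
    totals = {}
    for column in counts.values():
        for outcome, c in column.items():
            totals[outcome] = totals.get(outcome, 0) + c
    grand_total = sum(totals.values())
    return {outcome: c / grand_total for outcome, c in totals.items()}

conditional = conditional_distributions(counts)
marginal = marginal_distribution(counts)
```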


segmented bar graph

a visual representation of conditional distributions
the bars contain segments whose lengths correspond to the conditional proportions

Simpson's paradox

aggregate proportions can reverse the direction of the relationship seen in the individual pieces
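
The reversal can be reproduced with a small made-up example. All counts below are hypothetical, given as (successes, total) for two treatments A and B within two groups.

```python
# Hypothetical (successes, total) counts for treatments A and B in two groups
results = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

# Success proportion for each treatment within each group
within = {
    group: {t: s / n for t, (s, n) in arms.items()}
    for group, arms in results.items()
}

def aggregate(treatment):
    """Success proportion after pooling the groups together."""
    successes = sum(arms[treatment][0] for arms in results.values())
    total = sum(arms[treatment][1] for arms in results.values())
    return successes / total

overall = {"A": aggregate("A"), "B": aggregate("B")}
```

Here A has the higher proportion inside each group, yet B has the higher proportion once the groups are pooled, because the two treatments are spread very unevenly across the groups.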


independence

two categorical variables are said to be independent if the conditional distributions of one variable are identical for every category of the other variable
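
This definition can be checked directly by comparing conditional distributions. The sketch below uses hypothetical counts and a hypothetical helper `is_independent`; with real sample data the proportions rarely match exactly, so the tolerance here only guards against floating-point rounding.

```python
import math

def is_independent(counts, tol=1e-9):
    """True when every category has the same conditional distribution."""
    distributions = []
    for column in counts.values():
        total = sum(column.values())
        distributions.append({k: v / total for k, v in column.items()})
    first = distributions[0]
    return all(
        math.isclose(dist[k], first[k], abs_tol=tol)
        for dist in distributions
        for k in first
    )

# Hypothetical tables: category of one variable -> counts of the other
independent_table = {
    "group A": {"yes": 20, "no": 30},
    "group B": {"yes": 40, "no": 60},
}
dependent_table = {
    "group A": {"yes": 20, "no": 30},
    "group B": {"yes": 50, "no": 50},
}

same = is_independent(independent_table)
different = is_independent(dependent_table)
```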


relative risk

the ratio of the proportions having the disease between the two groups of the explanatory variable
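
As a minimal sketch, relative risk is one proportion divided by the other. The function name and the counts are hypothetical.

```python
def relative_risk(cases_1, total_1, cases_2, total_2):
    """Proportion with the disease in group 1 divided by that in group 2."""
    return (cases_1 / total_1) / (cases_2 / total_2)

# Hypothetical counts: 30 of 100 in one group have the disease, 10 of 100 in the other
rr = relative_risk(30, 100, 10, 100)
```

A value above 1 means the disease is more common in the first group, and a value of 1 means the two proportions are equal.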
