Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
99 Cards in this Set
- Front
- Back
regression line
|
-a straight line that describes how a response variable y changes as an explanatory variable x changes.
-used as a tool to predict the value of y given x |
|
fitting a line
|
drawing a line that comes as close as possible to the points
|
|
equation for a regression line
|
y=a+bx
a=intercept b=slope |
|
slope
|
rate of change in y as x changes
|
|
extrapolation
|
the use of the regression line for prediction far outside the range of values of the explanatory variable
|
|
least squares regression line
+/-? |
-error = observed y - predicted y
-it always passes through the point (xbar, ybar) -errors are positive for points above the line and negative for points below |
|
equation for least squares regression
|
if a = ybar - b(xbar)
and b = r(sy/sx), then the least squares regression formula is: yhat = a + bx |
|
b=r (sy/sx)
|
a change of one standard deviation in x corresponds to a change of r standard deviations in y
|
|
r squared
|
square of the correlation: the fraction of the variation in the values of y that is explained by the least squares regression of y on x
|
|
residuals
|
the difference between an observed value and the value predicted by the regression line
-the mean of the least squares residual is always 0 |
|
residuals formula
|
observed y-predicted y
y-yhat |
|
outlier
|
an observation that lies outside the overall pattern of the other observations
|
|
influential
|
an observation is ____ for a statistical calculation if removing it would markedly change the result of the calculation
|
|
lurking variable
|
a variable that is not among the explanatory or response variables in the study and yet may influence the response variable
|
|
r square in regression
|
the square of the correlation r square, is the fraction of the variation in the values of y that is explained by the least squares regression of y on x
|
|
r=0
|
means that there is no linear relationship between the two variables
|
|
ybar
|
average of the response variable
(rv1 + rv2 + ... + rvn)/# of subjects |
|
x
|
-always on horizontal axis
-explanatory variable |
|
explanatory variable
|
explains or causes changes in response variables
|
|
y
|
-always on the vertical axis
-response variable |
|
response variable
|
measures an outcome
|
|
total sum of squares
|
-square the deviation for each subject and sum the squares around the mean ybar
-SST |
|
formula for SST
|
=∑(y-ybar)squared
|
|
sum of squares of the errors
|
-SSE
-(y-yhat)squared |
|
yhat
|
the predicted value of y for a given x
|
|
sum of errors
|
always equals zero
|
|
r
|
correlation coefficient
|
|
correlation
|
-r is the slope of the least squares regression line when we measure both x and y in standardized units
-makes no distinction between explanatory and response variables -requires that both variables be quantitative -always satisfies -1≤r≤1 |
|
regression
|
the square of the correlation r is the fraction of the variance of one variable that is explained by least squares regression on the other variable
|
|
SRS
|
simple random sample
-each unit has the same chance of being chosen-unbiased -selection of one unit has no influence on the selection of another |
|
sample
|
-A sample is any subset of elements selected from the population.
-Subset of a population: the group of individuals from which one gathers data with the intention of making inferences to all individuals that fit those criteria (i.e., the population). |
|
stratified sampling
|
the population is divided into homogeneous groups (strata) and an srs is drawn from each group
|
|
cluster sampling
|
the population is grouped into small clusters and an srs of clusters is drawn
all objects in the selected clusters are observed |
|
systematic sampling
|
randomly choose a unit from a population, then select every kth unit thereafter
|
|
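A minimal Python sketch of systematic sampling; the population labels and k below are assumptions chosen for illustration:

```python
import random

def systematic_sample(population, k, seed=None):
    """Randomly choose one of the first k units, then take every kth unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)        # random starting position among the first k units
    return population[start::k]

population = list(range(1, 101))    # units labeled 1..100 (made up)
sample = systematic_sample(population, k=10, seed=1)
print(sample)                       # 10 units, each exactly k apart
```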
multistage sampling
|
the sampling is chosen in stages. most opinion polls and other national samples use this method
|
|
experiment
|
-deliberately imposes some treatment on individuals in order to observe their responses
-an act or process that leads to a single result or outcome that cannot be predicted |
|
observational study
|
observes individuals and measures variables of interest but does not attempt to influence the responses
|
|
explanatory variable
|
attempts to explain or is purported to cause differences in a response variable
|
|
experimental units
|
the individuals on which the experiment is performed
|
|
subjects
|
if human, this is what an experimental unit is called
|
|
treatment
|
a specific experimental condition applied to the units
|
|
three basic principles of experimental design
|
-replication
-control
-randomization |
|
replication
|
the same treatments are assigned to different experimental units
|
|
control
|
any method that accounts for and reduces natural variability
|
|
confounding variable
|
one whose effects on the response are indistinguishable from those of the explanatory variable
|
|
comparison
|
a form of control where 2 or more treatments are compared to prevent confounding the effect of a treatment with other variables
|
|
blocks
|
a form of control where similar experimental units are placed into groups
|
|
randomization
|
the use of chance to divide experimental units into groups
|
|
double blind
|
neither the subjects nor those who measure and read the results know which treatment was received
|
|
calculator key strokes to generate a simple random sample
|
clear lists
highlight list1 -f4:calc -4:probability -a:random seed __ -enter -highlight list1 -f4:calc -4:probability -5:randint(1, population size, number of samples needed) |
|
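The calculator's randInt can repeat labels; in Python the same SRS idea is one call to `random.sample`, which draws without repeats (the population size and sample size below are made up):

```python
import random

random.seed(42)        # like storing a random seed on the calculator (value is arbitrary)
population_size = 500  # assumption for illustration
n = 10                 # number of units needed

# draw n distinct labels, each unit equally likely to be chosen
srs = random.sample(range(1, population_size + 1), n)
print(sorted(srs))
```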
the logic of randomization
|
-produces two groups of similar subjects before treatment
-comparative design ensures that outside variables influence both groups equally -therefore the only reason for a difference in outcomes is the treatment |
|
calculator key strokes for least squares regression
|
-enter data for list1 and list2
-f4:calc -3:regressions -1:linreg(a+bx) -x:list1 -y:list2 -store regeqn: y1(x) -enter -enter |
|
statistically significant
|
an observed effect so large that it would rarely occur by chance
|
|
r values
|
-does not change when we change the units of measure of either x or y or both
- -1≤r≤1 - +r indicates a positive association between the variables - -r indicates a negative association - near 0 indicate a weak linear relationship -strongly affected by outliers |
|
sentence for the interpretation of a regression line
|
"in our example, the slope means that the mean reaction time is increasing at a rate of .7 seconds per percent over the sampled range of drug amounts from 1% to 5%"
|
|
voluntary response sample
|
people who choose themselves by responding to a general appeal-these are usually biased
|
|
random
|
we call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large # of repetitions
-a kind of order that only occurs in the long run |
|
probability
|
the proportion of times the outcome of a random phenomenon would occur in a very long series of repetitions
|
|
event
|
a specific collection of one or more outcomes
|
|
sample space
|
the collection of all possible outcomes of an experiment
|
|
equally likely outcomes
|
if a random phenomenon has k possible outcomes, all equally likely, then each individual outcome has probability 1/k. the probability of any event A is:
P(A) = outcomes in A ÷ outcomes in the sample space = outcomes in A ÷ k |
|
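A tiny Python check of the equally-likely rule, using a fair six-sided die as a hypothetical example:

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]   # k = 6 equally likely outcomes
A = [2, 4, 6]                       # event A: roll an even number

# P(A) = outcomes in A / k
p_A = Fraction(len(A), len(sample_space))
print(p_A)                          # 1/2
```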
probability rule #1
|
the probability of P(A) of any event A satisfies 0≤P(A)≤1
|
|
probability rule #2
|
if S is the sample space in a probability model, then P(S)=1
|
|
probability rule #3
the addition rule for disjoint events |
two events A and B are disjoint if they have no outcomes in common and so can never occur together. if A and B are disjoint
P(A or B) = P(A) + P(B) |
|
probability rule #4
the complement rule |
the complement of any event A is the event that A does not occur, written as Ac. the complement rule states
P(Ac)=1-P(A) |
|
probability rule #5
the multiplication rule for independent events |
two events A and B are independent if knowing that one occurs does not change the probability that the other occurs. if A and B are independent then:
P(A and B)=P(A)P(B) |
|
probability rule #6
the general addition rule for unions of two events |
for any two events A and B
P(A or B) = P(A) +P(B) - P(A and B) |
|
probability rule #7
multiplication rule |
the probability that both of the two events A and B happen together can be found by P(A and B) = P(A)P(B I A)
here P(B I A) is the conditional probability that B occurs given the information that A occurs |
|
venn diagram
|
a picture that shows the sample space S as a rectangular area and events as areas within S
|
|
contingency table
|
cross tabulation tables
present frequency counts for combinations of 2 or more variables |
|
marginal probabilities
|
probabilities of single events
|
|
joint probabilities
|
P(A and B)
|
|
conditional probability
|
when P(A)>0, the conditional probability of B given A is
P(B I A) = P(A and B) ÷ P(A) |
|
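The conditional probability formula can be checked against a contingency table of counts; the counts below are invented for illustration:

```python
# made-up cross tabulation of two events A and B
counts = {("A", "B"): 30, ("A", "notB"): 20,
          ("notA", "B"): 10, ("notA", "notB"): 40}
total = sum(counts.values())

p_A = (counts[("A", "B")] + counts[("A", "notB")]) / total  # marginal probability P(A)
p_A_and_B = counts[("A", "B")] / total                      # joint probability P(A and B)
p_B_given_A = p_A_and_B / p_A                               # P(B I A) = P(A and B) ÷ P(A)
print(p_B_given_A)
```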
union probabilities
|
P(A or B)
|
|
intersection
|
the intersection of events is the event that all of the events occur
|
|
tree diagram
|
a branching picture that shows the possible outcomes of a sequence of events (a contingency table is often easier to use instead)
|
|
bayes' rule
|
if A1, A2, ..., Ak are disjoint events whose probabilities are not 0 and add to 1, and C is another event whose probability is not 0 or 1, then
P(Ai I C) = [P(C I Ai)P(Ai)] / [P(C I A1)P(A1) + ... + P(C I Ak)P(Ak)] |
|
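A short Python sketch of bayes' rule; the priors and conditional probabilities below are hypothetical numbers chosen for illustration:

```python
# disjoint events A1..A3 with probabilities that add to 1 (numbers are assumptions)
priors = [0.5, 0.3, 0.2]        # P(A1), P(A2), P(A3)
likelihoods = [0.9, 0.5, 0.1]   # P(C I A1), P(C I A2), P(C I A3)

# denominator: P(C) = P(C I A1)P(A1) + ... + P(C I Ak)P(Ak)
denom = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / denom for p, l in zip(priors, likelihoods)]  # P(Ai I C)
print([round(p, 4) for p in posteriors])
```

The posteriors always add to 1, since the Ai are disjoint and cover all possibilities.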
independent events
|
two events A and B that both have positive probability are independent if
P(B I A) = P(B) |
|
random variable
|
a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each outcome
|
|
discrete random variable
|
-a countable number of possible values
-expressed with a table -a random variable that may assume either a finite number of values or an infinite sequence of values (number of units sold, customers who enter a bank in one day) |
|
continuous random variable
|
random variables such as weight and time, which may take on all values in a certain interval or collection of intervals
-uses density curves |
|
probability distribution
|
of a discrete random variable lists the possible values and their probabilities
|
|
two requirements for probabilities of discrete random variable
|
-every probability pi is a number between 0 and 1
-p1+p2+p3...+pk = 1 (also no numbers can be negative) |
|
≤ vs< in discrete random variable
|
they are not the same: P(X ≤ k) includes P(X = k), which can be greater than 0 for a discrete random variable
|
|
to find the mean of a discrete random variable
|
E(X)=µx=∑x p(x)
|
|
to find the variance of a discrete random variable
|
Var(X) = sigma squared x = ∑(x-µx)squared p(x)
|
|
to find the standard deviation of a discrete random variable
|
SD(X)=sigmax=square root of the variance
|
|
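The three formulas above can be sketched together in Python on a hypothetical distribution:

```python
import math

# hypothetical discrete random variable as (value, probability) pairs
dist = [(0, 0.1), (1, 0.3), (2, 0.4), (3, 0.2)]

mu = sum(x * p for x, p in dist)               # E(X) = µx = ∑ x p(x)
var = sum((x - mu) ** 2 * p for x, p in dist)  # Var(X) = ∑ (x-µx)squared p(x)
sd = math.sqrt(var)                            # SD(X) = square root of the variance
print(mu, round(var, 2), round(sd, 2))
```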
to find mean, variance and standard deviation of a discrete random variable on the calculator
|
enter values into list1, list2
-f4:calc -1:1-var stats -list: list1 -freq: list2 -enter -on the calc µx = xbar and ∑(x-xbar)squared = sigma squared x = variance |
|
continuous random variable
|
takes on all values in an interval of numbers
|
|
probability distribution of a continuous random variable
|
-described by a density curve
-is always on or above the horizontal axis -has an area exactly 1 underneath it |
|
uniform distribution
|
-shape of a box
-find the height by asking what number times the interval width along the x axis equals 1 (interval between 2 and 6 = 4; 4 x __ = 1; __ = 1/4 = .25) |
|
the probability density function
|
f(X) = 1/4, 2≤x≤6
0, elsewhere |
|
≤ vs<
|
for continuous random variables these are equal, because P(X = k) = 0 for any single value k
|
|
µx on a uniform distribution
|
(lowest x + highest x) ÷ 2 = mean
|
|
percentage from a normal distribution
|
from the question find:
-x = what is varying -standard deviation -mean example: P(x≥80) = normalcdf(80, ∞, 75, 7.5) |
|
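Without a calculator, the same normal probability can be sketched in Python via the error function; the mean 75 and standard deviation 7.5 come from the card's example:

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# the card's example: P(x ≥ 80) = normalcdf(80, ∞, 75, 7.5)
p = 1 - normal_cdf(80, 75, 7.5)
print(round(p, 4))
```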
specific number from a percentage on a normal distribution
|
P(x≤k)=.98
k = invNorm(.98, 75, 7.5) |
|
rules for means
|
rule 1: if X is a random variable and a and b are fixed numbers, then
µ(a+bX) = a + bµx rule 2: if X and Y are random variables, then µ(x+y) = µx + µy and µ(x-y) = µx - µy |
|
rules for variances
|
rule 1: if X is a random variable and a and b are fixed numbers, then
sigma squared (a+bX) = b squared (sigma squared x) (adding the fixed number a does not change the spread) rule 2: if X and Y are independent random variables, then sigma squared (x+y) = sigma squared x + sigma squared y and sigma squared (x-y) = sigma squared x + sigma squared y (variances add even for a difference) |
|
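The variance rules can be verified exactly on small discrete distributions; a Python sketch (the two distributions are invented, and chosen to be independent):

```python
def mean(dist):                    # dist: list of (value, probability) pairs
    return sum(x * p for x, p in dist)

def var(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist)

X = [(0, 0.5), (2, 0.5)]           # made-up independent random variables
Y = [(1, 0.25), (5, 0.75)]

# rule 1: sigma squared (a+bX) = b squared (sigma squared x)
a, b = 3, 4
shifted = [(a + b * x, p) for x, p in X]
print(var(shifted), b ** 2 * var(X))       # both 16.0

# rule 2: for independent X and Y the variances add, for a sum or a difference
x_plus_y = [(x + y, px * py) for x, px in X for y, py in Y]
x_minus_y = [(x - y, px * py) for x, px in X for y, py in Y]
print(var(x_plus_y), var(x_minus_y))       # both 4.0 = var(X) + var(Y)
```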
the law of large numbers
|
draw independent observations at random from any population with finite mean. as the number of observations drawn increases, the mean of the observed values eventually approaches the mean of the population as closely as you specify and then stays that close
|
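A quick simulation of the law of large numbers in Python; the fair die and the sample size are arbitrary choices for illustration:

```python
import random

random.seed(0)                      # fixed seed so the run is repeatable
n = 100_000
total = 0
for _ in range(n):
    total += random.randint(1, 6)   # one fair-die observation; population mean is 3.5

running_mean = total / n
print(round(running_mean, 3))       # close to 3.5, and gets closer as n grows
```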