170 Cards in this Set
- Front
- Back
trial
|
one repetition of an experimental situation
|
|
outcome
|
what happens in the trial
|
|
relative frequency
|
# times an outcome of interest occurs/total # of trials
|
|
regularity
|
the idea that, in the long run, outcomes occur in a set proportion
|
|
law of large numbers
|
the long-run relative frequency of repeated independent trials gets closer to the true probability as the number of trials increases
|
|
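A minimal Python sketch of the law of large numbers, assuming a fair coin simulated with the standard `random` module. The function name `relative_frequency`, the seed, and the trial counts are illustrative choices, not part of the original cards.

```python
import random

def relative_frequency(n_trials, p_heads=0.5, seed=0):
    """Simulate n_trials independent coin flips; return the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_heads)
    return heads / n_trials

# The relative frequency should settle near 0.5 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

Small runs can land far from 0.5; only the long-run proportion is expected to be close to the true probability.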
sample space
|
list of all possible outcomes in a probability experiment
|
|
event
|
an outcome or collection of outcomes in a probability experiment
ex: A= HH (getting heads then heads again for flipping a coin) |
|
equally likely outcomes
|
have the same probability of occurring
|
|
probability
|
# outcomes in an event/# of total outcomes (when all outcomes are equally likely)
|
|
rules of probability
|
P is always between 0 and 1
probabilities of all possible outcomes sum to 1 |
|
complement rule
|
P(Ac) = 1 - P(A), where Ac is the complement of A
|
|
Addition rule
|
If two events are mutually exclusive, then P(A or B) = P(A) + P(B)
|
|
mutually exclusive
|
no outcomes in common
|
|
multiplication rule
|
if two events are independent of each other, then P(A and B) = P(A) x P(B)
|
|
independent
|
the outcome of A does not depend on the outcome of B
|
|
disjoint
|
when two events have nothing in common
|
|
what is one way to tell that events are independent
|
see if selection of one event affects the probability of the other
OR check whether P(A and B) = P(A) * P(B) |
|
General addition rule
|
P(A or B) = P(A) + P(B) - P(A and B)
|
|
what does the word and imply
|
that the events occur at the same time
|
|
what does the word or imply
|
that event A occurs, event B occurs, or both occur at the same time
|
|
conditional probability
|
P(B|A) = P(A and B)/P(A)
|
|
How is P(B|A) read
|
the probability of event B given that event A has already occurred
|
|
General multiplication rule
|
P(A and B) = P(A) x P(B|A)
|
|
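The general addition rule, conditional probability, and the general multiplication rule can be checked by listing a small sample space. The sketch below assumes two fair dice; the events A and B and the helper `prob` are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == 6}      # first die shows 6
B = {o for o in sample_space if sum(o) >= 10}   # total is at least 10

p_a_and_b = prob(A & B)                         # outcomes in both events
p_a_or_b = prob(A) + prob(B) - p_a_and_b        # general addition rule
p_b_given_a = p_a_and_b / prob(A)               # conditional probability P(B|A)

# General multiplication rule: P(A and B) = P(A) x P(B|A)
assert p_a_and_b == prob(A) * p_b_given_a
print(p_a_or_b, p_b_given_a)
```

Here P(B|A) differs from P(B), so these two events are not independent.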
what does the box in a venn diagram represent
|
the sample space
|
|
what does a circle in a venn diagram represent
|
an individual event
|
|
If events in a venn diagram do not overlap what does that imply
|
that the events are disjoint
|
|
premise of conditional probability
|
finding the probability of one event given that a second event has already occurred
|
|
What does an event to the power C represent
|
the complement
ex: if a bag holds only blue and red marbles and A is choosing a red marble, then Ac is choosing a blue marble |
|
what does truly random mean
|
that every outcome is equally likely to happen
|
|
random variable
|
a variable whose value is a numerical outcome of some random event
for example the number of heads that occur during 10 coin flips is a random variable because it is subject to a random event |
|
what is a typical notation for a random variable
|
X or Y
|
|
discrete random variable
|
has a finite (or countable) number of possible values
|
|
continuous random variable
|
can contain any value within a range of values
|
|
what does a probability distribution of X (a random variable) mean?
|
it describes the values the random variable can take and the probability of each value (or range of values)
|
|
how is the probability of a continuous random variable described
|
by a density curve
|
|
How is probability determined by a density curve?
|
It is the area underneath the curve between the values in which the continuous variable falls
|
|
how does one distinguish between a variable of interest and a random variable
|
variable of interest represents the piece of information one wishes to receive from just one observation. (ex: what happens when we flip a coin? we get a heads or tail)
random variable: combines all the observations together. Keeping track of the number of times an outcome occurs. (ex: the number of times a coin flip comes up heads) |
|
What is a nuance if the variable of interest is quantitative
|
the random variable will be the same as well
|
|
If the variable of interest is categorical, the random variable will be
|
discrete
|
|
If the variable of interest is quantitative then the random variable is
|
continuous
|
|
when using density curves to calculate the probability under a curve, what is essential to remember
|
that the area under the curve is 1
therefore, for a rectangular (uniform) density curve, l x w = 1 |
|
on a continuous random variable density curve, what is the probability of a single value
|
zero because it does not have a width on the graph
|
|
what are the mean and standard deviation of a standard normal curve
|
mean is 0 and standard deviation is 1
|
|
z score
|
the number of standard deviations an observation is from the mean
|
|
what does it mean if a z statistic is positive? what does it mean if it is negative?
|
positive- greater than the mean
negative- less than the mean |
|
what does a z score of zero mean
|
it is equal to the mean
|
|
how does one calculate a z score
|
(observed value - mean) / standard deviation
z = (x - u) / o |
|
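A short Python sketch of the z score calculation, with the standard normal area to the left of z computed from `math.erf`. The mean of 100 and standard deviation of 15 are hypothetical values used only for illustration.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations the observation x is from the mean."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(130, mean=100, sd=15)   # hypothetical observation of 130
print(z, standard_normal_cdf(z))
```

A positive result means the observation is above the mean; the area to the left of it plays the role of a z table lookup.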
if the population data are normally distributed, then the sample means are
|
normally distributed
|
|
why use sample means
|
because it is often difficult to get data for an entire population
|
|
when are sample means normally distributed
|
most of the time so long as the sample size for each sample is large enough
|
|
what is always true about the population mean and the mean of the sample means
|
they are always equal (uxbar = ux)
|
|
what does ux mean? uxbar?
|
ux = population mean
uxbar = the mean of the sample means |
|
what is always true about the standard deviations of the population and of the sample means
|
the standard deviation of the sample means is smaller (for samples of more than one observation)
|
|
equation for the standard deviation of the sample means
|
oxbar = ox / sqrt(n)
n = sample size |
|
what is a condition of the standard deviation of sample means equation
|
it is best when sampling is done with replacement
|
|
sample mean
|
the mean of the observations in one sample
often this is done many times in order to create a distribution of sample means |
|
alternative hypothesis notation
|
Ha
|
|
what is required about the null and alternative hypothesis
|
they have the same hypothesized value
|
|
what is often true about an alternative hypothesis
|
it makes a statement where the population mean is greater or less than the hypothesized value or that it does not equal it
|
|
What are conditions that lead to a one sided (one tailed) test
|
when the alt hypothesis states that the population mean is greater or less than the hypothesized mean
|
|
what are conditions that lead to a two sided (two tailed test)
|
when the alt hypothesis states that the population mean does not equal the hypothesized value
|
|
if there is a quantitative variable of interest, what information is needed to determine what statistical test should be done?
|
n (sample size)
xbar (sample mean)
sx (sample standard deviation)
ox (population standard deviation) |
|
if the population standard deviation is known what stat test should be done?
|
one sample z test
|
|
if only the sample standard deviation is known what stat test should be done
|
one sample t test
|
|
what are conditions of the one sample z test
|
1) the sample is representative of the population
2) the distribution of sample means is normal
3) observations are independent of each other |
|
what is the only way to guarantee that you have a representative sample from a population
|
take a random sample
|
|
what are assumptions of a one sample z test
|
that the sample is representative of the population
that the distribution of sample means is approximately normal
that observations are independent of each other |
|
what assumption is made for all hypothesis tests
|
that observations are independent of each other
|
|
p value
|
the probability of getting a sample mean that is as or more extreme than the one that was observed during the experiment if the null hypothesis is true
|
|
is it possible that a sample mean could have turned out to be a different value than the one hypothesized
|
yes
|
|
what action can be taken about the null hypothesis
|
it can be rejected or fail to be rejected
|
|
z statistic
|
the number of standard deviations the observed value is from the hypothesized mean
(observed - hypothesized) / standard deviation |
|
when do you reject the null hypothesis
|
when the p value is small!
less than 0.01 is strong evidence
0.01 to 0.05 is suggestive evidence
0.05 to 0.1 is weak evidence
0.1 or larger is not sufficient evidence |
|
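The z statistic and p value for a one sample z test can be sketched as follows, assuming the population standard deviation is known. The function name `one_sample_z_test`, the `tail` argument, and the example numbers are hypothetical and not from the original cards.

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n, tail="two"):
    """Return (z, p value) for a one sample z test with known population SD sigma."""
    sd_of_means = sigma / math.sqrt(n)              # standard deviation of the sample means
    z = (xbar - mu0) / sd_of_means                  # (observed - hypothesized) / standard deviation
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))    # area to the left of z
    if tail == "greater":
        p = 1 - cdf
    elif tail == "less":
        p = cdf
    else:
        p = 2 * min(cdf, 1 - cdf)                   # two tailed: double the smaller tail
    return z, p

# Hypothetical data: sample mean 103 from n = 100, H0: u = 100, ox = 15.
z, p = one_sample_z_test(103, 100, 15, 100)
print(z, p)
```

With these made-up numbers z is 2.0 and the two-sided p value is about 0.046, so the null hypothesis would be rejected at a significance level of 0.05 but not at 0.01.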
significance level
|
a point where if the p value is below it, then the null hypothesis is rejected, or if it is above, then it can not be rejected
|
|
what is the notation for significance
|
alpha
|
|
what are the steps to hypothesis testing
|
1. define the variable of interest and the population of interest. determine if the VI is quantitative or categorical.
2. determine the null hypothesis and alternative hypothesis
3. choose an appropriate hypothesis test for the null hypothesis and list the assumptions made by the hypothesis test.
4. find the p value
5. Conclusion: write a sentence in the context of the problem that answers the question of interest. |
|
null hypothesis
|
a statement of no effect or no difference
|
|
what is the notation for the null hypothesis
|
H0
|
|
what are some general rules for a null hypothesis
|
always involves an equal sign
the population parameter, ux, will be the same as the hypothesized value, u0 |
|
alternative hypothesis
|
often associated with the researcher's claim or question. It makes a statement hypothesizing some change that is expected to happen from a given treatment.
|
|
what does a normal probability plot show
|
the normality of a data set regardless of size
|
|
what does not equal to imply
|
less than or greater than a value
|
|
one tail test vs two tail test
|
a one tail test looks at one end or the other of a normal distribution
a two tail test looks at both ends of a normal distribution, usually a given distance away from the mean. |
|
How do you find the p value for a two tailed test
|
add together the probability of both tails
|
|
what is an easy way to find a probability for a two tailed test
|
find the probability of one tail and multiply by 2
|
|
what is the type 1 hypothesis error
|
occurs when the null hypothesis is rejected even though it is actually true
|
|
what is the type 2 hypothesis error
|
when the null hypothesis is not rejected when it is actually false
|
|
what is the probability of making a type 1 error if the significance level is given
|
P(making error) = alpha
|
|
what is the probability of making a type 1 error if the significance level is not given
|
P(making error) = p value
|
|
how do you determine the probability of making a type II error
|
P(type II error) = beta and beta depends on the effect size
|
|
effect size
|
choosing a specific value for u that would make the null hypothesis false and be large enough to make a difference
|
|
power
|
probability of correctly rejecting the null hypothesis when it is false
|
|
how does one obtain a higher power value
|
a larger sample size
|
|
power is the complement of...
|
beta
|
|
beta depends on
|
a specific alternative value of u (the effect size) for which one would fail to reject the null hypothesis when it is not true
|
|
equation for power
|
power = 1 - beta
|
|
point estimate
|
using the sample mean as the best estimate of the population mean
|
|
margin of error
|
range of values above and below the sample mean that give potential values for the population parameter
|
|
purpose of confidence interval
|
it is used when trying to find out characteristics of a given population. In using confidence intervals, an assumption is made that the sample mean is close to the population mean. The confidence interval is a measure of the reliability of the estimate (usually the sample mean) and provides a range for which possible true values for the population parameter lie
|
|
what is a requirement of using confidence intervals
|
that the data be distributed normally
|
|
what does the confidence interval surround
|
the sample mean
|
|
what is a 68% confidence interval
|
the range 1 standard deviation of the sample means above and below the sample mean. About 68% of intervals built this way contain the population mean
|
|
what is the general formula for a confidence interval
|
lower bound = sample mean - (# of sd away from the mean)(value of SD)
upper bound same as above but with a plus sign |
|
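A minimal sketch of the general confidence interval formula, assuming the population standard deviation is known so that the "value of SD" is the standard deviation of the sample means, ox / sqrt(n). The function name `z_confidence_interval` and the example numbers are hypothetical.

```python
import math

def z_confidence_interval(xbar, sigma, n, z_star=1.96):
    """Return (lower bound, upper bound) around the sample mean xbar.

    z_star is the critical value: about 1.0 for 68% and 1.96 for 95% confidence.
    """
    margin = z_star * sigma / math.sqrt(n)   # (# of SDs away)(SD of the sample means)
    return xbar - margin, xbar + margin

# Hypothetical data: sample mean 103, ox = 15, n = 100.
lower, upper = z_confidence_interval(103, 15, 100)
print(lower, upper)
```

Raising the confidence level raises `z_star`, which widens the interval.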
what does a 95% confidence interval imply
|
that we can be 95% confident the population mean is within 2 (*actually it is 1.96) standard deviations of the sample means above or below the sample mean
|
|
how do you write a confidence interval
|
(lower limit # & units, upper limit # & units)
|
|
what is the critical value for a confidence interval and what does it depend on?
|
critical value is z and it depends on the level of confidence
|
|
what conditions must be met to use a confidence interval
|
sample must be representative of the population
distribution of the sample means must be normal
observations must be independent |
|
what does the confidence level state about many samples and their confidence intervals
|
that from all the samples, 95% (or whatever conf value) will have a confidence interval that contains the true population mean
|
|
when should a confidence interval be included
|
whenever a hypothesis test is performed
|
|
what is the relation between a two sided hypothesis test and the confidence interval at a given significance level
|
If the hypothesized null hypothesis value is in the confidence interval, then there is no evidence to reject the null hypothesis at that significance level.
If the hypothesized value for the null hypothesis is not within the confidence interval, then there is evidence to reject the null hypothesis |
|
standard error of the distribution of sample means
|
sample standard deviation / sqrt(number of individuals in the sample): SEx = sx / sqrt(n)
it is used when you do not know the standard deviation of the population |
|
t value equation
|
(sample mean - hypothesized value) / SEx
|
|
t statistic represents
|
the number of standard errors an observed value is from the mean of the distribution
|
|
as the sample size increases what happens to sx in terms of ox
|
sx becomes a better estimate of ox
|
|
what is the point of degrees of freedom
|
to account for the different sample means that will occur from different sample sizes
|
|
degrees of freedom
|
n-1
|
|
what is needed to calculate a p value in a t stat
|
t stat and deg of freed
|
|
a higher confidence means less
|
precision
|
|
what does the shape of a t distribution depend on
|
the degrees of freedom
|
|
can t stats be negative
|
yes, they represent the # of standard errors below the mean
|
|
where is the t value in a t value chart
|
it makes up the bulk of it.
|
|
If one were to look at a t stat of 2.492 what does the probability for that value mean?
|
It is the probability of getting a t stat greater than the one you currently have
P(t > 2.492) |
|
what are properties of a t distribution
|
symmetric and unimodal
|
|
how should one find the probability of a negative t stat
|
find the probability for the positive one; by symmetry they are the same
|
|
what should one do when the t statistic can't be found on the table
|
create an interval between the two closest values that surround your desired t stat
|
|
when is it best to use a t test
|
when the population standard deviation is NOT KNOWN
|
|
what is the equation for the confidence interval on a t test
|
sample mean +/- (t value at n-1 degrees of freedom) * SEx
|
|
what is a synonym for variable of interest
|
response variable
|
|
in a two sample t test how many populations are there
|
2
|
|
what is the purpose of a two sample t test
|
it compares the means between two independent populations that have a response variable in common
|
|
how should one write a null hypothesis for a 2 sample t test
|
either u1 = u2 or u1 - u2 = 0
|
|
how should one write the alternative hypothesis for a 2 sample t test
|
using an inequality sign
ex: u1 > u2 or u1 - u2 > 0 |
|
what are the conditions of a two sample t test
|
the samples are representative of the populations
the distribution of the differences between the sample means is approx normal
the observations are independent of each other |
|
u vs x bar
|
population parameter vs sample mean statistic
|
|
what distribution does a two sample t test look at
|
the distribution of the difference between the sample means of the two populations
|
|
what does the standard error of the difference in the sample means represent
|
the square root of the sum of each sample's variance divided by its sample size: sqrt(s1^2/n1 + s2^2/n2)
|
|
variance
|
standard dev squared
|
|
t stat for a 2 sample t test
|
(observed difference - hypothesized difference) / standard error
|
|
how do you find the degrees of freedom for a two sample t test
|
the smaller of the two sample sizes minus 1
n(smallest #) - 1 |
|
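The two sample t statistic, its standard error, and the conservative degrees of freedom from the cards can be sketched in Python as below. The function name `two_sample_t` and the summary statistics are hypothetical; statistical software often uses a different degrees-of-freedom approximation than the "smaller sample size minus 1" rule shown here.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Return (t statistic, standard error, conservative degrees of freedom)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)           # standard error of the difference
    t = ((xbar1 - xbar2) - hypothesized_diff) / se    # (observed - hypothesized) / SE
    df = min(n1, n2) - 1                              # smaller sample size minus 1
    return t, se, df

# Hypothetical summaries: group 1 (mean 10, SD 3, n 9), group 2 (mean 8, SD 4, n 16).
t, se, df = two_sample_t(10, 3, 9, 8, 4, 16)
print(t, se, df)
```

The p value would then be read from a t table (or software) using `t` and `df`.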
how do you find the confidence interval for a two sample t test
|
(difference between the sample means) +/- (t critical value)(standard error)
|
|
frequency table
|
shows the frequency of individuals in a category
|
|
contingency table
|
a table that compares the frequency of the categorical variable of interest between 2 (or more) groups
|
|
marginal distribution
|
number of individuals in each category divided by the total
|
|
conditional distribution
|
same as a marginal distribution, only that it shows that of the group, how the distribution is broken up into the categories
ex: 38% of women felt overweight 34% of women felt just right etc |
|
when the variable of interest is categorical, what is the random variable
|
discrete
|
|
expected value is synonymous with
|
mean
|
|
how to calculate the expected mean
|
it is the categorical value times the probability of that value all summed together
|
|
how to calculate the standard deviation
|
the square root of the sum, over all values, of (value - mean)^2 times the probability of that value
|
|
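A small Python sketch of the expected value and standard deviation of a discrete random variable. The helper `mean_and_sd` and the example distribution (number of heads in two fair coin flips) are illustrative assumptions.

```python
import math

def mean_and_sd(distribution):
    """distribution maps each value of X to its probability; returns (mean, SD)."""
    if abs(sum(distribution.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in distribution.items())                    # sum of value x probability
    variance = sum((x - mean) ** 2 * p for x, p in distribution.items())  # sum of (value - mean)^2 x probability
    return mean, math.sqrt(variance)

# X = number of heads in two fair coin flips.
heads_in_two_flips = {0: 0.25, 1: 0.5, 2: 0.25}
print(mean_and_sd(heads_in_two_flips))
```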
what are the four conditions that must be met for a random variable to have a binomial distribution?
|
1. each observation has 2 possible outcomes of which only 1 occurs
2. there is a fixed number of observations (n, the sample size)
3. the outcomes of the observations are independent
4. the probability of an outcome occurring is the same in every trial |
|
0! =?
|
1
|
|
what is a requirement of probability distributions
|
that they will sum up to equal 1
|
|
how to calculate the mean of a binomial random variable X
|
ux = n*p
|
|
what is the standard deviation of a binomial random variable
|
ox = sqrt(n*p*(1-p))
|
|
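A sketch of the binomial probabilities, mean n*p, and standard deviation sqrt(n*p*(1-p)), using `math.comb` from the Python standard library (Python 3.8 or later). The function names and the example of 10 fair coin flips are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial random variable."""
    return n * p, math.sqrt(n * p * (1 - p))

n, p = 10, 0.5   # hypothetical: 10 flips of a fair coin
print(binomial_pmf(5, n, p), binomial_mean_sd(n, p))

# The probabilities over all possible values of X sum to 1.
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))
```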
what does a geometric distribution show
|
it is for a discrete random variable
it is the probability distribution of how many trials must happen before the first "success" occurs, or before the desired outcome occurs |
|
what does x represent in a geometric distribution
|
the number of trials to the first "success"
|
|
what is an equation to represent the probability of x in a geometric distribution
|
P(x) = (1-p)^(x-1) * p
|
|
how is the expected value calculated in the geometric distribution
|
ux = 1/p
|
|
how is the standard deviation calculated for the geometric distribution
|
ox = sqrt((1-p)/p^2)
|
|
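The geometric distribution formulas can be sketched in Python as follows, where x counts the trials up to and including the first success. The function names and the success probability of 0.25 are hypothetical.

```python
import math

def geometric_pmf(x, p):
    """P(first success happens on trial x) = (1-p)^(x-1) * p, for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_mean_sd(p):
    """Expected number of trials to the first success, and its standard deviation."""
    return 1 / p, math.sqrt((1 - p) / p**2)

p = 0.25   # hypothetical probability of success on each trial
print(geometric_pmf(3, p), geometric_mean_sd(p))
```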
bernoulli trial
|
is an experiment in which only two outcomes are possible
|
|
binomial random variable
|
the number of successes in a series of independent bernoulli trials
|
|
when the sample size increases what happens to a binomial distribution
|
it begins to look more like a normal distribution
|
|
what is a problem about computing the probability with a large sample size
|
it is computationally intense
|
|
what does the distribution of the sample proportions represent
|
the distribution of lots of sample proportions of sample size n taken from a population with a true proportion p
|
|
as the sampling size increases how does the sampling proportion and the population proportion relate to each other
|
p hat gets closer to p
|
|
what happens to the distribution of sampling proportions as the sample size increases
|
its spread (standard deviation) becomes smaller
|
|
what is the sampling proportion distribution centered around
|
p the true population proportion
|
|
If we do not know p, the population proportion, what should one do
|
use p hat in its place
|
|
when using a sampling proportion distribution what should you check
|
if np >10 and if n(1-p) >10
|
|
how do you calculate the z statistic for a sampling proportion
|
(p hat - p) / standard deviation, where standard deviation = sqrt(p(1-p)/n)
|
|
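A minimal sketch of the z statistic for a sample proportion, including the np > 10 and n(1-p) > 10 check from the cards. The function name `proportion_z` and the example numbers are hypothetical.

```python
import math

def proportion_z(p_hat, p, n):
    """z statistic for a sample proportion p_hat from a sample of size n, true proportion p."""
    # Check the conditions for treating the sample proportions as approximately normal.
    if n * p <= 10 or n * (1 - p) <= 10:
        raise ValueError("need np > 10 and n(1-p) > 10")
    sd = math.sqrt(p * (1 - p) / n)   # standard deviation of the sample proportions
    return (p_hat - p) / sd

# Hypothetical: 60 successes in a sample of 100 when the true proportion is 0.5.
print(proportion_z(0.60, 0.5, 100))
```

When p is unknown, the cards suggest substituting p hat for p in the standard deviation.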
bernoulli trial
|
situations where there are two outcomes: success or fail
|