47 Cards in this Set

  • Front
  • Back
Imprecision is the random ____ to get the same result.

Bias is a systematic ____ from the truth. When something is unbiased it is said to be ___.

The term observation refers to the unit upon which _____ are made.
ex: person, location.

Variables are the ____ being measured.
inability

deviation

valid

measurements

characteristics
In a simple random sample of size n, all possible combinations of n individuals in the population are equally ____ to comprise the sample.
likely
The shape of a distribution can be discussed in terms of its _____, modality and kurtosis.

Symmetry refers to the degree to which the shape reflects a ____ image of itself around its center.

Modality refers to the number of ____ on the distribution.

Kurtosis refers to the ____ of the mound.

A positive skew is skewed to the ____ and a negative skew is skewed ____.
symmetry

mirror

peaks

steepness

right

left
Relative Frequency (%) is the frequency count divided by the ____ expressed as a percentage.

Relative frequencies are important because they do not depend on the ____ of the data.

To determine the Cumulative frequency (%) add the current ____ frequency to the ____ cumulative frequency.
total

size

relative

prior
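A minimal Python sketch of the relative and cumulative frequency rules on this card. The function name `frequency_table` and the counts are hypothetical, chosen only for illustration.

```python
def frequency_table(counts):
    """Return (value, count, relative %, cumulative %) rows for a dict of counts."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        # Relative frequency: count divided by the total, as a percentage.
        relative = 100.0 * counts[value] / total
        # Cumulative frequency: current relative plus prior cumulative.
        cumulative += relative
        rows.append((value, counts[value], relative, cumulative))
    return rows


# Hypothetical counts, for illustration only.
for row in frequency_table({1: 2, 2: 5, 3: 3}):
    print(row)
```

The last cumulative value should come out at 100% whatever the size of the data set.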
The mean is highly susceptible to the influence of ____ and skews.

The mean from a frequency table is equal to the sum of (proportion)(x).

The mean tells you nothing about the ____ of a distribution.

The median is ____ resistant to outliers than the mean.

When the mean = median, the distribution is _____.

When mean > median there is a ____ skew

When mean < median there is a ____ skew.
outliers

spread

more

symmetrical

positive

negative
The deviations from the mean of a data set will always sum to ____.

A standard deviation of 0 signifies that there is no _____.
0

variability
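Both facts on this card can be checked with a short Python sketch. The helper names `deviations` and `sample_sd` are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import math


def deviations(data):
    """Deviation of each value from the mean of the data set."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def sample_sd(data):
    """Sample standard deviation (assumes the n - 1 divisor)."""
    devs = deviations(data)
    return math.sqrt(sum(d * d for d in devs) / (len(data) - 1))


# Hypothetical data: the deviations sum to zero.
print(sum(deviations([2, 4, 4, 10])))
# Identical values have no variability, so the standard deviation is 0.
print(sample_sd([5, 5, 5]))
```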
A random variable is a numerical quantity that takes on different values depending on ____.

Probability refers to the proportion of times an event is expected to ____ in the population.

Discrete random variables exist as a ____ set of possible outcomes.

Continuous random variables address quantities that take on an ____ continuum of possible values.
chance

occur

countable

unbroken
Probabilities in a sample space must sum to exactly ____.

In probabilities, the mean is referred to as the ____ value.

Risk is the probability of ____ an adverse condition in a specified time period.

The relative risk is the probability of disease given ____ divided by the probability of disease given _____.
1

expected

developing

exposure

non-exposure
____ % of the area under the Normal curve lies in the region +/- 1 SD

____ % of the area under the Normal curve lies in the region +/- 2 SD

____% of the area under the Normal curve lies in the region +/- 3 SD
68

95

99.7
To determine probabilities for normal values that do not fall exactly +/- 1,2 or 3 SD from the mean, we must first ____ the values and then use a Standard Normal table to look up the associated ____. The transformed value is called a ____.

The z-score tells you the distance the value falls from the mean in ____ _____ units. Values that are smaller than the mean will have ____ z scores and are said to fall ___ the mean.
standardize

probabilities

z-score

standard deviation

negative; below
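A small Python sketch of standardizing a value and looking up the associated probability. Instead of a printed Standard Normal table it uses the error function from the standard library. The names `z_score` and `standard_normal_cdf` are hypothetical.

```python
import math


def z_score(x, mean, sd):
    """Distance of x from the mean in standard deviation units."""
    return (x - mean) / sd


def standard_normal_cdf(z):
    """Area under the Standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A value below the mean has a negative z-score (hypothetical numbers).
print(z_score(85, 100, 15))

# Area within +/- 1, 2 and 3 SD of the mean (the 68-95-99.7 rule).
for k in (1, 2, 3):
    print(k, standard_normal_cdf(k) - standard_normal_cdf(-k))
```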
The central limit theorem states that the sampling distribution of means tends toward _____ even when the underlying population is not normal.

The standard error of the mean gets smaller and smaller as the sample size gets ____.

Means based on large n are more likely to fall close to the ____ value of the population mean.
Normality

true
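The shrinking standard error can be sketched in Python, assuming the usual formula SE = sigma / sqrt(n), which this card does not state. The function name `standard_error` and the numbers are hypothetical.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean, assuming SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```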
The null hypothesis is a statement of no ____. The alternative hypothesis ___ the null.

The objective of the test is to seek evidence ____ the null hypothesis as a way of bolstering the alternative hypothesis.
difference

contradicts

against
The statistical distance of the sample mean from the hypothesized value of the population mean provides the weight of evidence ___ the null hypothesis. This distance is reported in the form of the ___ statistic (z-stat).
against

test
Large values of the z-stat represent large statistical differences and thus good evidence ____ the null hypothesis.

The P-value makes the statement, "If the null were correct, results this extreme or more extreme would ____ y% of the time."

When P is small, ____ the null hypothesis. When P is less than or equal to alpha, the results are _____ _____.

When a result is not statistically significant, the evidence is not strong enough to ____ the null hypothesis, so you retain (fail to reject) it.
against

occur

reject

statistically significant

reject
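The decision rule on these cards can be sketched as a one-sample z test in Python. This is an illustration only: it assumes a known population standard deviation and a two-sided P-value, and the name `one_sample_z_test` and the numbers are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Area under the Standard Normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sample_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Return (z-stat, two-sided P-value, significant?)."""
    se = sigma / math.sqrt(n)
    # Statistical distance of the sample mean from the hypothesized mean.
    z = (sample_mean - null_mean) / se
    # Probability of a result this extreme or more extreme under the null.
    p = 2.0 * (1.0 - standard_normal_cdf(abs(z)))
    return z, p, p <= alpha


print(one_sample_z_test(104, 100, 10, 25))  # reject the null at alpha = 0.05
print(one_sample_z_test(101, 100, 10, 25))  # fail to reject the null
```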
To test the hypothesis that a sample comes from a population with a known mean but an unknown standard deviation, you calculate a ___ statistic.

When the t value is very large, the significance level is very close to ___.

When the standard deviation is unknown, use the ___ distribution.
t

0

t
A type I error is ____ the null hypothesis when it is _____.

A type II error is ____ the null hypothesis when it is ____.
rejecting; true

accepting (failing to reject); false
Power is the statistical term used to describe your ability to ____ the null hypothesis when it is false. It is a ____ that ranges from 0 to 1.

Power depends on how large the true difference is, the ___ ____, the variance of the difference and the ____ level.
reject

probability

sample size

significance
Alpha is the probability of a type ___ error. Beta is the probability of a type ___ error.

Confidence = 1- ____
Power = 1- _____
I

II

alpha

beta
Point estimate provides a single estimate of the ____.

The confidence level of a confidence interval refers to the _____ rate of the method in capturing the parameter it seeks.
parameter

success
Synonyms for explanatory variable: X, ____ variable, factor, treatment, ____.

Synonyms for response variable: Y, ____ variable, outcome, ____, disease.

A scatterplot's form refers to whether it is ____, curved, or random. The direction refers to an upward trend (____ association), downward trend (____ association) or flat trend (no association). The strength of a scatterplot refers to how ____ data points adhere to an imaginary trend line.
independent; exposure

dependent; response

linear; positive; negative

closely
The strength of the linear relationship between two quantitative variables is calculated with Pearson's _____ ____. When all the points fall directly on a line with an upward slope, r = ___.

The sign of r indicates the ____ of the relationship (positive or negative). The absolute value of r indicates the ____ of the relationship.

r is the ____ product of the z-scores for x and y. When the absolute value of r is greater than or equal to ___ this is a strong association. When it falls between ___ and 0.7, this is a moderately strong association. When it is less than ____, this is a weak association.

The correlation coefficient is influenced by ____.
correlation coefficient

1

direction; strength

average; 0.7
0.3
0.3

outliers
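A sketch of r computed as the average product of z-scores, in Python. It assumes sample standard deviations and an n - 1 divisor for the "average". The name `pearson_r` and the data are hypothetical.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson's r as the average product of the z-scores of x and y."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    # Assumption: the "average" divides by n - 1 to match the sample SDs.
    return sum(products) / (len(xs) - 1)


# Points exactly on an upward line, then on a downward line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```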
Sample correlation coefficient r is the estimator of population coefficient ____.

Testing a correlation coefficient for statistical significance:

1.) Null: rho =0
2.) Find the ___ statistic: t-stat = r / SEr, with n-2 degrees of freedom.
3.) Find the ___ value using the T-distribution.

Regression is also used to ____ the relationship between two quantitative variables. A regression model can be used to ____ the value y for a given value of x. Regression is influenced by ____.

A residual is the ____ y minus the predicted y. The least squares regression line: y = a + bx.

The slope predicts the change in ___ per unit change in ____.
The intercept is the value of y when x = ____.
rho
test
p

quantify
predict
outliers

observed
y; x
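Step 2 of the correlation test can be sketched in Python. The card gives t = r / SEr with n - 2 degrees of freedom. The standard error formula SEr = sqrt((1 - r^2) / (n - 2)) is an assumption added here, and `correlation_t_stat` is a hypothetical name. The P-value in step 3 still has to come from a t table or statistical software.

```python
import math


def correlation_t_stat(r, n):
    """Return (t-stat, degrees of freedom) for testing rho = 0."""
    # Assumed standard error of r: sqrt((1 - r^2) / (n - 2)).
    se_r = math.sqrt((1.0 - r * r) / (n - 2))
    return r / se_r, n - 2


# Hypothetical sample: r = 0.5 from n = 27 pairs.
print(correlation_t_stat(0.5, 27))
```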
When 0 is captured in the confidence interval, the data are not ____ _____.

r squared is referred to as the coefficient of _____. r squared indicates the _____ of the variation in the response variable that is accounted for by the _____ variable.
statistically significant

determination

proportion; explanatory
In a negative correlation, as the values of one of the variables increases the values of the second variable _____. It is like an _____ correlation.

Bivariate correlation can be used to determine if two variables are _____ related to each other.

Using SPSS for Bivariate Correlation:

Analyze > Correlate > Bivariate > Options, then select Means and standard deviations. Three stars indicates very highly significant.
decreases; inverse

linearly
Regression is based on the concept of least ____. It usually requires both the dependent and independent variables to be _____. Simple regression addresses a single _____ variable, whereas multiple regression addresses several.
squares

continuous

explanatory
SIMPLE LINEAR REGRESSION:

y=a + bx

b= r (standard deviation of y/standard deviation of x)

a = (mean of y) - b(mean of x)
The population regression Model:
y_i = α + β x_i + ε_i

Standard error of the regression (s_Y|x) = standard error of the estimate

Confidence interval = point estimate +/- (t)(SEb)
EQUATIONS
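The slope and intercept formulas on this card can be sketched in Python as follows. The name `fit_line` and the data are hypothetical, and sample standard deviations are assumed.

```python
import statistics


def fit_line(xs, ys):
    """Least squares intercept a and slope b, using b = r * (s_y / s_x)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    n = len(xs)
    # Pearson's r as the average product of z-scores (n - 1 divisor).
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```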
For ANOVA in regression, the null hypothesis is that the regression line ____ fit the population.

Fstat = Regression MS/ Residual MS

Degrees of freedom for regression always equals ____. Degrees of freedom for residual equals n-2.

r squared = Regression SS/Total SS
doesn't

1
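A Python sketch of the ANOVA quantities for simple regression, using the card's formulas (F = Regression MS / Residual MS with 1 and n - 2 degrees of freedom, r squared = Regression SS / Total SS). The name `regression_anova` and the data are hypothetical.

```python
def regression_anova(xs, ys):
    """Sums of squares, r squared and F-stat for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least squares slope and intercept (equivalent to b = r * s_y / s_x).
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    predicted = [a + b * x for x in xs]
    regression_ss = sum((p - y_bar) ** 2 for p in predicted)
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    total_ss = regression_ss + residual_ss
    # Regression df = 1, residual df = n - 2.
    f_stat = (regression_ss / 1) / (residual_ss / (n - 2))
    return {"r_squared": regression_ss / total_ss, "f_stat": f_stat}


print(regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```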
When both the independent variable and dependent variable are categorical, use the ____ test.

When both the independent and dependent variable are _____ use simple and multiple linear regression.

When the independent variable is ____ and the dependent variable is ____ use the t-test or ANOVA.

When the independent variable is ____ and the dependent variable is ____ use logistic regression.
chi square

continuous

categorical; continuous

continuous; categorical
MULTIPLE LINEAR REGRESSION:

y = α + β1x1 + β2x2 + ... + βkxk

A dummy variable is also known as an _____ variable and refers to the _____ variable (x) being categorical. This variable is coded with 2 levels: female-male; college-no college; etc.

The test statistic has n-k-1 degrees of freedom.
indicator

independent
A binary variable is a categorical variable with ____ possible outcomes.

Prevalence proportions are the proportions of individuals in the population _____ by condition at a particular time.

Incidence proportions are also called the average ____. It is the proportion of individuals at ____ who go on to develop a condition over a specified period of time.

Binomial random variables are based on counting the number of ____ of n independent trials.
2

affected

risk; risk

successes
Testing a proportion for statistical significance:

1.) Ho: p = proportion under the null.
2.) Find the z-stat = (sample p - null p) / SEp
3.) Find the p-value

An npq < ____ is a small sample size. The z test is only accurate for proportions in ____ samples. When the sample size is small, an ____ binomial test is used (Fisher's exact test is the analogous exact procedure for 2-by-2 tables).
5; large

exact
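The z-stat and the small-sample check can be sketched in Python. The sketch assumes the standard error is computed under the null, SEp = sqrt(p0 q0 / n). The names `proportion_z_stat` and `is_small_sample` and the numbers are hypothetical.

```python
import math


def proportion_z_stat(x, n, p0):
    """z-stat = (sample p - null p) / SEp, with SEp computed under the null."""
    p_hat = x / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / se


def is_small_sample(n, p0):
    """True when npq < 5, where the z test is not considered accurate."""
    return n * p0 * (1.0 - p0) < 5


# Hypothetical data: 60 successes in 100 trials against a null of 0.5.
print(proportion_z_stat(60, 100, 0.5))
print(is_small_sample(10, 0.5), is_small_sample(100, 0.5))
```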
The _____ _____ method is the primary method for calculating confidence intervals for a proportion. It is based on the Wilson score test.

n~ = n + 4
x~ = x + 2
p~ = x~ / n~

CI = p~ +/- z(1-alpha/2) x SEp~
SEp~ = square root(p~q~ / n~), where q~ = 1 - p~
plus four
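A minimal Python sketch of the plus four interval described on this card. It assumes a 95% confidence level (z = 1.96) by default. The name `plus_four_ci` and the counts are hypothetical.

```python
import math


def plus_four_ci(x, n, z=1.96):
    """Plus four confidence interval for a proportion (default 95%)."""
    n_tilde = n + 4            # add four imaginary observations
    x_tilde = x + 2            # two of them counted as successes
    p_tilde = x_tilde / n_tilde
    se = math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - z * se, p_tilde + z * se


# Hypothetical data: 8 successes in 16 trials.
print(plus_four_ci(8, 16))
```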
One way to compare proportions is in the form of ____. Comparisons may represent ____ differences or incidence (risk) differences.

The risk difference parameter: P1-P2. The risk difference estimate is synonymous with the _____ risk.

In ____ samples, the sampling distribution of P1-P2 is normal.

Prevalence ratio and risk ratio are the same as _____ risk.

A risk ratio of 1 indicates that risks are ____ in the exposed and non-exposed group (there is no ____ between the explanatory and response variable).
differences; prevalence

attributable

large

relative

equal; association
Both the relative risk (p1/p2) and the risk difference (p1-p2) can be used to quantify the ____ of an exposure.

The RR quantifies the effect of exposure in ____ terms. The RD quantifies the effect of exposure in ____ terms.

ex:

If RR is 2, then the risk in exposed group is 2 times that of the non-exposed.

If the RD = 0.01114, then there is an increase in risk of about 1.1 percentage points in absolute terms.
effect
relative
absolute
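The two measures can be sketched side by side in Python. The function names and the risks are hypothetical, with p1 taken as the risk in the exposed group and p2 as the risk in the non-exposed group.

```python
def relative_risk(p1, p2):
    """Effect of exposure in relative terms: p1 / p2."""
    return p1 / p2


def risk_difference(p1, p2):
    """Effect of exposure in absolute terms: p1 - p2."""
    return p1 - p2


# Hypothetical risks: 2% in the exposed group, 1% in the non-exposed group.
print(relative_risk(0.02, 0.01))    # risk is about twice as high
print(risk_difference(0.02, 0.01))  # about 1 percentage point higher
```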
Confidence intervals and hypothesis tests are effective in addressing the random ____ associated with sampling and randomization but not ____ errors.

Systematic errors come in these forms: confounding, _____ bias and selection bias.

Information bias arises from defects in _____.
error

systematic

information

measurement
Cross tabulation is used to analyze _____ outcomes. There are 3 different sampling methods: naturalistic, cohort and case-control.

The relative risk under cross tabulation is prevalence 1 / prevalence 2, where group 1 is exposed and group 2 is non-exposed.

The odds of an event are the proportion of ____ divided by the proportion of failures.
categorical

successes
RR =
(disease exposed / total exposed) /
(disease non-exposed / total non-exposed)

OR = (case exposed / case non-exposed) /
(control exposed / control non-exposed)
EQUATIONS
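A Python sketch of both ratios computed from the four cells of a 2-by-2 table. The argument names, the function names and the counts are hypothetical. The odds ratio is written in its cross-product form, which is the ratio of the odds of exposure in cases to the odds of exposure in controls.

```python
def risk_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """RR = (disease exposed / total exposed) / (disease non-exposed / total non-exposed)."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """OR as the cross-product ratio of the 2-by-2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical table: 20 of 100 exposed and 10 of 100 non-exposed have the disease.
print(risk_ratio(20, 80, 10, 90))
print(odds_ratio(20, 80, 10, 90))
```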
A chi square statistic is used to test the _____ between the row and column variables. This statistic tests the ____ of variables when data are from naturalistic samples and homogeneity of proportions when the data are from ____ and case-control samples.

The null hypothesis would be: There is no _____ between the row and column variables in the source populations.

Alternative: The null is false.

The square of a standard Normal random variable is a chi square random variable with ___ degree of freedom.

Chi square distributions are ____ and become more symmetrical as their df _____. Chi square statistics are calculated from ____ and expected frequencies.
association
independence
cohort
association
1
asymmetrical; increases
observed
Expected frequency =
(row total x column total)/ table total

Chi square = sum of (O-E)^2 / E

df = (r-1)(c-1)

*for a 2-by-2 table (df = 1), chi square equals the square of the z-stat

Chi square does not provide the type and ____ of association.
strength
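The expected frequency, chi square and df formulas on this card can be sketched in Python. The name `chi_square` and the table of counts are hypothetical. Looking up the P-value would need a chi square table or statistical software.

```python
def chi_square(table):
    """Return (chi square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = (row total x column total) / table total.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2-by-2 table: rows are exposure groups, columns are outcomes.
print(chi_square([[20, 80], [10, 90]]))
```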
When observed values = expected values, chi square is equal to ____.

When observed values are far from the expected amounts, chi square ____ and evidence against the null ____.

Do not use chi square when more than 20% of cells have an expected value of less than ____.

The square root of chi square equals the ____ stat.
0

increases; increases

5

z
When the explanatory variable (x) is ordinal and response variable is binary, it is necessary to test for a ____.
1.) Null: there is no ____ trend.
2.) Find test statistic z

Case-control sampling provides an efficient way to study ____ outcomes.

The relationship between the explanatory variable and response variable is quantified with an ____ ratio. The CI for testing the odds ratio uses the natural _____.

The null Hypothesis: OR=1 which means no ____. When expected values exceed 5, ____ statistic or z statistic may be used.
trend
linear
rare
odds
logarithm
association
chi square
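A sketch of a confidence interval for the odds ratio built on the natural log scale, in Python. The standard error formula sqrt(1/a + 1/b + 1/c + 1/d) is an assumption added here, since the card only says that the natural logarithm is used. The name `odds_ratio_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for cells a, b (exposed) and c, d (non-exposed).

    a and c are cases; b and d are controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Assumed standard error of ln(OR).
    se_ln = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_or = math.log(odds_ratio)
    # Build the interval on the log scale, then transform back.
    return odds_ratio, math.exp(ln_or - z * se_ln), math.exp(ln_or + z * se_ln)


# Hypothetical counts.
print(odds_ratio_ci(20, 80, 10, 90))
```

If the interval captures 1, the data are compatible with the null hypothesis of no association.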
A test used in a screening program, especially for a disease with low incidence, must have good ____ in addition to acceptable sensitivity.

Screening is performed only when the disease is an important cause of mortality or _____; a proven and acceptable _____ exists to detect individuals at an early, modifiable stage; and there is safe and effective ____ available to prevent the disease or its ____.

Sensitivity = # present positive / # total present

Specificity = # absent negative / # total absent

Less specific = less false ____ = more false positives = ____ sensitivity

More specific = ____ false negatives = less ____ = less false positives
specificity
morbidity
test
treatment; consequences

negative;more

more; sensitivity
PPV = # present positive/total # of pos

NPV = #absent negative/ total # of neg

Prevalence= cases/ total pop at risk
EQUATIONS
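The four screening measures can be sketched in Python from the cells of a test-by-disease table. The name `screening_metrics`, the argument names and the counts are hypothetical.

```python
def screening_metrics(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, PPV and NPV from a 2-by-2 screening table."""
    return {
        # present positive / total present
        "sensitivity": true_pos / (true_pos + false_neg),
        # absent negative / total absent
        "specificity": true_neg / (true_neg + false_pos),
        # present positive / total positive tests
        "ppv": true_pos / (true_pos + false_pos),
        # absent negative / total negative tests
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical screening results for 1000 people, 100 with the disease.
print(screening_metrics(true_pos=90, false_neg=10, false_pos=50, true_neg=850))
```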
The region of the hypothesis test that divides the rejection and non-rejection regions is called the _____ value.

When a range is used to estimate a population parameter, it is called an _____ estimate.

If the p-value is greater than alpha in a two-tailed test the null hypothesis should be ____.

In testing a hypothesis about two population means, if the t-distribution is used, both populations are distributed ____.

The purpose of the chi square test is to test for equality of ____.
critical
interval
not rejected
normally
proportions
Constructing a mathematical model that can be used to predict one variable by another is called ____.

The Y-intercept represents the ____ value of y when x=0.

Residuals represent the difference between the actual Y values and ____ Y-values.

When the response variable is categorical and takes on two possible values, this is called ____ regression.

SSR/SST = coefficient of multiple determination.

The one-way ANOVA is used to test statistical hypotheses concerning ____.
regression
predicted
predicted
logistic
means
ANOVA is an extension of the ___ test for comparing the means of two _____ populations.

incidence = new cases/pop at risk
t
independent