108 Cards in this Set

sample
Subset of a population. Group of creatures from which one gathers data with the intention of making inferences to all organisms that fit those criteria (i.e., the population).
population
The entire collection of events in which you are interested. Can range from a relatively small set to a large but finite set.
random sampling
Each and every element of the population has an equal chance of being selected (as opposed to convenience sampling, where you may not have access to all elements, so the results may be biased).
studies that are not randomly sampled lack....
external validity
random assignment
Each and every element of the selected sample will have an equal chance of either being assigned to the experimental group or the control group.
discrete variables
Property can take on only a limited number of different values
continuous variables
Property can take on an unlimited number of values (any value between the lowest and highest point on the scale).
quantitative variables
Measurement data; something is actually measured
qualitative variables
Variables are categorized, and results are based on the frequency with which these events occur. Outcomes tend to be binary; you can’t directly assign a number value to the outcome (that is representative of the event)
qualitative variables also called....
categorical/frequency data
independent variables
Manipulated by the experimenter
dependent variables
the data, which is not under the experimenter’s control.
discrete variables tend to be.....
Discrete variables → categorical/qualitative
continuous variables tend to be....
Continuous variables → quantitative
2 types of categorical measurement scales
ordinal and nominal
nominal measurement scale:
“nom” = name; categories have no ranked order. Examples: Republican or Democrat, male or female, numbers on a football jersey
ordinal measurement scale:
rank is given, ordered, examples: ranks in the Navy, scale of life stress
2 types of continuous measurement scales:
interval and ratio scale
interval scale:
measurement scale on which you can legitimately speak of differences between scale points, but still cannot speak meaningfully about ratios. Example: the Fahrenheit temperature scale
ratio scale:
has a true zero point. Must be a true zero, not an arbitrary point like 0°F or 0°C. Examples: length, time, volume.
measurement scales are important because...
examples of how the scales would be different
Measurement scales are important in statistics so that you get a true representation of how various data points relate to one another. For example, you need to determine the most appropriate interval scale to use when measuring temperature changes in a house versus minute temperature changes in a scientific experiment. You need to ensure that your measures relate as closely as possible to what you want to measure, but your results are ultimately only the numbers you obtain and the faith you have in the relationship between those numbers and the underlying objects or events.
mode
Most frequently occurring value
disadvantages of mode:
Disadvantages: when you are plotting continuous values, it would be difficult to find a meaningful value that is most commonly repeated. Also, if the plot yields a bimodal or trimodal distribution, it would be more representative to use those two or three values as modes, instead of picking one single value.
advantages of mode:
It is a score that actually occurred, whereas mean and median may be values that do not appear in the data. The mode also represents the largest number of people. By definition, the probability that the observation drawn at random (Xi) will be equal to the mode is greater than the probability that it will be equal to any other value. Mode is especially applicable to nominal data, where there really is no true mean or median. Also generally unaffected by extreme scores
median
Middle point
advantages of median
Unaffected by extreme scores. Calculation does not require any assumptions about the interval properties on the scale.
disadvantages of median
It does not enter readily into equations, so it may be more difficult to work with than the mean. It is also not as stable from sample to sample.
disadvantages of mean
Disadvantages: Influenced by extreme scores, value may not actually exist in the data, interpretation in terms of the underlying variable being measured requires at least some faith in the interval properties of the data.
advantages of mean
Advantages: the mean can be manipulated algebraically. Useful with the population mean: if you drew many samples from the same population, the sample means would be more stable estimates of the central tendency of that population than the sample medians or modes. It is widely used because the sample mean is generally a better estimate of the population mean (than the mode or median).
variance: definition
Variance tells you, on average, the squared deviation from the mean. The n − 1 in the denominator takes into account that there is some uncertainty about the population when estimating from a sample; it compensates by making the estimate slightly larger.
variance: equation
s² = Σ(X − X̄)² / (n − 1)
std deviation: definition
Std deviation makes the number more interpretable because it puts it in the same units as the data itself.
std deviation: equation
Std deviation = √variance
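
A minimal sketch of both formulas in Python, using made-up scores:

    # Sample variance and standard deviation from the formulas above.
    data = [4.0, 7.0, 6.0, 5.0, 8.0]   # hypothetical scores
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # note the n - 1
    std_dev = variance ** 0.5                                # sqrt of variance
    print(variance, std_dev)   # 2.5  1.58...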
sample and population variance and std deviation
sample: s and s²
population: σ and σ²
normal distribution: description
The normal distribution is generally a symmetric, unimodal, “bell shaped” distribution with limits of ±infinity. Abscissa: horizontal axis; ordinate: vertical axis. The normal distribution depends on the mean and std deviation of the distribution.
why is normal distribution impt?
The normal distribution is important because the theoretical distribution of the hypothetical set of sample means obtained by drawing an infinite number of samples from a specified population can be shown to be approximately normal under a wide variety of conditions: the sampling distribution of the mean.
description of standard normal distribution and why it is important
Standard normal distribution: Has a mean of zero and a standard deviation of 1. N(0,1). Transformed from the normal distribution which represents the entire population. Uses z-scores which transform the distribution to a more workable format with which you can use the tables.
Z = (X − μ) / σ
units of standard normal distribution and normal distribution
Units of normal distribution can be anything, but the units of standard normal distribution are in terms of z which is a function of the mean and std deviation.
explain and provide a formula for z scores
Z scores allow you to use linear transformation to develop a standard normal distribution. Z score represents the number of standard deviations that Xi is above or below the mean, a positive z-score being above the mean and negative z-score being below the mean. The shape of the distribution is not affected by linear transformation. If it was not a normal distribution before, it will not be afterward.
Z = (X − μ) / σ
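
A quick sketch of the z-score transformation in Python, with invented values (an IQ-style scale with μ = 100, σ = 15):

    # z-score: number of standard deviations X lies above or below the mean.
    def z_score(x, mu, sigma):
        return (x - mu) / sigma

    print(z_score(130, 100, 15))  #  2.0: two SDs above the mean
    print(z_score(85, 100, 15))   # -1.0: one SD below the mean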
what does a sampling distribution tell us?
Sampling distributions: tells us specifically what degree of sample-to-sample variability we can expect by chance as a function of sampling error. They tell us what values we might or might not expect to obtain for a particular statistic under a set of predefined conditions. The standard deviation of that distribution reflects the variability that we would expect to find in the values of that statistic over repeated trials.
what does the sample mean tell you about a hypothetical population mean?
Is the difference between a sample mean and a hypothetical population mean, or the difference between two obtained sample means, small enough to be explained by chance alone or does it represent a true difference that might be attributable to the effects of our treatment?
sampling distribution of the mean:
Sampling distribution of the mean: the distribution of the means of an infinite number of random samples drawn under certain specified conditions. This allows us to question if an obtained sample mean can be taken as evidence in favor of the hypothesis that we actually are sampling from this population.
sampling error
Variability due to chance. Does not imply carelessness or mistakes.
units of sampling distribution of the means
x-axis: sample means
y-axis: frequency
sampling distributions impt in hypothesis testing b/c...
Sampling distributions important in hypothesis testing because you need to determine if you are looking at fluctuations due to chance or if it is a large enough fluctuation to deem it due to a specific reason.
Why is it not sufficient to just know the size of the difference between the two means when determining whether or not two groups differ significantly?
It depends on what you’re testing. You have to look at the sampling distribution of the means and determine whether the observed difference falls outside the range expected by chance (e.g., beyond the central 95% of values); only then is the difference statistically significant.
diff between std deviation and standard error of the mean?
Standard deviation is the square root of the variance and describes the variability of individual scores; the standard error of the mean (SEM) is the standard deviation of the sampling distribution of the mean. In a sampling distribution of the mean, the distribution centers on the mean, with about 68% of sample means within 1 SEM on either side, about 95% within 2 SEMs, and about 99% within 3 SEMs.
null hypothesis
Given that all points are from the same population, how much sample-to-sample variability can you expect other than differences by chance: none. “What you did didn’t make a difference.”
alternate hypothesis
The points are not all taken from the same population: what you did made a difference.
Its distribution has the same shape as that under H0, just shifted.
rejection level
p ≤ 0.05 or 0.01, depending on convention and how conservative you want to be. The probability level at which we reject H0 is referred to as the rejection level or significance level of the test. Whenever p is less than or equal to our predetermined significance level, we reject H0.
rejection region
Any outcome whose probability under H0 is less than or equal to the significance level falls in the rejection region, since such an outcome leads us to reject H0.
type I errors
rejecting H0 when in fact it is true. False positive; alpha
type II errors
failing to reject H0 when it is in fact false (H1 is true); beta. False negative.
diff between 1 and 2 tailed tests
We reject H0 for only the lowest or highest results: directional, one-tailed tests.
We reject the extremes in both tails: two-tailed, non-directional test.
With the two-tailed test, we gain the ability to reject the null hypothesis for extreme scores in either direction. But the cutoff in each tail becomes more extreme, so some values that would be rejected by a one-tailed test are not rejected by the two-tailed test. A one-tailed test can be appropriate if you are absolutely certain that the other extreme will not happen. It depends on what you are measuring.
diff between joint and conditional probability
Joint probability is the probability of co-occurrence of two or more events. AND
Conditional probability is the probability that one event will occur given that some other event has occurred. IF THEN
which type of probability is reversible and which is not
Joint probability is reversible since it is an AND function: p(A, B) = p(B, A). Conditional probability is not; the events must occur in a certain order, so p(A|B) generally differs from p(B|A).
which type of probability is type I and II errors? why?
Type I and II errors are conditional probabilities: given that H0 is true, there is a 5% chance I am wrongly rejecting it, etc. If you flip the conditioning, the probabilities change.
sampling distribution of the mean
Sampling distribution of the mean: tells you the sampling error, which tells you how much slop you can expect in those means. Did what you did matter?
central limit th
Central limit theorem: runs underneath all parametric tests; a factual statement about the distribution of means. It states that given a population with a mean of μ and a variance of σ², the sampling distribution of the mean will have a mean equal to μ, a variance equal to σ²/N, and a standard deviation of σ/√N. The distribution will approach the normal distribution as N, the sample size, increases.
how do SEMs come into play with the central limit th
Even if you start with a weird-shaped population, the sampling distribution of the means will come to look normal as N increases, and the SEM shrinks accordingly:

SEM = s / √N
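
A small simulation sketch of this in Python; the uniform population and the sample size N = 30 are arbitrary choices for illustration:

    import random
    import statistics

    # Start from a decidedly non-normal (uniform) population.
    population = [random.uniform(0, 100) for _ in range(100_000)]

    # Draw many samples of size N and record each sample mean.
    N = 30
    sample_means = [statistics.mean(random.sample(population, N))
                    for _ in range(2_000)]

    # The spread of those means should be close to sigma / sqrt(N).
    predicted_sem = statistics.pstdev(population) / N ** 0.5
    observed_sem = statistics.stdev(sample_means)
    print(predicted_sem, observed_sem)  # the two should roughly agree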
3 kinds of t-tests
1 Sample T-Test:
Paired T-Test:
Independent T-Test:
describe 1 sample t-test
Only have one sample; compare it to a known population. Ex: medical blood tests.
Numerator: how different is the sample from the population?
Denominator: corrects for noise
describe paired t-test
Intended for scenarios with one sample, measured twice.
Pre-measure and Post-measure or two conditions, but taken from only one group.
Also called Dependent T-Test, Repeated Measures, Matched Sample
Numerator: is the average of the differences greater than zero?
Denominator: corrects for noise
describe independent t-test
2 Different Groups, compare against each other. Are the 2 groups different?
Numerator: size of the difference between the groups
Denominator: corrects for noise
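
One way to run all three tests is with scipy.stats; a sketch with invented data (the function names are real scipy ones, the numbers are made up):

    from scipy import stats

    blood   = [98, 102, 99, 101, 105, 97]  # one sample vs. known population mean
    pre     = [10, 12, 9, 14, 11]          # one group, measured twice
    post    = [13, 15, 11, 17, 12]
    group_a = [23, 25, 21, 28, 24]         # two independent groups
    group_b = [30, 29, 33, 27, 31]

    print(stats.ttest_1samp(blood, popmean=100))  # 1-sample t-test
    print(stats.ttest_rel(pre, post))             # paired t-test
    print(stats.ttest_ind(group_a, group_b))      # independent t-test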
assumptions of t-test
1) population is normally distributed
2) population variance is unknown in t-tests (so you use sample variance to tell something about the population)
3) sample(s) are independent and random
4) if you have 2 or more groups, the sample sizes are relatively equal
5) variance within each group (tests 2 and 3) or set of data (test 1) is similar: homogeneity of variance
which assumption of t-test has worst consequence
Violating assumption 3 (independent, random samples) has the worst consequences.
difference between covariance and correlation
Covariance examines the extent to which two variables vary together. It does not take into account the variability within each variable. Correlation examines the strength of the relationship of two variables by taking variability within each variable into account.
formula for covariance
cov_XY = Σ(X − X̄)(Y − Ȳ) / (N − 1)
formula for correlation
r = cov_XY / (s_X · s_Y)
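
Both formulas computed by hand in Python, with invented paired data:

    # Covariance and Pearson correlation from the formulas above.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 5.0, 4.0, 6.0]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n

    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = (sum((x - x_bar) ** 2 for x in xs) / (n - 1)) ** 0.5
    s_y = (sum((y - y_bar) ** 2 for y in ys) / (n - 1)) ** 0.5
    r = cov_xy / (s_x * s_y)   # correlation scales covariance by the SDs
    print(cov_xy, r)           # 2.0  0.85...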
explain what a and b represent in regression formula
In a regression formula, a represents the y-intercept and b represents the slope (in contrast to the normal mx+b formula)

a = Ȳ − b·X̄
b = cov_XY / s_X²
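
A sketch of the slope and intercept computation in Python, reusing the same invented data as above:

    # Least-squares line: Yhat = a + b*X.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 5.0, 4.0, 6.0]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n

    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

    b = cov_xy / s2_x      # slope
    a = y_bar - b * x_bar  # y-intercept
    print(a, b)            # 1.8  0.8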
explain concept of residual
Y − Ŷ is an error of prediction, called the residual. We want the line (the set of Ŷs) that minimizes the errors of prediction. We can’t minimize the simple sum of the errors, because it would equal zero. Instead we look for the line that minimizes the sum of squared errors, i.e., that minimizes Σ(Y − Ŷ)².
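
Continuing the regression sketch, a check that the residuals behave as described; the data and the a, b coefficients are the invented ones computed above:

    # Residuals and the sum of squared errors that least squares minimizes.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 5.0, 4.0, 6.0]
    a, b = 1.8, 0.8   # intercept and slope from the sketch above

    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    print(sum(residuals), sse)  # ~0.0  2.4: errors sum to zero, squares do not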
diff between correlation and regression
Usually, regression is used for situations where the value of X is fixed or specified by the experimenter. No sampling error is involved in X, and repeated replications of the experiment will involve the same X values. Correlation is used to describe the situation in which both X and Y are random variables. X’s and Y’s vary from one replication to another and thus sampling error is involved in both variables.
definition of mean (not equation)
point about which the sum of the deviations equals zero
what does std deviation express?
variability; accounts for outliers; shows whether or not the mean captures the bulk of the data
definition of variance (not equation)
average squared deviation from the mean
why is n-1 used in variance?
makes variance bigger; compensates for uncertainty about the population
frequency distribution
organizes how frequently each value in your data occurs; reflects the probability that a certain value will occur
definition of distribution
visual representation of how things compare against one another.
probability: how likely a data point is to occur (areas under the curve)
sampling distribution of means
repeatedly taking samples of the same size from the same population and expressing the results as a distribution of sample means: what is the probability of getting these means if you sample other chunks of the population?
what does sampling of the means tell you?
likelihood that these particular means come from the same population (need to take into acct the noisiness)
standard normal distribution
has a mean of 0 and a std dev of 1; z tables are based on the standard normal distribution.
what does a z-score represent
the number of std deviations that Xi is above or below the mean
definition of sampling error
variability due to chance; due to the fact that statistics can vary from one sample to another simply b/c of chance/nature
hypothesis testing lets you
decide whether we are looking at a small chance fluctuation or a significant difference
what do sampling distributions tell us?
what degree of sample to sample variability we can expect by chance as a function of sampling error
H0 proves:
falsehood, but not truth
remember the unicorn: even if it really was there, we can't prove that it was
t-tests are used with what type of measurement scales?
interval/ratio (continuous data)
central limit theorem
the shape of the sampling distribution of the means will be normal regardless of whether the initial distribution is normal (as N increases)
t-tests tell you...
whether the difference between two groups is statistically significant
max number of groups in a t-test
2 (ANOVA can do more than 2)
types of t-tests:
1 sample, paired, independent
explain nuisance variables
potential independent variables which, if left uncontrolled, could exert a systematic influence on the different treatment conditions. When this occurs, their effects can no longer be separated from those of the independent variables.
explain between-subjects designs:
any differences in behavior observed between any one treatment condition and the others are based on differences between independent groups of subjects
within subjects designs:
also called repeated-measures design; any differences in behavior observed among treatment conditions are represented by differences within a single group of subjects serving in the experiment.
factorial designs
permit the manipulation of two or more independent variables in the same experiment
single factor experiment
a single independent variable, represented by two or more treatment levels, manipulated in the same experiment
manipulation of either qualitative or quantitative differences
the specific treatment conditions represented in an experiment are the levels of a factor
the terms levels, treatments, and treatment levels can be used interchangeably
experimental error
all nuisance variables are potential contributors to experimental error
sources of experimental error
measurement error
method errors
individual differences from creature to creature within a group
sum of squares (explain)
sum of squared deviations from the mean; the degree to which the numbers in a set vary among themselves
Total SS
SS_total = Σ(Y − Ȳ_total)²
Between groups SS:
SS_A = n · Σ(Ȳ_A − Ȳ_total)²
within groups SS:
SS_S/A = Σ(Y − Ȳ_A)²
explain mean squares
a mean square is a variance: MS = SS / df
F ratio:
the treatment mean square MS_A divided by the within-groups mean square MS_S/A; the denominator is called the error term. Conceptually, F = (error + treatment) / error (between-groups variability divided by within-groups variability).
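
A sketch of the full computation in Python from the SS formulas above, using three invented groups of equal size n:

    # One-way ANOVA: SS, MS, and F by hand.
    groups = [[4, 5, 6, 5], [7, 8, 6, 7], [10, 9, 11, 10]]
    n = len(groups[0])                 # scores per group
    a = len(groups)                    # number of groups
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(g) / n for g in groups]

    ss_a = n * sum((m - grand_mean) ** 2 for m in group_means)   # between groups
    ss_sa = sum((y - m) ** 2                                     # within groups
                for g, m in zip(groups, group_means) for y in g)

    ms_a = ss_a / (a - 1)              # treatment mean square
    ms_sa = ss_sa / (a * (n - 1))      # error term
    F = ms_a / ms_sa
    print(ss_a, ss_sa, F)              # 50.67  6.0  38.0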
sampling distribution of F
the frequency distribution of F values obtained over repeated sampling when H0 is true