108 Cards in this Set
 Front
 Back
sample

Subset of a population. Group of creatures from which one gathers data with the intention of making inferences to all organisms that fit those criteria (i.e., the population).


population

The entire collection of events in which you are interested. Can range from relatively small set to large, but finite set.


random sampling

Each and every element of the population has an equal chance of being selected (as opposed to convenience sampling, where you may not have access to all elements and therefore the results may be biased).


studies that are not randomly sampled lack....

external validity


random assignment

Each and every element of the selected sample will have an equal chance of either being assigned to the experimental group or the control group.


discrete variables

Property can take on a limited number of different values


continuous variables

Property can take on an unlimited number of values (any value between the lowest and highest point on the scale).


quantitative variables

Measurement data; something is actually measured


qualitative variables

variables are categorized and results are based on the frequency with which these events occur. Outcomes tend to be binary; you can't directly assign a number value to the outcome (that is representative of the event)


qualitative variables also called....

categorical/frequency data


independent variables

Manipulated by the experimenter


dependent variables

the data, which is not under the experimenter’s control.


discrete variables tend to be.....

categorical/qualitative


continuous variables tend to be....

quantitative


2 types of categorical measurement scales

ordinal and nominal


nominal measurement scale:

“nom”, categories have no ranked order; examples: Republican or Democrat, male or female, numbers on a football jersey


ordinal measurement scale:

rank is given, ordered, examples: ranks in the Navy, scale of life stress


2 types of continuous measurement scales:

interval and ratio scale


interval scale:

measurement scale on which you can legitimately speak of differences between scale points; example: the Fahrenheit temperature scale. But you still do not have the ability to speak meaningfully about ratios.


ratio scale:

has a true zero point. Must be a true zero, not an arbitrary point like 0 deg F or C. Examples: length, time, volume.


measurement scales are important because...
examples of how the scales would be different 
Measurement scales are important in statistics so that you get a true representation of how various data points relate to one another. For example, you need to determine the most appropriate interval scale to use when measuring temperature changes in a house versus minute temperature changes in a scientific experiment. You need to ensure that your measures relate as closely as possible to what you want to measure, but your results are ultimately only the numbers you obtain and your faith in the relationship between those numbers and the underlying objects or events.


mode:

most frequently occurring value


disadvantages of mode:

Disadvantages: when you are plotting continuous values, it would be difficult to find a meaningful value that is most commonly repeated. Also, if the plot yields a bimodal or trimodal distribution, it would be more representative to use those two or three values as modes, instead of picking one single value.


advantages of mode:

It is a score that actually occurred, whereas mean and median may be values that do not appear in the data. The mode also represents the largest number of people. By definition, the probability that the observation drawn at random (Xi) will be equal to the mode is greater than the probability that it will be equal to any other value. Mode is especially applicable to nominal data, where there really is no true mean or median. Also generally unaffected by extreme scores


median:

middle point


advantages of median

Unaffected by extreme scores. Calculation does not require any assumptions about the interval properties on the scale.


disadvantages of median

It does not enter readily into equations, so it may be more difficult to work with than the mean. It is also not as stable from sample to sample.


disadvantages of mean

Disadvantages: Influenced by extreme scores, value may not actually exist in the data, interpretation in terms of the underlying variable being measured requires at least some faith in the interval properties of the data.


advantages of mean

Advantages: mean can be manipulated algebraically. Useful with the population mean: if you drew many samples from the same population, the sample means that resulted would be more stable estimates of the central tendency of that population than would the sample medians or modes. It is widely used because the sample mean is generally better estimate of the population mean (than mode or median).
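The trade-offs on the mode/median/mean cards above can be seen in a few lines of Python; the data set below is made up purely for illustration:

```python
from statistics import mean, median, mode

# Hypothetical scores with one extreme value, to show how each measure
# of central tendency reacts to an outlier.
scores = [2, 3, 3, 4, 5, 100]

print(mode(scores))    # 3    (most frequently occurring value; unaffected by 100)
print(median(scores))  # 3.5  (middle point; unaffected by 100)
print(mean(scores))    # 19.5 (pulled upward by the extreme score)
```

Note that the mean (19.5) is a value that never occurs in the data, one of the disadvantages listed above.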


variance: definition

Variance tells you, on average, the squared deviation from the mean. The n - 1 takes into account that there will be some uncertainty about the population, so it compensates; dividing by n itself would be a less accurate reflection.


variance: equation

s_x^2 = Σ(x - x̄)^2 / (n - 1)


std deviation: definition

Std deviation makes the number more interpretable because it puts it in similar terms as the data itself.


std deviation: equation

Std deviation = sq root of variance
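A minimal sketch of the variance and standard deviation cards above, with made-up data and the n - 1 denominator:

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical sample
x = [4, 8, 6, 5, 3, 7]

n = len(x)
x_bar = sum(x) / n                                  # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # average squared deviation, n - 1 denominator
s = sqrt(s2)                                        # std deviation: square root of variance

# The stdlib functions agree with the hand computation:
print(s2, variance(x))   # both 3.5 for this sample
print(s, stdev(x))
```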


sample and population variance and std deviation

sample: s and s^2
population: sigma and sigma^2

normal distribution: description

The normal distribution generally is a symmetric, unimodal, "bell-shaped" distribution with limits of ±infinity. Abscissa: horizontal axis; ordinate: vertical axis. A normal distribution depends on the mean and std deviation of the distribution.


why is normal distribution impt?

Normal distribution important because the theoretical distribution of the hypothetical set of sample means obtained by drawing an infinite number of samples from a specified population can be shown to be approximately normal under a wide variety of conditions.  Sampling distribution of the mean.


description of standard normal distribution and why it is important

Standard normal distribution: has a mean of zero and a standard deviation of 1; N(0,1). Transformed from a normal distribution, which represents the entire population. Uses z-scores, which transform the distribution to a more workable format with which you can use the tables.
z = (X - mu) / sigma

units of standard normal distribution and normal distribution

Units of normal distribution can be anything, but the units of standard normal distribution are in terms of z which is a function of the mean and std deviation.


explain and provide a formula for z scores

Z-scores allow you to use a linear transformation to develop a standard normal distribution. A z-score represents the number of standard deviations that Xi is above or below the mean, a positive z-score being above the mean and a negative z-score below it. The shape of the distribution is not affected by a linear transformation; if it was not a normal distribution before, it will not be afterward.
z = (X - mu) / sigma
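The z-score formula on this card is easy to check in Python. The mean and standard deviation below are arbitrary choices, and `NormalDist` stands in for the standard normal table:

```python
from statistics import NormalDist

mu, sigma = 100, 15            # hypothetical population mean and std deviation
x = 130

z = (x - mu) / sigma           # z = (X - mu) / sigma
print(z)                       # 2.0: two standard deviations above the mean

# The table lookup the cards mention: area under the standard normal curve below z
print(NormalDist().cdf(z))     # ~0.977
```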

what does a sampling distribution tell us?

Sampling distributions: tell us specifically what degree of sample-to-sample variability we can expect by chance as a function of sampling error. They tell us what values we might or might not expect to obtain for a particular statistic under a set of predefined conditions. The standard deviation of that distribution reflects the variability that we would expect to find in the values of that statistic over repeated trials.


what does the sample mean tell you about a hypothetical population mean?

Is the difference between a sample mean and a hypothetical population mean, or the difference between two obtained sample means, small enough to be explained by chance alone or does it represent a true difference that might be attributable to the effects of our treatment?


sampling distribution of the mean:

Sampling distribution of the mean: the distribution of the means of an infinite number of random samples drawn under certain specified conditions. This allows us to question if an obtained sample mean can be taken as evidence in favor of the hypothesis that we actually are sampling from this population.


sampling error

Variability due to chance. Does not imply carelessness or mistakes.


units of sampling distribution of the means

x-axis: means
y-axis: usually frequency

sampling distributions impt in hypothesis testing b/c...

Sampling distributions important in hypothesis testing because you need to determine if you are looking at fluctuations due to chance or if it is a large enough fluctuation to deem it due to a specific reason.


Why is it not sufficient to just know the size of the difference between the two means when determining whether or not two groups differ significantly?

It depends what you're testing. You'd have to look at the sampling distribution of the means and determine whether the difference falls outside the range expected by chance (e.g., the central 95% of values): a statistically significant difference.


diff between std deviation and standard error of the mean?

Standard deviation is the square root of variance. The standard error of the mean (SEM) is the standard deviation of the sampling distribution of the mean. In a sampling distribution of the mean centered on its central tendency, about 68% of values fall within 1 SEM on either side, 95% within 2 SEMs, and 99% within 3 SEMs.


null hypothesis

Ho
Given that all points are from the same population, how much sample-to-sample variability can you expect other than differences by chance? None. "What you did didn't make a difference."

alternate hypothesis

H1
The distribution under H1 has the same shape as under H0, just shifted.

rejection level

p less than or equal to 0.05 or 0.01, depending on convention and how conservative you want it. The probability at which we are willing to reject H0 is referred to as the rejection level or significance level of the test. Whenever p is less than or equal to our predetermined significance level, we reject H0.


rejection region

Any outcome whose probability under H0 is less than or equal to the significance level falls in the rejection region, since such an outcome leads us to reject H0.


type I errors

rejecting H0 when in fact it is true; a false positive. Probability = alpha.


type II errors

failing to reject H0 when it is in fact false (H1 is true); a false negative. Probability = beta.


diff between 1 and 2 tailed tests

We reject H0 for only the lowest or highest results: directional, one-tailed tests.
We reject the extremes in both tails: two-tailed, non-directional test. With the two-tailed test, we gain the ability to reject the null hypothesis for extreme scores in either direction. But now we have reduced the cutoff in each tail, so some values that would be rejected in the one-tailed test would not be rejected in the two-tailed test. One-tailed tests can be more powerful if you are absolutely certain that the other extreme will not happen. It depends on what you are measuring.

diff between joint and conditional probability

Joint probability is the probability of co-occurrence of two or more events. AND
Conditional probability is the probability that one event will occur given that some other event has occurred. IF THEN 

which type of probability is reversible and which is not

Joint probability is reversible since it is an AND function. Conditional probability must occur in a certain order.


which type of probability is type I and II errors? why?

Type I and II errors are conditional probabilities. Given that H0 is true, there is a 5% chance I am wrong, etc. If you flip this, the probabilities will change.


sampling distribution of the mean

Sampling distribution of the mean: tells you the sampling error, which tells you how much slop you can expect in those means. Did what you did matter?


central limit th

Central limit theorem: runs underneath all parametric tests; a factual statement about the distribution of means. It states that given a population with a mean of mu and a variance of sigma squared, the sampling distribution of the mean will have a mean equal to mu, a variance equal to sigma squared / N, and a standard deviation of sigma / √N. The distribution will approach the normal distribution as N, the sample size, increases.


how do SEMs come into play with the central limit th

Also, if you start with a weird shape, you can make it look normal eventually with the help of SEMs.
SEM = s / √N
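A quick simulation sketch of the central limit theorem card above: the means of samples drawn from a distinctly non-normal (uniform) population spread out by about sigma / √N. The sample size and number of samples below are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
N = 30                                    # size of each sample
sample_means = [mean(random.uniform(0, 1) for _ in range(N))
                for _ in range(5000)]     # 5000 draws from the sampling distribution

pop_sigma = sqrt(1 / 12)                  # std deviation of Uniform(0, 1)
print(stdev(sample_means))                # observed spread of the sample means
print(pop_sigma / sqrt(N))                # CLT prediction: sigma / sqrt(N) ≈ 0.0527
```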

3 kinds of ttests

1-sample t-test
Paired t-test
Independent t-test

describe 1 sample ttest

Only have one sample and compare it to a known population value. Ex: medical blood tests.
Numerator: how different is the sample from the population? Denominator corrects for noise.

describe paired ttest

Intended for scenarios with one sample, measured twice.
Pre-measure and post-measure, or two conditions, but taken from only one group. Also called dependent t-test, repeated measures, or matched sample. Numerator: are the average differences greater than zero? Denominator corrects for noise.

describe independent ttest

2 different groups, compared against each other. Are the 2 groups different?
Numerator: size of the difference between the groups. Denominator corrects for noise.
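The numerator/denominator descriptions on these cards can be made concrete with a hand-rolled independent (pooled-variance) t statistic; the two groups below are invented for illustration:

```python
from math import sqrt
from statistics import mean, variance

group1 = [5, 7, 5, 3, 5, 3, 3, 9]   # hypothetical scores, group 1
group2 = [8, 1, 4, 6, 6, 4, 1, 2]   # hypothetical scores, group 2

n1, n2 = len(group1), len(group2)
# Pooled variance: the "noise" term shared by both groups
sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
# Numerator: size of the difference between the groups; denominator corrects for noise
t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(t)   # ≈ 0.847
```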

assumptions of ttest

1) population is normally distributed
2) population variance is unknown in t-tests (so you use sample variance to tell something about the population)
3) sample(s) are independent and random
4) if you have 2 or more groups, the sample sizes N are relatively equal
5) variance within each group (2 & 3) or set of data (1) is similar: homogeneity of variance

which assumption of ttest has worst consequence

Number 3 (independent and random samples) has the worst consequences if violated.


difference between covariance and correlation

Covariance examines the extent to which two variables vary together. It does not take into account the variability within each variable. Correlation examines the strength of the relationship of two variables by taking variability within each variable into account.


formula for covariance

COV_xy = Σ[(x - x̄)(y - ȳ)] / (N - 1)


formula for correlation

r = COV_xy / (s_x s_y)


explain what a and b represent in regression formula

In a regression formula, a represents the y-intercept and b represents the slope (in contrast to the usual y = mx + b formula).
a = ȳ - b·x̄
b = COV_xy / s_x^2
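Putting the covariance, correlation, and regression formulas from the last few cards together in Python (x and y below are made up):

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]   # hypothetical predictor
y = [2, 4, 5, 4, 5]   # hypothetical outcome

n = len(x)
x_bar, y_bar = mean(x), mean(y)

cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (stdev(x) * stdev(y))   # correlation: covariance scaled by each variable's spread
b = cov_xy / stdev(x) ** 2           # slope
a = y_bar - b * x_bar                # y-intercept

print(cov_xy, r, a, b)
```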

explain concept of residual

Y - Ŷ is an error of prediction, called the residual. We want to find the line (the set of Ŷs) that minimizes errors of prediction. We can't minimize the sum of errors or it would equal zero; instead we look for the line that minimizes the sum of squared errors, i.e., minimizes Σ(Y - Ŷ)^2.


diff between correlation and regression

Usually, regression is used for situations where the value of X is fixed or specified by the experimenter. No sampling error is involved in X, and repeated replications of the experiment will involve the same X values. Correlation is used to describe the situation in which both X and Y are random variables. X’s and Y’s vary from one replication to another and thus sampling error is involved in both variables.


definition of mean (not equation)

point about which the sum of the deviations equals zero


what does std deviation express?

variability; accounts for outliers; shows whether or not the mean captures the bulk of the data


definition of variance (not equation)

average squared deviation from the mean


why is n1 used in variance?

makes variance bigger; compensates for uncertainty about the population


frequency distribution

organizes the frequency with which your data occur; reflects the probability that a certain value will occur


definition of distribution

visual representation of how things compare against one another.
Probability: how likely a data point is to occur (areas under the curve)

sampling distribution of means

repeatedly taking samples of the same size from the same population and expressing them as a distribution of sample means. What is the probability of getting these means if you sample other chunks of the population?


what does sampling of the means tell you?

likelihood that these particular means come from the same population (need to take into account the noisiness)


standard normal distribution

has a mean of 0 and std dev of 1. tables have to be standard normal dist.


what does a zscore represent

the number of std deviations that Xi is above or below the mean


definition of sampling error

variability due to chance; due to the fact that statistics can vary from one sample to another simply b/c of chance/nature


hypothesis testing lets you

decide whether we are looking at a small chance fluctuation or a significant difference


what do sampling distributions tell us?

what degree of sample to sample variability we can expect by chance as a function of sampling error


Ho proves:

falsehood, but not truth
remember the unicorn: we can't prove that it was there if it really was there

ttests are used with what type of measurement scales?

interval/ratio (continuous data)


central limit theorem

the shape of the sampling distribution of the means will be normal regardless of whether the initial curve is normal


ttests tell you...

whether the difference between two groups is statistically significant


max number of groups in a ttest

2 (ANOVA can do more than 2)


types of ttests:

1 sample
paired
independent

explain nuisance variables

potential independent variables which, if left uncontrolled, could exert a systematic influence on the different treatment conditions. When this occurs, their effects eventually cannot be separated from those of the independent vars.


explain betweensubjects designs:

any differences in behavior observed between any one treatment condition and the others are based on differences between independent groups of subjects


within subjects designs:

also called repeated-measures design; any diffs in behavior observed among tx effects are represented by diffs within a single group of subjects serving in the exp.


factorial designs

permit the manipulation of two or more independent variables in the same experiment


single factor experiment

a single independent variable, represented by two or more treatment levels, manipulated in the same experiment


factors:

manipulation of either qualitative or quantitative differences


levels:

the specific treatment conditions represented in an experiment are the levels of a factor


levels, treatments, and tx levels can be used

concurrently


experimental error

all nuisance vars are potential contributors to exp error


sources of experimental error

measurement errors
method errors
individual diffs from creature to creature within a group

sum of squares (explain)

sum of sq deviations from the mean; degree to which the numbers in a set vary among themselves


Total SS

SS_total = Σ(Y - Ȳ_total)^2


Between groups SS:

SS_A = n · Σ(Ȳ_A - Ȳ_total)^2


within groups SS:

SS_S/A = Σ(Y - Ȳ_A)^2


explain mean squares

variance = SS/df


F ratio:

The treatment mean square MS_A divided by the within-groups mean square MS_S/A; the denominator is called the error term. F = (error + treatment) / error (between-groups variability divided by within-groups variability).
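A minimal worked one-way ANOVA following the SS and F cards above, using three invented groups of three scores each:

```python
from statistics import mean

groups = [[3, 5, 4], [6, 8, 7], [10, 12, 11]]   # hypothetical treatment groups
a = len(groups)                 # number of groups (levels of factor A)
n = len(groups[0])              # scores per group (equal sizes)
grand = mean(y for g in groups for y in g)      # grand mean over all scores

ss_between = n * sum((mean(g) - grand) ** 2 for g in groups)    # SS_A
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)  # SS_S/A

ms_between = ss_between / (a - 1)       # MS_A,   df = a - 1
ms_within = ss_within / (a * (n - 1))   # MS_S/A, df = a(n - 1): the error term
F = ms_between / ms_within
print(F)
```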


sampling distribution of F

frequency dist of F
