27 Cards in this Set

  • Front
  • Back
14. Central Tendency
is a statistical measure that identifies a single score as a representative for an entire distribution or set of data.
14. Three measures of Central Tendency (CT)
Mean, Mode, Median
14. Mean=
1.The mean is the standard measure of central tendency in statistics. It is most frequently used.
2.The mean is not necessarily equal to any score in the data set
3.The mean is the most stable measure from sample to sample.
4.The mean is strongly influenced by outliers -- that is, by the presence of extreme scores.
•The average of a set of scores
•The most commonly used measure of CT


•Notation
•The mean of a population is symbolized as: μ
•The mean of a sample is symbolized as: x̄ (x with a bar on top)
•The mean = the sum of all the scores divided by the number of scores: ΣX/N
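A minimal sketch of the ΣX/N formula, using made-up scores (the data and the function name are illustrative assumptions, not from the class). It also shows two properties from this card: the mean need not equal any score in the set, and one extreme score pulls it strongly.

```python
def mean(scores):
    """Sum of all the scores divided by the number of scores (sum of X over N)."""
    return sum(scores) / len(scores)

scores = [2, 4, 4, 5, 7]        # hypothetical data set
with_outlier = scores + [50]    # same scores plus one extreme score

print(mean(scores))        # 4.4  (not equal to any score in the set)
print(mean(with_outlier))  # 12.0 (one outlier nearly triples the mean)
```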
14. Median

•The median is the score that divides the distribution exactly in half -- that is, half the scores are below the median and half are above the median.
•It is the precise midpoint.
•Notation: Mdn
•The computation of the median depends on whether there are an odd number of observations or an even number of observations. It also depends upon whether there are duplicate observations (i.e., 1 2 2 2 3 3 4).
1.If N is odd and there are no duplicates, then the median is the score that falls exactly in the middle of the scores (once you have ordered them).

2.If N is even and there are no duplicates, then the median is the average of the two middle scores. Again you need to order them first.

3.If there are duplications of scores, then you need to use the median formula as discussed in class.
The median is not sensitive to outliers
The median is the best measure of central tendency if the distribution is skewed.
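A small sketch of cases 1 and 2 above (no duplicate scores). The function name and data are assumptions for illustration; it deliberately does not attempt case 3, since the median formula for duplicated scores is the one given in class.

```python
def median_no_duplicates(scores):
    """Median for a set of scores with no duplicates (cases 1 and 2 only)."""
    ordered = sorted(scores)          # always order the scores first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # N is odd: the middle score
        return ordered[mid]
    # N is even: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_no_duplicates([9, 3, 7, 1, 5]))      # 5
print(median_no_duplicates([9, 3, 7, 1, 5, 11]))  # 6.0
print(median_no_duplicates([1, 3, 5, 7, 900]))    # 5 (the outlier does not move it)
```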
14. Mode=

The Mode is the least stable measure from sample to sample.
• The mode is the most frequent score in a distribution.
•It is the "typical" value.
•In a frequency graph you can immediately see what the mode is because it is the tallest value, or the score with the highest frequency.
•Notation: Mo
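One way to find the mode is to count how often each score occurs and keep the score(s) with the highest frequency. This is only a sketch; the helper name is made up, and returning every tied score is an assumption about how ties should be handled.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), i.e. the tallest bar(s) in a frequency graph."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == highest)

print(modes([1, 2, 2, 2, 3, 3, 4]))  # [2]
print(modes([1, 1, 2, 3, 3]))        # [1, 3]  (two modes)
```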
15. Range
Range=
• R = URL (upper real limit) of highest score - LRL (lower real limit) of lowest score
• Problem – only determined by two scores
o Not sensitive to distribution
o Very sensitive to outliers

Getting an average deviation
• using absolute value
• using the sum of squares -- SS (formulas given in class)
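A rough sketch of these ideas on made-up scores. It assumes whole-number scores, so the real limits sit half a unit above and below each score, and it uses the standard definitional form of SS (sum of squared deviations from the mean), which may be written differently from the formulas given in class. All function names are illustrative.

```python
def score_range(scores, unit=1):
    """URL of the highest score minus LRL of the lowest score."""
    half = unit / 2                       # assumed: real limits are +/- half a unit
    return (max(scores) + half) - (min(scores) - half)

def mean_absolute_deviation(scores):
    """Average distance from the mean, using absolute values."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean (definitional form)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

scores = [1, 3, 5, 7, 9]                  # hypothetical data
print(score_range(scores))                # 9.0
print(mean_absolute_deviation(scores))    # 2.4
print(sum_of_squares(scores))             # 40.0
```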
15. Variance

• This is probably the most important statistical idea in this class
Variance – mean squared deviation (s² or σ²)
15. Standard deviation = square root of the variance (s or σ)
Standard deviation =

“typical” distance of a set of scores from the mean.
• Computing SS, variance and St dev – to be discussed in class.
o Computational vs. definitional formula
o Variance and St. dev can only be computed for interval and ratio data, not nominal or ordinal
o Variance and St. dev are always greater than or equal to zero.
• Properties of the standard deviation
1.it describes how variable or spread out the scores are
2.it is the typical or approx. average distance from the mean
3.if it is small, then scores are clustered close to mean; if it is large, they are scattered far from mean
4.it is very influenced by extreme scores
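The sketch below compares the definitional and computational forms of SS and then computes a sample variance and standard deviation. It assumes the usual textbook forms (SS = Σ(X − mean)² and SS = ΣX² − (ΣX)²/n, with df = n − 1 for a sample); the formulas presented in class may be laid out differently. Data and names are made up.

```python
import math

def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def ss_computational(scores):
    """SS from raw sums: sum of X squared minus (sum of X) squared over n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

def sample_variance(scores):
    """s squared = SS / df, with df = n - 1 (assumed sample formula)."""
    return ss_definitional(scores) / (len(scores) - 1)

def sample_sd(scores):
    """s = square root of the variance."""
    return math.sqrt(sample_variance(scores))

scores = [1, 3, 5, 7, 9]             # hypothetical data
print(ss_definitional(scores))       # 40.0
print(ss_computational(scores))      # 40.0 (both forms agree)
print(sample_variance(scores))       # 10.0
print(round(sample_sd(scores), 3))   # 3.162
```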
15. Measures of Variability
Variance, Range, Standard Deviation
16. Central Limit Theorem
•The Central Limit Theorem states that for any population with mean = μ and standard deviation = σ, the distribution of sample means for samples of size n will be approximately normally distributed with mean = μ and standard deviation = σ/√n, with the approximation improving as n goes to infinity.
•Note: the CLT holds best if either 1) the original population has a normal distribution or 2) n > 30.
•What does this all mean?
o No matter what the shape of the original distribution, the distribution of sample means taken from that population will be approximately normally distributed when n is large.
o μ_x̄ = μ (the mean of the sample means equals the population mean)
o σ_x̄ = σ/√n (the standard error of the mean)
Type I error

•Type I errors are considered to be worse than Type II errors
•The probability of a Type I error is equal to our alpha level
o When you reject the null hypothesis when it is actually true
o You say the treatment has an effect, when in reality it doesn’t
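To see why the Type I error rate equals alpha, one can simulate many studies in which the null hypothesis is true and count how often a two-tailed z-test rejects it anyway. The population values (μ = 100, σ = 15), sample size, and number of trials below are made-up assumptions.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)                       # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0              # hypothetical population; the null is TRUE here
n = 25
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value (about 1.96)
std_error = SIGMA / n ** 0.5

trials = 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    z = (fmean(sample) - MU) / std_error
    if abs(z) > z_crit:              # rejecting a true null = Type I error
        rejections += 1

type_i_rate = rejections / trials
print("proportion of Type I errors:", type_i_rate, "(alpha =", alpha, ")")
```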
Type II error
o When you fail to reject the null hypothesis when it is actually false.
o You say that there is no effect of the treatment, when in reality there is.
Null hypothesis
o states that there is NO treatment effect. In other words, the new population is the same as the old.
o Ho: μ treatment group = μ original population
o E.g., Ho: μ stimulated babies = μ
Alternative Hypothesis
o states that the treatment DID have an effect. In other words, the two populations are different (non-directional alternative hypothesis)
o H1: μ treatment group ≠ μ original population
o E.g., H1: μ stimulated babies ≠ μ
o There are also two possible directional alternative hypotheses.
▪ H1: μ stimulated babies < μ
▪ H1: μ stimulated babies > μ
o In class, you should always use the non-directional alt. hyp.
Power (definition)
the ability of the test to correctly reject the null hypothesis when the null hypothesis is false. It is the sensitivity of the test to find real differences when they exist.
o So power is related to Type II error – if the test is not “powerful”, you are more likely to make a Type II error.
Power (4 factors)
1. size of the treatment -- more extreme treatments make for more powerful tests
2. alpha-level – the bigger the alpha level, the more power
3. one vs. two-tailed tests – one tailed tests are more powerful
4. sample size – the larger the sample size, the greater the power
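The four factors can be checked numerically for a one-sample z-test. The sketch below assumes a known σ, a true effect in the predicted (positive) direction for the one-tailed case, and made-up numbers (effect of 5 points, σ = 15, n = 25); the function name is illustrative.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05, two_tailed=True):
    """Probability of correctly rejecting a false null in a one-sample z-test."""
    std_normal = NormalDist()
    shift = effect / (sigma / n ** 0.5)       # true effect in standard-error units
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # chance that z lands beyond either critical value
        return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    z_crit = std_normal.inv_cdf(1 - alpha)    # one tail, in the direction of the effect
    return 1 - std_normal.cdf(z_crit - shift)

base = z_test_power(effect=5, sigma=15, n=25)
bigger_effect = z_test_power(effect=10, sigma=15, n=25)
bigger_alpha = z_test_power(effect=5, sigma=15, n=25, alpha=0.10)
one_tailed = z_test_power(effect=5, sigma=15, n=25, two_tailed=False)
bigger_sample = z_test_power(effect=5, sigma=15, n=100)

for label, value in [("baseline", base), ("larger effect", bigger_effect),
                     ("larger alpha", bigger_alpha), ("one-tailed", one_tailed),
                     ("larger n", bigger_sample)]:
    print(f"{label:14s} power = {value:.3f}")
```

Each of the four changes should print a power higher than the baseline.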
Between vs. Within subjects designs
•Between subjects – the participant is in one and only one level of the IV, so the data in one group is completely independent of the data in the other group
•Within subjects – the participant is in every level of the IV, so the scores from one level of the IV are related to the other level.
•These two designs require different statistical procedures. Ch. 10 focuses on the between subjects situation. In Ch. 11, we will cover within subjects designs
Assumptions of the independent groups t-test
1.Normality
2.Independence of observations
3.Homogeneity of variance -- the two populations from which we have sampled have equal variances.
•σ₁² = σ₂² = σ²
•so we can drop the subscript and simply call it σ²
Advantages for a within subjects design (compared to between subjects design.)
• Error variability is reduced so the power of the test is increased
• Can control for potentially confounding variables
• Is more economical in terms of the number of participants needed
• Note: The first two advantages also apply to a matched pairs design
Carry-over effects
occurs when being in one treatment causes changes in the second treatment.
o Three examples
▪ Learning
▪ Boredom
▪ Subject bias caused by guessing the hypothesis
o Remedy for carry-over effects – Counterbalancing (I will discuss this more later)
Counterbalancing – a way to remove the problem of carry-over effects
• Carry-over effects can be thought of as “order effects”, such that the order of administration has an effect on the DV. In effect, order becomes a confounding variable.
• With counterbalancing, half the participants get the treatment in one order, and half get the other order.
o This removes the confounding variable of “order”, by spreading the effect evenly over the two conditions.
o But, then order becomes a disturbance variable, adding extra unsystematic variability to the study.
• A disturbance variable is one that can cause changes in the DV but does not vary systematically with the IV.
o Disturbance variables make it more difficult to find a significant effect (incr. Type II error)
o It is best to keep all disturbance variables constant when possible.
• It is better to have a disturbance variable than a confounding variable.
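A minimal sketch of counterbalancing two treatments: shuffle the participants, then give half the order A-then-B and the other half B-then-A. The participant labels, treatment names "A" and "B", and the function name are all hypothetical; with an odd number of participants the second group gets the extra person.

```python
import random

def counterbalance(participants, seed=None):
    """Randomly assign half the participants to order (A, B) and half to (B, A)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)                 # random assignment to an order
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("A", "B")       # treatment A first
    for person in shuffled[half:]:
        orders[person] = ("B", "A")       # treatment B first
    return orders

participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
assignment = counterbalance(participants, seed=42)
for person in participants:
    print(person, assignment[person])
```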
Assumptions of ANOVA
The assumptions are the same as for an independent groups t-test:
• Independence of observations
• Normality
• Homogeneity of variance -- the populations from which we have sampled have equal variances.
Comparison of t-test and z-test
• So far with hypothesis testing, we have been using z-tests, for which we must know σ. However, most often we don’t know enough about the population and don’t know exactly what σ is.
• If we don’t know σ, we can estimate it using our sample data by computing s.
• Recall:
s = √(SS/df) = √(SS/(n − 1))
• So, if we have an estimate of σ, then we can estimate σ_x̄ using s_x̄. s_x̄ is called the estimated standard error of the mean and can be used in place of σ_x̄ when σ is not known.
s_x̄ = s/√n
• Using the estimated standard error, we can compute t. But now, since σ is estimated, the statistic is no longer exact.
t = (x̄ – μ)/s_x̄
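The steps above can be strung together in a few lines. This sketch uses made-up scores and a made-up hypothesized population mean; it assumes a single sample, so df = n − 1.

```python
import math
import statistics

def one_sample_t(scores, mu):
    """Return (t, df) for a single sample tested against a hypothesized mean mu."""
    n = len(scores)
    s = statistics.stdev(scores)               # s = sqrt(SS / (n - 1))
    est_std_error = s / math.sqrt(n)           # estimated standard error of the mean
    t = (statistics.fmean(scores) - mu) / est_std_error
    return t, n - 1

scores = [1, 3, 5, 7, 9]                       # hypothetical sample, mean = 5
t, df = one_sample_t(scores, mu=3)             # hypothetical null value of 3
print(round(t, 3), df)                         # 1.414 4
```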
Characteristics of the t-statistic

• It is used to test hypotheses about μ when σ is not known.
• It is sort of a substitute for z, with the major difference being that t uses an estimate of the standard error in the denominator that is based on the sample data.
• How well does t approximate z?
o This depends on the sample size – the larger the sample, the better s estimates the population.
o So, the larger the df, the closer t will be to z.
• What does the t-distribution look like compared to the z-distribution?
o Is bell-shaped
o Has a mean of 0.
o The shape depends on df.
▪ In general, the distribution is more variable than the z-distribution.
▪ The fewer the df, the flatter and more spread out the t-distribution is and the fatter the tails are.
▪ If there are many df (over 30 or 60 or so), then the t-distribution looks almost exactly like the standard normal distribution.
▪ If df=infinity, the two distributions are identical.
• Why is the t-distribution more spread out than the z-distribution?
o In z – the only value that changes from sample to sample is the sample mean.
o In t – the sample mean also varies, but the estimated standard error does too, as it is based on sample data.
o So, for t – there is extra variability in both the numerator and the denominator, leading to a more variable distribution overall.
o As n gets larger, the estimate of the standard error becomes more accurate, decreasing the variability in the denominator and making t more like z.
How do you calculate the probability of obtaining a given score in a normal distribution?
• Write out the probability equation
• Draw a picture – draw a normal distribution, marking the mean, and shade in the area of the picture that corresponds to the question being asked.
• Calculate the z-score(s) and look up the probability values.
• We will go through some examples in class.
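As a rough illustration of the "calculate the z-score and look up the probability" step, the standard normal table can be replaced by Python's built-in normal distribution. The population values (μ = 100, σ = 15) and the scores asked about are made up for the example.

```python
from statistics import NormalDist

def prob_below(x, mu, sigma):
    """Return (z, P(X < x)) for a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma                 # z-score of the raw score
    return z, NormalDist().cdf(z)        # area to the left of z (the "table lookup")

# P(X < 115): shade everything to the left of 115
z, p = prob_below(115, mu=100, sigma=15)
print(z)                 # 1.0
print(round(p, 4))       # 0.8413

# P(85 < X < 115): shade the area between the two scores
_, p_low = prob_below(85, mu=100, sigma=15)
p_between = p - p_low
print(round(p_between, 4))   # 0.6827
```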
2 ways to describe the shape of a distribution

o Skewness – the degree to which the distribution departs from being symmetrical
▪ Positively skewed – tail extends in the positive direction
▪ Negatively skewed – tail extends in the negative direction