65 Cards in this Set
 Front
 Back
Validity Types: Statistical
How accurate is the conclusion you draw from a statistical test? We hope to draw conclusions based on reliable, dependable measures, and we want our statistical assumptions (population distribution and variance) to be met.


p < .05

The probability of obtaining a test statistic as extreme as, or more extreme than, a stated value (usually one you've computed). If there is less than a 5% chance that we could get a test statistic this extreme, we decide to reject the null hypothesis because the occurrence of the statistic is so unusual. This is the risk we accept of making an error, and the extent to which we are willing to leave statistical validity unprotected.


Construct

Do the findings support the theory over competing theories? We hope our theory explains the nature of the data we analyze, but we must submit our theory to scrutiny.


External

Can we generalize these results to other conditions? We need assurance that our findings can be detected in contexts beyond our controlled lab setting or this particular set of participants.


Internal

Did the IV cause a change in the DV? We need to eliminate confounds, which detract from the explanatory power of the IV.


Confounds: Maturation

DV changes occur solely as a result of participants growing older or more experienced. Particularly relevant to longitudinal studies.


Confounds: History. Testing.
History: DV changes are due to events outside the study's IV; worst with a large amount of time between pre- and post-tests. Testing: participants show DV changes due to practice with repeated measurements.


Confounds: Instrumentation. Regression to the mean.

Instrumentation: DV changes are due to the measuring instrument's variability, e.g., changing observational criteria. Regression: initially high scorers show score reductions and initially low scorers show score increases; scores move from more extreme to less extreme.


Confounds: Selection. Attrition.

Selection: groups are not randomly selected and assigned, so they are nonequivalent. Attrition: participants drop out differentially across groups, causing biased results.


Confounds: Diffusion of treatment. Sequence effects.
Diffusion: participants in different experimental conditions share information about their manipulation and reduce experimental differences across groups. Sequence effects: in repeated-measures (within-subjects) designs, participants are exposed to different experimental conditions; if the order is constant, changes in the DV may be caused by the order of presentation and not the condition.


Subject effects

Participant behaviors in a social setting: curiosity, motivation, expectation, and bias. Unnatural behavior is exhibited ("acting good"). Hawthorne effect: increased productivity after any special attention.


Subject effects: demand characteristics. Placebo effects.

Demand characteristics: cues given to participants, usually unintentionally by the researcher, that urge the participants to behave unnaturally. Placebo effects: a demand characteristic; expectations of change or improvement without any manipulation.


Experimenter effects

The experimenter influences participants toward an expected outcome via manipulation or cueing of some type, leading to bias. Ethical violations are another experimenter effect: keeping only data that support the hypothesis, or selectively discarding outliers.


General control procedures

Prepare the lab setting: lighting, temperature, sound. Make the setting similar for all. Use the same experimenter whenever possible.


Exact vs. systematic replication

Exact: identical in every way. Systematic: similar, but with procedural changes.


General Control Procedures

Prepare the lab setting
Lighting, temperature, sound. Make the setting similar for all. Use the same experimenter whenever possible. But too sterile a setting can hurt generalizability.

General Control Procedures…

Use stimuli with good reliability and validity
Reliability: stability or "repeatability" of results from a test, scale, or stimulus. Validity: the stimulus, test, or scale should give us confidence that we're measuring the "thing" we think we're measuring; our test should correlate with other, similar measures of the same behavior.

Replication: Systematic and Exact

Exact = identical in every way
Systematic = similar, but with procedural changes, e.g., changing an item from "When I read something, I find I must read it again" to a "nearly always" wording.

Conceptual Replication

Different studies are generated from the same problem statement
More common than systematic or exact replication. E.g., Study 1 on self-efficacy: questionnaires and a real-life problem-solving situation (observational). Study 2 on self-efficacy: experimental manipulation and an artificial but controlled problem-solving task (experimental).

Control over Subject and Experimenter Effects

Single- and double-blind procedures
Single-blind: the RA is blind to the manipulation/condition. Double-blind: both the RA and the participants are blind to the manipulation/condition. Double-blind studies are particularly important in social psychology experiments. Single-blind studies are important for protecting against experimenter bias.

Automation

Standardization of the study's instructions using a computer, or video/audio tape
Computer-aided scoring and recording of responses tremendously reduces clerical errors in data entry. Many studies now use PDAs to automate their data collection. Automation reduces bias and errors, saves money, and reduces waste.

Use Objective Measures

Use easily agreed-upon responses if observational measures are taken
E.g., "Does the mother gaze at the infant, face-to-face, for at least 10 seconds at a time?"

Use Multiple Observers

At least two observers (more is better) rate the behaviors being measured and compare ratings against some "% agreement" criterion (e.g., 90%).
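The "% agreement" computation above can be sketched in a few lines of Python; the two raters' codes below are invented for illustration.

```python
# Hypothetical sketch: percent agreement between two observers coding
# the same 10 trials (1 = behavior present, 0 = absent). Data invented.

def percent_agreement(rater_a, rater_b):
    """Percentage of trials on which the two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

print(percent_agreement(rater_a, rater_b))  # 8 of 10 codes match -> 80.0
```

In practice a chance-corrected index (e.g., Cohen's kappa) is often reported alongside raw agreement, but the deck's criterion is simple percent agreement.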


Deception

Deliberately withholding information from participants (and misleading them to believe the study's purposes are very different from what they actually are)
Use only when necessary. Always debrief participants at the end of the study. E.g., the "helping behavior and altruism" Batson study.

Control Through Subject Selection and Assignment

Subject selection
Proper sampling helps us increase the generalizability of results. E.g., the Nun Study on Alzheimer's Disease. Good: careful selection; no smokers; no sex or reproduction; little/no drinking; similar vocations (teachers); group living. Bad: generalizability to whom?

Subject Selection…

Broad terms:
Population: some larger group of interest. Sample: some smaller group drawn from a population; should be representative.
Specific terms:
General population: all persons. Target population: the specific population of interest (e.g., Alzheimer's patients). Accessible population: the subset of the population available to the researcher (e.g., 100 Alzheimer's patients from N. Virginia).

Subject Selection

As researchers, we must be careful to generalize from our sample to the accessible population, but not to the target population
This is why we specify our populations as "those persons like those in our study" when we form hypotheses. If other researchers replicate our findings in other accessible populations, we gain more external validity.

Subject Selection

How to solve sampling issues?
1. Random sampling. Rarely done or accomplishable. Occurs when any person in the population has an equal chance of being selected; participants should be independent of each other. How to make random draws: a random-number generator or table.

Subject Selection

2. Stratified random sampling
More commonly used. Define the subpopulations to closely match the population's distributional characteristics, and sample persons accordingly. E.g., suppose we want to study the public policy attitudes of psychologists, and the population of U.S. psychologists includes:
54% who work in universities
18% in hospitals
12% in private practice
10% in industry

Subject Selection

Then, for our sample of N = 1000 people, we should randomly select:
540 who work in universities
180 in hospitals
120 in private practice
100 in industry
Good, but we still have the issue of study volunteers being unrepresentative of others (selection bias).
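As a sketch, the stratum counts above fall out of a one-line proportional allocation (the four percentages describe only part of the psychologist population, so the counts sum to 940, not 1000):

```python
# Sketch of proportional allocation for stratified random sampling,
# using the workplace percentages from the card above.

def stratum_sizes(n_total, proportions):
    """Number of participants to draw from each stratum."""
    return {name: round(n_total * p) for name, p in proportions.items()}

strata = {"universities": 0.54, "hospitals": 0.18,
          "private practice": 0.12, "industry": 0.10}

print(stratum_sizes(1000, strata))
# {'universities': 540, 'hospitals': 180, 'private practice': 120, 'industry': 100}
```

Within each stratum, members would then be drawn by simple random sampling.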

Subject Selection
3. Ad hoc samples
Samples drawn from accessible populations
The sample's characteristics define the population to which we can generalize. This type of sample is used most often in psychological research. E.g., "100 female psychology students, aged 18-20, from a southern university"; gather background information on the participants and report it in the research article.

Subject Assignment

Placing persons in experimental conditions
More important than random selection, especially for internal validity. There are 5 different ways:
1. Random assignment. Use a random-number table or generator. Random is not the same as haphazard. We may not end up with equal groups, but we can say the sources of bias have an equal likelihood of being spread across groups.
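A minimal Python sketch of random assignment: shuffle hypothetical participant IDs and split them into two groups (the seed is only there to make the sketch reproducible; shuffling then splitting in half is one common way to force equal n's).

```python
import random

# Minimal sketch of unbiased random assignment to two conditions.
# Participant IDs are invented for illustration.

participants = [f"P{i:02d}" for i in range(1, 21)]

rng = random.Random(42)      # seeded only so the sketch is reproducible
shuffled = participants[:]
rng.shuffle(shuffled)

treatment = shuffled[:10]    # e.g., gets mnemonic training
control = shuffled[10:]      # no training
print(len(treatment), len(control))  # 10 10
```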

2. Matched Random Assignment

Makes n's equal and balanced across conditions
E.g., a "memory in college students" study; 2-group design; one group gets mnemonic training; we believe GPA could be a confound if allowed to run loose.

3. Eliminate the variable

Elimination involves removing persons beyond a certain level of a variable
E.g., an aging study of visual perception (mental transformation of a visual object). Persons having visual acuity of 20/20, 20/30, 20/40, or 20/50 are acceptable; those with acuity worse than 20/50 are eliminated from the study.

4. Hold a variable constant

Constancy involves allowing into a study only persons having a given level of a variable
Same visual perception example: in this case, we allow only persons with 20/20 acuity into the study; the result gives us constancy.

5. Put a variable in study’s design

Build this variable into the design and examine the effect of the levels on the DV


IV. Control through experimental design

Goal is to reduce threats to internal validity
We want to be able to say "the IV manipulation caused a change in the DV" and nothing else. The best ways to have control in an experiment:
A. Select participants in an unbiased way, e.g., stratified random sampling or ad hoc samples.
B. Assign participants to conditions in an unbiased way, e.g., random assignment or matched random assignment.

Control through experimental design…

C. Use a control group in your study
E.g., one group gets the mnemonic training, the other does not.
D. A true experiment's 5 characteristics:
1. A research hypothesis stating the IV's effect on the DV, e.g., R+ (reinforcement) will increase proactive behavior.

Control through experimental design…

2. Have two or more IV levels
E.g., varying the amount of time given to solve a cognitive problem: 10 sec, 20 sec, 30 sec.
3. Assign participants to experimental conditions in an unbiased way, e.g., random assignment or matched random assignment.

Control through experimental design…

4. Use strong procedures for testing causal relationships
E.g., use a control group, use pilot testing, use prior research to support the work.
5. Use specific control procedures to reduce internal validity threats, e.g., random assignment, automation, objective measures, placebos, deception.

Hypothesis testing: General steps

We want to draw conclusions about our results by examining the probability of getting our results if the opposite of what we are predicting were true. Turn the research question into H0 and H1 statements about populations.


Ex.: Steps 1 and 2
 
1. State the hypotheses: under the null hypothesis, Population 1 adults do not remember more words than Population 2 adults (hence, Population 1's mean is less than or equal to Population 2's mean).
2. Determine the characteristics of the comparison distribution. Next, we ask, "What is the probability of obtaining a particular test statistic if the null hypothesis is true?" We assume the sample (here, 1 person) was selected from a distribution representing a true null hypothesis, approximately normal in form, with a mean of 8 and a standard deviation of 3. The distribution to which you compare your sample (when the null hypothesis is true) is the comparison distribution.

3. Determine the cutoff score to reject the null hypothesis

If the null hypothesis is true, then witnessing an adult remembering 14 words is very unlikely: that's TWO standard deviations above the mean, an extreme value associated with roughly a 2% probability of occurrence.
Performing 2 standard deviations above the mean is rare, regardless of experimental training. The cutoff score, here, is set in terms of normal Z-units. If the Z statistic we compute is more extreme than +2 (in our example), we can reject the null hypothesis. Typically, we use a cutoff score that corresponds to a probability of 5%, called alpha (α = .05).

4. Determine the sample's score on the comparison distribution

So, in our example, we need Z > +2 to reject the null hypothesis.
We've conducted our study on N = 1, and we see that our adult was able to remember 15 words. Now we compute Z = (15 - 8) / 3 = +2.33 and determine its standing on the comparison distribution. Our Z of +2.33 indicates that the adult who remembered 15 words is 2.33 standard deviations above the population mean.
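The Z computation in this example (population mean 8, SD 3, observed score 15) can be checked with a short Python sketch; the one-tailed p-value here uses the standard normal survival function built from `math.erf`:

```python
import math

# Worked sketch of the memory example: population mean 8, SD 3, and
# one adult who remembers 15 words. Is Z more extreme than the +2 cutoff?

def z_score(x, mu, sigma):
    return (x - mu) / sigma

def one_tailed_p(z):
    """P(Z > z) under the standard normal (survival function)."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

z = z_score(15, 8, 3)
print(round(z, 2))              # 2.33
print(one_tailed_p(z) < 0.05)   # True -> reject H0
```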

5. Reject the null hypothesis or not?

We knew we needed a Z score in excess of +2 to reject the null hypothesis; we got +2.33, so we can reject the null hypothesis.
Therefore, our research hypothesis was supported: Population 1 adults who received memory training remembered more words than Population 2 adults who didn't receive training. If we had not obtained a statistically significant test statistic, we'd simply say our results were inconclusive and that we failed to reject the null hypothesis.

Ways of framing hypothesis-testing questions

One-tailed tests go with directional hypotheses: the researcher predicts which population has the higher mean. (By contrast, with a two-tailed test and a nondirectional hypothesis, the researcher does not predict which population has the higher mean, only that they'll be different.)


Hypothesis Testing Introduction

1. Make questions into H0 and H1
2. Determine the characteristics of the comparison distribution
3. Determine the cutoff score to reject H0
4. Determine the sample's score on the comparison distribution
5. Reject H0 or not?

Hypothesis Testing: Population 2

Pop. 2: Adults who do not receive the special memory training


Hypothesis testing using a distribution of means

1. Mean of a distribution of means = population mean.
2. Variance of a distribution of means = population variance divided by the number of scores in each sample.
3. Shape of a distribution of means is bell-shaped and unimodal.
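Rules 1 and 2 can be verified by simulation: draw many samples from a population and look at the distribution of their means. The population parameters below reuse the deck's mean-8, SD-3 example, with an arbitrary sample size of n = 9 for illustration.

```python
import random
import statistics

# Simulation sketch: draw many samples of size n from a normal
# population and examine the distribution of the sample means.

rng = random.Random(0)            # seeded so the sketch is reproducible
pop_mu, pop_sigma, n = 8, 3, 9

means = [statistics.fmean(rng.gauss(pop_mu, pop_sigma) for _ in range(n))
         for _ in range(20_000)]

print(round(statistics.fmean(means), 1))      # close to the population mean, 8
print(round(statistics.pvariance(means), 1))  # close to 3**2 / 9 = 1.0
```

The simulated mean of means lands on the population mean, and the variance of the means lands on the population variance divided by n, matching rules 1 and 2.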

Three kinds of distributions

1. Distribution of a population of persons
2. Distribution of a sample drawn from a population 3. Distribution of means of all possible samples of a particular size taken from that distribution 

The distribution of means

We want to compare this mean to something similar: a distribution of means. The Central Limit Theorem underlies several rules we'll address.
Rule 1: As the number of samples increases, the mean of the distribution of means approximates the population mean; high and low means cancel each other out.
Rule 2: The variance of the distribution of means is the variance of the distribution of scores divided by the number of scores in each sample.


Rule 2

The standard deviation of a distribution of means is simply the square root of the variance of the distribution of means. **Remember this! This standard error is the "average amount by which we expect our sample mean to differ from the mean of the distribution of means."


Rule 3

The shape of a distribution of means tends to be unimodal and bell-shaped. It is at least approximately normal if there are at least 30 scores in each sample, OR if the population of scores is normal.


Single sample ttest

Compare a single sample mean to a population mean when the population standard deviation (σ) is not known


Dependent means ttest

Compare two linked (paired) means when the population standard deviation (σ) is not known


Single sample ttest…

We don't know the σ of the population, only μ
SO! We will estimate the population σ using our sample information. If the sample is randomly selected, we compute an unbiased estimate of the population variance in a way that boosts the sample's variance a bit: we do this by putting N - 1 in the denominator of the variance equation.
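A small sketch of the N - 1 correction, using invented scores; note that Python's `statistics.variance` already uses the N - 1 denominator:

```python
import statistics

# Sketch of the N - 1 correction: dividing by N tends to understate the
# population variance, so the unbiased estimate divides by N - 1.
# The scores below are invented for illustration.

scores = [6, 8, 7, 10, 9]
n = len(scores)
mean = sum(scores) / n

biased = sum((x - mean) ** 2 for x in scores) / n          # divides by N
unbiased = sum((x - mean) ** 2 for x in scores) / (n - 1)  # divides by N - 1

print(biased, unbiased)  # 2.0 2.5 -> the unbiased estimate is larger
print(unbiased == statistics.variance(scores))  # True (stdlib uses N - 1)
```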

ttest for dependent means

Here, two scores from each person are compared (repeated-measures designs, or within-subjects designs), OR
two scores from matched persons are compared.

When independent means ttests should be used:

when sigma (σ) is unknown for the population distributions
and two separately sampled means are being compared. Important point: the comparison distribution is a distribution of differences between means. With equal N, the two variance estimates are simply averaged, and we call the result the pooled estimate of the population variance. With unequal N, each estimate is weighted by its degrees of freedom: the weighted-average estimate of the population variance.
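With equal n's, the pooled variance estimate is just the average of the two sample variances. A sketch of the independent-means t statistic with invented data for two groups:

```python
import math

# Sketch of an independent-means t using the pooled variance estimate.
# With equal n's, pooling reduces to a simple average of the two
# sample variances. The group data are invented for illustration.

g1 = [10, 12, 11, 14, 13]   # e.g., mnemonic-training group
g2 = [9, 10, 8, 11, 12]     # control group

def sample_var(xs):                     # unbiased, N - 1 denominator
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(g1), len(g2)
pooled = (sample_var(g1) + sample_var(g2)) / 2   # equal-n average
se_diff = math.sqrt(pooled / n1 + pooled / n2)   # SE of the difference
t = (sum(g1) / n1 - sum(g2) / n2) / se_diff

print(round(t, 2))  # (12 - 10) / 1.0 = 2.0, on n1 + n2 - 2 = 8 df
```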

***Assumptions of the independent means ttest

A. Population distributions are normally distributed; however, if moderate violations of normality occur, the t-test is still robust.
If the 2 populations are believed to be skewed in different directions and a one-tailed test is used, then there's a problem. Check for normality before running a test.

***Assumptions of the independent means ttest

B. Variances are equal across populations
This is the "homogeneity of variance" assumption. If "moderate" violations occur and the N per sample is equal or nearly equal, the t-test is robust to these violations. Examine the variances in your data: if "very large" differences in sample variances occur, the t-test could give inaccurate results. (If both the normality and variance assumptions are not met, we may want to consider performing nonparametric, or "distribution-free," tests.)

Effect Size

These values should be evaluated in an absolute sense; i.e., an effect size of .80 is considered large, and its value is not bounded by -1 or +1. (d = .80, a large effect, means we expect, or found, 4/5 of a standard deviation between the two means.)
Effect sizes are symbolized by d and are calculated as a mean difference divided by a standard deviation, e.g., d = (M1 - M2) / S. The denominator is a standard deviation, not a standard error, so d is relatively UNinfluenced by N; the same logic applies to d for dependent-means and independent-means t-tests.
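Cohen's d for two independent groups can be sketched as the mean difference over a pooled standard deviation; the two groups below are invented for illustration:

```python
import math

# Sketch of Cohen's d for two independent groups: mean difference
# divided by a standard deviation (not a standard error), which is why
# d is largely uninfluenced by N. Data invented for illustration.

g1 = [10, 12, 11, 14, 13]
g2 = [9, 10, 8, 11, 12]

def mean(xs):
    return sum(xs) / len(xs)

def pooled_sd(a, b):                    # equal n's: average the variances
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt((var(a) + var(b)) / 2)

d = (mean(g1) - mean(g2)) / pooled_sd(g1, g2)
print(round(d, 2))   # (12 - 10) / sqrt(2.5) = 1.26, a large effect
```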

What shapes the statistical power in our studies?

Effect size: the larger the effect size, the more power; it's easier to find a big difference than a small one, relative to the standard deviation.
Sample size: the larger the N, the more power we have, and the smaller the standard error, which describes the spread of the distribution of means.
Statistical significance level: the less extreme the α, the easier it will be to reject H0.
1- or 2-tailed tests: 1-tailed tests provide less stringent cutoffs for rejecting H0.
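The effect-size and sample-size relationships above can be demonstrated with a small Monte Carlo sketch (a one-tailed Z-type test with σ = 1; all the specific numbers, including the trial count and seed, are illustrative choices):

```python
import random
import statistics

# Monte Carlo sketch: estimate the power of a one-tailed Z-type test
# as a function of sample size n and true effect size d (in SD units).

def power(d, n, alpha_z=1.645, trials=4000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(d, 1) for _ in range(n)]  # true effect = d SDs
        z = statistics.fmean(sample) * n ** 0.5       # sigma = 1, so SE = 1/sqrt(n)
        hits += z > alpha_z                           # did we reject H0?
    return hits / trials

print(power(0.5, 25) > power(0.5, 10))   # True: larger N -> more power
print(power(0.8, 25) > power(0.2, 25))   # True: larger d -> more power
```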

Statistical Power and Effect Size

Please note from these tables:
As N increases, so does power. As effect size (d) increases, so does power. 1-tailed tests have more power than their 2-tailed counterparts. For the same effect size, a dependent-means t-test has more power with half the sample size of an independent-means t-test. Why? (And, not shown in these tables: more extreme α levels will be more difficult to achieve and will therefore have less power than a less extreme α level.)

Hypothesis Testing Steps for CORRELATIONS

Step 1: State null and research hypotheses about populations
Suppose we believe that SAT scores and 2nd-year college GPA are positively correlated. We set an α level of .05 and use a one-tailed test.
H0: ρ ≤ 0
H1: ρ > 0

Hypothesis Testing Steps for CORRELATIONS

Step 2: Determine the comparison distribution: df for t = N - 2 = 20 - 2 = 18.
Step 3: Get t-critical to reject H0.
Step 4: Get the sample's score on the comparison distribution.
Step 5: Compare the scores and make a decision. ***The t computed is more extreme than the t critical, which happens to be. . .
Write this part: As hypothesized, we found a statistically significant positive Pearson correlation between x and y, r = .70, p < .01. Underline.
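The t test for a correlation can be sketched directly from r and N; with the card's r = .70 and N = 20 pairs, df = 18:

```python
import math

# Sketch of the t test for a Pearson correlation: given an observed r
# from N pairs, compute t on N - 2 df to test H0: rho = 0.

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with N - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n, r = 20, 0.70
t = t_for_r(r, n)
print(n - 2)          # df = 18
print(round(t, 2))    # t = 4.16, well beyond typical critical values
```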
