100 Cards in this Set

  • Front
  • Back

internal validity

the degree to which the study demonstrates that the treatment caused a change in behavior. If a study lacks internal validity, the researcher may falsely believe that a factor causes an effect when it really doesn’t. Most studies do not have internal validity because they can’t rule out the possibility that some other factor may have been responsible for the effect. Unfortunately, steps taken to increase internal validity (such as keeping non-treatment factors constant) could harm the study’s external validity

extraneous factors

factors other than the treatment. If we can’t control or account for extraneous variables, we can’t conclude that the treatment had an effect. That is, we will not have internal validity. History, instrumentation, maturation, mortality, regression, testing, selection, and selection by maturation interactions are all potential sources of extraneous variables. (p. 255)

maturation

internal, biological changes such as growth, aging, and development. Apparent treatment effects may really be due to maturation

history

external, environmental changes—other than the treatment—that might affect participants’ behavior. These outside events can be almost anything—from wars to unusually cold weather.

instrumentation

the way participants were measured changed from pretest to posttest. In instrumentation, the actual measuring instrument changes, the way it is administered changes, or the way it is scored changes.

testing

participants score differently on the posttest as a result of what they learned from taking the pretest. Thus, even if the treatment had no effect, scores might be better on the posttest because of the practice participants got on the pretest

mortality (attrition)

differences between conditions are due to participants dropping out of the study

selection

treatment and no-treatment groups were different before the treatment was administered

selection by maturation interaction

treatment and no-treatment groups, although similar at one point, would have naturally grown apart (developed differently) even if no treatment had been administered

regression (toward the mean)

if participants are chosen because their scores were extreme, these extreme scores may be loaded with extreme amounts of random measurement error. On retesting, participants are bound to get more normal (average) scores as random measurement error’s effects decrease to more normal levels
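
A quick simulation can make this concrete (a sketch with made-up numbers, not from the text): participants selected for extreme scores on one testing score closer to the mean on a second testing, simply because the random error that inflated their first scores averages out.

```python
# Sketch of regression toward the mean: each observed score is a stable
# "true score" plus random measurement error (all numbers are hypothetical).
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(100, 10, size=10_000)          # stable ability
test1 = true_scores + rng.normal(0, 15, size=10_000)    # true score + random error
test2 = true_scores + rng.normal(0, 15, size=10_000)    # same people, fresh random error

extreme = test1 > test1.mean() + 2 * test1.std()         # selected for extreme scores
print(test1[extreme].mean())   # well above average (inflated by error)
print(test2[extreme].mean())   # closer to the overall mean on retesting
```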

matching

choosing your groups so that they are similar (they match) on certain characteristics. Matching reduces, but does not eliminate, selection bias. Because of regression and selection by maturation effects, two groups that were matched on the pretest may score differently on the posttest.

pretest–posttest design

a before–after design in which each participant is given the pretest, administered the treatment, then given the posttest. The pretest–posttest design is not vulnerable to selection and selection by maturation interactions. It is, however, extremely vulnerable to history, maturation, and testing effects.

placebo treatment

a fake treatment that we know has no effect, except through the power of suggestion. For example, in medical experiments, a participant may be given a pill that does not have a drug in it. By using placebo treatments, you may be able to make people “blind” to whether a participant is getting the real treatment.

single blind

when either the participant or the experimenter is unaware of whether the participant is getting the real treatment or a placebo treatment. Making the participant “blind” prevents the participant from biasing the results of the study; making the experimenter blind prevents the experimenter from biasing the results of the study

double blind (double masked)

neither the participant nor the research assistant knows what type of treatment (placebo treatment or real treatment) the participant is getting. By making both the participant and the assistant “blind,” you reduce both subject (participant) and experimenter bias.

experimental group

the participants who are randomly assigned to get the treatment

control group

the participants who are randomly assigned to not receive the treatment. The scores of these participants are compared to the scores of the experimental group to see if the treatment had an effect

empty control group

a control group that does not receive any kind of treatment, not even a placebo treatment. One problem with an empty control group is that if the treatment group does better, we don’t know whether the difference is due to the treatment itself or to a placebo effect. To maximize construct validity, most researchers avoid using an empty control group

independent variable

the treatment variable; the variable manipulated by the experimenter. The experimental group gets more of the independent variable than the control group. Note: Don’t confuse independent variable with dependent variable

levels of an independent variable

the treatment variable is often given in different amounts. These different amounts are called levels

dependent variable (dependent measure)

participants’ scores—the response that the researcher is measuring. In the simple experiment, the experimenter hypothesizes that the dependent variable will be affected by (depend on) the independent variable

independently, independence

a key assumption of almost any statistical test. In the simple experiment, observations must be independent. That is, what one participant does should have no influence on what another participant does, and what happens to one participant should not influence what happens to another participant. Individually assigning participants to treatment or no-treatment condition and individually testing each participant are ways to achieve independence

independent random assignment

randomly determining, for each individual participant, and without regard to what group the previous participant was assigned to, whether that participant gets the treatment. For example, you might flip a coin for each participant to determine whether that participant receives the treatment. Independent random assignment to experimental condition is the cornerstone of the simple experiment
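
A minimal sketch of what this could look like in code (participant IDs are hypothetical): each assignment is a separate coin flip that ignores every previous assignment.

```python
# Sketch of independent random assignment: one "coin flip" per participant,
# made without regard to how anyone else was assigned.
import random

participants = ["P01", "P02", "P03", "P04", "P05", "P06"]  # hypothetical IDs
assignment = {p: ("treatment" if random.random() < 0.5 else "no treatment")
              for p in participants}
print(assignment)
```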

experimental hypothesis

a prediction that the treatment will cause an effect. In other words, a prediction that the independent variable will have an effect on the dependent variable

null hypothesis

the hypothesis that there is no treatment effect. Basically, this hypothesis states that any difference between the treatment and no-treatment groups is due to chance. This hypothesis can be disproven, but it cannot be proven. Often, disproving the null hypothesis lends support to the experimental hypothesis

simple experiment

participants are independently and randomly assigned to one of two groups, usually to either a treatment group or to a no-treatment group. The simple experiment is the easiest way to establish that a treatment causes an effect

internal validity

a study has internal validity if it can accurately determine whether an independent variable causes an effect. Only experimental designs have internal validity

inferential statistics

the science of chance. More specifically, the science of inferring the characteristics of a population from a sample of that population

population

the entire group that you are interested in. You can estimate the characteristics of a population by taking large random samples from that population

central limit theorem

the fact that, with large enough samples, the distribution of sample means will be normally distributed. Note that an assumption of the t test is that the distribution of sample means will be normally distributed. Therefore, to make sure they are meeting that assumption, many researchers try to have “large enough samples,” which they often interpret as at least 15 participants per group
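
A small simulation can illustrate the theorem (a sketch, not a proof): even when raw scores come from a clearly non-normal population, the means of repeated samples pile up in a roughly normal distribution.

```python
# Sketch of the central limit theorem with a skewed (exponential) population.
import numpy as np

rng = np.random.default_rng(1)
skewed_population = rng.exponential(scale=2.0, size=100_000)   # clearly non-normal

sample_means = [rng.choice(skewed_population, size=30).mean() for _ in range(5_000)]
print(np.mean(sample_means), np.std(sample_means))   # roughly normal around the population mean
```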

t test

the most common way of analyzing data from a simple experiment. It involves computing a ratio between two things: (1) the difference between your group means; and (2) the standard error of the difference (an index of the degree to which group means could differ by chance alone). As a general rule, if the difference you observe is more than three times bigger than the standard error of the difference, then your results will probably be statistically significant. However, the exact ratio that you need for statistical significance depends on your level of significance and on how many participants you have. You can find the exact ratio by looking at the t table in Appendix E and looking for where the column relating to your significance level meets the row relating to your degrees of freedom. (In the simple experiment, the degrees of freedom will be two less than the number of participants.) If the absolute value of the t you obtained from your experiment is bigger than the tabled value, then your results are significant
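
As a rough illustration of the computation (hypothetical scores; scipy is just one tool that could be used), the t statistic below is the difference between the group means divided by the standard error of the difference:

```python
# Sketch of an independent-groups t test on hypothetical data from a simple experiment.
from scipy import stats

treatment    = [7, 9, 6, 8, 10, 7, 9, 8]   # hypothetical posttest scores
no_treatment = [5, 6, 7, 5, 6, 8, 5, 6]

t, p = stats.ttest_ind(treatment, no_treatment)
print(f"t({len(treatment) + len(no_treatment) - 2}) = {t:.2f}, p = {p:.3f}")
# If p < .05, the difference would conventionally be called statistically significant.
```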

statistical significance

when a statistical test says that the relationship we have observed is probably not due to chance alone, we say that the results are statistically significant. See also p < .05.

p < .05

in the simple experiment, p < .05 indicates that if the treatment had no effect, a difference between the groups at least as big as what was discovered would happen fewer than 5 times in 100. Since the chances of such a difference occurring by chance alone are so small, experimenters usually conclude that such a difference must be due, at least in part, to the treatment

Type 1 error

rejecting the null hypothesis when it is really true. In other words, declaring a difference statistically significant when the difference is really due to chance. Thus, Type 1 errors lead to “false discoveries.” If you set p < .05, there is less than a 5% (.05) chance that you will make a Type 1 error. (p. 290)

Type 2 error

failure to reject the null hypothesis when it is really false; failing to declare that a difference is statistically significant, even though the treatment had an effect. Thus, Type 2 errors lead to failing to make discoveries

power

the ability to find differences; or, put another way, the ability to avoid making Type 2 errors
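
One way to get a feel for power is a simulation (a sketch with assumed values for sample size and effect size): the estimated power is simply the proportion of simulated experiments, each with a real built-in effect, that come out statistically significant.

```python
# Sketch of estimating power by simulation (assumed effect size and group size).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_per_group, true_effect, sims = 20, 0.5, 2_000
hits = 0
for _ in range(sims):
    control = rng.normal(0, 1, n_per_group)
    treated = rng.normal(true_effect, 1, n_per_group)   # treatment really shifts scores
    if stats.ttest_ind(treated, control).pvalue < .05:
        hits += 1
print(f"estimated power = {hits / sims:.2f}")
```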

null results (nonsignificant results)

results that fail to disprove the null hypothesis. Null results do not prove the null hypothesis because null results may be due to lack of power. Indeed, many null results are Type 2 errors. (p. 290)

construct

a mental state that can’t be directly observed or manipulated, such as love, intelligence, hunger, feeling warm, and aggression

construct validity

the degree to which the study actually measures and manipulates the elements that the researcher claims to be measuring and manipulating. If the operational definitions of the constructs are poor, the study will not have good construct validity. For example, a test claiming to measure “aggressiveness” would not have construct validity if it really measured assertiveness

internal validity

the degree to which the study demonstrates that a particular factor caused a change in behavior. If a study lacks internal validity, the researcher may falsely believe that a factor causes an effect when it really doesn’t. Most studies involving humans do not have internal validity because they can’t rule out the possibility that some other factor may have been responsible for the effect. Unfortunately, steps taken to increase internal validity (such as keeping nontreatment factors constant) could harm the study’s external validity

experiment

a special type of study (not all studies are experiments!) that allows researchers to determine the cause of an effect; usually involves randomly assigning participants to groups

external validity

the degree to which the results of the study can be generalized to other places, people, or times

ethical

conforming to a profession’s principles of what is morally correct behavior. In the case of psychological research, the American Psychological Association has established guidelines and standards of morally appropriate behavior. Usually, ethical human research must be approved by an Institutional Review Board (IRB) and involve both informed consent and debriefing.

informed consent

Giving potential participants information about the study, especially in terms of factors that might lead them to refuse to be in the study, before they decide whether to participate

debrief, debriefing

Explaining the purpose of the study, answering any questions, and undoing any harm that the participant may have experienced as a result of participating in the study

Institutional Review Board (IRB)

a committee of at least five members, one of whom must be a nonscientist, that reviews proposed research and monitors approved research in an effort to protect human research participants

bias

systematic errors that can push the scores in a given direction. Bias may lead to “finding” the results that the researcher wanted

observer bias

bias created by the observer seeing what the observer expects to see, or selectively remembering/counting/looking for data that support the observer’s point of view (also known as scorer bias)

blind (masked), blind observer

an observer who is unaware of the participant’s characteristics and situation. Using blind observers reduces observer bias.

operational definition

a publicly observable way to measure or manipulate a variable; a “recipe” for how you are going to measure or manipulate your factors

random error of measurement

inconsistent, unsystematic errors of measurement. Carelessness on the part of the person administering the measure, the person taking the test, and the person scoring the test can cause random error

reliable, reliability

the extent to which a measure produces stable, consistent scores. Measures are able to produce such stable scores if they are not strongly influenced by random error. A measure can be reliable, but not valid. However, if a measure is not reliable, it cannot be valid

test-retest reliability

a way of assessing the total amount of random error in a measure by administering the measure to participants at two different times and then correlating their results. Low test-retest reliability could be due to inconsistent observers, inconsistent standardization, or poor items. Low test-retest reliability leads to low validity
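
In practice, this just means correlating the two sets of scores (a sketch with hypothetical data):

```python
# Sketch of test-retest reliability: correlate the same participants' scores
# from two administrations of the measure.
import numpy as np

time1 = [12, 15, 9, 20, 14, 17, 11, 16]   # hypothetical first administration
time2 = [13, 14, 10, 19, 15, 16, 12, 18]  # same participants, second administration

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability r = {r:.2f}")
```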

interobserver agreement

the percentage of times the raters agree
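
For example (hypothetical ratings), agreement is just the number of matching calls divided by the total number of observations:

```python
# Sketch of interobserver agreement as a simple percentage of matching calls.
rater_a = ["aggressive", "not", "aggressive", "not", "not", "aggressive"]
rater_b = ["aggressive", "not", "not",        "not", "not", "aggressive"]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"interobserver agreement = {agreement:.0%}")
```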

interobserver reliability

like interobserver agreement, interobserver reliability is an index of the degree to which different raters rate the same behavior similarly. Low interobserver reliability probably means that random observer error is making the measure unreliable

internal consistency

the degree to which all the items on a measure correlate with each other. If you have high internal consistency, all the questions seem to be measuring the same thing. If, on the other hand, answers to some questions are inconsistent with answers to other questions, this inconsistency may be due to some answers being (1) strongly influenced by random error or (2) influenced by different constructs. Internal consistency can be estimated through average correlations, split-half reliability coefficients, and Cronbach’s alpha
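
A sketch of one of those estimates, Cronbach’s alpha, computed from its standard formula on hypothetical item scores (rows are participants, columns are items):

```python
# Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
import numpy as np

scores = np.array([[4, 5, 4, 5],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 2, 3],
                   [4, 4, 4, 5]])   # hypothetical data

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```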

subject (participant) biases

ways the participant can bias the results. The two main subject biases are (1) trying to help the researcher out by giving answers that will support the hypothesis, and (2) giving the socially desirable response

social desirability bias

participants acting in a way that makes the participant look good

demand characteristics

aspects of the study that allow the participant to figure out how the researcher wants that participant to behave

unobtrusive measurement

recording a particular behavior without the participant knowing you are measuring that behavior. Unobtrusive measurement reduces subject biases such as social desirability bias and obeying demand characteristics

Hawthorne Effect

when the treatment group changes its behavior not because of the treatment itself, but because group members know they are getting special treatment

instructional manipulation

manipulating the variable by giving written or oral instructions

environmental manipulation

a manipulation that involves changing the participant’s environment rather than giving the participant different instructions

stooges (confederates)

people who seem (to the real participants) to be participants, but who are actually the researcher’s assistants

construct validity

the degree to which an operational definition reflects the concept that it claims to reflect. Establishing content, convergent, and discriminant validity are all methods of arguing that your measure has construct validity

content validity

the extent to which a measure represents a balanced and adequate sampling of relevant dimensions, knowledge, and skills

convergent validity

validity demonstrated by showing that the measure correlates with other measures, manipulations, or correlates of the construct

known-groups technique

a convergent validity tactic that involves seeing whether groups known to differ on a characteristic differ on a measure of that characteristic (e.g., ministers should differ from atheists on a measure of religious beliefs)

discriminant validity

the extent to which the measure does not correlate strongly with measures of constructs other than the one you claim to be measuring

experimenter bias

experimenters being more attentive to participants in the treatment group or giving different nonverbal cues to treatment group participants than to other participants

standardization

treating each participant in the same (standard) way. Standardization should reduce experimenter bias

manipulation check

a question or set of questions designed to determine whether participants perceived the manipulation in the way that the researcher intended

placebo treatment

a treatment that is known to have no effect. To reduce the impact of subject (participant) bias, the group getting the real treatment is compared to a group getting a placebo treatment—rather than to a group that knows it is getting no treatment

analysis of variance (ANOVA)

a statistical test that is especially useful when data are interval, and there are more than two groups. For the experiments discussed in this chapter, ANOVA involves dividing between-groups variance by within-groups variance

between-groups variance (treatment variance, variability between group means, Mean Square Treatment, Mean Square Between)

at one level, between-groups variance is just a measure of how much the group means differ from each other. Thus, if all the groups had the same mean, between-groups variance would be zero. At another level, between-groups variance is an estimate of the combined effects of the two factors that would make group means differ—treatment effects and random error

within groups variance (error variance, variability within groups, Mean Square Error, Mean Square Within)

at one level, within-groups variance is just a measure of the degree to which scores within each group differ from each other. A small within-groups variance means that participants within each group are all scoring similarly. At another level, within-groups variance is an estimate of the effects of random error (because participants in the same treatment group score differently due to random error, not due to treatment). Thus, within-groups variance is also called error variance

F ratio

at the numerical level, the F ratio is the Mean Square Between divided by the Mean Square Within. At the conceptual level, F is the between-groups variance (treatment plus random error) divided by within-groups variance (random error). If the treatment has no effect, the F ratio will tend to be close to 1.0, indicating that the difference between the groups could be due to random error. If the treatment had an effect, the F ratio will tend to be substantially above 1.0, indicating that the difference between the groups is bigger than would be expected if only random error were at work
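
To tie the last few terms together, here is a sketch of a one-way ANOVA on hypothetical data from three groups; the F that scipy reports is Mean Square Between divided by Mean Square Within:

```python
# Sketch of a one-way ANOVA (hypothetical scores for three treatment levels).
from scipy import stats

low    = [3, 4, 5, 4, 3]
medium = [5, 6, 5, 7, 6]
high   = [8, 7, 9, 8, 7]

f, p = stats.f_oneway(low, medium, high)
print(f"F = {f:.2f}, p = {p:.4f}")
# F near 1.0 suggests only random error is at work; F well above 1.0 suggests a treatment effect.
```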

confounding variables

variables, other than the independent variable, that may be responsible for the differences between your conditions. There are two types of confounding variables: ones that are manipulation-irrelevant and ones that are the result of the manipulation. Confounding variables that are irrelevant to the treatment manipulation threaten internal validity. For example, the difference between groups may be due to one group being older than the other, rather than to the treatment. Random assignment can control for the effects of those confounding variables. Confounding variables that are produced by the treatment manipulation hurt the construct validity of the study. They hurt the construct validity because even though we may know that the treatment manipulation had an effect, we don’t know what it was about the treatment manipulation that had the effect. For example, we may know that an “exercise” manipulation increases happiness (internal validity), but not know whether the “exercise” manipulation worked because people exercised more, got more encouragement, had a more structured routine, practiced setting and achieving goals, or met new friends. In such a case, construct validity is harmed because we don’t know what variable(s) are being manipulated by the “exercise” manipulation

empty control group

a group that gets no treatment, not even a placebo. Usually, you should try to avoid empty control groups: They hurt construct validity because they don’t allow you to discount the effects of treatment-related, confounding variables. For example, empty control groups may make your study very vulnerable to hypothesis-guessing

hypothesis-guessing

participants trying to figure out what the study is designed to prove. Hypothesis-guessing can hurt a study’s construct validity

levels of the independent variable

values (amounts) of the treatment variable. In the simple experiment, you only have two levels of the independent variable. In the group experiment, you have more than two levels. Having more than two levels of the independent variable can help you determine the functional relationship between the independent and dependent variables

functional relationship

the shape of the relationship between variables. For example, the functional relationship between the independent and dependent variables might be linear or curvilinear

linear relationship

a functional relationship between an independent and dependent variable that is graphically represented by a straight line

nonlinear relationship (curvilinear relationship)

a functional relationship between an independent and dependent variable that is graphically represented by a curved line

post hoc trend analysis

a type of post hoc test designed to determine whether a linear or curvilinear relationship is statistically significant (reliable)

post hoc test

a statistical test done after (1) doing a general test such as an ANOVA and (2) finding a significant effect. Post hoc tests are used to follow up on significant results obtained from a more general test. Because a significant ANOVA says only that at least two of the groups are significantly different from one another, post hoc tests may be performed to find out which groups are significantly different from one another

matched-pairs design

an experimental design in which the participants are paired off by matching them on some variable assumed to be correlated with the dependent variable. Then, for each matched pair, one member is randomly assigned to one treatment condition, whereas the other is assigned to the other treatment condition (or to a control condition). This design usually has more power than a simple between-groups experiment.

dependent groups t test (also called within-subjects t test)

a statistical test for analyzing matched-pairs designs or within-subjects designs that use only two levels of the treatment
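
A minimal sketch (hypothetical paired scores; each position in the two lists is the same matched pair or the same participant):

```python
# Sketch of a dependent (paired) groups t test.
from scipy import stats

condition_a = [14, 12, 16, 11, 15, 13]
condition_b = [12, 11, 13, 10, 14, 11]

t, p = stats.ttest_rel(condition_a, condition_b)
print(f"t({len(condition_a) - 1}) = {t:.2f}, p = {p:.3f}")
```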

within-subjects design

an experimental design in which each participant is tested under more than one level of the independent variable. Because each participant is measured more than once (for example, after receiving Treatment A, and again after receiving Treatment B), this design is also called a repeated-measures design. In a within-subjects (repeated-measures) design, a participant may receive Treatment A first, Treatment B second, and so on

randomized within-subjects design

to make sure that not every participant receives a treatment series in the same sequence, within-subjects researchers may randomly determine which treatment comes first, which comes second, and so on. In other words, participants all get the same treatments, but they receive different sequences of treatments

order

the position in a sequence (first, second, third, etc.) in which a treatment occurs

order effects (trial effects)

a big problem with within-subjects designs. The order in which the participant receives a treatment (first, second, etc.) will affect how the participant behaves. Order effects may be due to practice effects, fatigue effects, carryover effects, or sensitization. Do not confuse with sequence effects

practice effects

after doing the dependent-measure task several times, a participant’s performance may improve. In a within-subjects design, this improvement might be incorrectly attributed to having received a treatment

fatigue effects

decreased performance on the dependent measure due to being tired or less enthusiastic as the experiment continues. In a within-subjects design, this decrease in performance might be incorrectly attributed to a treatment. Fatigue effects could be considered negative practice effects

carryover effects (also called treatment carryover effects)

the effects of a treatment administered earlier in the experiment persist so long that they are present even while participants are receiving additional treatments. Carryover effects create problems for within-subjects designs because you may believe that the participant’s behavior is due to the treatment just administered when, in reality, the behavior is due to the lingering effects of a treatment administered some time earlier

sensitization

after getting several different treatments and performing the dependent-variable task several times, participants in a within-subjects design may realize (become sensitive to) what the hypothesis is. Consequently, a participant in a within-subjects design may behave very differently during the last trial of the experiment (now that the participant knows what the experiment is about) than the participant did in the early trials (when the participant was naïve)

counterbalanced within-subjects designs

designs that give participants the treatments in systematically different sequences. These designs balance out routine order effects
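
A sketch of one way to generate counterbalanced sequences (three hypothetical treatments; here every possible ordering is used, and participants cycle through the orderings):

```python
# Sketch of counterbalancing: each treatment appears in each ordinal position
# equally often across participants.
from itertools import permutations

treatments = ["A", "B", "C"]
sequences = list(permutations(treatments))           # 6 counterbalanced sequences

participants = [f"P{i:02d}" for i in range(1, 13)]   # hypothetical sample of 12
for i, participant in enumerate(participants):
    print(participant, "->", " then ".join(sequences[i % len(sequences)]))
```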

sequence effects

if participants who receive one sequence of treatments score differently than those participants who receive the treatments in a different sequence, there is a sequence effect

mixed design

a design that has at least one within-subjects factor and one between-subjects factor. Counterbalanced designs are a type of mixed design

power

the ability to find statistically significant results when variables are related. Within-subjects designs are popular because of their power