114 Cards in this Set

  • Front
  • Back
causal-comparative research (sometimes called an ex post facto study) has these essential characteristics:
1. researchers observe and describe some current condition
2. researchers look to the past to try to identify the possible causes of the condition.
(is non-experimental)
directional vs. nondirectional hypothesis
directional = researchers predict which group will be higher; nondirectional = researchers predict a difference without specifying which group
program evaluation
applied research (versus basic research, which is understanding underlying theories)
formative evaluation
1. info is collected in the process of implementing a program
2. collecting info on the progress toward the ultimate goals (can prevent disappointment by changing midway if not working)
summative evaluation
when evaluators collect info about participants' attainment of the ultimate goals at the end of the program
If a researcher studies every member of a population, this researcher is conducting a ______.
census
Researchers obtain biased samples when they use...
samples of convenience
simple random sampling
every member of a population is given an equal chance of being included in a sample
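The definition above can be sketched directly in Python; the population of 1,000 numbered members is hypothetical, used only to illustrate equal-chance selection without replacement:

```python
import random

# Hypothetical population of 1,000 numbered members.
population = list(range(1, 1001))

random.seed(0)  # seeded only to make the illustration reproducible
sample = random.sample(population, 50)  # every member has an equal chance

print(len(sample))       # 50
print(len(set(sample)))  # 50 (no member drawn twice)
```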
sampling error
error created by random sampling
systematic sampling
seen as equivalent to simple random sampling; every nth individual is selected.
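Selecting every nth individual can be sketched as below; the ordered list of 200 members is invented, and the random start within the first interval is what keeps the method roughly equivalent to simple random sampling:

```python
import random

population = list(range(1, 201))  # hypothetical ordered list of 200 members
n = 10                            # select every 10th individual

random.seed(1)               # seeded for a reproducible illustration
start = random.randrange(n)  # random start within the first interval
sample = population[start::n]

print(len(sample))  # 20
```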
The technical term for discussing the magnitude of sampling errors is...
precision (results are more precise when researchers reduce sampling errors)
stratified random sampling
divide a population into strata (e.g., men, women);
select the same PERCENTAGE (versus number) from each stratum
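A sketch of drawing the same percentage from each stratum; the strata, member IDs, and the `stratified_sample` helper are all illustrative assumptions:

```python
import random

# Hypothetical strata: stratum name -> list of member IDs.
strata = {
    "men":   [f"m{i}" for i in range(60)],
    "women": [f"w{i}" for i in range(40)],
}

def stratified_sample(strata, fraction):
    """Draw the same PERCENTAGE (not the same number) from each stratum."""
    drawn = []
    for members in strata.values():
        k = round(len(members) * fraction)
        drawn.extend(random.sample(members, k))
    return drawn

random.seed(2)
sample = stratified_sample(strata, 0.10)  # 10% of each stratum
print(len(sample))  # 6 men + 4 women = 10
```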
cluster sampling
draw groups (or clusters) of participants instead of drawing individuals.
To be unbiased, the clusters or groups must be drawn at random
purposive sampling
purposively select individuals whom they believe will be good sources of information. (popular for qualitative research)
snowball sampling
useful when attempting to locate participants who are hard to find (find one who might lead you to others; for example, heroin addicts)
mortality
when subjects drop out; this increases bias and should be reported in the results
T or F - increasing sample size decreases precision
false
instrument
the generic term for any type of measurement device
instrumentation
term used as the heading for the section of a report where the measurement devices used in the research are described
valid
the extent to which an instrument measures what it is designed to measure and accurately performs the function it is purported to perform
to determine content validity of an instrument, researchers...
make judgments about the appropriateness of its content
predictive validity
to what extent does the test predict the outcome it is supposed to predict?
validity coefficient
correlation coefficient used to express validity: range from 0 to 1 (1=perfect validity)
concurrent validity
an independent measure of the same trait that the test is designed to measure (a different test) given at about the same time that the test is administered.
concurrent and predictive validity falls under a general term,
criterion validity
construct validity
relies on subjective judgments AND empirical data; offers only indirect evidence (looking at constructs, like depression or self-esteem)
A test is said to be reliable if it yields...
consistent results
A test with high reliability may have ____ validity.
LOW
When evaluating instruments, which is more important: reliability or validity?
Validity is more important than reliability.
when researchers use reliability coefficients to describe agreement between observers, they are usually called...
interobserver reliability coefficients
When researchers use correlation coefficients to describe reliability, they call them...
reliability coefficients
when studying interobserver reliability, researchers usually obtain the two measurements...
at the same time
in ________ reliability, researchers measure at two different points in time.
test-retest
How is parallel-forms reliability determined?
administering one form of the test to examinees and about a week or two later, administering the other form to the same examinees, thus yielding 2 scores per examinee.
How high should a reliability coefficient be?
Most published tests have a reliability of .80 or higher, so researchers should strive to select or build instruments that have coefficients at least this high, especially if researchers plan to interpret scores for individuals.
For group averages based on groups of participants of about 25 or more, instruments with reliability coefficients as low as _____ can be serviceable.
.50
split-half reliability
(usually odd-even split) one test is split to provide 2 scores for each individual. These are correlated to estimate internal consistency (checks on consistency within the test itself)
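The odd-even split can be sketched as follows; the item scores and the hand-rolled `pearson_r` helper are illustrative assumptions, not from the source:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation, no libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented item scores (1 = right, 0 = wrong) for 5 examinees, 6-item test.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]

# Odd-even split: two half-test scores per examinee, then correlate them.
odd_half  = [sum(r[0::2]) for r in responses]
even_half = [sum(r[1::2]) for r in responses]
print(round(pearson_r(odd_half, even_half), 2))  # 0.44 for this toy data
```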
An alternative to the split-half method for estimating internal consistency is _____.
Cronbach's alpha (a formula for essentially averaging all possible split-half combinations, such as even-odd, first half/second half, and so on)
Having high internal consistency is desirable when a researcher has developed a test designed to measure... (provide an example)
a single unitary variable
example: if a researcher builds a test on the ability to sum one-digit numbers, students who score high on some of the items should score high on the other items and vice versa; whereas a test mixing multiple variables like math, social studies, and science would have very low internal consistency
test-retest reliability and parallel forms reliability measure the ...., while internal consistency methods (split-half and alpha) measure ...
consistency of scores over time

consistency among the items within a test at a single point in time
tests designed to facilitate a comparison of an individual's performance with that of a norm group are called...
norm-referenced tests (NRTs)
In a norm-referenced test, if an examinee takes a test and earns a percentile rank of 64, the examinee knows that she scored...
higher than 64% of the individuals in the norm group
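The percentile-rank idea can be sketched with a hypothetical helper and invented norm-group scores:

```python
def percentile_rank(score, norm_group):
    """Hypothetical helper: percent of the norm group scoring below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

norms = [55, 60, 62, 64, 66, 70, 72, 75, 80, 90]  # invented norm-group scores
print(percentile_rank(71, norms))  # 60.0 -> higher than 60% of the norm group
```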
tests designed to measure the extent to which individual examinees have met performance standards are called...
criterion-referenced tests (CRTs)
On a criterion-referenced test, interpretation of an examinee's test score is _______ of how other students perform.
independent
NRTs are intentionally built to be of _____ difficulty. Specifically, items that are answered correctly by about ____% of the examinees in tryouts during test development are favored.
medium difficulty
50%
If the purpose is to describe specifically what examinees know and can do, ______ tests should be used.
criterion-referenced tests

example: have students mastered the essentials of simple addition
If the purpose is to examine how a local group differs from a larger norm group, _____ tests should be used.
norm-referenced tests.

example: How well are students in New York doing in reading in comparison to the national average.
An achievement test measures...
knowledge and skills individuals have acquired.
An aptitude test is designed to...
predict some specific type of achievement.
The most widely used aptitude tests typically have _____ validity and _____ reliability.
low to modest validity (validity coefficients of about .20 to .60) and high reliability (reliability coefficients of .80 or higher)
An intelligence test is designed to...
predict achievement in general, not any one specific type.
The most popular intelligence tests are (2) things.
1) culturally loaded
2) measure knowledge and skills that can be acquired through instruction (with questions such as, "How far is it from N.Y. to L.A.?")
Name 3 basic approaches to reducing social desirability in participants' responses in order to measure without the influence of social desirability.
1) administering personality measures anonymously
2) observe behavior unobtrusively (without the participants' awareness, to the extent this is ethically possible)
3) projective techniques (which have controversial validity, are time consuming, etc.)
Scales that have choices from "strongly agree" to "strongly disagree" are known as...
Likert-type scales
true experimental designs are characterized by
random assignment to treatments
pretest sensitization, or reactive effect of testing
changes in the experimental group may be the result of a combination of being exposed to the pretest and the treatment
posttest-only randomized control group design
an experiment without a pretest (to avoid pretest sensitization)
Solomon randomized four-group design
2 pretest groups (treatment, control group)
2 no-pretest groups (treatment, control group)
to determine if pretest sensitization affected the experiment
(probably need 48 or more people in the sample to divide into 4 groups)
explanations for changes other than the treatment are...
threats to internal validity
What is "history" in regard to a threat to internal validity?
other environmental influences on the participants between the pretest and posttest (reading a self-help book during the same time period)
What is "maturation" in regard to threat to internal validity?
participants matured during the period between pre and post-test, and the increase is due to maturation and not the treatment.
What is "instrumentation" in regard to threat to internal validity?
possible changes in the instrument (measurement procedure) from the time it was used as a pretest to the time it was used as a post-test. (different observers, for example)
What is "statistical regression" in regard to threat to internal validity?
occurs only if participants are selected on the basis of their extreme scores (if you select only the really low scores, on a post-test they will score higher purely because of the nature of random errors)
What is "intact groups" in regard to threat to internal validity?
individuals are not assigned to groups at random; existing "intact" groups are used (a UI student group vs. a WSU student group). (indicated by a dashed line between groups)
What is "selection" in regard to threat to internal validity?
When researchers do not assign participants to the two groups at random, there is a very strong possibility that the two groups are not initially the same in all important respects. Selection can interact with all other threats to internal validity (selection-history interaction, selection-maturation interaction, etc.)
all threats to internal validity can be overcome using a...
true experimental design, in which participants are assigned at random to experiment and control conditions.
When asking if results generalize, the researcher is essentially asking what?
Is it accurate to assume that the treatment administered to the experimental group will work as well in the population as it did in the sample?
What are some threats to external validity? (how well it generalizes)
selection bias
reactive effects of experimental arrangements
reactive effect of testing (pretest sensitization)
multiple-treatment interference
What is selection bias (in regards to being a threat to external validity) and how can it be controlled?
a biased sample of participants; control this threat by selecting participants at random
What is reactive effects of experimental arrangements? control for it?
if the experimental setting is different from the natural setting in which the population usually operates, the effects that are observed in the experimental setting may not generalize to the natural setting.
Control: conduct experiments under natural conditions, when possible
What is reactive effect of testing? also known as pretest sensitization?
the possibility that the pretest might influence how the participants respond to the treatment
What is multiple-treatment interference?
when a group of participants is given more than one treatment; treatments given earlier in an experiment might affect the effectiveness of treatments given later to the same participants.
internal vs. external validity
internal: "Is the treatment in this particular case responsible for the observed changes?"

external: "To whom and under what circumstances can the results be generalized?"
pre-experimental designs
limited value for investigating cause-and-effect relationships because of their poor internal validity
list 3 pre-experimental designs
one-group pretest-posttest design
one-shot case study
static-group comparison design
what is the one-shot case study?
one group is given a treatment (X) followed by a test (O)

X O
static-group comparison design
two groups but participants are not assigned to the groups at random (the dashed line between the two groups indicates they are intact groups)

X O
--------
O
When it is not possible to assign participants at random, pre-experimental designs should be AVOIDED. Instead, _______ designs should be used, which are...
quasi-experimental designs, which are of intermediate value for exploring cause-and-effect
Two widely used quasi-experimental designs are...
nonequivalent control group design
equivalent time-samples design
What is the nonequivalent control group design?
two intact groups (not assigned at random as indicated by the dashed line):
O X O
--------
O O

(even though not assigned at random, researchers often use some sort of matching to increase the internal validity)
What is the equivalent time-samples design?
has only one group; treatment conditions are alternated (preferably on a random basis)

X0O X1O X0O X1O X1O
(major disadvantage=multiple treatment interference)
Hawthorne Effect
"attention effect"

(lights in a factory: whether turned down or up, performance went up because participants knew researchers were paying attention to them)
How can a researcher control for the Hawthorne Effect?
Use 3 groups:
1) experimental group
2) control group that receives attention
3) control group that receives no attention
What is the John Henry Effect?
refers to the possibility that the control group might become aware of its "inferior" status and respond by trying to outperform the experimental group.
descriptive statistics vs. inferential statistics
descriptive: summarize data so they can be easily understood

inferential: draw inferences about the effects of sampling errors on the results that are described with descriptive statistics
Statistics come from ______.
Parameters come from ______.
Statistics come from Samples.
Parameters come from Populations.
significance tests determine the probability that...
the null hypothesis is true
Suppose a researcher conducted a significance test and found that the probability that the null hypothesis is correct is less than 5 in 100. How would this be stated?
p < .05, where p stands for the word probability
An alternative way to say that a researcher has rejected the null hypothesis is to state that the difference is...
statistically significant (if a researcher states that a difference is statistically significant at the .05 level, it is equivalent to stating that the null hypothesis has been rejected at that level)
What are the 4 scales (or levels) at which researchers measure?
nominal
ordinal
interval
ratio

(noir)
What is the nominal level?
the "naming" level
(words, not numbers)
For example: male, female, married, single, etc.
What is the ordinal level?
the measurement places participants in order from high to low (putting in order doesn't tell the size of the differences between them)
What is the interval scale?
has equal distances but no absolute zero (a zero on a multiple-choice exam doesn't mean the person has 0 knowledge)
What is the ratio scale?
has equal distances and an absolute zero (for example, weight)
univariate analysis
how participants vary (hence the root variate) on only ONE variable.
bivariate analysis
examines the relationship between two variables (for example, whether there is a relationship between student gender and preference for a candidate)
What is chi-square?
The usual test of the null hypothesis for differences between frequencies, in other words, used to obtain the probability that the null hypothesis is correct.
Chi-square and degrees of freedom are NOT descriptive statistics. They can be thought of as sub-steps in the mathematical procedure for obtaining the value of ______.
p, which, if less than .05, means the null hypothesis can be rejected (if p > .05, the null hypothesis can NOT be rejected)
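The chi-square sub-step can be computed by hand for a frequency table; the gender-by-candidate counts are invented, and in practice p is then looked up in a chi-square table or computed by a statistics library:

```python
# Invented observed frequencies: gender by candidate preference.
#              Candidate A  Candidate B
observed = [[30, 20],   # male
            [10, 40]]   # female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Each cell's expected frequency = (row total * column total) / grand total;
# chi-square sums (observed - expected)^2 / expected over all cells.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)  # 16.67 1
```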
Type I Error vs. Type II Error
(RA)
(R) Type I: REJECTING the null hypothesis when it should have been accepted.
(A) Type II: ACCEPTING the null hypothesis when it should have been rejected.
positive vs. negative skew of distribution
positive: tail on right
negative: tail on left
mean
median
mode
mean=average
median=middle score
mode=most frequently occurring score
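These three averages map directly onto Python's standard `statistics` module; the scores are hypothetical:

```python
import statistics

scores = [70, 75, 80, 80, 85, 90, 95]  # hypothetical test scores

print(statistics.mean(scores))    # about 82.14 (the average)
print(statistics.median(scores))  # 80 (the middle score)
print(statistics.mode(scores))    # 80 (the most frequent score)
```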
a synonym for the term averages is...
measures of central tendency
the standard deviation describes ______.
variability
The most widely used correlation coefficient is the...
Pearson product-moment correlation coefficient, usually called Pearson r
direct (positive) relationship vs. inverse (negative) relationship
direct (positive): as one goes up, the other goes up
inverse (negative): as one goes up, the other goes down
coefficient of determination
to think about correlation in terms of percentages, square pearson's r.

Example.
If r = .50, the coefficient of determination is .25, which means that a Pearson r of .50 accounts for 25% of the variance (25% better than a Pearson r of 0.00)
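The squaring step is trivial but easy to see in code; the helper name and the sample r values are illustrative:

```python
def coefficient_of_determination(r):
    """Square Pearson r to get the proportion of variance accounted for."""
    return r ** 2

for r in (0.30, 0.50, 0.80):
    print(r, "->", round(coefficient_of_determination(r), 2))  # 0.09, 0.25, 0.64
```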
The "t test" is often used to test...
the null hypothesis regarding the observed difference between TWO means
statistically significant indicates that the...
null hypothesis has been rejected.
Name 3 things that would cause a t test to yield low probability?
sample size: the larger the sample, the lower the value of p.
size of the difference between means: the larger the difference, the less likely the difference is due to sampling errors
the amount of variation in the population: the smaller the variability, the lower the p.
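The t statistic behind these three factors can be sketched in its Welch form (difference between means over the standard error of the difference); the two groups of posttest scores are invented, and a real analysis would also convert t to a p value:

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Welch-style t: difference between means divided by the
    standard error of the difference (uses sample variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

group_1 = [82, 85, 88, 90, 84]  # invented posttest scores, treatment group
group_2 = [75, 78, 80, 74, 79]  # invented posttest scores, control group

t = t_statistic(group_1, group_2)
print(round(t, 2))  # 4.68 -- the larger |t|, the lower p
```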
An alternative test (to the t test) to test the null hypothesis between two sample means is...
the analysis of variance, the ANOVA.
Instead of t (as with the t test), the ANOVA yields a statistic called
F (as well as degrees of freedom, sum of squares, mean square, and a p value). As with the t test, the only value of interest to the typical consumer is the value of p.
By convention, when p equals .05 or less, researchers ______ the null hypothesis and declare the result to be ______ _______.
reject the null hypothesis
statistically significant
t-test vs. ANOVA
t-test can compare only two means; a single ANOVA can compare a number of means.

When two means are compared, both will result in the same p-value.
meta-analysis
set of statistical methods for combining results of previous studies.