

### 65 Cards in this Set

• Front
• Back
• Validity Types: Statistical
  How accurate is the conclusion you draw from a statistical test? We hope to draw conclusions based on reliable dependent measures, and we want our statistical assumptions (population distribution and variance) to be met.
• p < .05
  The probability of obtaining a test statistic as extreme as, or more extreme than, a stated value (usually one you've computed). If there is less than a 5% chance of getting a test statistic this extreme, we decide to reject the null hypothesis because the occurrence of the statistic is so unusual. It reflects the risk of making an error and the extent to which we are willing to leave statistical validity unprotected.
• Construct
  Do the findings support the theory over competing theories? We hope our theory explains the nature of the data we analyze, but we must submit our theory to scrutiny.
• External
  Can we generalize these results to other conditions? We need to be assured that our findings can be detected in contexts other than our controlled lab setting or this particular set of participants.
• Internal
  Did the IV cause a change in the DV? We need to eliminate confounds, which subtract from the potential explanatory power of the IV.
• Confounds: Maturation
  DV changes occur solely as a result of participants growing older or more experienced. Particularly relevant to longitudinal studies.
• Confounds: History. Testing.
  History: DV changes are due to variation in events outside the study's IV effects. Worst with a large amount of time between pre- and post-tests.
  Testing: participants show DV changes due to practice in repeated measurements.
• Confounds: Instrumentation. Regression to the mean.
  Instrumentation: DV changes are due to the measuring instrument's variability, e.g., changing observational criteria.
  Regression: initially high scorers show score reductions and initially low scorers show score increases (scores move from more extreme to less extreme).
• Selection. Attrition.
  Selection: the result of groups not being randomly selected and assigned; the groups are nonequivalent.
  Attrition: participants drop out differentially across groups, causing biased results.
• Confounds: Diffusion of treatment. Sequence effects.
  Diffusion of treatment: participants in different experimental conditions share information about their manipulations, reducing the experimental differences across groups.
  Sequence effects: in repeated-measures or within-subject designs, participants are exposed to different experimental conditions; if the order is constant, changes in the DV may be caused by the order of presentation and not the condition.
• Subject effects
  Deals with participant behaviors in a social setting, such as curiosity, motivation, expectation, and bias. Participants exhibit unnatural behavior, e.g., acting good. Hawthorne effect: increased productivity after any special attention.
• Subject effects: demand characteristics. Placebo effects.
  Demand characteristics: cues given to participants by the researcher, usually unintentionally, that urge the participants to behave unnaturally.
  Placebo effects: a demand characteristic; expectations of change or improvement without any manipulation.
• Experimenter effects
  Deal with the experimenter influencing participants toward the expected outcome, via manipulation or cueing of some type, leading to bias.
  Ethical violations are another experimenter effect: does the experimenter deal only with data that support the hypothesis? What about outliers?
• General control procedures
  Prepare the lab setting (lighting, temperature, sound). Make the setting similar for all. Use the same experimenter whenever possible.
• Exact vs. systematic
  Identical in every way vs. similar but with procedural changes.
• General Control Procedures
  Prepare the lab setting
  Lighting, temperature, sound
  Make setting similar for all
  Use same experimenter whenever possible
  Too sterile?
  Generalizability…
• General Control Procedures…
  Use stimuli with good reliability and validity
  Reliability = stability or “repeatability” of results from a test, scale, or stimulus
  Validity = the stimulus, test, or scale should give us confidence that we’re measuring the “thing” we think we’re measuring; our test should correlate with other, similar measures of the same behavior
• Replication: Systematic and Exact
  Exact = identical in every way
  Systematic = similar, but with procedural changes, e.g., “When I read something, I find I must read it again” vs. nearly always
• Conceptual Replication
  Different studies are generated from the same problem statement
  More common than systematic or exact replication
  E.g., Study 1 on self-efficacy: questionnaires and a real-life problem-solving situation (observational)
  Study 2 on self-efficacy: experimental manipulation and artificial but controlled problem solving (experimental)
• Control over Subject and Experimenter Effects
  Single- and double-blind procedures
  Single = RA is blind to manipulation/condition
  Double = RA and participants are blind to manipulation/condition
  Particularly important to have double-blind studies in social psychology experiments
  Single-blind studies are important for protecting against experimenter bias
• Automation
  Standardization of the study’s instructions using a computer, or video/audio tape
  Computer-aided scoring and recording of responses tremendously reduces clerical errors in data entry
  Many studies now use PDAs to automate their data collection
  Reduces bias, errors, saves money, reduces waste
• Use Objective Measures
  Use easily agreed-upon responses if observational measures are taken, e.g., “Does mother gaze at the infant, face-to-face, for at least 10 seconds at a time?”
• Use Multiple Observers
  At least two (more is better) observers rate the behaviors being measured, and compare ratings with some “% agreement” (e.g., 90%).
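The “% agreement” comparison between observers can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `percent_agreement` and the two lists of yes/no ratings are hypothetical and do not come from the cards.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations on which two observers gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of observations")
    # Count the observations where the two ratings match.
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * matches / len(ratings_a)


# Hypothetical ratings for 10 observation periods:
# "Did the mother gaze at the infant, face-to-face, for at least 10 seconds?"
observer_1 = [True, True, False, True, False, True, True, False, True, True]
observer_2 = [True, True, False, True, False, True, False, False, True, True]

agreement = percent_agreement(observer_1, observer_2)
print(f"{agreement:.0f}% agreement")
```

Simple percent agreement does not correct for agreement that would occur by chance; it is just the direct comparison the card describes.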
• Deception
  Deliberately withholding information from participants (and misleading them to believe the study’s purposes are very different from what they actually are)
  Use only when necessary
  Always debrief participants at end of study
  E.g., “helping behavior and altruism” Batson study
• Control Through Subject Selection and Assignment
  Subject selection issues: proper sampling helps us increase generalizability of results
  E.g., The Nun Study on Alzheimer’s Disease
  Good: careful selection; no smokers, no sex or reproduction; little/no drinking; similar vocations (teachers); group living
  Bad: generalizability to whom?
• Subject Selection…
  Broad terms:
  Population: some larger group of interest
  Sample: some smaller group drawn from a population; should be representative
  Specific terms:
  General population: all persons
  Target population: specific population of interest (e.g., Alzheimer’s patients)
  Accessible population: subset of population available to the researcher (e.g., 100 Alzheimer’s patients from N. Virginia)
• Subject Selection
  As researchers, we must be careful to generalize from our sample to the accessible population, but not to the target population
  Which is why we specify our populations as “those persons like those in our study” when we form hypotheses
  If other researchers replicate our findings in other accessible populations, we have more external validity
• Subject Selection
  How to solve sampling issues?
  1. Random sampling
  Rarely done or accomplishable
  Occurs when any person in the population has an equal chance of being selected
  Participants should be independent of each other
  How to make random draws: random # generator or table
• Subject Selection
  2. Stratified random sampling
  More commonly used
  Define the subpopulations to closely match the population’s distributional characteristics, and sample persons this way
  E.g., if we want to study public policy attitudes of psychologists, and the population of U.S.
  psychologists is composed of:
  54% who work in universities
  18% in hospitals
  12% in private practice
  10% in industry
• Subject Selection
  Then, for our sample of N = 1000 people, we should randomly select:
  540 who work in universities
  180 in hospitals
  120 in private practice
  100 in industry
  Good, but we still have the issue of study volunteers being unrepresentative of others (selection bias)
• Subject Selection
  3. Ad hoc samples
  Samples drawn from accessible populations
  The sample’s characteristics define the population to which we can generalize
  This type of sample is used most often in psychological research
  E.g., “100 female psychology students, aged 18-20, from a southern university”… gather background information on the participants, and report it in our research article
• Subject Assignment
  Placing persons in experimental conditions
  More important than random selection… especially for internal validity… 5 different ways…
  1. Random assignment
  Use a random # table or generator
  Random is not equal to haphazard
  We may not end up with equal groups, but we can say the sources of bias have an equal likelihood of being spread across groups
• 2. Matched Random Assignment
  Makes n’s equal and balanced across conditions
  E.g., “memory in college students” study; 2-group design; one group gets mnemonic training; we believe GPA could be a confound if allowed to run loose
• 3. Eliminate the variable
  Elimination involves removing persons beyond a certain level of a variable
  E.g., aging study of visual perception (mental transformation of a visual object)
  Persons having visual acuity of 20/20, 20/30, 20/40, and 20/50 are acceptable; those persons having acuity worse than 20/50 are eliminated from the study
• 4. Hold a variable constant
  Constancy involves allowing into a study only persons having a given level of a variable
  Same example of visual perception: in this case, we allow only persons with 20/20 acuity to be in the study; the result gives us constancy
• 5.
Put a variable in the study’s design
  Build this variable into the design and examine the effect of its levels on the DV
• IV. Control through experimental design
  Goal is to reduce threats to internal validity
  We want to be able to say “The IV manipulation caused a change in the DV” and nothing else…
  The best way to have control in an experiment...
  A. Select participants in an unbiased way, e.g., stratified random sampling or ad hoc samples
  B. Assign participants to conditions in an unbiased way, e.g., random assignment or matched random assignment
• Control through experimental design…
  C. Use a control group in your study, e.g., one group gets the mnemonic training, the other does not
  D. True experiment’s 5 characteristics
  1. Research hypothesis stating the IV’s effect on the DV, e.g., R+ will increase proactive behavior
• Control through experimental design…
  2. Have two or more IV levels, e.g., varying the amount of time given to solve a cognitive problem: 10 sec, 20 sec, 30 sec
  3. Assign participants to experimental conditions in an unbiased way, e.g., random assignment or matched random assignment
• Control through experimental design…
  4. Use strong procedures for testing causal relationships, e.g., use a control group, use pilot testing, use prior research to support work
  5. Use specific control procedures to reduce internal validity threats, e.g., random assignment, automation, objective measures, placebo, deception
• Hypothesis testing: General steps
  We want to draw conclusions about our results by examining the probability of getting our results if the opposite of what we are predicting were true.
• 1. Make questions into H0 and H1 about populations (Ex. Step 1)
  We believe the Population 1 adults do not remember more words than Population 2 adults (hence, Population 1’s mean is less than or equal to Population 2’s mean).
• 2. Determine characteristics of the comparison distribution
  Next, we ask, “What is the probability of obtaining a particular test statistic if the null hypothesis is true?”
  We assume the sample (here, 1 person) was selected from a distribution representing a true null hypothesis, approximately normal in form, with a mean of 8 and a standard deviation of 3.
  The distribution to which you compare your sample (when the null hypothesis is true) is the comparison distribution.
• 3. Determine the cut-off score to reject the null hypothesis
  If the null hypothesis is true, then witnessing an adult remembering 14 words is very unlikely… that’s TWO standard deviations above the mean… an extreme value associated with only about a 2% probability of occurrence…
  Performing 2 standard deviations above the mean is rare, regardless of experimental training…
  The cut-off score, here, is set in terms of normal Z-units.
  If the Z-statistic we compute is more extreme than +2 (in our example), we can reject the null hypothesis.
  Typically, we use a cut-off score that corresponds to a probability of 5%, called alpha (α = .05).
• 4. Determine the sample’s score on the comparison distribution
  So, in our example, we need Z > +2 to reject the null hypothesis.
  So, we’ve conducted our study on N = 1, and we see that our adult was able to remember 15 words.
  Now we’ll compute Z and determine its standing on the comparison distribution: Z = (15 − 8) / 3 = +2.33.
  Our Z of +2.33 indicates that the adult who remembered 15 words is 2.33 standard deviations above the population mean.
• 5. Reject null hypothesis or not?
  We know we needed a Z-score in excess of +2 to reject the null hypothesis… we got +2.33… so we can reject the null hypothesis.
  Therefore, our research hypothesis was supported… Pop 1 adults who received memory training remembered more words than Pop 2 adults who didn’t receive training.
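The worked example above (comparison distribution with a mean of 8 and a standard deviation of 3, a cut-off of Z = +2, and an observed score of 15 words) can be reproduced with a short Python sketch. The function and variable names are illustrative assumptions, not part of the cards; the tail probability assumes a standard normal distribution and uses `math.erfc` from the standard library.

```python
import math


def z_score(score, mean, sd):
    """Standing of a single score on the comparison distribution, in Z-units."""
    return (score - mean) / sd


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values taken from the memory-training example.
POP_MEAN = 8      # mean of the comparison distribution
POP_SD = 3        # standard deviation of the comparison distribution
CUTOFF_Z = 2.0    # cut-off score chosen in step 3
observed = 15     # words remembered by the one adult in the sample

z = z_score(observed, POP_MEAN, POP_SD)
reject_null = z > CUTOFF_Z

print(f"Z = {z:.2f}")
print(f"P(Z >= cut-off) = {upper_tail_probability(CUTOFF_Z):.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Changing `observed` to 13, for instance, gives a Z below the cut-off, which corresponds to the "failed to reject the null hypothesis" outcome.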
  If we had not obtained a statistically significant test statistic, we’d simply say our results were inconclusive, and that we failed to reject the null hypothesis…
• Ways of framing hypothesis testing questions
  One-tailed tests go with directional hypotheses.
  Two-tailed tests go with nondirectional hypotheses: the researcher does not have an idea of which population has the higher mean, only that they’ll be different.
• Hypothesis Testing Introduction
  1. Make questions into H0 and H1
  2. Determine characteristics of the comparison distribution
  3. Determine cutoff score to reject H0
  4. Determine sample’s score on the comparison distribution
  5. Reject H0 or not?
• Hyp Testing Popul 2
  Pop. 2: Adults who do not receive such special memory training
• Hypothesis testing using a distribution of means
  1. Mean of a distribution of means = population mean.
  2. Variance of a distribution of means = population variance divided by the # of scores in each sample.
  3. Shape of a distribution of means is bell-shaped and unimodal.
• Three kinds of distributions
  1. Distribution of a population of persons
  2. Distribution of a sample drawn from a population
  3. Distribution of means of all possible samples of a particular size taken from that distribution
• The distribution of means
  We want to compare this mean to something similar: a distribution of means.
  The Central Limit Theorem underlies several rules we’ll address:
  Rule 1: As the number of samples increases, the mean of the distribution of means approximates the population mean. As the # of samples increases, high and low means cancel each other out.
  Rule 2: The variance of the distribution of means is the variance of the distribution of scores divided by the number of scores in each sample.
• Rule 2
  The standard deviation of a distribution of means is simply the square root of the variance of the distribution of means.
  Remember this! This standard error is the “average amount off we expect our sample mean to differ from the mean of the distribution of means”
• Rule 3
  The shape of a distribution of means tends to be unimodal and bell-shaped.
  It is at least approximately normal if there are at least 30 scores in each sample, OR if the population of scores is normal.
• Single sample t-test
  Compare a single mean to a population mean when the population standard deviation (σ) is not known
• Dependent means t-test
  Compare two linked means when the population standard deviation (σ) is not known
• Single sample t-test…
  We don’t know the σ of the population, only μ. SO! We will estimate σ of the population using our sample information. If the sample is randomly.
  Compute an unbiased estimate of the population variance in a way that boosts the sample’s variance a bit: we’ll do this by putting N – 1 in the denominator of the variance equation.
• t-test for dependent means
  Here, two scores from each person are compared (repeated measures designs, or within-subjects designs), OR two scores from matched persons are compared.
• When independent means t-tests should be used
  When sigma (σ) is unknown for the population distributions and two separately sampled means are being compared.
  Important point: the comparison distribution is a distribution of differences between means.
  With equal N, the two estimates are averaged, and we call this the pooled estimate of the population variance. We also have the weighted average estimate of the population variance.
• Assumptions of the independent means t-test
  A. Population distributions are normally distributed.
  However, if moderate violations of normality occur, the t-test is still robust.
  If the 2 populations are believed to be skewed in different directions, and a one-tailed test is used, then there’s a problem. Check for normality before running a test.
• Assumptions of the independent means t-test
  B. Variances are equal across populations.
  This is the “homogeneity of variance” assumption.
  If “moderate” violations occur and N per sample is equal or nearly equal, then the t-test is robust to these violations. Examine the variances in your data.
  If “very large” differences in sample variances occur, the t-test could give inaccurate results.
  (If both the normality and variance assumptions are not met, we may want to consider performing some non-parametric, or “distribution-free,” tests.)
• Effect Size
  d = .80 (large) means we expect (or found) 4/5 of a standard deviation between the two means.
  These values should be evaluated in their absolute sense; i.e., an effect size of -.80 is still considered large; their value is not bounded by –1 or +1.
  These effect sizes are symbolized by d, and are calculated as follows: d = M/S.
  The denominator of d is relatively UNinfluenced by N, unlike the denominators of the dependent means t-test, (M − 0)/S_M, and the independent means t-test.
• What shapes the statistical power in our studies?
  Effect size: the larger the effect size, the more power associated with it; it’s easier to find a big difference than a small one, relative to the standard deviation
  Sample size: the larger the N, the more power we have, and the smaller the standard error, which describes the spread of the distribution of means
  Statistical significance level: the less extreme the α, the easier it will be to reject H0
  1- or 2-tailed tests: 1-tailed tests provide less stringent cut-offs for rejecting H0
• Statistical Power and Effect Size
  Please note from these tables:
  As N increases, so does power
  As effect size (d) increases, so does power
  1-tailed tests have more power than their 2-tailed counterparts
  For the same effect size, a dependent means t-test has more power with half the sample size of an independent means t-test. Why?
  (And, not shown in these tables: more extreme α levels will be more difficult to achieve, and will therefore have less power than a less extreme α level.)
• Hypothesis Testing Steps for CORRELATIONS
  Step 1: State Null and Research Hypotheses about populations
  Suppose we believe that SAT scores and 2nd year college GPA are positively correlated.
  We’ll set an α level of .05, one-tailed test.
  H0: ρ1 ≤ ρ2
  H1: ρ1 > ρ2
• Hypothesis Testing Steps for CORRELATIONS
  df for t = N – 2 = 20 – 2 = 18
  Step 3: Get t-critical to reject H0
  Step 4: Get sample’s score on the comparison distribution
  Step 5: Compare scores and make decision
  t computed is more extreme than t critical, which happens to be…
  Write this part: As hypothesized, we found a statistically significant positive Pearson correlation between x and y, r = .70, p < .01. Underline.
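Steps 4 and 5 for the correlation example can be sketched in Python. The cards give r = .70 and N = 20 but not the formula, so this sketch assumes the usual conversion of a Pearson r to a t statistic, t = r·√(N − 2) / √(1 − r²). The one-tailed critical value of 1.734 (df = 18, α = .05) is taken from a standard t table and is likewise not from the cards; the function name is hypothetical.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation r from n pairs of scores into a t statistic.

    Assumes the standard formula t = r * sqrt(n - 2) / sqrt(1 - r**2),
    with df = n - 2.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


r = 0.70             # sample correlation reported on the card
n = 20               # number of pairs of scores
T_CRITICAL = 1.734   # assumed table value: one-tailed, alpha = .05, df = 18

t, df = t_from_r(r, n)
reject_null = t > T_CRITICAL

print(f"df = {df}")
print(f"t = {t:.2f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Because the computed t is well beyond the critical value, the sketch agrees with the card's decision to reject H0.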