Reliability

Measures the amount of random or nonsystematic error present in a test

Part of the error resides in the test situation

Who gives the test (examiner effects)



Who takes the test (test taker characteristics)



Item effects

Relationship between examiner and taker

Familiarity with and rapport between examiner and test taker can increase test scores



People report more health concerns in response to interview questions delivered online, by self-report, or by telephone than to direct face-to-face questioning



Face-to-face questioning can lead to expectancy effects, particularly when questioning children or vulnerable people.

Effects of tester race and gender on test scores


(Great truth)

There is little evidence that tester race or gender influences test scores on individual or group-administered ability tests



Belief in such effects is a myth unsupported by data

Why do tester race and gender not matter

Test administration guidelines are very specific for most ability tests, and administrators are typically trained in these procedures



When effects are found, there is usually a deviation from the administration procedures given in the manual


-when found, effects are small and insignificant

Rosenthal effects

Experimenter expectancy effects

Rosenthal effects (experimenter expectancy effects)

Expectancy effects are real but the overall effect on test scores is small



It is not clear whether expectancy effects can be replicated in the manner they were found in Rosenthal's early studies



Whether expectancy effects happen in standardized testing is unclear


-when present, they are small

Responses after a correct or incorrect answer

Inconsistent feedback can reduce reliability of test results



Results are inconclusive, with some studies showing increased scores after praise and others showing little effect



Reinforcement does alter responses on attitude surveys


-frequently increasing yea-saying responses

What to do after a response

After a response, no feedback should be given, to avoid the possibility of reinforcement or random reinforcement



What to say or not say is outlined in the test manual


-there are no exceptions to following standardized testing procedures

Advantages of presenting instructions and items on computer (computer-administered tests)

Complete standardization



Branching and adaptive testing are possible



Precise timing



Self paced presentation and response



Complete randomization of question presentation

Differences between computer and standard test administration

Little evidence of score differences between the two administration methods



Reliability is comparable with both

Responses and feelings about computer administered tests

Rather than feeling alienated or frightened, most test takers find the interaction enjoyable



This may not be true for disadvantaged test takers who have little exposure to technology



People may be more likely to respond honestly to personally sensitive questions on a computer than in self-report interviews

Computer adaptive testing (CAT) procedure


Also known as branching

All test takers start with the same set of questions of moderate difficulty



Program presents harder or easier questions depending on how these initial questions were answered



Test takers spend little time on questions that are too hard or too easy



The program presents items based on the test taker's skill level until a predetermined number of items have been answered incorrectly (the test is then over)
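
A minimal sketch of this branching logic in Python; the item pool, starting difficulty, step size, and three-error stopping rule are illustrative assumptions, not any particular test's algorithm:

```python
import random

def adaptive_test(answer, items, start=0.5, step=0.1, max_errors=3):
    """Present items near the current ability estimate until max_errors mistakes."""
    level, errors, administered = start, 0, []
    pool = sorted(items)                     # item difficulties on a 0-1 scale
    while errors < max_errors and pool:
        item = min(pool, key=lambda d: abs(d - level))  # closest unused item
        pool.remove(item)
        correct = answer(item)               # True/False from the test taker
        administered.append((item, correct))
        if correct:
            level = min(1.0, level + step)   # branch to a harder item
        else:
            errors += 1
            level = max(0.0, level - step)   # branch to an easier item
    return administered

# Simulated test taker who reliably handles items up to difficulty ~0.7
taker = lambda d: random.random() < (0.75 if d <= 0.7 else 0.30)
history = adaptive_test(taker, items=[i / 100 for i in range(100)])
print(len(history), "items administered")
```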

Computer adaptive testing (CAT) advantages

Provides a better profile of a person in a shorter period of time



Test scores can be given almost immediately

Behavioral assessments include measures of

Job samples in which ratings are made of ongoing job related activities



Ratings of children's in-class or playground behavior



Ratings of psychiatric patients before and after treatment



Ratings of ongoing social interactions

What is being measured / what effects are we looking at in behavioral assessments

In all behavioral assessments, a rater evaluates someone else's behavior using some form of evaluative scales or dimensions



Both the test and its reliability reside in the rater

What is the main concern/issue in behavioral assessments

Reliability of the ratings from raters

Reliability of ratings/raters

Reactivity


-When raters know their ratings will be evaluated, ratings are more accurate than when not being checked

Overcoming reactivity

Surprise spot checks are made from time to time



Kappa coefficients are needed to assess interrater reliability (agreement between raters)



Interrater reliability needs to be assessed during the actual observation periods, not during training
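
A minimal sketch of the kappa computation referred to above (Cohen's kappa for two raters); the behavior codes are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n        # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / n**2    # chance agreement
    return (p_o - p_e) / (1 - p_e)

a = ["on-task", "off-task", "on-task", "on-task", "off-task"]
b = ["on-task", "on-task", "on-task", "on-task", "off-task"]
print(round(cohens_kappa(a, b), 2))   # 0.55 for these hypothetical codes
```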

Training of observers (raters)

Behavioral ratings require that observers be extensively trained on what to observe and how to code what is observed



-need to generate a code book to train raters on what to look for and how to evaluate it



-anchor the scale (everyone knows what a 1 or a 5 means)

Drift

In the field actual ratings may depart from ratings done at training and become unique to the rater



These unique individual ratings can take many forms

Forms that unique individual ratings of the rater can take

Inconsistency effects


-the same behavior is rated differently on different occasions


-reliability is gone



Shifting standards


-ratings of the same behavior differ across people



Group standards effects


-when groups of raters are observing, they may adopt informal, implicit rules for observing


-unless these rules are known, the ratings no longer match training and reliability is gone



Contrast effect and assimilation effects


-behavior is rated differently depending on the preceding behavior

Overcoming drift effects

Periodic retraining on the original coding format is often necessary



The best way is to videotape the observation sessions and then evaluate the tape with the raters

Interview

One person asks another a series of questions believed to be diagnostic of a quality or attribute the interviewer is trying to assess



Best known technique for assessment of individual differences

Landy (1985)

Companies interview between 5-20 people for each hire



Given the costs in time, effort, and resources for companies, the utility, validity, and reliability of interviews have been studied

What is the purpose of interviews

To ask questions that reveal whether or not the interviewee has the skills, ability, interest, and motivation to do the job, will profit from additional training, will benefit the organization, will enjoy the new position, and will get along well with co-workers



These concerns are about predictive validity

3 forms of interviews

Structured


Semi structured


Unstructured

Structured interviews

Everyone gets the same questions in the same order from a panel



-in essence an orally administered questionnaire

Semi structured interview

A patterned or guided interview covering certain predetermined areas of interest

Unstructured interview

A nondirective depth interview in which the interviewer sets the situation and encourages the interviewee to talk as freely as possible

Interviews gather (observe) different types of information

Observation of a limited sample of behavior such as speech rate, language usage, poise, reaction to being in an unfamiliar situation, nervousness, style of dress, posture



It is empirically unknown whether any of this information collected in interviews relates to or predicts the criterion

Interviews can elicit information that may predict the criterion



It is believed, and often claimed, that what a person has done in the past is a good predictor of what they will do, or are likely to do, in the future

The claim that past is a predictor of future behavior is true if

1) the situation and person remain stable



2) the person's interpretation of their behavior in that or similar situation remains constant

What does getting useful information, or the effectiveness of an interview, rest on

Ability to collect useful information from an interview rests solely on the skill of the interviewers


-their skill in asking the right questions and correctly interpreting the respondent's answers



(Interviewers are the test)

Interviews can go wrong in predicting outcomes when

Respondents or interviewers conceal important information



Important questions were not asked



Information was not correctly interpreted



Interviewer is insensitive to cues in the interviewee's behavior



Interviewer is inattentive to information that was reported

Why is sensitivity to responses important

Sensitivity to what was stated and how it was stated may lead to further probing, learning new information and qualifying previous answers

Different types of interviews include

Employment interviews


Mental status exams


Clinical interviews

What is an employment interview

The most frequently used pre-employment assessment done by organizations



Can vary along the dimensions of traditional to structured

Traditional interviews

Where a number of different areas are discussed with each job applicant



Serves to acquaint the applicant with possible work colleagues and the work environment

Structured interviews

Standardized, with each applicant receiving the same questions and responses are scored using a scoring format

Reliability and validity of traditional unstructured interviews

Traditional unstructured interviews are often invalid and are unreliable predictors of future work performance

Hunter and Hunter (1984)

Found a predictive validity coefficient of 0.14 between interviewer judgments during a traditional interview and future job evaluations and performance



Reasons for low validity are not hard to find



Results from these interviews say more about the interviewer than the interviewee

Traditional interviews have low validity because they are

Plagued with age, gender and attractiveness stereotypes



Halo and horns effects



Do not address the major concerns of the organization

Halo and horns effects

A form of rater bias that occurs when someone walks in and projects competence or incompetence in one area, and the rater rates the person correspondingly high or low in all areas

Negative search strategy

Search for any negative information that would disqualify an applicant



Any negative information will be enough to reject applicants unless demand is high enough and few workers are available



When impressions are favorable (halo effect) the rejection rate drops to 25%



(Most interviews operate on negative search)

Webster (1964)

Found that one unfavorable impression was enough to sink the applicants chances in 90% of cases

What constitutes negative information

Poor communication skills



Lack of confidence or poise



Low enthusiasm



Nervousness



Failure to maintain eye contact



(These are all signs of introversion and social anxiety)

What constitutes positive information

Ability to express oneself



Self confidence



Poise



Enthusiasm



Ability to sell oneself



(All signs of extroversion, assertiveness, and social skills)

What causes a good first impression (tipping the balance in your favor)

Looking professional



Well groomed



Project an aura of competence and expertise



Nonverbal cues that imply friendliness and warmth

Structured job interviews

Address issues of reliability and validity that are raised by traditional interviews



A change in focus from interpersonal relationships to job-focused questions

Structured employment interviews uses questions that are

Job focused



Pre-planned



Presented in the same order for all



Answers are scored according to a predetermined scoring procedure

Interviewers in structured job interviews

Interviewers are trained on how to ask questions, how to take notes, and score answers



Procedures standardize interviews for each candidate

What do structured interviews focus on

Focuses on the relationship between past behavior and current and future behavior



Linking past, present, and future behavior provides better predictions of future behavior



Traditional interviews focus on questions that assess attitudes, opinions and interpersonal dynamics

In structured job interviews all job seekers are asked to

Provide specific examples of behaviors they have used in the past



Provide examples of what they would do under specific circumstances



-answers are rated on behaviorally anchored rating scales

What interview styles are used most often

Many companies and organizations use traditional employment interviews



So standard questions result in standard answers

Successful candidates and turnover

People do not have a great track record when it comes to identifying successful job candidates



The Harvard Business Review points out that 80% of employee turnover is due to bad hiring decisions



Hiring is difficult and mistakes are expensive

Society for human resource management reports that

36% of new hires fail within the first 18 months



40% of senior managers hired from outside the organization fail within 18 months



It costs on average one third of a new hire's yearly salary to replace them

Reasons why so many hires fail reflect

Improper gathering of information during the interview



Improper analysis of information



Improper interpretation and integration of data



Implicit reliance on stereotypes and halo/horn effects, heuristic reasoning



(All play key roles in the success or failure of selecting good employees)

Mental status exam

A 15-20 minute interview in which an intake worker assesses the likelihood of brain damage, drug or alcohol problems, psychosis, and other major mental and physical health issues

What is the purpose of mental status exams

The purpose is to assess neurological or emotional problems in terms of variables known to be possible causes

What is noted in mental status exams

The patient's appearance, behavior, speech, perception, thoughts, and attitudes are noted

What is assessed in mental status exams

Emotional states


-flat affect (little fluctuation in emotion)


-emotional inappropriateness


-emotional lability



Intellectual functioning


-speed and accuracy of thinking


-richness of thought content


-memory capacity


-judgmental accuracy


-proverbs tests



Attention processes


-level of distraction


-perseverance


-presence of hallucinations


-delusions

What are emotional, intellectual, attention, and thought problems associated with

Are markers of schizophrenia, drug dependency, anxiety disorders and brain disease

What do the results from Mental status exams tell us

Tentative diagnosis


Likelihood of injury to self or others


Outcome of Psychotherapy

How to do a mental status exam competently

A complete understanding of the major mental disorders is required:



-Thorough knowledge of various forms of brain damage



-thorough knowledge of neurological impairment



-thorough knowledge of the DSM-5 coding system



(Not one fixed exam)

Clinical interviews cover the same ground as mental status exams but can be broader and also explore

Job prospects



Career alternatives



Self knowledge



Information to make more appropriate life choices



Therapy and therapeutic related outcomes

What is the purpose of clinical interviews

The task is to obtain important information about the person, but what is important depends on the nature and purpose of the interview

Clinical interviews can be broad or narrow depending on

Nature of the referral question



Nature and quality of the background information



Time demands



Concurrent clinical judgments

Interviewers in mental status exams and clinical interviews

The tone, interview climate, and answers elicited hinge on the behavior of the interviewers

Research on clinical judgments

Just how accurate are individuals or panels of individuals in


-synthesizing and integrating information about another person


-arriving at a correct decision



-how accurate are clinical diagnoses, judgments, and evaluations made by a single judge or panel

What types of information and outcomes are used

Judges or panelists have a number of sources of information on the person being interviewed



-test scores


-test score patterns


-interview results


-family histories


-medical information


-biological information


-school records

Studies on how accurate judges' judgments are ask

Given the wealth of information available to decision makers, how accurate are evaluators, teachers, clinicians, coaches in predicting outcomes



Evaluators are the test instruments


-the validity and reliability of evaluations made by evaluators become an issue

What are actuarial methods

Involves converting all available background information into numbers and entering all of the information into regression equations



The method lets you see which pieces of information best predict an outcome
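
A small sketch of the idea with invented data: background information is coded as numbers and fed into a least-squares regression, and the fitted weights show how much each predictor contributes to the outcome:

```python
import numpy as np

# columns: aptitude test score, high-school GPA, interview rating (hypothetical)
X = np.array([[110, 3.2, 4], [95, 2.8, 5], [130, 3.9, 3],
              [105, 3.0, 2], [120, 3.6, 4], [88, 2.5, 3]], dtype=float)
y = np.array([3.1, 2.6, 3.8, 2.9, 3.5, 2.3])   # outcome: first-year college GPA

# least-squares weights for y = b0 + b1*x1 + b2*x2 + b3*x3
A = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["intercept", "aptitude", "hs_gpa", "interview"], weights.round(3))))
```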

Cleary model

Another name for actuarial methods



Often contrasted with clinical judgments, in which evaluators arrive at a judgment, evaluation, or determination of a particular case

Who comes up with a better prediction


-a regression equation or humans integrating the available data

A regression model often equals or exceeds the diagnoses, predictions, judgments, and evaluations made by individuals or panels

Why are evaluators reports not better than regression

Evaluators' or judges' reports of how they combine and weight data bear little relationship to how



-information is actually combined



-the weight or importance attached to that information



(Origin and rationalization issues)


-they don't know where their conclusions come from but come up with a reason anyway

Sarbin (1943) compared high school counselors' predictions of grade 12 students' success in college against the accuracy of a regression equation

Counselors used college aptitude test scores, grades, interview results, scores on a vocational interest inventory, personality test scores, and post-high-school interviews



The regression equation used only college aptitude test scores and high school grades as predictors



Counselor prediction came close to regression equation predictions for girls but regression did better for boys


(Regression did better overall)

Meehl (1954)

Asked if clinical judgments were better predictors than regression equations



Judges were clinical psychologists, counselors, teachers, clinical social workers, and other professionals with varying degrees of education and work experience

What did Meehl find

Found that, with few exceptions (administrative assistants), actuarial methods yielded as many, and frequently more, correct predictions as did the clinical analyses given by professionals

Meehl (1965)


-repeated the study but included 50 new clinical outcome studies

67% of studies favored statistical prediction



33% showed no difference between clinical and actuarial judgments

Goldberg (1965)

Reported that statistical predictions from MMPI profiles predicted future mental health status better than did clinical predictions

Grove (1996)


-meta-analysis of 136 studies that directly compared clinical and actuarial judgments

In 64 studies (47%), actuarial methods predicted better than clinical judgments



64 studies (47%) showed no difference between the two methods



8 studies (5%) favored clinical judgments

-in those studies, clinicians had the advantage because more information was available to them than to the actuarial predictions

Enhancing clinical predictions


Neither...

-The amount of clinical experience



-The number of years of professional training



Enhanced predictive accuracy over regression equations or the use of mechanical prediction rules



In mechanical prediction, weights are given to each predictor on the basis of past outcomes

Why are clinical judgments sub optimal

Fail to realize that diagnostic cues are probabilistic, not absolute categorical, cues for outcomes



Fail to account for cultural, subcultural, or gender differences



Use of racial or gender stereotypes when making judgments



Use illusory correlations rather than decision rules


-looks like a relationship but is not



Overuse and rely on inaccurate predictive principles


-ignore base rate information and regression to the mean, or use too many correlated predictors

The most important reason why clinical judgments sub optimal

Use intuition, emotion or gut feelings when making judgments or interpreting information

What is regression to the mean

The tendency for extreme scores or events to be followed by outcomes closer to the average; assuming an extreme outcome will continue leads to surprise when it does not


The extreme observation you started from is the outlier
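
A tiny simulation of the phenomenon, with invented numbers: because each score is true ability plus random error, the most extreme scorers on one test land closer to the mean on a retest:

```python
import numpy as np

rng = np.random.default_rng(0)
true_ability = rng.normal(100, 10, 10_000)
test1 = true_ability + rng.normal(0, 10, 10_000)   # ability + random error
test2 = true_ability + rng.normal(0, 10, 10_000)   # fresh error on retest

top = test1 > np.percentile(test1, 95)             # extreme scorers on test 1
print(test1[top].mean())   # well above 100
print(test2[top].mean())   # noticeably closer to 100 on the retest
```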

Is information integration or gathering the issue in clinical interviews

In clinical interviews, data synthesis (integration) is the issue, not information gathering

When are Clinical interviews better

Clinical interviews are better than actuarial procedures for obtaining information on infrequent behavior

Test biases

Refers to questions concerning several issues, such as


-item fairness


-comparable prediction scores across groups


-or the construct validity of the test across groups

Concern with test biases

The concern with test bias arises when test characteristics detract from the construct measurement



-it does not refer to attributes associated with test takers

How do the Standards define bias

A bias is a systematic error in a test score



A biased assessment is one that systematically under or over estimates the construct it is designed to measure



Bias exists in the test not the people

When is a test not biased

If an achievement test produces different mean scores for different ethnic groups but there are actual true score differences between the groups, then the test is not biased

When is a test biased

If the observed differences in achievement scores are the result of the test underestimating or overestimating the achievement of a group, then the test is culturally biased

The concept of test bias focuses on what question

Questions about the interpretation of the validity of the test score


(Test performance)

How can test bias affect test takers

Test bias refers to systematic error in the estimation of some true value for a group of individuals


Construct over- or under-representation and construct-irrelevant components may affect the performance of different groups of test takers

Most controversial finding in Psychology

The persistent one standard deviation difference between the intelligence test performance of black and white students


-15 standard score points

Cultural test bias hypothesis (CTBH)

Any difference in performance between gender, ethnic, racial, or other groups is due to bias

Any gender, ethnic, racial, or group performance difference on mental tests can be attributed to

-Inherent artificial biases produced within the test through flawed psychometric methodology



-group differences are believed to stem from test characteristics



-group differences are unrelated to any actual differences in the psychological trait, skill or ability in question

Cultural loading

Refers to the degree of cultural specificity present in the test or items



A test can be culturally loaded without being culturally biased



-the greater the cultural specificity, the greater the likelihood of an item being biased when used with individuals from other cultures



-all tests in current use are bound in some way by their cultural specificity

Mean difference hypothesis

Mean level differences in performance on tasks between two groups are believed to constitute test bias



Asserts that there is no valid scientific reason to believe that performance levels should differ across racial, ethnic, or gender groups



Tests that demonstrate differences are biased



-this is not correct, as there is no prior basis for deciding that differences don't exist

Thinking of mean difference hypothesis

Requires that the distribution of test scores in each population be identical before assuming that the test is nonbiased, regardless of its validity



Portraying a test as biased regardless of its purpose or the validity of its interpretations suggests poor understanding of the construct being assessed and issues of bias

Jensen (1980)

Discusses The Mean differences as bias definition in terms of the egalitarian fallacy



Under this fallacy, a difference in any aspect of the distribution of mental test scores indicates that something is wrong with the test

Egalitarian fallacy

The idea that all human populations are identical on all mental traits or abilities

Berry and Annis (1974)

The Temne live in a vertical world and the Inuit live in a horizontal world



The two groups differ in susceptibility to the vertical-horizontal illusion

Features of a test that indicate fairness

Interpreting test scores



Minimizing error in test presentation and scoring



Enhancing test validity



Accommodations for those with disabilities



Writing appropriate items



Evaluating potential job candidates through standard criteria for all

Irrelevant factors in fair tests

Factors irrelevant to the construct are eliminated during assessment to help ensure that the construct is measured in a way that is affected only by knowledge, skills, or abilities relevant to the construct itself

Why are other definitions of test bias in CTBH or cross-group test validity unacceptable as a scientific perspective

The imprecise nature of other uses of the term makes empirical investigation and rational inquiry exceedingly difficult



Other uses of the term invoke specific moral value systems that are the subject of intense emotional debates that do not have a mechanism for rational resolution



-emotional appeals, legal adversarial approaches, and political remedies of scientific issues are scientifically unacceptable and not useful

Once mean group differences are identified, there are 4 common explanations for these differences

The differences primarily have a genetic basis



The differences have an environmental basis



The differences are due to the interactive effect of genes and environment



Tests are defective, systematically underestimate the knowledge and skills of minorities, and lead to differential validity (CTBH)

Unfairness as a measurement bias

When test items are unrelated to the intended construct, it can result in test score differences across subgroups

What is Differential item functioning

Differences in the functioning of test items between defined groups. DIF indicates that individuals from different groups who have the same standing on the construct being measured do not have the same expected test score. It happens when test takers of equal ability do not have the same probability of answering a test item correctly, and it leads to predictive bias

Differential item functioning needs what

An indication of differential item functioning must be accompanied by a suitable explanation for it in order to justify calling an item biased

Predictive bias

Differences exist in the pattern of associations between test scores and other variables for different groups, causing concerns about bias in the inferences drawn from the use of test scores

Fairness

Fairness is concerned with the validity of interpreting individual scores for their intended uses



-unfairness means that the test score interpretations are invalid for the intended uses

To have fairness

Individuals need to be treated as similarly as possible (an important aspect of fairness)



It is important to take into account the individual characteristics of the test taker and to understand how these characteristics may interact with contextual factors of the testing situation and the interpretation of test scores

What are the major issues with giving achievement tests to minority groups

Inappropriate content



Inappropriate standardization samples



Examiner and language bias



Measurement of different constructs



Differential predictive validity



Qualitatively distinct aptitude and personality



Inequitable social consequences

Inappropriate content

Black and other minority children have not been exposed to the material involved in the test questions



Tests are geared primarily toward white middle class homes, vocabulary, knowledge, and values



Inappropriate content makes the test unsuitable for use with minority children

Inappropriate standardization samples

Ethnic minorities are underrepresented in the standardization samples used in the collection of normative reference data



Inappropriate standardization samples make the test unsuitable for use with minority children

Examiner and language bias

Because most psychologists are white and speak English, they may intimidate black and other ethnic minority children



Examiner race and language use bias test results



Biases happen because examiners are unable to accurately communicate with minority children and are insensitive to ethnic pronunciation of words on the test

Measurement of different constructs

Tests measure different constructs when used with children from other than middle class culture on which the tests are largely based



Not a valid measure of intelligence in minority groups.

Differential predictive validity

Tests measure constructs more accurately and make more valid predictions for individuals from the groups that tests are mainly based on than other groups

Qualitatively distinct aptitude and personality

Majority and minority groups have different aptitude and personality traits



So test developers should begin with different definitions for different groups



Helms argued that European and African values and beliefs are different, which affects responses

Inequitable social consequences

Due to educational and psychological test biases, minority group members are already disadvantaged in educational and vocational markets because of past discrimination and presumed inability to learn, and are disproportionately assigned to dead-end educational tracks



Represent the inequitable social consequences of biased testing



What is a biased item

An item is biased when it is demonstrated to be significantly more difficult for one group than another



-test items must be unidimensional (all items must measure the same factor)



-items identified as biased must be differentially more difficult for one group than another



-in this definition, groups will have different mean test scores, but group differences must be reflected in all items in an equivalent fashion

How to determine biased test items

A number of statistical techniques, many based on item response theory, are used to detect differential item functioning

Research results on biased items

Very little bias in tests at the level of individual items



Some biased items are nearly always found, but they account for no more than 2-5% of the variance in performance



For every item favoring one group there is an item favoring the other group

Similarity among biased items

Very little similarity among biased items has been found



Poorly written, sloppy, and ambiguous items tend to be identified as biased with greater frequency than items encountered in well-constructed standardized instruments

How to eliminate biased items

Expert panels of minority psychologists are asked to indicate which items would be too difficult for minority or disadvantaged individuals



Items that are seen as culturally biased by the panel are removed

Use of expert panels show two consistent findings

Expert judges were no better than chance at choosing the test items on which minority children scored lower than whites



Judges are not able to detect items that are more difficult for minority children, and the ethnic background of the judge makes no difference in the accuracy of item selection

Methods used for the internal analysis of test items (item biases in construct measurement)

Factor analysis across groups



Correlation of raw item scores with age



Comparison of item total correlations across groups



Comparisons of parallel forms and test retest correlations

Comparative item selection (Reynolds 1998)

Multiple retesting of item sets across groups



Unbiased tests will show a 90% overlap rate between tests



Biased tests and tests with low reliability will show low overlap



Need large samples for stable results

Bias in construct measurement

Construct measurement of a large number of often-used assessment instruments has been investigated across ethnicity and gender with a diverse set of methodologies



No consistent evidence of bias in construct measurement has been found in the many prominent standardized tests investigated

Psychological tests

Function and are measured in the same manner across people from diverse ethnicities and gender



Tests appear to be unbiased for the groups investigated, and mean score differences do not appear to be an artifact of test item bias

What is the recommended method for detecting item bias

Item response theory, followed by a logical analysis of item content. These methods are used to determine the degree of differential item functioning (to see whether items function differently across groups via the model parameters associated with the items)

Item response theory models have various item parameters that describe item behavior (three-parameter model)

1) item difficulty (most important)


-the point on the latent trait at which the examinee has a 50% chance of correctly answering the item



2) discrimination power of the item (slope)



3) guessing parameter
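
A sketch of the standard three-parameter logistic (3PL) form these parameters refer to, with invented parameter values; the last lines illustrate the DIF logic of comparing the same item's curve across two groups:

```python
import math

def p_correct(theta, a, b, c):
    """Probability of a correct answer at ability theta under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# at theta == b, the examinee sits halfway between c and 1.0
print(p_correct(theta=0.0, a=1.2, b=0.0, c=0.2))   # 0.6 = c + (1 - c) / 2

# DIF check: the same item fitted separately in two groups should give
# near-identical curves; a gap at equal ability signals potential DIF
group_1 = p_correct(theta=0.5, a=1.2, b=0.0, c=0.2)
group_2 = p_correct(theta=0.5, a=1.2, b=0.6, c=0.2)   # item harder for group 2
print(group_1 - group_2)   # nonzero gap at the same theta -> flag the item
```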

Rasch model

Single parameter model that models item difficulty

Using Item Response theory to determine Differential item functioning

Compares the item characteristic curves of two groups to create a differential item functioning index



Various statistical methods have been developed for measuring the gaps between item characteristic curves across groups of examinees

Partial correlation analysis

Simple but less precise way to determine item bias



Tests for differences between groups in the degree to which there is meaningful variation in observed item scores not attributable to the total test score



Meaningfulness is based on effect size, which is obtained from the coefficient of determination



Need to be attentive to experimentwise error rates
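
A minimal sketch of the partial-correlation approach with simulated data: correlate item score with group membership while holding total test score constant; squaring the partial correlation gives the coefficient of determination used as the effect size:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y with z partialled out."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = np.random.default_rng(1)
total = rng.normal(50, 10, 200)                             # total test score
group = rng.integers(0, 2, 200).astype(float)               # 0/1 group membership
item = (total + rng.normal(0, 5, 200) > 50).astype(float)   # ability-driven item

r_p = partial_corr(item, group, total)
print(r_p, r_p**2)   # near-zero partial r and effect size -> item not flagged
```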

Biases when using tests to predict future outcomes are constrained by two problems

Biases in the measurement of the criterion/outcome



The correlation between predictor and criterion is limited by the poor measurement characteristics of the criterion



-the square root of the criterion's reliability sets the maximum validity



-from the standpoint of the application of aptitude, achievement, and intelligence tests in forecasting probabilities of future performance, prediction is the most crucial use of test scores to examine

Predictive accuracy can be determined in a few different ways

An item analysis can determine if items function the same in all groups (no criterion)



Assess unstandardized regression weights (slopes) to see if the weights are comparable across groups



Differences in group averages on the test and averages on the criterion



Examine cutoff scores separately by group and assess differences

Job performance tests

Tests that are similar to actual job performance show little divergence across groups


(Little bias)



Biases arise when inferences are made on the basis of test results to behavior unrelated to the test

Regression equations

Regression equations are used to assess biases in prediction



Predictions take the form y = ax + b, where a is the regression coefficient (slope) and b is the constant (intercept)



An unbiased test requires errors of prediction to be independent of group membership, and the x-y regression line must be the same for each group



-error around regression line should be similar
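
A minimal sketch of that check with simulated data: fit y = ax + b separately in each group and compare the slopes and intercepts; matching lines (and similar error around them) mean the prediction is unbiased with respect to group:

```python
import numpy as np

def fit_line(x, y):
    """Return (slope a, intercept b) for y = ax + b by least squares."""
    a, b = np.polyfit(x, y, deg=1)
    return a, b

rng = np.random.default_rng(2)
x1 = rng.normal(100, 15, 300)
y1 = 0.05 * x1 - 2 + rng.normal(0, 0.5, 300)   # group 1: same underlying line
x2 = rng.normal(95, 15, 300)
y2 = 0.05 * x2 - 2 + rng.normal(0, 0.5, 300)   # group 2: lower mean, same line

print(fit_line(x1, y1))   # group 1 slope and intercept
print(fit_line(x2, y2))   # similar values -> no slope or intercept bias
```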

Homogeneity of regression across groups


-simultaneous regression


-fairness in prediction

When the regression equations for two groups are equivalent, the prediction is the same for those groups



When homogeneity of regression across groups does not hold, separate regression equations should be used for each group

Cleary model

The use of a single equation to make predictions from test scores



Refers to the use of regression weights (slope) to predict job success or outcomes

Clinical use of regression equations

In clinical practice, regression equations are rarely generated for the prediction of future performance



Rather, some arbitrary or statistically derived cutoff score is determined, with failure predicted for scores below it


-usually based on clinical lore or past practices



A score 2 SD below the test mean is used to infer a high probability of failure in school performance (e.g., below 70 when the mean is 100 and the SD is 15)

Cut off criterion

Using cutoff scores, clinicians are establishing implicit (implied) prediction equations about mental aptitude that are assumed to be equivalent across race, sex, etc.

Gordon's 4 types of test bias

Case 1


-groups A and B differ on the test but have identical slopes relating the test and criterion


-example of homogeneity of regression across groups



Case 2


-there are different test scores, different slopes, and different intercepts, meaning different test validities for the two groups



Case 3


-similar slopes but minority receives higher criterion scores than majority



Case 4


-similar slopes but majority receives higher criterion scores than minority

Interpreting Case 1 and 2

The issue is differential test validity for the two groups



In both cases the regression slope and intercept are examined



-the slope of the line (regression weight) is the correlation between test score and criterion



-the correlation is the predictive validity coefficient



If the test has significantly different validity coefficients for one group compared to another, then slope bias and differential test validity are present

A common issue in Case 2 (differential validity)

There are often many more test takers in the majority than minority group



This means that the regression weight will be significant for the majority group but not for the minority group, and because of the sample size differences, between-group comparisons will be significant



In such cases, it will appear that the test is more suitable for the majority group than for the minority group (discriminating against the minority)

Hunter and Schmidt (1997)

In a review of 866 black-white prediction comparisons



There was no evidence for the hypotheses of differential or single-group validity with regard to the prediction of job performance across race for whites and blacks

Great secret truths of differences

Large-scale industrial samples, tests of armed services personnel, and school division-wide testing all typically fail to find significant differences in validity coefficients



Validity coefficients for nationally administered tests typically fail to show differences in validity coefficients between racial groups



In terms of predictive validity, ability tests are equally valid for minority and majority groups in predicting occupational and educational outcomes



When sample size and composition are comparable and the test and criterion are properly constructed, no slope bias is reported

Why is there no slope bias in well-constructed tests

Performance on the test and on the criterion are influenced by a number of factors (language skills, age, motivation)



These factors can influence scores on the predictor or the criterion



Making the test culturally appropriate does not address the underlying issue


-low scores need to be addressed directly

Intercept bias

Even though tests show comparable predictive validity across groups, intercept bias may still be present



Intercept bias is present if the test consistently over or under estimates performance on the criterion by one group compared to another

Case 3

Although the validity coefficient is the same for both groups, any score on the test (x) will lead to different criterion scores for the two groups



Test scores have different predictive meanings for the two groups



Selecting people on the basis of majority scores underpredicts minority group criterion performance



Case 3 is the situation concerning those who view tests as biased

Case 4

Use of majority test scores in a predictive regression overpredicts minority group performance



Discriminating in favor of the minority group



-evidence on intercept bias indicates that on well-constructed tests there is no significant intercept bias


-or a slight tendency in the opposite direction

When does case 4 happen

Occurs when other variables are correlated with the test and the criterion



-reading ability, language proficiency, and primarily test familiarity and test preparation

Current work on test bias

Over the last several decades emphasis has shifted from evaluation of test bias to the design of selection strategies for fair test usage with minority groups



Examples: the Cleary model and compensatory models



These selection models cause public uproar, but what they do is assign different values to other factors considered when accepting minority group candidates

Cleary model

The use of regression weights to predict job success or other outcomes



Selection is based strictly on the test and criterion scores without regard for other goals in the selection process

Compensatory models

Select a larger proportion of minority group members by lowering acceptable test scores or by selecting applicants based on other criteria (e.g., the proportion of minority applicants)

Selection models like the Cleary model are called

Expected utility models



-in such models, clear statements of values and the intended consequences of selection decisions are made explicit



Issues such as providing equal opportunity, increasing demographic mix, preferential selection of people from historically disadvantaged groups are all part of the selection process

No agreed-upon definition of intelligence



But there are three broad conceptual ideas about intelligence which can be described

Psychometric tradition


-examines the structure of test items, the dimensions underlying responses, and the correlates of test responses



Information processing approach


-examines the underlying encoding, processing and solving of various problems



Cognitive approach


-focuses on how people solve real-world problems and adapt to real-world demands

Development of the Binet scale

In the late 1800s, the prevailing theory suggested a relationship between head size and school success (craniometry)



Binet failed to find any relationship



In 1899 Binet dropped craniometry as a measure of intelligence and began a search for other measures



Binet returned to measuring intelligence in 1904 at the same time Francis Galton published his work on intelligence

What did Galton believe

That intelligence could be assessed through physical measures


-grip strength, reaction time, keenness of vision, auditory acuity, and mental imagery



Binet wanted something more

Binet wanted a measure that reflected

What people do (not who they are)



A number or numbers that reflected whether questions were answered correctly



Answers to questions that indicated an underlying mental process

What did Binet ask children to do

Asked children to respond to tasks that reflect common experiences


-counting coins, giving and receiving instructions, making simple inferences, answering questions and solving problems



Tasks were presented by a trained tester



Items were graded in terms of difficulty and covered a wide range of problems

These tasks used by Binet were thought to tap three processes that reflected intelligence

Comprehension


Invention


Correction

General mental ability

General mental ability meant that children who were correct on one question tended also to be right on others because intelligence is a general mental ability made up of several different processes (positive manifold)



He thought that performance on a wide range of varied tasks could reflect a measure of general mental ability

Original Binet scale

The original 1905 scale had 25 age-graded items



The 1908 scale, developed with Theodore Simon, had 32 items and an age criterion for each item



Start with the simplest items and progress until the child continues to make mistakes

What was considered normal intelligence

The criterion Binet and Simon adopted was the age at which children could correctly answer a question 66%-75% of the time



The age associated with the last correct answer became known as the child's mental age


-children whose intelligence level was less than 0 were identified for special education

Mental age

Binet and Simon defined a child's intelligence level as mental age minus chronological age

How did Binet define intelligence

Adopted a functional perspective in which intelligence must be reflected in behaviors that are adaptive and goal directed



-Take and maintain a definite course of action (comprehension)



-Capacity to change plans or method to attain a desired end (invention)



-Ability to see errors and correct them (correction)



Intelligent people use information more efficiently to meet their desired goals than do less intelligent people

What did Stern 1912 argue

That mental age should be divided by chronological age to give an intelligence quotient



-division is more appropriate than subtraction because the relative not absolute difference between mental age and chronological age is important



The interest now becomes the rate of development relative to age
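
A worked example of Stern's ratio, using the conventional x 100 scaling later popularized by Terman:

```python
def ratio_iq(mental_age, chronological_age):
    """Stern's intelligence quotient: (MA / CA) * 100."""
    return 100 * mental_age / chronological_age

print(ratio_iq(10, 8))    # 125: developing faster than age-mates
print(ratio_iq(8, 10))    # 80: developing more slowly
print(ratio_iq(12, 12))   # 100: mental age keeps pace with chronological age
```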

Terman (1916)

Brought the Binet-Simon test to the US, where it became the Stanford-Binet



Ushering in the modern age of intelligence testing

Spearman (physiological efficiency)

Binet saw intelligence reflected in people's ability to solve problems that arise when attaining their desired goals



The problem with this idea was explaining why some people have more functionally adaptive problem-solving ability than others



Spearman's two-factor theory of intelligence saw intelligence as a central general ability (g) plus levels of specific abilities (s)


-sought to isolate intellectual power from knowledge content

How does Spearman see intelligence

Intelligence was less about goal directed activities and more about abstract reasoning


-our ability to perceive and apply relationships



Termed this abstract reasoning ability g (general mental ability)



g is one of the best predictors of occupational and educational success

How is g measured

Analogy problems that require people to perceive relationships between problem components and apply those relationships to the problem



Analogy problems can be expressed verbally or in symbols, pictures, and geometric forms

Functional unity of intelligence

A physiological structure from which a mental energy or process flows



Spearman and others thought that differences in intellectual functioning reflected a functional unity



One measure of inner unity is neural speed or processing speed


-the idea is that the faster a person can process information, the higher their level of intelligence

Speed of response as a measure of intelligence

With a speed measure it is critical to separate speed from knowledge



It is processing power (g) that is being measured not expertise or past knowledge



To separate processing power from expertise you have to use tasks that are completely novel or tasks that are very familiar or easy

Separating response time from movement time

It is also essential in speed measures to separate response time (decision time) from movement time



Done by recording decision time and movement time as separate components

Galton and reaction time

Reaction time as a measure of intelligence was suggested by Galton before Binet produced the first intelligence test in 1905



Every conceivable methodological and statistical problem plagued Galton's attempts at using response time as a measure of human intelligence



Binet's new intelligence test had obvious face validity for intelligence, and Galton's idea of chronometry was easily overtaken and soon forgotten

Advantages of chronometry

One advantage lies in the scale of measurement



IQ test results produce an ordinal scale



Level of intelligence is always relative to the scores of others in norm group



Speed based measures produce an absolute ratio scale


-A response time of 30 milliseconds is twice as fast as one of 60 milliseconds, no matter who takes the test

Advantages of chronometry

There is a theoretical advantage with chronometry



Binet's approach to intelligence was entirely functional


-why a person scored the way they did was not Binet's concern



Time based measures permit theoretical development


-why are some people faster than others



Fast response times reflect a speedy rate of oscillation in neural responsiveness and are therefore indicative of intelligence

Inspection time as a measure of physiological efficiency

Spearman thought that reaction time might reflect encoding sensitivity and retrieval speed



While informative, reaction time data are difficult to interpret because it is difficult to know what processes are being assessed



Inspection time is an alternative measure of physiological efficiency


-the dependent variable is the correct answer, not response speed

Inspection time accuracy

Accuracy is assessed against exposure time



The longer the exposure time the greater the accuracy but the lower the processing speed



For each person, the exposure duration giving 75% accuracy is recorded
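
A small sketch of how that threshold can be recovered from a session's data: linearly interpolate to the exposure duration at which accuracy crosses 75%. The durations and accuracies here are invented:

```python
durations = [20, 40, 60, 80, 100]          # exposure times in ms
accuracy = [0.52, 0.60, 0.71, 0.83, 0.94]  # proportion correct at each duration

def threshold_75(durations, accuracy, target=0.75):
    """Interpolate the exposure duration giving the target accuracy."""
    points = list(zip(durations, accuracy))
    for (d0, a0), (d1, a1) in zip(points, points[1:]):
        if a0 <= target <= a1:             # target crossed in this span
            return d0 + (target - a0) * (d1 - d0) / (a1 - a0)

print(threshold_75(durations, accuracy))   # ~66.7 ms for 75% accuracy
```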

What actually is inspection time

Since it is concerned with accuracy and exposure duration



It is referred to as the speed of taking in information, or encoding sensitivity



Correlations between inspection time and tests loading on g are about -.50 to -.55



As inspection time increases, intelligence test scores decrease

Correlations between response time and test scores

0.2-0.3

Does inspection time cause intelligence or vice versa

Several studies report that intelligence causes fast inspection time



Developmental studies suggest that inspection time at an early age is more closely related to IQ at a later age than vice versa



Inspection time is a measure of something occurring inside the head; it is not a construct, a process, or an explanation

What is inspection time tapping into

While evidence is incomplete



Inspection time taps into the efficiency with which information is processed after it has been received

Fast inspection time and high IQ scores

Fast inspection times and high IQ scores occur in people whose evoked potentials are maximal at 140-200 ms after stimulus display

Cigarette smoking

Cigarette smoking is associated with faster inspection times and higher scores on the Raven's matrices



This happens because smoking stimulates the brain's cholinergic processes, highlighting one mechanism that underlies intelligence

Cattell's two-factor theory of intelligence

Proposed that intelligence is composed of two components



Fluid and crystallized abilities



These abilities are called Gf and Gc

Gc crystallized abilities

Reflect past lessons or well-learned responses that have become crystallized (reading and driving)



Reflected in shared educational experiences and seen in tests of computational speed, word recognition, pattern matching, basic information, vocabulary

Gf fluid abilities

The label applied to an adaptive process of encoding and correctly processing unfamiliar configurations and rearranging those configurations to meet some requirement


(Raven's matrices or block design test)



It is spoken of in the singular, but there are several components to this factor

Relationship between fluid and crystallized abilities

Crystallized develops out of fluid because when tasks are new there is no crystallized knowledge to use



When people of equal fluid ability differ in crystallized ability, the reason probably reflects educational experience, motivation, or environmental factors



Although fluid ability is largely perceptual and nonverbal in nature, words (crystallized) are needed to formulate and check hypotheses and answers



Crystallized abilities such as language and numerical skills are needed to express the creative and novel ideas that come from fluid abilities

What do fluid and crystallized reflect

Fluid is said to be without content, reflecting pure intellectual power.


Crystallized is said to reflect book learning, experience, or knowledge, and so really isn't intelligence


-this view is misleading and incorrect

How much fluid and crystallized abilities do we need

How much fluid ability a task requires reflects the culture and learning history



With enough practice or experience any new test or problem could become crystallized

Fluid and crystallized abilities in intelligence tests

Many intelligence tests reflect fluid and crystallized abilities



No matter how bright a person is, a poor reader will not do well on ability tests