46 Cards in this Set

  • Front
  • Back

test bias

different groups can score differently on a test; is it because the test is biased, or are certain groups actually different?

BITCH and Chitling - alternative culture-specific tests

while these tests can detect differences between groups, they have not been shown to predict useful real-world outcomes, such as job and academic performance.

intercept bias (scenario two)

when the slope of the regression line is the same for each group, but the lines intercept the vertical axis at different places (the test is biased but correctable).

scenario one - group A does better than group B on the selection test; this means that

either the test is not biased, or the performance measure and the test score are equivalently biased.

scenario 3 (slope bias)

the slopes of the regression lines for the two groups are different - the test is differentially valid for the two groups. The scatterplot shows more variability for group 2; the test is biased and is not predictive of their performance.

counter arguments to race discrimination and testing

1. pygmalion effect


2. minority group members still face many disadvantages, especially where group membership is superficially obvious (black children raised in white families still tend to perform worse at school)


3. construct of race is primarily social and has no biological meaning, especially in Aus and US.

Pygmalion experimental design: 1966

children were given a non-verbal IQ test, and teachers were given a list of children said to have scored in the top 20% on this test and be about to "bloom" or "spurt" intellectually. In reality, the children on this list were chosen at random; all children were tested again at the end of one year.

pygmalion effect

school achievement is influenced by teacher expectations. In the earliest grades, students identified as bloomers scored significantly higher at the end of the year.

Stereotype threat affects test performance (GRE)

Group 1 was told the test measured IQ; group 2 was told it measured problem-solving skills. African Americans did worse in group 1 but not in group 2 (where the groups did not differ).

Stereotype threat affects test performance

when Asian women were primed about their racial identity before a math exam, they performed better than controls; when primed about their gender before a math exam, they did worse than controls.

Disability Discrimination Act (Australia)

all tests must measure the person against the requirements of the job, not the person in the abstract. Tests should assess the suitability of an applicant for that specific position, based on the selection criteria (information such as the person's private life or personality should not be used when making decisions about an applicant).

Australian Industrial Relations Commission vs. Coms21 1999

Coms21 hired a recruitment consultancy (Drake) to make decisions about a structural re-organisation. Drake provided 5 individual personality profiles/psychometric reports before the people were fired.

fair work commission

handles workplace discrimination issues

law and psychological testing

rulings are inconsistent, and such cases are commonplace.

test utility

the practical usefulness of a test, usually in terms of financial cost/benefit. Even if a test is high in validity and reliability, that doesn't mean it is practical to use (a test does not have high utility if it costs $1,000,000 to administer).

example of a test with poor reliability and validity but good test utility

when the test score itself is less important, e.g., lie detectors, which are useful if participants think they work. But tests low in reliability and validity should not be used to interpret a test score.

utility analysis (statistical decision analysis) helps answer two questions:

different techniques used to decide the usefulness of a test or tests:


1. which test shall we use?


2. is adding a new test to an existing battery of tests worthwhile?

expectancy table, utility analysis

1. false positive: test incorrectly identifies person as good when they are not


2. false negative: test incorrectly identifies a person as being no good when they are good.


3. selection ratio: ratio between available job positions and number of applicants


4. cut off: minimum test score needed in the test to be hired



utility analysis: if the selection test has low criterion validity:

the test is no better than selecting applicants at random

limitations to expectancy tables:

assumes a linear relationship between job performance and test score, and doesn't take into account other characteristics of the applicant, such as physical health or minority status

when to apply Signal detection theory

need to apply SDT whenever you have a task that involves discriminating between two stimuli (such as the recognition memory task)

four possible outcomes

correct hit


false positive


correct miss


false negative

sensitivity and response bias

sensitivity is the ability to discriminate between stimuli; response bias is the criterion for saying "yes". If you look only at correct hits, your results are confounded with response bias.

how to disentangle sensitivity from response bias?

look at false positives as well as correct hits (i.e., compare the hit rate with the false-positive rate)

d' (d prime)

1. The larger d' is, the better the person is at discriminating between stimuli,


2. If it is 0 then the person is guessing


3. If it is negative, the person is recognising words they didn't see, and not recognising words they did see.
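
The standard way to compute d' is to z-transform the hit and false-positive rates and take the difference; a minimal sketch with invented rates, using the stdlib inverse normal CDF:

```python
# A sketch of d' = z(hit rate) - z(false-positive rate); the rates below
# are invented for illustration. NormalDist().inv_cdf is the z transform.
from statistics import NormalDist

def d_prime(hit_rate, fp_rate):
    """Sensitivity index: separation between signal and noise in z units."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fp_rate)

print(round(d_prime(0.84, 0.16), 2))  # positive: good discrimination
print(round(d_prime(0.50, 0.50), 2))  # 0.0: pure guessing
print(round(d_prime(0.16, 0.84), 2))  # negative: responses reversed
```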



industrial inspection, response bias or sensitivity issue?

the decrease in hit rate was due to response bias, not a loss of sensitivity

what strategies have been put in place that detect people cheating on the hazard perception test by clicking all over the screen?

1. allowing the measurement of false positives


2. stricter definitions of what a hazard is


3. using a modified task: hazard change detection


4.

Item response theory

a superior alternative to classical test theory; its score is called theta: a function of the examinee's responses interacting with the characteristics of the items.

characteristic curve

a plot of ability (level on some trait) vs. the probability of getting a particular question right (item-difficulty index)

the three parameter model

1. item difficulty: the level of ability needed to get the item right 50% of the time


2. item discrimination: the steepness of the curve at the point where its slope is steepest


3. level of guessing
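
The three parameters combine in the standard 3PL item characteristic curve, P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))), where b is difficulty, a is discrimination, and c is the guessing floor. A sketch with illustrative parameter values:

```python
# A sketch of the three-parameter logistic (3PL) item characteristic curve.
# b = item difficulty, a = item discrimination (slope), c = guessing level.
# The parameter values below are invented for illustration.
import math

def icc(theta, a, b, c):
    """Probability of a correct response at ability level theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An item with difficulty b=0, discrimination a=1.5, guessing floor c=0.2:
print(round(icc(0.0, a=1.5, b=0.0, c=0.2), 2))   # 0.6 at theta == b
print(round(icc(3.0, a=1.5, b=0.0, c=0.2), 2))   # approaches 1 for high ability
print(round(icc(-3.0, a=1.5, b=0.0, c=0.2), 2))  # floors near c for low ability
```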

advantages of IRT (4 advantages)

1. no need to calculate an overall total score on a test to estimate someone's ability level


2. people don't need to complete the entire test to get an idea of their level of ability


3. much better for computerised adaptive testing (no need to give people items that they will definitely get wrong)


4. can compare people even when they completed different parts of the same test



disadvantages of IRT

complex to understand


software is still in its infancy


requires large samples to get stable estimates


requires more assumptions than classical test theory

why is the accuracy of diagnostic tests often overestimated?

people forget to consider the base rate (likelihood of the disease occurring in the population)
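
Bayes' rule makes the base-rate point concrete: with invented sensitivity and specificity figures, even a "95% accurate" test yields mostly false positives when the disease is rare.

```python
# A sketch of why base rates matter: the probability of actually having the
# disease given a positive result. Sensitivity/specificity values invented.
def positive_predictive_value(base_rate, sensitivity, specificity):
    true_pos = base_rate * sensitivity            # P(disease and positive)
    false_pos = (1 - base_rate) * (1 - specificity)  # P(healthy and positive)
    return true_pos / (true_pos + false_pos)

# A test with 95% sensitivity and specificity, disease base rate of 1%:
ppv = positive_predictive_value(0.01, 0.95, 0.95)
print(round(ppv, 2))  # 0.16: most positive results are false positives
```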

why do many diagnostic tests adopt a liberal response bias?

to maximise the number of correct diagnoses, it is better to minimise false negatives even if that results in an increase in false positives. But just because a test gives a high number of correct diagnoses doesn't mean it's any good at diagnosing (you have to consider false positives too).

ROC curve (receiver operating characteristic, developed in WWII)

a plot of correct positive rate (sensitivity) versus false positive rate (1 - specificity).

the pass mark on the ROC curve that gives the most accurate classification is the:

the point on the curve where the sum of sensitivity and specificity is highest.
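
Picking that pass mark amounts to maximising sensitivity + specificity across candidate cut-offs (the Youden index); a sketch with invented (sensitivity, specificity) pairs:

```python
# A sketch of choosing the ROC cut-off where sensitivity + specificity is
# highest. The (pass mark, sensitivity, specificity) rows are invented.
points = [
    (30, 0.99, 0.40),
    (40, 0.95, 0.60),
    (50, 0.85, 0.85),
    (60, 0.60, 0.95),
]

best = max(points, key=lambda p: p[1] + p[2])
print(best[0])  # 50: sensitivity + specificity = 1.70, the highest sum
```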

Thompson effect

a visual illusion: an image reduced in contrast looks like it is moving slower

cataracts affect 50% of people over the age of 75; how much more likely are they to crash?

2.5 times more

method of constant stimuli

give people many trials, varying each scene on the dimension being tested (e.g., speed); show the scenes in a jumbled order and have people make judgements about scenes presented in pairs.

in psychometric functions what does a steep line tell us?

the steeper the line, the better people can tell the speeds apart.

what does the point where the line crosses the .5 mark tell us?

it tells us whether there is any systematic speed bias (the point of subjective equality).
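
The .5 crossing (point of subjective equality, PSE) can be read off a psychometric function by simple linear interpolation; a sketch with invented judgement data, assuming a 50 km/h standard speed:

```python
# A sketch of finding the PSE: the comparison speed at which the proportion
# of "comparison looks faster" judgements crosses 0.5. Data invented.
speeds = [40, 45, 50, 55, 60]              # comparison speed (km/h)
p_faster = [0.05, 0.20, 0.40, 0.70, 0.95]  # proportion judged "faster"

def pse(xs, ys, target=0.5):
    """Linearly interpolate the x where the curve crosses `target`."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target not crossed")

# A PSE above the (assumed) 50 km/h standard would indicate that the
# comparison must move faster than the standard to look equally fast.
print(round(pse(speeds, p_faster), 2))  # 51.67
```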

utility analysis, an example.

the new test had higher validity (.76) than the old non-test procedures (0-.5) and saved the company around 6 million dollars a year in 1979.

what has signal detection theory analysis found in regards to jury decision making?

that the sort of instructions that are given to jurors regarding the definition of 'reasonable doubt' affects their response bias (willingness to convict) rather than their sensitivity (ability to distinguish guilty from innocent defendants).

what is the main point of item characteristic curves?

to see how each individual item interacts with level of ability

computerised adaptive testing

select items to administer based on what the participant previously got right.

what can we do about the Thompson effect in terms of driving safety?

1. remove cataracts faster


2. increase contrast in road environment


3. give drivers perceptual training to improve their ability to tell different speeds apart