

24 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
Reliability Coefficient
Measure of how much obtained score is true ability
-Interpret directly (70% means 70% is true ability, 30% error)
-A good test should have at least 0.7 or higher
A measure of how much score is actual.....
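The direct interpretation on this card can be sketched as a quick calculation (the 0.70 value is the card's own example; everything else is illustrative):

```python
# Illustrative sketch: a reliability coefficient is read directly as the
# proportion of observed-score variance that is true-score variance.
reliability = 0.70  # hypothetical test; 0.7+ is the card's benchmark for "good"

true_pct = reliability * 100         # 70% of variance reflects true ability
error_pct = (1 - reliability) * 100  # 30% is error variance

print(f"True: {true_pct:.0f}%, Error: {error_pct:.0f}%")
```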
Classical Test Theory
Results are:
1. True Score (Ability)
-true variance
2. Some Error (Fatigue...)
-error variance
Two parts...
-Establish reliability first (Test can be reliable but not valid.)
Do you get the same results each time?
Does test measure what it is supposed to?
Validity cannot exceed...
the square root of reliability
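The ceiling on this card is a one-line calculation (the 0.81 reliability is an illustrative value, not from the card):

```python
# Sketch: a test's validity coefficient can never exceed the square root
# of its reliability coefficient (illustrative value).
import math

reliability = 0.81  # hypothetical
max_validity = math.sqrt(reliability)
print(f"Validity cannot exceed {max_validity:.2f}")
```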
Types of Reliability
1. Test-retest reliability (Coefficient of stability)
2. Alternate Forms (Considered the best but least used)
3. Internal Consistency (Compares test against itself)
Types of Internal Consistency Reliability
1. Split-Half (split test, problem is restricted range)
-can use Spearman-Brown Prophecy Formula to make it like 2 tests
2. Inter-Item Consistency (compare items on one test one against the other in a systematic way)
-can use Cronbach's Alpha (compare items on test individually against all others systematically) or Kuder Richardson Formula 20 (special version of Cronbach, use when you have true/false or yes/no dichotomous test items)
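The two formulas named on this card can be sketched directly; the toy data below is made up, and with 0/1 (dichotomous) items Cronbach's alpha reduces to KR-20:

```python
# Sketch of the Spearman-Brown prophecy formula and Cronbach's alpha (toy data).
from statistics import pvariance

def spearman_brown(r_half):
    """Step a split-half correlation up to estimated full-test reliability."""
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same test takers."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-person total scores
    item_var = sum(pvariance(i) for i in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Toy example: 3 items, 4 test takers; 0/1 scoring, so alpha here equals KR-20.
items = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [1, 0, 0, 0]]
print(spearman_brown(0.6))   # 0.75
print(cronbach_alpha(items))  # 0.75
```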
Kappa Coefficient
Inter rater reliability
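Inter-rater agreement via Cohen's kappa (the standard two-rater case) can be sketched like this, with made-up ratings:

```python
# Sketch: Cohen's kappa for two raters judging the same cases (toy data).
# Kappa corrects raw percent agreement for agreement expected by chance.
def cohen_kappa(rater1, rater2):
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

r1 = ["yes", "yes", "no", "no", "yes", "no"]
r2 = ["yes", "no", "no", "no", "yes", "no"]
print(cohen_kappa(r1, r2))  # ≈ 0.67
```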
Standard Error of Measurement
-Based on reliability coefficient
-Try to get an idea of what a person's true ability is
-Based on a person's single score but has properties of a normal curve
-the more reliable the test, the less the SE of measurement
based on another coefficient, testing issue
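The relationships on this card (SEM comes from the reliability coefficient, and a more reliable test has a smaller SEM) follow from the usual formula SEM = SD × √(1 − r); the SD and reliability below are illustrative:

```python
# Sketch: standard error of measurement from a test's SD and reliability
# (hypothetical values), plus a 95% band around one observed score.
import math

sd, reliability = 15, 0.91          # assumed IQ-style scale values
sem = sd * math.sqrt(1 - reliability)

score = 100
low, high = score - 1.96 * sem, score + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% band: {low:.1f} to {high:.1f}")
```

Raising `reliability` toward 1 shrinks `sem`, matching the card's last point.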
Standard Error of Mean
-How will sample represent population?
It is best to have ___________ items and _____________ test takers for a test to be most reliable.
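The sample-size half of the fill-in card above can be illustrated with the standard-error-of-the-mean formula SD/√n (toy values):

```python
# Sketch: the standard error of the mean shrinks as the number of test takers
# grows, so larger samples represent the population more stably (toy values).
import math

sd = 10  # hypothetical population standard deviation
ses = {n: sd / math.sqrt(n) for n in (25, 100, 400)}
for n, se in ses.items():
    print(f"n={n}: SE of mean = {se}")  # 2.0, 1.0, 0.5
```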
Content Validity
Based on expert judgement
-academic tests
Ex: the EPPP
Criterion-Related Validity
-look at relationship between predictor and outcome
-used most often in personnel psych (predicting job performance, etc)
-two types are predictive validity (who will become schizophrenic?, predicts future behavior) and concurrent validity (who is schizophrenic now?, test results NOW)
-two types
Construct Validity
Can not directly define
-Two types are convergent (compare new test with established test that measures same construct) and divergent (discriminant validity - you want your test to have nothing in common with another test of a different construct)
Can not define
-two types
Multitrait-Multimethod Matrix
If it's a single trait, will establish convergent validity - need a HIGH monotrait number to establish convergent validity
-If it's a heterogeneous trait, will need a LOW heterotrait number to establish divergent validity
single or heterogeneous traits/trait numbers/convergent or divergent validity
Face Validity
Does the test make sense to the people who are taking it?
Just what it says :)
Cross Validation
Administer the test instrument again on a new sample
-Shrinkage may occur (the validity coefficient will shrink slightly when you initially cross validate instruments)
Criterion-related Validity
-job performance in the future
Incremental Validity
Can we increase the number of correct decisions we are already making?
Three things to establish Incremental Validity
1. base rate - moderate (number of decisions you are already making correctly)
2. selection ratio - need low selection ratio (number of jobs available to number of applicants)
3. validity coefficient - high validity on predictor and criterion
Criterion-Referenced Scores
-Do not compare score to anyone else, just meeting a standard
Ex: Driver's license test
Norm-Referenced Scores
-Score is compared to other individuals
-Two types: percentile ranks (not used as much now) and standard scores (transformed scores that allow you to compare)
Exs: Grade Equivalents, z-scores
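The standard-score idea on this card can be sketched as a z-score conversion, with a percentile read off the normal curve (the norm-group mean and SD are illustrative):

```python
# Sketch: converting a raw score to a z-score, then to a percentile rank
# under a normal-curve assumption (illustrative norm-group values).
from statistics import NormalDist

mean, sd = 500, 100     # hypothetical norm group
raw = 650
z = (raw - mean) / sd   # 1.5 SDs above the norm-group mean
percentile = NormalDist().cdf(z) * 100
print(f"z = {z}, percentile ≈ {percentile:.0f}")
```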
Floor Effect
-bunch of test takers at bottom of test range
-need to have enough easy items
Ceiling Effect
-bunch of test takers at top of test range
-need to have enough difficult items to discriminate between best test takers