What is RELEVANCE in Item Analysis? (This domain is related to the ethical use of tests.)

This describes the extent to which items contribute to the stated goals of testing

Think, what does it mean for something to be relevant?


The first dimension of Relevance is
CONTENT APPROPRIATENESS 
If an item is content-appropriate, it assesses the behavior domain the test is intended to evaluate

Think, Appropriate Behavior


2nd dimension of Relevance: TAXONOMIC LEVEL

Does the item reflect the appropriate cognitive or ability level of the population it's intended for?

Think, "The 3 Bears" Not, too hot, not too cold, its "just right"


3rd dimension of Relevance: EXTRANEOUS ABILITIES

To what extent are knowledge or skills needed that are outside the domain being evaluated?

Think about all the other things/thoughts that interfere with getting your reports done! Extraneous info intrudes!


ITEM DIFFICULTY

It's the proportion of people who get an item correct. p = 1 means all answered correctly; p = 0 means none did. So lower p values indicate a more difficult item. Items with p near .50 are typically retained to ensure a moderate difficulty level, except on true/false tests (.75).

When p = 0, no one got it right. When p = 1, everyone did.
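The p-value computation above can be sketched in a few lines of Python (the response data are made up for illustration):

```python
# Item difficulty (p): proportion of examinees who answer the item correctly.
# responses: 1 = correct, 0 = incorrect (illustrative data).
def item_difficulty(responses):
    return sum(responses) / len(responses)

responses = [1, 1, 0, 1, 0, 1, 1, 0]
print(item_difficulty(responses))  # 0.625 -> moderately easy item
```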


ITEM DISCRIMINATION

Extent to which an item differentiates between those who get a high vs. a low total score. D = proportion correct in H (highest scorers) minus proportion correct in L (lowest scorers). D of .35 or greater is acceptable.

Good section in the notes.
D = +1 means everyone in the upper group and no one in the lower group got the item right. D = -1 means no one in the upper group and everyone in the lower group got it right.
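The D computation can be sketched with illustrative upper- and lower-group responses (the data are made up):

```python
# Item discrimination (D): proportion correct in the high-scoring group
# minus proportion correct in the low-scoring group.
def item_discrimination(upper, lower):
    p_upper = sum(upper) / len(upper)
    p_lower = sum(lower) / len(lower)
    return p_upper - p_lower

upper = [1, 1, 1, 1, 0]   # top scorers' responses to the item (illustrative)
lower = [1, 0, 0, 0, 0]   # bottom scorers' responses (illustrative)
D = item_discrimination(upper, lower)
print(round(D, 2))  # 0.6 -> above the .35 cutoff, so the item is acceptable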

CLASSICAL TEST THEORY

Obtained scores reflect true score plus error; item and test parameters are sample-dependent. Issues considered: item difficulty, reliability, validity



ITEM RESPONSE THEORY (IRT)

Test interpretation is based on the examinee's level on the trait being measured rather than the total test score.

Which test theory uses examinee's performance on prior items to determine the administration of subsequent items? Remember related concept "Item Characteristic Curve".


"Item Characteristic Curve".

Plots the proportion of people who answered the item correctly against the total test score, an external criterion, or a derived estimate of ability.

Those at "0" are low ability; high ability are those above 0, and the steeper the slope, the better the discrimination.

RELIABILITY

The ability of a measure to provide consistent, dependable results.

Estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on what's being measured

RELIABILITY COEFFICIENT

Proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are never squared to interpret them.



TEST-RETEST RELIABILITY

Administering the same test to same group on 2 diff. occasions.

Appropriate method for determining reliability when attributes are relatively stable over time (e.g., aptitude vs. emotion)


ALTERNATE FORM RELIABILITY

2 EQUIVALENT FORMS are ADMINISTERED. The consistency of responding to different versions of a test administered at different times.

Think, (Form A/Form B)
Primary source of measurement error is content sampling. Hard to develop truly equiv. forms 

INTERNAL CONSISTENCY RELIABILITY: 2 types:
A. Split Half B. Coefficient Alpha 
Administer the test once to a single group; a coefficient of internal consistency is calculated



Split-Half

2 scores are derived by splitting the test into equal halves, which are then correlated. Often uses odd-even numbered items. Often an underestimate of true reliability; corrected by the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coefficient would have been for a full-length test.

EvenSteven
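The Spearman-Brown correction named above can be sketched in Python (the .70 split-half correlation is an illustrative value, not from the source):

```python
# Spearman-Brown prophecy formula: estimated reliability of a test
# lengthened by a factor n, given the current reliability r.
# n = 2 corresponds to correcting a half-length (split-half) estimate.
def spearman_brown(r, n=2):
    return (n * r) / (1 + (n - 1) * r)

# A split-half correlation of .70 corrected to full-length reliability:
print(round(spearman_brown(0.70), 3))  # 0.824
```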


Cronbach's Coeff. Alpha

A test is given once and a formula applied to determine reliability. The result is the average reliability obtained from all possible splits of the test.
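That "average of all possible splits" result can be computed directly with the standard alpha formula. A minimal Python sketch on made-up dichotomous data (where alpha reduces to KR-20):

```python
# Cronbach's alpha for a small item-response matrix
# (rows = examinees, columns = items; data are illustrative).
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(data):
    k = len(data[0])                                            # number of items
    item_vars = [variance([row[i] for row in data]) for i in range(k)]
    total_var = variance([sum(row) for row in data])            # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(round(cronbach_alpha(data), 3))  # 0.75
```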



Kuder-Richardson Formula

Variation of Cronbach's when items scored dichotomously (Yes/No)



Inter-Rater Reliability

Reliability determined by the percentage of agreement between 2 or more raters. Associate: Kappa statistic
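The Kappa statistic corrects raw percent agreement for chance agreement. A minimal sketch in Python (the rater judgments are made up for illustration):

```python
# Cohen's kappa for two raters' categorical judgments:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement).
def cohens_kappa(rater1, rater2):
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    po = sum(a == b for a, b in zip(rater1, rater2)) / n          # observed agreement
    pe = sum((rater1.count(c) / n) * (rater2.count(c) / n)       # chance agreement
             for c in categories)
    return (po - pe) / (1 - pe)

r1 = ["yes", "yes", "no", "yes", "no", "no"]   # rater 1 (illustrative)
r2 = ["yes", "no", "no", "yes", "no", "yes"]   # rater 2 (illustrative)
print(round(cohens_kappa(r1, r2), 3))  # 0.333
```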



Standard Error of Measurement
(SEM) 
An index of the amount of error that can be expected in a person's obtained scores due to the unreliability of the test. The greater the reliability, the smaller the SEM. Know the formula.

Example: The SEM around a WAIS score of 125 works like this: a 95% confidence interval is constructed around the score. Essentially, you are saying there is a 95% chance that the true score falls between Score X and Score Y. You derive this confidence interval by adding and subtracting 2 standard errors from the person's obtained score.
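The formula the card tells you to know is SEM = SD × √(1 − reliability). A sketch in Python, using a WAIS-like SD of 15; the reliability of .96 is an illustrative assumption, not from the source:

```python
import math

# Standard Error of Measurement: SEM = SD * sqrt(1 - reliability).
def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.96)          # SD = 15, reliability = .96 (assumed values)
score = 125
lo, hi = score - 2 * s, score + 2 * s   # ~95% CI: obtained score +/- 2 SEM
print(round(s, 1), round(lo, 1), round(hi, 1))  # 3.0 119.0 131.0
```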


Standard Error of Estimation (SEE)

Another form of SEM that takes regression to the mean into account. The SEE formula is used in the WAIS and WMS. Centers the confidence interval on the estimated true score vs. the observed score. It's a correction for true-score regression toward the mean.



Validity

Test Accuracy—Does it measure what its intended to measure




Content Validity

The extent a test adequately samples the content or behavior domain it is trying to measure

Think Content Validity when scores on the test are important because they provide information on how much the examinee knows about the content domain being tested (e.g., Achievement Tests).


What is the primary way that content validity is established?

Answer: The judgment of subject-matter experts. If experts agree the items are adequate and representative, the test is said to have content validity



What qualitative evidence do you look for in a test that has good content validity?

1) The coefficient of internal consistency will be large.
2) The test will correlate highly with other tests of the same domain.
3) Pre- and post-test evaluations of a program designed to increase familiarity with the domain will indicate appropriate changes.


Construct validity

When the test has been found to measure the trait or hypothetical construct that it is intended to measure

Mnemonic strategy: We hope that good constructions will converge and diverge at the appropriate junctions. Otherwise the building will fall down


In order to establish construct validity, what must occur?

Answer: there needs to have been a systematic accumulation of evidence showing that the test actually measures the construct it was designed to measure (like intelligence or self-esteem)



Convergent Validity

One method of evaluating a test's construct validity: one correlates test scores with scores on measures that do and don't purport to assess the same trait. High correlations with measures of the same trait provide evidence of convergent validity



Discriminant Validity

Another aspect of Construct Validity...when there are low correlations with measures of unrelated characteristics



Multitrait-Multimethod Matrix

A method of systematically organizing data to assess a test's convergent and discriminant validity. A matrix table is generated, comprised of correlation coefficients. There need to be two or more traits that have each been assessed using two or more methods.

See the notes to understand the various kinds of monomethod coefficients.
Hint: Mono means same and hetero means different. Helps to translate the names of the correlation coefficients contained in the matrix. 

Factor Analysis

A statistical analysis conducted to identify minimum number of common factors required to account for intercorrelations among a set of tests, subtests, or test items.



What method could you use to see if there is good construct validity and associated good convergent and discriminant validity?

Factor Analysis

If you're interested in the steps of factor analysis, take a look at the handout.


Question: How do you determine the meaning of a factor loading and the amount of variability in test scores that is explained by the factor?

Square the correlation coefficient (the factor loading) obtained in the factor analysis.



The correlation between a test and a factor is referred to as a what?

A Factor Loading



Communality

Indicates the total amount of variability in test scores that is explained by the identified common factors
(e.g., Factor I and Factor II). 


Specificity

The portion of true score variability that has not been explained by the factor analysis



Orthogonal Rotation

When a rotation is orthogonal, the resulting factors are uncorrelated. The attribute measured by one factor is independent from the attributes measured by the other factor.



Oblique Rotation

When the rotation is oblique, the resulting factors are correlated and the attributes measured by the factors are not independent.



CriterionRelated Validity

This is especially important whenever test scores are utilized to draw conclusions about an examinee's likely standing or performance on another measure.

Associate predictive and concurrent validity with criterion-related validity. There is a logical mnemonic: just think that criteria are used to predict and criteria are used to concur.


Which type of validity is key in a situation where the goal of testing is to predict how well an applicant will do on a measure of job performance after they are hired?

Answer: Criterion-related validity. When the resulting criterion-related validity coefficient is sufficiently large, this confirms that the predictor (or test) has criterion-related validity



Concurrent Validity
(associated with criterion related validity) 
When criterion data is collected prior to or at the same time as predictor data

Hint: What type of validity should the Mental Status exam have?


Predictive Validity

Criterion data is collected after predictor data. Preferred when purpose of testing is to predict future performance on the criterion.



How do you interpret criterionrelated validity coefficients?

You square the correlation coefficient to interpret it, but only when it represents the correlation between two different tests or other variables.



What is shared variability?

When the correlation between two measures is squared, it provides a measure of shared variability.

E.g., how much variability in X can be accounted for/explained by Y. Just remember: to find the answer, you need to square the correlation coefficient


Standard Error of Estimate (SEE)

Used to construct a confidence interval around a predicted or estimated criterion score. The magnitude of the SEE is affected by the standard deviation of the criterion scores and the predictor's criterion-related validity coefficient
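The two quantities named on this card combine as SEE = SD_criterion × √(1 − r²), where r is the criterion-related validity coefficient. A sketch with illustrative numbers (not from the source):

```python
import math

# Standard Error of Estimate: SEE = SD_criterion * sqrt(1 - r_xy**2),
# where r_xy is the criterion-related validity coefficient.
def see(sd_criterion, validity):
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Assumed values: criterion SD = 10, validity coefficient = .60.
print(round(see(10, 0.60), 1))  # 8.0
```

Note how the formula behaves at the extremes: with perfect validity (r = 1) the SEE is 0, and with no validity (r = 0) the SEE equals the criterion SD.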



Incremental Validity

The increase in correct decisions that can be expected if the predictor is used as a decision-making tool. Criterion and predictor cutoff scores must be set.



True Positives

Predicted to succeed by the predictor and are successful on the criterion



False positives

Predicted to succeed by the predictor and are not successful on the criterion



True Negatives

Predicted to be unsuccessful by predictor and are unsuccessful on the criterion



False Negatives

Predicted to be unsuccessful by the predictor and are successful on the criterion



Base rate

The proportion of people who were selected without use of the predictor and who are currently considered successful on the criterion

Formula:
(True Positives + False Negatives) / Total number of people

Positive Hit Rate

The True Positives divided by the Total Positives. The positives are people who are identified as having the disorder by the predictor; the negatives are people who are not identified as having the disorder by the predictor
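The four decision outcomes and the two rates from the cards above can be tallied in a short sketch (the counts are made up for illustration):

```python
# Decision outcomes from using a predictor with cutoff scores (illustrative counts):
tp, fp, tn, fn = 30, 10, 40, 20   # true/false positives, true/false negatives
total = tp + fp + tn + fn

# Base rate: proportion successful on the criterion without using the predictor.
base_rate = (tp + fn) / total
# Positive hit rate: true positives out of everyone the predictor flags positive.
positive_hit_rate = tp / (tp + fp)

print(base_rate, positive_hit_rate)  # 0.5 0.75
```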

