Psychological Test Construction

by ekrantz11, Mar. 2006

Subjects: NIPG

67 Cards in this Set

Front	Back	3rd side (hint)

What is RELEVANCE in Item Analysis? (This domain is related to the ethical use of tests.)	This describes the extent to which items contribute to the stated goals of testing	Think, what does it mean for something to be relevant?
The first dimension of Relevance is CONTENT APPROPRIATENESS	If the content is appropriate, the item assesses the behavior domain the test is intended to evaluate	Think, Appropriate Behavior
2nd dimension of Relevance: TAXONOMIC LEVEL	Does the item reflect the appropriate cognitive or ability level of the population it's intended for?	Think "The 3 Bears": not too hot, not too cold, it's "just right"
3rd dimension of Relevance: EXTRANEOUS ABILITIES	To what extent does the item require knowledge or skills outside the domain being evaluated?	Think about all the other things/thoughts that interfere with getting your reports done! Extraneous info. intrudes!
ITEM DIFFICULTY	The proportion (p) of people who get an item correct. p = 1 means all answered correctly; p = 0 means none did. So lower p values = more difficult items. Items with p near .50 are typically retained to ensure a moderate difficulty level, except on true/false items (.75)	When p = 0, no one got it right. When p = 1, all got it correct.
ITEM DISCRIMINATION	Extent to which an item differentiates between those who get a high vs. low total score. D = H (proportion of highest scorers answering correctly) minus L (proportion of lowest scorers answering correctly). D of .35 or greater is acceptable	Good section in the notes. D = +1 if all in the upper group and none in the lower group get it right. D = -1 if none in the upper group and all in the lower group get the question right.
CLASSICAL TEST THEORY	Obtained scores reflect true score and error; item and test parameters are sample dependent. Issues considered: item difficulty, reliability, validity
ITEM RESPONSE THEORY (IRT)	Tests are based on the examinee's level on the trait being measured rather than on the total test score.	Which test theory uses an examinee's performance on prior items to determine the administration of subsequent items? Remember the related concept "Item Characteristic Curve".
Item Characteristic Curve	A plot of the proportion of people who answered the item correctly against the total test score, an external criterion, or a derived estimate of ability	Those at "0" are low ability; those above 0 are high ability. The steeper the slope, the better the discrimination
RELIABILITY	The ability of a measure to provide consistent, dependable results.	Estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on what's being measured
RELIABILITY COEFFICIENT	Proportion of variability in obtained test scores that reflects true score variability. Reliability coefficients are never squared to interpret them
TEST-RETEST RELIABILITY	Administering the same test to the same group on 2 different occasions.	Appropriate method for determining reliability when attributes are relatively stable over time (e.g., aptitude vs. emotion)
ALTERNATE FORM RELIABILITY	Two equivalent forms are administered. Measures the consistency of responding to different versions of a test administered at different times.	Think Form A/Form B. The primary source of measurement error is content sampling. Hard to develop truly equivalent forms
INTERNAL CONSISTENCY RELIABILITY: 2 types: A. Split Half B. Coefficient Alpha	The test is administered once to a single group and a coefficient of internal consistency is calculated
Split-Half	2 scores are derived by splitting the test into equal halves, and these are then correlated. Often uses odd- vs. even-numbered items. Often an underestimate of true reliability. Corrected by the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coefficient would have been for a full-length test	Even-Steven
Cronbach's Coefficient Alpha	A test is given once and a formula is applied to determine reliability. The result is the average reliability obtained from all possible splits of the test.
Kuder-Richardson Formula	Variation of Cronbach's alpha used when items are scored dichotomously (Yes/No)
Inter-Rater Reliability	Reliability determined by the % of agreement between 2 or more raters. Associate: Kappa statistic
Standard Error of Measurement (SEM)	An index of the amount of error that can be expected in a person's obtained score due to the unreliability of the test. The greater the reliability, the smaller the SEM. Know the formula.	Example: the SEM around a WAIS score of 125 works something like this: a 95% confidence interval is constructed around the score. Essentially, you are saying there is a 95% chance that the true score falls between Score X and Score Y. You derive this confidence interval by adding and subtracting 2 standard errors from the person's obtained score.
Standard Error of Estimation (SEE)	Another form of SEM that takes into account regression to the mean. The SEE formula is used in the WAIS and WMS. It centers the confidence interval on the estimated true score rather than the observed score. It's a correction for true-score regression toward the mean.
Validity	Test accuracy: does it measure what it's intended to measure?
Content Validity	The extent a test adequately samples the content or behavior domain it is trying to measure	Think Content Validity when scores on the test are important because they provide information on how much the examinee knows about the content domain being tested (e.g., Achievement Tests).
What is the primary way that content validity is established?	Answer: The judgment of subject matter experts. If experts agree items are adequate and representative, then the test is said to have content validity
What qualitative evidence do you look for in a task that has good content validity?	1) The coefficient of internal consistency will be large 2) The test will correlate highly with other tests of the same domain 3) Pre- and posttest evaluations of a program designed to increase familiarity with the domain will indicate appropriate changes
Construct validity	When the test has been found to measure the trait or hypothetical construct that it is intended to measure	Mnemonic strategy: We hope that good constructions will converge and diverge at the appropriate junctions. Otherwise the building will fall down
In order to establish construct validity, what must occur?	Answer: there needs to have been a systematic accumulation of evidence showing that the test actually measures the construct it was designed to measure--like intelligence, self-esteem
Convergent Validity	One method of evaluating a test's construct validity. One correlates test scores with scores on measures that do and don't purport to assess the same trait. High correlations with measures of the same trait provide evidence of convergent validity
Discriminant Validity	Another aspect of Construct Validity...when there are low correlations with measures of unrelated characteristics
Multitrait-Multimethod Matrix	A method of systematically organizing data to assess a test's convergent and discriminant validity. A matrix (table) of correlation coefficients is generated. There need to be two or more traits that have each been assessed using two or	See the notes to understand the various kinds of monomethod coefficients. Hint: Mono means same and hetero means different. Helps to translate the names of the correlation coefficients contained in the matrix.
Factor Analysis	A statistical analysis conducted to identify the minimum number of common factors required to account for intercorrelations among a set of tests, subtests, or test items.
What method could you use to see if there is good construct validity and associated good convergent and discriminant validity?	Factor Analysis	If you're interested in the steps of factor analysis, take a look at the handout.
Question: How do you determine the meaning of a factor loading and the amount of variability in test scores that is explained by the factor?	Square the correlation coefficient (the factor loading) obtained in the factor analysis.
The correlation between a test and a factor is referred to as a what?	A Factor Loading
Communality	Indicates the total amount of variability in test scores that is explained by the identified factors that have been correlated (e.g., Factor I and Factor II).
Specificity	The portion of true score variability that has not been explained by the factor analysis
Orthogonal Rotation	When a rotation is orthogonal, the resulting factors are uncorrelated. The attribute measured by one factor is independent from the attributes measured by the other factor.
Oblique Rotation	When the rotation is oblique, the resulting factors are correlated and the attributes measured by the factors are not independent.
Criterion-Related Validity	Especially important whenever test scores are used to draw conclusions about an examinee's likely standing or performance on another measure.	Associate predictive and concurrent validity with criterion-related validity. There is a logical mnemonic: just think that criteria are used to predict and criteria are used to concur.
Which type of validity is key in a situation where the goal of testing is to predict how well an applicant will do on a measure of job performance after they are hired?	Answer: Criterion-related validity. When the resulting criterion-related validity coefficient is sufficiently large, this confirms that the predictor (or test) has criterion-related validity
Concurrent Validity (associated with criterion related validity)	When criterion data is collected prior to or at the same time as predictor data	Hint: What type of validity should the Mental Status exam have?
Predictive Validity	Criterion data is collected after predictor data. Preferred when purpose of testing is to predict future performance on the criterion.
How do you interpret criterion-related validity coefficients?	You square the correlation coefficient to interpret it only when it represents the correlation between two different tests or other variables.
What is shared variability?	When the correlation between two measures is squared, it provides a measure of shared variability.	E.g., how much variability in X can be accounted for/is explained by Y... just remember to find out the answer, you need to square the correlation coefficient
Standard Error of Estimate (SEE)	Used to construct a confidence interval around a predicted or estimated criterion score. The magnitude of the SEE is affected by the standard deviation of the criterion scores and the predictor's criterion-related validity coefficient
Incremental Validity	The increase in correct decisions that can be expected if the predictor is used as a decision-making tool. Criterion and predictor cutoff scores must be set.
True Positives	Predicted to succeed by the predictor and are successful on the criterion
False positives	Predicted to succeed by the predictor and are not successful on the criterion
True Negatives	Predicted to be unsuccessful by predictor and are unsuccessful on the criterion
False Negatives	Predicted to be unsuccessful by the predictor and are successful on the criterion
Base rate	The proportion of people who were selected without use of the predictor and who are currently considered successful on the criterion	Formula: (True Positives + False Negatives) / Total number of people
Positive Hit Rate	The True Positives divided by the Total Positives. The positives are people who are i.d. as having the disorder by the predictor. Negatives are people who are not i.d. as having the disorder by the predictor
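
Several of the cards above describe quantities that are easy to compute by hand. The following is a minimal Python sketch of those calculations, not part of the original card set. The function names and all example numbers are hypothetical. The SEM and Standard Error of Estimate formulas are the standard textbook forms (SD times the square root of 1 minus the reliability, and criterion SD times the square root of 1 minus the squared validity coefficient); the cards mention these formulas but do not state them, so treat them as an assumption here.

```python
from math import sqrt


def item_difficulty(responses):
    """p: proportion of examinees who answered the item correctly.

    `responses` is a list of 1 (correct) / 0 (incorrect) values.
    """
    return sum(responses) / len(responses)


def item_discrimination(upper_group, lower_group):
    """D = H - L: proportion correct among the highest scorers minus
    proportion correct among the lowest scorers."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)


def spearman_brown(split_half_r):
    """Estimate full-length reliability from a split-half correlation."""
    return 2 * split_half_r / (1 + split_half_r)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability) (textbook form, assumed)."""
    return sd * sqrt(1 - reliability)


def confidence_interval(score, standard_error, n_errors=2):
    """Interval of +/- n_errors standard errors around a score
    (2 standard errors is roughly a 95% interval)."""
    margin = n_errors * standard_error
    return (score - margin, score + margin)


def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD of criterion * sqrt(1 - validity**2) (textbook form, assumed)."""
    return sd_criterion * sqrt(1 - validity ** 2)


def shared_variability(r):
    """Squared correlation between two different measures."""
    return r ** 2


def base_rate(tp, fp, tn, fn):
    """(True Positives + False Negatives) / total number of people."""
    return (tp + fn) / (tp + fp + tn + fn)


def positive_hit_rate(tp, fp):
    """True Positives divided by total positives."""
    return tp / (tp + fp)


# Hypothetical example data, for illustration only.
print(item_difficulty([1, 1, 0, 1]))
print(item_discrimination([1, 1, 1, 0], [1, 0, 0, 0]))
print(round(spearman_brown(0.60), 2))
sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(confidence_interval(125, sem))
print(base_rate(tp=30, fp=10, tn=40, fn=20))
print(positive_hit_rate(tp=30, fp=10))
```

With the made-up numbers above, a split-half correlation of .60 corrects upward to about .75, which matches the card's point that split-half reliability tends to underestimate the reliability of the full-length test.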
