55 Cards in this Set
relevance |
the extent to which a test item contributes to the test's goal
based on qualitative judgment, including:
-content appropriateness
-taxonomic level (does it reflect the appropriate cognitive level?)
-extraneous abilities (does it require skills beyond the domain of interest?) |
|
item difficulty
|
p = total # passing item / total # of examinees
.50 is ideal for multiple choice; .75 is ideal for true/false tests |
|
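As a minimal sketch of the item-difficulty index above, assuming a hypothetical list of scored responses (1 = correct, 0 = incorrect):

```python
def item_difficulty(responses):
    """p = number of examinees passing the item / total number of examinees."""
    return sum(responses) / len(responses)

# Hypothetical responses to one item from ten examinees.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p = item_difficulty(responses)  # 7 of 10 passed, so p = .70
```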
item discrimination
|
extent to which a test item discriminates between high versus low scorers on the test or on the entire criterion
D = (% upper group answered correctly) - (% lower group answered correctly)
range: -1 to +1
D = .35 is acceptable
D = .50 is maximum discrimination |
|
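The D index can be sketched the same way; the group counts below are made up for illustration:

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = (% of upper group correct) - (% of lower group correct).
    Ranges from -1 to +1; negative values flag items low scorers do better on."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 9 of 10 high scorers and 4 of 10 low scorers answered correctly.
d = discrimination_index(9, 10, 4, 10)  # D = .50
```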
Item Response Theory
|
-item characteristics are considered to be the same across samples
-possible to equate scores on different tests because performance is reported in terms of the LATENT trait being measured (not just the test score)
-easier to develop computer-adaptive tests
-an ITEM CHARACTERISTIC CURVE is developed to determine the relationship between an examinee's ability and the probability he/she will respond correctly; the slope indicates the item's ability to discriminate between high and low achievers |
|
Classical Test Theory
|
obtained test score is the sum of true score and error
difficult to equate scores on different tests because the scales are different |
|
measurement error
|
random error / unsystematic
|
|
reliability
|
estimate of the proportion of variance in obtained scores accounted for by true differences on the attribute the test measures
range: 0 to +1
do NOT square the reliability coefficient - interpret it directly |
|
test-retest reliability
|
coefficient of stability
administer the same test at two points in time and compare scores; a measure of stability
source of error: factors occurring in the time between tests
best for tests designed to measure stable traits that are not affected by repeated measurement |
|
Alternate forms reliability
|
coefficient of equivalence
2 equivalent forms of a test are given to the same group and scores are compared
source of error: content sampling, plus the interaction between different examinees' knowledge and the content each form samples
best for stable traits not affected by repeated measurement
most rigorous form of reliability, but difficult to develop |
|
internal consistency
|
split-half reliability: compare odd/even items or first/second halves
underestimates reliability because each half is shorter than the full test
use the SPEARMAN-BROWN prophecy formula to estimate the reliability of the full-length test
Cronbach's COEFFICIENT ALPHA is the formula for determining inter-item consistency (the Kuder-Richardson formulas can be used when items are scored right/wrong)
source of error: content sampling; a heterogeneous content domain decreases coefficient alpha |
|
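The Spearman-Brown prophecy formula and coefficient alpha can be sketched as follows (the item data are hypothetical, and plain population variances are used throughout):

```python
def spearman_brown(r, n=2):
    """Estimated reliability when a test is lengthened by a factor of n
    (n = 2 projects a split-half coefficient to the full-length test)."""
    return n * r / (1 + (n - 1) * r)

def cronbach_alpha(items):
    """items: one list of scores per item, aligned across examinees.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k, n = len(items), len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))

r_full = spearman_brown(0.70)  # split-half r = .70 projects to about .82
```

With right/wrong (0/1) items this alpha calculation reduces to the Kuder-Richardson case noted in the card.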
Inter-rater reliability
|
kappa statistic or percent agreement
the kappa statistic accounts for the percent of agreement that occurs by chance
source of error: rater bias, rater lack of motivation, non-exhaustive or non-mutually-exclusive categories, consensual observer drift
remedy: provide training & emphasize the difference between observation and interpretation |
|
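A minimal sketch of Cohen's kappa for two raters, using made-up ratings:

```python
def cohens_kappa(rater_a, rater_b):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement).
    Chance agreement comes from each rater's marginal category proportions."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n)
        for c in set(rater_a) | set(rater_b)
    )
    return (observed - chance) / (1 - chance)

# Hypothetical yes/no ratings of four cases by two observers.
kappa = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])  # 0.5
```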
Factors that affect reliability
|
-test length: the longer the test, the higher the reliability (Spearman-Brown can be used to estimate reliability for a given number of items, but tends to OVERestimate)
-range of test scores: reliability is best when the range is unrestricted - heterogeneous examinees and item difficulty around .50 (or .75 for true/false)
-guessing: as the probability of guessing the correct answer increases, reliability decreases |
|
Standard error of measurement
|
SEmeas = SD * (square root of 1 - rxx)
used to build a confidence interval around an obtained score (95% = add & subtract 2 SEmeas; 99% = add & subtract 3 SEmeas)
the lower the SD and the higher the reliability, the smaller the SEmeas |
|
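The SEmeas formula and its confidence intervals, sketched with an assumed mean-100/SD-15 scale and rxx = .91:

```python
import math

def se_meas(sd, rxx):
    """SEmeas = SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - rxx)

def score_interval(score, sd, rxx, n_se=2):
    """n_se = 2 gives the ~95% interval, n_se = 3 the ~99% interval."""
    se = se_meas(sd, rxx)
    return (score - n_se * se, score + n_se * se)

low, high = score_interval(100, 15, 0.91)  # SEmeas = 15 * sqrt(.09) = 4.5 -> (91.0, 109.0)
```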
Content validity
|
when a test is used to obtain information about an examinee's familiarity with a content/behavior domain
-associated with achievement tests
-relies on the judgment of subject matter experts to determine whether the test is valid |
|
Construct Validity
|
when a test is used to measure a construct (hypothetical trait)
-e.g., intelligence, mechanical aptitude, self-esteem, neuroticism
-no single way to test; use multiple:
1) assess internal consistency to ensure only 1 construct is being measured
2) study group differences: does the test differentiate between people who are known to differ on the construct?
3) test whether scores change following manipulation of the construct (i.e., treatment, education)
4) assess convergent and discriminant validity
5) assess factorial validity
*most theory-laden of the validation methods* |
|
Multitrait - Multimethod Matrix
|
monotrait-monomethod: correlation between a measure and itself; useful for knowing whether the matrix is useful (should be high)
monotrait-heteromethod: if high, shows convergent validity because the same trait correlates highly across different measures
heterotrait-monomethod: if low, shows discriminant/divergent validity because a trait should not correlate with a different trait measured by the same method
heterotrait-heteromethod: should be low to show discriminant/divergent validity |
|
Factor Analysis
|
conducted to determine the minimum number of factors that account for the intercorrelations among variables
used to assess construct validity (convergent or divergent validity)
1) administer the target test along with other tests (of different constructs) to a group
2) correlate scores on each test with scores on every other test (R) - the pattern of correlations indicates how factors are extracted
3) convert the correlation matrix to a factor matrix to develop "factor loadings"
4) simplify by "rotating" for ease of interpretation
5) interpret and name the factors |
|
Factor Loadings
|
correlation coefficients that indicate the degree of association between each test and each factor
SQUARE a factor loading to get the proportion of variability in that variable accounted for by that factor |
|
Communality
|
the common variance (shared variability) in a test's scores that is accounted for by the factors
can only be calculated when the rotation is ORTHOGONAL
DO NOT SQUARE - INTERPRET DIRECTLY |
|
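The squaring rules for loadings and communality can be shown with made-up orthogonal loadings for one test on two factors:

```python
# Hypothetical orthogonal loadings of one test on two factors.
loadings = [0.60, 0.50]

# Square each loading to get the proportion of the test's variance
# accounted for by that factor.
explained = [l ** 2 for l in loadings]  # [0.36, 0.25]

# Communality = sum of squared loadings (orthogonal rotation only);
# interpret it directly, do not square it again.
communality = sum(explained)  # 0.61
```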
orthogonal
|
refers to a type of rotation in factor analysis
UNCORRELATED FACTORS
allows you to calculate communality |
|
oblique
|
refers to a type of rotation in factor analysis
CORRELATED FACTORS |
|
criterion-related validity
|
when test scores are used to draw conclusions about, or predict, standing/performance on another measure
the test is the predictor and the other measure (performance) is the criterion |
|
concurrent vs. predictive validity
|
both are forms of criterion-related validity
CONCURRENT: criterion data are collected around the same time as predictor data (good if the goal is to estimate current status)
PREDICTIVE: criterion is measured some time after the predictor (preferred if the goal is to predict future performance) |
|
criterion-related validity coefficient
|
rarely exceeds .60
coefficients as low as .2 to .3 can be acceptable
square the coefficient to determine shared variability |
|
standard error of estimate
|
similar to SEmeas, except it helps build a confidence interval around a PREDICTED CRITERION score (not an obtained score, as SEmeas does)
SEest = SD of criterion * (square root of 1 - rxy squared)
as with SEmeas, +/- 2 SEest gives the 95% confidence interval and +/- 3 SEest gives the 99% confidence interval
the smaller the criterion's standard deviation and the larger the validity coefficient, the smaller the SEest |
|
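A sketch of SEest with assumed values (criterion SD = 10, rxy = .60, predicted score = 50):

```python
import math

def se_est(sd_criterion, rxy):
    """SEest = SD of criterion * sqrt(1 - rxy**2)."""
    return sd_criterion * math.sqrt(1 - rxy ** 2)

se = se_est(10, 0.60)               # 10 * sqrt(1 - .36) = 8.0
ci_95 = (50 - 2 * se, 50 + 2 * se)  # ~95% interval around a predicted score of 50
```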
incremental validity
|
the increase in correct decisions that can be expected if the predictor is used as a decision-making tool
1) construct a scatterplot
2) set cutoff scores for the predictor and the criterion |
|
True Positive
|
predicted to be successful and meet cutoff for criterion (are successful in reality)
|
|
False Positive
|
predicted to be successful but do not meet criterion cutoff (i.e. not successful in reality)
|
|
True Negative
|
predicted to be unsuccessful and do not meet the criterion cutoff (are not successful in reality)
|
|
False Negative
|
predicted to be unsuccessful but meet the criterion cutoff (predicted unsuccessful but are successful in reality)
|
|
incremental validity formula
|
positive hit rate - base rate
positive hit rate = true positives / all predicted positives (true positives + false positives)
base rate = (true positives + false negatives) / total number of people - i.e., the proportion who would be successful if everyone were selected without the predictor
what counts as valid or invalid is a matter of judgment (there is no standard) |
|
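The formula above, as a sketch over the four scatterplot quadrants (the counts are hypothetical):

```python
def incremental_validity(tp, fp, fn, tn):
    """positive hit rate - base rate."""
    total = tp + fp + fn + tn
    positive_hit_rate = tp / (tp + fp)  # successes among those the predictor selects
    base_rate = (tp + fn) / total       # successes if everyone were selected
    return positive_hit_rate - base_rate

# Hypothetical validation sample of 100 people.
gain = incremental_validity(tp=30, fp=10, fn=20, tn=40)  # .75 - .50 = .25
```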
sensitivity
|
correct identification of TRUE POSITIVES
|
|
specificity
|
the correct identification of TRUE NEGATIVES
|
|
positive predictive value
|
percent of the validation sample identified by the predictor as having the disorder who actually have it
|
|
negative predictive value
|
percent of the validation sample identified by the predictor as not having the disorder who actually do not have it
|
|
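The four indices in the preceding cards, computed from the same outcome counts (hypothetical numbers):

```python
def classification_stats(tp, fp, fn, tn):
    """Decision-accuracy indices from the four outcome counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "specificity": tn / (tn + fp),  # true negatives among all actual negatives
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

stats = classification_stats(tp=40, fp=10, fn=20, tn=30)
# sensitivity = 40/60, specificity = 30/40, ppv = 40/50, npv = 30/50
```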
relationship between reliability and validity
|
a test's validity is always limited by its reliability
a predictor's validity coefficient cannot exceed the square root of its reliability coefficient
to increase the validity coefficient, you must increase predictor and criterion reliability |
|
Correction for attenuation
|
used to estimate what the validity would be if the reliability were 1.0
you need: 1) the predictor's current reliability 2) the criterion's current reliability 3) the criterion-related validity coefficient
tends to overestimate the actual validity that can be achieved |
|
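The correction can be sketched from the three quantities the card lists (the values here are assumed):

```python
import math

def correct_for_attenuation(rxy, rxx, ryy):
    """Estimated validity if both measures were perfectly reliable:
    corrected rxy = rxy / sqrt(rxx * ryy)."""
    return rxy / math.sqrt(rxx * ryy)

# Hypothetical: validity .40, predictor reliability .80, criterion reliability .50.
corrected = correct_for_attenuation(0.40, 0.80, 0.50)  # .40 / sqrt(.40), about .63
```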
Criterion contamination
|
occurs when knowledge of predictor scores influences criterion ratings; tends to inflate the relationship between predictor and criterion
avoid by ensuring raters are not familiar with predictor scores |
|
cross-validation
|
"try out" your predictor with a different group
you will get "shrinkage" of the validity coefficient because chance factors specific to the original sample do not hold in the new one |
|
Norm-Referenced Interpretation
|
compare a test score to a normative sample
raw score is converted to another score, e.g., percentile ranks, standard scores
(-): interpretation relies on the extent to which an examinee matches the normative sample |
|
Percentile Ranks
|
express a raw score in terms of the percent of the normative sample who achieved lower scores
(+): easy to interpret
the distribution is always flat (rectangular) - a nonlinear transformation
(-): ordinal scale of measurement; do not reflect absolute differences between examinees; transforming scores to percentiles maximizes differences in the middle range and minimizes differences in the extreme ranges |
|
Standard Scores
|
norm-referenced interpretation
(+) permit comparisons of scores obtained on different tests |
|
z-score
|
z = (score - mean)/SD
mean = 0, standard deviation = 1
all scores below the mean are negative; all above the mean are positive
unless "normalized," z-scores retain the shape of the original distribution (the standard z-transformation is linear; normalizing is the nonlinear transformation) |
|
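The z-score transformation, sketched with an assumed mean-100/SD-15 scale:

```python
def z_score(score, mean, sd):
    """z = (score - mean) / SD; a linear transformation."""
    return (score - mean) / sd

z = z_score(130, 100, 15)  # 2.0: two standard deviations above the mean
```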
percentage score
|
criterion-referenced interpretation
indicates the percentage of the test that the examinee answered correctly
usually a cutoff score is set; often used for mastery evaluation when all examinees must meet a certain performance level
can also interpret a score on a predictor in terms of "likely scores on an external criterion" - create a regression equation |
|
Correction for guessing
|
involves calculating the number correct, the number incorrect, and the number of alternatives for each question
if the correction involves subtracting points from examinees' scores, the distribution will have a lower mean and a larger SD than the original distribution |
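One common correction formula (the card does not spell out the exact form, so treat it as an assumption) subtracts a fraction of the wrong answers:

```python
def corrected_score(n_correct, n_incorrect, n_alternatives):
    """score = right - wrong / (alternatives - 1); omitted items count as neither."""
    return n_correct - n_incorrect / (n_alternatives - 1)

# Hypothetical: 40 right, 12 wrong on a 4-option multiple-choice test.
score = corrected_score(40, 12, 4)  # 40 - 12/3 = 36.0
```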