73 Cards in this Set
 Front
 Back
1) what does "p" stand for?
2) what is the formula for calculating "p"?
3) what is the range of "p"?
4) what do larger/smaller values mean?
5) many tests retain items with moderate difficulty levels. What is "p" at this level?
1) item difficulty index
2) total # examinees passing item / total # examinees
3) 0 to 1.0
4) larger = easier item; smaller = more difficult item
5) p is close to .50
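The formula on this card can be checked with a quick sketch (the pass/fail data below are made up for illustration):

```python
# Item difficulty index p, sketched with made-up pass/fail data (1 = pass, 0 = fail)
responses = [1, 1, 0, 1, 0, 1, 1, 0]

p = sum(responses) / len(responses)   # examinees passing / total examinees
print(p)  # 0.625 -> a fairly easy item (larger p = easier)
```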

when p is close to .50:
1) test score variability increases/decreases?
2) test reliability increases/decreases?
3) discrimination between examinees increases/decreases?
4) what is the distribution?
1) increases
2) increases
3) increases
4) normal

1) while most tests look for moderate difficulty levels (p = .50), what type of test prefers retaining easier items (higher p) to offset guessing?
2) what value of p is usually optimal for these tests?
1) true/false
2) .75 (halfway between the chance-level p of .50 and 1.0; most other tests look for moderate difficulty, p = .50)

1) what is item discrimination?
2) to measure item discrimination, you calculate __. What is the symbol for this index?
3) how is it calculated?
4) what is the range of this index?
5) an item with what index is generally considered acceptable?
1) extent to which a test item discriminates BETWEEN examinees
2) item discrimination index (D)
3) D = U − L, where U = % of examinees in the upper-scoring group who answered correctly and L = % of examinees in the lower-scoring group who answered correctly
4) −1.0 to +1.0
5) .35 or higher

1) What is "D"?
2) when D = +1.0, what does this mean?
3) when D = −1.0, what does this mean?
1) D = item discrimination index
2) all of the upper-scoring group got the item correct; none of the lower-scoring group did
3) all of the lower-scoring group got the item correct; none of the upper-scoring group did
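The D index from these cards, sketched with hypothetical group proportions (the .80 and .30 values are illustrative, not from the cards):

```python
# Item discrimination index D = U - L, with hypothetical group proportions
U = 0.80  # proportion of the upper-scoring group answering the item correctly
L = 0.30  # proportion of the lower-scoring group answering the item correctly

D = round(U - L, 2)
print(D)  # 0.5 -> above .35, so the item discriminates acceptably
```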

classical test theory vs. item response theory
1) which is sample invariant (same across different samples)?
2) which is sample dependent (varies from sample to sample)?
1) item response theory
2) classical test theory

1) Item response theory involves deriving __ for EACH item
2) what does it show?
1) item characteristic curve
2) the relationship between ability level AND the probability of answering the item correctly

On an item characteristic curve
1) what is on the x axis? y axis?
2) how do you determine difficulty level?
3) how do you determine the item's ability to discriminate btwn high and low achievers?
4) how do you determine the probability of guessing correctly?
1) x = ability level; y = probability of correct response
2) find where on the curve 50% got the correct response, then look for the corresponding ability level
3) slope of the curve
4) y-intercept (point at which the curve intercepts the vertical axis): the proportion of people with low ability who answered the item correctly

on an item characteristic curve, what does a steep slope indicate

the steeper the slope, the greater the item's ability to discriminate btwn high and low achievers


when using an achievement test developed on the basis of item response theory, an examinee's test score will indicate __

ability level


in classical test theory, an examinee's score is composed of 2 components: __ & __

true score and error


reliability provides an estimate of the proportion of variability in an examinee's score that is due to __

TRUE differences among examinees on attributes measured by test


reliability coefficient
1) range?
2) what does a low "r" mean?
3) what does a high "r" mean?
1) 0.0 to +1.0
2) r = 0 → all variability in scores is due to error
3) r = +1 → all variability reflects true score variability (perfectly reliable)

1) r with subscript "xx" stands for __
2) r with subscript "xy" stands for __
1) reliability coefficient
2) validity coefficient

a reliability coefficient of .84 indicates that
1) __% of variability in scores is due to TRUE score differences
2) __% is due to error
1) 84%
2) 16%

which method for estimating reliability is associated with:
1) degree of stability (consistency)?
2) coefficient of equivalence?
3) coefficient alpha?
1) test-retest
2) alternate forms
3) internal consistency

test-retest reliability
1) what is the primary source of measurement error?
2) it is inappropriate for determining the reliability of a test measuring what type of attribute?
1) time sampling factors
2) an attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)

which method for estimating reliability can be used for the following:
1) aptitude?
2) mood?
3) speeded test?
1) test-retest, alternate forms, internal consistency
2) internal consistency
3) alternate forms

alternate forms reliability
1) 2 primary sources of measurement error?
2) it is inappropriate for determining the reliability of a test measuring what type of attribute?
1) content sampling and time sampling factors
2) an attribute that is unstable over time, or is affected by repeated measurements (e.g., mood)

internal consistency reliability
1) methods for evaluating?
2) not appropriate for assessing the reliability of what type of test? It will produce a coefficient that is too high/low?
1) split-half and coefficient alpha
2) speeded; too high

1) split-half reliability assesses what type of reliability?
2) it usually under/over?estimates a test's true reliability
3) how is this corrected?
1) internal consistency
2) under
3) Spearman-Brown prophecy formula

As the length of a test decreases, the reliability decreases/increases?

decreases
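The Spearman-Brown prophecy formula from the neighboring cards makes this relationship concrete; a minimal sketch with illustrative values (r = .60 is made up):

```python
# Spearman-Brown prophecy formula: predicted reliability when a test's
# length changes by a factor n
def spearman_brown(r, n):
    """r = current reliability; n = new length / old length."""
    return (n * r) / (1 + (n - 1) * r)

print(round(spearman_brown(0.60, 2), 2))    # doubling the test: 0.75
print(round(spearman_brown(0.60, 0.5), 2))  # halving the test: 0.43
```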


1) Cronbach's coefficient alpha assesses what type of reliability?
2) it provides the lower/upper? boundary of a test's reliability
1) internal consistency
2) lower

1) KR-20 is used for what type of reliability?
2) it is a variation of what other method?
3) how does it differ?
1) internal consistency
2) coefficient alpha
3) KR-20 is used when items are scored dichotomously
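A minimal sketch of the KR-20 computation for dichotomously scored items, using a made-up response matrix (rows = examinees, columns = items):

```python
# KR-20 = (k / (k-1)) * (1 - sum(p_i * q_i) / variance of total scores)
def kr20(item_matrix):
    k = len(item_matrix[0])  # number of items
    n = len(item_matrix)     # number of examinees
    p = [sum(col) / n for col in zip(*item_matrix)]  # proportion passing each item
    pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # population variance of totals
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical data: 4 examinees, 3 dichotomously scored items
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```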

which method for evaluating internal consistency reliability is used when items are scored dichotomously

KR-20


coefficient alpha
1) as the test content becomes more heterogeneous, coefficient alpha increases/decreases?
1) decreases


what correlation coefficient is used with interrater reliability?

kappa statistic


for interrater reliability, percent agreement will provide an over/under? estimate of the test's reliability

overestimate


consensual observer drift will artificially inflate/deflate? interrater reliability

inflate


1) what is the most thorough method for estimating reliability?
2) which method is NOT appropriate for speeded tests?
1) alternate forms
2) internal consistency

what method is used to estimate the effects of lengthening and shortening a test on its reliability coefficient

Spearman-Brown


Spearman-Brown tends to over/under?estimate a test's true reliability

overestimate


when the range of scores is restricted, the reliability coefficient is high/low

low


is the reliability coefficient high or low when:
1) items have low difficulty (easy items)?
2) items have average difficulty?
3) items have high difficulty?
1) low (easy items reduce score variability)
2) high (p close to .50 maximizes variability and reliability)
3) low (difficult items reduce score variability)

to maximize reliability coefficient
1) increase/decrease test length?
2) increase/decrease range of scores?
3) increase/decrease heterogeneity among examinees?
4) increase/decrease the probability of guessing correctly?
5) p should be close to __
1) increase
2) increase
3) increase
4) decrease
5) .50 (average item difficulty)

a reliability coefficient of __ is considered acceptable

.80 or larger


1) what is the standard error of measurement?
2) what is the standard error of estimate?
1) used to construct a confidence interval around a measured (obtained) score
2) used to construct a confidence interval around a predicted (estimated) criterion score

what is the formula for
1) standard error of measurement?
2) standard error of estimate?
1) SEM = SD × √(1 − reliability coefficient) — note: the reliability coefficient is NOT squared
2) SEE = SD × √(1 − validity coefficient²)
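The two formulas can be checked directly with illustrative numbers (an IQ-style SD of 15, the .84 reliability from a nearby card, and a hypothetical validity coefficient of .60); the reliability coefficient is not squared in the SEM, while the validity coefficient is squared in the SEE:

```python
import math

SD, r_xx, r_xy = 15, 0.84, 0.60   # illustrative values

SEM = SD * math.sqrt(1 - r_xx)        # reliability NOT squared
SEE = SD * math.sqrt(1 - r_xy ** 2)   # validity IS squared
print(round(SEM, 2))  # 6.0
print(round(SEE, 2))  # 12.0
```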

obtained test scores tend to be inaccurate estimates of true scores
1) scores ABOVE the mean tend to over/under?estimate true scores
2) scores BELOW the mean tend to over/under?estimate true scores
1) over
2) under

when the standard error of measurement = __,
an examinee's obtained scores can be interpreted as her true score 
0


which of the following would be most appropriate for estimating reliability for anxiety?
1) test-retest
2) alternate forms
3) coefficient alpha
3) coefficient alpha (anxiety is unstable over time, ruling out test-retest and alternate forms)


what are the minimum and maximum values of the standard error of measurement

minimum = 0
maximum = SD of test scores

how do you establish content validity

judgment of subject matter experts


1) what are the types of construct validity?
2) what methods are used to assess them?
1) convergent and discriminant
2) multitrait-multimethod matrix AND factor analysis

multitrait-multimethod matrix
1) which coefficient provides evidence of convergent validity? Is the coefficient large/small?
2) which coefficient provides evidence of discriminant validity? Is the coefficient large/small?
1) large monotrait-heteromethod
2) small heterotrait-monomethod OR small heterotrait-heteromethod

what does a factor analysis assess?

construct validity (convergent and discriminant)


In a factor matrix, correlation coefficients (factor loadings) indicate the degree of association btwn __ and __

each test and each factor


a test has a factor loading of .78 for Factor I. This means that __% of variability in test scores is accounted for by Factor I.

61% (.78 squared)


what is communality

total amount of variability in test scores explained by identified factors


Communality for a test is .64
This means that __% of variability in scores is explained by a combination of identified factors 
64%
NOT squared, because it is already a squared value: communality IS the amount of shared variance.

according to factor analysis, a test's reliability consists of what two components

communality and specificity
communality = factors tests share in common; specificity = factors specific to the test (not measured by other tests)

a communality is a lower/upper? limit estimate of a test's reliability coefficient

lowerlimit


if a test has a communality of .64, the reliability coefficient will necessarily be __

.64 or larger


two types of rotations in a factor analysis
1) which one is associated with uncorrelated factors?
2) with correlated factors?
1) orthogonal
2) oblique

when factors are orthogonal/oblique?, a test's communality can be calculated from factor loadings

orthogonal


In factor analysis, when factors are orthogonal, how do you calculate communality?

communality = SUM of squared factor loadings
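The sum-of-squared-loadings rule sketched with hypothetical loadings for one test on two orthogonal factors (the .78 echoes the earlier card; the .10 is made up):

```python
# Communality under an orthogonal rotation = sum of squared factor loadings
loadings = [0.78, 0.10]   # hypothetical loadings on Factor I and Factor II

communality = sum(l ** 2 for l in loadings)
print(round(communality, 4))  # 0.6184
```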


when a criterion-related validity coefficient is large, what does this indicate?

the predictor has criterion-related validity


what are the forms of criterion-related validity?

concurrent and predictive


1) validity coefficients rarely exceed __
2) validity coefficients as low as __ might be acceptable
1) .60
2) .20 to .30

how do you evaluate a predictor's incremental validity

scatterplot


for a scatterplot used to assess incremental validity, what determines:
1) positive/negative?
2) true/false?
1) predictor
2) criterion

how do you calculate incremental validity? How is each component calculated?
incremental validity = positive hit rate − base rate
base rate = (true positives + false negatives) / total # of people
positive hit rate = true positives / total positives
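The calculation above, sketched with a hypothetical 2×2 selection table (all counts are made up):

```python
# Incremental validity from a hypothetical selection table
TP, FP, FN, TN = 30, 10, 20, 40   # true/false positives and negatives
total = TP + FP + FN + TN

base_rate = (TP + FN) / total        # successful people / everyone
positive_hit_rate = TP / (TP + FP)   # true positives / total positives
incremental_validity = positive_hit_rate - base_rate
print(incremental_validity)  # 0.25
```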

if incremental validity = .34
the test can be expected to increase the proportion of successful employees by __%
34


relationship btwn reliability and validity
1) what is the equation relating a predictor's reliability to its validity?
2) what is the equation relating BOTH predictor and criterion reliability to validity?
1) a predictor's criterion-related validity coefficient is less than or equal to (cannot exceed) the square root of its reliability coefficient
2) the predictor's validity coefficient is less than or equal to (cannot exceed) the square root of the predictor's reliability coefficient TIMES the criterion's reliability coefficient

If a predictor has a reliability coefficient of .81, its validity coefficient will necessarily be __ (exact number)

.90 or less
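Both ceilings from the preceding cards, sketched with the .81 reliability from this card and a hypothetical criterion reliability:

```python
import math

r_xx = 0.81   # predictor reliability (from the card)
r_yy = 0.64   # criterion reliability (made-up value)

ceiling_predictor_only = math.sqrt(r_xx)         # validity cannot exceed this
ceiling_with_criterion = math.sqrt(r_xx * r_yy)  # tighter bound using both
print(round(ceiling_predictor_only, 2))  # 0.9
print(round(ceiling_with_criterion, 2))  # 0.72
```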


1) what is the correction for attenuation formula used for?
2) does it tend to over/under?estimate the actual validity coefficient?
1) to estimate what a predictor's validity coefficient WOULD be if the predictor and/or criterion were perfectly reliable (reliability coefficients = 1.0)
2) overestimate

criterion contamination
1) tends to inflate the relationship btwn __
2) results in an artificially high __ coefficient
1) predictor and criterion
2) criterion-related validity

when cross-validating a predictor on another sample, the cross-validation coefficient tends to __

shrink


"shrinkage" refers to the shrinking of __ when __

the validity coefficient, when the predictor is cross-validated on a new sample


norm-referenced vs. criterion-referenced
Which are the following:
1) percentile ranks?
2) percentages?
3) regression equation?
4) z-score?
5) IQ score?
1) norm
2) criterion
3) criterion
4) norm
5) norm

distribution of percentile ranks has what kind of shape

flat (rectangular)
regardless of the shape of the raw score distribution 
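The flat shape can be demonstrated with a sketch: rank-based percentile ranks of any raw-score distribution (here, made-up normally distributed scores) spread examinees evenly across the 0–100 range.

```python
import random

random.seed(0)
raw = [random.gauss(100, 15) for _ in range(1000)]   # raw-score shape is irrelevant

# Assign each score its rank-based percentile
order = {score: i for i, score in enumerate(sorted(raw))}
percentile_ranks = [100 * (order[x] + 0.5) / len(raw) for x in raw]

# Each decile of percentile ranks holds exactly 10% of examinees -> flat histogram
decile_counts = [sum(1 for pr in percentile_ranks if d <= pr < d + 10)
                 for d in range(0, 100, 10)]
print(decile_counts)  # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
```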

what is the transformation called when:
1) the distribution of transformed scores DIFFERS in shape from the distribution of raw scores?
2) has the SAME shape?
3) example of the first?
1) nonlinear transformation
2) linear transformation
3) percentile ranks

when using a correction for guessing, the resulting distribution will have (compared to the original distribution)
1) lower/higher mean?
2) smaller/larger SD?
1) lower
2) larger