140 Cards in this Set
 Front
 Back
The earliest recorded use of procedures resembling psychological testing is:
A) United States, circa 1850 AD B) Rome, circa 200 BC C) China, circa 200 BC D) Incan Empire, circa 1400 AD 
China, circa 200 BC


The first intelligence test capable of measuring a general intelligence level was:
A) Binet-Simon B) Hermann Ebbinghaus C) Emil Kraepelin D) Weber-Fechner
Binet-Simon


What is the definition of the term “battery?”
A) a group of items that pertains to a single variable, arranged in order of difficulty or intensity B) a group of several tests or subtests that are administered at one time to one person C) a tool designed to elicit information about a person’s motivations, preferences, attitudes, interests, and opinions D) the numerical system used to rate or to report value on some measured dimension 
a group of several tests or subtests that are administered at one time to one person


Who created the first laboratory dedicated to research of a purely psychological nature?
A) James McKeen Cattell B) Emil Kraepelin C) Sir Francis Galton D) Wilhelm Wundt
Wilhelm Wundt


*According to Testing Standards, “the ultimate responsibility for appropriate test use and interpretation” is primarily assigned to:
A) test authors and developers B) test publishers C) test score interpreters D) test user 
test user


*The Woodworth Personal Data Sheet, the first personality test, was used to screen World War I recruits that might suffer from?
A) Dyslexia B) Attention Deficit Disorder C) Mental illness D) Fear of heights 
Mental illness


The 1905 _________ was a series of 30 tests or tasks, varied in content and difficulty, designed mostly to assess judgment and reasoning ability irrespective of school learning.
A) Ebbinghaus Completion Test B) Stanford-Binet Intelligence Scale C) Scholastic Aptitude Test D) Binet-Simon Scale
Binet-Simon Scale


The primary reason for test misuse lies in the insufficient
A) publication of tests B) knowledge of test users C) instruments available to test users D) use of the test manual by the examiner 
knowledge of test users


A psychological assessment differs from a psychological test in that
A) psychological testing is more complex B) psychological assessment is objective C) psychological assessment is longer and more unique D) psychological testing is very expensive 
psychological assessment is longer and more unique


A standardized or normative sample is:
A) a sample in which all the participants have at least one similar characteristic B) a sample taken after a test has been completed to analyze the results of the test C) a sample taken in order to gauge the performance of others who will later take the test D) a sample that produces results representing the normal curve 
a sample taken in order to gauge the performance of others who will later take the test


Which characteristic made the Minnesota Multiphasic Personality Inventory (MMPI) more successful than previous personality inventories?
A) many of its items had no obvious reference to psychopathological tendencies B) it included 116 statements to which the respondent answered simply “yes” or “no” C) it no longer used the empirical criterion keying technique D) it was more user-friendly
many of its items had no obvious reference to psychopathological tendencies


*Standardization of psychological tests refers to measurement based on ________ and ________.
A) normal curve & repetition of results B) normal curve & unbiased analysis C) normative samples & uniform procedure method D) normative samples & consistent subjects E) normal curve & absence of open-ended questions
normative samples & uniform procedure method


*A technique, based on correlation, for reducing a large number of variables to a small set of factors is:
A) scaling B) kurtosis C) factor analysis D) sampling 
factor analysis


*The Rorschach Test is a type of:
A) personality inventory B) neuropsychological test C) thematic apperceptual technique D) projective technique 
projective technique


What came into being following the realization that intelligence is not a unitary concept and that human abilities comprise a broad range of independent factors?
A) scholastic aptitude tests B) neuropsychological tests C) multiple aptitude batteries D) personality inventories 
multiple aptitude batteries


Who was responsible for promoting the field of eugenics, discovering the phenomena of correlation and regression, and pioneering the twin study method?
*A characteristic of the stanine scale is:
a. it is complex b. it is expensive c. it lacks precision d. it is not time efficient 
it lacks precision


Alternate forms, anchor tests, fixed reference groups and simultaneous norming are all types of:
a. nonlinear transformations b. equating procedures c. item response testing d. performance assessment 
equating procedures


Local norms are characterized by:
a. groups formed in terms of age, sex, ethnicity, or any other variable that may significantly impact test scores b. reference groups drawn from members of a specific, more narrowly defined population or institution c. groups of people who simply happen to be available at the time the test is being constructed d. subgroups formed after a test has been standardized and published, to expand the test’s applicability 
reference groups drawn from members of a specific, more narrowly defined population or institution


If a fifth grader scores at the eighth grade level in arithmetic, it means that:
a. the student’s score is significantly above the average for fifth graders in arithmetic b. the student has mastered eighth grade arithmetic c. the same as if a first grader scored at the fourth grade level, or if a ninth grader scored at the twelfth grade level d. the grade level standards are too low 
the student’s score is significantly above the average for fifth graders in arithmetic


Which of the following demonstrates the difference between percentiles and percentage scores?
a. percentiles reflect an individual’s number of correct responses, while percentage scores reflect the individual’s rank b. the frame of reference for percentiles is the content of the entire test, while the frame of reference for percentage scores is other people c. percentiles reflect an individual’s rank in reference to other people, while percentage scores reflect an individual’s performance in reference to the entire test d. percentiles represent qualitative data, while percentage scores represent quantitative data 
percentiles reflect an individual’s rank in reference to other people, while percentage scores reflect an individual’s performance in reference to the entire test
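The contrast between the two frames of reference can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical 50-item test and a made-up reference group of ten raw scores; the function names are illustrative, and the percentile rank here uses the simple "percent scoring below" definition (other definitions count half of the tied scores).

```python
def percentage_score(num_correct, num_items):
    """Percent of items answered correctly (frame of reference: the test itself)."""
    return 100.0 * num_correct / num_items


def percentile_rank(raw_score, reference_scores):
    """Percent of the reference group scoring below raw_score (frame of reference: other people)."""
    below = sum(1 for s in reference_scores if s < raw_score)
    return 100.0 * below / len(reference_scores)


# Hypothetical data: raw scores of ten people in a reference group.
reference_group = [20, 22, 25, 27, 28, 30, 31, 33, 35, 45]

pct_correct = percentage_score(30, 50)           # 30 of 50 items correct
pct_rank = percentile_rank(30, reference_group)  # standing relative to the group
```

The same raw score of 30 yields a percentage score of 60 but a percentile rank of 50 in this made-up group, which is why the two numbers should not be read interchangeably.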


*Which statement clearly distinguishes between three terms that are often used interchangeably?
a. “reference group” identifies a more specific group of test subjects than a “standardization sample” or a “normative sample” b. “standardization sample” is the first group to receive the test, whereas the “normative sample” is any group from which norms are gathered c. “reference group” may be a “standardization sample” but cannot be a “normative sample” d. “reference group” may be a “normative sample” but cannot be a “standardized sample” 
“standardization sample” is the first group to receive the test, whereas the “normative sample” is any group from which norms are gathered


The changing of the reference group standards for the College Board’s SAT score scale is called:
a. anchor testing b. variating c. equivocating d. recentering 
recentering


*The higher level of performance typically seen in the normative groups of newer versions of general intelligence tests compared to their older counterparts is known as the:
a. deviation from the mean b. Flynn effect c. Standard deviation d. Correlational effect 
Flynn effect


*The foremost requirement of the normative sample is:
a. to be sufficiently large to ensure stability of variables b. to be recent c. to have the demographic makeup of the nation’s population d. to be representative of the individuals to be tested
to be representative of the individuals to be tested


Behavioral sequences:
a. can be converted into nominal scales b. cannot be used normatively c. depend on an orderly progression from one state to another d. can only be based on chronological age 
depend on an orderly progression from one state to another


*Deviation IQs were first introduced in 1939 for use in the:
a. Otis-Lennon School Ability Test b. SAT c. Wechsler-Bellevue d. Kaufman Adolescent and Adult Intelligence Test
Wechsler-Bellevue


A ________ expresses the distance between a raw score and the mean of the reference group in terms of the standard deviation of the reference group.
a. T-score b. z-score c. percentile score d. grade-equivalent score
z-score
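As a quick illustration of the definition on this card, the sketch below computes the distance of a raw score from the reference-group mean in standard deviation units. The function name and the example numbers (mean 100, SD 15) are assumptions chosen for illustration.

```python
def z_score(raw, mean, sd):
    """Distance between a raw score and the group mean, in SD units."""
    return (raw - mean) / sd


# Hypothetical reference group with mean 100 and SD 15.
z_above = z_score(115, 100, 15)  # one SD above the mean
z_below = z_score(70, 100, 15)   # two SDs below the mean
```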


*If a person scores lower than any of the people in the normative sample, the problem is one of:
a. insufficient test ceiling b. the test was too easy for the individual c. overly large normative sample d. insufficient test floor 
insufficient test floor


*Which is used when a score distribution approximates but does not quite match the normal distribution?
a. linear transformation b. nonlinear transformation c. normalized standard scores d. stanines 
normalized standard scores


If a test taker reaches the test ceiling on a test, then:
a. the test taker is labeled a genius b. the test taker must retake the test c. the test is insufficient d. the test was wrongly scored 
the test is insufficient


*The Gesell Developmental Schedules and the Infant-Toddler Developmental Assessment have this in common:
a. they were both developed by Arnold Gesell b. both were tested and edited at Yale c. they both use ordinal scaling d. they derived from Piaget’s stages of development 
they both use ordinal scaling


*Reliability of scores _________ as the error component __________.
a. decreases; remains constant b. remains constant; increases c. increases; decreases d. decreases; decreases 
increases; decreases


What two things does reliability in measurement imply?
a. consistency and precision b. consistency and relatedness c. precision and relatedness d. consistency and validity 
consistency and precision


*Evidence of score reliability is __________ validity.
a. unrelated to b. sufficient for c. necessary and sufficient for d. necessary but not sufficient for 
necessary but not sufficient for


*Traits are considered ________ characteristics, while states are referred to as ________.
a. stable; enduring b. temporary; static c. stable; temporary d. temporary; short-lived
stable; temporary


The Spearman-Brown formula is related to the idea that:
a. a larger number of observations yields more reliable results b. reliable results do not rely on the number of observations c. a smaller set of observations is quicker to make d. true scores are derived from long tests 
a larger number of observations yields more reliable results


*The KR-20 or alpha coefficients are good indicators of _________ in a test.
a. homogeneity b. spiral-omnibus formats c. heterogeneity d. alternate forms
homogeneity


True scores are:
a. equivalent to the test taker’s observed score b. the observed score subtracted from the raw score c. hypothetical scores that would result from error-free measurement d. normative sample scores of the given distribution
hypothetical scores that would result from error-free measurement


The test-retest reliability coefficient tells us:
a. extent to which scores will fluctuate as a result of time sampling error b. extent to which scores will fluctuate as a result of scorer reliability c. reliability of the inter-item inconsistency and content heterogeneity combined d. reliability of the content heterogeneity by itself
extent to which scores will fluctuate as a result of time sampling error


Low reliability estimates suggest that:
a. the test is too short b. the test is too long c. not enough data from the normative sample was analyzed d. the test is not very trustworthy 
the test is not very trustworthy


*Theoretically, if an individual took the same test an infinite number of times, his/her mean score would represent his/her:
a. true score b. reliability coefficient c. error component d. observed score 
true score


Reliability and error component are:
a. not related at all b. positively related c. inversely related d. negatively related 
inversely related


_________ is used in determining the consistency of mental tests, that is, the repeatability of their results. It evaluates sources of error and the sizes of those errors.
a. Measurement error b. The reliability coefficient c. The true score d. Heterogeneity 
The reliability coefficient


The phrase “all other things being equal” should serve to:
a. alert the reader to the possibility that several other things do need to be considered besides the specific concept in question b. show that all the aspects of a certain concept are in fact equal c. alert the reader that there may be items in a particular concept that are not equal d. show the reader that there is no need to consider other aspects related to a concept because they are all the same
alert the reader to the possibility that several other things do need to be considered besides the specific concept in question


Three sources of measurement error with typical reliability include:
a. time sampling error, inconsistency, alternate form b. homogeneity, time sampling error, off balancing c. content sampling error, performance error, group diversity d. interscorer difficulties, time sampling error, content sampling error 
interscorer difficulties, time sampling error, content sampling error


If all the test score variance were true variance, score reliability would be:
a. 1.00 b. 1.00 c. 100 d. 100 
1.00


*What is one of the most frequently used formulas to calculate inter-item consistency?
a. Cronbach’s Alpha b. Pearson r formula c. Spearman-Brown formula d. Standard Error of Measurement formula
Cronbach’s Alpha
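For reference, coefficient alpha can be written as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is a minimal implementation under that textbook formula; the function name and the tiny data set are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha.

    item_scores: list of rows, one row per test taker, one column per item.
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    item_variances = [statistics.pvariance(col) for col in columns]
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three test takers whose three item scores move together perfectly.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)

# Hypothetical data with less agreement among items, which lowers alpha.
mixed = [[1, 3, 2], [2, 1, 3], [3, 2, 3], [3, 3, 1]]
alpha_mixed = cronbach_alpha(mixed)
```

When every item ranks the test takers identically, alpha reaches 1.0; as items agree less with one another, the item variances make up a larger share of the total variance and alpha drops.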


If an alternate form of a test is given shortly after taking the original form, there is likely to be:
a. heightened reliability b. test-retest reliability c. significant practice effects d. the Flynn effect
significant practice effects


*The standard error of measurement (SEM) of Test A is 3 and the SEM of Test B is 5. If you wanted to compare these scores, the standard error of the difference would be:
a. 34 b. √34 c. more than 34 d. √8 e. 15 
√34
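The answer follows from the usual formula for the standard error of the difference between two scores, SE_diff = sqrt(SEM_A^2 + SEM_B^2). A minimal sketch, with an illustrative function name:

```python
import math


def se_difference(sem_a, sem_b):
    """Standard error of the difference between two scores, from their SEMs."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


se_diff = se_difference(3, 5)  # sqrt(9 + 25) = sqrt(34), roughly 5.83
```

Note that the result is larger than either SEM alone, since the error in both scores contributes to the error of their difference.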


Item 1: 2 x 8 = ____ Item 2: 5 x 6 = ____ Item 3: 4 x 10 = ____ This problem set can be described as:
a. a low coefficient alpha b. a low inter-item consistency c. very heterogeneous d. very homogeneous
very homogeneous


The most appropriate measure used to estimate error for tests scored with a degree of subjectivity would be:
a. scorer reliability b. test-retest reliability c. alternate form reliability d. delayed alternate form reliability
scorer reliability


Which of the following allows for the evaluation of the interaction effects from different types of error sources?
a. heterogeneity theory b. internal conflict theory c. standard error theory d. generalizability theory 
generalizability theory


Which of the following is true of a reliability coefficient?
a. the higher the coefficient the better b. test users must have a coefficient of .85 or higher c. test users must have a coefficient of .65 or higher d. there is a minimum threshold for a reliability coefficient to be considered adequate for all purposes 
the higher the coefficient the better


Which of the following methods for estimating score reliability is prone to practice effects?
a. split half technique b. Cronbach’s alpha c. Alternate form reliability d. Interval methods 
Alternate form reliability


When a test is purposefully designed to include items that are diverse in terms of one or more dimensions, KR-20 and coefficient alpha will __________.
a. underestimate content sampling error b. overestimate content sampling error c. round content sampling error to the nearest tenth d. most accurately calculate content sampling error 
overestimate content sampling error


In test score data obtained from a large sample under standardized conditions, measurement error:
a. is eliminated and is no longer an issue b. is assumed to be distributed at random c. is more likely to influence scores in a positive direction than a negative one d. is the same across samples, regardless of their composition and testing circumstances 
is assumed to be distributed at random


*If the class’s scores on a reading comprehension test varied due to individual familiarity with some passages, the most useful procedure for estimating this error would be:
a. scorer reliability b. test-retest reliability c. alternate form reliability d. true score reliability
alternate form reliability


Which of the following best demonstrates the benefits of delayed alternate form reliability?
a. it eliminates the confounding variable of practice effects that are problematic with coefficient alpha b. it yields the same results, regardless of the duration between the two test administrations c. it estimates the effect that lengthening or shortening a test will have on the obtained reliability coefficient d. it provides a good method for estimating time and content sampling error with a single coefficient 
it provides a good method for estimating time and content sampling error with a single coefficient


Which of the following is a true statement about the evaluation of reliability data?
a. small differences in the magnitude of coefficients of different tests are greatly significant b. reliability estimates above 0.50 suggest that the scores derived from a test are trustworthy c. estimates of error may or may not generalize to groups of test takers other than the original sample d. if a test is comprised of subtests, reliability estimates for the total test are the same as those of each subtest
estimates of error may or may not generalize to groups of test takers other than the original sample


________ is a statistic that represents the standard deviation of the hypothetical distribution of scores that would result if a subject were to take a test an infinite number of times.
a. standard error of the mean b. standard error of measurement c. standard error of the variance d. standard error of the difference between two scores 
standard error of measurement
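The idea can be illustrated with a small simulation, assuming the classical model in which each observed score is the true score plus random, normally distributed error. The true score of 60, the SEM of 3, and the function name are hypothetical values chosen for the sketch; "an infinite number of times" is approximated with 10,000 simulated administrations.

```python
import random
import statistics


def simulate_observed_scores(true_score, sem, n, seed=0):
    """Observed score = true score + random error with SD equal to the SEM."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, sem) for _ in range(n)]


scores = simulate_observed_scores(true_score=60, sem=3, n=10_000)
mean_observed = statistics.mean(scores)  # settles near the true score
spread = statistics.pstdev(scores)       # settles near the SEM
```

Under these assumptions the mean of the repeated observed scores approaches the true score, and their standard deviation approaches the standard error of measurement.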


Score reliability is considered to be a necessary, but not sufficient, condition for:
a. validity b. accuracy c. recency d. significance 
validity


A major disadvantage of G theory is:
a. it is more comprehensive and thus less accurate b. it is overly used by the psychological testing population today c. it requires multiple observations from the same group d. it does not allow for the evaluation of interaction effects 
it requires multiple observations from the same group


_________ are hypotheticals and do not really exist.
a. True scores b. Observed scores c. Error score components d. Raw scores 
True scores


The IRT model emphasizes the ____________.
a. use for small scale testing b. use of the whole test for reliability of error measurement c. less precise responses by test takers d. use of the individual test items for reliability and error measurement 
use of the individual test items for reliability and error measurement


Which of the following formulas is based on the idea that larger samples yield more reliable results, and is applied to r_hh (the correlation between the two halves of a test) to obtain a reliability estimate for the full-length test in the split-half method?
a. Cronbach’s alpha b. Kuder-Richardson formula c. Spearman-Brown formula d. Standard error of measurement formula
Spearman-Brown formula
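In its general form the formula is r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened; for a split-half correlation, n = 2, giving 2 * r_hh / (1 + r_hh). A minimal sketch, with an illustrative function name and a hypothetical half-test correlation of 0.60:

```python
def spearman_brown(r, length_factor=2.0):
    """Estimated reliability after changing test length by length_factor."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)


r_hh = 0.60                        # hypothetical correlation between two half-tests
r_full = spearman_brown(r_hh)      # 1.2 / 1.6 = 0.75 for the full-length test
r_tripled = spearman_brown(r_hh, 3)
```

The estimate rises as the length factor grows, which is the sense in which a larger number of observations yields more reliable results.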


Time sampling error is the most likely to occur when measuring:
a. verbal ability b. personality traits c. personality states d. psychological constructs related to ability 
personality states


Which is true about Generalizability Theory?
a. it is often applied to developing new instruments b. it requires multiple observations of the same group c. it is not often used because it does not evaluate interaction effects of different error sources d. it is used to measure test reliability by comparing several tests 
it requires multiple observations of the same group


If the scoring of a test involves subjective judgment:
a. an estimate of time sampling error is essential b. the availability of alternate forms of the test is necessary c. test selection decisions must be made on a case by case basis d. scorer reliability must be taken into account 
scorer reliability must be taken into account


*With item response theory methods, reliability and measurement error are approached from the point of view of:
a. information function of individual test items b. the test as a whole c. the trait assessed by the test d. the standard error of measurement formula that creates a confidence interval 
information function of individual test items


*Delayed alternate form reliability coefficients can be used to evaluate ______ and ______ reliability.
a. inter-item; content b. content; time c. time; test-retest d. interscorer; inter-item
content; time


To avoid administering the same test twice or developing alternate forms, ________ reliability can be used to test content consistency.
a. delayed alternate form b. inter-rater c. split-half d. test-retest
split-half


Statistically significant differences may not necessarily be:
a. reliable b. representative of true scores c. psychologically significant d. valid 
psychologically significant


*When comparing a computer adaptive test (CAT) with a traditional test using item response theory:
a. the CAT is less reliable than a traditional test b. the CAT can be shorter than the traditional test while still remaining reliable c. a traditional test is more reliable and therefore more valid d. a traditional test is more reliable because it eliminates the error involved with technology 
the CAT can be shorter than the traditional test while still remaining reliable


What is the major complaint about the WISC-III as a revision of the WISC-R?
a. The changes were mostly cosmetic and did not reorganize the test theoretically. b. The changes departed too dramatically from the WISC-R making the WISC-III nearly unrecognizable. c. The changes caused the reliability of the test to decrease. d. There were actually no complaints about the WISC-III.
The changes were mostly cosmetic and did not reorganize the test theoretically.


Which of the following was not used to assess the reliability of the WIAT-II?
a. inter-rater reliability b. test-retest reliability c. parallel forms reliability d. internal consistency reliability
parallel forms reliability


*Small standard errors of measurement for the mathematics subtests of the WIAT-II imply:
a. smaller confidence intervals and lower reliability b. smaller confidence intervals and higher reliability c. larger confidence intervals and lower reliability d. larger confidence intervals and higher reliability 
smaller confidence intervals and higher reliability
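The link between reliability, SEM, and confidence interval width can be sketched with the standard formula SEM = SD * sqrt(1 - r). The numbers below (SD of 15, reliabilities of .96 and .75, an observed score of 100) are hypothetical and are not taken from the WIAT-II manual; the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Band of z SEMs around the observed score (z=1.96 for roughly 95%)."""
    return (observed - z * sem, observed + z * sem)


sem_high_r = standard_error_of_measurement(15, 0.96)  # about 3.0
sem_low_r = standard_error_of_measurement(15, 0.75)   # 7.5

narrow = confidence_interval(100, sem_high_r)
wide = confidence_interval(100, sem_low_r)
```

Higher reliability gives a smaller SEM, and a smaller SEM gives a narrower interval around the observed score.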


Luria’s main focus was to:
a. distinguish between the three blocks b. show the functions that can be divided into the three blocks c. show the integration and interdependence of the three blocks d. create a one-to-one mapping of the brain
show the integration and interdependence of the three blocks


How does the Luria model test children from different ethnicities?
a. Uses three blocks to map the brain b. Excludes tests of acquired knowledge c. Has many subtests d. Includes tests of acquired knowledge 
Excludes tests of acquired knowledge


*Which of the following KABC-II scales would be used to test a four-year-old who is deaf?
a. Knowledge b. Planning c. Learning d. CHC 
Learning


*What two types of tests can’t have their reliability calculated by split-half procedures?
a. spelling of sounds and punctuation b. punctuation and compatibility c. speeded tests and multiple-point scored items d. language and math tests
speeded tests and multiple-point scored items


Which measure on the WJ III Tests of Achievement requires the examinee, within a three-minute period, to read and comprehend simple sentences and then decide if the answer is true or false?
a. Reading Comprehension b. Letter-Word Identification c. Reading Fluency d. Passage Comprehension
Reading Fluency


The WIAT-II test Math Reasoning evaluates the ability to
a. use nonverbal reasoning skills to solve abstract visual problems b. solve single and multistep math word problems c. solve problems involving basic operations d. complete a worksheet-like set of math problems of increasing difficulty
solve single and multistep math word problems


*A distinctive feature of the Stanford-Binet 5th edition is the addition of
a. age-graded norms b. nonverbal routing test c. deviation IQ d. composite scores for each subtest
nonverbal routing test


The first edition/model of the Stanford-Binet to introduce the new Form L-M was the
a. SB 3rd Edition b. SB 4th Edition c. SB 5th Edition d. Revision of Terman’s Scale in 1937
SB 3rd Edition


The SB5 is the first intellectual battery to
a. use a deviation IQ b. use a routing method c. cover 5 cognitive factors d. use the point-scale format
cover 5 cognitive factors


*David Wechsler’s main focus in creating the Wechsler-Bellevue Intelligence Scale in 1939 was to
a. provide an extensive psychological test for adults entering the military during WW II. b. design an instrument that would evaluate a single characteristic. c. go beyond global IQ scores to interpret more specific aspects of an individual’s cognitive capabilities through the analysis of subtest scaled scores. d. provide an alternative battery for testing children ages 2-18.
go beyond global IQ scores to interpret more specific aspects of an individual’s cognitive capabilities through the analysis of subtest scaled scores.

Of the following, which is not one of the broad cognitive areas that tests and clusters of the Woodcock-Johnson III Cognitive Tests are grouped into?
a. cognitive efficiency b. thinking ability c. written expression d. verbal ability 
written expression


Auditory Processing in the Woodcock-Johnson III Cognitive Tests refers to:
a. measures of processing speed of auditory stimuli b. the ability to discriminate between similar sounding words c. the ability to analyze, synthesize, and discriminate auditory stimuli d. the ability to break a given whole word down into its phonemes 
the ability to analyze, synthesize, and discriminate auditory stimuli


Which of the following is not included in measures of auditory processing in the Woodcock-Johnson III?
a. the ability to process distorted speech sounds b. the time it takes to translate individual phonemes into whole words c. phonetic coding d. sound blending 
the time it takes to translate individual phonemes into whole words


Which of the following would be used to determine the reliability of speeded tests and tests with multiple-point scoring systems?
a. split-half procedures b. Spearman-Brown formula c. Rasch analysis d. Standard error of measurement formula
Rasch analysis


Mean score & SD for T

m=50
sd=10 

Mean score & SD for z

m=0
sd=1 

Mean score & SD for CEEB

m=500
sd=100 

Mean score & SD for IQ

m=100
sd=15 

Mean score & SD for subtest scaled scores

m=10
sd=3 
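The five scales on these cards are linear transformations of one another: new score = new mean + z * new SD. The sketch below collects the means and SDs from the cards into one lookup table. The dictionary, the function name, and the reading of "sub" as Wechsler-style subtest scaled scores are assumptions made for illustration.

```python
# (mean, SD) for each scale, as listed on the cards above.
SCALES = {
    "z": (0, 1),
    "T": (50, 10),
    "CEEB": (500, 100),
    "IQ": (100, 15),       # deviation IQ with SD 15
    "subtest": (10, 3),    # assumed to mean subtest scaled scores
}


def from_z(z, scale):
    """Linear transformation of a z score onto another standard score scale."""
    mean, sd = SCALES[scale]
    return mean + z * sd


# A score 1.5 SDs above the mean, expressed on every scale.
converted = {name: from_z(1.5, name) for name in SCALES}
```

For z = 1.5 this gives T = 65, CEEB = 650, IQ = 122.5, and a subtest scaled score of 14.5, all describing the same relative standing.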

*Any errors that occur when measuring a discrete variable are due to _____.
a. measurement error b. bias on the part of the administrator of test c. an incorrect sample size d. inaccurate counting 
inaccurate counting


*Which of the following is true of correlation?
a. sign of the coefficient indicates the degree of relationship between 2 variables b. high correlation implies a causal relationship between 2 variables c. high correlation allows us to make inferences about variables' shared variance d. the closer a correlation is to zero, the higher the degree of relationship between two variables 
high correlation allows us to make inferences about variables' shared variance


*A raw score by itself:
a. can convey meaning b. does not convey any meaning c. can be used to make inferences about a construct d. can give a percentile score 
does not convey any meaning


*If an 8th grader scores at the 6th grade level on a reading achievement test, what does this mean? The student:
a. scored lower on the test than everyone else in his/her 8th grade class b. scored within the same range as most of the 8th graders in his/her school c. got a score that matches the average performance of the 6th graders in the standardization sample d. got a score that matches the average performance of the 6th grade population that was tested that year
got a score that matches the average performance of the 6th graders in the standardization sample


*Michael's score on a test is 60. The standard error of measurement of the test is three points. From this information, one may conclude that chances are about
a. 1 in 2 that his true score is included by the range of scores from 54 to 66 b. 99 out of 100 that his true score is included by the range of scores from 54 to 66 c. 50 out of 100 that his true score is either 59, 60, or 61 d. 50-50 that the error is less than 5 points
i dunno...guess
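One way to reason about this card is to note that 54 to 66 spans two SEMs on either side of 60. The sketch below, assuming normally distributed measurement error (the usual textbook assumption), computes how much of that error distribution falls within a given range; the function name is illustrative.

```python
import math


def coverage(observed, sem, low, high):
    """Share of a normal error distribution centered on `observed` that lies in [low, high]."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - observed) / (sem * math.sqrt(2))))
    return cdf(high) - cdf(low)


p_two_sem = coverage(60, 3, 54, 66)  # +/- 2 SEM
p_one_sem = coverage(60, 3, 57, 63)  # +/- 1 SEM
```

Under that assumption, a band of two SEMs covers roughly 95 out of 100 cases and a band of one SEM roughly 68 out of 100.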


*In a normal distribution, a score is one standard deviation above the mean. What is its approximate percentile rank?
a. 50 b. 75 c. 84 d. 95 e. 99 
84
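The figure can be checked against the cumulative normal distribution. A minimal sketch using only the standard library, with an illustrative function name:

```python
import math


def normal_percentile(z):
    """Percent of a normal distribution falling below a given z score."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


pr_plus_one = normal_percentile(1.0)  # about 84.1
pr_mean = normal_percentile(0.0)      # 50.0
```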


*What percentage of z-scores fall between -3.0 and +3.0 standard deviations?
a. 50% b. 75% c. 95% d. 99% 
99%


*The mean of a test is 39. You get a 44 and find it is equivalent to a T score of 65. What is the standard deviation of the test?
a. 2 b. 4 c. 6 d. 10 e. 15 
4
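The working for this kind of item runs in two steps: convert the T score to a z score with z = (T - 50) / 10, then solve z = (X - M) / SD for SD. The sketch below applies those steps to the figures exactly as printed on the card; the function name is illustrative.

```python
def sd_from_t(raw, mean, t_score):
    """Recover the test SD from a raw score, the test mean, and the matching T score."""
    z = (t_score - 50) / 10
    return (raw - mean) / z


sd_estimate = sd_from_t(44, 39, 65)  # z = 1.5, so SD = 5 / 1.5
```

With the numbers as printed, the result is about 3.3, so 4 is the nearest of the listed options rather than an exact match.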


*Restriction of range results in _____ correlation coefficients.
a. no effect on b. higher c. lower d. higher weights for
lower


*On the Wechsler intelligence tests, at each age level, approximately 68% of those tested will have IQ scores between
a. 90 and 110 b. 85 and 115 c. 70 and 130 d. 55 and 145 
85 and 115
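Since deviation IQs have a mean of 100 and an SD of 15, the range 85 to 115 is one SD on either side of the mean. A short check with the standard library's `statistics.NormalDist` (Python 3.8 or later), assuming the scores are normally distributed:

```python
from statistics import NormalDist

# Deviation IQ scale: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

share_within_one_sd = iq.cdf(115) - iq.cdf(85)   # about 0.68
share_within_two_sd = iq.cdf(130) - iq.cdf(70)   # about 0.95
```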
