16 Cards in this Set

In factor analysis, a factor loading indicates the correlation between:
a test and an identified factor

A factor loading provides information about a test’s factorial validity.

In factor analysis, a factor loading is the correlation coefficient between a test and an identified factor.
Assuming no constraints in terms of time, money, or other resources, the best (most thorough) way to demonstrate that a test has adequate reliability is by using which of the following techniques?
equivalent forms

The most thorough method for assessing reliability is the one that takes into account the greatest number of potential sources of measurement error.

Because equivalent (alternate) forms reliability, when the two forms are administered on different occasions, takes into account error due to both time sampling and content sampling, it is the most thorough method for establishing reliability and, consequently, is considered by some experts to be the best method.
To maximize the inter-rater reliability of a behavioral observation scale, you should make sure that coding categories:
are mutually exclusive

When a person's behavior is to be observed and recorded, that behavior must be operationalized in order for the observations to be meaningful. For example, a psychologist interested in obtaining data about aggressiveness in children might record data using categories such as “hits others” or “destroys property.”

Coding categories must be discrete and mutually exclusive. For example, if the behavioral categories for aggressiveness were “aggressive acts” and “emotional displays,” a single behavior could fit both categories and be recorded twice, giving an unreliable picture of the child's behavior.
To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of:
.10

The item difficulty index ranges from 1.0 (which occurs when everyone in a sample answers the item correctly) to 0 (which occurs when no one answers the item correctly).

An item difficulty level of .10 indicates a difficult item (only 10% of examinees in the sample answered it correctly) and is the best answer of those given.

When an item's difficulty index is .90, this means that it is a very easy item – i.e., it was answered correctly by 90% of examinees in the sample.

An item difficulty level of .50 indicates an item of moderate difficulty (50% of examinees answered the item correctly).
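The item difficulty index is simply the proportion of examinees who answer an item correctly. A minimal Python sketch, assuming responses are coded 1 (correct) and 0 (incorrect); the function name and the three response patterns are hypothetical:

```python
def item_difficulty(responses: list[int]) -> float:
    """Return p, the proportion of examinees who answered the item correctly."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


# Hypothetical tryout sample of 10 examinees (1 = correct, 0 = incorrect)
items = {
    "difficult": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "moderate": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "easy": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
}

for label, responses in items.items():
    print(f"{label}: p = {item_difficulty(responses):.2f}")
```

The lower the value of p, the harder the item, which is why .10 marks the most difficult of the three.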
A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when:
the test consists of 80 items and the tryout sample consists of individuals who are heterogeneous in terms of knowledge of clinical psychology.

The reliability of a test is affected by several factors including the length of the test and the heterogeneity of the sample in terms of the abilities or other attributes measured by the test items.

All other things being equal, longer tests are more reliable than shorter tests. In addition, the reliability coefficient (like any other correlation coefficient) is larger when there is an unrestricted range of scores – i.e., when the tryout sample contains examinees who are heterogeneous with regard to the attribute(s) measured by the test.
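The effect of test length on reliability is commonly estimated with the Spearman-Brown prophecy formula, which this card does not name. The sketch below assumes that any added items are parallel to the existing ones; the reliability of .60 for a 40-item version is a hypothetical value.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by length_factor.

    Assumes the added items are parallel to the original items.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical: a 40-item version has reliability .60; lengthen it to 80 items
print(f"{spearman_brown(0.60, 80 / 40):.2f}")  # 0.75
```

Doubling the length raises the predicted coefficient from .60 to .75, consistent with the rule that longer tests are more reliable.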
Criterion contamination has which of the following effects?
It artificially increases the predictor's criterion-related validity coefficient.
Criterion contamination occurs when a rater's knowledge of a person's predictor performance biases how he/she rates the person on the criterion.
Criterion contamination has the effect of artificially inflating the correlation between the predictor and the criterion.
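A small simulation can illustrate the inflation. This is a sketch under assumed conditions: the true predictor-criterion correlation of about .30, the contamination weight of 0.8, the random seed, and the sample size are all hypothetical, and the Pearson correlation is computed by hand so the code needs only the standard library.

```python
import random


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


rng = random.Random(1)
n = 2000
predictor = [rng.gauss(0, 1) for _ in range(n)]
# True job performance: modestly related to the predictor (population r of about .30)
performance = [0.3 * p + (1 - 0.3 ** 2) ** 0.5 * rng.gauss(0, 1) for p in predictor]
# Contaminated ratings: the rater knows the predictor score and lets it shift the rating
contaminated = [perf + 0.8 * p for perf, p in zip(performance, predictor)]

print(f"uncontaminated r = {pearson_r(predictor, performance):.2f}")
print(f"contaminated r   = {pearson_r(predictor, contaminated):.2f}")
```

The contaminated ratings correlate much more strongly with the predictor even though actual performance is unchanged.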
Cronbach's alpha is an appropriate method for evaluating reliability when:
all test items are designed to measure the same underlying characteristic.

To answer this question, you need to know that Cronbach's alpha is another name for coefficient alpha and is used to assess internal consistency reliability.

Cronbach's alpha is an appropriate method for evaluating reliability when the test is expected to be internally consistent – i.e., when all test items measure the same or related characteristics.
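Coefficient alpha compares the sum of the item variances with the variance of total scores: alpha = k/(k - 1) × (1 - Σ item variances / total-score variance), where k is the number of items. A minimal sketch using population variances throughout; the function name and the two small score matrices are hypothetical, chosen to show the two extremes.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha; scores[i][j] is examinee i's score on item j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Items that rank examinees identically (they measure the same characteristic)
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
# Two items that are unrelated to each other
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]

print(round(cronbach_alpha(consistent), 2))
print(round(cronbach_alpha(unrelated), 2))
```

Alpha is high only when the items covary, which is why it suits tests whose items all measure the same characteristic.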
In a normal distribution, which of the following represents the lowest score?
T score of 25

To answer this question, you must be familiar with the relationship between percentile ranks, z-scores, T-scores, and IQ scores in a normal distribution.

A T score is a standardized score with a mean of 50 and a standard deviation of 10. Therefore, a T-score of 25 is two and one-half standard deviations below the mean and represents the lowest score of those given in the answers.

Wechsler IQ scores have a mean of 100 and standard deviation of 15. Therefore, an IQ score of 70 is two standard deviations below the mean.

A z-score of -1.0 is one standard deviation below the mean.

A percentile rank of 20 is slightly less than one standard deviation below the mean.
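The comparison amounts to converting every score to a z-score on the same scale. A sketch using the standard-library `statistics.NormalDist` for the percentile-rank conversion, assuming exactly normal distributions; the helper name `to_z` is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def to_z(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


candidates = {
    "T score of 25": to_z(25, 50, 10),
    "Wechsler IQ of 70": to_z(70, 100, 15),
    "z score of -1.0": -1.0,
    # Percentile rank 20 -> the z value below which 20% of the distribution falls
    "percentile rank of 20": STANDARD_NORMAL.inv_cdf(0.20),
}

# List the scores from lowest to highest
for label, z in sorted(candidates.items(), key=lambda item: item[1]):
    print(f"{label}: z = {z:.2f}")
```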
Which of the following techniques would be most useful for combining test scores when superior performance on one test can compensate for poor performance on another test?
multiple regression

This question is asking about the preferred technique for combining test scores when a high score on one measure can compensate for a low score on another measure.

Multiple regression is a compensatory technique for combining test scores. When using this technique, a low score on one test can be offset (compensated for) by a high score on another test.
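A sketch of the compensatory idea, using a hypothetical regression equation (intercept 10, weight 0.5 on each test) rather than weights estimated from real data. For contrast, it also includes a noncompensatory multiple-cutoff rule with a hypothetical cutoff of 50.

```python
def predicted_criterion(test_a: float, test_b: float) -> float:
    """Hypothetical regression equation: Y' = 10 + 0.5 * A + 0.5 * B."""
    # In practice the intercept and weights are estimated from a validation sample.
    return 10 + 0.5 * test_a + 0.5 * test_b


def passes_multiple_cutoff(test_a: float, test_b: float, cutoff: float = 50) -> bool:
    """Noncompensatory contrast: every score must reach the (hypothetical) cutoff."""
    return test_a >= cutoff and test_b >= cutoff


balanced = (60, 60)   # average performance on both tests
lopsided = (90, 30)   # superior on test A, poor on test B

for label, (a, b) in [("balanced", balanced), ("lopsided", lopsided)]:
    print(label, predicted_criterion(a, b), passes_multiple_cutoff(a, b))
```

Both profiles receive the same predicted criterion score, because the high score on test A offsets the low score on test B; the multiple-cutoff rule rejects the lopsided profile.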
Incremental validity is a measure of:
decision-making accuracy.

Incremental validity refers to the increase (“increment”) in decision-making accuracy that results from the use of a new predictor (e.g., the increase in accurate hiring decisions).
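One common way to quantify this increment is to subtract the base rate (the proportion of successful employees when the new predictor is not used) from the positive hit rate (the proportion of successful employees among those selected with the predictor). A sketch with hypothetical counts and function names:

```python
def positive_hit_rate(true_positives: int, false_positives: int) -> float:
    """Proportion of people selected with the predictor who turn out to be successful."""
    return true_positives / (true_positives + false_positives)


def incremental_validity(base_rate: float, true_positives: int, false_positives: int) -> float:
    """Increase in the proportion of successful hires attributable to the predictor."""
    return positive_hit_rate(true_positives, false_positives) - base_rate


# Hypothetical: 50% of employees hired without the test succeed;
# of 100 applicants hired using the test, 70 succeed and 30 do not.
print(f"{incremental_validity(0.50, 70, 30):.2f}")  # 0.20
```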
Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speeded test?
alternate forms

Speeded tests consist of items easy enough that examinees answer nearly every item they attempt correctly, so an examinee's total score depends primarily on his/her speed of responding. Because of the nature of these tests, a measure of internal consistency based on a single administration will provide a spuriously high estimate of the test's reliability.

Alternate-forms reliability is an appropriate method for establishing the reliability of speeded tests.

Split-half reliability is a type of internal consistency reliability and is not appropriate for speeded tests.
Which of the following item difficulty (p) levels maximizes the differentiation of examinees into high- and low-performing groups:
.50

An item difficulty level (p) ranges in value from 0 to +1.0 with a value of 0 indicating a very difficult item and a value of +1.0 indicating a very easy item. A difficulty index of .50 indicates that 50% of examinees in the try-out sample answered the item correctly.

When p equals .50, this means that the item provides maximum differentiation between the upper- and lower-scoring examinees – i.e., a large proportion of examinees in the upper group answered the item correctly, while a small proportion of examinees in the lower group answered it correctly.
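The link between p and differentiation can be seen in the item discrimination index D (proportion correct in the upper group minus proportion correct in the lower group), a companion statistic this card does not name. Assuming the upper and lower groups are equal-sized halves of the sample, the largest D an item can reach is 2 × min(p, 1 - p), which peaks at p = .50. A sketch with hypothetical function names:

```python
def discrimination_index(upper_correct: list[int], lower_correct: list[int]) -> float:
    """D = proportion correct in the upper group minus proportion correct in the lower group."""
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower


def max_discrimination(p: float) -> float:
    """Largest possible D for difficulty p when the two groups are equal-sized halves."""
    return 2 * min(p, 1 - p)


for p in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"p = {p:.2f}: maximum D = {max_discrimination(p):.2f}")
```

At p = .50 every upper-group examinee can answer correctly while every lower-group examinee answers incorrectly, giving the maximum D of 1.0.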
In a normal distribution, a T score of ___ is equivalent to a percentile rank of 16.
40

To identify the correct answer to this question, you need to be familiar with the areas under the normal curve and know that T scores have a mean of 50 and a standard deviation of 10.

In a normal distribution, a percentile rank of 16 and a T score of 40 are both one standard deviation below the mean.
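The correspondence can be checked with the standard-library `statistics.NormalDist`, assuming T scores are exactly normally distributed with a mean of 50 and a standard deviation of 10:

```python
from statistics import NormalDist

t_scores = NormalDist(mu=50, sigma=10)  # T-score scale

# Percentage of the distribution falling below a T score of 40
percentile_of_t40 = 100 * t_scores.cdf(40)
# T score below which 16% of the distribution falls
t_at_percentile_16 = t_scores.inv_cdf(0.16)

print(f"T = 40 -> percentile rank {percentile_of_t40:.1f}")
print(f"percentile rank 16 -> T = {t_at_percentile_16:.1f}")
```

The exact area below one standard deviation under the mean is about 15.9%, which is conventionally rounded to 16.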
In a distribution of percentile ranks, the number of examinees receiving percentile ranks between 20 and 30 is:
equal to the number of examinees receiving percentile ranks between 50 and 60.

Knowing that a distribution of percentile ranks is flat (rectangular) would have helped you identify the correct answer to this question.

The flatness of a percentile rank distribution indicates that scores are evenly distributed throughout the full range of the distribution. In other words, at least theoretically, the same number of examinees fall at each percentile rank. Consequently, the same number of examinees obtain percentile ranks between the ranks of 20 and 30, 30 and 40, etc.
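A sketch showing that percentile ranks come out flat even when the raw scores are bell-shaped. It assumes midpoint percentile ranks, 100 × (position - 0.5) / n, and ignores ties for simplicity; the raw scores are simulated with a hypothetical seed, mean, and standard deviation.

```python
import random


def percentile_ranks(scores: list[float]) -> list[float]:
    """Midpoint percentile rank of each score (ties are not given special handling)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = 100 * (position - 0.5) / n
    return ranks


def count_between(values: list[float], low: float, high: float) -> int:
    """Number of values v with low <= v < high."""
    return sum(low <= v < high for v in values)


rng = random.Random(42)
raw = [rng.gauss(100, 15) for _ in range(1000)]  # bell-shaped raw scores
pr = percentile_ranks(raw)

print(count_between(pr, 20, 30), count_between(pr, 50, 60))
```

Each ten-point band of percentile ranks holds one tenth of the examinees, regardless of how the raw scores are distributed.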
When using criterion-referenced interpretation of scores obtained on a job knowledge test, you would most likely be interested in which of the following?
the total number of test items answered correctly by an examinee

As its name implies, criterion-referenced interpretation entails interpreting an examinee's score in terms of a criterion, or standard of performance.

One criterion that is used to interpret a person's test score is the total number of correct items. This criterion is probably most associated with "mastery testing." A person is believed to have mastered a content area when he/she obtains a predetermined minimum score on the test that is designed to assess knowledge of that area.
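A sketch of a criterion-referenced mastery decision: the examinee's total number correct is compared with a predetermined cutoff rather than with other examinees' scores. The answer key, responses, function names, and cutoff of 8 are all hypothetical.

```python
def total_correct(responses: list[str], answer_key: list[str]) -> int:
    """Number of items on which the examinee's response matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must be the same length")
    return sum(given == correct for given, correct in zip(responses, answer_key))


def has_mastered(responses: list[str], answer_key: list[str], cutoff: int) -> bool:
    """Criterion-referenced decision: compare the score with a fixed standard."""
    return total_correct(responses, answer_key) >= cutoff


answer_key = list("ABCDABCDAB")   # hypothetical 10-item job knowledge test
examinee = list("ABCDABCCAA")     # 8 of 10 correct
print(total_correct(examinee, answer_key), has_mastered(examinee, answer_key, cutoff=8))
```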