
29 Cards in this Set

  • Front
  • Back

Instrument

any type of measurement device (e.g. test, questionnaire, interview schedule, or personality scale).

Instrumentation

heading in formal research reports for the section where measurement devices are described.

Validity


BLUF: does an instrument measure the construct that you are trying to measure?



the extent to which an instrument measures what it is designed to measure and accurately performs the function(s) it is purported to perform.



Always a matter of how valid a test is, not whether it is valid--no test is perfectly valid, because:


1. all tests tap only a sample of the behavior underlying the constructs being measured.


2. some traits researchers want to measure are inherently difficult to measure.

Content validity


the appropriateness of an instrument's contents. Example: achievement tests--what questions and in what format should you ask to evaluate comprehension?



Researchers make judgments on content validity, and generally:


1. a broad sample of content is usually better than a narrow one.


2. important material should be emphasized.


3. questions should be written to measure the appropriate skills, such as knowledge of facts and definitions, application of definitions to new situations, drawing inferences, making critical appraisals, etc.

Face validity

whether an instrument appears to be valid on the face of it. Example: pilot training test--drawings of shapes moving in space seemingly less valid than drawings of airplanes moving in space.

Criterion

standard by which a test is being judged.

Predictive validity

the extent to which a test predicts the outcome it is supposed to predict at some future time. Example: SAT tests predicting college abilities/performance.

Concurrent validity

the extent to which a test reflects one's current state. Example: a mid-term exam reflecting current understanding in the class.

Validity coefficient

a correlation coefficient used to express validity. In principle it ranges from 0 to 1, but in practice coefficients above 0.6 are rarely seen, and 0.6 already counts as a strong correlation.



In general, a higher number (i.e. closer to 1) means a more valid test.



Construct validity

the type of validity that relies on subjective judgments and empirical data. Offers only indirect evidence regarding the validity of a measure.



To determine construct validity, researchers begin by hypothesizing about how the construct the instrument is designed to measure should affect or relate to other variables.

Construct

stands for a collection of related behaviors that are associated in a meaningful way.



Example: "Depression" is a construct that stands for lethargy, apathy, loss of appetite, etc.

Reliability

BLUF: how repeatable is the measure?



the consistency of results across repeated measurements.



Equation:


Reliability = (System Variance)/(System Variance + Error Variance)



Reliability is necessary, but not sufficient for validity. If you have to pick--go for VALIDITY first, then RELIABILITY second.
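The ratio in the card's equation can be sketched in Python; the variance values below are invented for illustration, not data from the text:

```python
# Sketch of the reliability ratio from the card:
# Reliability = System Variance / (System Variance + Error Variance)

def reliability(system_variance: float, error_variance: float) -> float:
    """Proportion of observed variance that is systematic (not error)."""
    return system_variance / (system_variance + error_variance)

# A measure whose error variance is a quarter of its system
# variance has reliability 0.8:
print(reliability(4.0, 1.0))  # 0.8

# With zero error variance, reliability is perfect:
print(reliability(4.0, 0.0))  # 1.0
```

Note that reliability falls as error variance grows, and reaches 1 only when there is no error at all, which is why a perfect coefficient is never seen in the field.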


Interobserver reliability

reliability or agreement between observers/testers.

Correlation coefficient

correlation between variables expressed as a number between -1 and +1, where 1 = a perfect positive correlation (in reliability and validity work, the values of interest fall between 0 and 1). Note: a perfect correlation is never found in the field.
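A Pearson correlation coefficient of this kind can be computed directly; a minimal sketch, with hypothetical score lists standing in for two sets of measurements:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from two administrations of the same test:
first_testing = [10, 12, 9, 15, 11]
second_testing = [11, 13, 9, 14, 12]
print(round(pearson_r(first_testing, second_testing), 3))  # 0.926
```

The same function serves for validity coefficients (test scores vs. a criterion) and reliability coefficients (e.g. scores across two observers or two testings); only the inputs change.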

Reliability coefficients

a correlation coefficient used to describe reliability; the term can be qualified to name the particular type of reliability.



Low reliability = below 0.7


Good reliability = 0.7 and above



Example: Interobserver reliability = reliability coefficient (correlation) between observers.

Two methods for estimating the reliability of a test:

1. Test-retest reliability


2. Parallel-forms reliability

Test-retest reliability


(aka Stability)

BLUF: researchers measure at *two different points in time*.



the extent to which the instrument gives the same score each time you use it (e.g. pre-test and post-test). Expect high correlation with stable tests. Note: the time elapsed between testings is a factor in stability.



Example: Personality test over time -- results may be less reliable if you wait a long time between testing.

Parallel-forms reliability

BLUF: test comes in two versions with same content, over *two points in time*.



when a test comes in two parallel (equivalent) forms that are designed to be interchangeable with each other (different items but same content).



Example: GRE tests (Version A and Version B)

Split-half reliability

BLUF: comparing test content to itself based on results, at *single point in time*.



the researcher administers a test but scores the items as though they constituted two separate tests, using an even-odd split.

Cronbach's alpha


(aka coefficient alpha)

BLUF: comparing test content to itself based on results, at *single point in time*.



measures internal consistency (the extent to which items in a test are homogeneous *WITHIN that test only*). Based on computing all possible split-halves in a single administration of a test.



higher alpha value (higher internal consistency) = more reliable test
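The standard alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), can be sketched with the stdlib; the item data below are hypothetical:

```python
from statistics import pvariance  # population variance

def cronbach_alpha(item_scores):
    """item_scores: rows = persons, columns = items.
    Higher alpha = more internally consistent (homogeneous) items."""
    k = len(item_scores[0])                      # number of items
    items = list(zip(*item_scores))              # column-wise item scores
    sum_item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical 3-item test, four examinees:
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(scores), 3))
```

When every item rises and falls together across examinees, the item variances are small relative to the total-score variance and alpha approaches 1.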

Norm-referenced tests (NRTs)

tests designed to facilitate a comparison of an individual's performance with that of a norm group (peer sample or population).



NRTs are intentionally built to be of medium difficulty.



Example: a percentile rank of 64 means that the examinee scored better than 64% of the individuals in the norm group.
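The percentile-rank example can be sketched as follows; the norm-group scores are invented for illustration:

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring strictly below the examinee."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# A norm group of 100 hypothetical scores (0..99); an examinee
# scoring 64 beats the 64 norm-group members who scored below 64:
norm_scores = list(range(100))
print(percentile_rank(64, norm_scores))  # 64.0
```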



Criterion-referenced tests (CRTs)

tests designed to measure the extent to which individual examinees have met performance standards (i.e. specific criteria).



When building CRTs, item difficulty typically is of little concern--expert judgment is used to determine the desired level of performance and how to test for it.



Example: PT test

Which type of test should be favored in research?

Answer depends on the research purpose. Two general guidelines:


1. If purpose is to describe specifically what examinees know and can do -- use CRTs.


2. If the purpose is to examine how a local group differs from a larger norm group -- use NRTs.

Achievement test

measures knowledge and skills individuals have acquired.



Example: Midterm exam

Aptitude test

designed to predict some *specific type* of achievement.



The most widely used aptitude tests typically have low to modest validity (validity coefficients between 0.20 and 0.60).



Example: SAT

Intelligence test

designed to predict achievement *in general*.



The most popular intelligence tests have low to moderate validity (coefficients of 0.20 to 0.60) and are:


1. culturally loaded -- measures skills acquired in some cultural milieu


2. measure knowledge and skills that can be acquired with instruction



Example: IQ test, Wonderlic test

Three basic approaches to reducing social desirability in participant responses:

1. administer personality measures anonymously


2. observe behavior unobtrusively (without the participant's awareness, to the extent that is ethically possible) and rate selected characteristics


3. use projective techniques (present loosely structured or ambiguous stimuli, such as ink blots)

Likert-type scales

scales with choices ranging from "strongly agree" to "strongly disagree". Objective in the sense that responses are restricted to fixed choices and each statement presents a clear position on a single topic.

Reverse scoring

mixing positively and negatively worded statements to reveal and reduce response bias (i.e. respondents marking answers without considering the individual items).
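A sketch of reverse scoring on a hypothetical 5-point Likert item, where a flipped response keeps all items pointing in the same scoring direction:

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a Likert response, so 'strongly agree' on a negatively
    worded item scores like 'strongly disagree' on a positive one."""
    return scale_max + scale_min - response

# Hypothetical 5-point responses; item 2 is negatively worded:
responses = [5, 1, 4]
reverse_worded = [False, True, False]
scored = [reverse_score(r) if rev else r
          for r, rev in zip(responses, reverse_worded)]
print(scored)  # [5, 5, 4]
```

A respondent who marks "strongly agree" on every item regardless of wording would produce inconsistent scores after reversal, which is how the mixed wording exposes that response bias.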