132 Cards in this Set

  • Front
  • Back
What two concepts does the definition of “measurement” consist of?
rules for assigning symbols to objects in order to:

1. Represent quantities of attributes numerically (scaling)
or
2. Define whether the objects fall in the same or different categories with respect to a given attribute (classification)
What is “measurement”?
the act of assigning numbers or symbols to characteristics of things according to a set of rules indicating:

-how numbers will be assigned to attributes
or
-whether an object belongs to one category or another
In psychology, we are usually measuring _______________.
-some attribute of people (or sometimes animals)
What is psychometrics?
the science of psychological measurement
What is the name of “the science of psychological measurement”?
-psychometrics
What is psychometric soundness?
-how consistently and how accurately a psychological test measures what it purports to measure
What are the three defining characteristics of a psychological test? (full answer)
1. A psychological test is a sample of behaviour
2. The sample is obtained under standardized conditions
3. There are established rules for scoring or for obtaining quantitative (numeric) information from behaviour sampling
What are the three defining characteristics of a psychological test? (quick version)
-sample of behaviour
-standardization
-scoring rules
The quality of a test is largely determined by the_______________________________.
-representativeness of the sample of behaviour

*sample of all the ways a person could demonstrate a certain behaviour – not every single way
A respondent’s behavior/response on a test is used to _____________________________. (2)
-measure some specific attribute
or
-predict some specific outcome
What are scoring rules?
- set of rules or procedures for describing in quantitative terms the subject’s behavior in response to the test
-must be comprehensive and well-defined so different examiners will assign similar/identical scores when scoring the same set of responses
What are the three general categories that most psychological tests fit into?
-tests of performance
-behaviour observations
-self-reports
What is a test of performance?
-one of the three general categories of psychological tests
-respondents are given some well-defined task that they try their best to perform successfully
-the test score is determined by the respondent’s success in completing each task
What are behaviour observations?
-one of the three general categories of psychological tests
-involve observing the subject’s behavior and responses in a particular context
-no single, well defined task (may not even know they’re being measured!)
What are self-reports?
-one of the three general categories of psychological tests
-participants are asked to report or describe their feelings, attitudes, beliefs, values, opinions, or physical or mental state
-Include a variety of surveys/questionnaires
Many personality inventories fall into which of the three general categories of psychological tests?
-self-report tests
What are some concerns over psychological tests?
-might have an undue impact on an individual (high stakes decisions)
---e.g. not getting into grad school based on GRE
-might change an individual’s decisions/actions
---e.g. fear of bad GRE score --> doesn’t apply to grad school
Test users and developers have a responsibility to make sure that decisions made with the aid of tests are _____ and _____.
-accurate and fair
What are the important sources for test ethics?
1. Ethical Principles of Psychologists and Code of Conduct (American Psychological Association)
2. Standards for Educational and Psychological Testing (APA)
--is a book, specific to testing
--good test instruction
What are some relevant principles from the “Ethical Principles of Psychologists and Code of Conduct” (APA)?
-Boundaries of competence
-Maintaining confidentiality
-Use of assessments
The Standards for Educational and Psychological Testing (APA) discusses the appropriate technical and professional standards to be followed in what four areas?
-construction
-evaluation
-interpretation
-application of psychological tests
What is a scale?
-a set of numbers whose properties model the properties of the objects to which the numbers are assigned
What are the four types of measurement scales?
-nominal
-ordinal
-interval
-ratio
What is a nominal scale?
-numbers are used to classify and identify
-numbers are substituted for names/labels
-actual numbers chosen are arbitrary
--e.g. 1 for male, 2 for female
What is an ordinal scale?
-assigns numbers to individuals so that the rank order of the numbers corresponds to the rank order of the individuals in terms of the attribute being measured
-implies nothing about how much greater one ranking is than another
-no zero point
-shows who ranks higher than whom, but not how far apart they are (e.g. 1st, 2nd, 3rd in a race)
Percentiles are a(n) ___________level scale
-ordinal
What is an interval scale?
-contains equal intervals between numbers
-therefore, can calculate averages
-zero point has no intrinsic meaning (e.g. zero degrees Celsius does not mean an absence of temperature)
Which type of scale (NOIR) is most common in psychology? Give example + caveat.
-interval
---e.g. IQ (0 doesn’t imply lack of intelligence)

-caveat: they may not BE interval scales, but that’s how we treat them
---e.g. is there an equal difference between happy 1 2 3 4 5 sad?
What is a ratio scale?
-Contains equal intervals between numbers
-Has a true zero point
-Can form ratios (e.g., 2 meters is twice as tall as 1 meter)
For which type(s) of scales (NOIR) can we calculate averages?
-interval and ratio
What is Central Tendency?
-where a distribution is centered
-Mean
-Mode
-Median
What are some advantages of calculating a mean?
-Mathematical manipulations
-Generally a more stable estimate of the central tendency of the population
What are some disadvantages of calculating a mean?
-susceptible to extreme scores
-value may not be a data point (i.e. no one actually got it as a score)
-need to assume interval scale properties
What is a mode?
-the most common score (the score obtained by the most people)
What can we do when there is more than one mode?
-average the scores (if they are adjacent)
-report two numbers, but maintain the distribution is uni-modal (if they are near each other)
-report bimodality
How do you calculate the median?
-odd n: middle score
-even n: average of the two middle scores
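A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical, chosen only to show how one extreme score pulls the mean but not the median.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]             # hypothetical test scores (even n)
with_extreme = [2, 3, 3, 5, 7, 100]      # same scores, one extreme value

print(mean(scores))        # arithmetic average of all scores
print(median(scores))      # even n: average of the two middle scores
print(multimode(scores))   # every most-common score, as a list

# The mean is susceptible to the extreme score; the median is not.
print(mean(with_extreme), median(with_extreme))
```

`multimode` returns more than one value when the distribution has more than one mode, which is the case where bimodality would be reported.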
What is the equation to calculate variance?
What does a large variance indicate?
-individual scores tend to differ substantially from the mean and therefore from each other
Why can variance be difficult to interpret?
because it is based on the squared deviations around the mean
Why is SD preferable to variance?
-it “undoes” the square of the deviation
-is a more interpretable measure of dispersion
What is standard deviation?
-the average amount the scores deviate from the mean
-the square root of the variance
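The two cards above can be sketched in code. This assumes variance is the sum of squared deviations divided by n (the population form); some courses divide by n - 1 instead, so treat the divisor as an assumption. The scores are hypothetical, and the printed rows mimic the kind of table often used for the hand calculation.

```python
import math

def variance(scores):
    """Mean squared deviation from the mean (dividing by n is an assumption)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, back in the original score units."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
m = sum(scores) / len(scores)

# One row per score: X, deviation from the mean, squared deviation
for x in scores:
    print(x, x - m, (x - m) ** 2)

print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```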
How can you use a table to help calculate standard deviation?
What is a z-score?
-tells you how many standard deviations above or below the mean a particular score is
-# of standard deviations away from the mean, and in what direction
What is the calculation for a z-score?
-z = (X - mean) / SD
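A minimal sketch of the z-score calculation, following the definition above (number of standard deviations from the mean, with sign giving direction). The mean of 100 and SD of 15 are illustrative assumptions, not values from this set.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(130, mean=100, sd=15))  # 2.0  (two SDs above the mean)
print(z_score(85, mean=100, sd=15))   # -1.0 (one SD below the mean)
```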
What does it mean to have a set of matched or paired data?
-where there are 2+ variables measured for the same person
---e.g. IQ and school performance
What is a Product Moment Correlation?
- rXY
- statistic that determines the degree of relation present between two variables
-calculated over n pairs of observations
-perfect correlation = all points in the scattergram fall on a straight line
What does the Pearson’s r score indicate?
+1 indicates a perfect direct relation between X and Y
-1 indicates a perfect indirect relation between X and Y
0.0 indicates no linear relation between the two
How do we calculate a product moment correlation? (Pearson’s r)
-With X and Y as Z scores (ZX and ZY, mean = 0, SD = 1)

rxy = ...
What is the computation (in words) of Pearson’s r?
The “average cross product of Z scores”
What is the symbol for Pearson’s r?
rXY
What is another name for product moment correlation?
-Pearson’s r
What unit is associated with correlations?
-none, because it is calculated using Z scores
Why is transforming scores useful?
-raw scores can be completely arbitrary (what does a score of 10 MEAN?)
-a transformation can take into account information not contained in the raw score itself
-can compare different scores
What is the calculation to transform a score from an old scale to a new scale?
asterisk = new
What is this equation for?
-calculation to transform a score from an old scale to a new scale
-asterisk = new
What is a t score?
-mean of 50 and SD of 10
-round up to a whole number
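The transformation cards above can be sketched as follows. This assumes the usual linear transformation (convert to a z-score on the old scale, then rescale to the new mean and SD); the function names and the example scale values are hypothetical. Rounding is left out so the arithmetic stays visible.

```python
def transform(x, old_mean, old_sd, new_mean, new_sd):
    """Re-express a score from an old scale on a new scale."""
    z = (x - old_mean) / old_sd      # position on the old scale, in SD units
    return z * new_sd + new_mean     # same position on the new scale

def t_score(x, old_mean, old_sd):
    """T score: the new scale has a mean of 50 and an SD of 10."""
    return transform(x, old_mean, old_sd, new_mean=50, new_sd=10)

print(t_score(130, old_mean=100, old_sd=15))  # 70.0 (two SDs above the mean)
```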
What does a percentile rank represent?
-the percentage of a comparison group that earned a raw score less than or equal to the score of that particular individual
-what percentage of the group scored equal or less than the score being compared
-Percentiles divide the data set into 100 equal parts
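A minimal sketch of a percentile rank, using the "less than or equal to" definition above. The comparison group is made up for illustration.

```python
def percentile_rank(score, group_scores):
    """Percentage of the comparison group scoring at or below `score`."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical norm group
print(percentile_rank(70, group))  # 40.0: four of the ten scores are <= 70
```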
What percentiles correspond to each whole SD in a normal distribution?
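-3 SD = 1st percentile
-2 SD = 2nd percentile
-1 SD = 16th percentile
0 SD (the mean) = 50th percentile
+1 SD = 84th percentile
+2 SD = 98th percentile
+3 SD = 99th percentile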
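As a check on the values above, the percentile for any z-score in a normal distribution can be computed from the normal cumulative distribution function. This sketch uses `math.erf` from the standard library; the function name is hypothetical. The exact values are roughly 0.1, 2.3, 15.9, 50, 84.1, 97.7 and 99.9, so the 1st and 99th percentiles on the card are rounded conventions for the extreme tails.

```python
import math

def normal_percentile(z):
    """Percentage of a normal distribution falling at or below z."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(z, round(normal_percentile(z), 1))
```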
Draw the empirical rule (with percents, SDs, and percentiles).
What is a “norm-based interpretation”?
-when a person’s test score is interpreted by comparing that score with the scores of several other people
-uses normative data (norms)
What is Normative Data?
-the scores of a standard population who have already taken the test
What does a norm-based score indicate?
-where an individual stands in comparison with the particular normative group that defines the set of standards.
What is the most common form of norm?
-percentile rank
-represents the percentage of the norm group that earned a raw score less than or equal to the score of that particular individual
What are age norms? Implications?
- relates a level of test performance to the age of people who have taken the test
--- requires a representative sample at each of several ages
-important because many psychological characteristics change over time; vocabulary, mathematical ability, moral reasoning, etc.
What is the difference between norm-referenced vs criterion-referenced evaluation?
-Norm referenced tests provide information about an individual’s performance relative to other people

-Criterion-referenced tests evaluate a person’s score with reference to a set standard (e.g., a driving test)
What are the 5 steps in test development?
-Test conceptualization
-Test construction
-Test tryout
-Item analysis
-Test revision
What are psychological traits?
-any distinguishable, relatively enduring way in which one individual varies from another
-the strength of the trait is based on observing a sample of behaviour (direct observation or self-report)
What is the first step in developing a test to measure a trait?
- define the trait or construct to be measured
What is the “domain” of a psychological trait?
-all possible behaviours that could conceivably be indicative of a particular construct
What is “domain sampling”?
-taking a sample of all possible behaviours that are indicative of a trait
How do we identify a construct when developing a test?
-come up with a list of what should be included and excluded from the construct
---e.g. want to measure risk-taking, but exclude drugs, risky sex, or criminal activity
What are the two broad types of items (and responses) on a test?
-selected response format

-constructed response
What is “selected response format”?
-multiple choice
-True/False
-forced choice
-rating scales
What is a “constructed response”?
-essay
-tell a story
-describe a picture
For item formatting, how do achievement/aptitude tests differ from trait/attitude tests?
-achievement/aptitude: must select the response that is keyed as correct

-strength of trait/attitude: select the response that best describes themselves (no right/wrong)
What are the three elements of a multiple choice question?
-A stem
-A correct response option
-Several incorrect distractors
What are some important considerations when writing multiple choice items?
-only one correct alternative
-answer options must match grammatically with stem
-answer options of similar length
-reasonable distractors
-include as much of the item as possible in the stem to avoid unnecessary repetition
What is scaling?
-the process of setting rules for assigning numbers in measurement
What is a Likert format item?
-point scale
-total score is computed as the sum of scores on individual items
-usually an equal number of positive/negatives
What were Likert’s original categories?
1 = strongly approve
2 = approve
3 = undecided
4 = disapprove
5 = strongly disapprove
What is the first step in the test construction process?
-to generate an item pool
What are some common issues that arise in item writing?
-item length
-vocabulary
-double-barreled
-double negatives
-ambiguity of meaning
What is a Thurstone scale?
-score is the mean of the scale values of the statements they check
---e.g. capital punishment scale
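A minimal sketch of Thurstone scoring as described above: the respondent's score is the mean of the scale values of the statements they check. The statement labels and scale values are invented for illustration and do not come from any real scale.

```python
def thurstone_score(scale_values, checked):
    """Mean scale value of the statements the respondent checked."""
    chosen = [scale_values[statement] for statement in checked]
    return sum(chosen) / len(chosen)

# Hypothetical scale values assigned to statements during scale construction
scale_values = {"A": 2.0, "B": 4.0, "C": 6.0, "D": 9.0}

print(thurstone_score(scale_values, checked=["A", "B", "C"]))  # 4.0
```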
What is a Guttman scale?
-items range from sequentially weaker to stronger expressions of the attitude, belief or feeling
-all respondents who agree with the stronger statements of the attitude will also agree with milder statements
-e.g. measuring “veganness”
What are paired comparisons?
-choose between two items/statements
-e.g. I like to arrange flowers vs solve computer software problems
What is cumulative scoring?
-the higher the score on the test, the higher the test taker is on the ability/trait/characteristic

-for each response made in a particular way (e.g., correct answer, answering “true”), the test taker earns cumulative credit
What is ipsative scoring?
-measures relative strength of characteristics within the same individual
-lack of item independence makes pure ipsative data inappropriate for many statistical analyses
-in pure ipsative scoring, the response choices are pitted against one another so that choosing one option, by definition, makes the respondent higher on one scale and lower on the other
---e.g. chocolate vs strawberry vs vanilla
What are two examples of the empirical method of test development?
-criterion-keyed approach
-contrasted groups approach
What is the contrasted groups approach of test development?
-example of empirical method
-focuses on looking for differences in the way two groups respond to the same item
-select the items that best distinguish between the groups
What is the criterion-keyed approach of test development?
-example of empirical method
-focuses on examining the correlation between each item and some criterion (e.g., performance in university)
-good items correlate highly with criterion
What are some important characteristics of the empirical method of test development?
-item content is only of secondary importance
-no theory guiding development: the test will predict, but not explain, behaviour
-VERY criterion specific (what predicts one criterion well may do a very poor job predicting another)
What is a product moment correlation looking for?
-the degree of relation present between two variables
How do you calculate Pearson’s r (in words)?
-transform all of the scores to Z scores
-multiply the Z scores for each person (e.g. midterm x final exam)
-add all of those up
-divide by n
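The four steps above can be sketched directly. The midterm and final exam scores are hypothetical, and the z-scores use an SD that divides by n, matching the "divide by n" step.

```python
import math

def z_scores(values):
    """Convert raw scores to z-scores (mean 0, SD 1; SD divides by n)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def pearson_r(xs, ys):
    """Average cross product of z-scores over n pairs of observations."""
    zx, zy = z_scores(xs), z_scores(ys)
    cross_products = [a * b for a, b in zip(zx, zy)]  # one per person
    return sum(cross_products) / len(xs)

midterm = [60, 70, 80, 90]  # hypothetical scores, one pair per person
final = [65, 70, 85, 80]

print(round(pearson_r(midterm, final), 3))
```

Because both variables are converted to z-scores first, the result has no units and always falls between -1 and +1.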
Draw an example of calculating Pearson’s r. (midterm and final exam scores)
What is “error score”?
-any deviation from the mean could be called error
What is “true score”?
-average observed value (over infinite # of measurements)
What is systematic measurement error?
-errors that are consistently in the same direction
What is random measurement error?
-errors that can be positive or negative
-average out to zero over an infinite number of measurements

**basis for classical measurement theory
What is the difference between systematic and random measurement error?
-systematic errors are consistently in the same direction
-random errors can be positive or negative (and average out to zero over an infinite number of measurements)
What is the basis for classical measurement theory?
-random measurement error
What do theories of test reliability do?
-estimate the effects of inconsistency on the accuracy of psychological measurement
What is the basic equation of Test Theory?
X = T + E
X: score obtained by an individual on a test
T: individual’s “true” score on that attribute
E: an error score associated with the measurement (features of the individual or the situation that can affect test scores but have nothing to do with the attribute being measured)
What are the two basic assumptions of Test Theory?
1. X = T + E
2. Error is random (can be positive or negative)
What are the implications of assuming error is random (Test Theory)? EQUATIONS.
ε(E) = 0
ε(X) = T
ρTE = 0
* ε = Expected Value, which is the average value in the long run (i.e., over an infinite number of measurements)
What are the implications of assuming error is random (Test Theory)?
-average error (over infinite # of measurements) = 0
-average observed value (over infinite # of measurements) = true score
-the correlation between true score and error is zero (because errors are random)
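A small simulation can illustrate the first two implications. This is only a sketch: it assumes errors are drawn from a normal distribution with mean 0 and SD 5, and a true score of 100, none of which comes from the cards. A large finite number of measurements stands in for "infinite".

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

true_score = 100                      # T: assumed for illustration
errors = [random.gauss(0, 5) for _ in range(100_000)]   # E: random error
observed = [true_score + e for e in errors]             # X = T + E

mean_error = sum(errors) / len(errors)
mean_observed = sum(observed) / len(observed)

print(mean_error)     # close to 0
print(mean_observed)  # close to the true score
```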
What is the equation to calculate a reliability coefficient?
ρXX = σ²T / σ²X (true score variance / observed score variance)
*once we start calculating, the ρ becomes an r (because we have a sample, not the population)
What is this equation for?
reliability coefficient
What does “reliability” indicate?
-the proportion of variance in tests scores that is due to or accounted by variability in TRUE scores
-a test with little measurement error = reliable
--- the X – T differences are small
Given observed score variance and reliability, how could you calculate the value of error variance?
-method 1: take the proportion of variance attributable to error (1 - reliability), then apply that % to the observed score variance
---e.g. 1 - .92 = .08 = 8%; 8% of 400 is 32
-method 2: plug the values into ρxx = true score variance / observed score variance, then subtract true score variance from observed score variance
---e.g. .92 = true score variance / 400, so true score variance = 368; 400 - 368 = 32
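Both routes in the card above give the same answer, which the following sketch checks with the card's own numbers (reliability .92, observed score variance 400). The function names are hypothetical.

```python
def error_variance(observed_variance, reliability):
    """Route 1: proportion of variance due to error times observed variance."""
    return (1 - reliability) * observed_variance

def true_score_variance(observed_variance, reliability):
    """Rearranged from reliability = true variance / observed variance."""
    return reliability * observed_variance

observed_var = 400
rxx = 0.92

route_1 = error_variance(observed_var, rxx)
route_2 = observed_var - true_score_variance(observed_var, rxx)

print(round(route_1, 6), round(route_2, 6))  # both about 32
```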
What is the definition of reliability according to the parallel test model?
-X’ is a randomly parallel test/item
-it is assumed that an infinite universe (domain, or population) of randomly parallel tests exists
-reliability is the “correlation between any two randomly parallel tests”
-high correlation = a high reliability
What are the three different equation/definitions for reliability?
How can we assess reliability?
-test-retest reliability
-parallel forms reliability
-split-half reliability
-internal consistency (alpha)
What is test-retest reliability an estimate of?
- stability
What are some problems with split-half reliability?
(1) Correlation is based on use of ½ the number of test items. Therefore, this is not the reliability of the test (correlation needs to be corrected to estimate reliability of full test)

(2) How do you decide to split the test? (e.g. Odd vs. even? First vs. second half? Random?)
---too many ways, each way will provide a different answer!
What is the solution to the problems with split-half reliability?
-use coefficient alpha
What is the equation to calculate Cronbach’s Alpha?
α = (K / (K - 1)) × (1 - Σσ²i / σ²x)
K = # of test items
σ²i = variance of test ITEM i
σ²x = variance of total score (sum on all items)
What is this equation, and what do the parts of it mean?
K = # of test items
i = variance of test ITEM
x = variance of total score (sum on all items)
When is Cronbach’s alpha applicable?
- as an indicator of the internal consistency reliability of unifactor tests, or homogeneous measures
---i.e. the items should be correlated and measure one thing
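A minimal sketch of coefficient alpha, assuming the standard form built from the three quantities defined in the cards above: K, the item variances, and the variance of the total score. The response data are hypothetical (five respondents, three items).

```python
def variance(values):
    """Mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    """responses: one list of K item scores per respondent."""
    k = len(responses[0])
    # Variance of each item across respondents
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Variance of the total score (sum on all items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]

print(round(cronbach_alpha(responses), 3))
```

The items here rise and fall together across respondents, so alpha comes out high; items that did not correlate would push the summed item variances closer to the total variance and alpha toward zero.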
What are the three important factors that affect the reliability of a test?
1. The mean inter-item correlation of the test (r)
2. The length of the test (K)
3. The variance of the true scores in the sample (σ²T)
What is the mean inter-item correlation of a test?
-r
-correlation between the items
-can be interpreted as the average single item reliability (the extent to which each item represents an observation of the same thing observed by the other items)
How can the mean inter-item correlation of a test be interpreted? Why?
-the average single item reliability

-because each item can be considered a parallel mini test, and one definition of reliability is the correlation between two parallel tests
Draw a graph/chart of mean inter-item correlation.
average of these cells = mean inter-item correlation
What is this equation and what do the symbols mean?
Standardized Alpha
K = # of test items
rij = average inter-correlation among test items
What is the equation for standardized alpha?
standardized α = (K × rij) / (1 + (K - 1) × rij)
K = # of test items
rij = average inter-correlation among test items
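A minimal sketch of standardized alpha, assuming the standard form that uses only the two quantities named above: the number of items K and the average inter-item correlation. The values 0.3, 10 and 20 are illustrative.

```python
def standardized_alpha(k, mean_r):
    """Alpha from the number of items and the mean inter-item correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Same average inter-item correlation, two test lengths
print(round(standardized_alpha(10, 0.3), 3))  # 0.811
print(round(standardized_alpha(20, 0.3), 3))  # 0.896
```

Holding the average correlation fixed, doubling the number of items raises the estimate, which is the test-length effect described in the next cards.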
How does the length of the test affect its reliability?
-remember: ε(X) = T
-each item on a test is like a mini-test
-the more items, the more the random errors cancel each other out (sometimes +, sometimes -)
-longer the better!
What is the Spearman Brown prophecy formula used for?
- can be used to predict the effect of lengthening a test (i.e., what will the new reliability estimate be once we make the test longer?)
What is the Spearman Brown prophecy formula?
new rxx = (n × old rxx) / (1 + (n - 1) × old rxx)
new rxx = predicted “new” reliability
old rxx = reliability of original test
n = ratio of new to old test length (new length / old length)
What equation is this? What does each part mean?
newrxx = predicted “new” reliability
oldrxx = reliability of original test
n = ratio of new to old test length of the form (new length/old length)
What is an important assumption of the Spearman Brown Prophecy Formula?
-items added are parallel to the original items (they measure the same thing)
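A minimal sketch of the Spearman-Brown prophecy formula in its standard form, using the three quantities defined above. The starting reliability of .70 is an illustrative assumption, and the prediction only holds if the added items are parallel to the originals.

```python
def spearman_brown(old_rxx, n):
    """Predicted reliability after changing test length by a factor of n."""
    return (n * old_rxx) / (1 + (n - 1) * old_rxx)

# Doubling a test whose reliability is .70 (n = new length / old length = 2)
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```

With n = 2 this is also the usual correction for a split-half correlation, which is based on only half the items.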
What is this equation?
- Spearman-Brown rearranged in order to determine how much longer a test needs to be to obtain a desired reliability
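A sketch of the rearranged form, assuming the standard algebraic rearrangement of Spearman-Brown solved for n. The reliabilities .70 and .90 are illustrative, and the function names are hypothetical. The last line feeds the answer back into the prophecy formula as a check.

```python
def spearman_brown(old_rxx, n):
    """Predicted reliability after changing test length by a factor of n."""
    return (n * old_rxx) / (1 + (n - 1) * old_rxx)

def length_ratio_needed(old_rxx, desired_rxx):
    """How many times longer the test must be to reach desired_rxx."""
    return (desired_rxx * (1 - old_rxx)) / (old_rxx * (1 - desired_rxx))

n = length_ratio_needed(old_rxx=0.70, desired_rxx=0.90)
print(round(n, 2))                        # about 3.86 times as long
print(round(spearman_brown(0.70, n), 2))  # 0.9, recovering the target
```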
How does the variance of the true scores in the sample affect the reliability of a test?
-increasing the variance of the true scores increases the ratio of true score variance to observed score variance (i.e., the reliability)
-broad range of samples = greater true score variance --> variance of observed scores will increase
-variance of errors should NOT increase (uncorrelated with true scores)
-difficult to develop a reliable measure of an attribute that people do not differ very much on
-same test might show different levels of reliability depending on the sample it’s used on
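A minimal sketch of why the sample matters. It assumes observed score variance is true score variance plus error variance (which follows from errors being uncorrelated with true scores) and holds error variance fixed at 32. The two true score variances are made-up values for a broad sample and a restricted one.

```python
def reliability(true_var, error_var):
    """Proportion of observed score variance due to true score variance."""
    return true_var / (true_var + error_var)

error_var = 32  # assumed constant across samples

print(round(reliability(368, error_var), 2))  # broad sample: 0.92
print(round(reliability(68, error_var), 2))   # restricted sample: 0.68
```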
What can tell us how much variability should be expected on the basis of measurement error?
-the Standard Error of Measurement