132 Cards in this Set
- Front
- Back
What two concepts does the definition of “measurement” consist of?
|
rules for assigning symbols to objects in order to:
1. Represent quantities of attributes numerically (scaling), or
2. Define whether the objects fall in the same or different categories with respect to a given attribute (classification) |
|
What is “measurement”?
|
the act of assigning numbers or symbols to characteristics of things according to a set of rules indicating:
-how numbers will be assigned to attributes, or
-whether an object belongs to one category or another |
|
In psychology, we are usually measuring _______________.
|
-some attribute of people (or sometimes animals)
|
|
What is psychometrics?
|
the science of psychological measurement
|
|
What is the name of “the science of psychological measurement”?
|
-psychometrics
|
|
What is psychometric soundness?
|
-how consistently and how accurately a psychological test measures what it purports to measure
|
|
What are the three defining characteristics of a psychological test? (full answer)
|
1. A psychological test is a sample of behaviour
2. The sample is obtained under standardized conditions
3. There are established rules for scoring or for obtaining quantitative (numeric) information from the behaviour sample |
|
What are the three defining characteristics of a psychological test? (quick version)
|
-sample of behaviour
-standardization -scoring rules |
|
The quality of a test is largely determined by the_______________________________.
|
-representativeness of the sample of behaviour
*sample of all the ways a person could demonstrate a certain behaviour – not every single way |
|
A respondent’s behavior/response on a test is used to _____________________________. (2)
|
-measure some specific attribute
or -predict some specific outcome |
|
What are scoring rules?
|
- set of rules or procedures for describing in quantitative terms the subject’s behavior in response to the test
-must be comprehensive and well-defined so different examiners will assign similar/identical scores when scoring the same set of responses |
|
What are the three general categories that most psychological tests fit into?
|
-tests of performance
-behaviour observations -self-reports |
|
What is a test of performance?
|
-one of the three general categories of psychological tests
-respondents are given some well-defined task that they try their best to perform successfully -the test score is determined by the respondent’s success in completing each task |
|
What are behaviour observations?
|
-one of the three general categories of psychological tests
-involve observing the subject’s behavior and responses in a particular context -no single, well defined task (may not even know they’re being measured!) |
|
What are self-reports?
|
-one of the three general categories of psychological tests
-participants are asked to report or describe their feelings, attitudes, beliefs, values, opinions, or physical or mental state -Include a variety of surveys/questionnaires |
|
Many personality inventories fall into which of the three general categories of psychological tests?
|
-self-report tests
|
|
What are some concerns over psychological tests?
|
-might have an undue impact on an individual (high-stakes decisions)
---e.g. not getting into grad school based on GRE
-might change an individual’s decisions/actions
---e.g. fear of a bad GRE score --> doesn’t apply to grad school |
|
Test users and developers have a responsibility to make sure that decisions made with the aid of tests are _____ and _____.
|
-accurate and fair
|
|
What are the important sources for test ethics?
|
1. Ethical Principles of Psychologists and Code of Conduct (American Psychological Association)
2. Standards for Educational and Psychological Testing (published jointly by AERA, APA, and NCME)
--is a book, specific to testing
--good test instruction |
|
What are some relevant principles from the “Ethical Principles of Psychologists and Code of Conduct” (APA)?
|
-Boundaries of competence
-Maintaining confidentiality -Use of assessments |
|
The Standards for Educational and Psychological Testing (AERA, APA, and NCME) discusses the appropriate technical and professional standards to be followed in what four areas?
|
-construction
-evaluation -interpretation -application of psychological tests |
|
What is a scale?
|
-a set of numbers whose properties model the properties of the objects to which the numbers are assigned
|
|
What are the four types of measurement scales?
|
-nominal
-ordinal -interval -ratio |
|
What is a nominal scale?
|
-numbers are used to classify and identify
-numbers are substituted for names/labels
-actual numbers chosen are arbitrary
--e.g. 1 for male, 2 for female |
|
What is an ordinal scale?
|
-assigns numbers to individuals so that the rank order of the numbers corresponds to the rank order of the individuals in terms of the attribute being measured
-implies nothing about how much greater one ranking is than another
-no zero point
-tells us who is higher than whom, but not how far apart they are (e.g. 1st, 2nd, 3rd in a race) |
|
Percentiles are a(n) ___________-level scale.
|
-ordinal
|
|
What is an interval scale?
|
-contains equal intervals between numbers
-therefore, can calculate averages
-zero point has no intrinsic meaning (e.g. zero degrees Celsius does not mean an absence of temperature) |
|
Which type of scale (NOIR) is most common in psychology? Give example + caveat.
|
-interval
---e.g. IQ (0 doesn’t imply lack of intelligence)
-caveat: they may not BE interval scales but that’s how we treat them
---e.g. is there an equal difference between each step of happy 1 2 3 4 5 sad? |
|
What is a ratio scale?
|
-Contains equal intervals between numbers
-Has a true zero point -Can form ratios (e.g., 2 meters is twice as tall as 1 meter) |
|
For which type(s) of scales (NOIR) can we calculate averages?
|
-interval and ratio
|
|
What is Central Tendency?
|
-where a distribution is centered
-Mean -Mode -Median |
|
What are some advantages of calculating a mean?
|
-Mathematical manipulations
-Generally a more stable estimate of the central tendency of the population |
|
What are some disadvantages of calculating a mean?
|
-susceptible to extreme scores
-value may not be a data point (i.e. no one actually got it as a score) -need to assume interval scale properties |
|
What is a mode?
|
-the most common score (the score obtained by the most people)
|
|
What can we do when there is more than one mode?
|
-average the scores (if they are adjacent)
-report two numbers, but maintain the distribution is uni-modal (if they are near each other)
-report bimodality |
|
How do you calculate the median?
|
-odd n: middle score
-even n: average of the two middle scores |
|
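As a small illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module on made-up scores (the data are hypothetical). The single extreme score pulls the mean upward while the median and mode stay where they are.

```python
import statistics

# Hypothetical scores; 14 is an extreme score.
scores = [1, 2, 2, 3, 14]

mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle score (odd n)
modes = statistics.multimode(scores)  # list of the most common score(s)

print(mean, median, modes)

# Even n: the median is the average of the two middle scores.
even_median = statistics.median([1, 2, 3, 4])
print(even_median)
```

`multimode` returns every tied mode, which is one way to spot the bimodal case described on the earlier card.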
What is the equation to calculate variance?
|
|
|
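A common form of the variance equation is sketched below. It assumes the population version (dividing by N, which matches the "divide by n" step this set uses for Pearson's r); some texts divide by N - 1 when estimating from a sample. The symbols X_i (an individual score) and X-bar (the mean) are notation chosen here, not taken from the card.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N},
\qquad
SD = \sqrt{\sigma^2}
```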
What does a large variance indicate?
|
-individual scores tend to differ substantially from the mean and therefore from each other
|
|
Why can variance be difficult to interpret?
|
because it is based on the squared deviations around the mean
|
|
Why is SD preferable to variance?
|
-it “undoes” the square of the deviation
-is a more interpretable measure of dispersion |
|
What is standard deviation?
|
-the average amount the scores deviated from the mean
-the square root of the variance |
|
How can you use a table to help calculate standard deviation?
|
|
|
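One way to lay out such a table is a row per score with three columns: the score, its deviation from the mean, and the squared deviation. The sketch below builds that table for made-up scores and assumes the population formula (dividing by N); the function name `deviation_table` is hypothetical.

```python
import math

def deviation_table(scores):
    """Return (rows, variance, sd); each row is (score, deviation, squared deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    rows = []
    for x in scores:
        deviation = x - mean
        rows.append((x, deviation, deviation ** 2))
    # Variance: average squared deviation (population form, divide by N).
    variance = sum(sq for _, _, sq in rows) / n
    return rows, variance, math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
rows, variance, sd = deviation_table(scores)

print("  X    X-M  (X-M)^2")
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>6.1f} {sq:>8.1f}")
print("variance =", variance, "SD =", sd)  # variance = 4.0 SD = 2.0
```

Summing the last column and dividing by N gives the variance; taking the square root returns the result to the original score units.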
What is a z-score?
|
-tells you how many standard deviations above or below the mean a particular score is
-the number of standard deviations away from the mean, and in what direction |
|
What is the calculation for a z-score?
|
|
|
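The standard z-score calculation subtracts the mean from the raw score and divides by the standard deviation. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 130 on a scale with mean 100 and SD 15.
above = z_score(130, 100, 15)
below = z_score(85, 100, 15)
print(above, below)  # 2.0 -1.0
```

The sign carries the direction and the magnitude carries the distance, which is the two-part definition given on the previous card.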
What does it mean to have a set of matched or paired data?
|
-where there are 2+ variables measured for the same person
---e.g. IQ and school performance |
|
What is a Product Moment Correlation?
|
- rXY
-statistic that determines the degree of relation present between two variables
-calculated over n pairs of observations
-perfect correlation = all points in the scattergram fall on a straight line |
|
What does the Pearson’s r score indicate?
|
+1 indicates a perfect direct (positive) relation between X and Y
-1 indicates a perfect inverse (negative) relation between X and Y
0.0 indicates no linear relation between the two |
|
How do we calculate a product moment correlation? (Pearson’s r)
|
-With X and Y as Z scores (ZX and ZY, mean = 0, SD = 1):
$r_{XY} = \frac{\sum Z_X Z_Y}{n}$ |
|
What is the computation (in words) of Pearson’s r?
|
The “average cross product of Z scores”
|
|
What is the symbol for Pearson’s r?
|
rXY
|
|
What is another name for product moment correlation?
|
-Pearson’s r
|
|
What unit is associated with correlations?
|
-none, because it is calculated using Z scores
|
|
Why is transforming scores useful?
|
-raw scores can be completely arbitrary (what does a score of 10 MEAN?)
-a transformation can take into account information not contained in the raw score itself -can compare different scores |
|
What is the calculation to transform a score from an old scale to a new scale?
|
asterisk = new
|
|
What is this equation for?
|
-calculation to transform a score from an old scale to a new scale
-asterisk = new |
|
What is a t score?
|
-mean of 50 and SD of 10
-round up to a whole number |
|
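A linear transformation of this kind is usually written as X* = M* + SD* (X - M) / SD, where the starred values belong to the new scale. That form is assumed here, since the card's own equation is not shown. The sketch converts a raw score to a T score (new mean 50, new SD 10); the function name and the raw-score numbers are hypothetical.

```python
def transform_score(x, old_mean, old_sd, new_mean, new_sd):
    """Re-express x on a new scale while keeping its distance from the mean in SD units."""
    z = (x - old_mean) / old_sd   # position on the old scale, in SDs
    return new_mean + new_sd * z  # same position on the new scale

# Hypothetical raw score of 34 on a test with mean 30 and SD 8.
t = transform_score(34, old_mean=30, old_sd=8, new_mean=50, new_sd=10)
print(t)  # 55.0
```

The raw score sits half an SD above the old mean, so it lands half an SD (5 points) above the new mean.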
What does a percentile rank represent?
|
-the percentage of a comparison group that earned a raw score less than or equal to the score of that particular individual
-i.e. what percentage of the group scored at or below the score being compared
-Percentiles divide the data set into 100 equal parts |
|
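The definition above translates directly into a count: how many people in the comparison group scored at or below the target score, as a percentage of the group. A minimal sketch with a hypothetical function name and made-up norm data:

```python
def percentile_rank(score, group):
    """Percentage of the comparison group scoring less than or equal to score."""
    at_or_below = sum(1 for s in group if s <= score)
    return 100 * at_or_below / len(group)

# Hypothetical comparison group of ten raw scores.
group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(80, group))  # 60.0
```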
What percentiles correspond to each whole SD in a normal distribution?
|
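-3 SD = 1st percentile
-2 SD = 2nd percentile
-1 SD = 16th percentile
0 SD (the mean) = 50th percentile
+1 SD = 84th percentile
+2 SD = 98th percentile
+3 SD = 99th percentile |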
|
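The rounded percentiles on this card can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) and prints the unrounded values, which show that the percentiles at -3 and +3 SD are really about 0.1 and 99.9.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Percentage of the distribution falling at or below each whole SD.
percentiles = {z: 100 * standard_normal.cdf(z) for z in range(-3, 4)}

for z, pct in percentiles.items():
    print(f"{z:+d} SD -> {pct:.1f}")
```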
Draw the empirical rule (with percents, SDs, and percentiles).
|
|
|
What is a “norm-based interpretation”?
|
-when a person’s test score is interpreted by comparing that score with the scores of several other people
-uses normative data (norms) |
|
What is Normative Data?
|
-the scores of a standard population who have already taken the test
|
|
What does a norm-based score indicate?
|
-where an individual stands in comparison with the particular normative group that defines the set of standards.
|
|
What is the most common form of norm?
|
-percentile rank
-represents the percentage of the norm group that earned a raw score less than or equal to the score of that particular individual |
|
What are age norms? Implications?
|
-relates a level of test performance to the age of people who have taken the test
---requires a representative sample at each of several ages
-important because many psychological characteristics change over time (vocabulary, mathematical ability, moral reasoning, etc.) |
|
What is the difference between norm-referenced vs criterion-referenced evaluation?
|
-Norm referenced tests provide information about an individual’s performance relative to other people
-Criterion-referenced tests evaluate a person’s score with reference to a set standard (e.g., a driving test) |
|
What are the 5 steps in test development?
|
-Test conceptualization
-Test construction -Test tryout -Item analysis -Test revision |
|
What are psychological traits?
|
-any distinguishable, relatively enduring way in which one individual varies from another
-the strength of the trait is based on observing a sample of behaviour (direct observation or self-report) |
|
What is the first step in developing a test to measure a trait?
|
- define the trait or construct to be measured
|
|
What is the “domain” of a psychological trait?
|
-all possible behaviours that could conceivably be indicative of a particular construct
|
|
What is “domain sampling”?
|
-taking a sample of all possible behaviours that are indicative of a trait
|
|
How do we identify a construct when developing a test?
|
-come up with a list of what should be included and excluded from the construct
---e.g. want to measure risk-taking, but exclude drugs, risky sex, or criminal activity |
|
What are the two broad types of items (and responses) on a test?
|
-selected response format
-constructed response |
|
What is “selected response format”?
|
-multiple choice
-True/False -forced choice -rating scales |
|
What is a “constructed response”?
|
-essay
-tell a story -describe a picture |
|
For item formatting, how do achievement/aptitude tests differ from trait/attitude tests?
|
-achievement/aptitude: must select the response that is keyed as correct
-strength of trait/attitude: select the response that best describes themselves (no right/wrong) |
|
What are the three elements of a multiple choice question?
|
-A stem
-A correct response option
-Several incorrect distractors |
|
What are some important considerations when writing multiple choice items?
|
-only one correct alternative
-answer options must match grammatically with stem
-answer options of similar length
-reasonable distractors
-include as much of the item as possible in the stem to avoid unnecessary repetition |
|
What is scaling?
|
-the process of setting rules for assigning numbers in measurement
|
|
What is a Likert format item?
|
-point scale
-total score is computed as the sum of scores on individual items -usually an equal number of positive/negatives |
|
What were Likert’s original categories?
|
1 = strongly approve
2 = approve
3 = undecided
4 = disapprove
5 = strongly disapprove |
|
What is the first step in the test construction process?
|
-to generate an item pool
|
|
What are some common issues that arise in item writing?
|
-item length
-vocabulary
-double-barreled items
-double negatives
-ambiguity of meaning |
|
What is a Thurstone scale?
|
-score is the mean of the scale values of the statements they check
---e.g. capital punishment scale |
|
What is a Guttman scale?
|
-items range sequentially from weaker to stronger expressions of the attitude, belief or feeling
-all respondents who agree with the stronger statements of the attitude will also agree with milder statements
-e.g. measuring “veganness” |
|
What are paired comparisons?
|
-choose between two items/statements
-e.g. I like to arrange flowers vs solve computer software problems |
|
What is cumulative scoring?
|
-the higher the score on the test, the higher the test taker is on the ability/trait/characteristic
-for each response made in a particular way (e.g., correct answer, answering “true”), the test taker earns cumulative credit |
|
What is ipsative scoring?
|
-measures relative strength of characteristics within the same individual
-lack of item independence makes pure ipsative data inappropriate for many statistical analyses
-in pure ipsative scoring, the response choices are pitted against one another so that choosing one option, by definition, makes the respondent higher on one scale and lower on the other
---e.g. chocolate vs strawberry vs vanilla |
|
What are two examples of the empirical method of test development?
|
-criterion-keyed approach
-contrasted groups approach |
|
What is the contrasted groups approach of test development?
|
-example of empirical method
-focuses on looking for differences in the way two groups respond to the same item -select the items that best distinguish between the groups |
|
What is the criterion-keyed approach of test development?
|
-example of empirical method
-focuses on examining the correlation between each item and some criterion (e.g., performance in university) -good items correlate highly with criterion |
|
What are some important characteristics of the empirical method of test development?
|
-item content is only of secondary importance
-no theory guiding development: the test will predict, but not explain, behaviour
-VERY criterion specific (what predicts one criterion well may do a very poor job predicting another) |
|
What is a product moment correlation looking for?
|
-the degree of relation present between two variables
|
|
How do you calculate Pearson’s r (in words)?
|
-transform all of the scores to Z scores
-multiply the two Z scores for each person (e.g. midterm x final exam)
-add all of those products up
-divide by n |
|
Draw an example of calculating Pearson’s r. (midterm and final exam scores)
|
|
|
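The four steps for Pearson's r (convert to Z scores, multiply each person's pair, add up, divide by n) can be sketched as follows. The midterm and final exam scores are made up, the function names are hypothetical, and the SD uses the divide-by-n form so that it matches the "divide by n" step in this set.

```python
import math

def z_scores(values):
    """Convert raw scores to Z scores (mean 0, SD 1), using the divide-by-n SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Average cross product of Z scores over n pairs of observations."""
    zx, zy = z_scores(xs), z_scores(ys)
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / len(xs)

# Hypothetical scores for four students.
midterm = [60, 70, 80, 90]
final = [65, 70, 85, 80]

r = pearson_r(midterm, final)
print(round(r, 2))  # 0.85
```

Because both variables are expressed as Z scores before multiplying, the result carries no unit of measurement.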
What is “error score”?
|
-any deviation from the mean could be called error
|
|
What is “true score”?
|
-average observed value (over infinite # of measurements)
|
|
What is systematic measurement error?
|
-errors that are consistently in the same direction
|
|
What is random measurement error?
|
-errors that can be positive or negative
-average out to zero over an infinite number of measurements
**basis for classical measurement theory |
|
What is the difference between systematic and random measurement error?
|
-systematic errors are consistently in the same direction
-random errors can be positive or negative (and average out to zero over an infinite number of measurements) |
|
What is the basis for classical measurement theory?
|
-random measurement error
|
|
What do theories of test reliability do?
|
-estimate the effects of inconsistency on the accuracy of psychological measurement
|
|
What is the basic equation of Test Theory?
|
X = T + E
X: score obtained by an individual on a test
T: individual’s “true” score on that attribute
E: an error score associated with the measurement (features of the individual or the situation that can affect test scores but have nothing to do with the attribute being measured) |
|
What are the two basic assumptions of Test Theory?
|
1. X = T + E
2. Error is random (+ve or -ve) |
|
What are the implications of assuming error is random (Test Theory)? EQUATIONS.
|
ε(E) = 0
ε(X) = T
ρTE = 0 (true scores and errors are uncorrelated)
* ε = Expected Value, which is the average value in the long run (i.e., over an infinite number of measurements)
|
|
What are the implications of assuming error is random (Test Theory)?
|
-average error (over infinite # of measurements) = 0
-average observed value (over infinite # of measurements) = true score
-the correlation between true score and error is zero (because errors are random) |
|
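These implications can be illustrated with a small simulation. The sketch below assumes normally distributed random error (a modelling choice made here, not something the cards specify), a made-up true score of 100, and a large but finite number of measurements standing in for "infinite".

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

true_score = 100.0        # hypothetical true score T
n_measurements = 100_000  # large stand-in for an infinite number

# Each observed score is X = T + E, with E drawn at random around zero.
errors = [random.gauss(0, 5) for _ in range(n_measurements)]
observed = [true_score + e for e in errors]

mean_error = sum(errors) / n_measurements
mean_observed = sum(observed) / n_measurements

print(mean_error)     # close to 0
print(mean_observed)  # close to the true score
```

Individual errors are sometimes positive and sometimes negative, so they largely cancel, leaving the average observed score near T.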
What is the equation to calculate a reliability coefficient?
|
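$\rho_{XX} = \frac{\sigma_T^2}{\sigma_X^2}$ (true score variance / observed score variance)
*once we start calculating, the ρ becomes an r (because we are working with a sample, not the population)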
|
|
What is this equation for?
|
reliability coefficient
|
|
What does “reliability” indicate?
|
-the proportion of variance in test scores that is due to, or accounted for by, variability in TRUE scores
-a test with little measurement error = reliable
---the X – T differences are small |
|
Given observed score variance and reliability, how could you calculate the value of error variance?
|
-option 1: take the proportion of variance attributable to error (1 - reliability), then apply that proportion to the observed score variance
---e.g. 1 - .92 = .08 = 8%; 8% of 400 is 32
-option 2: plug the values into ρxx = true score variance / observed score variance to find the true score variance, then subtract it from the observed score variance
---e.g. .92 = true score variance / 400, so true score variance = 368; 400 - 368 = 32 |
|
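Both routes to the error variance can be checked in a few lines, using the card's own numbers (reliability .92, observed score variance 400). The function names are hypothetical.

```python
def error_variance_direct(observed_variance, reliability):
    """Route 1: apply the error proportion (1 - reliability) to the observed variance."""
    return (1 - reliability) * observed_variance

def error_variance_via_true(observed_variance, reliability):
    """Route 2: find the true score variance first, then subtract it."""
    true_variance = reliability * observed_variance
    return observed_variance - true_variance

direct = error_variance_direct(400, 0.92)
via_true = error_variance_via_true(400, 0.92)
print(round(direct, 2), round(via_true, 2))  # 32.0 32.0
```

The two routes agree because observed variance is the sum of true score variance and error variance.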
What is the definition of reliability according to the parallel test model?
|
-X’ is a randomly parallel test/item
-it is assumed that an infinite universe (domain, or population) of randomly parallel tests exists
-reliability is the “correlation between any two randomly parallel tests”
-high correlation = high reliability |
|
What are the three different equation/definitions for reliability?
|
|
|
How can we assess reliability?
|
-test-retest reliability
-parallel forms reliability -split-half reliability -internal consistency (alpha) |
|
What is test-retest reliability an estimate of?
|
- stability
|
|
What are some problems with split-half reliability?
|
(1) Correlation is based on use of ½ the number of test items. Therefore, this is not the reliability of the test (the correlation needs to be corrected to estimate the reliability of the full test)
(2) How do you decide to split the test? (e.g. Odd vs. even? First vs. second half? Random?)
---too many ways, and each way will provide a different answer! |
|
What is the solution to the problems with split-half reliability?
|
-use coefficient alpha
|
|
What is the equation to calculate Cronbach’s Alpha?
|
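$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_X^2}\right)$
K = # of test items
$\sigma_i^2$ = variance of test ITEM i
$\sigma_X^2$ = variance of total score (sum on all items) |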
|
What is this equation, and what do the parts of it mean?
|
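-Cronbach’s alpha: $\alpha = \frac{K}{K-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_X^2}\right)$
K = # of test items
$\sigma_i^2$ = variance of test ITEM i
$\sigma_X^2$ = variance of total score (sum on all items) |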
|
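A minimal sketch of the alpha calculation from the three ingredients named on these cards: the number of items K, the variance of each item, and the variance of the total score. It assumes the usual form of the coefficient, K/(K-1) times (1 minus the sum of item variances over total score variance). The data and function names are hypothetical, and each respondent is represented as a list of K item scores.

```python
def variance(values):
    """Average squared deviation from the mean (divide-by-n form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def cronbach_alpha(responses):
    """responses: one list of K item scores per respondent."""
    k = len(responses[0])
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of five people to a three-item scale.
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(responses), 2))
```

Using n or n - 1 in the variance makes no difference to alpha as long as the same choice is used for the items and the total, because only the ratio of variances enters the formula.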
When is Cronbach’s alpha applicable?
|
- as an indicator of the internal consistency reliability of unifactor tests, or homogeneous measures
---i.e. the items should be correlated and measure one thing |
|
What are the three important factors that affect the reliability of a test?
|
1. The mean inter-item correlation of the test (r)
2. The length of the test (K)
3. The variance of the true scores in the sample (σ²T) |
|
What is the mean inter-item correlation of a test?
|
-r
-correlation between the items -can be interpreted as the average single item reliability (the extent to which each item represents an observation of the same thing observed by the other items) |
|
How can the mean inter-item correlation of a test be interpreted? Why?
|
-the average single item reliability
-because each item can be considered a parallel mini test, and one definition of reliability is the correlation between two parallel tests |
|
Draw a graph/chart of mean inter-item correlation.
|
average of these cells = mean inter-item correlation
|
|
What is this equation and what do the symbols mean?
|
Standardized Alpha: $\alpha_{std} = \frac{K\,\bar{r}_{ij}}{1 + (K-1)\,\bar{r}_{ij}}$
K = # of test items
$\bar{r}_{ij}$ = average inter-correlation among test items |
|
What is the equation for standardized alpha?
|
$\alpha_{std} = \frac{K\,\bar{r}_{ij}}{1 + (K-1)\,\bar{r}_{ij}}$
K = # of test items
$\bar{r}_{ij}$ = average inter-correlation among test items |
|
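Standardized alpha needs only two inputs, the number of items and the average inter-item correlation. The sketch below assumes the usual form, K times the mean correlation divided by 1 plus (K - 1) times the mean correlation; the function name and the example values are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from K items with average inter-item correlation mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Same average inter-item correlation, increasing test length.
for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, 0.30), 2))
```

Holding the mean inter-item correlation fixed, the coefficient rises as items are added, which previews the test-length effect covered on the next cards.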
How does the length of the test affect its reliability?
|
-remember: ε(X) = T
-each item on a test is like a mini-test
-the more items, the more the random errors cancel each other out (sometimes +, sometimes -)
-the longer the better! |
|
What is the Spearman Brown prophecy formula used for?
|
- can be used to predict the effect of lengthening a test (i.e., what will the new reliability estimate be once we make the test longer?)
|
|
What is the Spearman Brown prophecy formula?
|
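$r_{xx}^{\text{new}} = \frac{n \cdot r_{xx}^{\text{old}}}{1 + (n-1)\, r_{xx}^{\text{old}}}$
$r_{xx}^{\text{new}}$ = predicted “new” reliability
$r_{xx}^{\text{old}}$ = reliability of original test
n = ratio of new to old test length (new length / old length) |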
|
What equation is this? What does each part mean?
|
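-Spearman-Brown prophecy formula: $r_{xx}^{\text{new}} = \frac{n \cdot r_{xx}^{\text{old}}}{1 + (n-1)\, r_{xx}^{\text{old}}}$
$r_{xx}^{\text{new}}$ = predicted “new” reliability
$r_{xx}^{\text{old}}$ = reliability of original test
n = ratio of new to old test length (new length / old length) |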
|
What is an important assumption of the Spearman-Brown prophecy formula?
|
-items added are parallel to the original items (they measure the same thing)
|
|
What is this equation?
|
- Spearman-Brown rearranged in order to determine how much longer a test needs to be to obtain a desired reliability
|
|
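Both uses of Spearman-Brown can be sketched together: predicting the reliability of a lengthened test, and solving the same relation for the length ratio n needed to reach a target reliability. The prediction assumes the usual form, n times the old reliability over 1 plus (n - 1) times the old reliability, and the rearranged version follows from it algebraically. Function names and example values are hypothetical.

```python
def spearman_brown(old_reliability, n):
    """Predicted reliability when the test is made n times as long."""
    return (n * old_reliability) / (1 + (n - 1) * old_reliability)

def required_length_ratio(old_reliability, desired_reliability):
    """Rearranged form: how many times longer the test must be."""
    return (desired_reliability * (1 - old_reliability)) / (
        old_reliability * (1 - desired_reliability)
    )

doubled = spearman_brown(0.70, 2)          # roughly 0.82
ratio = required_length_ratio(0.70, 0.90)  # roughly 3.9 times as long
print(round(doubled, 2), round(ratio, 2))
```

Both functions rest on the assumption stated earlier, that the added items are parallel to the original ones.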
How does the variance of the true scores in the sample affect the reliability of a test?
|
-reliability is the ratio of true score variance to observed score variance; increasing the variance of the true scores increases this ratio
-broad range of samples = greater true score variance --> variance of observed scores will increase
-variance of errors should NOT increase (uncorrelated with true scores)
-difficult to develop a reliable measure of an attribute that people do not differ very much on
-same test might show different levels of reliability depending on the sample it’s used on |
|
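The effect of true score variance can be seen by holding error variance fixed and widening the sample. The sketch assumes observed variance is the sum of true score variance and error variance, which follows from true scores and errors being uncorrelated. The function name and the variance values are hypothetical.

```python
def reliability(true_variance, error_variance):
    """Proportion of observed score variance that is true score variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance

error_variance = 25  # held constant across samples

# Narrow, moderate, and broad samples on the same test.
for true_variance in (25, 100, 400):
    print(true_variance, round(reliability(true_variance, error_variance), 2))
```

The same test, with the same amount of measurement error, looks more reliable in the sample where people differ more on the attribute.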
What can tell us how much variability should be expected on the basis of measurement error?
|
-the Standard Error of Measurement
|