77 Cards in this Set
- Front
- Back
Classical Test Theory |
Score X consists of a true score (T) and a random error component (E) ---> random errors around T have a normal distribution with a mean of 0; SD of E = standard error of measurement. Ratio of the variance of T to the variance of observed scores = reliability |
|
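The variance partition on this card can be sketched in a few lines. A minimal illustration; the variance components below are hypothetical numbers, not from any real test:

```python
# Classical Test Theory sketch: reliability is the ratio of true-score
# variance to observed-score variance, where Var(X) = Var(T) + Var(E).
def ctt_reliability(var_true, var_error):
    """Reliability = Var(T) / (Var(T) + Var(E))."""
    return var_true / (var_true + var_error)

var_true, var_error = 85.0, 15.0           # hypothetical variance components
print(round(ctt_reliability(var_true, var_error), 2))  # 0.85
```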
Descriptive & Inferential Statistics |
*Descriptive = quantitatively describe the main features of data, including central tendency (e.g., mean, median, mode) and variability (SD, variance); variability surrounding central tendency relates to reliability (tighter distribution = higher reliability); higher reliability reduces sensitivity. leptokurtic = peaked kurtosis; platykurtic = flat distribution *Inferential = help you reach conclusions: t-test, ANOVA, ANCOVA, regression, etc. |
|
Item Response Theory |
Item scores are test independent; focuses on item-level characteristics, whereas classical test theory focuses on test-level characteristics. Item-level responses are analyzed to model the probability of a correct answer as a function of ability (item characteristic curve), e.g., GRE, GMAT |
|
Probability Theory |
Basic foundations of probability theory mirror games of chance |
|
Bayesian Methods |
Important for effort measures! (ID of improbable performances) P(B|A) = P(A|B) x P(B) / P(A) Posterior probability, prior probability, and likelihood |
|
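The formula on this card can be applied directly. The sketch below plugs in made-up numbers for an effort-measure scenario; every probability is a hypothetical value chosen for illustration:

```python
# Bayes' theorem sketch: P(B|A) = P(A|B) * P(B) / P(A).
# Hypothetical scenario: B = "invalid effort", A = "failed PVT".
def posterior(p_a_given_b, p_b, p_a):
    """Posterior probability of B given A."""
    return p_a_given_b * p_b / p_a

p_fail_given_invalid = 0.90   # likelihood P(A|B), hypothetical
p_invalid = 0.10              # prior P(B), hypothetical
p_fail_given_valid = 0.05     # P(A|not B), hypothetical
# P(A) by total probability: P(A|B)P(B) + P(A|~B)P(~B)
p_fail = p_fail_given_invalid * p_invalid + p_fail_given_valid * (1 - p_invalid)
print(round(posterior(p_fail_given_invalid, p_invalid, p_fail), 3))  # 0.667
```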
Central Tendency in Practice |
Skewed distributions will alter the rank order of mean, median, and mode Regression toward the mean |
|
Normal Variance |
Variance is the average of the squared differences of each observation in a distribution from the mean. The SD is the square root of variance. |
|
Normalization and Transformations |
Take distributions of data points that depart from "true" normality in some way and transform them to be "fit" to a normal curve |
|
Creation of Normalized Standard Scores |
Most often done to put all measures of a protocol or battery on the same scale of comparison -Often done with T-scores, which are more nuanced than z-scores |
|
Expected Good Performance in Unimpaired Populations |
e.g., JOLO, BNT, etc. - hence sensitive to neurologic insult -These tests do not have a normal distribution, so transformed distributions are used to force artificially even intervals -Percentiles are derived from the natural distribution of raw scores and are optimal to minimize misinterpretation of tests with non-normal distributions CAUTION: be aware of the true distribution for interpretation |
|
Reliability |
Consistency of results under varying test administration conditions. Differences in test scores can be attributed to technically true differences versus chance errors. Reliability coefficient ranges from 0.0 to 1.0 |
|
Test-Retest Reliability |
Stability of scores on repeated administrations of a test to the same person. Error variance = random fluctuation in performance. Test-retest interval also influences outcome |
|
Alternate Forms Reliability |
Captures both stability of performance over time and consistency of responses to different samples of items tapping the same knowledge or performance -Reduces error variance due to practice effects |
|
Split-Half Reliability |
Evaluation of the internal consistency of a test by splitting it in half - some error may vary over the life of the testing session. The Spearman-Brown formula can be used to calculate the likely effect of lengthening a test to a certain number of items - lengthening increases consistency but not necessarily stability over time. |
|
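The Spearman-Brown prophecy formula mentioned above is simple to compute. A minimal sketch; the .70 split-half correlation is a hypothetical value:

```python
def spearman_brown(r, k):
    """Predicted reliability after lengthening a test by factor k:
    r_new = k*r / (1 + (k - 1)*r)."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical: split-half correlation of .70, corrected to full length (k = 2)
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```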
Inter-Item Reliability |
Estimation of two sources of error: 1) content sampling 2) heterogeneity of the domain of knowledge or behavior. Greater homogeneity produces better inter-item reliability. Kuder-Richardson formula (KR20) Cronbach's coefficient alpha |
|
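Cronbach's coefficient alpha, named on this card, can be computed from item-level scores. A sketch using population variances and a small hypothetical dataset (3 items, 4 examinees; the numbers are invented):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
    items: one list of scores per item, aligned across the same examinees."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per examinee
    item_var = sum(pvariance(s) for s in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

data = [[2, 4, 3, 5], [3, 5, 4, 5], [2, 5, 3, 4]]       # hypothetical item scores
print(round(cronbach_alpha(data), 2))  # 0.95
```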
Interrater Reliability |
Scorer Variance that may affect outcome of tests when scoring is not highly standardized or when judgment comes into play |
|
Validity |
Accuracy with which meaningful and relevant measurements can be made with a test, protocol, etc. |
|
Content Validity |
degree to which the test covers a representative sample of the knowledge or behavior domain we seek to sample -reflects what information can be generalized from the test -Not the same as face validity! -Cannot be expressed as coefficient |
|
Predictive Validity |
represents the relative success with which the test predicts a criterion we have defined and set forth in advance -Represented as a coefficient |
|
Concurrent Validity |
Represents the degree to which a test measures what it is intended to measure by looking at performance against a previously validated measure (can stand in for predictive validity) -Represented as a coefficient |
|
Construct Validity |
The degree to which a test measures a psychological theoretical construct or trait |
|
Convergent Validity |
Demonstrated when two or more approaches to measurement of some trait are positively correlated |
|
Discriminant Validity |
Exemplified by a low correlation coefficient b/w similar approaches to measurement of different traits |
|
Multitrait Multimethod Matrix |
A composition of correlation coefficients of two or more traits and two or more methods: Monotrait-Monomethod (reliability) Monotrait-Heteromethod (high = convergent validity) Heterotrait-Monomethod (low = discriminant validity) Heterotrait-Heteromethod (low = discriminant validity) |
|
Sensitivity & Specificity |
Describe how well the test discriminates b/w individuals with and without a specified condition.
Caution must be used when applying a test to samples that vary from the original population (e.g., a 5% base rate sample yields much lower predictive power than a 50% base rate sample) |
|
Predictive Value |
PPP tells us what proportion of the time we were correct when we stated that a condition was present based on a test result. NPP tells us what proportion of the time we were correct when stating, on the basis of our test, that an individual does not have a condition |
|
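Both predictive values fall straight out of a confusion matrix. A minimal sketch; the counts are hypothetical:

```python
def predictive_values(tp, fp, tn, fn):
    """PPP = TP/(TP+FP); NPP = TN/(TN+FN)."""
    ppv = tp / (tp + fp)   # proportion of positive results that were correct
    npv = tn / (tn + fn)   # proportion of negative results that were correct
    return ppv, npv

# Hypothetical confusion-matrix counts
ppv, npv = predictive_values(tp=45, fp=5, tn=40, fn=10)
print(round(ppv, 2), round(npv, 2))  # 0.9 0.8
```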
Likelihood Ratios |
For a positive test, compares true positives to false positives. For a negative test, compares false negatives to true negatives. A likelihood ratio of 1 = the result is just as likely regardless of condition status. Prevalence does NOT affect the statistic |
|
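The two ratios described above reduce to one-line formulas in terms of sensitivity and specificity. A sketch with hypothetical operating characteristics:

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sens/(1-spec); LR- = (1-sens)/spec."""
    lr_pos = sensitivity / (1 - specificity)   # true-positive vs. false-positive rate
    lr_neg = (1 - sensitivity) / specificity   # false-negative vs. true-negative rate
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)  # hypothetical test
print(round(lr_pos, 1), round(lr_neg, 2))       # 8.0 0.22
```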
Pre-Test Probability |
Probability that a patient has a condition prior to knowing a test result (reflects base rate of condition) |
|
Post-Test Probability |
Probability that the patient has the condition given a positive result (how well the test rules in the condition) |
|
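Post-test probability is usually computed by converting the pre-test probability to odds, multiplying by a likelihood ratio, and converting back. A sketch; the 10% base rate and LR+ of 8 are hypothetical:

```python
def post_test_probability(pre_test_prob, lr):
    """Pre-test odds * LR = post-test odds, then back to a probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical: 10% base rate, positive result on a test with LR+ = 8
print(round(post_test_probability(0.10, 8), 2))  # 0.47
```

Note how a modest base rate keeps the post-test probability below 50% even with a fairly strong positive result, which is the point of the incremental-validity card below.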
Incremental Validity |
Extent to which use of the test improves the post-test probability with respect to the pre-test probability. Hard to achieve with very high or very low frequency conditions - better as the base rate approaches 50% |
|
Choosing Cutoffs |
Two competing goals: 1) Minimize false positive errors by maximizing specificity (e.g., goal for PVTs) - sacrifices sensitivity 2) Maximize sensitivity, which reduces specificity and raises the risk of false positive errors (e.g., goal to id all ppl showing any degree of impairment in order to broadly offer intervention) |
|
Receiver Operating Characteristic Curve (ROC) |
Aids in determining the best cutoff score - the one that excludes the most people without the condition. Plot of sensitivity (Y axis) against 1 - specificity (X axis). The chosen cutoff should represent the point at which the balance b/w sens. and spec. makes sense for the condition |
|
Standard Error of Measurement (SEM) |
Empirical measure of error variance around a single true score, based on the theoretical distribution and classical assumptions about errors, including equal variances of error distributions. Hard to calculate b/c the true score is rarely known |
|
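In practice the SEM is estimated from the test SD and its reliability coefficient. A sketch; the IQ-style SD of 15 and reliability of .90 are hypothetical:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical: IQ-style metric (SD = 15) with reliability .90
print(round(sem(15, 0.90), 2))  # 4.74
```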
Standard Error of the Estimate (SEE) |
SD of true scores if the observed score is held constant -Given an obtained score, the clinician can calculate the range of scores in which the true score is likely to fall and can interpret test performance using a confidence interval |
|
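The interval described above can be built from the SEE, centered on the estimated true score (regressed toward the mean). A sketch under classical-test-theory assumptions; the score, mean, SD, and reliability values are hypothetical:

```python
import math

def see_interval(score, mean, sd, r, z=1.96):
    """95% interval for the true score: estimated true score +/- z * SEE,
    with SEE = SD * sqrt(r * (1 - r))."""
    t_est = mean + r * (score - mean)     # obtained score regressed toward the mean
    see = sd * math.sqrt(r * (1 - r))     # standard error of estimation
    return t_est - z * see, t_est + z * see

lo, hi = see_interval(score=130, mean=100, sd=15, r=0.90)
print(round(lo, 1), round(hi, 1))  # 118.2 135.8
```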
Confidence Intervals |
Allow for an estimate of the uncertainty of an obtained test score based on properties of the test (e.g., reliability) and properties of the normative sample (e.g., distribution). As reliability decreases, the estimated true score moves closer to the mean and the confidence band widens |
|
Low Score Interpretation |
This decision is based on assumptions of the central limit theorem and probability statistics. Abnormal scores occur with known probability in normal populations - so it is good to use the pattern of test scores for diagnostics rather than psychometric variability alone |
|
Profile Analysis |
Plotting standardized scores on a battery of tests in terms of graph or profile and making inferences about cognitive functioning and diagnosis Interindividual and intraindividual interpretation |
|
Statistical vs. Clinical Significance |
Reliable Change Index = established minimum magnitude of change for certainty that two scores differ. If statistically different, must determine clinical meaningfulness |
|
Reliability of Difference Scores |
Reliability of difference for intra- and interindividual comparison |
|
Reliable Change Index |
Uses the standard error of the difference and computes a z-score for the difference b/w scores based on the normal probability distribution. An RCI falling outside +/- 1.96 reflects significant change. Must also be adjusted for practice effects |
|
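The basic (unadjusted) RCI computation can be sketched as follows, using the standard error of the difference built from the SEM; the retest scores, SD, and reliability are hypothetical:

```python
import math

def rci(score1, score2, sd, reliability):
    """RCI = (X2 - X1) / S_diff, with S_diff = sqrt(2) * SEM."""
    sem = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2) * sem            # standard error of the difference
    return (score2 - score1) / s_diff

# Hypothetical retest: 100 -> 85 on a test with SD = 15 and r = .90
value = rci(100, 85, sd=15, reliability=0.90)
print(round(value, 2), abs(value) > 1.96)  # -2.24 True (significant decline)
```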
Regression Equations |
Regression Equations can be used to estimate pre-morbid intellectual functioning based on demographics and specific test performance, to account for demographics, or to assess change (factors in practice effects and regression to the mean) |
|
Percentiles to Z-Scores |
z = -3 ---> 0.13th %ile z = -2.5 ---> 1st %ile z = -2 ---> 2nd %ile z = -1.5 ---> 7th z = -1 ---> 16th z = -0.5 ---> 31st z = 0 ---> 50th z = 0.5 ---> 69th z = 1 ---> 84th z = 1.5 ---> 93rd z = 2.0 ---> 98th z = 2.5 ---> 99th z = 3 ---> 99.87th 68% of the population falls between -1 and +1 SD |
|
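The z-to-percentile conversions above come from the standard normal CDF, which the standard library exposes via the error function. A short sketch that reproduces the rounded card values:

```python
from math import erf, sqrt

def z_to_percentile(z):
    """Percentile for a z-score via the standard normal CDF:
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# Rounded, these match the card: 2nd, 16th, 50th, 84th, 98th percentiles
for z in (-2, -1, 0, 1, 2):
    print(z, round(z_to_percentile(z), 1))
```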
Approaches to NP Assessment |
Flexible Battery: domains systematically screened, with detailed assessment where deficits are noted Fixed Battery: e.g., Halstead-Reitan Process: focus on understanding the behavioral processes that occurred while obtaining a score |
|
Deficit Measurement |
Contrasting scores against a comparison standard to determine reliable improvement or decline 1) Normative Comparison 2) Individual Comparison |
|
Normative Comparison |
Species Comparison Standard: capacity shared by all healthy humans (e.g., language dev't) Population Averages: Reflect Average Performance of a large sample; normal curve is expected but can be skewed [e.g., WCST FMS (+); BNT (-)] |
|
Individual Comparison |
Allows for determination of change w/in individuals - must determine pre-morbid fctng |
|
Pre-Morbid |
Can be developed from: Historical Records Estimates derived from psychometric methods Behavioral Observations See Stuckey Table 7.2 |
|
Sensitivity & Specificity |
High Sensitivity: rules out the diagnosis w/a negative result (SnNout) High Specificity: rules in the diagnosis w/a positive result (SpPin) Prevalence rates of a given condition affect positive/negative predictive power but not sens. and spec. |
|
Post-Test Probability |
Probability of patient having condition with a positive diagnostic result |
|
A score reflects: |
1) Actual brain-behavior rx 2) measurement error 3) score bias (e.g., normative bias, culture, educ) 4) error in predicting pre-morbid ability 5) fatigue 6) poor motivation/engagement 7) pain 8) anxiety/depression |
|
Statistical Models for Defining Impairment |
1) Parametric Statistical Modeling: based on extrapolation of central limit theorem - relative to reference group 2) Bayesian Statistical Modeling: probability model using corrective variable to improve accuracy of prediction - based on individual comparison |
|
Establishing Cut-Offs for Impairment |
1) 1 SD below the mean 2) -1.5 SD or -2.0 SD Using 1 SD increases sensitivity but increases false positives/decreases specificity Using 2 SD increases specificity but decreases sensitivity/increases false negatives *Use the same threshold throughout an evaluation |
|
Sensitivity & Specificity & Hit Rate |
Sensitivity: probability that a test identifies the presence of a condition Specificity: probability that the test identifies the absence of the condition Hit Rate: probability that a test correctly predicts the presence or absence of the condition |
|
Z-Score Percentages to Percentiles |
0 to ±1 SD: 34%; ±1 to ±2 SD: 13.5%; ±2 to ±3 SD: 2.35%; beyond ±3 SD: 0.15% |
|
3 Moments of a Distribution |
1. Mean 2. Variance 3. Skew |
|
Floor/Ceiling Effects |
Truncated tails in the context of limitations in the range of item difficulty. -High floor: a large proportion of examinees obtain scores at or near the lowest possible score -Low ceiling: a high number of examinees obtain scores at or near the highest possible score *May limit the usefulness of a test w/certain populations |
|
Normalizing Transformations |
Only use if: 1) they come from a large and representative sample -or- 2) any deviation from normality arises from defects in test rather than sample characteristics |
|
Homoscedasticity |
Uniform Variance Across the Range of Scores |
|
Extrapolation/Interpolations |
Use multiple regression to provide norms for missing cells, etc. |
|
Common Sources of Bias and Error in Test/Re-Test Situations |
BIAS -Intervening Variables (surgery, medical intervention, extraneous events) -Practice Effects -Demographic (maturational effects, education, gender, etc.) ERROR -Statistical Errors (measurement error, regression to the mean) -Random or uncontrolled events |
|
Test Stability Coefficients |
-Those provided in manual must be used with caution as normally based on normal populations with much shorter testing intervals than clinical populations -Some evidence that duration of interval has less of an impact than subject characteristics |
|
Adequate Reliability Coefficient? |
.80 or higher needed for tests used in individual assessment; .90 or higher for tests used to make important decisions (IQ); .95 is the optimal standard. Clinically acceptable? < .60 is unreliable; .60 marginal; > .70 adequate; > .80 high; > .90 very high |
|
Limits of Reliability |
It is possible to have reliable test that is not valid, but the reverse is not possible! |
|
Obtained Scores |
x (Obtained score) = t (true score) + e (error) |
|
Reliable Change Index |
Indicator of the probability that an observed difference b/w two scores from the same examinee on the same test can be attributed to measurement error. When that probability is low, one may infer that the difference reflects other factors (e.g., illness, treatment effects, prior exposure to the test) |
|
Flynn Effect |
General trend for increased IQs over time with each subsequent generation - estimated to contribute an increase of 0.3 IQ points per year -some evidence that the Flynn effect is more pronounced in fluid/nonverbal tests than crystallized/verbal tests |
|
Normal Curve demarcated by z-scores |
Areas under the normal curve b/w z-score demarcations: below -3: .15%; -3 to -2: 2.35%; -2 to -1: 13.5%; -1 to 0: 34%; 0 to +1: 34%; +1 to +2: 13.5%; +2 to +3: 2.35%; above +3: .15% |
|
Practice Effects |
If stability coefficients are low: there may be no systematic effects of prior exposure, the effect of prior exposure may be nonlinear, or ceiling effects/restriction of range related to prior exposure may be attenuating the coefficient |
|
Adequate Reliability |
If reliability is .85, 85% of variance can be accounted for by the trait being examined (15% is error variance) -Sattler --> .80 or higher needed for tests used in individual assessment; .90 or above for tests used for decision making (e.g., IQ); .95 is the optimal standard Clinically reliable? <.60 unreliable; >.60 moderately reliable; >.70 relatively reliable (adequate); .80-.89 high; .90+ very high |
|
Limits of reliability |
It is possible to have a reliable test that is not valid, but you cannot have a test that is valid and not reliable |
|
Measurement Error |
Estimated true score = sum of the mean score of the group to which the individual belongs and the deviation of his or her obtained score from the normative mean, weighted by test reliability As reliability approaches 1, estimated true scores approach obtained scores As reliability approaches 0, estimated true scores approach the mean test score So, estimated true scores will always be closer to the mean than obtained scores are (except when the obtained score = the mean score) |
|
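The verbal rule above is one line of arithmetic, and the limiting behavior the card describes falls right out of it. A sketch; the score of 130, mean of 100, and reliability values are hypothetical:

```python
def estimated_true_score(obtained, group_mean, reliability):
    """Regress the obtained score toward the group mean, weighted by reliability."""
    return group_mean + reliability * (obtained - group_mean)

# As r -> 1 the estimate approaches the obtained score; as r -> 0, the mean.
print(estimated_true_score(130, 100, 1.0))           # 130.0
print(estimated_true_score(130, 100, 0.0))           # 100.0
print(round(estimated_true_score(130, 100, 0.8), 1)) # 124.0
```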
Standard Error of Measurement |
When the sample SD and the reliability of obtained scores are known, an estimate of the SD of obtained scores about true scores may be calculated -provides an estimate of the amount of error in a person's observed score -inversely related to the reliability of a test: the greater the reliability, the smaller the SEM, and the more confidence the examiner can have in the precision of the score |
|
Standard Error of Estimation |
estimating confidence intervals for estimated true scores determines the likely range within which true scores fall |
|
Standard Error of Prediction |
the likely range of obtained scores expected on retesting with an alternate form |
|
Validity |
the degree to which a test actually measures what it is intended to measure; validity coefficients rarely exceed .30-.40 |
|
Sensitivity Specificity |
Sensitivity = true positives/(true positives + false negatives) *proportion of condition-of-interest-positive (COI+) examinees who are correctly identified as such by a test Specificity = true negatives/(true negatives + false positives) *proportion of COI- examinees who are correctly classified as such by a test Positive likelihood ratio = single index of overall accuracy indicating the odds that a positive test came from a COI+ examinee. As the LR approaches 1, test classification approximates random assignment of examinees |
|
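The formulas on this card translate directly into code. A sketch from hypothetical confusion-matrix counts, including the LR+ derived from the two rates:

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): proportion of COI+ examinees detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN + FP): proportion of COI- examinees correctly cleared."""
    return tn / (tn + fp)

# Hypothetical counts: 40 true pos, 10 false neg, 90 true neg, 10 false pos
sn = sensitivity(tp=40, fn=10)
sp = specificity(tn=90, fp=10)
print(sn, sp, round(sn / (1 - sp), 1))  # 0.8 0.9 8.0 (the last value is LR+)
```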
PPP |
Positive predictive power = probability that a pt with a positive test result has the COI Negative predictive power = probability that a pt with a negative test result does not have the COI When predictive power is .50, pts are approximately equally likely to be COI+ as COI- |