
77 Cards in this Set


Classical Test Theory

Score X consists of a true score (T) and a random error component (E) ---> random errors around T have a normal distribution with a mean of 0




SD of E = standard error of measurement




Ratio of the variance of T to the variance of observed scores = reliability
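The classical test theory relationship can be sketched in Python as a toy simulation. All numbers here (mean 100, true-score SD 15, error SD 5, n = 10,000) are made up for illustration, not drawn from any published test.

```python
import random
import statistics

random.seed(0)

# Classical test theory: observed score X = true score T + error E,
# where E is normally distributed with mean 0 (the SD of E is the SEM).
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability = variance of true scores / variance of observed scores
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.90
```

With these parameters the theoretical reliability is 225/250 = 0.90, and the simulated value lands near it.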





Descriptive & Inferential Statistics

*Descriptive = quantitatively describe the main features of data, including central tendency (e.g., mean, median, mode) and variability (SD, variance); variability surrounding central tendency reflects reliability (tighter distribution of variability = higher reliability); higher reliability reduces sensitivity.




leptokurtic = peaked kurtosis


platykurtic = flat distribution




*Inferential = Help you reach conclusions. T-Test, ANOVA, ANCOVA, Regression, etc.

Item Response Theory

Item scores are test independent; focused on item-level characteristics, whereas classical test theory focuses on the test-level characteristics




Item-level responses are analyzed to compare probability of correct answer against an ability (item characteristic curve)




e.g., GRE, GMAT

Probability Theory

Basic foundations of probability theory mirror games of chance

Bayesian Methods

Important for effort measures! (I.D. of improbable performances)




P(B|A) = [P(A|B) x P(B)] / P(A)




Posterior Probability, Prior Probability, and Likelihood
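Bayes' rule from the card above can be worked through numerically. The probabilities below are hypothetical, chosen only to show the arithmetic (e.g., treating B as "invalid performance" and A as "failed effort measure"):

```python
# Bayes' rule: P(B|A) = P(A|B) * P(B) / P(A)
p_b = 0.30          # prior probability of B
p_a_given_b = 0.80  # likelihood of A given B
p_a = 0.35          # prior (marginal) probability of A

posterior = p_a_given_b * p_b / p_a
print(round(posterior, 3))  # 0.80 * 0.30 / 0.35 ~= 0.686
```

The posterior (0.686) is larger than the prior (0.30) because the observed evidence A is much more likely under B than overall.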

Central Tendency in Practice

Skewed distributions will alter the rank order of mean, median, and mode




Regression toward the mean

Normal Variance

Variance is the average of the squared differences of each observation in a distribution from the mean. The SD is the square root of variance.
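The variance definition above can be verified directly on a small made-up data set:

```python
import math

scores = [4, 8, 6, 5, 7]
mean = sum(scores) / len(scores)  # 6.0

# Variance: average of squared deviations from the mean (population form)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
sd = math.sqrt(variance)  # SD is the square root of variance
print(variance, round(sd, 3))  # 2.0, 1.414
</```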

Normalization and Transformations

Take distributions of data points that depart from "true" normality in some way and transform them to be "fit" to a normal curve

Creation of Normalized Standard Scores

Most often done to put all measures of a protocol or battery on the same scale of comparison




-Often done with T-Score, which is more nuanced than z-score

Expected Good Performance in Unimpaired Populations

e.g., JOLO, BNT, etc. - so sensitive to neurologic insult




-These tests do not have a normal distribution, so transformed distributions are used to force artificially even intervals




-Percentiles are derived from natural distribution of raw score and are optimal to minimize misinterpretation of tests with non-normal distribution




CAUTION: Be aware of true distribution for interpretation

Reliability

Consistency of the results under varying test administration conditions.




Differences in test scores can be attributed to technically true differences versus chance errors




Reliability Co-Efficient = 0.0 - 1.0

Test-Retest Reliability

stability of scores on repeated administrations of a test to the same person




Error variance = random fluctuation in performance




Test-Retest interval also influences outcome

Alternate Forms Reliability

Captures both stability of performance over time and consistency of responses to different samples of items tapping the same knowledge or performance




-Reduced error variance due to practice effects

Split-Half Reliability

evaluation of the internal consistency of a test by splitting it in half - some error may vary over the life of the testing session




Spearman Brown Formula can be used to calculate the likely effect of lengthening a test to a certain number of items - lengthening increases consistency but not necessarily stability over time.
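The Spearman-Brown prediction described above is r_new = n*r / (1 + (n-1)*r), where n is the factor by which the test is lengthened. A minimal sketch with a hypothetical split-half correlation of .70:

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is lengthened by a factor of n."""
    return n * r / (1 + (n - 1) * r)

# Split-half correlation of .70; doubling the test (n = 2):
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```

Lengthening the test raises the predicted internal consistency, consistent with the card's note that it improves consistency but not necessarily stability over time.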

Inter-Item Reliability

Estimation of two sources of error:


1) content sampling


2) heterogeneity of the domain of knowledge or behavior




Greater homogeneity produces better inter-item reliability.




Kuder-Richardson formula (KR20)


Cronbach Coefficient Alpha
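Cronbach's alpha can be computed by hand as alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores). The response matrix below is invented solely to demonstrate the formula:

```python
import statistics

# Each row = one examinee's responses to 4 items (hypothetical data)
responses = [
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
]
k = len(responses[0])  # number of items

# Variance of each item (columns) and of the total score (row sums)
item_vars = [statistics.pvariance(col) for col in zip(*responses)]
total_var = statistics.pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```

For this toy matrix alpha comes out very high, reflecting that the four items rank examinees almost identically (high inter-item homogeneity).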

Interrater Reliability

Scorer Variance that may affect outcome of tests when scoring is not highly standardized or when judgment comes into play

Validity

Accuracy with which meaningful and relevant measurements can be made with a test, protocol, etc.

Content Validity

degree to which the test covers a representative sample of the knowledge or behavior domain we seek to sample




-reflects what information can be generalized from the test




-Not the same as face validity!




-Cannot be expressed as coefficient

Predictive Validity

represents the relative success with which the test predicts a criterion we have defined and set forth in advance




-Represented as coefficient

Concurrent Validity

Represents the degree to which a test measures what it is intended to measure by looking at performance against a previously validated measure




(can stand in for predictive validity)




-Represented as coefficient

Construct Validity

The degree to which a test measures a psychological theoretical construct or trait

Convergent Validity

Demonstrated when two or more approaches to measurement of some trait are positively correlated

Discriminant Validity

Exemplified by a low correlations coefficient b/w similar approaches to measurement of different traits

Multitrait Multimethod Matrix

A composition of correlation coefficients of two or more traits and two or more methods




Monotrait-Monomethod (reliability)


Monotrait-Heteromethod (High = converg. val)


Heterotrait-Monomethod (Low = discrim val)


Heterotrait-Heteromethod (low = discrim val)

Sensitivity & Specificity

Describe how well the test discriminates b/w individuals with and without a specified condition.



Caution must be used when applying the test to samples that vary from the original population (e.g., a 5% base rate vs. the original 50% base rate would yield lower positive predictive power)

Predictive Value

PPP tells us what proportion of the time we were correct when we stated that a condition was present based on a test result.




NPP tells us what proportion of the time we were correct in stating, on the basis of our test, that an individual does not have a condition
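PPP and NPP (along with sensitivity and specificity) fall straight out of a 2x2 classification table. The counts below are hypothetical:

```python
# Hypothetical 2x2 classification table
tp, fp, fn, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)  # condition present, test positive
specificity = tn / (tn + fp)  # condition absent, test negative
ppp = tp / (tp + fp)          # positive predictive power: correct when we said "present"
npp = tn / (tn + fn)          # negative predictive power: correct when we said "absent"

print(round(sensitivity, 3), round(specificity, 3), ppp, npp)
```

Note that sensitivity/specificity condition on true status (columns of the table), while PPP/NPP condition on the test result (rows).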

Likelihood Ratios

For a positive test, compares true positives to false positives




For a negative test, compares false negatives to true negatives




A likelihood ratio of 1 = the result is equally likely whether or not the condition is present




Prevalence does NOT affect the statistic
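The two likelihood ratios reduce to simple functions of sensitivity and specificity, which is why prevalence drops out. A sketch with hypothetical values:

```python
sensitivity, specificity = 0.90, 0.80  # hypothetical test characteristics

lr_positive = sensitivity / (1 - specificity)  # true positive rate vs. false positive rate
lr_negative = (1 - sensitivity) / specificity  # false negative rate vs. true negative rate

print(lr_positive, lr_negative)  # 4.5, 0.125
```

An LR+ of 4.5 means a positive result is 4.5 times as likely in someone with the condition as in someone without it, regardless of base rate.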

Pre-Test Probability

Probability that a patient has a condition prior to knowing a test result (reflects base rate of condition)

Post-Test Probability

Probability that the patient has the condition given a positive result (how well the test rules in the condition)

Incremental Validity

Extent to which use of the test improves the post-test probability with respect to the pre-test probability




Hard to achieve with very high or very low frequency conditions - better as Base Rate approaches 50%

Choosing Cutoffs

Goal: Minimize False positive errors




Minimize false positive errors by maximizing specificity (e.g., goal for PVTs) - sacrifices sensitivity




Maximize sensitivity, which reduces specificity and raises the risk of false positive errors (e.g., when the goal is to identify all people showing any degree of impairment in order to broadly offer intervention)

Receiver Operating Characteristic Curve (ROC)

Aids in determination of the best cut-off score - the one that excludes the most people without the condition




Plot of sensitivity (Y Axis) and 1-Specificity (X Axis)




The cutoff chosen should represent the point at which the balance b/w sens. and spec. makes sense for the condition
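The ROC points can be generated by sweeping candidate cutoffs over the two groups' score distributions. The scores below are invented, and "positive" is defined as scoring at or below the cutoff (lower = more impaired):

```python
# Hypothetical scores: examinees without (controls) and with (patients) the condition
controls = [12, 14, 15, 16, 18, 19]
patients = [8, 9, 11, 13, 14, 15]  # lower scores indicate impairment

# Sweep cutoffs: a score <= cutoff is called "positive" (impaired)
for cutoff in range(8, 20):
    sens = sum(s <= cutoff for s in patients) / len(patients)
    spec = sum(s > cutoff for s in controls) / len(controls)
    # ROC plots sensitivity (y) against 1 - specificity (x)
    print(cutoff, round(sens, 2), round(1 - spec, 2))
```

Each printed row is one point on the ROC curve; the cutoff finally chosen is the point where the sensitivity/specificity trade-off suits the clinical question.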

Standard Error of Measurement (SEM)

Empirical measure of error variance around a single true score based on the theoretical distribution and classical assumptions about errors, including equal variances of error distributions.




Hard to calculate b/c true score rarely known

Standard Error of the Estimate (SEE)

SD of true scores if the observed score is held constant




-Given obtained score, clinician can calculate range of scores in which the true score is likely to fall and can interpret test performance using a confidence interval

Confidence Intervals

Allow for an estimate of uncertainty of an obtained test score based on properties of test (e.g., reliability) and properties of normative sample (e.g., distribution)




As reliability decreases, the estimated true score moves closer to the mean and the confidence band widens
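A standard way to build such an interval is SEM = SD * sqrt(1 - reliability), then obtained score +/- 1.96 * SEM for a 95% band. The SD, reliability, and obtained score below are hypothetical:

```python
import math

sd, reliability = 15, 0.91  # hypothetical test SD and reliability
obtained = 85

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
ci_low = obtained - 1.96 * sem         # 95% confidence band around the score
ci_high = obtained + 1.96 * sem
print(round(sem, 2), round(ci_low, 1), round(ci_high, 1))  # 4.5, 76.2, 93.8
```

Lowering the reliability in this sketch inflates the SEM and widens the band, matching the card's point.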





Low Score Interpretation

This decision is based on assumptions of central limit theorem and probability statistics




Abnormal scores occur with some probability in normal populations - so it is better to use the pattern of test scores for diagnostics rather than psychometric variability alone

Profile Analysis

Plotting standardized scores on a battery of tests in terms of graph or profile and making inferences about cognitive functioning and diagnosis




Interindividual and intraindividual interpretation

Statistical vs. Clinical Significance

Reliable Change Index = established minimum magnitude of change for certainty that two scores differ




If statistically different, must determine clinical meaningfulness

Reliability of Difference Scores

Reliability of difference for intra- and interindividual comparison

Reliable Change Index

Uses standard error of the difference and computes a z-score for the difference b/w scores based on normal probability distribution.




RCI falling outside +/- 1.96 reflects significant change




Must also be adjusted for practice effects
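The RCI computation can be sketched as follows, using the standard error of the difference S_diff = sqrt(2 * SEM^2). The test parameters and scores are hypothetical, and this simple version omits the practice-effect adjustment the card mentions:

```python
import math

sd, reliability = 15, 0.90  # hypothetical test SD and reliability
sem = sd * math.sqrt(1 - reliability)
s_diff = math.sqrt(2 * sem ** 2)  # standard error of the difference

score_1, score_2 = 100, 88  # baseline and retest scores
rci = (score_2 - score_1) / s_diff  # z-score for the difference
print(round(rci, 2))  # values beyond +/-1.96 would reflect significant change
```

Here the 12-point drop yields an RCI of about -1.79, which falls inside +/-1.96 and so would not, by itself, count as reliable change.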

Regression Equations

Regression Equations can be used to estimate pre-morbid intellectual functioning based on demographics and specific test performance, to account for demographics, or to assess change (factors in practice effects and regression to the mean)

Percentiles to Z-Scores

z = -3 ---> 0.13th %ile


z = -2.5 ---> 1st %ile


z = -2 ---> 2nd %ile


z = -1.5 ---> 7th


z = -1 ---> 16th


z = -0.5 ---> 31st


z = 0 ---> 50th


z = 0.5 ---> 69th


z = 1 ---> 84th


z = 1.5 ---> 93rd


z = 2.0 ---> 98th


z = 2.5 ---> 99th


z = 3 ---> 99.87th




68% of the population falls between -1 and +1 SD
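The z-to-percentile conversions in this card can be reproduced from the standard normal CDF, available in the Python standard library:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, SD 1
for z in [-3, -2, -1, 0, 1, 2, 3]:
    # CDF gives the proportion of the distribution at or below z
    print(z, round(nd.cdf(z) * 100, 2))
```

This also confirms the 68% figure: nd.cdf(1) - nd.cdf(-1) is about 0.6827.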



Approaches to NP Assessment

Flexible Battery: domains systematically screened and detailed assessment where deficits noted




Fixed Battery: e.g., Halstead-Reitan




Process: focus on understanding the behavioral processes that occur while the score is being obtained

Deficit Measurement

Contrasting scores against a comparison standard to determine reliable improvement or decline




1) Normative Comparison


2) Individual Comparison

Normative Comparison

Species Comparison Standard: a capacity shared by all healthy humans (e.g., language development)




Population Averages: Reflect Average Performance of a large sample; normal curve is expected but can be skewed [e.g., WCST FMS (+); BNT (-)]

Individual Comparison

Allows for determination of change w/in individuals - must determine pre-morbid functioning





Pre-Morbid

Can be developed from:




Historical Records


Estimates derived from psychometric methods


Behavioral Observations




See Stuckey Table 7.2

Sensitivity & Specificity

High Sensitivity: rules out the diagnosis w/negative result (Sn-Nout)




High Specificity: rules in the diagnosis w/positive result (Sp-Pin)




Prevalence rates of given condition affect positive/negative predictive power but not sens. and spec.

Post-Test Probability

Probability of patient having condition with a positive diagnostic result

A score reflects:

1) Actual brain-behavior relationship


2) measurement error


3) score bias (e.g., normative bias, culture, educ)


4) error in predicting pre-morbid ability


5) fatigue


6) poor motivation/engagement


7) pain


8) anxiety/depression

Statistical Models for Defining Impairment

1) Parametric Statistical Modeling: based on extrapolation of central limit theorem - relative to reference group




2) Bayesian Statistical Modeling: probability model using corrective variable to improve accuracy of prediction - based on individual comparison

Establishing Cut-Offs for Impairment

1) 1 SD below mean


2) -1.5 SD or -2.0 SD




Using 1 SD increases sensitivity but increases false positives / decreases specificity




Using 2 SD increases specificity but decreases sensitivity / increases false negatives




*Use same threshold throughout evaluation

Sensitivity & Specificity & Hit Rate

Sensitivity : Probability that a test identifies the presence of a condition




Specificity: Probability that the test identifies the absence of the condition




Hit Rate: Probability that a test correctly predicts the presence or absence of the condition

Z-Score Percentage Bands

0 to 1 SD: 34%; 1 to 2 SD: 13.5%; 2 to 3 SD: 2.35%; beyond 3 SD: 0.15%

3 Moments of a Distribution

1. Mean


2. Variance


3. Skew



Floor/Ceiling Effects

Truncated tails in the context of limitations in range of item difficulty.




-High Floor: large proportion of examinees obtain scores at or near lowest possible score




-Low Ceiling: a high number of examinees obtain scores at or near the highest possible score




*May limit usefulness of test w/certain populations

Normalizing Transformations

Only use if:




1) they come from a large and representative sample


-or-


2) any deviation from normality arises from defects in test rather than sample characteristics

Homoscedasticity

Uniform Variance Across the Range of Scores

Extrapolation/Interpolations

Use Multiple Regression to provide norms for missing cells, etc.

Common Sources of Bias and Error in Test/Re-Test Situations

BIAS


-Intervening Variables (surgery, medical intervention, extraneous events)


-Practice Effects


-Demographic (maturational effects, education, gender, etc.)




ERROR


-Statistical Errors (measurement error, regression to the mean)


-Random or uncontrolled events

Test Stability Coefficients

-Those provided in the manual must be used with caution, as they are normally based on normal populations with much shorter retest intervals than are typical for clinical populations




-Some evidence that duration of interval has less of an impact than subject characteristics

Adequate Reliability Coefficient?

.80 or higher needed for tests used in individual assessment




.90 or higher for tests used to make important decisions (IQ)




.95 is optimal standard




Clinically acceptable? < .60 is unreliable; .60 marginal ; > .70 Adequate; > .80 High; > .90 Very High



Limits of Reliability

It is possible to have reliable test that is not valid, but the reverse is not possible!

Obtained Scores

x (Obtained score) = t (true score) + e (error)

Reliable Change Index

Indicator of the probability that an observed difference b/w two scores from the same examinee on the same test can be attributed to measurement error.




When there is low probability, one may infer that it reflects other factors (e.g., illness, treatment effects, prior exposure to test)

Flynn Effect

General trend for increased IQs over time with each subsequent generation - estimated to contribute to an increase of 0.3 IQ points per year




-some evidence that Flynn effect is more pronounced in fluid/nonverbal tests than crystallized/verbal tests

Normal Curve demarcated by z-scores

Percentage of cases between z-score demarcations:

-3 to -2: 2.35% | -2 to -1: 13.5% | -1 to 0: 34% | 0 to +1: 34% | +1 to +2: 13.5% | +2 to +3: 2.35% (0.15% beyond each tail)



Practice Effects

When stability coefficients are low: there may be no systematic effects of prior exposure, the relationship to prior exposure may be nonlinear, or ceiling effects/restriction of range related to prior exposure may be attenuating the coefficient

Adequate Reliability

If reliability is .85, 85% of variance can be accounted for by the trait being examined (15% is error variance)




-Sattler--> .80 or higher are needed for tests used in individual assessment; .90 or above for tests used for decision making (e.g., IQ); .95 is optimal standard




Clinically reliable?


<.60 are unreliable


>.60 moderately reliable


>.70 relatively reliable (adequate)


.80-.89 high


.90+ very high

Limits of reliability

It is possible to have a reliable test that is not valid, but you cannot have a test that is valid and not reliable

Measurement Error

Estimated true score = sum of the mean score of the group to which individual belongs and the deviation of his or her obtained score from the normative mean weighted by test reliability




As reliability approaches 1, estimated true scores approach obtained scores




As reliability approaches 0, estimated true scores approach the mean test score




So, estimated true scores will always be closer to the mean than obtained scores (except when the obtained score equals the mean)
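The estimated-true-score formula described above is T_est = mean + reliability * (obtained - mean). A sketch with hypothetical values:

```python
mean, reliability = 100, 0.85  # hypothetical normative mean and reliability
obtained = 70

# Deviation from the mean, shrunk (weighted) by test reliability
estimated_true = mean + reliability * (obtained - mean)
print(estimated_true)  # 74.5 -- regressed toward the mean
```

With reliability of 1.0 the estimate would equal 70 (the obtained score); with reliability of 0 it would equal 100 (the mean), exactly as the card states.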

Standard Error of Measurement

when the sample SD and the reliability of obtained scores are known, an estimate of the SD of obtained scores about true scores may be calculated




-provides estimate of the amount of error in a person's observed score




-inversely related to the reliability of a test; the greater the reliability, the smaller the SEM, and the more confidence the examiner can have in precision of the score

Standard Error of Estimation

estimating confidence intervals for estimated true scores




determines the likely range within which true scores fall

Standard Error of Prediction

the likely range of obtained scores expected on retesting with an alternate form

Validity

the degree to which a test actually measures what it is intended to measure




validity coefficients rarely exceed .30 to .40

Sensitivity


Specificity

Sensitivity = True positive/(true positives + false negatives)




*proportion of condition of interest positive examinees who are correctly identified as such by a test




Specificity = True negative/(true negative + false positive)




*proportion of condition of interest negative examinees who are correctly classified as such by a test




Positive Likelihood Ratio = single index of overall accuracy indicating the odds that a positive test came from a COI+ examinee




As LR approaches 1, test classification approximates random assignment of examinees

PPP

Positive predictive power = probability that a pt with a positive test result really has the COI




Negative predictive power = probability that a pt with negative test result does not have the COI




when predictive power is .50, patients are approximately equally likely to be COI+ as COI-
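The dependence of predictive power on prevalence (noted in several cards above) can be shown with a Bayes-style calculation. The sensitivity, specificity, and base rates below are hypothetical:

```python
def ppp(sens: float, spec: float, base_rate: float) -> float:
    """Positive predictive power from sensitivity, specificity, and prevalence."""
    tp = sens * base_rate              # proportion of true positives
    fp = (1 - spec) * (1 - base_rate)  # proportion of false positives
    return tp / (tp + fp)

# Same test characteristics, different base rates:
print(round(ppp(0.90, 0.90, 0.50), 2))  # 0.9 at a 50% base rate
print(round(ppp(0.90, 0.90, 0.05), 2))  # ~0.32 at a 5% base rate
```

With identical sensitivity and specificity, dropping the base rate from 50% to 5% collapses PPP from .90 to about .32 - a positive result in a low-prevalence setting is more often a false positive than a true one.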