25 Cards in this Set

  • Front
  • Back

What characteristic makes a test reliable?

If the test is reasonably free from measurement error, it can be considered this.

How can we find a person's true score, or a score without error?

We cannot find this score. Classical test theory only assumes that each person has one.


(Observed score) - (True score) = ?

Measurement error. In classical test theory, an observed score is the true score plus error (X = T + E), so the difference between these two scores is the error.

According to basic sampling theory, what shape does the distribution of random error take?

A normal distribution (bell curve). The centre represents the true score, and the spread around the true score represents the error.

How can we estimate a person's true score?

We can estimate someone's true score by finding the mean of their scores after repeated testing.
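A minimal simulation of this idea, assuming (as the previous card states) that random error is normally distributed around the true score with a mean of zero. The true score of 100, the error standard deviation of 5, and the function name are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_tests, seed=0):
    """Simulate repeated testing: each observed score is the true score plus a
    random error drawn from a normal distribution with mean 0."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_tests)]


scores = simulate_observed_scores(true_score=100, error_sd=5, n_tests=10_000)
estimate = statistics.mean(scores)  # mean of the repeated observed scores
print(round(estimate, 1))  # a value very close to the true score of 100
```

With only a few repetitions the mean can still be noticeably off; the estimate improves as the number of repetitions grows.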

Why do we use a standard error of measurement?

Because it is assumed that everyone has the same distribution of random error, a single value can describe it. This basic rate of error is the standard deviation of the errors.

What is the domain sampling model?

This model considers the problems caused by using a limited number of items to represent a larger, more complicated construct.

According to the domain sampling model, what happens to reliability as the sample of items becomes larger?

Reliability increases as the number of items sampled becomes larger, because the entire domain is better represented.
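The Spearman-Brown prophecy formula, a standard classical test theory result not stated on this card, puts a number on this: it predicts the reliability of a test whose length is changed by a factor n, assuming the added items are comparable to the existing ones. A minimal sketch; the starting reliability of .60 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`
    (2.0 = twice as many comparable items, 0.5 = half as many)."""
    r, n = reliability, length_factor
    return (n * r) / (1 + (n - 1) * r)


# Hypothetical test with reliability .60, lengthened 1x to 4x.
for n in (1, 2, 3, 4):
    print(n, round(spearman_brown(0.60, n), 3))
```

The predicted reliability rises from .60 to .75 when the test is doubled, with smaller gains for each further lengthening.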



Describe Item Response Theory

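This approach models the probability that a person answers an item correctly as a function of the person's ability and the item's characteristics, such as difficulty. Computer adaptive tests based on it evaluate a person's skill level after each question and adjust the following questions accordingly. This creates a more reliable estimate of ability.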
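Item response theory covers a family of models. A minimal sketch of the simplest one, the one-parameter (Rasch) model, plus one simple adaptive rule: give next the unused item whose difficulty is closest to the current ability estimate. The item bank, function names, and selection rule are hypothetical illustrations; a real adaptive test would also re-estimate ability after every response, which is omitted here.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (1PL) model: probability that a person with ability `theta`
    answers an item of the given `difficulty` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pick_next_item(theta, difficulties, used):
    """Choose the unused item whose difficulty is closest to the current
    ability estimate; under the Rasch model that item is the most informative."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta))


item_bank = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties
print(p_correct(0.0, 0.0))  # 0.5: ability equals item difficulty
print(pick_next_item(0.8, item_bank, used={3}))  # 2: item with difficulty 0.0
```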

Which three methods are commonly used to test for reliability?

Test-retest, parallel forms, and internal consistency.

Describe the test-retest method. Under what conditions is it useful?

This method of measuring reliability is used to evaluate the error associated with administering a test at two different times. It can only be used when we are measuring a trait or characteristic that is constant over time.
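Test-retest reliability is usually reported as the correlation between the scores from the two administrations. A minimal sketch computing a Pearson correlation by hand; the scores for six hypothetical test takers are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


time1 = [12, 15, 11, 18, 14, 16]  # hypothetical scores, first session
time2 = [13, 14, 12, 19, 13, 17]  # same people, second session
r = pearson_r(time1, time2)
print(round(r, 2))
```

The same correlation can be applied to scores on two parallel forms.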

What are two reasons that test scores might change after a period of time?

1) A change in the true score
2) Measurement error

What is the carry over effect?

This effect occurs when the first testing session influences the scores from the second testing session. It is only a problem for reliability if the changes are random and unpredictable.

What are practice effects?

This is a type of carryover effect in which the test taker's skills improve with practice, so their scores increase.

What is parallel forms reliability?

This involves comparing two forms of a test that measure the same attribute. The forms use different items, but the rules used to select the items are the same.

What is the split-half method?

This method involves randomly splitting a test into two halves that are scored separately, and the scores are compared (correlated) with each other. Each half is less reliable than the whole test because it has fewer items. To correct for this, we can use the Spearman-Brown formula.
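A minimal sketch of the procedure: shuffle the items, sum each half, correlate the two half scores, then apply the Spearman-Brown correction for doubling the length, r_full = 2r / (1 + r). The function names are hypothetical, and the 0/1 answers are an idealized, invented data set.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def split_half_reliability(item_scores, seed=0):
    """item_scores: one list of item scores per person.

    Randomly split the items into two halves, correlate the two half scores,
    then apply the Spearman-Brown correction for doubling the test length.
    Returns (half-test correlation, corrected full-test estimate)."""
    n_items = len(item_scores[0])
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    half_a, half_b = order[: n_items // 2], order[n_items // 2 :]
    scores_a = [sum(person[i] for i in half_a) for person in item_scores]
    scores_b = [sum(person[i] for i in half_b) for person in item_scores]
    r_half = pearson_r(scores_a, scores_b)
    return r_half, (2 * r_half) / (1 + r_half)


# Hypothetical, idealized right/wrong (1/0) answers: 7 people x 6 items.
answers = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
]
r_half, r_full = split_half_reliability(answers)
print(round(r_half, 2), round(r_full, 2))
```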

What is the Kuder-Richardson 20 (KR20)?

This method is used to measure the reliability of a test in which the items are dichotomous (right or wrong, yes or no, etc.).
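A minimal sketch of the KR20 formula, KR20 = k/(k - 1) x (1 - Σpq / σ²), where k is the number of items, p is the proportion of people passing each item, q = 1 - p, and σ² is the variance of the total scores. This sketch assumes the population variance (dividing by N); some texts divide by N - 1, which gives slightly different values. The four-person data set is hypothetical.

```python
def kr20(responses):
    """responses: one list of 0/1 item scores per person."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_people
    # Population variance of total scores (divide by N, matching p * q).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    pq_sum = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n_people  # proportion passing item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical answers
print(kr20(data))  # 0.75
```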

What is coefficient alpha?

This is a general way to estimate reliability. It is similar to the KR20 but can be used when items are not dichotomous.
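Coefficient alpha replaces the Σpq term of the KR20 with the sum of the item variances: α = k/(k - 1) x (1 - Σσᵢ² / σ²). A minimal sketch using population variances; with 0/1 items it gives the same value as a KR20 computed the same way. The ratings are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """responses: one list of item scores per person (any numeric scale)."""
    k = len(responses[0])
    item_vars = [pvariance([person[i] for person in responses]) for i in range(k)]
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


ratings = [  # hypothetical 1-5 ratings: 5 people x 4 items
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(coefficient_alpha(ratings), 2))
```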

How is a difference score obtained, and how do you compare difference scores?

These scores are created by subtracting one test score from another. To compare them, the scores are expressed in standardized units (Z scores).
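A minimal sketch of computing difference scores in Z-score units, so that two tests on different scales can be subtracted meaningfully. The function name and the scores for five people are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to standardized units: (score - mean) / SD."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


# Hypothetical scores for the same five people on two tests with different scales.
test_a = [50, 60, 70, 80, 90]
test_b = [22, 20, 24, 28, 26]

# Difference score for each person, in Z-score units.
differences = [za - zb for za, zb in zip(z_scores(test_a), z_scores(test_b))]
print([round(d, 2) for d in differences])
```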

Why are behavioural observation methods typically unreliable?

These methods are unreliable because of discrepancies between the true scores and the scores recorded by the observer.

What is interrater/interscorer/interobserver/interjudge reliability?

This reliability considers the consistency of ratings among different judges.



What does the Kappa statistic measure?

This is a method for assessing the level of agreement among observers. It includes a correction for chance agreement. Values range from 1 (perfect agreement) to -1 (less agreement than would be expected by chance alone).
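The card does not name a specific kappa; a minimal sketch of Cohen's kappa, the common two-rater version (Fleiss' kappa extends the idea to more raters). It compares observed agreement with the agreement expected by chance from each rater's category proportions: κ = (p_o - p_e) / (1 - p_e). The ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.
    Undefined (division by zero) if chance agreement is exactly 1."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same category independently.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)


a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]  # hypothetical ratings
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
print(cohens_kappa(a, b))  # 0.75: observed agreement .875, chance agreement .5
```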

What is the standard error of measurement (SEM), and what is it used for?

This allows us to estimate the degree to which a test provides inaccurate readings. The larger the SEM, the less accurate the measurement; a small SEM tells us that an observed score is probably close to the true score. SEMs are used to create confidence intervals around specific observed scores.
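The usual classical test theory formula, not given on the card, is SEM = SD x √(1 - reliability), and a 95% confidence interval is the observed score ± 1.96 x SEM. A minimal sketch with hypothetical numbers (SD of 15, reliability of .91, observed score of 106).

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed, sd, reliability, z=1.96):
    """Interval of observed +/- z * SEM (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.91)  # 15 * 0.3 = 4.5
low, high = confidence_interval(observed=106, sd=15, reliability=0.91)
print(round(sem, 2), round(low, 1), round(high, 1))
```

A less reliable test (say .75) would give a larger SEM and therefore a wider interval around the same observed score.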

What is considered a good reliability estimate for tests?

It depends on the test. Typically, though, reliabilities between .70 and .80 are good enough for most purposes. Clinical settings require higher values (.90 and up).

What are 3 things we can do if reliability is low?

1) Increase the number of test items, because longer tests are more reliable.
2) Remove items that reduce reliability by performing factor analysis or discriminability analysis (examine the correlation between each item and the total score).
3) Use the correction for attenuation: estimate what the correlation between two variables would have been if they had not been measured with error.
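The standard correction for attenuation divides the observed correlation by the square root of the product of the two measures' reliabilities: r_corrected = r_xy / √(r_xx x r_yy). A minimal sketch; the correlation of .40 and the reliabilities of .80 and .70 are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimated correlation between two measures if both were error-free."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


corrected = correct_for_attenuation(0.40, 0.80, 0.70)
print(round(corrected, 3))  # 0.535
```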