56 Cards in this Set

What is raw score?
The direct numerical report of a person’s test performance
What do raw scores refer to?
Generally, the raw score refers to the number of correct answers.
What are raw scores really?
-Raw scores are measurements; they have no inherent meaning.
-They are not all measured in the same or equal units.
-The only way we can evaluate those scores is to use a referent for interpretation.
What is a criterion referenced interpretation?
Translating a test score into a statement about the behavior to be expected of a person with that score, in relation to a specified subject matter.
How do teachers use criterion referenced interpretations?
They design and use questions that have been shown to be answered correctly by students who know the material.
In criterion referenced interpretation, what is the AME?
-Assessment: the test given
-Measurement: the scores obtained
-Evaluation: interpretation of the scores
For CR interpretation, how do we interpret scores?
-Based on the relation to the subject matter.
-Many times, this means identifying whether a person has mastered the material by comparing their score to a cutscore.
What is a norm referenced test?
Score interpretation in which a student’s or group’s performance is compared to that of a norm group.
What are we interested in when it comes to norm referenced tests?
We are interested in a norm-referenced “interpretation” of test scores
What is the AME of norm referenced interpretations?
-Assessment: the test given
-Measurement: the scores obtained
-Evaluation: interpretation of the scores

-For NR interpretation, we interpret scores by comparing a person’s test score to the test scores of the person’s peers.
To evaluate any norm referenced test, what must we do?
We must evaluate the adequacy of the norm group.
Whose responsibility is it to develop suitable norms for each of the groups with which the test is to be used?
The test publisher's.
Whose responsibility is it to use only tests normed for the particular sample in which they are interested?
The test user's.
What can happen when you use an inappropriate norm group?
Using an inappropriate norm group can have a radical impact on the interpretations of a student’s score, and the consequences of the score interpretations.
What are the four common types of norm groups?
-National norms
-State/Regional norms
-Special-Group norms
-Local norms
What is the most common norming method in educational testing?
National norms.
What is the interesting thing about national norms?
They are almost always reported separately by different age or grade levels.
What are stratified sampling techniques used for?
To ensure the adequacy of the norm group.
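One common form of stratified sampling is proportional allocation: the norm-group sample is split across strata (for example, regions) in the same proportions those strata have in the population. A minimal sketch, assuming hypothetical region names and counts and a hypothetical helper `proportional_allocation`; real norming studies usually stratify on several variables at once.

```python
from math import floor

def proportional_allocation(strata: dict[str, float], n: int) -> dict[str, int]:
    """Split a sample of size n across strata in proportion to each stratum's
    population size, using largest-remainder rounding so the counts sum to n."""
    total = sum(strata.values())
    exact = {name: n * size / total for name, size in strata.items()}
    counts = {name: floor(x) for name, x in exact.items()}
    # Give the units lost to rounding down to the strata with the largest remainders.
    leftover = n - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

# Hypothetical student counts per region (illustrative numbers only).
population = {"Region A": 1_700, "Region B": 2_100, "Region C": 3_800, "Region D": 2_400}
print(proportional_allocation(population, 2_000))
# {'Region A': 340, 'Region B': 420, 'Region C': 760, 'Region D': 480}
```

Each region's share of the 2,000-person norm sample matches its share of the (hypothetical) population.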
What must happen in order for us to compare a student’s performance to that of others from the same state/region?
The test must have been normed on people from that region originally.
What is a good example of state/regional norms?
-If using a nationally normed IQ test to place students into gifted programs, different states should not set different scores for admission to the program. Otherwise, if a child moves across state lines, he may or may not be classified as gifted.
-For this reason, nationally-normed tests are preferred.
What is a special group norm used for?
It is used for decision-making purposes when norms based on the general population would not allow us to make the necessary decisions.
What is a good example of special norm groups?
My husband applied for a job in his field. They gave him a test to determine how he compared with “typical” people who work in the field.
The test was probably not normed nationally.
Instead, it was probably normed on people in that field, because they have different knowledge and backgrounds.
What is something that is good to remember about special norm groups?
It is not appropriate for diagnostic decisions to use a test that was normed using a special population unless it will only be used with that population
When are local norms used?
They are used when we want to compare a person’s performance with that of other people locally.
How are local norms used in education?
We want to compare our students’ test performance with that of other students from the same school. They are in the same environment.
What can be misleading about local norms in education?
However, this can be misleading if the mean score of the local students varies widely from the national mean.
The relative interpretation of students’ performance may be inflated by using local norms.
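To see how local norms can inflate relative standing, the sketch below places one score against a national and a local norm distribution. Both distributions are assumed to be normal with the same SD, and the means (national 100, local 90) are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical norm distributions (illustrative numbers, not from a real test).
national = NormalDist(mu=100, sigma=15)
local = NormalDist(mu=90, sigma=15)  # a district whose mean is below the national mean

score = 100
national_pr = 100 * national.cdf(score)  # percent of the national norm group below 100
local_pr = 100 * local.cdf(score)        # percent of the local norm group below 100
print(f"National percentile rank: {national_pr:.0f}")
print(f"Local percentile rank:    {local_pr:.0f}")
```

With these assumed norms, the same score of 100 sits at about the 50th percentile nationally but about the 75th percentile locally.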
What do norm referenced interpretations do?
NR interpretations compare a person’s score to the scores of a norm group based on the distribution of the norm group.
How is the distribution of the norm group described?
The distribution of the group is described by its mean and standard deviation.
What is the shape of the distribution of scores like?
-A normal distribution.
-We assume that all items are equal in difficulty.
When a norm referenced group is used what will happen with the scores?
-When NR is used, the scores from the norm group are assumed to be normally distributed.
-We assume traits we are measuring are normally distributed in the population, based on the central limit theorem.
-Upon repeated sampling, the score distribution will become more and more normal.
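The central-limit argument can be illustrated with a small simulation: if a raw score is the number of correct answers on many independent items of equal difficulty, the total scores pile up in a roughly normal shape. The test length, item difficulty, and number of examinees below are hypothetical, and real tests need not satisfy these assumptions.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the sketch is reproducible

N_ITEMS = 50        # hypothetical test length
P_CORRECT = 0.6     # every item assumed to be equally difficult
N_EXAMINEES = 10_000

# Each raw score is the number of independent items answered correctly.
scores = [sum(random.random() < P_CORRECT for _ in range(N_ITEMS))
          for _ in range(N_EXAMINEES)]

m, sd = mean(scores), pstdev(scores)
within_1sd = sum(abs(s - m) <= sd for s in scores) / N_EXAMINEES
print(f"mean={m:.2f}, sd={sd:.2f}, share within 1 SD={within_1sd:.2f}")
```

In a normal distribution about 68% of scores fall within 1 SD of the mean; the simulated share should land close to that.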
What is important to remember when determining a scoring metric?
-There are many normal distributions, each with a unique mean and standard deviation.
-We can choose which mean and standard deviation we want for the assessment.
-To do this, we must first convert our raw scores to standard scores.
-We call these standard scores z-scores.
-Z-scores have a set mean of zero and standard deviation of one.
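The conversion from raw scores to z-scores is z = (raw - mean) / SD. A minimal sketch, assuming a small hypothetical norm group and using the population SD (`pstdev`), since the norm group is treated as the whole reference group:

```python
from statistics import mean, pstdev

def z_scores(raw: list[float]) -> list[float]:
    """Convert raw scores to z-scores using the group's own mean and SD."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD of the norm group
    return [(x - m) / sd for x in raw]

norm_group = [12, 15, 18, 20, 22, 25, 28]  # hypothetical raw scores (mean 20)
z = z_scores(norm_group)
print([round(v, 2) for v in z])
```

Whatever the raw-score scale, the resulting z-scores have a mean of 0 and an SD of 1; a raw score equal to the mean (20 here) becomes z = 0.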
Why do we convert z-scores?
-There are a number of standardized metrics that are accepted and interpretable.
-They alleviate the interpretation problems caused by negative scores and decimals.
What is important to remember about converting z-scores?
-We literally choose the mean and standard deviation we want.
-We can do this with NR because we have a normal distribution and all we are doing is changing the scale.
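Choosing a new mean and SD is a linear change of scale: new score = chosen mean + chosen SD × z. The sketch below uses the T-score metric (mean 50, SD 10) and a Wechsler-style metric (mean 100, SD 15) as example choices; the `rescale` name is just illustrative.

```python
def rescale(z: float, new_mean: float, new_sd: float) -> float:
    """Move a z-score onto a metric with a chosen mean and SD."""
    return new_mean + new_sd * z

z = -1.5
print(rescale(z, 50, 10))   # 35.0  (T-score metric)
print(rescale(z, 100, 15))  # 77.5  (Wechsler-style metric)
```

The same z-score keeps its position in the distribution; only the numbers used to report it change, which removes the negative signs and most of the decimals.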
What are the various types of Relative Status scores?
-Percentiles (percentile rank)
-Deciles
-Quartiles
-Grade equivalents
-Age equivalents
What is something interesting about relative status scores?
-There are other types of scores that are reported in NR testing systems.
-They are NOT standard scores.
-They do tell us about where a person falls relative to the scores from a norm group.
What are percentiles?
-The point in a score distribution describing the percentage of people below that point.
-e.g., If a person has a percentile rank of 46, then 46% of people have scores falling below this score.
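Following the definition above, a percentile rank is the percentage of norm-group scores that fall below a given score. This sketch counts only scores strictly below; test publishers often also credit half of any tied scores, so treat it as a simplified version. The norm group is hypothetical.

```python
def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percentage of norm-group scores that fall strictly below `score`."""
    below = sum(s < score for s in norm_scores)
    return 100 * below / len(norm_scores)

# Hypothetical norm group: 100 people with raw scores 1, 2, ..., 100.
norm_group = list(range(1, 101))
print(percentile_rank(47, norm_group))  # 46.0: 46 of the 100 scores fall below 47
```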
What are the special percentile ranks?
-50th: the median. The point at which 50% of the scores fall below and 50% of the scores fall above.
-25th, 75th: the first and third quartiles.
What is the first limitation of percentiles?
The size of percentile units is not constant in terms of standard-score units.
There are fewer people at extreme scores, so each percentile there spans a wider range of scores.
Thus, it takes a larger change in score to move up one percentile at the extremes than it does in the middle of the distribution.
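Assuming normally distributed scores, the standard normal inverse CDF (`statistics.NormalDist.inv_cdf` in Python) shows how much more ground a 10-point percentile move covers at the extreme than in the middle:

```python
from statistics import NormalDist

to_z = NormalDist().inv_cdf  # standard normal: proportion below -> z-score

middle_gap = to_z(0.60) - to_z(0.50)   # 50th -> 60th percentile
extreme_gap = to_z(0.99) - to_z(0.89)  # 89th -> 99th percentile
print(f"50th to 60th percentile: {middle_gap:.2f} SD")
print(f"89th to 99th percentile: {extreme_gap:.2f} SD")
```

Under the normal assumption, moving from the 50th to the 60th percentile takes roughly 0.25 SD, while moving from the 89th to the 99th takes roughly 1.10 SD.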
What is the second limitation with percentiles?
Gains and losses cannot be compared meaningfully because percentiles are not measured in equal units.
You cannot add, subtract, multiply, or divide percentiles.
A person at the 50th percentile does not have twice as much knowledge as a person at the 25th.
What is a great example of why percentile scores can be deceiving?
For example, a student scores 86 on a standardized mathematics test scored like a Wechsler IQ test (mean 100, SD 15).
This score is within 1 SD of the mean, which indicates average ability.
However, this score corresponds to the 17th percentile.
When we see a score of “17th percentile,” we automatically assume that a student is in BIG trouble academically.
However, a percentile rank of 17 is still in the “average” range.
It is not until approximately the 5th percentile that the score suggests special needs.
As you can see, percentiles pose BIG interpretation problems!
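The numbers in this example can be checked by assuming the test scores follow a normal distribution with mean 100 and SD 15, as on Wechsler-style scales:

```python
from statistics import NormalDist

wechsler = NormalDist(mu=100, sigma=15)  # assumes normally distributed scores

pr_86 = 100 * wechsler.cdf(86)          # percent of people scoring below 86
score_at_5th = wechsler.inv_cdf(0.05)   # score with 5% of people below it
print(f"Score 86 -> percentile {pr_86:.1f}")
print(f"5th percentile -> score {score_at_5th:.1f}")
```

This gives a percentile of about 17.5 for a score of 86, and places the 5th percentile at a score of about 75.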
What are decile and quartile scores?
-Decile and quartile ranks are very similar to percentile ranks.
-These scores try to cut out some of the interpretation problems inherent in percentile ranks.
-Decile: How many people out of 10 score below you?
-Quartile: Which quartile are you in? Only scores of 0, 1, 2, and 3 exist.
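A minimal sketch of turning a percentile rank into the decile and quartile scores described above, assuming the 0-9 and 0-3 numbering used here (many score reports number quartiles 1-4 instead). The function names are hypothetical.

```python
def decile(pr: float) -> int:
    """Decile score 0-9: how many tenths of the norm group score below you."""
    return min(int(pr // 10), 9)  # a reported rank of 100 stays in the top decile

def quartile(pr: float) -> int:
    """Quartile score 0-3: how many quarters of the norm group score below you."""
    return min(int(pr // 25), 3)

print(decile(46), quartile(46))  # 4 1
```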
What are age equivalents?
Age-equivalents: convey the meaning of test performance in terms of the typical child at a given age.
What are grade equivalents?
Grade-equivalents: provide information in terms of the typical child in a given grade.
What is one of the most common methods for reporting standardized test results prior to high school?
Grade equivalents.
What are some other types of ranked scores?
-Like percentiles, age- and grade-equivalents are two other types of derived, ranked scores.
-They are NOT standard scores.
-These two types of scoring metrics are built in generally the same way, so I will discuss grade-equivalents.
What are the steps to build grade equivalents?
To build grade-equivalents:
A norm group of students is given a test in September, at the beginning of the school year.
If a student scores at the median of the scores from the 3rd graders in the norm group, we give them a grade-equivalent score of 3.0.
This stands for: 3rd grade, 1st month of school.
If a student scores at the median of the scores from the 4th graders in the norm group, we give them a grade-equivalent score of 4.0.
This stands for: 4th grade, 1st month of school.
If a student’s score falls between these two median scores from the norm group, “interpolation” is used to determine the grade-equivalent.
Each digit after the decimal point corresponds to one of the 10 months of the school year.
3.1 refers to the average performance for 3rd graders in October.
5.7 refers to the average performance for 5th graders in March.
If a student scores halfway between the median performance of beginning 3rd and 4th grade students, we assign them a grade-equivalent of 3.5.
This score corresponds to the median performance of a 3rd grade student halfway through 3rd grade (January).
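The interpolation step can be sketched as follows. The median raw scores per grade are hypothetical, the sketch assumes the medians rise with grade, and it refuses to extrapolate outside the normed range, since grade equivalents there would only be guesses.

```python
from bisect import bisect_right

def grade_equivalent(raw: float, medians: dict[int, float]) -> float:
    """Interpolate a grade equivalent from per-grade median raw scores
    measured at the start of each grade (GE 3.0, 4.0, ...)."""
    grades = sorted(medians)
    scores = [medians[g] for g in grades]
    if not scores[0] <= raw <= scores[-1]:
        raise ValueError("score outside the normed range; a GE would require extrapolation")
    i = bisect_right(scores, raw) - 1
    if i == len(grades) - 1:            # exactly at the highest grade's median
        return float(grades[-1])
    lo, hi = scores[i], scores[i + 1]
    fraction = (raw - lo) / (hi - lo)   # how far between the two medians
    return round(grades[i] + fraction * (grades[i + 1] - grades[i]), 1)

# Hypothetical September medians for grades 3, 4, and 5.
medians = {3: 40, 4: 50, 5: 58}
print(grade_equivalent(45, medians))  # 3.5: halfway between the grade 3 and 4 medians
print(grade_equivalent(41, medians))  # 3.1
```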
What are the first five limitations with grade equivalents?
-Grade equivalents for low scores in the low grades and high scores in the high grades cannot be established directly, because no normed grade has its median at those scores; they are extrapolated from existing observations.
Basically, all we have with grade equivalents is an educated guess.
-Getting everyone to “grade level” is impossible because of how the scores are derived.
Because grade level is defined by the norm group’s median, even in grades where norms exist, about 50% of the children in a classroom are expected to score below grade level.
-Grade equivalents tend to exaggerate the significance of small differences in scores.
Because of large within-grade variability it is possible for a child only moderately below the median for his grade to appear as much as a year or two below grade level expectancies.
e.g., A 6th grader’s grade equivalent of 1.3 does not mean the child functions at the level of a 1st grader in the 4th month of school.
-Grade equivalents are not comparable across subject matter.
A person with a 6.6 in reading and a 6.2 in math can have a higher percentile score in math.
Grade equivalents are an artifact of the way the subject matter in question is measured, and of the curriculum of the particular school district.
-Grade equivalents assume that growth across years is uniform.
However, the rate of growth is actually larger for younger students.
What is interesting about other scores based on the normal distribution?
-These can be considered standard scores like the ones we discussed previously, because they are based on a distribution mean and a constant standard deviation.
-However, they are developed a little bit differently.
What is interesting about stanine scores?
-Stanines have a mean of 5 and a standard deviation of 1.96 (often rounded to 2).
-Although there is a constant mean and standard deviation, there are 9 scoring categories; categories 2 through 8 are each ½ SD wide, and categories 1 and 9 cover the tails.
What is something interesting about STEN scores?
-STEN scores have a mean of 5.5 and a standard deviation of 2, but we never talk about them that way.
-They differ from stanines in that there are 10 categories instead of 9.
How do you get a STEN score?
-The mean is the starting point, and no single category includes the mean score (it falls on the boundary between 5 and 6).
-Start at the distribution mean, and go ½ SD to each side; scores for these two categories are 5 and 6.
Continue to build categories for upper and lower scores that are ½ SD in size until you have categories for score groups from 1 to 10.
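The STEN construction above can be written as a formula on z-scores: with the mean on the boundary between 5 and 6 and categories ½ SD wide, sten = floor(2z) + 6, clipped to the range 1-10. This is a sketch under that reading; how a publisher treats a score lying exactly on a boundary may differ.

```python
import math

def sten(z: float) -> int:
    """STEN score 1-10 from a z-score.

    The mean sits on the boundary between categories 5 and 6, each category
    is 1/2 SD wide, and categories 1 and 10 absorb the tails.
    """
    return max(1, min(10, math.floor(2 * z) + 6))

for z in (-2.3, -0.2, 0.0, 0.7, 2.4):
    print(z, sten(z))
```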
How do you get a stanine score?
-Start at the distribution mean, and go ¼ SD to each side; this category is a score of 5.
-Continue to build categories for upper and lower scores that are ½ SD in size.
e.g., The score categories on either side of the mean will be scores of 4 and 6.
-Continue to build categories until you have categories for score groups from 1 to 9.
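The stanine construction can likewise be written on z-scores: category 5 runs from z = -0.25 to z = 0.25 and each further category is ½ SD wide, which gives stanine = floor(2z + 0.5) + 5, clipped to 1-9. A sketch under that reading; the example z-scores are hypothetical.

```python
import math

def stanine(z: float) -> int:
    """Stanine 1-9 from a z-score.

    Category 5 spans 1/4 SD on either side of the mean; every other
    category is 1/2 SD wide, with 1 and 9 absorbing the tails.
    """
    return max(1, min(9, math.floor(2 * z + 0.5) + 5))

# Two z-scores almost 1/2 SD apart still share the same stanine.
print(stanine(0.76), stanine(1.24))  # 7 7
```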
Why do many people not like percentiles?
-Because the relative distance between two percentile scores is uninterpretable.
In other words, what is the difference between a percentile of 66 and a percentile of 67?
-And this is not the same distance as between percentiles of 23 and 24.
Why were stanines developed exactly?
Stanines were developed to counteract some of the limitations with percentiles.
However, others feel that the categories are too big and that we don’t really know where a person ranks on the distribution given their stanine score.
For example, two people who both have stanines of 7 could have raw scores approximately ½ SD apart!