33 Cards in this Set

Measurement
rules for assigning numbers to qualities of objects to designate the quantity of the attribute. Attributes do not inherently have numeric values; humans invent rules to measure attributes.

Measurement requires numbers to be assigned to objects according to rules. Rules for measuring temperature, weight, and other physical attributes are familiar to us. Rules for measuring many variables for nursing studies, however, have to be created. Whether data are collected by observation, self-report, or some other method, researchers must specify the criteria according to which numbers are to be assigned.
Advantages of Measurement
-removes guesswork and ambiguity in gathering and communicating information. Consider how handicapped health care professionals would be in the absence of measures of body temperature, blood pressure, and so on. Without such measures, subjective evaluations of clinical outcomes would have to be used. Because measurement is based on explicit rules, resulting information tends to be objective, that is, it can be independently verified. Two people measuring the weight of a person using the same scale would likely get identical results. Not all measures are completely objective, but most incorporate mechanisms for minimizing subjectivity.

Measurement also makes it possible to obtain reasonably precise information. Instead of describing Nathan as “tall,” we can depict him as being 6 feet 3 inches tall. If necessary, we could achieve even greater precision. Such precision allows researchers to make fine distinctions among people with different degrees of an attribute.

Finally, measurement is a language of communication. Numbers are less vague than words and can thus communicate information more clearly. If a researcher reported that the average oral temperature of a sample of patients was “somewhat high,” different readers might develop different conceptions about the sample’s physiologic state. If the researcher reported an average temperature of 99.6°F, however, there is no ambiguity.
Levels of Measurement:

Nominal measurement
-Lowest level

-using numbers simply to categorize attributes.

-ex: gender, blood type

-numbers do not have quantitative meaning.
-ex: males assigned the number 1 and females the number 2

Nominal measures provide information only about categorical equivalence and nonequivalence, and so the numbers cannot be treated mathematically. It is nonsensical, for example, to compute the average gender of the sample by adding the numeric values of the codes and dividing by the number of participants.
Levels of Measurement:

Ordinal Measurement
Ranks objects based on their relative standing on an attribute.

-Ex: orders people from heaviest to lightest
-Ex: ability to perform activities of daily living: 1 = completely dependent; 2 = needs another person’s assistance; 3 = needs mechanical assistance; and 4 = completely independent. The numbers signify incremental ability to perform activities of daily living independently.

Ordinal measurement does not, however, tell us how much greater one level is than another. For example, we do not know if being completely independent is twice as good as needing mechanical assistance. As with nominal measures, the mathematic operations permissible with ordinal-level data are restricted.
Levels of measurement:

Interval measurement
occurs when researchers can specify the ranking of objects on an attribute and the distance between those objects.

For example, the Stanford-Binet Intelligence Scale—a standardized intelligence (IQ) test used in many countries—is an interval measure. A score of 140 on the Stanford-Binet is higher than a score of 120, which, in turn, is higher than 100. Moreover, the difference between 140 and 120 is presumed to be equivalent to the difference between 120 and 100. Interval scales expand analytic possibilities: interval-level data can be averaged meaningfully, for example. Many sophisticated statistical procedures require interval measurements.
Levels of measurement: Ratio measurement
the highest level. Ratio scales, unlike interval scales, have a rational, meaningful zero and therefore provide information about the absolute magnitude of the attribute. The Fahrenheit scale for measuring temperature (interval measurement) has an arbitrary zero point. Zero on the thermometer does not signify the absence of heat; it would not be appropriate to say that 60°F is twice as hot as 30°F. Many physical measures, however, are ratio measures with a real zero. A person’s weight, for example, is a ratio measure. It is acceptable to say that someone who weighs 200 pounds is twice as heavy as someone who weighs 100 pounds. Statistical procedures suitable for interval data are also appropriate for ratio-level data.
How to tell between levels of measurement (tip):
How can you tell the measurement level of a variable? A variable is nominal if the values could be interchanged (e.g., 1 = male, 2 = female OR 1 = female, 2 = male—the codes are arbitrary). A variable is usually ordinal if there is a quantitative ordering of values AND if there are only a small number of values (e.g., very important, important, not too important, unimportant). A variable is usually considered interval if it is measured with a composite scale or psychological test. A variable is ratio-level if it makes sense to say that one value is twice that of another (e.g., 100 mg is twice as much as 50 mg).
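To make the distinctions concrete, here is a minimal Python sketch (the values are invented, not from the text) showing which arithmetic is meaningful at each level:

```python
# Hypothetical data illustrating the four levels of measurement.
gender_codes = [1, 2, 2, 1, 2]        # nominal: 1 = male, 2 = female (codes arbitrary)
adl = [1, 4, 3, 2, 4]                 # ordinal: 1 = dependent ... 4 = independent
temps_f = [98.6, 99.1, 100.4, 98.2]   # interval: Fahrenheit zero is arbitrary
weights_lb = [120, 150, 200, 100]     # ratio: true zero, so ratios are meaningful

mean = lambda xs: sum(xs) / len(xs)
print(mean(gender_codes))             # 1.6 -- an "average gender" is nonsense
print(sorted(adl))                    # ordering is meaningful for ordinal data
print(mean(temps_f))                  # averaging is meaningful at interval level
print(weights_lb[2] / weights_lb[3])  # 2.0 -- 200 lb is twice 100 lb (ratio only)
```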
Errors of measurement:

Obtained score
Obtained score = True score ± Error

The obtained (or observed) score could be, for example, a patient’s heart rate or score on an anxiety scale.
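A minimal simulation (assumed values, not from the text) of the equation Obtained score = True score ± Error:

```python
import random

random.seed(1)
true_score = 50.0  # hypothetical true value of the attribute
# Each measurement adds a random error component to the true score.
obtained = [true_score + random.gauss(0, 3) for _ in range(5)]
errors = [round(o - true_score, 2) for o in obtained]
print([round(o, 2) for o in obtained])
print(errors)  # the error component distorts every obtained score
```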
Errors of measurement: True score and errors of measurement
is the true value that would be obtained if it were possible to have an infallible measure. The true score is hypothetical—it can never be known because measures are not infallible.

The difference between true and obtained scores is the result of distorting factors. Some errors are random or variable, whereas others are systematic, representing a source of bias. The most common factors contributing to measurement error are the following:

Situational contaminants. Scores can be affected by the conditions under which they are produced. For example, environmental factors (e.g., temperature, lighting, time of day) can be sources of measurement error.

Response-set biases. Relatively enduring characteristics of respondents can interfere with accurate measurements (see Chapter 13).

Transitory personal factors. Temporary states, such as fatigue, hunger, or mood, can influence people’s motivation or ability to cooperate, act naturally, or do their best.

Administration variations. Alterations in the methods of collecting data from one person to the next can affect obtained scores. For example, if some physiologic measures are taken before a feeding and others are taken after a feeding, then measurement errors can potentially occur.

Item sampling. Errors can be introduced as a result of the sampling of items used to measure an attribute. For example, a student’s score on a 100-item test of research methods will be influenced somewhat by which 100 questions are included.
Reliability
The reliability of a quantitative measure is a major criterion for assessing its quality. Reliability is the consistency with which an instrument measures the attribute. If a scale weighed a person at 120 pounds one minute and 150 pounds the next, we would consider it unreliable. The less variation an instrument produces in repeated measurements, the higher its reliability.

Reliability also concerns a measure’s accuracy. An instrument is reliable to the extent that its measures reflect true scores—that is, to the extent that measurement errors are absent from obtained scores. A reliable instrument maximizes the true score component and minimizes the error component of an obtained score.

Three aspects of reliability are of interest to quantitative researchers: stability, internal consistency, and equivalence.
Stability
extent to which similar results are obtained on two separate occasions. The reliability estimate focuses on the instrument’s susceptibility to extraneous influences over time, such as participant fatigue. Assessments of stability are made through test–retest reliability procedures. Researchers administer the same measure to a sample twice and then compare the scores.
Reliability coefficient
numeric index that quantifies an instrument’s reliability, to objectively determine how small the differences are. Reliability coefficients (designated as r) range from .00 to 1.00.[*] The higher the value, the more reliable (stable) is the measuring instrument. In the example shown in Table 14.1, the reliability coefficient is .95, which is quite high.
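As a sketch, a test–retest reliability coefficient can be computed as a Pearson correlation between two administrations of the same measure; the scores below are invented, not the Table 14.1 data:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    # Sample covariance divided by the product of sample standard deviations.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

time1 = [130, 122, 118, 125, 140]  # first administration (hypothetical)
time2 = [131, 120, 119, 127, 139]  # second administration (hypothetical)
print(round(pearson_r(time1, time2), 2))  # near 1.00 -> a stable instrument
```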
Test - retest reliability
Test–retest reliability is relatively easy to compute, but a major problem with this approach is that many traits do change over time, independently of the instrument’s stability. Attitudes, mood, knowledge, and so forth can be modified by experiences between two measurements. Thus, stability indexes are most appropriate for relatively enduring characteristics, such as temperament. Even with such traits, test–retest reliability tends to decline as the interval between the two administrations increases.
Internal consistency reliability.... and Coefficient alpha (Cronbach's alpha)
Internal consistency reliability is the most widely used reliability approach among nurse researchers. This approach is the best means of assessing an especially important source of measurement error in psychosocial instruments, the sampling of items. Internal consistency is usually evaluated by calculating coefficient alpha (or Cronbach’s alpha). The normal range of values for coefficient alpha is between .00 and +1.00. The higher the reliability coefficient, the more accurate (internally consistent) the measure.
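A minimal Cronbach’s alpha sketch using the standard formula alpha = k/(k−1) × (1 − Σ item variances / variance of total scores); the item responses are hypothetical:

```python
from statistics import variance

items = [  # rows = respondents, columns = scale items (e.g., 1-5 ratings)
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
]
k = len(items[0])
item_vars = [variance([row[j] for row in items]) for j in range(k)]
total_var = variance([sum(row) for row in items])
alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # closer to 1.00 -> more internally consistent
```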
Equivalence
Equivalence, in the context of reliability assessment, primarily concerns the degree to which two or more independent observers or coders agree about the scoring on an instrument. With a high level of agreement, the assumption is that measurement errors have been minimized.
Interrater (interobserver) reliability procedures
The degree of error can be assessed through interrater (or interobserver) reliability procedures, which involve having two or more trained observers or coders make simultaneous, independent observations. An index of equivalence or agreement is then calculated with these data to evaluate the strength of the relationship between the ratings. When two independent observers score some phenomenon congruently, the scores are likely to be accurate and reliable.
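As an illustration, agreement indexes can be computed from paired ratings. Cohen’s kappa is one widely used index, though the text does not name a specific one, and the ratings below are invented:

```python
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # coder A: 1 = behavior present
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # coder B, rating independently

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement

# Kappa corrects raw agreement for agreement expected by chance,
# estimated from each rater's marginal proportions.
p_a1, p_b1 = sum(rater_a) / n, sum(rater_b) / n
expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
kappa = (observed - expected) / (1 - expected)
print(observed, round(kappa, 2))  # 0.8 0.58
```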
Random...
[*] Computation procedures for reliability coefficients are not presented in this textbook, but formulas can be found in Polit (1996) or Waltz et al. (2005). Although reliability coefficients can technically be less than .00 (i.e., a negative value), they are almost invariably a number between .00 and 1.00.

Review page 376.
Validity
important criterion for evaluating a quantitative instrument

degree to which an instrument measures what it is supposed to measure. When researchers develop an instrument to measure hopelessness, how can they be sure that resulting scores validly reflect this construct and not something else, such as depression?

Reliability and validity are not totally independent qualities of an instrument. A measuring device that is unreliable cannot possibly be valid. An instrument cannot validly measure an attribute if it is erratic and inaccurate. An instrument can, however, be reliable without being valid. Suppose we wanted to assess patients’ anxiety by measuring the circumference of their wrists. We could obtain highly accurate and precise measurements of wrist circumferences, but such measures would not be valid indicators of anxiety. Thus, the high reliability of an instrument provides no evidence of its validity; low reliability of a measure is evidence of low validity.

As with reliability, validity has a number of aspects. One aspect is known as face validity. Face validity refers to whether the instrument looks as though it is measuring the appropriate construct, especially to people who will be completing the instrument.

Three other aspects of validity are of greater importance in assessments of an instrument: content validity, criterion-related validity, and construct validity.
Content Validity
concerns the degree to which an instrument has an appropriate sample of items for the construct being measured and adequately covers the construct domain. Content validity is crucial for tests of knowledge, where the content validity question is: “How representative are the questions on this test of the universe of questions on this topic?”

Content validity is also relevant in measures of complex psychosocial traits. Researchers designing a new instrument should begin with a thorough conceptualization of the construct so the instrument can capture the full content domain. Such a conceptualization might come from rich first-hand knowledge, an exhaustive literature review, or findings from a qualitative inquiry.

An instrument’s content validity is necessarily based on judgment. No totally objective methods exist for ensuring the adequate content coverage of an instrument, but it is increasingly common to use a panel of substantive experts to evaluate the content validity of new instruments. Researchers typically calculate a content validity index (CVI) that indicates the extent of expert agreement. We have suggested a CVI value of .90 as the standard for establishing excellence in a scale’s content validity (Polit & Beck, 2006).
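A minimal CVI sketch, assuming the common convention that an item counts as endorsed when an expert rates it 3 or 4 on a 4-point relevance scale (the ratings are hypothetical):

```python
ratings = {  # item -> one relevance rating (1-4) per expert (5 experts)
    "item1": [4, 4, 3, 4, 4],
    "item2": [4, 3, 4, 4, 3],
    "item3": [2, 4, 3, 4, 4],
}
# Item CVI: proportion of experts judging the item relevant (rating 3 or 4).
item_cvi = {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}
# Scale CVI: average of the item CVIs.
scale_cvi = sum(item_cvi.values()) / len(item_cvi)
print(item_cvi)
print(round(scale_cvi, 2))  # compare with the .90 standard for excellence
```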
Criterion Validity
In criterion-related validity assessments, researchers seek to establish a relationship between scores on an instrument and some external criterion. The instrument, whatever abstract attribute it is measuring, is said to be valid if its scores correspond strongly with scores on the criterion.

After a criterion is established, validity can be estimated easily. A validity coefficient is computed by using a mathematic formula that correlates scores on the instrument with scores on the criterion variable. The magnitude of the coefficient is an estimate of the instrument’s validity. These coefficients (r) range between .00 and 1.00, with higher values indicating greater criterion-related validity. Coefficients of .70 or higher are desirable.

Sometimes a distinction is made between two types of criterion-related validity: predictive validity and concurrent validity.


Validation via the criterion-related approach is most often used in applied or practically oriented research. Criterion-related validity is helpful in assisting decision makers by giving them some assurance that their decisions will be effective, fair, and, in short, valid.
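As a sketch, the validity coefficient is an ordinary correlation between instrument scores and criterion scores; both score sets below are invented:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    # Sample covariance divided by the product of sample standard deviations.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

scale_scores = [10, 14, 9, 18, 12, 16]   # hypothetical instrument scores
criterion = [24, 30, 21, 36, 25, 32]     # hypothetical criterion measure
print(round(pearson_r(scale_scores, criterion), 2))  # .70 or higher is desirable
```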
Predictive Validity
Predictive validity refers to an instrument’s ability to differentiate between people’s performances or behaviors on a future criterion. When a school of nursing correlates students’ incoming high school grades with their subsequent grade-point averages, the predictive validity of high school grades for nursing school performance is being evaluated.
Concurrent Validity
refers to an instrument’s ability to distinguish among people who differ in their present status on some criterion. For example, a psychological test to differentiate between patients in a mental institution who could and could not be released could be correlated with current behavioral ratings of health care personnel. The difference between predictive and concurrent validity, then, is the difference in the timing of obtaining measurements on a criterion.
Construct validity
a key criterion for assessing the quality of a study, and construct validity has most often been linked to measurement issues. The key construct validity questions with regard to measurement are: “What is this instrument really measuring?” and “Does it validly measure the abstract concept of interest?” The more abstract the concept, the more difficult it is to establish construct validity, however; at the same time, the more abstract the concept, the less suitable it is to rely on criterion-related validity. What objective criterion is there for such concepts as empathy, role conflict, or separation anxiety?

Construct validation is essentially a hypothesis-testing endeavor, typically linked to a theoretical perspective about the construct. In validating a measure of death anxiety, we would be less concerned with its relationship to a criterion than with its correspondence to a cogent conceptualization of death anxiety.

Construct validation can be approached in several ways, but it always involves logical analysis and testing relationships predicted on the basis of firmly grounded conceptualizations. Constructs are explicated in terms of other abstract concepts; researchers make predictions about the manner in which the target construct will function in relation to other constructs.

One approach to construct validation is the known-groups technique.

Another method involves examining relationships based on theoretical predictions.

Another approach to construct validation employs a statistical procedure known as factor analysis.

In summary, construct validation employs both logical and empirical procedures. As with content validity, construct validity requires a judgment pertaining to what the instrument is measuring. Construct validity and criterion-related validity share an empirical component, but, in the latter case, there is a pragmatic, objective criterion with which to compare a measure rather than a second measure of an abstract theoretical construct.
One approach to construct validation is the known-groups technique.
groups that are expected to differ on the target attribute are administered the instrument, and group scores are compared. For instance, in validating a measure of fear of the labor experience, the scores of primiparas and multiparas could be contrasted. Women who had never given birth would likely experience more anxiety than women who had already had children; one might question the validity of the instrument if such differences did not emerge.
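A minimal known-groups sketch (hypothetical scores; the text does not prescribe a specific statistical test, so an independent-samples t-test is assumed here):

```python
from statistics import mean
from scipy import stats  # assumes SciPy is installed

primiparas = [32, 28, 35, 30, 33, 29]  # no prior births: expect higher fear scores
multiparas = [22, 25, 20, 24, 21, 23]  # prior births: expect lower fear scores

print(mean(primiparas), mean(multiparas))
res = stats.ttest_ind(primiparas, multiparas)
# A significant difference in the predicted direction supports validity.
print(round(res.statistic, 2), round(res.pvalue, 4))
```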
Another approach to construct validation: examining relationships based on theoretical predictions.
Researchers might reason as follows: According to theory, construct X is related to construct Y; scales A and B are measures of constructs X and Y, respectively; scores on the two scales are related to each other, as predicted by the theory; therefore, it is inferred that A and B are valid measures of X and Y. This logical analysis is fallible, but it does offer supporting evidence.
Another approach to construct validation employs a statistical procedure known as factor analysis,
which is a method for identifying clusters of related items on a scale. The procedure is used to identify and group together different measures of some underlying attribute and to distinguish them from measures of different attributes.
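A minimal factor analysis sketch; scikit-learn’s FactorAnalysis is assumed as the implementation, and the 100 × 6 response matrix is simulated to contain two item clusters:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
f1 = rng.normal(size=100)  # latent attribute 1
f2 = rng.normal(size=100)  # latent attribute 2
noise = lambda: rng.normal(scale=0.3, size=100)
# Items 0-2 are driven by factor 1; items 3-5 by factor 2.
X = np.column_stack([f1 + noise(), f1 + noise(), f1 + noise(),
                     f2 + noise(), f2 + noise(), f2 + noise()])

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(np.round(fa.components_, 2))  # loadings reveal the two item clusters
```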
Interpretation of validity
As with reliability, validity is not an all-or-nothing characteristic of an instrument. An instrument does not possess or lack validity; it is a question of degree. An instrument’s validity is not proved, established, or verified but rather is supported to a greater or lesser extent by evidence.

Strictly speaking, researchers do not validate an instrument but rather an application of it. A measure of anxiety may be valid for presurgical patients on the day of an operation but may not be valid for nursing students on the day of a test. Of course, some instruments may be valid for a wide range of uses with different types of samples, but each use requires new supporting evidence. The more evidence that can be gathered that an instrument is measuring what it is supposed to be measuring, the more confidence people will have in its validity.
Sensitivity
the ability of a measure to identify a “case” correctly, that is, to screen in or diagnose a condition correctly. A measure’s sensitivity is its rate of yielding “true positives.”
Specificity
measure’s ability to identify noncases correctly, that is, to screen out those without the condition. Specificity is an instrument’s rate of yielding “true negatives.” To determine an instrument’s sensitivity and specificity, researchers need a reliable and valid criterion of “caseness” against which scores on the instrument can be assessed.
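A minimal sketch computing both indexes from a hypothetical 2 × 2 table of instrument classifications versus criterion “caseness”:

```python
# Hypothetical counts against a valid criterion of caseness.
tp, fn = 45, 15   # criterion-positive people: correctly flagged vs. missed
fp, tn = 20, 120  # criterion-negative people: false alarms vs. correctly cleared

sensitivity = tp / (tp + fn)  # rate of true positives
specificity = tn / (tn + fp)  # rate of true negatives
print(round(sensitivity, 2), round(specificity, 2))  # 0.75 0.86
```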
Tip on sensitivity and specificity
There is, unfortunately, a tradeoff between the sensitivity and specificity of an instrument. When sensitivity is increased to include more true-positive cases, the number of false-positive cases also increases. Therefore, a critical task is to develop the appropriate cut-off point (i.e., the score value used to distinguish cases and noncases). Instrument developers use sophisticated procedures to make such a determination.
Likelihood ratios
have come into favor because they summarize the relationship between sensitivity and specificity in a single number. The likelihood ratio addresses the question, “How much more likely are we to find that an indicator is positive among those with the outcome of concern compared with those for whom the indicator is negative?” For a positive test result, the likelihood ratio (LR+) is the ratio of true-positive results to false-positive results. The formula for LR+ is sensitivity divided by (1 minus specificity). For the data in Table 14.2, LR+ is 2.99: we are about three times as likely to find that a self-report of smoking is for a true smoker as for a nonsmoker. For a negative test result, the likelihood ratio (LR–) is the ratio of false-negative results to true-negative results; its formula is (1 minus sensitivity) divided by specificity. For the data in Table 14.2, LR– is .60. In this example, we are about half as likely to find that a self-report of nonsmoking is false as we are to find that it reflects a true nonsmoker. When a test is high on both sensitivity and specificity (which is not especially true in our example), the LR+ value is high and the LR– value is low.
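A short sketch of the two formulas; the sensitivity and specificity values below are assumed, chosen to reproduce the LR+ of 2.99 and LR– of .60 cited for Table 14.2, since the table itself is not shown here:

```python
# LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity.
# Values assumed to match the LR+ = 2.99 and LR- = .60 cited in the text.
sensitivity, specificity = 0.50, 0.833
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity
print(round(lr_pos, 2), round(lr_neg, 2))  # 2.99 0.6
```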
Guidelines for critiquing data quality in quantitative studies
Page 383