52 Cards in this Set
- Front
- Back
What are we trying to measure in reliability |
When we get a test result for a difficult-to-measure psychological construct it may be an over- or underestimate of the construct. We need to know how much variability there is in the total test score (we need acceptable levels of consistency for scores to be meaningful). We are trying to measure a person's true score on what the test is assessing. What we actually measure is the observed score on the test |
|
What is the error of measurement |
The difference between the true score and the observed score. This error is estimated by the standard error of measurement |
|
What does the error of measurement or standard error of measurement mean |
Doesn't mean a mistake. It is the variability of observed scores around the true score |
|
What is reliability |
The degree to which test scores are free from measurement error. Test scores that are free of error are consistent and stable. The higher the reliability (0-1), the lower the measurement error and the more confidence one can have that the observed score mirrors the true score |
|
When is reliability applicable |
Whenever something is measured, reliability is an issue. Not limited to psychological tests. Ex: blood pressure readings have lower reliability than a well-constructed psychological test, and economic indicators (GDP, poverty, SES) are unreliable |
|
Who created classical test theory |
Spearman (1907). Also called the theory of true and error scores |
|
What does classical test theory assume |
Assumes that a person has a true score that could be measured if there were no errors of measurement
But there are errors
These errors are the differences between the observed score and the true score
Observed score (X) = True score (T) + Error of measurement (E)
Error of measurement (E) = Observed score (X) - True score (T) |
|
What is the underlying assumption of classical test theory |
Measurement errors are randomly distributed around the true score.
Random means that chance factors or nonsystematic error increase or decrease observed scores
If a person repeated the same test many times, the results would produce a normal distribution of errors around that person's true score (mean)
In a reliable test it is assumed that these error distributions overlap and differ only due to true scores |
|
Pooled variance of errors |
Tells us the magnitude of the variability of the sample observed scores around the true score of the sample
The pooled standard deviation from all test takers becomes the basic measure of the error present in a test |
|
Pooled standard deviation in classic test theory |
Is called the standard error of measurement. The mean of repeated testing is the true score estimate; the standard deviation of all these measurements is the standard error of measurement |
|
What is the standard error of measurement used to calculate |
Calculates the range of scores around the observed score within which the true score is likely to fall. Allows us to calculate a confidence interval around the observed score: the true score is believed to fall within plus or minus 1.96 standard errors of measurement (95% confidence). (See the sketch below.) |
|
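A minimal Python sketch of this card, assuming the standard textbook formula SEM = SD x sqrt(1 - reliability); the score, standard deviation, and reliability values below are hypothetical.

    import math

    def standard_error_of_measurement(sd, reliability):
        # SEM: the standard deviation of observed scores around the true score
        return sd * math.sqrt(1 - reliability)

    def true_score_interval(observed, sd, reliability, z=1.96):
        # 95% confidence interval for the true score around an observed score
        margin = z * standard_error_of_measurement(sd, reliability)
        return observed - margin, observed + margin

    # Hypothetical example: SD = 15, reliability = .90, observed score = 110
    print(true_score_interval(110, 15, 0.90))   # roughly (100.7, 119.3)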
What is domain sampling theory |
Classical test theory contains elements of domain sampling theory
The concern is to estimate true score from a limited sample of items where sampling from the full domain is impossible
From a sample a true score is estimated |
|
What is the main problem that domain sampling theory explores |
The problem is how much error of measurement there is in one sample of items
This is an important issue when the sample of test items is small relative to the size of the domain of items
Reliability increases as sample size approaches the size of the domain |
|
In order to overcome these issues domain sampling theory uses repeated random sampling of items from the domain |
Each test is an unbiased estimate of the true score. Due to measurement and sampling error these estimates will differ. These differences will be random and normally distributed. The mean of the correlations between the various test scores is the test reliability. One does not average the raw correlations directly: each correlation is converted to a Fisher z score, the z scores are averaged, and the average is transformed back to a correlation. (See the sketch below.) |
|
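A minimal Python sketch of the averaging step described above; the correlation values are hypothetical.

    import math

    def average_correlation(correlations):
        # Convert each r to Fisher z, average the z scores, transform back to r
        z_scores = [math.atanh(r) for r in correlations]
        mean_z = sum(z_scores) / len(z_scores)
        return math.tanh(mean_z)

    # Hypothetical correlations among repeated item samples from the same domain
    print(round(average_correlation([0.78, 0.82, 0.85]), 3))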
What does domain sampling theory allow us to do |
Allows for the calculation of the maximum unbiased reliability estimate that a test could achieve |
|
What are the 3 sources of measurement error/reliability (the higher the reliability, the lower the error) |
Content sampling error Time sampling error Other sources of error |
|
What is content sampling error |
Error that results from differences between the sample of items (test) and the domain of items it comes from
Is the largest source of error in test scores (should be a concern)
Is the easiest and most accurately estimated source of measurement error |
|
How is content sampling error determined |
Is determined by how well the domain is sampled
Do the items sample all components of the domain? Do the items test all relevant forms of knowledge?
Is estimated by analyzing the degree of similarity among the items making up the test -analyze the correlations between test items and the examinee's standing on the construct being measured |
|
What are time sampling errors |
Random changes in the test taker or testing environment impact test performance. Errors reflect random fluctuations in performance from one situation or time to another |
|
What do time sampling errors do |
They limit our ability to generalize test results across different situations. A major concern for psychological testing since tests are rarely given in exactly the same environment. Methods have been developed to estimate this error |
|
What are other possible sources of error |
Include errors in test administration and scoring. Clerical errors committed while adding up scores. Administrative errors on an individually administered test. When scoring relies heavily on the subjective judgement of the tester, subtle differences in scoring can happen -need to calculate inter-rater and inter-scorer agreement |
|
Ratio of reliability |
Reliability is usually expressed as a correlation coefficient but it is preferable to express it as the ratio of the true score variance of a sample to the observed score variance of the sample. In this ratio, reliability is the proportion of observed score variance that is accounted for by true score variance; the closer to 1, the higher the reliability. Because observed score variance equals true score variance plus error variance, the ratio can be rewritten as true score variance divided by (true score variance plus error variance). (See the sketch below.) |
|
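A minimal Python sketch of the ratio described above, using hypothetical variance values.

    def reliability_ratio(true_variance, error_variance):
        # Observed score variance = true score variance + error variance
        observed_variance = true_variance + error_variance
        # Reliability = proportion of observed variance accounted for by true variance
        return true_variance / observed_variance

    print(reliability_ratio(true_variance=80.0, error_variance=20.0))   # 0.8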
How to get an estimate of error variance |
Error variance proportion = 1 - (true score variance / observed score variance) = 1 - reliability coefficient. Ex: if the test reliability coefficient is .8 then the error is .2. Means 80% of test score variance reflects true score variability and 20% reflects random, nonsystematic error variability |
|
What is reliability index |
Reflects the correlation between true and observed scores. Can't be calculated directly because true scores are unknown. The index is equal to the square root of the reliability coefficient: if the reliability coefficient is .8, the reliability index is about .9. This means that the correlation between the observed score and the true score is about 0.9. (See the sketch below.) |
|
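A minimal Python sketch combining this card with the previous one; the reliability coefficient of .8 matches the example in the cards.

    import math

    reliability = 0.8                                # reliability coefficient from the example
    error_variance_proportion = 1 - reliability      # proportion of observed variance due to error
    reliability_index = math.sqrt(reliability)       # correlation between true and observed scores

    print(error_variance_proportion)                 # 0.2
    print(round(reliability_index, 2))               # 0.89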
There are many ways to estimate reliability |
Which way is chosen will depend on what the test is presumed to measure and what the test constructor wants to demonstrate |
|
Ways to estimate reliability |
1) test retest reliability (stability coefficient)
2) Parallel forms reliability (alternate or equivalent forms)
3) Split-half reliability |
|
What is the test retest reliability coefficient |
The same test is given to the same people at two points in time and the two sets of scores are correlated; the correlation is the stability coefficient. Assumes the construct is stable over time -not used for constructs that change over time
|
|
Error concept in test retest reliability due to time intervals |
If the time interval is short we see random fluctuations and practice effects
If the time interval is long we see random fluctuations, unknown sources of error, and changes in the construct over time
There is no single best time interval; the optimal interval is determined by the way the test results are to be used and the nature of the construct |
|
What do positive correlations in test retest reliability mean |
Generalize across time (scores are stable)
Low susceptibility to testing or test taker conditions
Generalize over testing environments |
|
Limitations of test retest correlations |
Assumes construct is stable over time -not used for constructs that change over time
Depending on the interval, correlations may be susceptible to carry over and practice effects -presence of either overestimates true reliability when effects are random
Follows classical test theory -it assumes attribute stability -variability of test scores across assessments of the construct is seen as error |
|
What is parallel forms reliability |
Two or more equivalent but different forms of a test are given over several time periods and the results are correlated
Tests must be truly parallel in terms of content, difficulty, and other relevant characteristics |
|
Why is parallel forms the most informative form of reliability for psychological studies |
1) contains estimate of consistency over time
2) contains two or more samples of items from the domain
3) can estimate error attributable to selection from item sets
4) practice or carry-over effects are reduced |
|
Two kinds of parallel testing ????. |
Concurrent reliability -when the two tests are given close in time, sources of error are due to random factors and content sampling of items |
|
The nature of items in parallel forms |
Same number of items; cover the same domain; expressed in the same way; equal difficulty |
|
Drawbacks of parallel forms reliability |
Practice or carry-over effects are reduced but not eliminated
Practice or carry-over effects change the meaning of the second or third testing
Creation of the many items needed for parallel forms is costly and time consuming |
|
What is split half reliability (internal reliability) |
Reflects errors related to content sampling.
These estimates are based on the relationship between items within the test
Test responses are split in half and the two halves are correlated
Underestimates the true reliability of the whole test -the Spearman-Brown correction is used to fix the underestimate |
|
How can tests be split |
First half/second half -if the test is long and all items are of equal difficulty
Odd/even split -if test items are of increasing difficulty, or practice effects, fatigue, or declining attention affect scores on items later in the test |
|
Spearman brown correction |
Assumes equal variances in both halves of the test
The correction underlines the general point that reliability increases as the number of items drawn from the domain increases. r_SB = 2r / (1 + r), where r is the correlation between the two half-test scores and r_SB is the corrected (full-test) split-half reliability. (See the sketch below.) |
|
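A minimal Python sketch of a split-half estimate stepped up with the Spearman-Brown formula; the half-test scores are hypothetical.

    import statistics

    def pearson(x, y):
        # Pearson correlation between two lists of scores
        mx, my = statistics.mean(x), statistics.mean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den

    def spearman_brown(half_test_r):
        # Step the half-test correlation up to an estimate for the full-length test
        return 2 * half_test_r / (1 + half_test_r)

    # Hypothetical odd-half and even-half scores for five examinees
    odd_half = [10, 14, 9, 16, 12]
    even_half = [11, 13, 10, 15, 12]
    print(round(spearman_brown(pearson(odd_half, even_half)), 3))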
KR20 (Kuder and Richardson) |
Is a measure of internal reliability. Considers all possible splits simultaneously. This reliability measure can only be used for items that can be scored in a dichotomous manner (only 2 options). Not often used today. (See the sketch below.) |
|
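A minimal Python sketch of KR-20 for dichotomously scored items; the response matrix is hypothetical, and textbooks differ on whether the total-score variance uses n or n - 1.

    def kr20(item_scores):
        # item_scores: one list of 0/1 item scores per examinee
        k = len(item_scores[0])                     # number of items
        n = len(item_scores)                        # number of examinees
        totals = [sum(person) for person in item_scores]
        mean_total = sum(totals) / n
        var_total = sum((t - mean_total) ** 2 for t in totals) / (n - 1)
        pq_sum = 0.0
        for i in range(k):
            p = sum(person[i] for person in item_scores) / n   # proportion passing item i
            pq_sum += p * (1 - p)
        return (k / (k - 1)) * (1 - pq_sum / var_total)

    # Hypothetical right/wrong responses: 5 examinees x 4 items
    responses = [
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 1],
    ]
    print(round(kr20(responses), 3))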
Coefficient alpha |
Is a measure of internal reliability. Examines the consistency of responses to all test items regardless of how those items are scored. Can be thought of as the average of all possible split-half coefficients corrected for the length of the whole test. Sensitive to content sampling measurement error like split-half reliability. Also sensitive to heterogeneity of the test content. (See the sketch below.) |
|
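A minimal Python sketch of coefficient alpha for items on any scoring scale; the rating matrix is hypothetical.

    def cronbach_alpha(item_scores):
        # item_scores: one list of item scores per examinee (any scale, not just 0/1)
        k = len(item_scores[0])

        def sample_var(values):
            m = sum(values) / len(values)
            return sum((v - m) ** 2 for v in values) / (len(values) - 1)

        item_vars = [sample_var([person[i] for person in item_scores]) for i in range(k)]
        total_var = sample_var([sum(person) for person in item_scores])
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    # Hypothetical 1-5 ratings: 4 examinees x 3 items
    ratings = [
        [4, 5, 4],
        [2, 3, 3],
        [5, 5, 4],
        [3, 3, 2],
    ]
    print(round(cronbach_alpha(ratings), 3))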
What is heterogeneity of test content |
The degree to which the test items measure unrelated characteristics. As item heterogeneity increases, the alpha coefficient decreases |
|
Relationship between KR20 and Cronbach's coefficient alpha |
KR20 is a simplified version of the alpha coefficient; the alpha coefficient reduces to KR20 when all items are dichotomous |
|
What does Cronbach's coefficient alpha tell us |
The coefficient provides a lower bound estimate of reliability. A high alpha suggests that the true reliability is at least that high; a low alpha only means that the true reliability may be higher. To overcome this issue, 95% confidence intervals around alpha can be constructed |
|
Limitations of coefficient alpha |
It assumes tau equivalence or a unidimensional factor structure: all indicators (test items) of a factor load or correlate in a similar manner on one dimension. When we don't have tau equivalence the alpha coefficient will underestimate the test's level of reliability |
|
McDonalds omega coefficient |
Does not assume tau equivalence and can be used to assess internal reliability for non-tau-equivalent items. Calculation of the omega coefficient is not straightforward as it relies on the outcome of a structural equation model; SPSS macros are available to do the calculation. (See the sketch below.) |
|
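The following Python sketch is only an illustration under simplifying assumptions (a one-factor model with standardized loadings and uncorrelated errors, so each item's error variance is 1 minus its squared loading); in practice the loadings come from a fitted structural equation model, and the loading values below are hypothetical.

    def mcdonald_omega(loadings):
        # loadings: standardized loadings from a one-factor model
        common = sum(loadings) ** 2                      # variance due to the common factor
        error = sum(1 - l ** 2 for l in loadings)        # summed unique/error variances
        return common / (common + error)

    # Hypothetical standardized loadings for a 4-item scale
    print(round(mcdonald_omega([0.70, 0.60, 0.80, 0.65]), 3))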
Sources of errors that take place when estimating reliability for behavioral observation |
Individuals scoring the test (judges)
Rating errors
Definitional issues
Ignores errors due to item sampling |
|
What must be done to estimate true scores (reliability) for observational and subjective behaviours |
Interrater reliability (inter judge, inter scorer or inter observer ratings) must be calculated |
|
Interrater reliability |
Refers to estimating the consistency among judges or raters who are evaluating the behaviour
The percentage of agreement between raters is sometimes used as a measure of interrater reliability but percentages do not take into account the chance level of agreement
The Kappa coefficient is used to account for chance agreement for ordinal level data |
|
Kappa coeffiecnt |
Indicates the actual level of agreement as a proportion of possible agreement, corrected for chance agreement. Ranges from +1 to -1 (negative values mean less agreement than expected by chance alone). Greater than .75 is exceptional, .40-.74 is satisfactory, less than .40 is poor agreement. Used when agreement is sought between two raters -with more than two raters use Fleiss' kappa or Krippendorff's alpha |
|
When can Kappa coefficients be used (When the agreement in classification is of interest) |
1) when a test is administered at two different points in time to classify people into diagnostic groups, or groups such as who to hire and who to reject -the person is classified to a group using the obtained test scores on each occasion and the degree of agreement across times is compared via Kappa
2) one could use two different tests on the same group of people at the same point in time and classify them separately using each set of test scores and then compute the cross test agreement in classification with Kappa |
|
Know |
KR20, split-half reliability and coefficient alpha are interrelated. The maximum true value of a test retest correlation cannot exceed the square root of alpha |
|
What is the formula for Kappa |
K = (proportion of observed agreement - proportion of chance agreement) / (1 - proportion of chance agreement). Equivalently, in frequencies: K = (observed agreement frequency - expected agreement frequency) / (overall total frequency - expected agreement frequency). (See the sketch below.) |
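A minimal Python sketch of Cohen's kappa for two raters, matching the frequency form of the formula above; the category labels and ratings are hypothetical.

    def cohens_kappa(rater_a, rater_b):
        # Each rater assigns every case to one category
        n = len(rater_a)
        categories = set(rater_a) | set(rater_b)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n      # observed agreement
        expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)    # chance agreement
                       for c in categories)
        return (observed - expected) / (1 - expected)

    # Hypothetical diagnostic classifications from two raters
    rater_1 = ["anxious", "depressed", "anxious", "neither", "depressed", "anxious"]
    rater_2 = ["anxious", "depressed", "neither", "neither", "depressed", "anxious"]
    print(round(cohens_kappa(rater_1, rater_2), 3))   # 0.75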