46 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
When assessing a test's split-half reliability, what effect will increasing the heterogeneity of the sample (S's) have?
Increase the reliability
What is eta squared?
A measure of effect size describing the amount of variability in the criterion accounted for by the predictor
What is Cohen's d?
Cohen's d is a measure of effect size when comparing the mean difference between two groups
ANOVA/regression : _________ :: t-test : Cohen's d
eta squared
Professor Sharp argues against raising the legal drinking age from 18 to 21 on the grounds that doing so will only encourage 18-, 19-, and 20-year-olds to drink. Apparently, Sharp is familiar with which of the following theories?


individuation
psychological reactance
inoculation
expectancy-value
reactance
How do you calculate a CI for a S's score on a test?
to calculate CI you need
S's score
SEM of the test
level of confidence desired

e.g. for a test w/ SEM of 5, if a S scored 105 and you wanted a 68% CI:
68% CI = within 1 SEM
105 +/- 5
68% CI = 100-110

for a 95 % CI (2 SEM)
105 +/- (2x5)
105 +/- 10
95 % CI = 95-115

99% CI (3 SEM)
105 +/- (3x5)
105 +/- 15
99% CI = 90-120
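The CI arithmetic above is simple enough to sketch in a few lines of Python; the score and SEM mirror the worked example on the card:

```python
# Confidence interval for an observed score: score +/- (n SEMs).
# 1, 2, and 3 SEMs correspond (roughly) to 68%, 95%, and 99% confidence.
def score_ci(score, sem, n_sems):
    half_width = n_sems * sem
    return score - half_width, score + half_width

# Worked example from the card: observed score 105, SEM = 5
print(score_ci(105, 5, 1))  # 68% CI -> (100, 110)
print(score_ci(105, 5, 2))  # 95% CI -> (95, 115)
print(score_ci(105, 5, 3))  # 99% CI -> (90, 120)
```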
What is the formula for SEM (the standard error of measurement)?
SEM = SD[sqrt(1-rxx)]

where SD = SD of test scores and rxx = the test's reliability coefficient

(Not to be confused with the standard error of the mean, sigma/sqrt N)
What formula is

sigma[sqrt(1-r2xy)]

where sigma=sd of criterion (Y) scores
and
rxy= correlation coefficient (an estimate of criterion validity)
Standard Error of Estimate

It is used to create CI's for estimates of performance on a criterion (Y variable) derived from a regression equation involving the predictor (X variable) score

CI = Yhat +/- SEe(# of SEe from mean assoc. w/ desired CI)

68% CI = 1 SEE
95% CI = 2 SEE
99% CI = 3 SEE
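A minimal Python sketch of the two steps above (the SD and validity values are hypothetical, not from the card):

```python
import math

# Standard error of estimate: SEest = SD_y * sqrt(1 - r_xy^2)
def se_estimate(sd_y, r_xy):
    return sd_y * math.sqrt(1 - r_xy ** 2)

# CI for a predicted criterion score: Yhat +/- (n SEest's)
def predicted_ci(y_hat, sd_y, r_xy, n_se):
    se = se_estimate(sd_y, r_xy)
    return y_hat - n_se * se, y_hat + n_se * se

# Hypothetical: SD_y = 10, r_xy = .60 -> SEest = 10 * sqrt(.64) = 8
print(se_estimate(10, 0.6))
print(predicted_ci(100, 10, 0.6, 2))  # 95% CI around Yhat = 100
```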
What is criterion-related validity?
Criterion-Related Validity is the extent to which a predictor (X) reliably estimates performance on a criterion (Y) using a regression equation
What is construct validity

How is it assessed?
Construct Validity refers to the extent to which a predictor measures the concept it is designed to measure.

Assess it using factor analysis or multitrait-multimethod analysis
What are the 2 types of Construct Validity, and how do you assess it?
1. Discriminant validity
2. Convergent validity

Assess it using
1. Factor Analysis
2. Multitrait Multimethod Matrix
Construct Validity = DC FM

Construction of the antenna towers on Wisconsin: DC reception of FM radio stations
What are the 2 types of Criterion Validity?

How do you assess them?
Predictive and concurrent

Predictive validity is assessed by calculating the Standard Error of Estimate (SEest)

SEest=SDy[sqrt(1-r2xy)]

where SDy = SD of criterion scores (Y scores)
and rxy = validity coefficient (a correlation coefficient; r2xy indicates how much of the variance in Y is predicted by variability in X)
Criterion Validity is PC
Reliability of a test = sum of what two things?
Communality + specificity

where communality = % of variance in the DV (Y) explained by all of the IVs (X) together, i.e., the variance a test shares with other measures (Yhat = X put into the regression equation)

and specificity = % of systematic variance unique to the test

so reliability = communality + specificity = 1 - error variance
What is Content Validity?

For what kinds of tests is it important?
Content validity is the extent to which a test's items representatively sample the content domain being measured.

It is important for:
1. Mastery tests (how much does S know about a specific Domain - like the EPPP)
2. Achievement tests (same as mastery?)
3. Developing tests of the KSA needed for a particular job (Use Job Analysis to determine the KSA)

4. job samples

It can be evaluated by asking "experts" to examine the test and see if it seems representative of the knowledge in the domain.

AND by testing to see:

Does it discriminate between novices and experts in the field?
Content Validity =
1. Discrimination between experts and novices
2. Experts' evaluation of the test
3. Correlation w/ established measures

Content Validity = DEC
What are the 3 mnemonics for the 3 types of validity?
Construct DCFM towers
Criterion is PC
Content = ED and content validity is often all that elementary and secondary mastery tests are about
Think about progression of my time at AU

Construct(ion) of DCFM towers on the way in

Criterion = PC (the criterion for being liked on college campuses: are you PC?)

ED content = what was the content of my education at AU?
What is the nature of the relationship between a test's reliability and validity?
A test's validity coefficient cannot exceed the square root of its reliability coefficient
Memorize it!

Write out the formula for this to help figure out awkwardly worded questions!


Hints:
1. reliability (r2xx) is already squared

2. the relationship involves taking the square root of something
In terms of percentile ranks, where do 1sd, 2sd and 3sd above the mean fall?
84
97.7
99.9

(these cumulative percentile ranks are different from the 68-95-99.7 "percent within" figures for the normal distribution)
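The exact percentile ranks can be checked with the standard normal CDF; a short sketch using only the Python standard library:

```python
import math

# Percentile rank of a z-score under the standard normal curve,
# using the CDF expressed via the error function.
def percentile(z):
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (1, 2, 3):
    print(z, round(percentile(z), 2))
# 1 sd -> 84.13, 2 sd -> 97.72, 3 sd -> 99.87
```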
If the assumptions for a One-Way ANOVA are violated, what test could you use instead
Kruskal-Wallis
What are the 3 assumptions when using a parametric test?
1. NORMAL POPULATION DISTRIBUTION (not too serious, as long as you can compensate by using sample size >= 25-30)

2. HOMOSCEDASTICITY (variances of the two populations are equal; if you violate this assumption, it's usually not too serious)

3. INDEPENDENT OBSERVATIONS*
*very serious - increases likelihood of Type I and Type II error

Independent Observations
What 2 assumptions do Parametric and Non-parametric tests share?
1. Random Selection from Population

2. Independent Observations
If you have ordinal data w/ 2 independent groups, what test do you use to see if there are stat. sig. differences between the groups?
Mann-Whitney U
If you have Nominal Data, what test do you use to compare cell frequencies?
Chi-Square
If the assumption of normality is violated for a one-way ANOVA, what non-parametric test could you use instead?
Kruskal-Wallis
Non-parametric test to compare 2 independent groups?
Mann-Whitney U
U looks like two lines of data if you eliminate the bottom
Non-parametric test to compare 2 dependent (repeated-measures) groups?
Wilcoxon Matched-Pairs Signed-Rank Test
What test would you substitute for an independent measures t-test if the assumption of normality were violated or the data were ordinal?
Mann-Whitney U
What test would you substitute for a repeated measures t-test (or RM ANOVA w/ 1 IV) if the data were Ordinal or if the assumption of normality was violated?
Wilcoxon Matched-Pairs Signed-Rank Test
If you want to use 2 or more continuous predictor variables (IV's) to predict performance on a nominal criterion (e.g. Yes/No or DSM-IV-TR Dx.), what test should you use?
Discriminant Function Analysis
If you want to examine the relationship between 2 or more discrete (not continuous) IV's and a dichotomous/nominal Y variable, what type of test should you use?
Logistic Regression
If you want to test a pre-defined theory about causal relationship(s) between two variables, and the relationship goes one way (X causes Y, but not vice versa), what test (type of modeling) should you use?
Path Analysis
Think of Life Path - it goes only one way
For what is the Spearman-Brown formula used?
correction for split-half reliability calculation
Split half - Spearman-Brown - alliterative;
rhythm of saying them together
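The Spearman-Brown prophecy formula itself is short; a Python sketch (the .60 split-half correlation is a hypothetical value, not from the card):

```python
# Spearman-Brown: predicted reliability when a test is lengthened by
# factor k. k = 2 corrects a split-half correlation to the reliability
# of the full-length test.
def spearman_brown(r, k=2):
    return k * r / (1 + (k - 1) * r)

# Hypothetical: split-half correlation of .60
print(round(spearman_brown(0.60), 2))  # corrected full-test reliability: 0.75
```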
If you are assessing inter-item consistency (reliability), and the variable is continuous, you would use:
Cronbach's alpha
e.g. score on MOCI for MA Thesis
If you are assessing inter-item consistency and the variable is dichotomous (e.g. yes/no), what would you use to correct for the overestimation you would expect using Cronbach's alpha?
Kuder Richardson-20
Kuder-Richardson 20 has a 2 in it and the DV on this kind of test has only 2 possible answers
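The KR-20 computation can be sketched on hypothetical 0/1 item data (subjects as rows, items as columns; the data matrix below is made up for illustration):

```python
# Kuder-Richardson 20 for dichotomously scored (0/1) items:
# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total-score variance)
def kr20(scores):
    k = len(scores[0])                       # number of items
    n = len(scores)                          # number of subjects
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n   # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n            # proportion passing item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

# Hypothetical data: 4 subjects x 4 items
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(data), 3))  # 0.667
```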
To assess inter-rater reliability, what two things could you use?
1. simple percent agreement
2. Kappa statistic
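Both statistics are easy to sketch in Python; the ratings below are made-up illustrations:

```python
from collections import Counter

# Percent agreement: proportion of cases on which the two raters match.
def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Cohen's kappa: agreement corrected for chance agreement.
def cohens_kappa(a, b):
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_chance = sum(ca[k] * cb[k] for k in set(a) | set(b)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical ratings from two raters on six cases
r1 = ["yes", "yes", "no", "no", "yes", "no"]
r2 = ["yes", "no", "no", "no", "yes", "yes"]
print(round(percent_agreement(r1, r2), 2))  # 0.67
print(round(cohens_kappa(r1, r2), 2))       # 0.33
```

Note how kappa (.33) is much lower than raw agreement (.67) once chance agreement is removed.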
How can you increase a test's reliability?
1. Increase Homogeneity of test items

2. Increase Heterogeneity of Subjects taking test

3. Increase test length
What are the three types of Validity?
CRITERION (use regression, Standard Error of Estimate)
1. Predictive
2. Concurrent

CONTENT (Ask an expert what s/he thinks)
1. Expert's opinion
2. Does it discriminate between experts and novices in the field?


CONSTRUCT
Two Types
1. Convergent
2. Discriminant
Two Ways to Assess
1. Factor Analysis
2. Multi-trait, Multi-method
Three C's w/ two types of validity each




1. Content (experts' opinion)
2. Construct
convergent validity, discriminant validity
factor analysis; multi-trait, multimethod
3. Criterion
Predictive Validity (regression, Standard Error of Estimate)
Concurrent Validity
Validity of a Predictor is limited by its
Reliability

Validity <= (sqrt Reliability)
Think about the formula for calculating the Standard Error of Estimate (for Confidence Intervals when predicting score on Criterion from predictor using a regression equation)

SEest = SDy[sqrt(1-r2xy)]
Validity of Predictor is less than or equal to the sqrt of _________

and is less than or equal to: _________
Validity of Predictor (rxy) is less than or equal to the square root of the reliability of the predictor
rxy <= sqrt(r2xx)

and is less than or equal to the square root of the product of the reliability of the predictor times the reliability of the criterion
rxy <= sqrt(r2xx*r2yy)
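The ceiling on validity can be computed directly from the formula; the reliability values below are hypothetical:

```python
import math

# Validity ceiling: r_xy <= sqrt(r_xx * r_yy). With a perfectly
# reliable criterion (r_yy = 1) this reduces to r_xy <= sqrt(r_xx).
def max_validity(r_xx, r_yy=1.0):
    return math.sqrt(r_xx * r_yy)

# Hypothetical: predictor reliability .81
print(round(max_validity(0.81), 2))        # ceiling is 0.9
# An unreliable criterion (.64) lowers the ceiling further
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```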
If you want to calculate the correlation between two variables which are both continuous, which of the following would you use?

Biserial
Point Biserial
Rho
Pearson r
Pearson r
If you want to calculate the correlation between 2 variables and one is continuous and the other is a true dichotomy, what test would you use?
Point Biserial
If something is pointed, it is specific
If you want to calculate the relationship between two rank-ordered variables, which of the following would you use?

Pearson r
Biserial
Point Biserial
rho
rho
If you want to calculate the relationship between two variables, one of which is continuous and the other of which is an artificial dichotomy, which would you use?
Biserial
In Item Response Theory, when one creates an Item Characteristic Curve (ICC), what is on the X-axis and what is on the Y-axis?
X-axis: S's ability level (low to high)

Y-axis: % of S's who get item right
When using ICC, what does the slope indicate?
Discriminative ability for S's high and low in the ability being assessed (a steep slope discriminates well, a flat slope does not)
When using ICC, how do you know what the chance of guessing correctly is?
Y-intercept
When using ICC, how do you determine how difficult an item is?
The level of ability (location on the X-axis) at which 50% of S's get the item right
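The ICC cards above can be tied together with a small sketch of a three-parameter logistic ICC (a common IRT form; the parameter values are hypothetical). One nuance: with a nonzero guessing parameter c, the curve's midpoint at difficulty b sits at (1 + c)/2 rather than exactly 50%:

```python
import math

# 3-parameter-logistic ICC: P(correct | ability theta)
#   a = discrimination (slope), b = difficulty (location on the X-axis),
#   c = guessing parameter (the curve's lower asymptote)
def icc(theta, a, b, c):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

a, b, c = 1.5, 0.0, 0.25           # hypothetical item
print(icc(b, a, b, c))             # at theta == b: (1 + c) / 2 = 0.625
print(round(icc(-5, a, b, c), 3))  # far below b: near the guessing floor
print(round(icc(5, a, b, c), 3))   # far above b: near 1.0
```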