KEY TERMS: Research/Stats/Test Construction

by shyvo@yahoo.com, May 2005

Subjects: research statistics test construction

51 Cards in this Set

Front
Back

	IV vs DV	INDEPENDENT VARIABLE: what is being compared, can be manipulated or non-manipulated (pre-existing); DEPENDENT VARIABLE: outcome measure selected, can be nominal, ordinal, interval, or ratio
	True Experiment vs. Quasi-Experiment vs. Observational	TRUE EXPERIMENT: at least one IV is manipulated and subjects are randomly assigned; QUASI-EXPERIMENTAL: at least one IV is manipulated but there is non-random assignment (subjects already in pre-existing groups); OBSERVATIONAL: no intervention or manipulation, passive, non-experimental (e.g., study extent of cigarette smoking between adolescent boys and girls)
	Group vs. Single Subject Design	GROUP (NOMOTHETIC): Between groups (compares independent groups), Within subjects (correlated groups or subjects repeatedly measured), Mixed Design (groups that are independent and correlated); SINGLE SUBJECT (IDIOGRAPHIC): one or a few subjects measured repeatedly; AB design, ABAB design, Multiple Baseline, Simultaneous Tx Design
	AB vs ABAB vs Multiple Baseline vs. Simultaneous Tx vs. Changing Criterion	AB (baseline followed by tx; threat of hx); ABAB DESIGN (baseline, tx, baseline, tx; failure to return to baseline, ethics); MULTIPLE BASELINE (sequential or consecutive tx across subjects, situations, bx; time consuming and expensive); CHANGING CRITERION (attempt to change bx in increments to match changing criterion, gradual reduction)
	Simple Random Sampling vs. Stratified Random Sampling vs. Proportional Sampling vs. Cluster Sampling vs. Systematic Sampling	SIMPLE RANDOM SAMP: every member of pop has equal chance of being randomly selected; STRATIFIED RANDOM SAMP: pop divided into strata and then random samp of equal size from each stratum selected; PROPORTIONAL SAMP: individuals randomly selected in proportion to their rep in gen pop; CLUSTER SAMP: i.d. naturally occurring groups of subjects and randomly select certain clusters, then survey all members of cluster; SYSTEMATIC SAMP: selecting every kth element after a random start
	Interval Recording vs. Event Sampling	INTERVAL RECORDING: time sampling, momentary or whole-interval, good when bx is not discrete and thus has no distinct beginning or end; EVENT SAMPLING: tallying # of times that target bx occurred, good when bx is discrete and occurs relatively infrequently
	Threats to Internal Validity	History (occurrence of external, related event), Maturation, Testing (practice effects), Instrumentation (change in observer or instrument calibration), Statistical Regression (from extreme twds the mean), Selection Bias (non-random assignment), Attrition, Diffusion (no-tx group gets some of the tx by mistake)
	Threats to Construct Validity	Attention and Contact w/Clients, Experimenter Expectancies (Rosenthal Effect, cues given to subjects inadvertently), Demand Characteristics (things in procedure that suggest how subject should behave)
	Threats to External Validity	Sample Characteristics (dfncs between sample and population), Stimulus Characteristics (artificial research arrangements, non-generalizable to real world), Contextual Characteristics (reactivity--bx based on being observed, e.g., Hawthorne effect)
	Threats to Statistical Conclusion Validity	Low Power (diminished ability to find significant results, e.g., small sample size, inadequate intervention), Unreliability of Measures, Unreliability of Procedures (inconsistency in tx procedures), Subject Heterogeneity
	Nominal vs. Ordinal vs. Interval vs. Ratio Data	NOMINAL: tallying people to find non-ordered category; no inherent order; no group mean (e.g., 100 subjects, tally based on gender, race); ORDINAL: tallying people to find ordered category; no group mean (e.g., 100 subjects, tally on attitude re: abortion); INTERVAL: obtain numerical scores for each person where score values have equal intervals; no zero score or zero isn't absolute (e.g., IQ score, t-score), can calculate group mean; RATIO: obtain numerical scores for each person where score values have equal intervals and absolute zero (e.g., savings in bank, EPPP score); can calculate group mean
	Descriptive vs. Inferential Statistics	DESCRIPTIVE STATS: data simply described; INFERENTIAL STATS: goal is to make inferences about population from the sample
	Mean vs. Mode vs. Median	MEAN: arithmetic average of group of data, add up all scores and divide by # of scores; MODE: most frequently occurring score; MEDIAN: the score at the 50th percentile
	Standard Deviation vs. Range vs. Variance	STANDARD DEVIATION: measure of avg deviation (spread) from mean in given set of scores; VARIANCE: standard deviation squared; RANGE: crudest measure of variability, the difference between the lowest and highest value obtained
	Criterion-Referenced vs. Norm-Referenced Scores	CRITERION-REFERENCED SCORE (aka Domain-Referenced Score): Percentage correct; NORM-REFERENCED SCORE (aka Standard Score): Z-score, t score, IQ score, percentile rank
	Z Scores	most basic standard score; correspond directly to SD units; have mean of zero and SD of one; z-score distribution will always have the same shape as the raw score distribution; Z=(score-mean)/SD
	Standard Error of the Mean and Central Limit Theorem	STANDARD ERROR OF THE MEAN: the standard deviation of the sampling distribution of means, i.e., the average amount that sample means deviate from the population mean; CENTRAL LIMIT THEOREM: assuming an infinite number of equal-sized samples drawn from the population and their means plotted, a normal distribution of means will result
	Null Hypothesis vs. Alternative Hypothesis	NULL HYPOTHESIS: there are no differences between groups, researcher hopes to reject this statement; ALTERNATIVE HYPOTHESIS: there are differences between groups
	Rejection vs. Retention Region	REJECTION REGION: at the tail end of the curve, aka region of unlikely values, size of region corresponds to ALPHA level, when obtained values fall in this region, the null hypothesis is rejected and it is concluded that there were tx effects; RETENTION REGION: aka acceptance region, when obtained values fall in this region, the null hypothesis is accepted and it is concluded that there were no tx effects
	Alpha vs. Beta	ALPHA=size of rejection region, the greater the size, the higher likelihood of Type I error (incorrectly rejecting the null hypothesis/differences erroneously found); BETA= the probability of making a Type II error, where the null is incorrectly accepted (no dfnces found where they really did exist); THERE IS AN INVERSE RELATIONSHIP BTWN ALPHA & BETA
	Type I vs. Type II Error	TYPE I ERROR: occurs when null is incorrectly rejected (related to size of ALPHA); TYPE II ERROR: occurs when null is incorrectly accepted (BETA)
	Power	The ability to correctly reject the null; increased when sample size is large, the magnitude of the intervention is large, random error is small, stats test is parametric and one-tailed; INVERSE rltnshp with BETA (POWER=1-beta); DIRECT rltnshp with ALPHA (the more alpha, the more power)
	t-test vs. ANOVA	Parametric tests used when the DV is interval or ratio: T-TEST=One IV only (e.g., type of tx), only one or two groups compared (e.g., effectiveness of CBT vs med for depression); ANOVA: One or More IVs (e.g., type of tx) and two or more groups are compared (e.g., effectiveness of CBT vs med vs combined tx for depression)
	One-Way ANOVA vs. Factorial ANOVA	ONE-WAY ANOVA: One IV only (e.g., type of tx), two or more groups are compared (e.g., effectiveness of CBT vs med vs combined tx for depression); FACTORIAL ANOVA: Two or more IVs (e.g., type of tx and sex), data for each IV is independent (e.g., effectiveness of CBT vs med and dfncs between men and women in tx of depression)
	Split-plot ANOVA vs. Randomized block ANOVA vs. Repeated measures ANOVA	SPLIT PLOT ANOVA: Two or more IVs (e.g., type of tx and time), data for at least one IV are independent and for at least one IV are correlated (e.g., effectiveness of CBT vs med for tx of depression msrd before, during, and after tx); RANDOMIZED BLOCK ANOVA: 2 IVs, 2 groups or more per IV, both IVS=group independent, one blocked; REPEATED MEASURES FACTORIAL ANOVA: 2 IVS, 2 groups or more per IV, groups correlated
	MANOVA vs. ANCOVA	MANOVA: 2 or more DVs, 2 groups or more per IV, ind and/or corr groups, typically run when there is more than one DV (outcome measure) b/c of lower Type I error likelihood (than separate ANOVAs); ANCOVA: 2 IVS, 2 groups or more per IV, with covariate, ind and/or corr groups
	Main vs. Interaction Effects	MAIN EFFECTS (e.g., whether there were differences btwn ethnic groups or between different tx in reducing anxiety); INTERACTION EFFECTS (e.g., whether one's ethnicity affects one's response to type of tx)
	Trend Analysis	Used with quantitative IV (e.g., dosage of drug, hours of food deprivation) where the outcome is nonlinear--we are then less interested in differences between the groups and more in the trend of the data, or the ups and downs; an extension of the ANOVA
	Bivariate vs. Multivariate Correlation	BIVARIATE CORRELATION: involves two variables, X (predictor) and Y (criterion), where neither is an IV in the truest sense and what is being looked at is the relationship between the two variables, 3 basic assumptions: linear rltnshp btwn X and Y, homoscedasticity, and unrestricted range of scores on X and Y; MULTIVARIATE CORRELATION: correlation between two or more IVs (X) and one DV (Y), where Y is always interval or ratio data and at least one X is interval or ratio data
	Least Squares Criterion
	Pearson r vs. Eta vs. Biserial Correlation	PEARSON R: both variables are continuous; ETA: curvilinear relationship between X and Y ("Old Aunt Eta" with curved back); BISERIAL: one variable is artificial dichotomy and other variable is continuous ("Buy Cereal" with artificial sweeteners)
	Zero order vs. Partial vs. Semipartial Correlation	ZERO-ORDER CORRELATION: no extraneous variables affecting rltnshp between X and Y; PARTIAL CORRELATION: looking at relationship between two variables with 3rd variable removed, aka First Order correlation (e.g., correlating SAT and GPA w/o parental education variable); SEMIPARTIAL CORRELATION: aka Part, looking at rltnshp between two variables where influence of the 3rd variable is removed from only one of the variables
	Multiple R vs. Canonical vs. Discriminant vs. Loglinear Correlation	MULTIPLE R (aka Multiple Correlation): correlation btwn 2 or more IVs (X) and one DV (Y) where Y interval/ratio and at least one X is interval/ratio; compensatory and cannot be used to infer causal relationships; CANONICAL R: correlation between 2 or more IVs and 2 or more DVs; DISCRIMINANT FUNCTION ANALYSIS: 2 or more IVs and one DV (Y) but Y is nominal rather than interval/ratio; LOGLINEAR ANALYSIS: used to predict a categorical criterion (Y) based on categorical predictors (X)
	Correlation vs. Regression	CORRELATION: statistics that depict relationships between variables; REGRESSIONS: aka analyses, statistics that predict
	Path Analysis vs. LISREL	PATH ANALYSIS: statistical procedure that allows for testing a model specifying the causal links among variables by applying multiple regression techniques, straight arrows->causal rltnshps (paths), curved arrows->correlational rltnshps; LISREL: computer program used for solving path diagrams, a type of structural equation modeling, a way to determine whether or not a given model of relationships among variables is correct
	Orthogonal vs. Oblique	ORTHOGONAL ROTATIONS: axes remain perpendicular and result in factors with NO correlation w/one another (communality=related term); OBLIQUE ROTATIONS: angle between axes is non-perpendicular and factors ARE correlated
	Factor Analysis vs. Cluster Analysis	FACTOR ANALYSIS: operates by extracting as many significant factors from the data as is possible, in decreasing order of strength (e.g., an attractiveness scale that measures 17 dfnt elements of attractiveness can be factor analyzed for whether the scale is measuring one vs several dimensions underlying attractiveness); CLUSTER ANALYSIS: involves gathering data on a variety of dependent variables and statistically looking for naturally occurring subgroups in the data w/o any a priori hypotheses (e.g., entering MMPI-2 data for 100s of police and finding 3 basic profile groups)
	Reliability vs. Validity	RELIABILITY: consistency, repeatability, dependability, in scores obtained with a given test; VALIDITY: meaningfulness, usefulness, or accuracy of the test measuring what it is supposed to be measuring
	Test-Retest vs. Parallel Form vs. Internal Consistency vs. Interrater Reliability	TEST-RETEST RELIABILITY: aka the coefficient of stability, involves correlating pairs of scores from the same sample of people tested 2x w/identical test; PARALLEL FORMS REL: aka the coefficient of equivalence, admin 2 roughly equivalent tests to same group of people at 2 dfnt points in time; INTERNAL CONSISTENCY REL: looks at consistency of scores w/in the test, admin only 1x to one group of people--split test in half or use Kuder-Richardson or Cronbach's coefficient alpha; INTERRATER REL: looks at degree of rel between 2 or more scorers when test is subjectively scored
	Spearman-Brown Prophecy Formula	Tells us how much more reliable the test would be if it were longer
	Split-half vs. Coefficient alpha and Kuder-Richardson	SPLIT-HALF RELIABILITY: calculated by splitting test in half and correlating scores on 2 halves with one another, for each person who is taking the test; KUDER-RICHARDSON and CRONBACH'S COEFFICIENT ALPHA: very sophisticated forms of internal consistency reliability that essentially involve the correlation of each item with every other item in the test
	Standard Error of Measurement	The standard deviation of a theoretically normal distribution of test scores obtained by one individual on equivalent tests; when a test is totally unreliable, the stand error of msrmt would be equal to the SD of the test; b/c of st er of msrmt we report scores using confidence bands/intervals
	Calculating confidence intervals	Create bell shaped distribution, plot the person's score in the middle, and use standard error of measurement to label the values at the z-scores (from -3 to + 3), scores are reported in terms of 3 possible confidence intervals: 68% (from -1 to +1), 95% (from -2 to +2) and 99% (from -3 to +3)
	Content vs. Criterion-related vs. Construct Validity	CONTENT VALIDITY: how adequately a test samples a particular content area; CRITERION-RELATED VALIDITY: how adequately a test score can be used to infer, predict, or estimate criterion outcome (e.g., SAT scores-> GPA); CONSTRUCT VALIDITY: how adequately a test measures a construct or trait (a hypothetical concept that typically cannot be measured directly)
	Concurrent vs. Predictive validity	CONCURRENT VALIDITY: the predictor and criterion are measured and correlated at about the same time (e.g., post test and EPPP given w/in a few days of each other and correlated); PREDICTIVE VALIDITY: there is a delay between the measurement of the predictor and the criterion (e.g., using SAT scores to predict college GPA)
	Standard Error of Estimate	Amount of error in a predictor test's criterion-related validity; the standard deviation of a theoretically normal distribution of criterion scores obtained by one person measured repeatedly
	Taylor-Russell tables	A complete set of tables that numerically describe the amount of improvement in our selection decisions that will result from using a predictor test vs. no test at all: Base rate (the rate of selecting successful employees w/o any predictor test), Selection ratio (the proportion of available openings to available applicants), Incremental Validity (the amount of improvement in success rate that results from using a predictor test)
	False positives vs. True positives vs. False negatives vs. True negatives	FALSE POSITIVES: those incorrectly id as having what is being measured (not successful); TRUE POSITIVES: those correctly id as having what is being measured (successful); FALSE NEGATIVES: incorrectly id as not possessing what is being measured (successful); TRUE NEGATIVES: correctly identified as not possessing what is being measured (not successful)
	Multi-trait Multi-method matrix	A table that allows us to determine whether our test has both convergent and divergent/discriminant validity, both of which are necessary for construct validity
	Convergent vs. Divergent (discriminant) validity	TYPES OF CONSTRUCT VALIDITY=CONVERGENT VALIDITY: the correlation between scores on the new test with other available measures of the same construct, the expected correlation should be moderate to high; DIVERGENT/DISCRIMINANT VALIDITY: the correlation between scores on the new test with scores on another test that measures a divergent construct, the expected correlation should be low
	Classical test theory vs. Item response theory (item characteristics curve)	ITEM RESPONSE THEORY: it is assumed that item performance is related to the amount of the respondent's latent trait (plot of relationship between item performance and total score)
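
Worked sketch for the score and reliability cards above (Z Scores, Spearman-Brown Prophecy Formula, Standard Error of Measurement, Calculating confidence intervals, Standard Error of Estimate). Only the z-score formula is written out in the cards; the other formulas used here are the usual textbook forms and are an assumption about what the cards intend. The function names and all example numbers are hypothetical, chosen only for illustration.

```python
import math


def z_score(score, mean, sd):
    """Standard score: distance of a raw score from the mean, in SD units."""
    return (score - mean) / sd


def spearman_brown(reliability, length_factor):
    """Predicted reliability if the test were length_factor times as long.

    Assumed textbook form: k*r / (1 + (k - 1)*r).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def standard_error_of_measurement(sd, reliability):
    """Assumed textbook form: SD * sqrt(1 - reliability).

    With reliability 0 this equals the SD of the test, as the card states.
    """
    return sd * math.sqrt(1 - reliability)


def confidence_band(score, sem, z=2):
    """Band of +/- z standard errors of measurement around an obtained score.

    Per the cards: z=1 is the 68% band, z=2 the 95% band, z=3 the 99% band.
    """
    return (score - z * sem, score + z * sem)


def standard_error_of_estimate(sd_criterion, validity):
    """Assumed textbook form: SD of the criterion * sqrt(1 - validity**2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)


# Hypothetical IQ-style scale: mean 100, SD 15, reliability .91.
sem = standard_error_of_measurement(15, 0.91)
print(z_score(115, 100, 15))              # one SD above the mean
print(round(spearman_brown(0.6, 2), 2))   # doubling a test with r = .60
print(round(sem, 2))
print(tuple(round(x, 1) for x in confidence_band(110, sem, z=2)))
print(round(standard_error_of_estimate(10, 0.6), 2))
```

The bands follow the rounded -1/+1, -2/+2, -3/+3 convention used in the confidence-interval card rather than exact normal-curve cutoffs.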
