67 Cards in this Set
 Front
 Back
Exploratory Factor Analysis

a multivariate method that allows you to explore the underlying structure of a set of variables
• provides the tools for analyzing the structure of the correlations among a large number of variables by defining sets of variables (factors) that are interrelated 
Exploratory

No a priori predictions about how the variable structure should look

Confirmatory

see if variables confirm your predictions about variable structure

Stage 1

Objectives of Factor Analysis, and how the objectives fit with the research question (RQ):
• Specify the unit of analysis
• Factor analysis outcomes
• Variable selection
Unit of Analysis

R Factor vs. Q Factor
• R factor analysis identifies latent variables (not easily observed)
• Q factor analysis is used to reduce people into groups
Factor Analysis Outcomes:
Data Summarization 
Dimensions that describe data in a small number of concepts

Data Reduction

Extends summarization by providing a factor score for each dimension (factor)

Variable Selection

Use appropriate judgment.
Avoid "garbage in, garbage out."
Stage 2: Designing a Factor Analysis
Rules of Thumb
• Calculate input data: R vs. Q
• Variable selection: mostly metric variables; ~5 variables per proposed dimension
• Sample size: 50 minimum; 100 to 200 preferred; 5x as many subjects as proposed variables; a 10:1 ratio, though 20:1 is better
Stage 3: Assumptions
Conceptual
• Some structure does exist
• Patterns are appropriate
• Homogeneous sample
EFA Assumptions
Statistical
• Normality
• Multicollinearity is desired: correlations should be > .30
• Partial correlations > .7 signal poor factorability
KMO Statistic

Predicts whether the data are likely to factor well, based on correlations and partial correlations
• Identifies which variables to drop
KMO Rules of Thumb

• > .5 is required to proceed
• > .7 or .8 is very good
• If < .5, remove the variable with the lowest KMO score, one at a time, until all KMO scores are > .5
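The iterative drop rule above can be sketched in Python. The function name and the idea of passing per-variable KMO (MSA) scores as a dict are illustrative assumptions; in real use the MSA scores would be recomputed after every drop, which this sketch skips.

```python
def drop_low_kmo(msa_scores, threshold=0.5):
    """Iteratively drop the variable with the lowest per-variable
    KMO (MSA) score until every remaining score meets the threshold.

    msa_scores: dict mapping variable name -> MSA score.
    NOTE: in practice the scores should be recomputed after each
    drop; here they are treated as fixed, for illustration only."""
    kept = dict(msa_scores)
    dropped = []
    while kept and min(kept.values()) < threshold:
        worst = min(kept, key=kept.get)   # lowest-scoring variable
        dropped.append(worst)
        del kept[worst]
    return kept, dropped
```

For example, with scores {"a": 0.8, "b": 0.4, "c": 0.6}, only "b" falls below .5 and is dropped.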
Bartlett Sphericity

Examines the entire correlation matrix
Statistical significance at p < .05 means sufficient correlations exist among the variables
Stage 4: Deriving Factors and Assessing Fit

Select Factor Method
Determining number of Factors 
Selecting Factors RULE OF THUMB

• With 30 or more variables, or communalities > .6, either extraction method works
• Use component analysis when data reduction is the goal
• Use common factor analysis when there is a more theoretical basis
Determining number of factors RULE OF THUMB

• eigenvalues > 1.0
• scree test
• enough factors to meet a specified percentage of variance explained (usually > 60%)
Interpreting the Factors Steps

1. Examine the factor loadings
2. Identify the highest loading for each variable
3. Delete cross-loadings
4. Assess communalities (remove variables < .5)
5. Label the factors
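These screening steps can be sketched as one pass over a loading matrix. The function name, the dict-of-lists input, and the cross-loading gap of .2 are illustrative assumptions (the gap is one common heuristic, not a fixed rule from the cards).

```python
def screen_loadings(loadings, load_cut=0.5, comm_cut=0.5, cross_gap=0.2):
    """Screen a factor-loading matrix per the interpretation steps.

    loadings: dict mapping variable -> list of loadings (one per factor).
    Flags variables whose communality (sum of squared loadings) falls
    below comm_cut, and cross-loaders whose two highest absolute
    loadings are within cross_gap of each other."""
    report = {}
    for var, row in loadings.items():
        absd = sorted((abs(l) for l in row), reverse=True)
        communality = sum(l * l for l in row)
        report[var] = {
            # index of the factor this variable loads highest on
            "primary_factor": max(range(len(row)), key=lambda i: abs(row[i])),
            "communality": communality,
            "low_communality": communality < comm_cut,
            "cross_loads": len(absd) > 1 and absd[0] >= load_cut
                           and absd[0] - absd[1] < cross_gap,
        }
    return report
```

A variable loading .6 and .55 on two factors would be flagged as a cross-loader; one loading .3 and .2 would be flagged for low communality.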
Process of Interpretation

Estimate the factor matrix:
• Examine factor loadings: the correlation of each variable with the factor
• Higher loadings are more representative of the factor
Factor Rotation

Reference axes are turned about the origin until some other position is reached; a graphical way to see which variables load on which factors.
• Orthogonal rotation: axes maintained at 90 degrees
 o QUARTIMAX: simplifies rows (maximizes a variable's loading on a single factor)
 o VARIMAX: simplifies columns, making the number of high loadings as few as possible (generally better results)
 o EQUIMAX: a combination of the two
• Oblique rotation: axes not maintained at 90 degrees
Factor Loading Criteria

Loadings > .5 (in absolute value) are considered practically significant (consult the literature for your specific discipline)
Loadings > .7 are indicative of well-defined structure
Sample size should be > 100 for practical significance
Validation of Factor Analysis

Confirmatory perspective
• Split the sample, or analyze with a separate sample
• Assess factor structure stability: look at sample size and the number of cases per variable
• Detect influential observations
Additional Uses of Factor Analysis Results

Select the variable with the highest loading as a surrogate representative for a particular factor dimension
Replace the original variables with a small set of variables created from summated scales:
• Cronbach's alpha > .7
• Convergent validity: correlates with other, similar scales
• Discriminant validity: differs from scales measuring other constructs
• Nomological validity: behaves like the theory that shaped it
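The alpha > .7 reliability check on a summated scale uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). A minimal sketch, with the function name and list-of-item-scores layout as assumptions:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a summated scale.

    items: list of per-item score lists, one inner list per item,
    all covering the same respondents in the same order."""
    k = len(items)                      # number of items in the scale
    n = len(items[0])                   # number of respondents
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))
```

Three perfectly correlated items yield alpha = 1.0; in practice a scale passing the .7 rule of thumb would be retained.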
Why Examine Data?

Helps build a basic understanding of the data and the relationships between variables
Ensures the data have met all the requirements for the analysis (assumptions, outliers)
First Step to Managing Data

Assess whether data was entered correctly.
Could check data against original data. 
Graphical Examination of Data

• Histogram: shows the shape of the distribution
• Scatterplot: linear or curvilinear relationship between 2 variables
• Boxplot: group differences
Missing Data

Identify the patterns associated with missing data to understand why the data are missing

Impact of missing data

Can reduce sample size
Can distort results and introduce bias 
If no Pattern is found,

• Dummy-code a variable: one group with missing values and one group with none
• Run t-tests with the other variables as the DV, comparing the missing and non-missing groups
• If no difference, you can feel safe deleting the missing cases
• If there is a difference, there are further steps to take
If pattern, or too many missing data

• Replace values with numbers from prior knowledge or an educated guess
• Replace values with the variable mean
• Replace with the group mean (little reduction in validity)
• Use regression to predict missing values
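The mean- and group-mean-substitution options above can be sketched in a few lines; the function name, the use of None for missing values, and the parallel group-label list are illustrative assumptions.

```python
def impute(values, groups=None):
    """Fill None entries with the variable mean, or with the group
    mean when group labels are supplied (per the cards, group-mean
    substitution costs little validity). Purely illustrative."""
    present = [v for v in values if v is not None]
    grand_mean = sum(present) / len(present)
    if groups is None:
        return [grand_mean if v is None else v for v in values]
    gmeans = {}
    for g in set(groups):
        gv = [v for v, gg in zip(values, groups) if gg == g and v is not None]
        # fall back to the grand mean if a group is entirely missing
        gmeans[g] = sum(gv) / len(gv) if gv else grand_mean
    return [gmeans[g] if v is None else v for v, g in zip(values, groups)]
```

Note that mean substitution shrinks the variable's variance, which is part of why the cards treat imputation choices so carefully.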
Identifying missing data

Determine type of missing data
• Ignorable (can delete)
• Nonignorable (don't delete)
Determine Extent of Missing Data

i. Under 10% missing can generally be ignored, except when it occurs in a nonrandom fashion
ii. Check that the cases with no missing data are sufficient to run the analysis
iii. Variables with >= 15% missing are candidates for deletion
iv. With > 50% missing, delete the variable unless it is essential to the model
Diagnose the Randomness of the Missing Data Processes

a. Missing at Random (MAR; not truly random)
 i. Missing values of Y depend on X, but not on Y (e.g., one gender has significantly more missing values than the other)
b. Missing Completely at Random (MCAR)
 i. Cases with missing data are indistinguishable from cases with complete data
Select the Imputation Method (estimating the missing values based on the available values)

a. MAR data process: apply a specific modeling approach (e.g., the EM approach)
b. MCAR:
 i. Use only valid data: listwise method, pairwise (all available data)
 ii. Replacement values: case substitution, hot or cold deck, mean substitution
Imputation Method Rules of Thumb

i. < 10% missing: any imputation method can be applied
ii. 10% to 20%: all-available, hot deck, or regression
iii. > 20%: regression method for MCAR; model-based method for MAR
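These rules of thumb map cleanly onto a small decision helper; the function name and the returned description strings are illustrative assumptions.

```python
def suggest_imputation(pct_missing, mechanism="MCAR"):
    """Suggest an imputation approach from the rules of thumb.

    pct_missing: fraction of data missing (0-1).
    mechanism: 'MCAR' or 'MAR' (diagnosed in the previous step)."""
    if mechanism == "MAR":
        return "model-based approach (e.g., EM)"
    if pct_missing < 0.10:
        return "any imputation method"
    if pct_missing <= 0.20:
        return "all-available, hot deck, or regression"
    return "regression method"
```

Note the MAR branch comes first: once missingness depends on other variables, the percentage thresholds matter less than the mechanism.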
Steps to Identify Missing Data

1. Determine the type of missing data (ignorable or not)
2. Determine the extent of the missing data
3. Diagnose the randomness of the missing-data processes
4. Select the imputation method (estimating the missing values based on the available values)
Outliers

• A distinct difference from other observations/responses
• Is the observation/response representative of the population?
• Check for both univariate and multivariate outliers
Reasons for Outliers

• Data entry mistake
• Missing value code not specified
• Outlier not part of the population
• Part of the population, but extreme:
 o Options: delete; change the score to a less extreme value while keeping the case; transform (if normality is then met)
Identifying Outliers 
Standard Score Rules 
• With 80 or fewer subjects, outliers are defined as standard scores > 2.5
• With larger samples, use standard scores up to 4
• If standard scores are not used, 2.5 to 4 standard deviations from the mean
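The sample-size-dependent standard-score rule can be sketched directly; the function name is an assumption, and the population standard deviation is used for simplicity.

```python
from statistics import mean, pstdev

def zscore_outliers(values, cutoff=None):
    """Flag univariate outliers by standard score, using the
    sample-size-dependent cutoff from the cards: 2.5 for 80 or
    fewer cases, 4 for larger samples (override via cutoff)."""
    if cutoff is None:
        cutoff = 2.5 if len(values) <= 80 else 4.0
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs((v - m) / s) > cutoff]
```

One caveat worth knowing: an extreme value inflates the standard deviation used to compute its own z-score, so this simple screen can miss outliers in very small samples.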
• 90/10 split

If you have a dichotomous variable with an extremely uneven split (e.g., a 90/10 split: 90% say yes and 10% say no), it will produce outliers. The only fix is to delete the variable.

• Univariate outliers

very large standardized scores (z scores greater than 3.3) and that are disconnected from the distribution

• Bivariate outliers

specific variable relationships – scatterplots with confidence intervals

• Multivariate Outliers

are found by first computing a Mahalanobis distance for each case; the Mahalanobis scores are then screened in the same manner as univariate outliers
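For two variables, the squared Mahalanobis distance can be computed by hand with the 2x2 inverse of the covariance matrix. This is a sketch for intuition only (the function name is an assumption); real screening inverts the full covariance matrix of all variables, typically with a linear-algebra library.

```python
from statistics import mean

def mahalanobis_2var(xs, ys):
    """Squared Mahalanobis distance of each case for two variables,
    using the population covariance matrix and its 2x2 inverse.
    Illustration only: limited to exactly two variables."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    det = sxx * syy - sxy * sxy            # 2x2 determinant
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, expanded for 2x2
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]
```

With uncorrelated, unit-variance variables the distance reduces to the familiar sum of squared z-scores, which is the univariate rule in disguise.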

Assumptions

Normality
• Skewness and Kurtosis
Homoscedasticity
Homogeneity of Variance
Homogeneity of Variance-Covariance Matrices
Normality

The shape of the data distribution and its correspondence to a normal distribution
• Skewness: the balance or shift of the distribution; positive (peak shifted left, tail to the right) or negative (peak shifted right, tail to the left)
 o Should be between -1 and 1
• Kurtosis: how peaked or flat the distribution is
 o Should be less than 8
Skewness

The balance or shift of the distribution. Can be positive (peak shifted left, tail to the right) or negative (peak shifted right, tail to the left).
o Should be between -1 and 1
Kurtosis

How peaked or flat the distribution is.
o Should be less than 8
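The range checks on these two cards can be verified with simple moment-based statistics. A sketch (the function name is an assumption; SPSS applies small-sample corrections, so its numbers will differ slightly):

```python
from statistics import mean, pstdev

def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis.

    Per the cards, skewness should fall between -1 and 1;
    excess kurtosis of 0 corresponds to a normal distribution."""
    m, s, n = mean(values), pstdev(values), len(values)
    skew = sum(((v - m) / s) ** 3 for v in values) / n
    kurt = sum(((v - m) / s) ** 4 for v in values) / n - 3  # excess
    return skew, kurt
```

A symmetric set of values gives skewness 0, and a flat distribution gives negative excess kurtosis.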
Homoscedasticity

• Equal variances across independent variables
• If both variables are normally distributed, then you should have homoscedasticity
Homogeneity of Variance

The variance in the DV is expected to be the same for all levels of the IV.
• Important for grouped data
• SPSS gives Levene's test as a measure of homogeneity of variance
• A Levene's test p-value above .05 indicates homogeneity; below .05, heterogeneity
Homogeneity of VarianceCovariance Matrices

Used for multivariate tests
• An entry in a variance-covariance matrix using one DV should be similar to the same entry in a matrix using another DV
• The formal test for this in SPSS is Box's M
Data Transformations

• Done to:
 o Correct violations of assumptions
 o Improve correlations between variables
• Recommended as a last resort only, because of the added difficulty of interpreting a transformed variable
• Common transformations:
 1) Square root: used for moderate skewness/deviation
 2) Logarithm: used for substantial skewness/deviation
 3) Inverse: used for extreme skewness/deviation
Transformation RULES OF THUMB

• Impact of transformation: calculate the ratio of the variable's mean to its SD
 o Noticeable effects occur when the ratio is < 4
• Apply to the IV, except when transforming for heteroscedasticity
• Use variables in untransformed format when interpreting results
Multicollinearity

• If you have a correlation between two variables that is .90 or greater

Singularity

If two variables are identical, or one is a subscale of another, they are singular

Dummy Variables

• nonmetric IV that has two (or more) distinct levels that are coded 0 and 1
• act as replacement variables so that nonmetric variables can be used as metric 
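Coding k distinct levels into k-1 indicator columns (omitting a reference level) is the standard way to make this work; a minimal sketch, with the function name and the (columns, rows) return shape as assumptions:

```python
def dummy_code(values, reference=None):
    """0/1 dummy coding of a nonmetric variable with k levels into
    k-1 indicator columns, omitting a reference level.

    Returns (column_names, rows). The omitted reference level is the
    case where every indicator is 0."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]          # default: first level alphabetically
    cols = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in cols] for v in values]
    return cols, rows
```

Using k-1 columns rather than k avoids singularity: the k-th column would be fully determined by the others.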
What is Multivariate Analysis?

• Analysis of multiple variables in a single relationship or set of relationships
• Used for measurement, predicting and explaining, and testing hypotheses 
Variate

• Linear combination of variables with empirically determined weights
• Every subject has a variate value (Y’), which is the dependent variable, or the linear combination of the entire set of variables 
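The variate value Y' is just the weighted sum of each subject's variable values; a one-function sketch (the function name is an assumption):

```python
def variate(weights, observations):
    """Variate value Y' for each subject: the empirically weighted
    linear combination of that subject's variable values.

    weights: one weight per variable.
    observations: list of per-subject variable-value lists."""
    return [sum(w * x for w, x in zip(weights, obs))
            for obs in observations]
```

For example, weights [0.5, 2.0] applied to a subject with values [2, 1] give Y' = 0.5*2 + 2.0*1 = 3.0.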
Nonmetric Scales

(Qualitative)
o Nominal: labels/categories (e.g., occupation, gender, class rank)
o Ordinal: variable with a specific order (e.g., first place, second place). Relative positions in an ordered series; distances are not equal and cannot be determined (e.g., a 7-point scale)
Metric Scales

(Quantitative)
o Interval: no natural zero point; equal differences between scale points (e.g., temperature)
o Ratio: natural zero point (e.g., money or weight)
Measurement Error

• Degree to which the observed values do not represent the true values.
• Caused by: o Data entry, imprecise measurement scales 
Validity

o Degree to which a measure accurately represents what it is supposed to represent
• e.g., measuring total income by asking only for disposable income lacks validity
• Reliability

o Degree to which measure accurately represents true value AND is error free
• If repeated measures of a variable are consistent, they are reliable 
Type 1 Error

o Probability of rejecting null hypothesis when it is true
alpha, usually .05 
• Type 2 Error

o Probability of failing to reject a null hypothesis when it is false. Chance of not finding correlation when there is a correlation.
Beta. 
• Power

o Probability of rejecting the null hypothesis when it is false.
Correctly finding a hypothesized relationship when one exists. 1 – Beta o About .8 or higher 
Power Determined by:

• Sample Size – Increase sample size, increase power
• Alpha Value – increase alpha, power increases. • Effect Size  The magnitude of the effect of interest. Whether the correlation between variables, or the observed relationship is meaningful. • Small effect sizes require larger sample sizes 
Guidelines for Multivariate Analysis

• Establish both practical significance ("So what?") and statistical significance
• Recognize that sample size affects all results
 o Power, effect size; no generalizability (small sample size); finding anything significant (large sample size, > 400)
• Know your data
 o Outliers, assumptions, missing data
• Strive for model parsimony
 o Don't include irrelevant variables, which can lead to multicollinearity (the degree to which any variable's effect can be attributed to other variables)
• Look at your errors
 o Assess the validity of measurement; unexplained relationships
• Validate results
 o Results could be specific to the sample
 o Split the sample into two subsamples and rerun the analysis
 o Get a separate sample
 o Employ bootstrapping: a large number of subsamples drawn from the sample