67 Cards in this Set

  • Front
  • Back
Exploratory Factor Analysis
a multivariate method that allows you to explore the underlying structure of a set of variables

• provides the tools for analyzing the structure of the correlations among a large number of variables by defining sets of interrelated variables (factors)
Exploratory
no a priori predictions about what the variable structure should look like
Confirmatory
tests whether the variables confirm your predictions about the variable structure
Stage 1
Objectives of Factor Analysis - How objectives fit with the RQ

Specify Unit of Analysis
Factor Analysis Outcomes
Variable Selection
Unit of Analysis
R factor analysis vs. Q factor analysis

R factor analysis identifies latent variables (constructs that are not easily observed)

Q factor analysis is used to reduce people into groups
Factor Analysis Outcomes:

Data Summarization
Dimensions that describe data in a small number of concepts
Data Reduction
Extends summarization by providing factor score for each dimension (factor)
Variable Selection
Use appropriate judgment

Avoid garbage in, garbage out
Stage 2: Designing a Factor Analysis

Rules of Thumb
Calculate input data - R vs. Q
Variable selection - mostly metric variables
~5 variables per proposed dimension
Sample size - 50 minimum; 100 to 200 preferred
At least 5x as many observations as variables
A 10:1 ratio is preferred; 20:1 is better
Stage 3: Assumptions

Conceptual
Some structure does exist

Patterns are appropriate

Homogeneous sample
EFA Assumptions

Statistical
Normality

Multicollinearity is desired
- Correlations should be > .30
- Partial correlations should be small; high partial correlations (> .7) suggest the data may not factor well
KMO Statistic
Predicts whether the data are likely to factor well, based on the correlations and partial correlations

- Identifies which variables to drop
KMO Rules of Thumb
> .5 is required to proceed

> .7 or .8 is very good

If < .5, remove the variable with the lowest KMO score, one at a time, until all KMO scores are > .5
Bartlett Sphericity
Examines entire correlation matrix

Statistically significant at p < .05, meaning sufficient correlations exist among the variables to proceed
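For illustration, a minimal sketch of both factorability checks using the third-party factor_analyzer package; the DataFrame here is a random placeholder:

```python
# KMO and Bartlett checks on placeholder data
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

# Per-variable KMO scores and the overall KMO statistic
kmo_per_variable, kmo_overall = calculate_kmo(df)
print(f"Overall KMO: {kmo_overall:.3f}")  # > .5 required; .7-.8 is very good

# Bartlett's sphericity: p < .05 means enough correlation to proceed
chi_square, p_value = calculate_bartlett_sphericity(df)
print(f"Bartlett chi-square = {chi_square:.1f}, p = {p_value:.4f}")
```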
Stage 4: Deriving Factors and Assessing Fit
Select Factor Method

Determining number of Factors
Selecting Factors RULE OF THUMB
With 30 or more variables, or communalities > .6 for most variables, either extraction method yields similar results
- Use component analysis when data reduction is the priority
- Use common factor analysis when there is a stronger theoretical basis
Determining number of factors RULE OF THUMB
eigenvalues > 1.0
scree test for common variance
retain enough factors to meet the specified percentage of variance explained (usually > 60%)
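A minimal numpy sketch of the eigenvalue-greater-than-1 (latent root) rule; X is a placeholder (subjects x variables) data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # placeholder data

corr = np.corrcoef(X, rowvar=False)           # correlation matrix of the variables
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first

n_factors = int((eigenvalues > 1.0).sum())    # latent root (Kaiser) criterion
print("Eigenvalues:", np.round(eigenvalues, 2))  # plot these for a scree test
print("Factors with eigenvalue > 1:", n_factors)
```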
Interpreting the Factors Steps
- Examine the factor loadings
- Identify highest loading
- Delete cross-loadings
- Assess communalities (remove <.5)
- Label the factors
Process of Interpretation
Estimate the factor matrix:
- Examine the factor loadings - the correlation between each variable and the factor
- Higher loadings = more representative of the factor
Factor Rotation
Reference axes are turned about the origin until some other position is reached. A graphical way to see which variables correlate with which factors.

Oblique Rotation - axes not maintained at 90 degrees

Orthogonal Rotation - axes kept at 90 degrees
• QUARTIMAX - simplifies rows (maximizes a variable's loading on a single factor)
• VARIMAX - simplifies columns; generally gives better results (makes the number of high loadings in each column as few as possible)
• EQUIMAX - a combination of the two
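For illustration, a minimal sketch of extraction with varimax rotation using scikit-learn's FactorAnalysis; the data and the choice of two factors are placeholders:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # placeholder standardized data

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(X)

# Rows = variables, columns = factors; look for one high loading
# per variable, and flag variables that cross-load
print(np.round(fa.components_.T, 2))
```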
Factor Loading Criteria
± .5 or greater are considered practically significant (consult the literature for your specific discipline)

Loadings > ± .7 are indicative of well-defined structure

Sample size should be > 100 for practical significance
Validation of Factor Analysis
Confirmatory perspective
- Split the sample, or analyze a separate sample

Assess factor structure stability
- Look at the sample size and the number of cases per variable

Detect influential observations
Additional Uses of Factor Analysis Results
Select the variable with the highest factor loading as a surrogate representative for a particular factor dimension

Replace the original variables with a small set of variables created from summated scales
- Cronbach's alpha > .7
- convergent validity - correlates with other, similar scales
- discriminant validity - differs from other, similar scales
- nomological validity - consistent with the theory that shaped it
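A minimal sketch of Cronbach's alpha computed from its definition; the item scores below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.integers(1, 8, size=(100, 5)).astype(float)  # placeholder 7-point items

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)

# alpha = k/(k-1) * (1 - sum of item variances / variance of the summated scale)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.3f}")  # > .7 supports using the summated scale
```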
Why Examine Data?
Help with a basic understanding of data and the relationships between variables

To ensure the data has met all the requirements for the analysis (assumptions, outliers)
First Step to Managing Data
Assess whether data was entered correctly.

Check the entered data against the original source.
Graphical Examination of Data
Histogram - shows the shape of a distribution

Scatterplot - shows a linear or curvilinear relationship between 2 variables

Boxplot - shows group differences
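A minimal matplotlib sketch of the three graphical checks, using placeholder variables:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)
groups = [rng.normal(loc=m, size=50) for m in (0, 1)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=20)     # shape of the distribution
axes[1].scatter(x, y, s=10)  # linear vs. curvilinear relationship
axes[2].boxplot(groups)      # group differences
plt.show()
```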
Missing Data
identify the patterns associated with missing data to understand why the data are missing
Impact of missing data
Can reduce sample size

Can distort results and introduce bias
If no Pattern is found,
Dummy code a variable - one group with missing values and one group with none

Run t-tests with the other variables as DVs, comparing the missing and non-missing groups

If there is no difference, feel safe deleting the values

If there is a difference, further steps must be taken
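A minimal sketch of this dummy-code-and-t-test check with pandas and scipy; the column names income (with missing values) and age are hypothetical:

```python
import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame({
    "income": [50, None, 62, 48, None, 71, 55, 60],
    "age":    [34, 29, 41, 38, 52, 45, 31, 36],
})

missing = df["income"].isna()  # dummy code: True = missing, False = present

# Compare another variable (the DV) across the missing/non-missing groups
t_stat, p_value = ttest_ind(df.loc[missing, "age"], df.loc[~missing, "age"])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # non-significant p -> safer to delete
```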
If pattern, or too many missing data
Replace values with numbers from prior knowledge or an educated guess

Replace values with the variable mean

Replace with the group mean - little reduction in validity

Use regression to predict the missing values
Identifying missing data
Determine type of missing data
- Ignorable (delete)
- Nonignorable (don't delete)
Determine Extent of Missing Data
i. Under 10% can generally be ignored, except when it occurs in a nonrandom fashion
ii. Always check that the cases with no missing data are sufficient to run the analysis
iii. Cases or variables with >= 15% missing are candidates for deletion
iv. Variables with > 50% missing should be deleted, unless the variable is essential to the model
Diagnose the Randomness of the Missing Data Processes
a. Missing at Random (MAR - despite the name, not completely random)
i. Missing values of Y depend on X, but not on Y (e.g., missingness differs significantly by gender)

b. Missing Completely at Random (MCAR)
i. Cases with missing data are indistinguishable from cases with complete data.
Select the Imputation Method (estimating the missing values based on the available values)
a. MAR data process - apply a specific modeling approach (e.g., the EM approach)

b. MCAR -
i. use only valid data - listwise method, pairwise (all available data)
ii. replacement values - case substitution, hot or cold deck, mean substitution
Imputation Method Rules of Thumb
i. < 10% - any imputation method can be applied
ii. 10% - 20% - all-available (pairwise), hot deck, or regression
iii. > 20% - regression method for MCAR, model-based method for MAR
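A minimal pandas sketch contrasting listwise deletion with mean substitution on placeholder data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 4.0],
    "x2": [2.0, 5.0, np.nan, 8.0],
})

listwise = df.dropna()               # use only complete cases (listwise)
mean_imputed = df.fillna(df.mean())  # mean substitution, one replacement-value approach
print(listwise)
print(mean_imputed)
```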
Steps to Identify Missing Data
1. Determine the Type of Missing Data (ignorable or not)
2. Determine the Extent of the Missing Data
3. Diagnose the Randomness of the Missing Data Processes
4. Select the Imputation Method (estimating the missing values based on the available values)
Outliers
• distinct difference from other observations/responses
• Is the observation/response representative of the population?
• Check for both univariate and multivariate outliers
Reasons for Outliers
• Data entry mistake
• Missing value code not specified
• Outlier not a part of the population
• Part of the population, but extreme:
o Delete it, change the value to a less extreme one, or transform it (if normality is then met)
Identifying Outliers -
Standard Score Rules
• 80 subjects or fewer: outliers are defined as standard scores > 2.5
• Larger samples: standard scores up to 4
• 2.5 to 4 SDs from the mean, if standard scores are not used
• 90-10 split:
If a dichotomous variable has an extremely uneven split (i.e., a 90-10 split: 90% say yes and 10% say no), it will produce outliers. The only fix is to delete the variable.
• Univariate outliers
very large standardized scores (z scores greater than 3.3) that are disconnected from the distribution
• Bivariate outliers
specific variable relationships – scatterplots with confidence intervals
• Multivariate Outliers
are found by first computing a Mahalanobis distance for each case; the Mahalanobis scores are then screened in the same manner as univariate outliers
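A minimal numpy/scipy sketch of both screens on placeholder data; the chi-square cutoff at p < .001 is one common convention for screening Mahalanobis distances, assumed here:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # placeholder data matrix

# Univariate: flag z scores greater than 3.3
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print("Univariate outlier cases:", np.where((np.abs(z) > 3.3).any(axis=1))[0])

# Multivariate: squared Mahalanobis distance of each case from the centroid
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Screen against a chi-square cutoff (df = number of variables)
cutoff = chi2.ppf(0.999, df=X.shape[1])
print("Multivariate outlier cases:", np.where(d2 > cutoff)[0])
```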
Assumptions
Normality
- Skewness and Kurtosis

Homoscedasticity

Homogeneity of Variance

Homogeneity of Variance-Covariance Matrices
Normality
the shape of the data distribution and its correspondence to a normal distribution

• Skewness - the balance or shift of the distribution; can be positive (tail to the right) or negative (tail to the left)
o Must be between -1 and 1

• Kurtosis - how peaked or flat the distribution is
o Must be less than 8
Skewness
the balance or shift of the distribution; can be positive (tail to the right) or negative (tail to the left)
o Must be between -1 and 1
Kurtosis
how peaked or flat the distribution is
o Must be less than 8
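A minimal scipy sketch of both checks on a placeholder sample; kurtosis is reported here in the Pearson form (a normal distribution scores about 3):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.normal(size=500)  # placeholder sample

print(f"Skewness: {skew(x):.2f}")                    # should fall between -1 and 1
print(f"Kurtosis: {kurtosis(x, fisher=False):.2f}")  # should be less than 8
```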
Homoscedasticity
• Equal variances across independent variables

• if both variables are normally distributed, then you should have homoscedasticity
Homogeneity of Variance
the variance in the DV is expected to be the same for all levels of the IV.
• Important for grouped data
• SPSS gives Levene's test as a measure of homogeneity of variance.
• p above .05 means the variances are homogeneous
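A minimal scipy sketch of Levene's test on two placeholder groups:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
group_a = rng.normal(scale=1.0, size=50)
group_b = rng.normal(scale=1.2, size=50)

stat, p_value = levene(group_a, group_b)
print(f"Levene W = {stat:.2f}, p = {p_value:.3f}")  # p > .05 -> variances homogeneous
```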
Homogeneity of Variance-Covariance Matrices
used for multivariate tests (multiple DVs)
• the variance-covariance matrix of the DVs should be similar across groups
• the formal test for this in SPSS is Box's M
Data Transformations
• Done to:
o Correct violations to assumptions
o Improve correlations between variables

recommended as a last resort only because of the added difficulty of interpreting a transformed variable


• Common Transformations:
1) square root - used when there is moderate skewness/deviation
2) logarithm - used when there is substantial skewness/deviation
3) inverse - used when there is extreme skewness/deviation
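A minimal numpy sketch of the three transformations on a placeholder positively skewed variable:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # placeholder positively skewed variable

sqrt_x = np.sqrt(x)  # moderate skewness
log_x = np.log(x)    # substantial skewness (values must be > 0)
inv_x = 1.0 / x      # extreme skewness

print([round(v, 2) for v in (x.mean(), sqrt_x.mean(), log_x.mean(), inv_x.mean())])
```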
Transformation RULES OF THUMB
• to judge the impact of a transformation, calculate the ratio of the variable's mean to its SD
o noticeable effects occur when the ratio is < 4
• apply transformations to the IVs, except when correcting for heteroscedasticity
• use variables in their untransformed format when interpreting results
Multicollinearity
• If you have a correlation between two variables that is .90 or greater
Singularity
if two variables are identical, or one is a subscale of another, they are singular
Dummy Variables
• nonmetric IV that has two (or more) distinct levels that are coded 0 and 1
• act as replacement variables so that nonmetric variables can be used as metric
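A minimal pandas sketch of dummy coding; the column class_rank and its levels are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"class_rank": ["freshman", "senior", "junior", "senior"]})

# Each level becomes a 0/1 replacement variable; drop_first avoids
# perfect multicollinearity among the dummies
dummies = pd.get_dummies(df["class_rank"], drop_first=True, dtype=int)
print(dummies)
```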
What is Multivariate Analysis?
• Analysis of multiple variables in a single relationship or set of relationships
• Used for measurement, predicting and explaining, and testing hypotheses
Variate
• Linear combo of variables with empirically determined weights
• Every subject has a variate value (Y’), which is the dependent variable, or the linear combination of the entire set of variables
• Nonmetric Scales
(Qualitative)
o Nominal - labels/categories (e.g., occupation, gender, class rank)
o Ordinal - ordered variables with a specific order (e.g., first place, second place); relative positions in an ordered series; distances between points are not equal and cannot be determined (e.g., a 7-point scale)
Metric Scales
(Quantitative)
o Interval - no natural zero point; equal differences between scale points (e.g., temperature)
o Ratio - natural zero point (e.g., money or weight)
Measurement Error
• Degree to which the observed values do not represent the true values.
• Caused by:
o Data entry, imprecise measurement scales
Validity
o Degree to which measure accurately represents what it is supposed to represent
• e.g., measuring total income by asking only about disposable income lacks validity
• Reliability
o Degree to which measure accurately represents true value AND is error free
• If repeated measures of a variable are consistent, they are reliable
Type 1 Error
o Probability of rejecting null hypothesis when it is true
alpha; usually .05
• Type 2 Error
o Probability of failing to reject a null hypothesis when it is false. Chance of not finding correlation when there is a correlation.
Beta.
• Power
o Probability of rejecting the null hypothesis when it is false.

Correctly finding a hypothesized relationship when one exists. 1 – Beta
o About .8 or higher
Power Determined by:
• Sample Size – Increase sample size, increase power
• Alpha Value – increase alpha, power increases.
• Effect Size - The magnitude of the effect of interest. Whether the correlation between variables, or the observed relationship is meaningful.
• Small effect sizes require larger sample sizes
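A minimal sketch of a power calculation with statsmodels, solving for the per-group sample size of a two-group t-test; the medium effect size of .5 is an assumed input:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed medium effect
                                   alpha=0.05,
                                   power=0.8)
print(f"Subjects needed per group: {n_per_group:.0f}")  # smaller effects need more
```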
Guidelines for Multivariate Analysis
• Establish both practical significance (“So What?”) and statistical significance
• Recognize that sample size affects all results
o Power and effect size; small samples limit generalizability; very large samples (> 400) will find almost anything significant
• Know Your Data.
o Outliers, Assumptions, Missing Data
• Strive for Model Parsimony
o Don't put in irrelevant variables, which could lead to multicollinearity (the degree to which any variable's effect can be attributed to other variables)
• Look at Your Errors
o Assess the validity of measurement; unexplained relationships
• Validate Results
o Results could be specific to the sample
o Split sample into two subsamples and re-run analysis
o Get separate sample
o Employ bootstrapping, which draws a large number of subsamples from the sample
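A minimal numpy sketch of bootstrap validation, re-estimating a correlation across many resamples of placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.4 * x + rng.normal(scale=0.9, size=100)

boot_r = []
for _ in range(1000):
    idx = rng.integers(0, len(x), size=len(x))  # resample cases with replacement
    boot_r.append(np.corrcoef(x[idx], y[idx])[0, 1])

# A stable estimate across subsamples supports the validity of the result
low, high = np.percentile(boot_r, [2.5, 97.5])
print(f"Bootstrap 95% interval for r: [{low:.2f}, {high:.2f}]")
```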