Chapter 6 Factor Analysis
6.1 item verification
- subset the items with
DF2 <- subset(DF1, select = c("VARIABLE", … ))
where DF1 is the original data frame, DF2 is the data frame to be created with the subset, and VARIABLE stands for the variable names (or column numbers, without quotes) in the data frame
- get general descriptives of the items with freq(DF2), boxplot(DF2) or similar functions
- create a data frame desc with the descriptives (describe() comes from the psych package) using
desc <- data.frame(describe(DF2))
and then:
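- compute the skewness statistics (skewness divided by its standard error; the se column returned by describe() is the standard error of the mean, so the standard error of skewness is approximated here by \(\sqrt{6/n}\))
desc$skewstat <- desc$skew / sqrt(6 / desc$n)
- compute the kurtosis statistics (kurtosis divided by its standard error, approximated by \(\sqrt{24/n}\))
desc$kurtosisstat <- desc$kurtosis / sqrt(24 / desc$n)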
- create a correlation matrix cor with
cor <- cor(DF2, use = "pairwise.complete.obs")
- check the options of the use argument (see ?cor) for dealing with missing data; "pairwise.complete.obs" computes each correlation from all cases with data on both items
- visualize the correlations with corrplot() (requires the corrplot package); for example, in
corrplot(cor, method = "number", type = "lower", diag = FALSE, number.cex = 0.5)
the matrix cor is shown as numbers, only the lower triangle is displayed, the diagonal is omitted, and the numbers are drawn at half size
- test for an identity matrix with
cortest.bartlett(DF2)
(the guideline for a good matrix is \(p<.05\), meaning it is not an identity matrix)
- test the common variance with
KMO(DF2)
(the guideline for acceptable common variance is \(\text{Overall MSA} > .70\))
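The checks above can be cross-checked by hand. The sketch below uses base R only and runs on simulated data (the item names item.1 to item.6 and the two-factor structure are assumptions for illustration). It computes the skewness and kurtosis statistics with the \(\sqrt{6/n}\) and \(\sqrt{24/n}\) approximations, Bartlett's sphericity statistic and the overall KMO measure. Its simple moment-based skewness and kurtosis may differ slightly from the values reported by describe().

```r
# Simulated data (assumption): two uncorrelated factors with three items each
set.seed(123)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(
  item.1 = 0.7 * f1 + rnorm(n, sd = 0.6),
  item.2 = 0.6 * f1 + rnorm(n, sd = 0.7),
  item.3 = 0.8 * f1 + rnorm(n, sd = 0.5),
  item.4 = 0.7 * f2 + rnorm(n, sd = 0.6),
  item.5 = 0.6 * f2 + rnorm(n, sd = 0.7),
  item.6 = 0.8 * f2 + rnorm(n, sd = 0.5)
)

# Moment-based skewness and excess kurtosis, each divided by its approximate standard error
shape_stats <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurtosis <- mean(d^4) / m2^2 - 3
  c(n = n, skew = skew, skewstat = skew / sqrt(6 / n),
    kurtosis = kurtosis, kurtosisstat = kurtosis / sqrt(24 / n))
}
desc <- as.data.frame(t(sapply(DF2, shape_stats)))
print(round(desc, 2))

# Bartlett's test of sphericity: H0 is that the correlation matrix is an identity matrix
R <- cor(DF2, use = "pairwise.complete.obs")
p <- ncol(R)
bartlett.chisq <- -(nrow(DF2) - 1 - (2 * p + 5) / 6) * log(det(R))
bartlett.df <- p * (p - 1) / 2
bartlett.p <- pchisq(bartlett.chisq, bartlett.df, lower.tail = FALSE)

# Overall KMO: squared correlations versus squared partial correlations (off-diagonal only)
R.inv <- solve(R)
partial <- -R.inv / sqrt(outer(diag(R.inv), diag(R.inv)))
off <- row(R) != col(R)
overall.msa <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

cat("Bartlett chi-square =", round(bartlett.chisq, 2), "df =", bartlett.df,
    "p =", signif(bartlett.p, 3), "\nOverall MSA =", round(overall.msa, 2), "\n")
```

The partial correlations come from the inverse of the correlation matrix, which is why a near-singular matrix (for example, two duplicated items) makes solve() fail before KMO can be computed.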
6.2 exploratory factor analysis
This section requires the psych package (rotate = "oblimin" also needs the GPArotation package to be installed). The example uses ordinary least squares estimation with fm = "ols" and oblimin rotation with rotate = "oblimin". Complete step by step guide available soon.
6.2.1 deciding the number of factors
EFA.m1 <- fa(DF2, nfactors = X, rotate = "none", fm="ols", missing=TRUE, impute = "mean")
creates the object EFA.m1 with the factorial model data based on the DF2 subset produced in the item verification:
- nfactors = X should be set to the number of items, and consequently rotate = "none" specifies that no factor rotation is needed
- missing = TRUE indicates the existence of missing data, and impute = "mean" replaces the missing values with the item mean
plot(EFA.m1$values, type = "b", xlab = "nº of factors", ylab = "eigenvalue")
plots the eigenvalues by the number of factors (a scree plot); typing EFA.m1 prints all the factorial model data, but the output can be more specific: for example, EFA.m1$loadings will only print the loadings
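A scree plot can also be sketched in base R from the eigenvalues of the item correlation matrix. The data below are simulated (an assumption, two factors with three items each), and these eigenvalues come from the correlation matrix itself, so they need not match the values stored in EFA.m1$values by fa().

```r
# Simulated data (assumption): two uncorrelated factors with three items each;
# replace DF2 with the subset produced in the item verification
set.seed(123)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(
  item.1 = 0.7 * f1 + rnorm(n, sd = 0.6),
  item.2 = 0.6 * f1 + rnorm(n, sd = 0.7),
  item.3 = 0.8 * f1 + rnorm(n, sd = 0.5),
  item.4 = 0.7 * f2 + rnorm(n, sd = 0.6),
  item.5 = 0.6 * f2 + rnorm(n, sd = 0.7),
  item.6 = 0.8 * f2 + rnorm(n, sd = 0.5)
)

# Eigenvalues of the item correlation matrix, largest first
R <- cor(DF2, use = "pairwise.complete.obs")
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Scree plot: the number of factors is read where the curve levels off
plot(ev, type = "b", xlab = "number of factors", ylab = "eigenvalue")
print(round(ev, 2))
```

With these simulated data the first two eigenvalues stand well above the remaining ones, matching the two factors built into the simulation; the eigenvalues always sum to the number of items.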
6.2.2 evaluating factorial models
EFA.m2 <- fa(DF2, nfactors = X, rotate = "oblimin", fm="ols", missing=TRUE, impute = "mean")
creates a new object EFA.m2 with the factorial model data based on the DF2 subset produced in the item verification:
- this time, nfactors = X should be set to the number of factors determined with EFA.m1
- a rotation method should be specified in rotate = "" for ease of interpretation
- as before, EFA.m2 prints all the factorial model data and EFA.m2$loadings will only print the loadings
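A minimal run of the rotated model, assuming simulated data with two factors and a couple of missing answers (so that missing = TRUE and impute = "mean" have something to do). It needs psych and GPArotation; nfactors = 2 is an assumption matching the simulation, not a general recommendation.

```r
# Requires the psych package; rotate = "oblimin" additionally needs GPArotation
library(psych)
library(GPArotation)

# Simulated data (assumption): two factors with three items each
set.seed(123)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(
  item.1 = 0.7 * f1 + rnorm(n, sd = 0.6),
  item.2 = 0.6 * f1 + rnorm(n, sd = 0.7),
  item.3 = 0.8 * f1 + rnorm(n, sd = 0.5),
  item.4 = 0.7 * f2 + rnorm(n, sd = 0.6),
  item.5 = 0.6 * f2 + rnorm(n, sd = 0.7),
  item.6 = 0.8 * f2 + rnorm(n, sd = 0.5)
)
DF2$item.2[c(5, 50)] <- NA  # a few missing answers

EFA.m2 <- fa(DF2, nfactors = 2, rotate = "oblimin", fm = "ols",
             missing = TRUE, impute = "mean")

# Loadings below .30 are left blank to ease reading
print(EFA.m2$loadings, cutoff = 0.3)

# Factor on which each item loads most strongly (in absolute value)
main.factor <- apply(abs(unclass(EFA.m2$loadings)), 1, which.max)
print(main.factor)
```

The order and labels of the extracted factors are not fixed, so it is safer to check which items group together than to expect item.1 to land on the first factor.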
6.2.3 computing the factors
DF$FACTOR <- rowMeans(DF2)
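computes the row means (coarse scores) of all DF2 variables and stores them in the original DF data frame under the name FACTOR (a row with any missing item becomes NA unless na.rm = TRUE is added; to obtain one score per factor, use a subset containing only that factor's items)
DF <- cbind(DF, EFA.m2$scores)
adds the factor scores (refined scores) estimated in EFA.m2 to the original DF data frame, one column per factor;
names(DF)[names(DF) == "SCORE.NAME"] <- "FACTOR"
can be used to rename a score column to FACTOR, where SCORE.NAME is one of the names returned by colnames(EFA.m2$scores)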
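The effect of missing answers on coarse scores, and the renaming idiom, can be seen on a tiny hypothetical data frame (five respondents, three items of one factor, one missing answer); the data and the name FACTOR.1 are illustrative only.

```r
# Hypothetical toy data: one factor measured by three items, one missing answer
DF <- data.frame(
  id     = 1:5,
  item.1 = c(4, 5, 3, NA, 2),
  item.2 = c(4, 4, 2, 3, 1),
  item.3 = c(5, 5, 3, 3, 2)
)
DF2 <- subset(DF, select = c("item.1", "item.2", "item.3"))

# Without na.rm, a single missing item makes the row mean NA
coarse.strict <- rowMeans(DF2)

# With na.rm = TRUE, the mean uses the items that were answered
DF$FACTOR <- rowMeans(DF2, na.rm = TRUE)

# Rename a column with the same idiom used for the score columns
names(DF)[names(DF) == "FACTOR"] <- "FACTOR.1"
print(cbind(DF, coarse.strict))
```

Respondent 4 gets NA in coarse.strict but a score of 3 (the mean of the two answered items) in FACTOR.1; whether averaging over fewer items is acceptable is a substantive decision, not a coding one.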
6.3 confirmatory factor analysis
This section requires the lavaan and semPlot packages. Complete step by step guide available soon.
6.3.1 model estimation
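- example of a model with 2 correlated factors, with 3 items per factor, and a higher-order factor:
model <- 'factor.1 =~ item.1 + item.2 + item.3
          factor.2 =~ item.4 + item.5 + item.6
          global.factor =~ factor.1 + factor.2
          factor.1 ~~ factor.2'
(note that a higher-order factor defined by only two first-order factors is under-identified unless extra constraints are added, such as equal loadings, and that with global.factor in the model factor.1 ~~ factor.2 becomes a residual covariance that adds one more free parameter; for two correlated factors without a higher-order factor, drop the global.factor line)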
model.fit <- cfa(model, data = DF)
estimates the confirmatory model with the cfa() function, using the model defined above and the DF data frame (the names in the model should be exactly the same as the variable names in DF)
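A runnable sketch of the estimation step, assuming the HolzingerSwineford1939 example data that ships with lavaan (items x1 to x9) in place of DF. It fits two correlated factors with three items each and leaves out the higher-order factor to keep the model identified; the factor names visual and textual are those of the lavaan example, not of this document.

```r
library(lavaan)

# Example data shipped with lavaan; its items are named x1 ... x9
DF <- HolzingerSwineford1939

# Two correlated factors with three items each (no higher-order factor)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          visual ~~ textual'

# The observed variable names in the model must match the column names of DF
stopifnot(all(c("x1", "x2", "x3", "x4", "x5", "x6") %in% names(DF)))

model.fit <- cfa(model, data = DF)
print(lavInspect(model.fit, "converged"))
```

cfa() already lets first-order factors correlate by default, so the visual ~~ textual line only makes that choice explicit.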
6.3.2 overall goodness of fit
summary(model.fit, fit.measures = TRUE, standardized = TRUE, rsq = TRUE)
provides a detailed summary of model.fit
fitMeasures(model.fit)
provides only the fit measures of model.fit
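fitMeasures() also accepts a vector with the names of the measures to return, which keeps the output short. The sketch assumes the standard three-factor model on the HolzingerSwineford1939 data shipped with lavaan; the selection of measures is only an example.

```r
library(lavaan)

# Standard lavaan example model (assumption: used here in place of the document's model)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Only the selected fit measures, in the order requested
selected <- fitMeasures(model.fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
print(round(selected, 3))
```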
6.3.3 measures of strain
resid(model.fit, type = "standardized")
provides a matrix with the standardized residuals (the guideline for the absence of focal areas of strain is residuals < |2.58|, the critical z value for \(p = .01\), two-tailed)
- for further verification,
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)
creates the data frame model.fit.residuals with the residuals, and
write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
exports the data frame with the residuals to a spreadsheet
modificationindices(model.fit)
provides a table with the modification indices (the guideline for the absence of focal areas of strain is modification indices < 3.84, the critical \(\chi^2\) value with 1 degree of freedom at \(p = .05\))
- for further verification,
model.fit.mi <- modificationindices(model.fit)
creates the data frame model.fit.mi with the modification indices, and
subset(model.fit.mi, mi > 3.84)
shows only the modification indices above the criterion
- the complete code chunk follows:
resid(model.fit, type = "standardized")
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)
write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
modificationindices(model.fit)
model.fit.mi <- modificationindices(model.fit)
subset(model.fit.mi, mi > 3.84)
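Instead of scanning the whole residual matrix, the pairs above |2.58| and the modification indices above 3.84 can be listed directly. The sketch assumes the standard three-factor model on the HolzingerSwineford1939 data shipped with lavaan; the object names res, strain.res and strain.mi are illustrative.

```r
library(lavaan)

# Standard lavaan example model (assumption: used here in place of the document's model)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Standardized residual covariances as a plain matrix
res <- unclass(resid(model.fit, type = "standardized")$cov)

# Item pairs (upper triangle only) whose standardized residual exceeds |2.58|
big <- which(abs(res) > 2.58 & upper.tri(res), arr.ind = TRUE)
strain.res <- data.frame(item.a   = rownames(res)[big[, 1]],
                         item.b   = colnames(res)[big[, 2]],
                         residual = res[big])
print(strain.res)

# Modification indices above 3.84, largest first
model.fit.mi <- modificationindices(model.fit)
strain.mi <- subset(model.fit.mi, mi > 3.84)
strain.mi <- strain.mi[order(strain.mi$mi, decreasing = TRUE), c("lhs", "op", "rhs", "mi")]
print(strain.mi)
```

Sorting by mi puts the largest sources of strain first; each one should still be judged on theoretical grounds before the model is respecified.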
6.3.4 estimates
parameterEstimates(model.fit)
provides the non-standardized estimates;
standardizedSolution(model.fit)
provides the standardized estimates (the guideline is that estimates < |.30| indicate weak associations with the factors);
semPaths(model.fit, style = "lisrel", what = "stand", layout = "tree")
provides a graph of the associations with the factors according to the model estimation
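To apply the |.30| guideline, the standardized solution can be filtered down to the factor loadings (rows with op "=~"). The sketch assumes the standard three-factor model on the HolzingerSwineford1939 data shipped with lavaan; std.loadings and weak are illustrative names.

```r
library(lavaan)

# Standard lavaan example model (assumption: used here in place of the document's model)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Standardized estimates; keep only the factor loadings
std <- standardizedSolution(model.fit)
std.loadings <- subset(std, op == "=~", select = c(lhs, rhs, est.std, pvalue))
print(std.loadings)

# Loadings below |.30| would signal weak associations with their factor
weak <- subset(std.loadings, abs(est.std) < .30)
print(weak)
```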
6.3.5 computing the factors
- for coarse scores computation see 6.2.3
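- for refined scores, use
DF.refined <- data.frame(predict(model.fit))
followed by
DF <- cbind(DF, DF.refined)
(the new variables take the names of the latent variables in the model estimation; cbind() assumes DF.refined has one row per row of DF, which does not hold if cases were dropped because of missing data)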
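A short end-to-end sketch of the refined scores, assuming the HolzingerSwineford1939 data shipped with lavaan (which has no missing values) and its standard three-factor model. The row-count check guards the cbind() step against rows dropped by listwise deletion.

```r
library(lavaan)

# Example data and standard model (assumptions: used here in place of DF and the document's model)
DF <- HolzingerSwineford1939
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9'
model.fit <- cfa(model, data = DF)

# Factor scores, one column per latent variable
DF.refined <- data.frame(predict(model.fit))

# cbind() needs one score row per row of DF
stopifnot(nrow(DF.refined) == nrow(DF))
DF <- cbind(DF, DF.refined)
print(head(DF[, c("id", "visual", "textual", "speed")]))
```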