Chapter 6 Factor Analysis

6.1 item verification

it is easier to work with a subset of the items, created using DF2 <- subset(DF1, select = c("VARIABLE", … )), with DF1 being the original data frame, DF2 the data frame to be created with the subset, and VARIABLE the variable names or numbers in the data frame
general descriptives of the items with freq(DF2), boxplot(DF2) or similar functions
create a data frame desc with the descriptives using desc <- data.frame(describe(DF2)) (describe() is from the psych package) and:

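compute the skew statistic with desc$skewstat <- desc$skew / sqrt(6 / desc$n) (the se column returned by describe() is the standard error of the mean, not of the skew; sqrt(6/n) is a common large-sample approximation of the standard error of the skew)
compute the kurtosis statistic with desc$kurtosisstat <- desc$kurtosis / sqrt(24 / desc$n) (sqrt(24/n) is the corresponding approximation for the kurtosis)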
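
The descriptive steps above can be sketched on simulated data as follows. The data frame DF1, its id column, and the item names are made up for illustration, and the standard errors use the large-sample approximations sqrt(6/n) and sqrt(24/n), which is an assumption of this sketch and not the only option.

```r
library(psych)  # provides describe()

set.seed(123)
# Hypothetical data: an id column plus six simulated items
DF1 <- data.frame(id = 1:200, matrix(rnorm(200 * 6), ncol = 6))
names(DF1)[-1] <- paste0("item.", 1:6)

# Keep only the items of interest
DF2 <- subset(DF1, select = c("item.1", "item.2", "item.3"))

# Descriptives as a plain data frame (one row per item)
desc <- data.frame(describe(DF2))

# Skew and kurtosis divided by their approximate standard errors
desc$skewstat <- desc$skew / sqrt(6 / desc$n)
desc$kurtosisstat <- desc$kurtosis / sqrt(24 / desc$n)

desc[, c("n", "mean", "sd", "skew", "kurtosis", "skewstat", "kurtosisstat")]
```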

create a correlation matrix cor with cor <- cor(DF2, use = "pairwise.complete.obs")

verify use options for dealing with missing data
visualize the correlations with corrplot() (requires the corrplot package), for example, in corrplot(cor, method="number", type="lower", diag=FALSE, number.cex=0.5) the matrix cor is shown as numbers, with only the lower triangle displayed, the diagonal omitted, and the numbers at half the default size

test for identity matrix with cortest.bartlett(DF2) (the guideline for a good matrix is $p<.05$ meaning it is not an identity matrix)
test common variance with KMO(DF2) (the guideline for acceptable common variance is an overall MSA $> .70$)
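
A minimal sketch of the correlation checks, assuming simulated items that share one common factor (the data, the loadings, and the object names R, bart and kmo are illustrative only). Here both tests receive the correlation matrix, with the number of rows as an approximate sample size.

```r
library(psych)  # provides cortest.bartlett() and KMO()

set.seed(123)
n <- 300
g <- rnorm(n)  # hypothetical common factor
DF2 <- data.frame(sapply(1:6, function(i) 0.7 * g + rnorm(n, sd = 0.7)))
names(DF2) <- paste0("item.", 1:6)
DF2[sample(n, 10), "item.1"] <- NA  # a few missing values

# Pairwise deletion keeps every available pair of observations
R <- cor(DF2, use = "pairwise.complete.obs")

# Bartlett's test: a small p-value suggests R is not an identity matrix
bart <- cortest.bartlett(R, n = nrow(DF2))
bart$p.value

# Kaiser-Meyer-Olkin measure of sampling adequacy
kmo <- KMO(R)
kmo$MSA   # overall MSA
kmo$MSAi  # MSA per item
```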

6.2 exploratory factor analysis

This section requires the psych package. The example uses ordinary least squares estimation with fm = "ols" and oblimin rotation with rotate = "oblimin". Complete step by step guide available soon.

6.2.1 deciding the number of factors

EFA.m1 <- fa(DF2, nfactors = X, rotate = "none", fm="ols", missing=TRUE, impute = "mean") creates the object EFA.m1 with the factorial model data based on the DF2 subset produced in the item verification:
- nfactors = X should be set to the number of items; consequently rotate = "none" is used, since no factor rotation is needed at this stage
- missing=TRUE indicates the existence of missing data and impute = "mean" requests imputation using the mean
plot(EFA.m1$values, type = "b", xlab = "nº of factors", ylab = "eigenvalue") plots the eigenvalues by the number of factors
EFA.m1 prints all the factorial model data but it can be more specific, for example, using EFA.m1$loadings will only print the loadings
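
As a cross-check that does not depend on the fa() call, a scree plot can also be sketched in base R from the eigenvalues of the correlation matrix. Note that these are the eigenvalues of the unreduced matrix, not the common-factor eigenvalues stored in EFA.m1$values, so the two plots can differ. The simulated two-factor data and the reference line at 1 are assumptions of this sketch.

```r
set.seed(123)
n <- 300
# Hypothetical data: two uncorrelated factors, three items each
f <- matrix(rnorm(n * 2), ncol = 2)
DF2 <- data.frame(0.8 * f[, c(1, 1, 1, 2, 2, 2)] +
                    matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(DF2) <- paste0("item.", 1:6)

# Eigenvalues of the item correlation matrix
ev <- eigen(cor(DF2, use = "pairwise.complete.obs"))$values

# Scree plot; the dashed line at 1 is only a visual reference
plot(ev, type = "b", xlab = "number of factors", ylab = "eigenvalue")
abline(h = 1, lty = 2)

sum(ev > 1)  # number of eigenvalues above the reference line
```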

6.2.2 evaluating factorial models

EFA.m2 <- fa(DF2, nfactors = X, rotate = "oblimin", fm="ols", missing=TRUE, impute = "mean") creates a new object EFA.m2 with the factorial model data based on the DF2 subset produced in the item verification:
- this time nfactors = X should be set to the number of factors decided on with EFA.m1, and a rotation method should be specified in rotate = "" for ease of interpretation
EFA.m2 prints all the factorial model data, EFA.m2$loadings will only print the loadings
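
A sketch of the rotated model on simulated two-factor data (the data and item names are hypothetical). It assumes the GPArotation package is installed, which psych relies on for the oblimin rotation.

```r
library(psych)  # fa(); the oblimin rotation also needs GPArotation installed

set.seed(123)
n <- 300
# Hypothetical data: two factors, three items each, a few missing values
f <- matrix(rnorm(n * 2), ncol = 2)
DF2 <- data.frame(0.8 * f[, c(1, 1, 1, 2, 2, 2)] +
                    matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(DF2) <- paste0("item.", 1:6)
DF2[sample(n, 10), "item.1"] <- NA

EFA.m2 <- fa(DF2, nfactors = 2, rotate = "oblimin", fm = "ols",
             missing = TRUE, impute = "mean")

# Hide small loadings to make the pattern easier to read
print(EFA.m2$loadings, cutoff = 0.30)

# Factor correlations (available for oblique rotations)
EFA.m2$Phi
```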

6.2.3 computing the factors

DF$FACTOR <- rowMeans(DF2) computes the row means (coarse scores) across all DF2 variables and stores them in the original DF data frame under the name FACTOR
DF <- cbind(DF, EFA.m2$scores) adds the factor scores (refined scores) estimated by EFA.m2 to the original DF data frame; names(DF)[names(DF) == "OLD.NAME"] <- "FACTOR" can be used to rename a score variable to FACTOR, with OLD.NAME being the column name shown by colnames(EFA.m2$scores)
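
Both kinds of scores can be sketched as follows, again on simulated data. The names DF, FACTOR.1.mean, FACTOR.1 and FACTOR.2 are placeholders, the choice of items per coarse score is an assumption, and the GPArotation package is assumed to be installed. Which factor comes first in the scores should be checked against EFA.m2$loadings.

```r
library(psych)

set.seed(123)
n <- 300
# Hypothetical data: two factors, three items each
f <- matrix(rnorm(n * 2), ncol = 2)
DF2 <- data.frame(0.8 * f[, c(1, 1, 1, 2, 2, 2)] +
                    matrix(rnorm(n * 6, sd = 0.6), ncol = 6))
names(DF2) <- paste0("item.", 1:6)
DF <- DF2  # stand-in for the original data frame

EFA.m2 <- fa(DF2, nfactors = 2, rotate = "oblimin", fm = "ols",
             missing = TRUE, impute = "mean")

# Coarse score: mean of the items assumed to belong to one factor
DF$FACTOR.1.mean <- rowMeans(DF2[, c("item.1", "item.2", "item.3")],
                             na.rm = TRUE)

# Refined scores: rename the columns before binding them to DF
scores <- as.data.frame(EFA.m2$scores)
names(scores) <- c("FACTOR.1", "FACTOR.2")  # same order as the loadings columns
DF <- cbind(DF, scores)

head(DF)
```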

6.3 confirmatory factor analysis

This section requires the lavaan and semPlot packages. Complete step by step guide available soon.

6.3.1 model estimation

example of a model with 2 correlating factors, with 3 items per factor, and a higher order factor:

model <- 
'factor.1 =~ item.1 + item.2 + item.3 
factor.2 =~ item.4 + item.5 + item.6
global.factor =~ factor.1 + factor.2
factor.1 ~~ factor.2
'

model.fit <- cfa(model, data = DF) estimates the confirmatory model with the cfa() function, using the model defined above and the DF data frame (the names in the model should be exactly the same as the variable names in DF)
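
A runnable sketch of the estimation step, using the HolzingerSwineford1939 data set that ships with lavaan as a stand-in for DF (the factor names visual and textual are illustrative). The sketch keeps only two correlated factors: with just two first-order factors, a higher-order factor is not identified without extra constraints, so it is left out here.

```r
library(lavaan)

# Example data shipped with lavaan, used here in place of DF
DF <- HolzingerSwineford1939

# Two factors with three items each; cfa() correlates the factors by default
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

model.fit <- cfa(model, data = DF)

# Confirm the optimizer converged
lavInspect(model.fit, "converged")

# Observed variables used by the model; all must exist in DF
lavNames(model.fit, type = "ov")
```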

6.3.2 overall goodness of fit

summary(model.fit, fit.measures = TRUE, standardized = TRUE, rsq = TRUE) provides a detailed summary of the model.fit
fitMeasures(model.fit) provides only the fit measures of the model.fit
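
When only a few indices are needed, fitMeasures() also accepts a vector of measure names. A minimal sketch, assuming the HolzingerSwineford1939 example data from lavaan and an illustrative two-factor model; the selection of indices is only an example.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Request a subset of the fit measures by name
fit <- fitMeasures(model.fit,
                   c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit, 3)
```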

6.3.3 measures of strain

resid(model.fit, type = "standardized") provides a matrix with the standardized residuals (the guideline for the absence of focal areas of strain is |residual| < 2.58)
- for further verification model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")) creates the data frame model.fit.residuals with the residuals and write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE) exports the data frame with residuals to a spreadsheet
modificationindices(model.fit) provides a table with the modification indices (the guideline for the absence of focal areas of strain is modification indices < 3.84)
- for further verification model.fit.mi <- modificationindices(model.fit) creates the data frame model.fit.mi with the modification indices and subset(model.fit.mi, mi > 3.84) shows only the modification indices above the criterion
the code chunk follows:

resid(model.fit, type = "standardized")
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized"))
write.csv (model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
modificationindices(model.fit)
model.fit.mi <- modificationindices(model.fit)
subset(model.fit.mi, mi>3.84)
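
The same checks can be narrowed to the values that exceed the guidelines. The sketch below assumes the HolzingerSwineford1939 example data from lavaan and an illustrative two-factor model; std.res, strain and mi.high are hypothetical object names.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

# Standardized residual covariance matrix (the $cov element of the list)
std.res <- resid(model.fit, type = "standardized")$cov

# Item pairs in the lower triangle with |residual| above 2.58
strain <- which(abs(std.res) > 2.58 & lower.tri(std.res), arr.ind = TRUE)
data.frame(row = rownames(std.res)[strain[, "row"]],
           col = colnames(std.res)[strain[, "col"]],
           residual = std.res[strain])

# Modification indices above 3.84, largest first
model.fit.mi <- modificationindices(model.fit)
mi.high <- subset(model.fit.mi, mi > 3.84)
mi.high[order(-mi.high$mi), ]
```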

6.3.4 estimates

parameterEstimates(model.fit) provides the non-standardized estimates
standardizedSolution(model.fit) provides the standardized estimates (the guideline is that estimates < |.30| indicate weak associations with the factors)
semPaths(model.fit, style="lisrel", what="stand", layout="tree") provides a graph with the associations with the factors according to the model estimation
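
To apply the .30 guideline directly, the standardized solution can be filtered to the factor loadings. A minimal sketch, assuming the HolzingerSwineford1939 example data from lavaan and an illustrative two-factor model; std and loadings are hypothetical object names.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = HolzingerSwineford1939)

std <- standardizedSolution(model.fit)

# Rows with the "=~" operator are the factor loadings
loadings <- subset(std, op == "=~")
loadings[, c("lhs", "rhs", "est.std", "pvalue")]

# Loadings flagged as weak under the |.30| guideline (possibly none)
subset(loadings, abs(est.std) < 0.30)
```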

6.3.5 computing the factors

for coarse scores computation see 6.2.3
for refined scores use DF.refined <- data.frame(predict(model.fit)) followed by DF <- cbind(DF, DF.refined) (each new variable takes the name of the corresponding latent variable in the model)
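
A sketch of the refined scores, assuming the HolzingerSwineford1939 example data from lavaan as a stand-in for DF and an illustrative two-factor model. It calls lavPredict(), the lavaan function behind predict() for fitted models.

```r
library(lavaan)

DF <- HolzingerSwineford1939  # example data used in place of DF

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
model.fit <- cfa(model, data = DF)

# One column of factor scores per latent variable, named after it
DF.refined <- data.frame(lavPredict(model.fit))
names(DF.refined)

# Bind the scores to the original data; rows are in the same order
DF <- cbind(DF, DF.refined)
head(DF[, c("x1", "x2", "x3", "visual", "textual")])
```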

6.4 online resources on factor analysis

The Personality Project: An introduction to psychometric theory