Chapter 6 Factor Analysis

6.1 item verification

  1. it is easier to subset the items first, using DF2 <- subset(DF1, select = c("VARIABLE", … )), with DF1 being the original data frame, DF2 the data frame to be created with the subset, and VARIABLE the variable names or column numbers in the data frame
  2. get general descriptives of the items with freq(DF2), boxplot(DF2) or similar functions
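  3. create a data frame desc with the descriptives using desc <- data.frame(describe(DF2)) (describe() comes from the psych package) and:
  • compute the skewness statistic with desc$skewstat <- desc$skew/desc$se
  • compute the kurtosis statistic with desc$kurtosisstat <- desc$kurtosis/desc$se
  • note that the se column returned by describe() is the standard error of the mean, so these ratios are only rough indicators; the standard errors of skewness and kurtosis are often approximated by sqrt(6/n) and sqrt(24/n)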
  4. create a correlation matrix cor with cor <- cor(DF2, use = "pairwise.complete.obs")
  • check the options of the use argument for dealing with missing data
  • visualize the correlations with corrplot() (requires the corrplot package); for example, in corrplot(cor, method="number", type="lower", diag=FALSE, number.cex=0.5) the matrix cor is shown as numbers, with only the lower triangle, omitting the diagonal, and with the numbers at half the default size
  5. test for an identity matrix with cortest.bartlett(DF2) (the guideline for a good matrix is \(p<.05\), meaning it is not an identity matrix)
  6. test common variance with KMO(DF2) (the guideline for acceptable common variance is an overall MSA \(>.70\))
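The verification steps above can be sketched end to end on simulated data. This is only a minimal sketch under stated assumptions: the six items, the two latent factors, the sample size and the object name cor.matrix (used instead of cor so that the cor() function is not masked) are hypothetical and chosen for illustration.

```r
library(psych)

# Simulated data (assumption): six items driven by two latent factors
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
DF1 <- data.frame(
  id     = seq_len(n),
  item.1 = f1 + rnorm(n), item.2 = f1 + rnorm(n), item.3 = f1 + rnorm(n),
  item.4 = f2 + rnorm(n), item.5 = f2 + rnorm(n), item.6 = f2 + rnorm(n)
)

# Step 1: keep only the items
DF2 <- subset(DF1, select = c("item.1", "item.2", "item.3",
                              "item.4", "item.5", "item.6"))

# Step 3: descriptives as a plain data frame
desc <- data.frame(describe(DF2))
desc[, c("n", "mean", "sd", "skew", "kurtosis", "se")]

# Step 4: correlation matrix using pairwise complete observations
cor.matrix <- cor(DF2, use = "pairwise.complete.obs")

# Step 5: Bartlett's test (a small p value suggests it is not an identity matrix)
bartlett <- cortest.bartlett(cor.matrix, n = nrow(DF2))
bartlett$p.value

# Step 6: Kaiser-Meyer-Olkin overall measure of sampling adequacy
kmo <- KMO(cor.matrix)
kmo$MSA
```

Passing the correlation matrix together with n to cortest.bartlett() makes the sample size explicit; passing the raw data, as in the text, is also possible.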

6.2 exploratory factor analysis

This section requires the psych package (the oblimin rotation additionally relies on the GPArotation package). The example uses ordinary least squares estimation with fm = "ols" and oblimin rotation with rotate = "oblimin". Complete step by step guide available soon.

6.2.1 deciding the number of factors

  • EFA.m1 <- fa(DF2, nfactors = X, rotate = "none", fm="ols", missing=TRUE, impute = "mean") creates the object EFA.m1 with the factorial model data based on the DF2 subset produced in the item verification:
    • nfactors = X should be set to the number of items; consequently, rotate = "none" indicates that no factor rotation is needed
    • missing=TRUE indicates the existence of missing data and impute = "mean" requests imputation using the mean
  • plot(EFA.m1$values, type = "b", xlab = "nº of factors", ylab = "eigenvalue") plots the eigenvalues against the number of factors
  • EFA.m1 prints all the factorial model data, but the output can be more specific; for example, EFA.m1$loadings prints only the loadings
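As an illustration of the scree-plot step, the following sketch fits the unrotated model on simulated items and plots the eigenvalues. The data, the item names and the sample size are assumptions; the data contain no missing values, so missing and impute are left out. Because the model extracts as many factors as there are items, psych may emit warnings, which is expected at this exploratory stage.

```r
library(psych)

# Simulated items (assumption): two latent factors, three items each
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(
  item.1 = f1 + rnorm(n), item.2 = f1 + rnorm(n), item.3 = f1 + rnorm(n),
  item.4 = f2 + rnorm(n), item.5 = f2 + rnorm(n), item.6 = f2 + rnorm(n)
)

# Unrotated model with as many factors as items
EFA.m1 <- fa(DF2, nfactors = ncol(DF2), rotate = "none", fm = "ols")

# Eigenvalues of the common factor solution, as plotted in the text
round(EFA.m1$values, 2)
plot(EFA.m1$values, type = "b",
     xlab = "number of factors", ylab = "eigenvalue")

# fa() also stores the eigenvalues of the original correlation matrix
round(EFA.m1$e.values, 2)
```

The number of factors to retain is then read from the point where the curve flattens.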

6.2.2 evaluating factorial models

  • EFA.m2 <- fa(DF2, nfactors = X, rotate = "oblimin", fm="ols", missing=TRUE, impute = "mean") creates a new object EFA.m2 with the factorial model data based on the DF2 subset produced in the item verification:
    • this time nfactors = X should be set to the number of factors determined with EFA.m1, and a rotation method should be specified in rotate = "" for ease of interpretation
  • EFA.m2 prints all the factorial model data, EFA.m2$loadings will only print the loadings
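A rotated solution could look like the sketch below. It assumes simulated items with two correlated factors and that two factors were retained in the previous step; both are illustrative choices, and the oblimin rotation assumes the GPArotation package is installed.

```r
library(psych)

# Simulated items (assumption): two correlated latent factors
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
DF2 <- data.frame(
  item.1 = f1 + rnorm(n), item.2 = f1 + rnorm(n), item.3 = f1 + rnorm(n),
  item.4 = f2 + rnorm(n), item.5 = f2 + rnorm(n), item.6 = f2 + rnorm(n)
)

# Two-factor model with oblique (oblimin) rotation
EFA.m2 <- fa(DF2, nfactors = 2, rotate = "oblimin", fm = "ols")

# Print only the loadings, hiding those below .30 in absolute value
print(EFA.m2$loadings, cutoff = 0.30)

# Factor correlations (available for oblique rotations)
EFA.m2$Phi
```

The cutoff argument only affects what is printed; the full loading matrix stays in EFA.m2$loadings.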

6.2.3 computing the factors

  • DF$FACTOR <- rowMeans(DF2) computes the row means (coarse scores) of all DF2 variables and stores them in the original DF data frame under the name FACTOR
  • DF <- cbind(DF, EFA.m2$scores) appends the factor scores (refined scores) to the original DF data frame; names(DF)[names(DF) == "OLD.NAME"] <- "FACTOR" can be used to rename a score column to FACTOR, with OLD.NAME being the name shown by colnames(EFA.m2$scores)
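Both kinds of scores can be sketched as follows. The simulated data, the column names coarse.1 and coarse.2, and the assignment of items 1 to 3 and 4 to 6 to the two factors are assumptions for illustration; with a single factor, rowMeans(DF2) over all items would be used as in the text.

```r
library(psych)

# Simulated items (assumption): two correlated latent factors
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
DF2 <- data.frame(
  item.1 = f1 + rnorm(n), item.2 = f1 + rnorm(n), item.3 = f1 + rnorm(n),
  item.4 = f2 + rnorm(n), item.5 = f2 + rnorm(n), item.6 = f2 + rnorm(n)
)
DF <- DF2  # stands in for the original data frame

EFA.m2 <- fa(DF2, nfactors = 2, rotate = "oblimin", fm = "ols")

# Coarse scores: mean of the items assigned to each factor
DF$coarse.1 <- rowMeans(DF2[, c("item.1", "item.2", "item.3")], na.rm = TRUE)
DF$coarse.2 <- rowMeans(DF2[, c("item.4", "item.5", "item.6")], na.rm = TRUE)

# Refined scores: factor scores estimated by fa()
colnames(EFA.m2$scores)  # names the score columns will carry
DF <- cbind(DF, EFA.m2$scores)
head(DF)
```

Checking colnames(EFA.m2$scores) before binding shows which names to use if the score columns need renaming.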

6.3 confirmatory factor analysis

This section requires the lavaan and semPlot packages. Complete step by step guide available soon.

6.3.1 model estimation

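  • example of a model with 2 correlated factors, with 3 items per factor, and a higher-order factor (the syntax is illustrative: with only two first-order factors, a higher-order factor is not identified without additional constraints, so when fitting two factors keep only the factor.1 ~~ factor.2 line):
model <- 
'factor.1 =~ item.1 + item.2 + item.3   # first-order factor 1
factor.2 =~ item.4 + item.5 + item.6   # first-order factor 2
global.factor =~ factor.1 + factor.2   # higher-order factor
factor.1 ~~ factor.2   # covariance between the two factors
'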
  • model.fit <- cfa(model, data = DF) estimates the confirmatory model with the cfa() function, using the model defined above and the DF data frame (the names in the model should be exactly the same as the variable names in DF)
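A runnable version of the estimation step might look like this. It is a sketch under assumptions: the data are generated with lavaan's simulateData() from a hypothetical population model (the loadings and the factor covariance are invented for illustration), and the fitted model keeps only the two correlated factors, without the higher-order factor.

```r
library(lavaan)

# Hypothetical population model used only to simulate data
population.model <- '
  factor.1 =~ 0.8*item.1 + 0.7*item.2 + 0.6*item.3
  factor.2 =~ 0.8*item.4 + 0.7*item.5 + 0.6*item.6
  factor.1 ~~ 0.4*factor.2
'
set.seed(123)
DF <- simulateData(population.model, sample.nobs = 500)

# Model to estimate: two correlated factors, three items each
model <- '
  factor.1 =~ item.1 + item.2 + item.3
  factor.2 =~ item.4 + item.5 + item.6
  factor.1 ~~ factor.2
'
model.fit <- cfa(model, data = DF)

# Check that the estimation converged before interpreting anything
lavInspect(model.fit, "converged")
```

The variable names in the model string match the column names of DF exactly, as required.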

6.3.2 overall goodness of fit

  • summary(model.fit, fit.measures = TRUE, standardized = TRUE, rsq = TRUE) provides a detailed summary of the model.fit
  • fitMeasures(model.fit) provides only the fit measures of the model.fit
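When only a few indices are of interest, fitMeasures() accepts a vector of names. The sketch below assumes simulated data and a two-factor model (both hypothetical); the selection of indices is just one common choice, not a recommendation from this guide.

```r
library(lavaan)

# Hypothetical population model used only to simulate data
population.model <- '
  factor.1 =~ 0.8*item.1 + 0.7*item.2 + 0.6*item.3
  factor.2 =~ 0.8*item.4 + 0.7*item.5 + 0.6*item.6
  factor.1 ~~ 0.4*factor.2
'
set.seed(123)
DF <- simulateData(population.model, sample.nobs = 500)

model <- '
  factor.1 =~ item.1 + item.2 + item.3
  factor.2 =~ item.4 + item.5 + item.6
  factor.1 ~~ factor.2
'
model.fit <- cfa(model, data = DF)

# Request only selected fit measures by name
fit.idx <- fitMeasures(model.fit,
                       c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit.idx, 3)
```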

6.3.3 measures of strain

  • resid(model.fit, type = "standardized") provides the matrix of standardized residuals (the guideline for the absence of focal areas of strain is |residual| < 2.58)
    • for further verification, model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov) creates the data frame model.fit.residuals with the residual matrix (the $cov element of the returned list) and write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE) exports it to a spreadsheet
  • modificationindices(model.fit) provides a table with the modification indices (the guideline for the absence of focal areas of strain is modification indices < 3.84)
    • for further verification, model.fit.mi <- modificationindices(model.fit) creates the data frame model.fit.mi with the modification indices and subset(model.fit.mi, mi > 3.84) shows only the modification indices above the criterion
  • the corresponding code chunk follows:
# standardized residuals: inspect, store as a data frame, export
resid(model.fit, type = "standardized")
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)
write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
# modification indices: inspect, store, keep only those above 3.84
modificationindices(model.fit)
model.fit.mi <- modificationindices(model.fit)
subset(model.fit.mi, mi > 3.84)

6.3.4 estimates

  • parameterEstimates(model.fit) provides the non-standardized estimates
  • standardizedSolution(model.fit) provides the standardized estimates (the guideline is that estimates < |.30| indicate weak associations with the factors)
  • semPaths(model.fit, style="lisrel", what="stand", layout="tree") provides a graph with the associations with the factors according to the model estimation
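The .30 guideline can be applied programmatically to the standardized loadings. The sketch below is illustrative only: the data are simulated from a hypothetical population model in which item.3 is deliberately given a small loading, and the column name weak is an assumption. The semPaths() plot is left out so that the example depends on lavaan alone.

```r
library(lavaan)

# Hypothetical population model; item.3 is given a deliberately small loading
population.model <- '
  factor.1 =~ 0.8*item.1 + 0.7*item.2 + 0.2*item.3
  factor.2 =~ 0.8*item.4 + 0.7*item.5 + 0.6*item.6
  factor.1 ~~ 0.4*factor.2
'
set.seed(123)
DF <- simulateData(population.model, sample.nobs = 1000)

model <- '
  factor.1 =~ item.1 + item.2 + item.3
  factor.2 =~ item.4 + item.5 + item.6
  factor.1 ~~ factor.2
'
model.fit <- cfa(model, data = DF)

# Keep only the factor loadings (operator "=~") from the standardized solution
std <- standardizedSolution(model.fit)
item.loadings <- subset(std, op == "=~")

# Flag loadings below the .30 guideline in absolute value
item.loadings$weak <- abs(item.loadings$est.std) < 0.30
item.loadings[, c("lhs", "rhs", "est.std", "weak")]
```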

6.3.5 computing the factors

  • for coarse scores computation see 6.2.3
  • for refined scores use DF.refined <- data.frame(predict(model.fit)) followed by DF <- cbind(DF, DF.refined) (each new variable takes the name of the corresponding latent variable in the model)
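The refined scores step can be sketched as follows, again on data simulated from a hypothetical population model. The example assumes complete data, so the predicted scores have one row per row of DF and can be bound directly.

```r
library(lavaan)

# Hypothetical population model used only to simulate data
population.model <- '
  factor.1 =~ 0.8*item.1 + 0.7*item.2 + 0.6*item.3
  factor.2 =~ 0.8*item.4 + 0.7*item.5 + 0.6*item.6
  factor.1 ~~ 0.4*factor.2
'
set.seed(123)
DF <- simulateData(population.model, sample.nobs = 500)

model <- '
  factor.1 =~ item.1 + item.2 + item.3
  factor.2 =~ item.4 + item.5 + item.6
  factor.1 ~~ factor.2
'
model.fit <- cfa(model, data = DF)

# Refined scores: one column per latent variable, named as in the model
DF.refined <- data.frame(predict(model.fit))
DF <- cbind(DF, DF.refined)
head(DF[, c("factor.1", "factor.2")])
```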

6.4 online resources on factor analysis