# Chapter 6 Factor Analysis

## 6.1 item verification

- it is easier to work with a **subset** of the items, created with `DF2 <- subset(DF1, select = c("VARIABLE", … ))`, where `DF1` is the original data frame, `DF2` is the data frame to be created with the subset, and `VARIABLE` stands for the variable names or numbers in the data frame
- obtain general descriptives of the items with `freq(DF2)`, `boxplot(DF2)` or similar functions
- create a data frame `desc` with the descriptives using `desc <- data.frame(describe(DF2))` (`describe()` is from the `psych` package) and:
  - compute skewness statistics with `desc$skewstat <- desc$skew/sqrt(6/desc$n)`
  - compute kurtosis statistics with `desc$kurtosisstat <- desc$kurtosis/sqrt(24/desc$n)`
  - note that the `se` column returned by `describe()` is the standard error of the mean, so dividing `skew` or `kurtosis` by it does not give a skewness or kurtosis statistic; \(\sqrt{6/n}\) and \(\sqrt{24/n}\) are the usual large-sample approximations of their standard errors
- create a correlation matrix `cor` with `cor <- cor(DF2, use = "pairwise.complete.obs")`
  - check the other `use` options (`"everything"`, `"all.obs"`, `"complete.obs"`, `"na.or.complete"`) for dealing with missing data
- visualize the correlations with `corrplot()` (requires the `corrplot` package); for example, `corrplot(cor, method="number", type="lower", diag=FALSE, number.cex=0.5)` displays the matrix `cor` as numbers, showing only the lower triangle, omitting the diagonal, and with the numbers at half size
- test for an **identity matrix** with `cortest.bartlett(DF2)` (the *guideline* for a good matrix is \(p<.05\), meaning it is not an identity matrix)
- test the common variance with `KMO(DF2)` (the *guideline* for acceptable common variance is overall MSA \(>.70\))
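The statistics behind `describe()`, `cortest.bartlett()` and `KMO()` can also be computed in base R, which helps to see what each guideline refers to. The sketch below is an illustration under stated assumptions: the data are simulated, the item names are hypothetical, the skewness and kurtosis estimators are common moment estimators divided by their large-sample standard errors (rather than by the standard error of the mean), and the results may differ slightly from those reported by the `psych` functions.

```r
# Base-R sketch of the 6.1 checks on simulated data (assumption: six
# hypothetical items, item.1 to item.6, driven by two independent factors)
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(
  item.1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  item.2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  item.3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  item.4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  item.5 = 0.8 * f2 + rnorm(n, sd = 0.6),
  item.6 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

# Skewness and excess kurtosis (moment estimators), divided by their
# approximate standard errors sqrt(6/n) and sqrt(24/n)
skew <- sapply(DF2, function(x) { d <- x - mean(x); mean(d^3) / mean(d^2)^1.5 })
kurt <- sapply(DF2, function(x) { d <- x - mean(x); mean(d^4) / mean(d^2)^2 - 3 })
skewstat <- skew / sqrt(6 / n)
kurtosisstat <- kurt / sqrt(24 / n)

# Correlation matrix
R <- cor(DF2, use = "pairwise.complete.obs")
p <- ncol(R)

# Bartlett's test of sphericity (H0: R is an identity matrix)
chi2 <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
dof <- p * (p - 1) / 2
p.value <- pchisq(chi2, dof, lower.tail = FALSE)

# Kaiser-Meyer-Olkin measure of sampling adequacy (MSA)
Rinv <- solve(R)
partial <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial correlations
r2 <- R^2
diag(r2) <- 0
p2 <- partial^2
diag(p2) <- 0
msa.overall <- sum(r2) / (sum(r2) + sum(p2))
msa.items <- colSums(r2) / (colSums(r2) + colSums(p2))

round(cbind(skewstat, kurtosisstat, msa.items), 2)
round(c(chi2 = chi2, df = dof, p = p.value, MSA = msa.overall), 3)
```

The KMO index compares the squared correlations with the squared partial correlations: the closer it is to 1, the more of the association between items is shared with the other items, which is why it is read as a measure of common variance.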

## 6.2 exploratory factor analysis

This section requires the `psych` package. The example uses ordinary least squares estimation with `fm="ols"` and oblimin rotation with `rotate = "oblimin"` (oblimin rotation in `psych` also requires the `GPArotation` package). A complete step-by-step guide will be available soon.

### 6.2.1 deciding the number of factors

`EFA.m1 <- fa(DF2, nfactors = X, rotate = "none", fm="ols", missing=TRUE, impute = "mean")` creates the object `EFA.m1` with the factorial model data based on the `DF2` subset produced in the item verification:

- `nfactors = X` should be set to the number of items and, consequently, `rotate = "none"` specifies that no factor rotation is applied
- `missing=TRUE` indicates that there are missing data, and `impute = "mean"` replaces them with the item mean when factor scores are computed

`plot(EFA.m1$values, type = "b", xlab = "nº of factors", ylab = "eigenvalue")` plots the eigenvalues by the number of factors.

`EFA.m1` prints all the factorial model data, but the output can be more specific; for example, `EFA.m1$loadings` will only print the loadings.
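The scree plot can be cross-checked in base R. Note that, in `psych`'s `fa()` output, `values` holds the eigenvalues of the common-factor solution, while the eigenvalues of the correlation matrix itself are stored as `e.values`; the sketch below computes the latter. The data and item names are hypothetical, and the Kaiser rule (eigenvalue > 1) is only one common heuristic, best read together with the elbow of the scree plot.

```r
# Base-R sketch: eigenvalues of the item correlation matrix (simulated data,
# hypothetical items) and the number of factors suggested by the Kaiser rule
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
# items 1-3 load on f1, items 4-6 on f2
DF2 <- data.frame(sapply(1:6, function(i) 0.8 * (if (i <= 3) f1 else f2) + rnorm(n, sd = 0.6)))
names(DF2) <- paste0("item.", 1:6)

ev <- eigen(cor(DF2), symmetric = TRUE, only.values = TRUE)$values
ev              # sorted from largest to smallest; they sum to the number of items
sum(ev > 1)     # Kaiser rule: retain factors with eigenvalue > 1
```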

### 6.2.2 evaluating factorial models

`EFA.m2 <- fa(DF2, nfactors = X, rotate = "oblimin", fm="ols", missing=TRUE, impute = "mean")` creates a new object `EFA.m2` with the factorial model data based on the `DF2` subset produced in the item verification:

- this time, `nfactors = X` should be set to the number of factors determined with `EFA.m1`
- a rotation method should be specified in `rotate = ""` (here `"oblimin"`) for ease of interpretation

As before, `EFA.m2` prints all the factorial model data, and `EFA.m2$loadings` will only print the loadings.
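When `psych` is not at hand, base R's `stats::factanal()` offers a rough cross-check of the rotated solution. This is a swapped-in technique, not the method used above: `factanal()` uses maximum likelihood instead of `fm = "ols"` and the oblique `"promax"` rotation instead of `"oblimin"`, so the loadings will not match `EFA.m2` exactly. The data and item names are hypothetical.

```r
# Base-R cross-check with stats::factanal(): maximum likelihood estimation and
# the oblique "promax" rotation stand in for fm = "ols" and rotate = "oblimin"
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
DF2 <- data.frame(sapply(1:6, function(i) 0.8 * (if (i <= 3) f1 else f2) + rnorm(n, sd = 0.6)))
names(DF2) <- paste0("item.", 1:6)

EFA.check <- factanal(DF2, factors = 2, rotation = "promax", scores = "regression")
print(EFA.check$loadings, cutoff = 0.30)   # hide loadings below .30 in absolute value

# Factor on which each item has its largest absolute loading
main.factor <- apply(abs(unclass(EFA.check$loadings)), 1, which.max)
main.factor
```

The order of the factors is arbitrary, so what matters is that items 1 to 3 share one factor and items 4 to 6 share the other.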

### 6.2.3 computing the factors

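`DF$FACTOR <- rowMeans(DF2)` computes the row means (*coarse scores*) of the `DF2` variables and stores them in the original `DF` data frame under the name `FACTOR`; when there is more than one factor, compute the row means separately for each factor using only that factor's items, and add `na.rm = TRUE` if there are missing values.

`DF <- cbind(DF, EFA.m2$scores)` adds the factor scores (*refined scores*) computed from the `DF2` variables to the original `DF` data frame; `colnames(EFA.m2$scores)` shows the names given to the scores, and `names(DF)[names(DF) == "NAME"] <- "FACTOR"` can be used to rename the score column `NAME` as `FACTOR`.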
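A minimal sketch of both kinds of scores, assuming simulated data in which items 1 to 3 and items 4 to 6 belong to two hypothetical factors; `stats::factanal()` stands in for `psych::fa()`, and the names `id`, `F1.coarse`, `F2.coarse` and `FACTOR.A` are illustrative only.

```r
# Sketch: coarse scores (item means per factor) and refined scores (regression
# factor scores) added to the original data frame; all names are hypothetical
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
DF <- data.frame(id = 1:n,
                 sapply(1:6, function(i) 0.8 * (if (i <= 3) f1 else f2) + rnorm(n, sd = 0.6)))
names(DF)[-1] <- paste0("item.", 1:6)
DF2 <- subset(DF, select = paste0("item.", 1:6))

# Coarse scores: one rowMeans() per factor, using only that factor's items
DF$F1.coarse <- rowMeans(DF2[, c("item.1", "item.2", "item.3")], na.rm = TRUE)
DF$F2.coarse <- rowMeans(DF2[, c("item.4", "item.5", "item.6")], na.rm = TRUE)

# Refined scores: bind the factor scores to DF (not to DF2)
EFA.check <- factanal(DF2, factors = 2, rotation = "promax", scores = "regression")
colnames(EFA.check$scores)   # names given to the score columns
DF <- cbind(DF, EFA.check$scores)
names(DF)[names(DF) == "Factor1"] <- "FACTOR.A"
```

With items of similar loadings, the coarse and refined scores of the same factor are usually very highly correlated; the refined scores weight each item by its contribution to the factor.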

## 6.3 confirmatory factor analysis

This section requires the `lavaan` and `semPlot` packages. A complete step-by-step guide will be available soon.

### 6.3.1 model estimation

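- example of a model with 2 correlated factors and 3 items per factor; a higher-order factor (`global.factor =~ factor.1 + factor.2`) would take the place of the `factor.1 ~~ factor.2` covariance, but with only 2 first-order factors it is not identified without extra constraints (for example, equal loadings of both factors on `global.factor`), so it is left commented out:

```
model <- '
factor.1 =~ item.1 + item.2 + item.3
factor.2 =~ item.4 + item.5 + item.6
# correlation between the two factors
factor.1 ~~ factor.2
# higher-order alternative, only with extra constraints or 3+ first-order factors:
# global.factor =~ factor.1 + factor.2
'
```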

`model.fit <- cfa(model, data = DF)` estimates the confirmatory model with the `cfa()` function, using the `model` defined above and the `DF` data frame (the names in `model` must be exactly the same as the variable names in `DF`)
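A rough first check of whether a CFA model can be identified is to count the known moments (unique variances and covariances of the items) against the free parameters. The sketch below assumes that every item loads on one factor, that each factor's first loading is fixed to 1 (lavaan's default scaling) and that all factor variances and covariances are free; the function name `cfa.df` is hypothetical. A non-negative result is necessary but not sufficient, so the second part applies the same count to a higher-order factor placed over two first-order factors (`global.factor =~ factor.1 + factor.2`).

```r
# Counting rule (t-rule) for a simple CFA; a non-negative result is necessary,
# but not sufficient, for identification
cfa.df <- function(n.items, n.factors) {
  moments <- n.items * (n.items + 1) / 2         # unique variances and covariances
  free <- (n.items - n.factors) +                # loadings not fixed to 1
    n.items +                                    # residual variances
    n.factors * (n.factors + 1) / 2              # factor variances and covariances
  moments - free                                 # model degrees of freedom
}
cfa.df(n.items = 6, n.factors = 2)   # 21 - 13 = 8

# Higher-order part over 2 first-order factors: 3 moments (2 variances,
# 1 covariance) against 4 free parameters (1 second-order loading,
# 1 higher-order variance, 2 disturbance variances)
higher.order.df <- (2 * 3 / 2) - (1 + 1 + 2)
higher.order.df   # -1: not identified without constraints
```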

### 6.3.2 overall goodness of fit

`summary(model.fit, fit.measures = TRUE, standardized = TRUE, rsq = TRUE)` provides a detailed summary of `model.fit`.

`fitMeasures(model.fit)` provides only the fit measures of `model.fit`.
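To see what two of the reported indices measure, the sketch below writes out the conventional formulas for RMSEA and CFI, which compare the model chi-square with its degrees of freedom (RMSEA) and with the baseline, independence model (CFI). The chi-square values are hypothetical, and lavaan's own computation may differ in details (for example, using \(N\) instead of \(N-1\)).

```r
# Conventional formulas for two indices reported by fitMeasures().
# Assumption: the chi-square values below are hypothetical.
rmsea <- function(chisq, df, N) {
  sqrt(max(chisq - df, 0) / (df * (N - 1)))
}
cfi <- function(chisq, df, chisq.base, df.base) {
  d.model <- max(chisq - df, 0)                  # noncentrality of the model
  d.base <- max(chisq.base - df.base, d.model)   # noncentrality of the baseline model
  if (d.base == 0) return(1)
  1 - d.model / d.base
}
rmsea(chisq = 60, df = 24, N = 300)
cfi(chisq = 60, df = 24, chisq.base = 900, df.base = 36)
```

Both indices treat a chi-square at or below its degrees of freedom as perfect fit (RMSEA = 0, CFI = 1), which is why they are read alongside the chi-square test rather than instead of it.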

### 6.3.3 measures of strain

- `resid(model.fit, type = "standardized")` provides the **standardized residuals**, with the matrix in its `$cov` element (the *guideline* for the absence of focal areas of strain is |residual| < 2.58)
  - for further verification, `model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)` creates the data frame `model.fit.residuals` with the residuals, and `write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)` exports it to a spreadsheet
- `modificationindices(model.fit)` provides a table with the **modification indices** (the *guideline* for the absence of focal areas of strain is modification indices < 3.84)
  - for further verification, `model.fit.mi <- modificationindices(model.fit)` creates the data frame `model.fit.mi` with the modification indices, and `subset(model.fit.mi, mi > 3.84)` shows only the modification indices above the criterion
- the complete code chunk follows:

```
# standardized residuals (matrix in the $cov element)
resid(model.fit, type = "standardized")
model.fit.residuals <- data.frame(resid(model.fit, type = "standardized")$cov)
# export the residuals to a spreadsheet
write.csv(model.fit.residuals, "model.fit.residuals.csv", row.names = TRUE)
# modification indices, keeping only those above 3.84
modificationindices(model.fit)
model.fit.mi <- modificationindices(model.fit)
subset(model.fit.mi, mi > 3.84)
```
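The two cut-offs used above are standard critical values, which base R can reproduce: 2.58 is the two-tailed critical \(z\) at \(\alpha = .01\), and 3.84 is the critical chi-square with 1 degree of freedom at \(\alpha = .05\). The small table below is hypothetical, built only with the `lhs`, `op`, `rhs` and `mi` columns of `modificationindices()` output, to show the filtering and sorting logic.

```r
# The two cut-offs are standard critical values
qnorm(0.995)           # two-tailed z at alpha = .01 (|residual| < 2.58)
qchisq(0.95, df = 1)   # chi-square with 1 df at alpha = .05 (MI < 3.84)

# Hypothetical table with the lhs, op, rhs and mi columns of modificationindices()
mi.example <- data.frame(
  lhs = c("item.1", "factor.1", "item.2"),
  op = c("~~", "=~", "~~"),
  rhs = c("item.4", "item.5", "item.3"),
  mi = c(12.4, 2.1, 4.0)
)
flagged <- subset(mi.example, mi > qchisq(0.95, df = 1))
flagged[order(-flagged$mi), ]   # largest modification indices first
```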

### 6.3.4 estimates

`parameterEstimates(model.fit)` provides the non-standardized estimates.

`standardizedSolution(model.fit)` provides the standardized estimates (the *guideline* is that standardized estimates below |.30| indicate weak associations with the factors).

`semPaths(model.fit, style="lisrel", what="stand", layout="tree")` (from the `semPlot` package) provides a graph of the associations with the factors according to the model estimation.
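The |.30| guideline can be applied directly to the `standardizedSolution()` output by keeping the loading rows (`op == "=~"`) and comparing the absolute value of the `est.std` column, so that negative loadings are checked too. The sketch below uses a hypothetical table with made-up values in place of real output; with a fitted model, the same `subset()` call would be applied to `standardizedSolution(model.fit)`.

```r
# Hypothetical table with the lhs, op, rhs and est.std columns returned by
# standardizedSolution(); values are made up for illustration
std.example <- data.frame(
  lhs = rep(c("factor.1", "factor.2"), each = 3),
  op = "=~",
  rhs = paste0("item.", 1:6),
  est.std = c(0.72, 0.65, 0.21, 0.80, -0.25, 0.58)
)
# Loadings ("=~" rows) below .30 in absolute value
weak <- subset(std.example, op == "=~" & abs(est.std) < 0.30)
weak
```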

### 6.3.5 computing the factors

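- for *coarse scores* computation, see 6.2.3
- for *refined scores*, use `DF.refined <- data.frame(predict(model.fit))` followed by `DF <- cbind(DF, DF.refined)` (the names of the new variables will be the names of the latent variables in the model estimation; if cases with missing data were dropped during estimation, `DF.refined` will have fewer rows than `DF` and cannot be bound directly)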