Chapter 1 Introduction to confirmatory factor analysis

1.1 The principal component factor analysis approach

PCFA attempts to account for all of the variance and covariance of the set of items rather than the portion of the covariance that the items share in common
- This may not be the best, but it is the most common
Illistrate PCFA with NLSY97 data to construct a measure of convervatism

library(haven)

nlsy.data <- read_dta("nlsy97cfa.dta")

summary(nlsy.data)

##     r0000100          x1              x2             x3              x4       
##  Min.   :   1   Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2249   1st Qu.:2.000   1st Qu.:1.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median :4502   Median :2.000   Median :1.00   Median :1.000   Median :1.000  
##  Mean   :4504   Mean   :2.332   Mean   :1.62   Mean   :1.416   Mean   :1.365  
##  3rd Qu.:6758   3rd Qu.:3.000   3rd Qu.:2.00   3rd Qu.:2.000   3rd Qu.:2.000  
##  Max.   :9022   Max.   :4.000   Max.   :4.00   Max.   :4.000   Max.   :4.000  
##  NA's   :1      NA's   :7152    NA's   :7126   NA's   :7111    NA's   :7113   
##        x5              x6              x7              x8       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :2.000   Median :1.000  
##  Mean   :1.773   Mean   :2.277   Mean   :2.229   Mean   :1.309  
##  3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:2.000  
##  Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##  NA's   :7170    NA's   :7174    NA's   :7210    NA's   :7110   
##        x9             x10             x11             x12       
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :2.000   Median :1.000   Median :3.000   Median :2.000  
##  Mean   :1.705   Mean   :1.391   Mean   :3.224   Mean   :2.233  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.:3.000  
##  Max.   :4.000   Max.   :4.000   Max.   :4.000   Max.   :4.000  
##  NA's   :7138    NA's   :7125    NA's   :1690    NA's   :1588   
##       x13       
##  Min.   :1.000  
##  1st Qu.:3.000  
##  Median :4.000  
##  Mean   :3.655  
##  3rd Qu.:4.000  
##  Max.   :4.000  
##  NA's   :1694

# PCFA model
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

pcfa <- 
  nlsy.data %>%
  select(x1:x10) %>%
  drop_na() %>%
  principal(covar = TRUE, rotate = "none", nfactors = 10)
pcfa

## Principal Components Analysis
## Call: principal(r = ., nfactors = 10, rotate = "none", covar = TRUE)
## Standardized loadings (pattern matrix) based upon correlation matrix
##      PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10 h2       u2 com
## x1  0.61 -0.38 -0.29  0.01  0.22 -0.07  0.55  0.22 -0.06 -0.07  1  2.2e-16 3.9
## x2  0.58  0.04 -0.61  0.10  0.22  0.20 -0.16 -0.37  0.15 -0.01  1  4.4e-16 3.6
## x3  0.72  0.21 -0.08 -0.29  0.13 -0.33 -0.11  0.07 -0.17  0.41  1  2.2e-16 3.1
## x4  0.72  0.32 -0.04 -0.17 -0.01 -0.33 -0.17  0.16  0.10 -0.42  1  5.6e-16 3.1
## x5  0.58 -0.03 -0.24  0.44 -0.59  0.02 -0.11  0.21 -0.06  0.08  1 -2.2e-16 3.7
## x6  0.61 -0.45  0.35  0.08 -0.10 -0.31  0.01 -0.24  0.36  0.08  1 -6.7e-16 4.5
## x7  0.60 -0.33  0.23  0.02  0.31  0.37 -0.36  0.32  0.02  0.01  1  1.1e-16 4.9
## x8  0.60  0.33  0.13 -0.37 -0.27  0.42  0.25  0.03  0.24  0.07  1  5.6e-16 4.9
## x9  0.73 -0.16  0.24 -0.10 -0.14  0.12  0.02 -0.34 -0.44 -0.15  1  1.1e-16 2.9
## x10 0.45  0.52  0.33  0.54  0.29  0.02  0.18 -0.05 -0.01  0.03  1  2.2e-16 4.5
## 
##                        PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10
## SS loadings           3.92 1.01 0.88 0.77 0.74 0.69 0.61 0.54 0.45 0.39
## Proportion Var        0.39 0.10 0.09 0.08 0.07 0.07 0.06 0.05 0.04 0.04
## Cumulative Var        0.39 0.49 0.58 0.66 0.73 0.80 0.86 0.92 0.96 1.00
## Proportion Explained  0.39 0.10 0.09 0.08 0.07 0.07 0.06 0.05 0.04 0.04
## Cumulative Proportion 0.39 0.49 0.58 0.66 0.73 0.80 0.86 0.92 0.96 1.00
## 
## Mean item complexity =  3.9
## Test of the hypothesis that 10 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0 
##  with the empirical chi square  0  with prob <  NA 
## 
## Fit based upon off diagonal values = 1

The eigenvalue of 3.92 shows how much of the total variance is explained by the first factor.
The RCFA analyzes the correlation matrix where each item is standardized to have a vairnace of 1. With 10 items, the eigenvalues combined will add up to 10. With 3.92 out of 10 being explained by the first factor, we say that the first factor explains 39.2% of the variance in the set of items.
Any factor with an eigenvalue of less than 1.0 can usually be ignored
We drop the last loading because the eigenvalue is so low

pcfa <- 
  nlsy.data %>%
  select(x1:x9) %>%
  drop_na() %>%
  principal(covar = TRUE, rotate = "none", nfactors = 9)
pcfa

## Principal Components Analysis
## Call: principal(r = ., nfactors = 9, rotate = "none", covar = TRUE)
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9 h2       u2 com
## x1 0.62  0.24  0.40 -0.20 -0.07  0.53  0.25 -0.06 -0.07  1 -8.9e-16 3.8
## x2 0.59 -0.26  0.57 -0.17  0.20 -0.16 -0.38  0.14  0.00  1  1.1e-15 3.9
## x3 0.72 -0.29 -0.12 -0.25 -0.32 -0.08  0.06 -0.17  0.42  1 -6.7e-16 3.1
## x4 0.71 -0.35 -0.16 -0.07 -0.32 -0.16  0.15  0.10 -0.42  1 -4.4e-16 3.2
## x5 0.58 -0.07  0.28  0.71  0.01 -0.12  0.21 -0.07  0.08  1  0.0e+00 2.6
## x6 0.62  0.54 -0.16  0.14 -0.31  0.02 -0.24  0.36  0.08  1  2.2e-16 3.9
## x7 0.61  0.41 -0.06 -0.25  0.38 -0.40  0.30  0.02  0.01  1 -2.2e-16 4.5
## x8 0.60 -0.35 -0.40  0.09  0.43  0.32  0.02  0.23  0.08  1 -6.7e-16 4.6
## x9 0.74  0.19 -0.24  0.09  0.12  0.06 -0.34 -0.45 -0.15  1  3.3e-16 2.8
## 
##                        PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9
## SS loadings           3.76 0.95 0.85 0.75 0.69 0.62 0.54 0.45 0.39
## Proportion Var        0.42 0.11 0.09 0.08 0.08 0.07 0.06 0.05 0.04
## Cumulative Var        0.42 0.52 0.62 0.70 0.78 0.85 0.91 0.96 1.00
## Proportion Explained  0.42 0.11 0.09 0.08 0.08 0.07 0.06 0.05 0.04
## Cumulative Proportion 0.42 0.52 0.62 0.70 0.78 0.85 0.91 0.96 1.00
## 
## Mean item complexity =  3.6
## Test of the hypothesis that 9 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0 
##  with the empirical chi square  0  with prob <  NA 
## 
## Fit based upon off diagonal values = 1

These are good results because only one factor has an eigenvalue greater than 1.- and all 9 items load over 0.5 on that factor

1.2 Alpha reliability for our nine-item scale

the next step is to assess the reliability of our 9 item scale of conservatism

psych::alpha(nlsy.data[ , c(2:10)])

## 
## Reliability analysis   
## Call: psych::alpha(x = nlsy.data[, c(2:10)])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.81      0.82    0.82      0.34 4.5 0.003  1.8 0.51     0.33
## 
##  lower alpha upper     95% confidence boundaries
## 0.8 0.81 0.81 
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## x1      0.79      0.80    0.80      0.34 4.1   0.0033 0.0073  0.33
## x2      0.79      0.81    0.80      0.34 4.2   0.0032 0.0065  0.34
## x3      0.78      0.79    0.78      0.32 3.8   0.0034 0.0050  0.33
## x4      0.78      0.79    0.78      0.32 3.8   0.0034 0.0052  0.33
## x5      0.79      0.81    0.80      0.35 4.3   0.0032 0.0069  0.35
## x6      0.79      0.80    0.79      0.34 4.1   0.0033 0.0054  0.34
## x7      0.79      0.81    0.80      0.34 4.1   0.0033 0.0067  0.34
## x8      0.80      0.81    0.80      0.35 4.2   0.0032 0.0056  0.34
## x9      0.77      0.79    0.78      0.32 3.7   0.0035 0.0057  0.32
## 
##  Item statistics 
##       n raw.r std.r r.cor r.drop mean   sd
## x1 1833  0.66  0.63  0.56   0.51  2.3 1.02
## x2 1859  0.59  0.60  0.52   0.46  1.6 0.80
## x3 1874  0.67  0.70  0.67   0.58  1.4 0.67
## x4 1872  0.66  0.70  0.66   0.57  1.4 0.63
## x5 1815  0.58  0.59  0.50   0.45  1.8 0.81
## x6 1811  0.65  0.62  0.56   0.51  2.3 0.93
## x7 1775  0.66  0.61  0.54   0.49  2.2 1.07
## x8 1875  0.54  0.59  0.51   0.44  1.3 0.56
## x9 1847  0.72  0.73  0.69   0.63  1.7 0.74
## 
## Non missing response frequency for each item
##       1    2    3    4 miss
## x1 0.25 0.34 0.25 0.16 0.80
## x2 0.54 0.33 0.09 0.04 0.79
## x3 0.67 0.27 0.05 0.02 0.79
## x4 0.70 0.25 0.04 0.01 0.79
## x5 0.43 0.41 0.12 0.04 0.80
## x6 0.22 0.40 0.26 0.12 0.80
## x7 0.32 0.28 0.23 0.16 0.80
## x8 0.73 0.23 0.03 0.01 0.79
## x9 0.44 0.43 0.10 0.02 0.79

the alpha is 0.81, which is over the 0.70 minimum value standard
the raw_alpha column shows what would happen if we dropped any single item from our scale. If dropping an item would raise the alpha then we should consider if the item is measuing the same concept as the other items
to obtain the scale score for each person in our sample, we would simply compute the total or mean score for the 9 items

nlsy.data <- 
  nlsy.data %>%
  rowwise() %>%
  mutate(m = mean(c(x1, x2, x3, x4, x5, x6, x7, x8, x9), na.rm = TRUE))

describe(nlsy.data$m)

##    vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 1888 1.78 0.51   1.69    1.74 0.53   1   4     3 0.72     0.53 0.01

hist(nlsy.data$m)

1.3 Generating a factor score rather than a mean of summative score

you can generate a factor score that weights each item according to how salient it is to the concept being measured
if the scores are similar then factor scores will be very similar to simple mean scores

# pcfa <- 
#   nlsy.data %>%
#   select(x1:x9) %>%
#   drop_na() %>%
#   principal(covar = TRUE, rotate = "none", nfactors = 9, scores = TRUE)
# nlsy.data$factor_scores <- pcfa$scores
# 
# hist(nlsy.data$factor_scores)
# cor(nlsy.data$m, nlsy.data$factor_scores)

1.4 What can CFA add?

CFA and factor analysis methods allow each item to have its own variance
confirmatory models specify the factor that underlies the responses to items - all the items are indicators of conservatism
with CFA we estimate the variance of each item. Thus we acknowledge that each item may have some unique variance that we are treating as random error
we get better measures of the latent variable by isolating the shared variance from each item’s variance

1.5 Fitting a CFA model

library(lavaan)

## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.

## 
## Attaching package: 'lavaan'

## The following object is masked from 'package:psych':
## 
##     cor2cov

cfa.model <- '
  # latent variable model
  conservative =~ 1 * x1 + a2 * x2 + a3 * x3 + a4 * x4 + a5 * x5 + a6 * x6 + 
    a7 * x7 + a8 * x8 + a9 * x9
'

cfa.fit <- cfa(cfa.model, data = nlsy.data)
summary(cfa.fit)

## lavaan 0.6-9 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        18
##                                                       
##                                                   Used       Total
##   Number of observations                          1625        8985
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               419.007
##   Degrees of freedom                                27
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   conservative =~                                     
##     x1                1.000                           
##     x2        (a2)    0.738    0.046   16.153    0.000
##     x3        (a3)    0.827    0.043   19.425    0.000
##     x4        (a4)    0.756    0.039   19.222    0.000
##     x5        (a5)    0.738    0.046   15.941    0.000
##     x6        (a6)    0.915    0.054   16.992    0.000
##     x7        (a7)    1.028    0.062   16.607    0.000
##     x8        (a8)    0.549    0.033   16.675    0.000
##     x9        (a9)    0.928    0.048   19.475    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.729    0.028   26.051    0.000
##    .x2                0.471    0.018   26.439    0.000
##    .x3                0.240    0.010   23.378    0.000
##    .x4                0.215    0.009   23.722    0.000
##    .x5                0.495    0.019   26.540    0.000
##    .x6                0.590    0.023   25.970    0.000
##    .x7                0.820    0.031   26.201    0.000
##    .x8                0.230    0.009   26.162    0.000
##    .x9                0.297    0.013   23.287    0.000
##     conservative      0.316    0.029   11.039    0.000

1.6 Interpreting and presenting CFA results

this first factor loading is fixed at 1 and is called the reference indicator
standardized estimates fix all variances to 1

standardizedSolution(cfa.fit)

##             lhs op          rhs label est.std    se      z pvalue ci.lower
## 1  conservative =~           x1         0.550 0.020 27.583      0    0.511
## 2  conservative =~           x2    a2   0.517 0.021 24.926      0    0.476
## 3  conservative =~           x3    a3   0.688 0.016 42.841      0    0.657
## 4  conservative =~           x4    a4   0.676 0.016 41.107      0    0.643
## 5  conservative =~           x5    a5   0.508 0.021 24.208      0    0.467
## 6  conservative =~           x6    a6   0.556 0.020 28.118      0    0.517
## 7  conservative =~           x7    a7   0.538 0.020 26.575      0    0.498
## 8  conservative =~           x8    a8   0.541 0.020 26.838      0    0.501
## 9  conservative =~           x9    a9   0.691 0.016 43.295      0    0.660
## 10           x1 ~~           x1         0.698 0.022 31.834      0    0.655
## 11           x2 ~~           x2         0.733 0.021 34.138      0    0.690
## 12           x3 ~~           x3         0.526 0.022 23.804      0    0.483
## 13           x4 ~~           x4         0.544 0.022 24.471      0    0.500
## 14           x5 ~~           x5         0.742 0.021 34.848      0    0.700
## 15           x6 ~~           x6         0.691 0.022 31.422      0    0.648
## 16           x7 ~~           x7         0.711 0.022 32.654      0    0.668
## 17           x8 ~~           x8         0.707 0.022 32.434      0    0.665
## 18           x9 ~~           x9         0.522 0.022 23.635      0    0.479
## 19 conservative ~~ conservative         1.000 0.000     NA     NA    1.000
##    ci.upper
## 1     0.589
## 2     0.558
## 3     0.720
## 4     0.708
## 5     0.549
## 6     0.595
## 7     0.577
## 8     0.580
## 9     0.723
## 10    0.741
## 11    0.775
## 12    0.570
## 13    0.587
## 14    0.784
## 15    0.734
## 16    0.753
## 17    0.750
## 18    0.565
## 19    1.000

library(semPlot)
semPaths(cfa.fit, what = "est", edge.label.cex = 0.75, edge.color = "black")

interpretation of the loadings: if you are 1 standard deviation higher on Conservative, you will respond 0.55 standarad deviations higher on x1, 0.52 standard deviations higher on x2, etc.
With CFA we can also estimate reliability which is the propostion of the total variation in a scale formed by our indicators that is attributed to the true score

1.7 Assessing goodness of fit

summary(cfa.fit, fit.measures = TRUE)

## lavaan 0.6-9 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        18
##                                                       
##                                                   Used       Total
##   Number of observations                          1625        8985
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               419.007
##   Degrees of freedom                                27
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3872.316
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.898
##   Tucker-Lewis Index (TLI)                       0.864
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -15593.729
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               31223.457
##   Bayesian (BIC)                             31320.536
##   Sample-size adjusted Bayesian (BIC)        31263.353
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.095
##   90 Percent confidence interval - lower         0.087
##   90 Percent confidence interval - upper         0.103
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.049
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   conservative =~                                     
##     x1                1.000                           
##     x2        (a2)    0.738    0.046   16.153    0.000
##     x3        (a3)    0.827    0.043   19.425    0.000
##     x4        (a4)    0.756    0.039   19.222    0.000
##     x5        (a5)    0.738    0.046   15.941    0.000
##     x6        (a6)    0.915    0.054   16.992    0.000
##     x7        (a7)    1.028    0.062   16.607    0.000
##     x8        (a8)    0.549    0.033   16.675    0.000
##     x9        (a9)    0.928    0.048   19.475    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.729    0.028   26.051    0.000
##    .x2                0.471    0.018   26.439    0.000
##    .x3                0.240    0.010   23.378    0.000
##    .x4                0.215    0.009   23.722    0.000
##    .x5                0.495    0.019   26.540    0.000
##    .x6                0.590    0.023   25.970    0.000
##    .x7                0.820    0.031   26.201    0.000
##    .x8                0.230    0.009   26.162    0.000
##    .x9                0.297    0.013   23.287    0.000
##     conservative      0.316    0.029   11.039    0.000

modificationIndices(cfa.fit)

##    lhs op rhs      mi    epc sepc.lv sepc.all sepc.nox
## 20  x1 ~~  x2  27.046  0.083   0.083    0.142    0.142
## 21  x1 ~~  x3   1.645 -0.016  -0.016   -0.038   -0.038
## 22  x1 ~~  x4  22.437 -0.055  -0.055   -0.139   -0.139
## 23  x1 ~~  x5   1.255  0.018   0.018    0.030    0.030
## 24  x1 ~~  x6  16.165  0.073   0.073    0.111    0.111
## 25  x1 ~~  x7   8.434  0.062   0.062    0.080    0.080
## 26  x1 ~~  x8  13.662 -0.042  -0.042   -0.101   -0.101
## 27  x1 ~~  x9   0.798 -0.012  -0.012   -0.027   -0.027
## 28  x2 ~~  x3   2.945  0.017   0.017    0.050    0.050
## 29  x2 ~~  x4   0.252  0.005   0.005    0.015    0.015
## 30  x2 ~~  x5  16.394  0.053   0.053    0.109    0.109
## 31  x2 ~~  x6  39.084 -0.090  -0.090   -0.171   -0.171
## 32  x2 ~~  x7   0.083  0.005   0.005    0.008    0.008
## 33  x2 ~~  x8   1.676 -0.012  -0.012   -0.035   -0.035
## 34  x2 ~~  x9  11.090 -0.036  -0.036   -0.098   -0.098
## 35  x3 ~~  x4 147.975  0.089   0.089    0.392    0.392
## 36  x3 ~~  x5  18.664 -0.043  -0.043   -0.126   -0.126
## 37  x3 ~~  x6  15.379 -0.044  -0.044   -0.116   -0.116
## 38  x3 ~~  x7  15.291 -0.051  -0.051   -0.115   -0.115
## 39  x3 ~~  x8   1.463  0.008   0.008    0.036    0.036
## 40  x3 ~~  x9  16.558 -0.036  -0.036   -0.133   -0.133
## 41  x4 ~~  x5   0.441  0.006   0.006    0.019    0.019
## 42  x4 ~~  x6  12.898 -0.038  -0.038   -0.106   -0.106
## 43  x4 ~~  x7  22.944 -0.059  -0.059   -0.140   -0.140
## 44  x4 ~~  x8  11.218  0.022   0.022    0.098    0.098
## 45  x4 ~~  x9  30.371 -0.045  -0.045   -0.178   -0.178
## 46  x5 ~~  x6   1.967  0.021   0.021    0.038    0.038
## 47  x5 ~~  x7   3.838 -0.034  -0.034   -0.053   -0.053
## 48  x5 ~~  x8   0.168  0.004   0.004    0.011    0.011
## 49  x5 ~~  x9   0.013  0.001   0.001    0.003    0.003
## 50  x6 ~~  x7  29.683  0.104   0.104    0.150    0.150
## 51  x6 ~~  x8  31.435 -0.057  -0.057   -0.154   -0.154
## 52  x6 ~~  x9  62.385  0.098   0.098    0.235    0.235
## 53  x7 ~~  x8   1.752 -0.016  -0.016   -0.036   -0.036
## 54  x7 ~~  x9  19.055  0.064   0.064    0.129    0.129
## 55  x8 ~~  x9  16.553  0.031   0.031    0.120    0.120

a modification index is an estimate of how much the chi-squared will be reduced if we estimated a particular extra parameter
because items 2 and 8 aren’t great indicators of conservatism we will drop them. We’ll also allow for items 3 and 4 to be correlated

cfa.model <- '
  # latent variable model
  conservative =~ a1 * x1 + a3 * x3 + a4 * x4 + a5 * x5 + a6 * x6 + a7 * x7 + 
    a9 * x9
    
  # covariances
  x3 ~~ x4
'

cfa.fit <- cfa(cfa.model, data = nlsy.data)
summary(cfa.fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

## lavaan 0.6-9 ended normally after 28 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
##                                                       
##                                                   Used       Total
##   Number of observations                          1631        8985
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                56.021
##   Degrees of freedom                                13
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              2870.789
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.985
##   Tucker-Lewis Index (TLI)                       0.976
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -12634.282
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               25298.564
##   Bayesian (BIC)                             25379.519
##   Sample-size adjusted Bayesian (BIC)        25331.866
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.045
##   90 Percent confidence interval - lower         0.033
##   90 Percent confidence interval - upper         0.057
##   P-value RMSEA <= 0.05                          0.729
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.023
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   conservative =~                                                       
##     x1        (a1)    1.000                               0.573    0.560
##     x3        (a3)    0.685    0.040   16.946    0.000    0.393    0.580
##     x4        (a4)    0.618    0.037   16.592    0.000    0.354    0.563
##     x5        (a5)    0.705    0.046   15.279    0.000    0.404    0.496
##     x6        (a6)    1.037    0.057   18.130    0.000    0.594    0.642
##     x7        (a7)    1.078    0.064   16.944    0.000    0.618    0.575
##     x9        (a9)    0.955    0.049   19.311    0.000    0.547    0.727
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .x3 ~~                                                                 
##    .x4                0.110    0.009   12.093    0.000    0.110    0.384
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .x1                0.717    0.029   25.054    0.000    0.717    0.686
##    .x3                0.304    0.012   24.449    0.000    0.304    0.663
##    .x4                0.270    0.011   24.762    0.000    0.270    0.683
##    .x5                0.502    0.019   26.080    0.000    0.502    0.754
##    .x6                0.504    0.022   23.153    0.000    0.504    0.588
##    .x7                0.772    0.031   24.769    0.000    0.772    0.669
##    .x9                0.268    0.013   19.898    0.000    0.268    0.472
##     conservative      0.329    0.030   10.957    0.000    1.000    1.000
## 
## R-Square:
##                    Estimate
##     x1                0.314
##     x3                0.337
##     x4                0.317
##     x5                0.246
##     x6                0.412
##     x7                0.331
##     x9                0.528

standardizedSolution(cfa.fit)

##             lhs op          rhs label est.std    se      z pvalue ci.lower
## 1  conservative =~           x1    a1   0.560 0.021 27.163      0    0.520
## 2  conservative =~           x3    a3   0.580 0.020 28.457      0    0.540
## 3  conservative =~           x4    a4   0.563 0.021 26.987      0    0.522
## 4  conservative =~           x5    a5   0.496 0.022 22.383      0    0.452
## 5  conservative =~           x6    a6   0.642 0.019 34.481      0    0.605
## 6  conservative =~           x7    a7   0.575 0.020 28.375      0    0.536
## 7  conservative =~           x9    a9   0.727 0.017 43.978      0    0.694
## 8            x3 ~~           x4         0.384 0.024 16.223      0    0.338
## 9            x1 ~~           x1         0.686 0.023 29.653      0    0.641
## 10           x3 ~~           x3         0.663 0.024 28.035      0    0.617
## 11           x4 ~~           x4         0.683 0.023 29.089      0    0.637
## 12           x5 ~~           x5         0.754 0.022 34.374      0    0.711
## 13           x6 ~~           x6         0.588 0.024 24.624      0    0.541
## 14           x7 ~~           x7         0.669 0.023 28.684      0    0.623
## 15           x9 ~~           x9         0.472 0.024 19.629      0    0.425
## 16 conservative ~~ conservative         1.000 0.000     NA     NA    1.000
##    ci.upper
## 1     0.601
## 2     0.620
## 3     0.604
## 4     0.539
## 5     0.678
## 6     0.615
## 7     0.759
## 8     0.431
## 9     0.731
## 10    0.710
## 11    0.729
## 12    0.797
## 13    0.635
## 14    0.715
## 15    0.519
## 16    1.000

1.7.1 Final model and estimating scale reliability

fix the variance of conservative to 1

cfa.model <- '
  # latent variable model
  conservative =~ x1 + x3 + x4 + x5 + x6 + x7 + x9
    
  # (co)variances
  x3 ~~ x4
  conservative ~~ 1 * conservative
'

cfa.fit <- cfa(cfa.model, data = nlsy.data, std.lv = TRUE)
summary(cfa.fit)

## lavaan 0.6-9 ended normally after 19 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
##                                                       
##                                                   Used       Total
##   Number of observations                          1631        8985
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                56.021
##   Degrees of freedom                                13
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   conservative =~                                     
##     x1                0.573    0.026   21.914    0.000
##     x3                0.393    0.017   22.656    0.000
##     x4                0.354    0.016   21.830    0.000
##     x5                0.404    0.021   19.017    0.000
##     x6                0.594    0.023   25.749    0.000
##     x7                0.618    0.027   22.595    0.000
##     x9                0.547    0.018   29.977    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .x3 ~~                                               
##    .x4                0.110    0.009   12.093    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     conservative      1.000                           
##    .x1                0.717    0.029   25.054    0.000
##    .x3                0.304    0.012   24.449    0.000
##    .x4                0.270    0.011   24.762    0.000
##    .x5                0.502    0.019   26.080    0.000
##    .x6                0.504    0.022   23.153    0.000
##    .x7                0.772    0.031   24.769    0.000
##    .x9                0.268    0.013   19.898    0.000

standardizedSolution(cfa.fit)

##             lhs op          rhs est.std    se      z pvalue ci.lower ci.upper
## 1  conservative =~           x1   0.560 0.021 27.163      0    0.520    0.601
## 2  conservative =~           x3   0.580 0.020 28.457      0    0.540    0.620
## 3  conservative =~           x4   0.563 0.021 26.987      0    0.522    0.604
## 4  conservative =~           x5   0.496 0.022 22.383      0    0.452    0.539
## 5  conservative =~           x6   0.642 0.019 34.481      0    0.605    0.678
## 6  conservative =~           x7   0.575 0.020 28.375      0    0.536    0.615
## 7  conservative =~           x9   0.727 0.017 43.978      0    0.694    0.759
## 8            x3 ~~           x4   0.384 0.024 16.223      0    0.338    0.431
## 9  conservative ~~ conservative   1.000 0.000     NA     NA    1.000    1.000
## 10           x1 ~~           x1   0.686 0.023 29.653      0    0.641    0.731
## 11           x3 ~~           x3   0.663 0.024 28.035      0    0.617    0.710
## 12           x4 ~~           x4   0.683 0.023 29.089      0    0.637    0.729
## 13           x5 ~~           x5   0.754 0.022 34.374      0    0.711    0.797
## 14           x6 ~~           x6   0.588 0.024 24.624      0    0.541    0.635
## 15           x7 ~~           x7   0.669 0.023 28.684      0    0.623    0.715
## 16           x9 ~~           x9   0.472 0.024 19.629      0    0.425    0.519

now plot the final model

library(lavaanPlot)
lavaanPlot(model = cfa.fit, coefs = TRUE, stand = TRUE)

semPaths(cfa.fit)

1.8 A two-factor model

now we add an additional indicator Depress which is measured by three indicators

1.8.1 Evaluating the depression dimension

dep.model <- '
  Depress =~ x11 + x12 + x13
'

dep.fit <- cfa(dep.model, data = nlsy.data)
standardizedSolution(dep.fit)

##       lhs op     rhs est.std    se       z pvalue ci.lower ci.upper
## 1 Depress =~     x11   0.813 0.010  79.821      0    0.793    0.833
## 2 Depress =~     x12  -0.609 0.010 -59.602      0   -0.629   -0.589
## 3 Depress =~     x13   0.655 0.010  64.725      0    0.635    0.675
## 4     x11 ~~     x11   0.339 0.017  20.458      0    0.306    0.371
## 5     x12 ~~     x12   0.629 0.012  50.592      0    0.605    0.654
## 6     x13 ~~     x13   0.571 0.013  43.112      0    0.545    0.597
## 7 Depress ~~ Depress   1.000 0.000      NA     NA    1.000    1.000

item x12 would need to be reverse coded if we were using a traditional summated scale
it is useful to have the majority of the indicators coded so that a higher score is associated with more of the concept

1.8.2 Estimating a two-factor model

two.fac.model = '
  # latent variable models
  Conservative =~ x1 + x3 + x4 + x5 + x6 + x7 + x9
  Depress =~ x11 + x12 + x13
  
  # (co)variances
  x3 ~~ x4
'

two.fac.fit <- cfa(two.fac.model, data = nlsy.data)
standardizedSolution(two.fac.fit)

##             lhs op          rhs est.std    se       z pvalue ci.lower ci.upper
## 1  Conservative =~           x1   0.555 0.022  25.456  0.000    0.512    0.598
## 2  Conservative =~           x3   0.578 0.021  26.991  0.000    0.536    0.620
## 3  Conservative =~           x4   0.565 0.022  25.914  0.000    0.522    0.608
## 4  Conservative =~           x5   0.482 0.024  20.472  0.000    0.436    0.529
## 5  Conservative =~           x6   0.663 0.019  35.068  0.000    0.626    0.700
## 6  Conservative =~           x7   0.581 0.021  27.491  0.000    0.539    0.622
## 7  Conservative =~           x9   0.730 0.017  42.531  0.000    0.697    0.764
## 8       Depress =~          x11   0.811 0.022  36.108  0.000    0.767    0.855
## 9       Depress =~          x12  -0.615 0.023 -27.212  0.000   -0.659   -0.570
## 10      Depress =~          x13   0.647 0.022  28.880  0.000    0.603    0.691
## 11           x3 ~~           x4   0.380 0.025  15.176  0.000    0.331    0.429
## 12           x1 ~~           x1   0.692 0.024  28.613  0.000    0.645    0.740
## 13           x3 ~~           x3   0.665 0.025  26.836  0.000    0.617    0.714
## 14           x4 ~~           x4   0.681 0.025  27.600  0.000    0.632    0.729
## 15           x5 ~~           x5   0.767 0.023  33.756  0.000    0.723    0.812
## 16           x6 ~~           x6   0.560 0.025  22.317  0.000    0.511    0.609
## 17           x7 ~~           x7   0.663 0.025  27.007  0.000    0.615    0.711
## 18           x9 ~~           x9   0.467 0.025  18.605  0.000    0.417    0.516
## 19          x11 ~~          x11   0.342 0.036   9.389  0.000    0.271    0.414
## 20          x12 ~~          x12   0.622 0.028  22.420  0.000    0.568    0.677
## 21          x13 ~~          x13   0.581 0.029  20.028  0.000    0.524    0.638
## 22 Conservative ~~ Conservative   1.000 0.000      NA     NA    1.000    1.000
## 23      Depress ~~      Depress   1.000 0.000      NA     NA    1.000    1.000
## 24 Conservative ~~      Depress   0.115 0.033   3.469  0.001    0.050    0.180

broom::tidy(cfa.fit, conf.int = TRUE)

## # A tibble: 16 × 11
##    term     op    estimate std.error statistic p.value conf.low conf.high std.lv
##    <chr>    <chr>    <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>  <dbl>
##  1 conserv… =~       0.573   0.0262       21.9       0   0.522      0.624  0.573
##  2 conserv… =~       0.393   0.0173       22.7       0   0.359      0.427  0.393
##  3 conserv… =~       0.354   0.0162       21.8       0   0.322      0.386  0.354
##  4 conserv… =~       0.404   0.0213       19.0       0   0.363      0.446  0.404
##  5 conserv… =~       0.594   0.0231       25.7       0   0.549      0.640  0.594
##  6 conserv… =~       0.618   0.0273       22.6       0   0.564      0.671  0.618
##  7 conserv… =~       0.547   0.0183       30.0       0   0.512      0.583  0.547
##  8 x3 ~~ x4 ~~       0.110   0.00912      12.1       0   0.0924     0.128  0.110
##  9 conserv… ~~       1       0            NA        NA   1          1      1    
## 10 x1 ~~ x1 ~~       0.717   0.0286       25.1       0   0.661      0.773  0.717
## 11 x3 ~~ x3 ~~       0.304   0.0124       24.4       0   0.280      0.329  0.304
## 12 x4 ~~ x4 ~~       0.270   0.0109       24.8       0   0.249      0.292  0.270
## 13 x5 ~~ x5 ~~       0.502   0.0192       26.1       0   0.464      0.540  0.502
## 14 x6 ~~ x6 ~~       0.504   0.0218       23.2       0   0.462      0.547  0.504
## 15 x7 ~~ x7 ~~       0.772   0.0312       24.8       0   0.711      0.833  0.772
## 16 x9 ~~ x9 ~~       0.268   0.0134       19.9       0   0.241      0.294  0.268
## # … with 2 more variables: std.all <dbl>, std.nox <dbl>

semPaths(two.fac.fit)

modificationIndices(two.fac.fit)

##             lhs op rhs     mi    epc sepc.lv sepc.all sepc.nox
## 25 Conservative =~ x11  2.254 -0.048  -0.027   -0.041   -0.041
## 26 Conservative =~ x12  6.017  0.073   0.041    0.064    0.064
## 27 Conservative =~ x13 16.582  0.107   0.061    0.105    0.105
## 28      Depress =~  x1  0.011 -0.005  -0.003   -0.003   -0.003
## 29      Depress =~  x3  2.077  0.043   0.023    0.034    0.034
## 30      Depress =~  x4  0.000  0.000   0.000    0.000    0.000
## 31      Depress =~  x5  3.603 -0.079  -0.043   -0.052   -0.052
## 32      Depress =~  x6  5.472 -0.101  -0.055   -0.059   -0.059
## 33      Depress =~  x7  4.911  0.117   0.063    0.058    0.058
## 34      Depress =~  x9  0.185  0.015   0.008    0.011    0.011
## 35           x1 ~~  x3  5.651  0.031   0.031    0.065    0.065
## 36           x1 ~~  x4  1.121 -0.013  -0.013   -0.029   -0.029
## 37           x1 ~~  x5  2.792  0.030   0.030    0.049    0.049
## 38           x1 ~~  x6  0.975  0.019   0.019    0.033    0.033
## 39           x1 ~~  x7  2.114  0.034   0.034    0.045    0.045
## 40           x1 ~~  x9 18.483 -0.069  -0.069   -0.157   -0.157
## 41           x1 ~~ x11  1.026  0.012   0.012    0.038    0.038
## 42           x1 ~~ x12  0.018  0.002   0.002    0.004    0.004
## 43           x1 ~~ x13  1.775 -0.015  -0.015   -0.040   -0.040
## 44           x3 ~~  x5  1.963 -0.015  -0.015   -0.037   -0.037
## 45           x3 ~~  x6  4.246 -0.023  -0.023   -0.061   -0.061
## 46           x3 ~~  x7  0.263 -0.007  -0.007   -0.014   -0.014
## 47           x3 ~~  x9  1.484  0.011   0.011    0.039    0.039
## 48           x3 ~~ x11  6.148  0.018   0.018    0.083    0.083
## 49           x3 ~~ x12  2.091  0.011   0.011    0.039    0.039
## 50           x3 ~~ x13  0.099 -0.002  -0.002   -0.009   -0.009
## 51           x4 ~~  x5 26.360  0.050   0.050    0.135    0.135
## 52           x4 ~~  x6  1.634 -0.013  -0.013   -0.037   -0.037
## 53           x4 ~~  x7  1.751 -0.017  -0.017   -0.036   -0.036
## 54           x4 ~~  x9  0.059 -0.002  -0.002   -0.008   -0.008
## 55           x4 ~~ x11  0.717 -0.006  -0.006   -0.028   -0.028
## 56           x4 ~~ x12  1.891  0.010   0.010    0.037    0.037
## 57           x4 ~~ x13  5.704  0.015   0.015    0.065    0.065
## 58           x5 ~~  x6  1.649 -0.020  -0.020   -0.041   -0.041
## 59           x5 ~~  x7  7.188 -0.050  -0.050   -0.080   -0.080
## 60           x5 ~~  x9  0.772 -0.011  -0.011   -0.030   -0.030
## 61           x5 ~~ x11  2.954 -0.017  -0.017   -0.063   -0.063
## 62           x5 ~~ x12  1.324  0.012   0.012    0.034    0.034
## 63           x5 ~~ x13  1.475  0.011   0.011    0.036    0.036
## 64           x6 ~~  x7  1.503  0.025   0.025    0.041    0.041
## 65           x6 ~~  x9  6.477  0.038   0.038    0.108    0.108
## 66           x6 ~~ x11  4.092 -0.021  -0.021   -0.078   -0.078
## 67           x6 ~~ x12  1.071  0.011   0.011    0.032    0.032
## 68           x6 ~~ x13  1.097  0.010   0.010    0.033    0.033
## 69           x7 ~~  x9  0.720  0.014   0.014    0.032    0.032
## 70           x7 ~~ x11  1.050  0.013   0.013    0.038    0.038
## 71           x7 ~~ x12  0.197  0.006   0.006    0.013    0.013
## 72           x7 ~~ x13  2.356  0.018   0.018    0.047    0.047
## 73           x9 ~~ x11  1.600 -0.010  -0.010   -0.052   -0.052
## 74           x9 ~~ x12  3.648 -0.016  -0.016   -0.062   -0.062
## 75           x9 ~~ x13  0.124  0.003   0.003    0.012    0.012
## 76          x11 ~~ x12 16.582 -0.299  -0.299   -1.497   -1.497
## 77          x11 ~~ x13  6.017 -0.181  -0.181   -1.046   -1.046
## 78          x12 ~~ x13  2.254  0.065   0.065    0.287    0.287