10.3 Preconditions of our model

As we want to deploy latent factor analysis using structural equation modeling, we need at least four variables, i.e.,indicators for our latent variable. In most cases these may reflect question items from survey research. Here we often want to assess certain theoretical constructs and running a questionnaire with a huge number of people, while not cheap, is widely used in research. Here, variance of the answers to certain questions asked of respondents is reflected in a measure, coding answers that we received on a Likert-Scale. Usually we use 5- to 7-point Likert scales ( 1 - “I completely disagree.” to 7 - “I completely agree.”) and treat answers registered and coded as numbers as quasi-metric.

Let’s say, we are interested in power at different levels of factorial validity of the indicators that belong to the same construct. For the sake of simplicity, we choose the correlations between the indicators to be .25. .5, .8, and .9. As our data that we randomly generate will be almost perfectly normally-distributed and correlated, this means that the factor loadings of the indicators will be \(\sqrt{x_i}\), i. e., .5 in R1, ≈ .7 in R2, ≈ .89 in R 3, and ≈ .95 in R4.13

Using standardized factor loadings, we usually assess the quality of an indicator, as we expect a good indicator to covary well with other indicators that are supposed to measure the same construct.

In the code below, we set up the correlation matrices.

R1 <- matrix(c(1, 0.25, 0.25, 0.25,
              0.25, 1, 0.25, 0.25,
              0.25 ,0.25, 1, 0.25,
              0.25, 0.25, 0.25, 1), 
            nrow = 4, ncol = 4)

R2 <- matrix(c(1, 0.5, 0.5, 0.5,
              0.5, 1, 0.5, 0.5,
              0.5 ,0.5, 1, 0.5,
              0.5, 0.5, 0.5, 1), 
            nrow = 4, ncol = 4)

R3 <- matrix(c(1, 0.8, 0.8, 0.8,
              0.8, 1, 0.8, 0.8,
              0.8 ,0.8, 1, 0.8,
              0.8, 0.8, 0.8, 1), 
            nrow = 4, ncol = 4)

R4 <- matrix(c(1, 0.99, 0.99, 0.99,
              0.99, 1, 0.99, 0.99,
              0.99 ,0.99, 1, 0.99,
              0.99, 0.99, 0.99, 1), 
            nrow = 4, ncol = 4)

Let us assume that we are interested in mean differences between two latent factors where the means of the indicators differ by 0.2. For simplicity, we assume means for each normally-distributed indicator in group 1 are zero, i.e., \(Mean_{x_i} = 0\) with \(SD_{x_i} = 1\) and in group 2 \(Mean_{x_i} = 0.2\) with \(SD_{x_i} = 1\).

Using the function rmvnorm from the R-package mvtnorm we generate two data sets for the two groups that we observe. We tell rmvnorm to generate a data set with 10000 cases, set the mean of each of the four variables to either 0 or 0.2 and use the covariance matrix, i.e., sigma = R that we just set up.

set.seed(1895)
test_norm_25_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R1))
test_norm_25_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R1))
test_norm_50_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R2))
test_norm_50_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R2))
test_norm_80_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R3))
test_norm_80_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R3))
test_norm_99_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R4))
test_norm_99_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R4))

Let’s see if everything worked out so far using the example of the first variable of the data set with a inter-item-correlation of 0.25.

We request the mean and standard deviation for both groups.

mean(test_norm_25_1$V1)
## [1] 0.0072
sd(test_norm_25_1$V1)
## [1] 0.98
mean(test_norm_25_2$V1)
## [1] 0.2
sd(test_norm_25_2$V1)
## [1] 1

Now, we assign a value for a group variable, so that from here on, we can distinguish the respective groups.

test_norm_25_1$group <- 1
test_norm_25_2$group <- 2
test_norm_50_1$group <- 1
test_norm_50_2$group <- 2
test_norm_80_1$group <- 1
test_norm_80_2$group <- 2
test_norm_99_1$group <- 1
test_norm_99_2$group <- 2

We now specify and estimate a simple factor model where the four variables load on one factor. See the tutorial for the R package lavaan for further information on model specification and all general questions concerning confirmatory factor analysis.

Please note that I use the command effect.coding to scale the variance of the latent factor (Little, Slegers, and Card 2006). When left out, lavaan will use the first indicator variable as the reference indicator and will fix its (non-standardized) loading to 1. However, then the mean of our latent variable will also be fixed to zero in the two groups and we do not want this, especially with regard to group 2 and our general interest in mean differences of latent variables. Effect-coding estimates a solution where the unstandardized factor loadings average to 1 and the intercepts of the manifest variables will average to 0.14

Furthermore, we want to include a mean structure using the command meanstructure = T, i.e., estimate means and intercepts for the factor and the variables, respectively.

#Specify factor model
factor_model <- 'factor =~ V1+V2+V3+V4
'
#Esimate model
fit_model_cor25_group1 <- cfa(factor_model, test_norm_25_1, meanstructure=T, effect.coding = T)
fit_model_cor25_group2 <- cfa(factor_model, test_norm_25_2, meanstructure=T, effect.coding = T)
fit_model_cor50_group1 <- cfa(factor_model, test_norm_50_1, meanstructure=T, effect.coding = T)
fit_model_cor50_group2 <- cfa(factor_model, test_norm_50_2, meanstructure=T, effect.coding = T)
fit_model_cor80_group1 <- cfa(factor_model, test_norm_80_1, meanstructure=T, effect.coding = T)
fit_model_cor80_group2 <- cfa(factor_model, test_norm_80_2, meanstructure=T, effect.coding = T)
fit_model_cor99_group1 <- cfa(factor_model, test_norm_99_1, meanstructure=T, effect.coding = T)
fit_model_cor99_group2 <- cfa(factor_model, test_norm_99_2, meanstructure=T, effect.coding = T)

We take a brief look at the results of the our estimated model in the two groups with low indicator correlations (\(r = .25\)).

lavaan::summary(fit_model_cor25_group1, fit.measures = T, standardized = T)
## lavaan 0.6-7 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         14
##   Number of equality constraints                     2
##                                                       
##   Number of observations                         10000
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 1.332
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.514
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3082.349
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.001
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -54958.922
##   Loglikelihood unrestricted model (H1)     -54958.256
##                                                       
##   Akaike (AIC)                              109941.844
##   Bayesian (BIC)                            110028.368
##   Sample-size adjusted Bayesian (BIC)       109990.234
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.018
##   P-value RMSEA <= 0.05                          1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.002
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   factor =~                                                             
##     V1                0.973    0.023   41.865    0.000    0.485    0.493
##     V2                1.011    0.024   42.785    0.000    0.504    0.505
##     V3                1.041    0.024   43.630    0.000    0.519    0.521
##     V4                0.975    0.023   41.750    0.000    0.486    0.489
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1               -0.001    0.007   -0.075    0.940   -0.001   -0.001
##    .V2                0.001    0.007    0.185    0.853    0.001    0.001
##    .V3               -0.003    0.007   -0.376    0.707   -0.003   -0.003
##    .V4                0.002    0.007    0.261    0.794    0.002    0.002
##     factor            0.008    0.007    1.207    0.227    0.016    0.016
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1                0.734    0.014   53.151    0.000    0.734    0.757
##    .V2                0.744    0.014   51.955    0.000    0.744    0.745
##    .V3                0.722    0.014   50.211    0.000    0.722    0.728
##    .V4                0.753    0.014   53.534    0.000    0.753    0.761
##     factor            0.249    0.006   39.418    0.000    1.000    1.000
lavaan::summary(fit_model_cor25_group2, fit.measures = T, standardized = T)
## lavaan 0.6-7 ended normally after 26 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         14
##   Number of equality constraints                     2
##                                                       
##   Number of observations                         10000
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 2.004
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.367
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3234.326
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -55319.892
##   Loglikelihood unrestricted model (H1)     -55318.889
##                                                       
##   Akaike (AIC)                              110663.783
##   Bayesian (BIC)                            110750.307
##   Sample-size adjusted Bayesian (BIC)       110712.173
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.020
##   P-value RMSEA <= 0.05                          1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.003
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   factor =~                                                             
##     V1                1.011    0.023   44.037    0.000    0.516    0.514
##     V2                1.012    0.023   44.009    0.000    0.517    0.513
##     V3                1.004    0.023   43.858    0.000    0.513    0.511
##     V4                0.973    0.023   42.879    0.000    0.497    0.496
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1                0.003    0.009    0.381    0.703    0.003    0.003
##    .V2               -0.005    0.009   -0.583    0.560   -0.005   -0.005
##    .V3               -0.007    0.009   -0.848    0.396   -0.007   -0.007
##    .V4                0.009    0.009    1.048    0.294    0.009    0.009
##     factor            0.198    0.007   29.658    0.000    0.389    0.389
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1                0.742    0.014   51.661    0.000    0.742    0.736
##    .V2                0.749    0.014   51.798    0.000    0.749    0.737
##    .V3                0.743    0.014   51.935    0.000    0.743    0.739
##    .V4                0.759    0.014   53.486    0.000    0.759    0.754
##     factor            0.261    0.007   40.035    0.000    1.000    1.000

Everything looks fine so far.

For the sake of brevity from here on, I will not display the full summary of the estimated parameters. We can also use lavaan::inspect to extract the necessary information.

The factor loadings in the model for each group with inter-item-correlations of .25 are .5 as expected. That is, the squared standardized loading is the explained variance of the indicator variable by the latent factor.

lavaan::inspect(fit_model_cor25_group1, 'std.all')$lambda
##    factor
## V1   0.49
## V2   0.50
## V3   0.52
## V4   0.49
lavaan::inspect(fit_model_cor25_group2, 'std.all')$lambda
##    factor
## V1   0.51
## V2   0.51
## V3   0.51
## V4   0.50

The means of the latent factor are close to 0 in the first groups and close to 0.2 in the second group, respectively.

lavaan::inspect(fit_model_cor25_group1, 'mean.lv')
## factor 
##  0.008
lavaan::inspect(fit_model_cor25_group2, 'mean.lv')
## factor 
##    0.2

We now merge the two data sets for each condition into one data frame.

data_frame_cor25 <- as.data.frame(rbind(test_norm_25_1, test_norm_25_2))
data_frame_cor50 <- as.data.frame(rbind(test_norm_50_1, test_norm_50_2))
data_frame_cor80 <- as.data.frame(rbind(test_norm_80_1, test_norm_80_2))
data_frame_cor99 <- as.data.frame(rbind(test_norm_99_1, test_norm_99_2))

A model is specified and estimated in which we have our basic factor model.

To test the difference between two parameter estimates, i.e., the means of the latent factor, we require a new parameter that estimates said difference.

model_means <- 'factor =~ V1+V2+V3+V4

#label the means of the latent factor
factor ~ c(mean_1, mean_2)*1

#the := operator defines new parameters
#here, the difference between the two means of the latent factor
diff_mean := mean_2 - mean_1
'

We now estimate the model. Please note that in the code block below, we use equality constraints. As our data is almost perfectly normally-distributed and variables equally correlated in both groups, we would not necessarily need those constraints as parameters estimates will be similar without them, and thus our models are comparable at this point already. For real-world research data, you would need to ensure that there actually is measurement invariance in your groups.15 Aiming for strong measurement invariance at least a) the loadings and b) intercepts would be constrained to equality.

We thus also have equality constraints in the four models that are estimated now.

fit_model_cor25 <- cfa(model_means, #this is the specified model
                       data_frame_cor25, #this is the data frame
                       meanstructure=T, #estimates a mean structure
                       effect.coding = T, #estimates the model using effect coding
                       group = "group", #estimates the model for the two groups
                       group.equal = c("loadings", "intercepts")) #equality constraints for loadings and intercepts

fit_model_cor50 <- cfa(model_means, data_frame_cor50, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
fit_model_cor80 <- cfa(model_means, data_frame_cor80, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
fit_model_cor99 <- cfa(model_means, data_frame_cor99, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))

We now take a closer look at the results, starting again with the condition of low inter-item-correlations (R1: \(r = .25\)).

#group1
lavaan::inspect(fit_model_cor25, 'mean.lv')$`1`
## factor 
##  0.008
#group 2
lavaan::inspect(fit_model_cor25, "mean.lv")$"2"
## factor 
##    0.2

The means of the latent factor are roughly 0 in the first groups and 0.2 in the second group, respectively. The same goes for all other data sets as can be seen in the table below.

Data set Mean Group 1 Mean Group 2
R1 - Cor25 0.01 0.2
R2- Cor50 0 0.2
R3 - Cor80 -0.03 0.19
R4 - Cor99 0.02 0.2

  1. Be aware: By using perfect correlations (i.e., a correlation of +1.0 or -1.0) the variance-covariance matrix will not be positive-definite.↩︎

  2. It is also possible to scale the factor variance by using std.lv = TRUE.↩︎

  3. See the chapter on measurement invariance in my coursebook in German or consult the literature (Brown 2015).↩︎