10.3 Preconditions of our model
As we want to deploy latent factor analysis using structural equation modeling, we need at least four variables, i.e., indicators for our latent variable. In most cases these will be question items from survey research: to assess theoretical constructs, we run a questionnaire with a large number of respondents, which, while not cheap, is widely used in research. The variance in respondents' answers to these questions is captured by coding the answers on a Likert scale. Usually we use 5- to 7-point Likert scales (1 - “I completely disagree.” to 7 - “I completely agree.”) and treat the answers, registered and coded as numbers, as quasi-metric.
Let’s say we are interested in power at different levels of factorial validity of the indicators that belong to the same construct.
For the sake of simplicity, we choose the correlations between the indicators to be .25, .5, .8, and .99. As the data we randomly generate will be almost perfectly normally distributed and correlated, the standardized factor loadings of the indicators will be \(\sqrt{r}\), where \(r\) is the inter-item correlation, i.e., .5 in R1, ≈ .71 in R2, ≈ .89 in R3, and ≈ .995 in R4.
Using standardized factor loadings, we usually assess the quality of an indicator, as we expect a good indicator to covary well with other indicators that are supposed to measure the same construct.
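To make the relationship between inter-item correlation and loading concrete, the expected standardized loadings can be computed directly from the chosen correlations; a minimal check, not part of the simulation itself:

#expected standardized loadings sqrt(r) for r = .25, .5, .8, .99
sqrt(c(0.25, 0.5, 0.8, 0.99)) #returns approximately 0.50, 0.71, 0.89, 0.995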
In the code below, we set up the correlation matrices.
R1 <- matrix(c(1, 0.25, 0.25, 0.25,
               0.25, 1, 0.25, 0.25,
               0.25, 0.25, 1, 0.25,
               0.25, 0.25, 0.25, 1),
             nrow = 4, ncol = 4)

R2 <- matrix(c(1, 0.5, 0.5, 0.5,
               0.5, 1, 0.5, 0.5,
               0.5, 0.5, 1, 0.5,
               0.5, 0.5, 0.5, 1),
             nrow = 4, ncol = 4)

R3 <- matrix(c(1, 0.8, 0.8, 0.8,
               0.8, 1, 0.8, 0.8,
               0.8, 0.8, 1, 0.8,
               0.8, 0.8, 0.8, 1),
             nrow = 4, ncol = 4)

R4 <- matrix(c(1, 0.99, 0.99, 0.99,
               0.99, 1, 0.99, 0.99,
               0.99, 0.99, 1, 0.99,
               0.99, 0.99, 0.99, 1),
             nrow = 4, ncol = 4)
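As a quick optional check (not part of the original setup), we can verify that these are valid correlation matrices, i.e., symmetric with ones on the diagonal and all eigenvalues positive; shown here for R1:

#check symmetry and positive definiteness of R1
isSymmetric(R1)
eigen(R1)$values #should all be positive; for R1 they are 1.75, 0.75, 0.75, 0.75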
Let us assume that we are interested in mean differences between two latent factors where the means of the indicators differ by 0.2. For simplicity, we assume the means of the normally distributed indicators in group 1 are zero, i.e., \(Mean_{x_i} = 0\) with \(SD_{x_i} = 1\), and in group 2 \(Mean_{x_i} = 0.2\) with \(SD_{x_i} = 1\).
Using the function rmvnorm from the R package mvtnorm, we generate two data sets for the two groups that we observe. We tell rmvnorm to generate a data set with 10000 cases, set the mean of each of the four variables to either 0 or 0.2, and pass one of the correlation matrices R1 to R4 that we just set up as the covariance matrix via the sigma argument.
set.seed(1895)
test_norm_25_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R1))
test_norm_25_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R1))
test_norm_50_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R2))
test_norm_50_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R2))
test_norm_80_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R3))
test_norm_80_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R3))
test_norm_99_1 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0,0,0,0), sigma = R4))
test_norm_99_2 <- as.data.frame(mvtnorm::rmvnorm(10000, mean = c(0.2,0.2,0.2,0.2), sigma = R4))
Let’s see if everything worked out so far, using the example of the first variable of the data set with an inter-item correlation of 0.25. We request the mean and standard deviation for both groups.
mean(test_norm_25_1$V1)
## [1] 0.0072
sd(test_norm_25_1$V1)
## [1] 0.98
mean(test_norm_25_2$V1)
## [1] 0.2
sd(test_norm_25_2$V1)
## [1] 1
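Beyond the means and standard deviations, we can also check that the empirical inter-item correlations are close to the values we specified; a brief check for the first data set (the same works for the others):

#empirical correlations of the four indicators; should be close to .25 (i.e., R1)
round(cor(test_norm_25_1[, c("V1", "V2", "V3", "V4")]), 2)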
Now, we assign a value for a group variable, so that from here on, we can distinguish the respective groups.
test_norm_25_1$group <- 1
test_norm_25_2$group <- 2
test_norm_50_1$group <- 1
test_norm_50_2$group <- 2
test_norm_80_1$group <- 1
test_norm_80_2$group <- 2
test_norm_99_1$group <- 1
test_norm_99_2$group <- 2
We now specify and estimate a simple factor model in which the four variables load on one factor. See the tutorial for the R package lavaan for further information on model specification and all general questions concerning confirmatory factor analysis. Please note that I use the argument effect.coding to scale the variance of the latent factor (Little, Slegers, and Card 2006). When it is left out, lavaan will use the first indicator variable as the reference indicator and fix its (non-standardized) loading to 1. However, the mean of our latent variable would then also be fixed to zero in the two groups, and we do not want this, especially with regard to group 2 and our general interest in mean differences of latent variables. Effect coding estimates a solution in which the unstandardized factor loadings average to 1 and the intercepts of the manifest variables average to 0. Furthermore, we want to include a mean structure using the argument meanstructure = T, i.e., estimate means and intercepts for the factor and the variables, respectively.
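For readers who want to see what effect.coding = T does under the hood, the same scaling can be written out by hand in the lavaan syntax. The following is only a sketch of that idea; the labels l1-l4 and i1-i4 and the object names are hypothetical and are not used in the remainder of this section:

#manual effect coding (sketch): free all loadings and intercepts, then
#constrain the loadings to average to 1 and the intercepts to average to 0
model_effect_coded <- 'factor =~ NA*V1 + l1*V1 + l2*V2 + l3*V3 + l4*V4
                       V1 ~ i1*1
                       V2 ~ i2*1
                       V3 ~ i3*1
                       V4 ~ i4*1
                       factor ~ NA*1
                       l1 == 4 - l2 - l3 - l4
                       i1 == 0 - i2 - i3 - i4
'
fit_manual <- cfa(model_effect_coded, test_norm_25_1, meanstructure = T)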
#Specify factor model
factor_model <- 'factor =~ V1+V2+V3+V4'

#Estimate model
fit_model_cor25_group1 <- cfa(factor_model, test_norm_25_1, meanstructure=T, effect.coding = T)
fit_model_cor25_group2 <- cfa(factor_model, test_norm_25_2, meanstructure=T, effect.coding = T)
fit_model_cor50_group1 <- cfa(factor_model, test_norm_50_1, meanstructure=T, effect.coding = T)
fit_model_cor50_group2 <- cfa(factor_model, test_norm_50_2, meanstructure=T, effect.coding = T)
fit_model_cor80_group1 <- cfa(factor_model, test_norm_80_1, meanstructure=T, effect.coding = T)
fit_model_cor80_group2 <- cfa(factor_model, test_norm_80_2, meanstructure=T, effect.coding = T)
fit_model_cor99_group1 <- cfa(factor_model, test_norm_99_1, meanstructure=T, effect.coding = T)
fit_model_cor99_group2 <- cfa(factor_model, test_norm_99_2, meanstructure=T, effect.coding = T)
We take a brief look at the results of our estimated model in the two groups with low indicator correlations (\(r = .25\)).
lavaan::summary(fit_model_cor25_group1, fit.measures = T, standardized = T)
## lavaan 0.6-12 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 14
## Number of equality constraints 2
##
## Number of observations 10000
##
## Model Test User Model:
##
## Test statistic 1.332
## Degrees of freedom 2
## P-value (Chi-square) 0.514
##
## Model Test Baseline Model:
##
## Test statistic 3082.349
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.001
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -54958.922
## Loglikelihood unrestricted model (H1) -54958.256
##
## Akaike (AIC) 109941.844
## Bayesian (BIC) 110028.368
## Sample-size adjusted Bayesian (BIC) 109990.234
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.018
## P-value RMSEA <= 0.05 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.002
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## factor =~
## V1 0.973 0.023 41.865 0.000 0.485 0.493
## V2 1.011 0.024 42.785 0.000 0.504 0.505
## V3 1.041 0.024 43.630 0.000 0.519 0.521
## V4 0.975 0.023 41.750 0.000 0.486 0.489
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .V1 -0.001 0.007 -0.075 0.940 -0.001 -0.001
## .V2 0.001 0.007 0.185 0.853 0.001 0.001
## .V3 -0.003 0.007 -0.376 0.707 -0.003 -0.003
## .V4 0.002 0.007 0.261 0.794 0.002 0.002
## factor 0.008 0.007 1.207 0.227 0.016 0.016
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .V1 0.734 0.014 53.151 0.000 0.734 0.757
## .V2 0.744 0.014 51.955 0.000 0.744 0.745
## .V3 0.722 0.014 50.211 0.000 0.722 0.728
## .V4 0.753 0.014 53.534 0.000 0.753 0.761
## factor 0.249 0.006 39.418 0.000 1.000 1.000
lavaan::summary(fit_model_cor25_group2, fit.measures = T, standardized = T)
## lavaan 0.6-12 ended normally after 26 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 14
## Number of equality constraints 2
##
## Number of observations 10000
##
## Model Test User Model:
##
## Test statistic 2.004
## Degrees of freedom 2
## P-value (Chi-square) 0.367
##
## Model Test Baseline Model:
##
## Test statistic 3234.326
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -55319.892
## Loglikelihood unrestricted model (H1) -55318.889
##
## Akaike (AIC) 110663.783
## Bayesian (BIC) 110750.307
## Sample-size adjusted Bayesian (BIC) 110712.173
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.020
## P-value RMSEA <= 0.05 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.003
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## factor =~
## V1 1.011 0.023 44.037 0.000 0.516 0.514
## V2 1.012 0.023 44.009 0.000 0.517 0.513
## V3 1.004 0.023 43.858 0.000 0.513 0.511
## V4 0.973 0.023 42.879 0.000 0.497 0.496
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .V1 0.003 0.009 0.381 0.703 0.003 0.003
## .V2 -0.005 0.009 -0.583 0.560 -0.005 -0.005
## .V3 -0.007 0.009 -0.848 0.396 -0.007 -0.007
## .V4 0.009 0.009 1.048 0.294 0.009 0.009
## factor 0.198 0.007 29.658 0.000 0.389 0.389
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .V1 0.742 0.014 51.661 0.000 0.742 0.736
## .V2 0.749 0.014 51.798 0.000 0.749 0.737
## .V3 0.743 0.014 51.935 0.000 0.743 0.739
## .V4 0.759 0.014 53.486 0.000 0.759 0.754
## factor 0.261 0.007 40.035 0.000 1.000 1.000
Everything looks fine so far.
For the sake of brevity, from here on I will not display the full summary of the estimated parameters. We can also use lavaan::inspect to extract the necessary information.
The standardized factor loadings in the model for each group with inter-item correlations of .25 are close to .5, as expected. The squared standardized loading gives the proportion of variance in the indicator variable that is explained by the latent factor (see the brief check after the output below).
lavaan::inspect(fit_model_cor25_group1, 'std.all')$lambda
## factor
## V1 0.49
## V2 0.50
## V3 0.52
## V4 0.49
lavaan::inspect(fit_model_cor25_group2, 'std.all')$lambda
## factor
## V1 0.51
## V2 0.51
## V3 0.51
## V4 0.50
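As noted above, squaring these standardized loadings gives the explained variance of each indicator. We could also ask lavaan for the R-squared values directly; a brief check (the values should be close to .25):

#explained variance (R-squared) of each indicator in group 1
lavaan::inspect(fit_model_cor25_group1, 'rsquare')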
The means of the latent factor are close to 0 in the first group and close to 0.2 in the second group, respectively.
lavaan::inspect(fit_model_cor25_group1, 'mean.lv')
## factor
## 0.008
lavaan::inspect(fit_model_cor25_group2, 'mean.lv')
## factor
## 0.2
We now merge the two data sets for each condition into one data frame.
data_frame_cor25 <- as.data.frame(rbind(test_norm_25_1, test_norm_25_2))
data_frame_cor50 <- as.data.frame(rbind(test_norm_50_1, test_norm_50_2))
data_frame_cor80 <- as.data.frame(rbind(test_norm_80_1, test_norm_80_2))
data_frame_cor99 <- as.data.frame(rbind(test_norm_99_1, test_norm_99_2))
We now specify and estimate a model that contains our basic factor model. To test the difference between two parameter estimates, i.e., the means of the latent factor in the two groups, we define a new parameter that estimates this difference.
model_means <- 'factor =~ V1+V2+V3+V4

#label the means of the latent factor
factor ~ c(mean_1, mean_2)*1

#the := operator defines new parameters
#here, the difference between the two means of the latent factor
diff_mean := mean_2 - mean_1
'
We now estimate the model. Please note that in the code block below, we use equality constraints. As our data are almost perfectly normally distributed and the variables are equally correlated in both groups, we would not strictly need those constraints: the parameter estimates would be similar without them, and our models would already be comparable at this point. For real-world research data, you would need to ensure that there actually is measurement invariance across your groups (see the sketch after the estimation code below). To claim strong measurement invariance, at least (a) the loadings and (b) the intercepts would be constrained to equality. We thus also include these equality constraints in the four models that are estimated now.
fit_model_cor25 <- cfa(model_means,        #this is the specified model
                       data_frame_cor25,   #this is the data frame
                       meanstructure = T,  #estimates a mean structure
                       effect.coding = T,  #estimates the model using effect coding
                       group = "group",    #estimates the model for the two groups
                       group.equal = c("loadings", "intercepts")) #equality constraints for loadings and intercepts

fit_model_cor50 <- cfa(model_means, data_frame_cor50, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
fit_model_cor80 <- cfa(model_means, data_frame_cor80, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
fit_model_cor99 <- cfa(model_means, data_frame_cor99, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
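As mentioned above, with real-world data the invariance assumption itself should be checked rather than taken for granted. One common way to do this is to compare increasingly constrained models; the following is only a sketch, and the object names fit_configural, fit_metric, and fit_scalar are illustrative:

#configural model: same factor structure in both groups, no equality constraints
fit_configural <- cfa(model_means, data_frame_cor25, meanstructure=T, effect.coding = T, group = "group")
#metric invariance: loadings constrained to equality across groups
fit_metric <- cfa(model_means, data_frame_cor25, meanstructure=T, effect.coding = T, group = "group", group.equal = "loadings")
#scalar (strong) invariance: loadings and intercepts constrained to equality
fit_scalar <- cfa(model_means, data_frame_cor25, meanstructure=T, effect.coding = T, group = "group", group.equal = c("loadings", "intercepts"))
#compare the nested models with likelihood-ratio tests
lavaan::lavTestLRT(fit_configural, fit_metric, fit_scalar)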
We now take a closer look at the results, starting again with the condition of low inter-item correlations (R1: \(r = .25\)).
#group 1
lavaan::inspect(fit_model_cor25, 'mean.lv')$`1`
## factor
## 0.008
#group 2
::inspect(fit_model_cor25, "mean.lv")$"2" lavaan
## factor
## 0.2
The means of the latent factor are roughly 0 in the first group and 0.2 in the second group, respectively. The same holds for all other data sets, as can be seen in the table below.
| Data set   | Mean Group 1 | Mean Group 2 |
|------------|--------------|--------------|
| R1 - Cor25 | 0.01         | 0.2          |
| R2 - Cor50 | 0            | 0.2          |
| R3 - Cor80 | -0.03        | 0.19         |
| R4 - Cor99 | 0.02         | 0.2          |
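Since model_means also defines the new parameter diff_mean, we can extract the estimated latent mean difference together with its standard error and test statistic directly from the parameter table, for example for the first condition:

#extract the defined parameter diff_mean (latent mean difference) for R1
est <- lavaan::parameterEstimates(fit_model_cor25)
est[est$label == "diff_mean", c("label", "est", "se", "z", "pvalue")]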