Note 3 Simulation Example on Structural Equation Modeling (SEM)

# Load required packages
library(tidyverse)
theme_set(theme_classic() +
            theme(panel.grid.major.y = element_line(color = "grey92")))

3.1 Simulate Multivariate Data

In SEM, when multivariate normality is assumed, one can either generate the data directly using matrix algebra, or generate the latent variables first and then the observed variables. The first method is faster, but the second is more general and extends to situations like categorical or multilevel data. Therefore, in this note we’ll use the second method.

Let’s do a latent growth model (LGM) similar to the one in the note “Simulating Multilevel Data.” Here is the model in a latent growth model representation: \[\begin{bmatrix} y_{0i} \\ y_{1i} \\ y_{2i} \\ y_{3i} \end{bmatrix} = \boldsymbol{\mathbf{\Lambda }} \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix} + \begin{bmatrix} e_{0i} \\ e_{1i} \\ e_{2i} \\ e_{3i} \end{bmatrix}\] where \(y_{0i}, \ldots, y_{3i}\) are the outcome values for person \(i\) from time 0 to time 3, \(\eta_{1i}\) is the person-specific intercept, \(\eta_{2i}\) is the person-specific slope, and \(e_{0i}, \ldots, e_{3i}\) are the within-person error terms. The distributional assumptions are

\[\begin{align*} \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix} & \sim \mathcal{N}\left(\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}, \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix}\right) \\ \begin{bmatrix} e_{0i} \\ e_{1i} \\ e_{2i} \\ e_{3i} \end{bmatrix} & \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \theta_{11} & 0 & 0 & 0 \\ 0 & \theta_{22} & 0 & 0 \\ 0 & 0 & \theta_{33} & 0 \\ 0 & 0 & 0 & \theta_{44} \\ \end{bmatrix}\right) \end{align*}\]
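For reference, the first (matrix algebra) method generates the observed variables in one step from the model-implied moments, \(E(\mathbf{y}) = \boldsymbol{\Lambda\alpha}\) and \(\mathrm{Cov}(\mathbf{y}) = \boldsymbol{\Lambda\Phi\Lambda}^\top + \boldsymbol{\Theta}\). Here is a minimal base-R sketch of that method, using chol() instead of a package function; the parameter values match those used later in this note:

```r
# Direct generation from the model-implied moments:
#   E(y) = Lambda %*% alpha; Cov(y) = Lambda %*% Phi %*% t(Lambda) + Theta
set.seed(123)
N <- 100
alpha <- c(1, 0.5)
Phi <- matrix(c(1, 0.1,
                0.1, 0.2), nrow = 2)
Lambda <- cbind(c(1, 1, 1, 1),
                c(0, 1, 2, 3))
Theta <- diag(0.5, nrow = 4)
mu <- as.vector(Lambda %*% alpha)              # model-implied means
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta  # model-implied covariances
# y_i = mu + R'z_i, where Sigma = R'R (Cholesky) and z_i ~ N(0, I)
z <- matrix(rnorm(N * 4), nrow = N)
y <- sweep(z %*% chol(Sigma), 2, mu, `+`)
colnames(y) <- paste0("y", 1:4)
```

With a large N, colMeans(y) and cov(y) should be close to mu and Sigma, which also gives a way to check the second method later on.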

A path diagram is shown below:

growth_model <- "i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
                 s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
                 i ~~ 1 * i
                 s ~~ 0.2 * s + 0.1 * i
                 y1 ~~ 0.5 * y1
                 y2 ~~ 0.5 * y2
                 y3 ~~ 0.5 * y3
                 y4 ~~ 0.5 * y4
                 i ~ 1 * 1
                 s ~ 0.5 * 1"
library(semPlot)
semPaths(semPlotModel_lavaanModel(growth_model))

3.1.1 Using R

Here’s the code for generating the data:

library(mnormt)
set.seed(123)
# Define sample size
N <- 100
# Define the Fixed Parameters
alpha <- c(1, 0.5)  # latent means
Phi <- matrix(c(1, 0.1, 
                0.1, 0.2), nrow = 2)  # latent variances/covariances
Lambda <- cbind(c(1, 1, 1, 1), 
                c(0, 1, 2, 3))  # factor loadings
Theta <- diag(0.5, nrow = 4)
# Generate latent factor scores
eta <- rmnorm(N, mean = alpha, varcov = Phi)
# Generate residuals:
e <- rmnorm(N, varcov = Theta)
# Compute outcome scores: y_i = Lambda %*% eta_i + e_i, for all i at once
y <- tcrossprod(eta, Lambda) + e
# Make it a data frame
colnames(y) <- paste0("y", 1:4)
df <- as.data.frame(y)

If you want to check whether the simulated data are correct, generate a data set with a very large sample size and verify that the means and covariances are close to the population values:

N_test <- 1e5
eta_test <- rmnorm(N_test, mean = alpha, varcov = Phi)
colMeans(eta_test)
># [1] 1.0016925 0.5021968
cov(eta_test)
>#          [,1]      [,2]
># [1,] 1.001489 0.1000130
># [2,] 0.100013 0.2011578
e_test <- rmnorm(N_test, varcov = Theta)
colMeans(e_test)
># [1]  0.0008598494  0.0025900286 -0.0051191571  0.0011246080
cov(e_test)
>#               [,1]          [,2]          [,3]          [,4]
># [1,]  0.4995076410 -0.0007194104  0.0010100393 -0.0007624743
># [2,] -0.0007194104  0.4993700934 -0.0016984945  0.0024070969
># [3,]  0.0010100393 -0.0016984945  0.4965962722  0.0003983313
># [4,] -0.0007624743  0.0024070969  0.0003983313  0.5027934517

It’s handy to wrap this as a function:

gen_lgm_data <- function(N, alpha, Phi, Lambda, Theta) {
  # Generate latent factor scores
  eta <- rmnorm(N, mean = alpha, varcov = Phi)
  # Generate residuals:
  e <- rmnorm(N, varcov = Theta)
  # Compute outcome scores
  y <- tcrossprod(eta, Lambda) + e
  colnames(y) <- paste0("y", 1:4)
  # Make it a data frame
  as.data.frame(y)
}
# Test it:
set.seed(123)
gen_lgm_data(100, 
             alpha = c(1, 0.5), 
             Phi = matrix(c(1, 0.1, 
                            0.1, 0.2), nrow = 2), 
             Lambda = cbind(c(1, 1, 1, 1), 
                            c(0, 1, 2, 3)), 
             Theta = diag(0.5, nrow = 4)) %>% 
  head() # shows only the first six cases
>#            y1         y2          y3        y4
># 1 1.994318062 1.71116086  0.93927927 1.8544817
># 2 2.265725727 2.90855563  3.37429135 4.1980646
># 3 2.296655600 2.35159235  3.73462339 5.0831256
># 4 2.332408257 1.09066818  0.74843609 2.6298055
># 5 0.001197646 0.03891746 -0.08691963 0.1158814
># 6 1.818221258 3.44031115  4.56735772 5.0621645

Indeed, you get the same data!

3.1.2 Other methods for generating SEM data

Many SEM programs and packages can generate data from an SEM model specification. For example, in R you can call Mplus through the MplusAutomation package and use its MONTECARLO routine. Within R itself, you can generate SEM data with the simulateData() function from the lavaan package, as in the following example:

# Using a previously defined SEM model:
lavaan::simulateData(growth_model) %>% 
  head() # shows only the first six cases
>#          y1         y2        y3         y4
># 1 0.7164342  0.5568492 0.4726109 -0.1122304
># 2 1.8000324  1.8018441 1.5462532  2.4277768
># 3 0.6034256  0.6713423 2.2884345  3.1191797
># 4 1.1521190  2.7229104 4.7980347  5.8445554
># 5 0.6229306 -0.1175073 1.7054730  0.5659889
># 6 0.5230306  2.1871287 2.6141488  2.6783935

Personally, however, I prefer directly simulating data in R because

  • it forces you to specify everything in the model the way you want. Mplus in particular has many hidden default settings that may mess up your simulation;
  • it makes the process of generating data more transparent;
  • it helps you learn the math behind the model;
  • it is more flexible as you can specify any distributional assumptions or models not supported by the SEM packages.
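As one illustration of that flexibility, the sketch below swaps the normal residuals for heavy-tailed ones drawn from a \(t\) distribution with 5 degrees of freedom, rescaled so the residual variances stay at 0.5. This is a variation for illustration, not part of the original model, and base R (chol()) is used in place of rmnorm():

```r
set.seed(123)
N <- 100
alpha <- c(1, 0.5)
Phi <- matrix(c(1, 0.1,
                0.1, 0.2), nrow = 2)
Lambda <- cbind(c(1, 1, 1, 1),
                c(0, 1, 2, 3))
# Latent scores, normal as before (z %*% chol(Phi) has covariance Phi)
eta <- matrix(rnorm(N * 2), nrow = N) %*% chol(Phi) +
  matrix(alpha, nrow = N, ncol = 2, byrow = TRUE)
# Heavy-tailed residuals: t(5) has variance nu / (nu - 2), so rescale
# so that Var(e) = 0.5
nu <- 5
e <- matrix(rt(N * 4, df = nu), nrow = N) * sqrt(0.5 * (nu - 2) / nu)
y <- tcrossprod(eta, Lambda) + e
colnames(y) <- paste0("y", 1:4)
```

The rest of the pipeline (fitting the growth models, extracting estimates) stays exactly the same; only the residual-generation step changes.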

3.2 Analyzing the Simulated Data

In R, the most common options for running SEM models are lavaan, OpenMx, and Mplus (via MplusAutomation). When possible, I’ll stick to lavaan to avoid jumping between programs. Let’s analyze the simulated data twice: first with the true model, and then with a misspecified model that omits the random slope term (i.e., the variance of s, and its covariance with i, are constrained to zero).

library(lavaan)
># This is lavaan 0.6-3
># lavaan is BETA software! Please report any bugs.
# True model
m1 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       i ~~ s'
# Model without random slopes
m2 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       s ~~ 0*i + 0*s'
m1_fit <- growth(m1, data = df)
m2_fit <- growth(m2, data = df)

Here are some examples of what you can extract (results not shown):

fitMeasures(m1_fit)  # returns more than 30 fit indices
# Specific fit indices
fitMeasures(m1_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
coef(m1_fit)  # parameter estimates
vcov(m1_fit)  # asymptotic covariance matrix of parameters
# Asymptotic standard error of the latent mean of s
sqrt(diag(vcov(m1_fit))["s~1"])
parameterEstimates(m1_fit, standardized = FALSE)  # coefficients, SE, CI
modificationIndices(m1_fit, minimum.value = 3.84)  # modification indices
# Bootstrapped fit index
bootstrapLavaan(m1_fit, R = 1000L, type = "bollen.stine",
                FUN = fitMeasures, fit.measures = c("cfi"))

3.3 Full Example of a Small-Scale Simulation

In a methodological experiment using Monte Carlo simulation, one usually generates millions of data sets across tens or hundreds of carefully chosen conditions. As an example, here is a small-scale simulation study on LGM with two goals: (a) to understand the bias in the estimated mean slope and in its standard error, and (b) to illustrate the difference between the estimated and the empirical standard error.

For simplicity, I’ll only choose three design factors (i.e., manipulated independent variables): sample size, the variance of the random slope in the data generating model, and the mean of the slopes. The design factors are summarized here:

  • Sample size (N): 50, 100, 200
  • Variance of slopes (\(\phi_{22}\)): 0.1, 0.5 (i.e., 1/10 and 1/2 of the intercept variance)
  • Mean of slopes (\(\alpha_2\)): 1, 0.5

Therefore, it’s a 3 \(\times\) 2 \(\times\) 2 factorial design. Now let’s walk through each component.
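A quick sanity check on the scale of the study implied by this design (500 replications per condition, two fitted models per data set):

```r
n_cond <- 3 * 2 * 2  # conditions in the 3 x 2 x 2 factorial design
n_cond               # 12
n_cond * 500         # 6000 data sets to generate
n_cond * 500 * 2     # 12000 models to fit
```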

3.3.1 Fixed Values for the Study

In the study I decided to run 500 replications for each condition. I can estimate the simulation (Monte Carlo) error to see whether that number is enough.

set.seed(515)  # set the seed for reproducibility
NREP <- 500  # number of replications
# Fixed parameters (all caps)
ALPHA1 <- 1  # latent mean of intercepts
PHI11 <- 1  # intercept variance
LAMBDA <- cbind(c(1, 1, 1, 1), 
                c(0, 1, 2, 3))  # factor loadings
THETA <- diag(0.5, nrow = 4)  # residual variances
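To gauge whether 500 replications is enough, note that the Monte Carlo standard error of an average across replications is the empirical SD divided by \(\sqrt{R}\). A small sketch (the SD value of 0.2 is hypothetical, and `mcse` is a name I made up):

```r
# Monte Carlo standard error of a mean over nrep replications
mcse <- function(est_sd, nrep) est_sd / sqrt(nrep)
mcse(0.2, 500)   # about 0.009 with a hypothetical SD of 0.2
mcse(0.2, 2000)  # halving the error takes four times the replications
```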

We should also define everything that will not change across replications and across conditions, including syntax for running growth models, functions to generate data, etc.

# Function for generating data:
gen_lgm_data <- function(N, alpha, Phi, Lambda, Theta) {
  # Generate latent factor scores
  eta <- rmnorm(N, mean = alpha, varcov = Phi)
  # Generate residuals:
  e <- rmnorm(N, varcov = Theta)
  # Compute outcome scores
  y <- tcrossprod(eta, Lambda) + e
  colnames(y) <- paste0("y", 1:4)
  # Make it a data frame
  as.data.frame(y)
}
# lavaan syntax
# True model
m1 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       i ~~ s'
# Model without random slopes
m2 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       s ~~ 0*i + 0*s'

3.3.2 Keep track of simulation conditions

I’ll create a data frame to store the design factors:

# Design factors:
DESIGNFACTOR <- expand.grid(
  N = c(50, 100, 200), 
  phi22 = c(0.1, 0.5), 
  alpha2 = c(1, 0.5)
)
# Add condition number:
DESIGNFACTOR <- rowid_to_column(DESIGNFACTOR, "cond")
DESIGNFACTOR
>#    cond   N phi22 alpha2
># 1     1  50   0.1    1.0
># 2     2 100   0.1    1.0
># 3     3 200   0.1    1.0
># 4     4  50   0.5    1.0
># 5     5 100   0.5    1.0
># 6     6 200   0.5    1.0
># 7     7  50   0.1    0.5
># 8     8 100   0.1    0.5
># 9     9 200   0.1    0.5
># 10   10  50   0.5    0.5
># 11   11 100   0.5    0.5
># 12   12 200   0.5    0.5

An additional benefit is that I only need to loop over one dimension (condition number) instead of writing three loops, one for each factor.
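To see this, indexing a single row of the design data frame recovers all three factor levels for that condition at once (a small self-contained illustration of the structure):

```r
# Each row of DESIGNFACTOR fully specifies one simulation condition
DESIGNFACTOR <- expand.grid(
  N = c(50, 100, 200), 
  phi22 = c(0.1, 0.5), 
  alpha2 = c(1, 0.5)
)
DESIGNFACTOR[5, ]  # condition 5: N = 100, phi22 = 0.5, alpha2 = 1
```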

3.3.3 Writing a function to conduct the simulation

While this step is not necessary, it’s recommended to wrap the steps of generating the data and fitting the models into one big function, so that the code is easier to read, like the runsim() function below:

runsim <- function(to_run,  # conditions to run
                   nrep,  # number of replications
                   alpha1 = ALPHA1,  # latent mean of intercepts
                   phi11 = PHI11,  # intercept variance
                   Lambda = LAMBDA,  # factor loadings
                   Theta = THETA,  # residual variances
                   designfactors = DESIGNFACTOR) {
  # Extract design parameters for the given condition
  N <- designfactors[to_run, "N"]
  phi22 <- designfactors[to_run, "phi22"]
  alpha2 <- designfactors[to_run, "alpha2"]
  # Put the values back to the matrix
  alpha <- c(alpha1, alpha2)
  Phi <- matrix(c(phi11, phi22 / 2,
                  phi22 / 2, phi22), nrow = 2)
  # Initialize place holders for the results
  rep_result <- vector("list", nrep)
  for (i in seq_len(nrep)) {
    # Generate data
    df <- gen_lgm_data(N, alpha, Phi, Lambda, Theta)
    # Run model 1
    m1_fit <- growth(m1, data = df)
    # Run model 2
    m2_fit <- growth(m2, data = df)
    # Save results 
    rep_result[[i]] <- list(m1_fit = m1_fit, 
                            m2_fit = m2_fit)
  }
  # Return results
  return(rep_result)
}

We can do a trial run:

# Run condition 1 with 2 replications
sim_results <- runsim(1, 2)
># Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
>#                 is not positive definite;
>#                 use lavInspect(fit, "cov.lv") to investigate.
# Examine the results:
sim_results[[1]]
># $m1_fit
># lavaan 0.6-3 ended normally after 31 iterations
># 
>#   Optimization method                           NLMINB
>#   Number of free parameters                          9
># 
>#   Number of observations                            50
># 
>#   Estimator                                         ML
>#   Model Fit Test Statistic                       2.093
>#   Degrees of freedom                                 5
>#   P-value (Chi-square)                           0.836
># 
># $m2_fit
># lavaan 0.6-3 ended normally after 21 iterations
># 
>#   Optimization method                           NLMINB
>#   Number of free parameters                          7
># 
>#   Number of observations                            50
># 
>#   Estimator                                         ML
>#   Model Fit Test Statistic                      17.169
>#   Degrees of freedom                                 7
>#   P-value (Chi-square)                           0.016
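The warning in the trial run (a non-positive-definite latent covariance matrix) is common with a small N and a small slope variance. In a full run you may want to capture such warnings rather than let them scroll by unrecorded. The generic pattern below is a sketch of my own (`fit_safely` is a hypothetical name, and the `warning()` call stands in for a `growth()` fit):

```r
# Return the fitted object together with any captured warning message
fit_safely <- function(expr) {
  warn <- NULL
  fit <- withCallingHandlers(
    tryCatch(expr, error = function(e) NULL),  # NULL if the fit errors out
    warning = function(w) {
      warn <<- conditionMessage(w)   # record the warning ...
      invokeRestart("muffleWarning") # ... and keep going
    }
  )
  list(fit = fit, warning = warn)
}
# Example with a stand-in for a model-fitting call:
res <- fit_safely({
  warning("covariance matrix of latent variables is not positive definite")
  "fitted object"
})
```

Here `res$fit` is still the fitted object, and `res$warning` holds the message, so problematic replications can be flagged or excluded afterwards.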

3.3.4 Running the simulation

This step is easy. You can use a for loop or the purrr::map() function:

sim_results <- 
  map(seq_len(nrow(DESIGNFACTOR)), 
      ~ runsim(.x, nrep = NREP))

3.3.5 Extract the results

Right now everything from every replication is being saved. I’ll need a function to extract the relevant output, which is

  • the estimated coefficient of \(\alpha_2\) under each model, and
  • its estimated standard error under each model.

It’s usually helpful to start by extracting results from a single replication:

# Try on one replication
with(sim_results[[1]][[1]], 
     tibble(coef(m1_fit)["s~1"], 
            sqrt(vcov(m1_fit)["s~1", "s~1"]), 
            coef(m2_fit)["s~1"], 
            sqrt(vcov(m2_fit)["s~1", "s~1"])))
# Now, wrap it as a function:
extract_coef <- function(res) {
  out <- with(res, 
              tibble(coef(m1_fit)["s~1"], 
                     sqrt(vcov(m1_fit)["s~1", "s~1"]), 
                     coef(m2_fit)["s~1"], 
                     sqrt(vcov(m2_fit)["s~1", "s~1"])))
  # Add names:
  names(out) <- c("m1_est", "m1_se", "m2_est", "m2_se")
  out
}
extract_coef(sim_results[[1]][[1]])

Now, we can do it on all replications and all conditions:

library(tidyverse)
sim_outdata <- sim_results %>% 
  # Iterate across conditions
  map_dfr(
    # Iterate across replications
    ~ map_dfr(.x, extract_coef, .id = "rep"), 
    .id = "cond") %>% 
  # Make `rep` and `cond` integer variables
  mutate(rep = as.integer(rep), 
         cond = as.integer(cond)) %>% 
  # Add design factors
  left_join(DESIGNFACTOR, by = "cond")
# Save the results
write_csv(sim_outdata, "example_sem_results.csv")

3.3.6 Summarize the Results

Finally, let’s summarize the results in terms of the standardized bias and the relative standard error bias.
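For reference, the two summary measures can be written as small helper functions (the names are my own; the code below computes the same quantities inline):

```r
# Standardized bias: raw bias in units of the empirical SD of the estimates
std_bias <- function(est, pop) (mean(est) - pop) / sd(est)
# Relative SE bias: how far the average estimated SE is from the
# empirical SD (the "true" sampling variability)
rel_se_bias <- function(se, est) (mean(se) - sd(est)) / sd(est)
# Tiny illustration with made-up numbers:
est <- c(0.9, 1.1, 1.0, 1.2, 0.8)
std_bias(est, pop = 1)          # 0: estimates center exactly on the true value
rel_se_bias(rep(0.16, 5), est)  # near 0: average SE close to the empirical SD
```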

sim_outdata <- sim_outdata %>% 
  # Make it long format
  gather("var", "val", m1_est:m2_se) %>% 
  separate(var, c("model", "var")) %>% 
  spread(var, val) %>% 
  # (Optional) Rename the levels of design factors to make it better looking in 
  # the plots
  mutate(model = factor(model, labels = c("Correctly specified", 
                                          "Misspecified")), 
         alpha2_lab = as_factor(paste0("alpha[2] == ", alpha2)), 
         phi22_lab = as_factor(paste0("phi[22] == ", phi22)), 
         N_lab = as_factor(paste0("italic(N) == ", N)))
sim_sum <- sim_outdata %>% 
  # Summarize results by conditions
  group_by(model, alpha2, phi22, N, alpha2_lab, phi22_lab, N_lab) %>% 
  summarise(ave_est = mean(est), 
            emp_sd = sd(est), 
            ave_se = mean(se)) %>% 
  # Compute standardized bias and relative SE bias
  mutate(bias = ave_est - alpha2, 
         bias_mcse = emp_sd / sqrt(NREP), 
         std_bias = bias / emp_sd, 
         rse_bias = (ave_se - emp_sd) / emp_sd) %>% 
  ungroup()

With a relatively small number of conditions, one can present the results in a table (which is easy to do in R):

sim_sum %>% 
  select(model:N, 
         Bias = bias, `Monte Carlo Error` = bias_mcse, 
         `Standardized Bias` = std_bias, 
         `Relative SE Error` = rse_bias) %>% 
  knitr::kable(digits = 3L)
|model               | alpha2| phi22|   N|   Bias| Monte Carlo Error| Standardized Bias| Relative SE Error|
|:-------------------|------:|-----:|---:|------:|-----------------:|-----------------:|-----------------:|
|Correctly specified |    0.5|   0.1|  50|  0.006|             0.003|             0.101|            -0.054|
|Correctly specified |    0.5|   0.1| 100| -0.001|             0.002|            -0.024|            -0.009|
|Correctly specified |    0.5|   0.1| 200|  0.002|             0.001|             0.059|             0.017|
|Correctly specified |    0.5|   0.5|  50|  0.002|             0.005|             0.015|            -0.014|
|Correctly specified |    0.5|   0.5| 100| -0.001|             0.003|            -0.012|             0.020|
|Correctly specified |    0.5|   0.5| 200| -0.002|             0.002|            -0.045|             0.008|
|Correctly specified |    1.0|   0.1|  50| -0.001|             0.003|            -0.019|            -0.076|
|Correctly specified |    1.0|   0.1| 100|  0.002|             0.002|             0.038|            -0.031|
|Correctly specified |    1.0|   0.1| 200| -0.001|             0.001|            -0.045|            -0.016|
|Correctly specified |    1.0|   0.5|  50|  0.001|             0.005|             0.013|            -0.031|
|Correctly specified |    1.0|   0.5| 100|  0.002|             0.004|             0.023|            -0.082|
|Correctly specified |    1.0|   0.5| 200|  0.002|             0.003|             0.043|            -0.039|
|Misspecified        |    0.5|   0.1|  50|  0.007|             0.003|             0.113|            -0.161|
|Misspecified        |    0.5|   0.1| 100| -0.002|             0.002|            -0.040|            -0.132|
|Misspecified        |    0.5|   0.1| 200|  0.002|             0.001|             0.056|            -0.112|
|Misspecified        |    0.5|   0.5|  50|  0.001|             0.005|             0.013|            -0.249|
|Misspecified        |    0.5|   0.5| 100| -0.002|             0.003|            -0.030|            -0.238|
|Misspecified        |    0.5|   0.5| 200| -0.002|             0.002|            -0.033|            -0.251|
|Misspecified        |    1.0|   0.1|  50| -0.002|             0.003|            -0.025|            -0.186|
|Misspecified        |    1.0|   0.1| 100|  0.002|             0.002|             0.037|            -0.155|
|Misspecified        |    1.0|   0.1| 200| -0.001|             0.001|            -0.043|            -0.136|
|Misspecified        |    1.0|   0.5|  50|  0.002|             0.005|             0.018|            -0.275|
|Misspecified        |    1.0|   0.5| 100|  0.002|             0.004|             0.028|            -0.307|
|Misspecified        |    1.0|   0.5| 200|  0.002|             0.003|             0.035|            -0.286|

It is, however, recommended that you also plot the results, both for exploration and for clearer presentation.

# Summarize estimates
sim_outdata %>% 
  ggplot(aes(x = factor(N), y = est, color = model)) + 
  geom_boxplot() + 
  geom_hline(aes(yintercept = alpha2)) + 
  facet_grid(alpha2_lab ~ phi22_lab, labeller = label_parsed) + 
  labs(x = "Sample Size (N)", y = "Estimates of Mean Slope")

# Standardized bias
sim_sum %>% 
  ggplot(aes(x = factor(N), y = std_bias, color = model)) + 
  geom_line(aes(group = model)) + 
  geom_hline(aes(yintercept = 0)) + 
  facet_grid(alpha2_lab ~ phi22_lab, labeller = label_parsed) + 
  labs(x = "Sample Size (N)", y = "Standardized Bias")

# Relative SE bias
sim_sum %>% 
  ggplot(aes(x = factor(N), y = rse_bias, color = model)) + 
  geom_line(aes(group = model)) + 
  geom_hline(aes(yintercept = 0)) + 
  facet_grid(alpha2_lab ~ phi22_lab, labeller = label_parsed) + 
  labs(x = "Sample Size (N)", y = "Relative Standard Error Bias")

3.4 Using the SimDesign package

The SimDesign package was developed to make designing and running simulation studies more structured and organized. It also provides useful features such as parallel computing, fail-safe stopping, and the collection of error and warning messages. Check out the paper by Sigal & Chalmers (2016) as well as the package vignettes (https://cran.r-project.org/web/packages/SimDesign/index.html) for more information. Below is the R code for running the latent growth example with their package. Try it yourself!

library(tidyverse)
library(SimDesign)
library(mnormt)
library(lavaan)

# Design factors:
DESIGNFACTOR <- expand.grid(
  N = c(50, 100, 200), 
  phi22 = c(0.1, 0.5), 
  alpha2 = c(1, 0.5)
)
# Add condition number:
DESIGNFACTOR <- rowid_to_column(DESIGNFACTOR, "cond")
DESIGNFACTOR

# Function for generating data:
gen_lgm_data <- function(condition, fixed_objects = NULL) {
  N <- condition$N
  phi22 <- condition$phi22
  alpha2 <- condition$alpha2
  alpha <- c(1, alpha2)
  Phi <- matrix(c(1, phi22 / 2,
                  phi22 / 2, phi22), nrow = 2)
  Lambda <- cbind(c(1, 1, 1, 1), 
                  c(0, 1, 2, 3))
  Theta <- diag(0.5, nrow = 4)
  # Generate latent factor scores
  eta <- rmnorm(N, mean = alpha, varcov = Phi)
  # Generate residuals:
  e <- rmnorm(N, varcov = Theta)
  # Compute outcome scores
  y <- tcrossprod(eta, Lambda) + e
  colnames(y) <- paste0("y", 1:4)
  # Make it a data frame
  as.data.frame(y)
}
# Test: generate data
# test_df <- gen_lgm_data(DESIGNFACTOR[1, ])

# lavaan syntax (to be passed through `fixed_objects`)
# True model
M1 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       i ~~ s'
# Model without random slopes
M2 <- 'i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
       s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
       s ~~ 0*i + 0*s'

# Analysis function
run_lgm <- function(condition, dat, fixed_objects = NULL) {
  # Run model 1
  m1_fit <- growth(fixed_objects$M1, data = dat)
  # Run model 2
  m2_fit <- growth(fixed_objects$M2, data = dat)
  # Extract parameter estimates and standard errors
  ret <- c(coef(m1_fit)["s~1"], 
           sqrt(vcov(m1_fit)["s~1", "s~1"]),
           coef(m2_fit)["s~1"], 
           sqrt(vcov(m2_fit)["s~1", "s~1"]))
  names(ret) <- c("m1_est", "m1_se", "m2_est", "m2_se")
  ret
}
# Test: run analyses for simulated data
# run_lgm(dat = test_df)

# Helper function for computing relative SE bias
rse_bias <- function(est_se, est) {
  est_se <- as.matrix(est_se)
  est <- as.matrix(est)
  est_se <- colMeans(est_se)
  emp_sd <- apply(est, 2L, sd)
  est_se / emp_sd - 1
}

# Evaluative function
evaluate_lgm <- function(condition, results, fixed_objects = NULL) {
  alpha2 <- condition$alpha2
  c(bias = bias(results[ , c(1, 3)], parameter = alpha2), 
    std_bias = bias(results[ , c(1, 3)], parameter = alpha2, 
                    type = "standardized"), 
    rmse = RMSE(results[ , c(1, 3)], parameter = alpha2), 
    rse_bias = rse_bias(results[ , c(2, 4)], results[ , c(1, 3)])
  )
}

# Put all together
sim_results <- runSimulation(DESIGNFACTOR, 
                             replications = 500, 
                             generate = gen_lgm_data, 
                             analyse = run_lgm, 
                             summarise = evaluate_lgm, 
                             fixed_objects = 
                               list(M1 = M1, M2 = M2))

# With parallel processing (4 cores in this example; uncomment the following to
# run)
# sim_results <- runSimulation(DESIGNFACTOR, 
#                              replications = 500, 
#                              generate = gen_lgm_data, 
#                              analyse = run_lgm, 
#                              summarise = evaluate_lgm, 
#                              packages = c("mnormt", "lavaan"), 
#                              parallel = TRUE, 
#                              ncores = 4L, 
#                              fixed_objects = 
#                                list(M1 = M1, M2 = M2))

3.5 Exercise

  1. From the simulation results, evaluate the relative efficiency of the estimated average slope (i.e., \(\alpha_2\)) under model 2 relative to that under model 1.