Chapter 6 Structural Equation Modeling
In this chapter, we will extend our statistical understandings regarding correlation and regression to the concept of Structural Equation Modeling (SEM).
SEM comprises a diverse set of analysis techniques, including confirmatory factor analysis and path analysis, each of which lend to the testing of theoretical models. Researchers who utilize SEM might review literature related to the constructs of interest (i.e., unobservable phenomena), and then hypothesize specific pathways within a plausible model through a set of regression equations and factor analyses, thus allowing for efficient exploration of complex relationships within and between psychological constructs. The variables within any model might include both measured variables and latent variables, or hypothetical constructs that cannot be directly observed (e.g., intelligence).
SEM utilizes maximum likelihood estimation (MLE), a statistical method which determines values for model parameters. Specifically, MLE maximizes the likelihood that the observed data will fit a theoretical model over other potential data distributions. SEM is straightforward in R Studio, but it is helpful to first build confidence in how you interact with R studio as a user.
6.1 SEM Syntax
SEM differs from multiple regression in that the individual items are allowed to contribute to the overall construct in an unequal manner, whereby in regression items are summed or averaged, thus limiting potential effects to aggregate scores of observable measures. Because of this level of specificity, latent variables can be removed in the event that items do not load appropriately on a construct. Simple regressions, as well as multiple regression are simultaneously estimated within this model. Additionally, SEM affords the calculation of both direct and indirect effects.
Like in baseR
, lavaan
uses tildes the regression operator ~
, read
as ‘is regressed on’. The dependent variable is called prior to the
tilde with independent variables called after the tilde. A typical path
model is defined by a set of regression formulas.
~ independent1 + independent2 + independent3 dependent1
## dependent1 ~ independent1 + independent2 + independent3
~ independent1 + independent2 dependent2
## dependent2 ~ independent1 + independent2
Latent variables are called with =~
which can be read as “is measured
by.”
=~ item1 + item2 + item3
latent1 =~ item4 + item5 + item6 latent2
Use double tildes ~~
to specify variances and covariances. This can be
read as “is correlated with.”
~~ observed1 (variance)
observed1 ~~ observed2 (covariance) observed1
Directly measure variable intercepts with the tilde ~
and the numeric
value 1.
~ 1
y1 ~ 1 f1
A specified Full SEM might look like this. This code will produce a model syntax object, called model that can be called later to attain model estimates and produce visualizations.
<- ' # regressions
model targetvariable ~ construct1 + construct2 + observed1 + observed2
construct1 ~ construct2 + construct3
construct2 ~ construct3 + observed1 + observed2
# latent variable definitions
construct1 =~ item1 + item2 + item3
construct2 =~ item4 + item5 + item6
construct3 =~ item7 + item8 + item9 + item10
# variances and covariances
item1 ~~ item1
item1 ~~ item2
construct1 ~~ construct2
# intercepts
item1 ~ 1
construct1 ~ 1
'
6.1.1 Running, Interpreting, and Visualizing Models
6.1.1.1 CFA
Confirmatory Factor Analysis estimates models comprised of latent
variables that constitute the measurement model, which assesses how
several paths might represent a theoretical assumption about the data.
The measurement model only requires the =~
sign to be used unless you
are setting latent constructs as correlates in your CFA.
#### THIS CHUNK IS FOR TROUBLESHOOTING SEMPLOT INSTALLATION
#options(pkgType="binary")
#devtools::install_version("latticeExtra", version="0.6-28")
#install.packages('backports')
#install.packages("Hmisc")
#library(Hmisc)
Model fit is assessed model using χ2, the Comparative Fit Index (CFI), and the Root Mean Square Error of Approximation (RMSEA). Each of these are examples of fit statistics, which are used to assess the extent to which a statistical model fits a set of observations. Fit statistics can also be used to compare sets of models against thresholds of good fit. For instance, a model with good fit should have a nonsignificant χ2 value, though this fit statistic is sensitive to larger sample sizes. Thus, χ2 is rarely used as a standalone measurement of model fit. The CFI compares the theoretical model to a null model and is considered to be demonstrative of good fit at or above 0.95, though models with a CFI above 0.90 are deemed to have acceptable fit. RMSEA is sensitive to parsimony of the model. In contrast with the CFI, a RMSEA value of 0.05 or below demonstrates good fit, however values below 0.1 are still considered acceptable.
We will be drawing from the lavaan
package documentation for these
explanatory examples. lavaan
comes with three preinstalled datasets,
the first of which is the “classic Holzinger and Swineford (1939)
dataset [which] consists of mental ability test scores of seventh- and
eighth-grade children from two different schools” (Pasteur and
Grant-White). Demographic variables include an index, gender, age in
years, age in months, school attended and grade level. Measured
variables included assessments of visual perception, using cubes,
lozenges, paragraph comprehension, sentence completion, word meaning,
speeded addition, speeded counting of dots, and speeded text
discrimination.
The authors hypothesized that these measured items might represent the
latent constructs of visual, textual, and speed-related ability in
educational contexts. The code below includes a specified model, a
summary of the fit model, and a visualization of the model using the
semPlot
package.
#install.packages('lavaan')
#install.packages('semPlot')
library(lavaan)
library(semPlot)
# specify the model
<- ' visual ~ textual + speed
HS.model
visual =~ x1 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual~~textual
textual~~speed
visual~~speed'
# fit the model
<- cfa(HS.model, data = HolzingerSwineford1939) fit
## Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, : lavaan WARNING:
## Could not compute standard errors! The information matrix could
## not be inverted. This may be a symptom that the model is not
## identified.
# display summary output
summary(fit, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 31 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 66.892
## Degrees of freedom 15
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 861.843
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.938
## Tucker-Lewis Index (TLI) 0.884
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3281.274
## Loglikelihood unrestricted model (H1) -3247.828
##
## Akaike (AIC) 6604.548
## Bayesian (BIC) 6682.397
## Sample-size adjusted Bayesian (BIC) 6615.797
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.107
## 90 Percent confidence interval - lower 0.082
## 90 Percent confidence interval - upper 0.134
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.060
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x1 1.000 1.012 0.868
## x3 0.566 NA 0.573 0.508
## textual =~
## x4 1.000 0.991 0.853
## x5 1.112 NA 1.102 0.855
## x6 0.923 NA 0.915 0.837
## speed =~
## x7 1.000 0.630 0.579
## x8 1.174 NA 0.739 0.731
## x9 1.042 NA 0.657 0.652
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual ~
## textual 0.230 NA 0.225 0.225
## speed 0.217 NA 0.135 0.135
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .visual ~~
## textual 0.186 NA 0.212 0.212
## textual ~~
## speed 0.174 NA 0.279 0.279
## .visual ~~
## speed 0.148 NA 0.264 0.264
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.335 NA 0.335 0.246
## .x3 0.946 NA 0.946 0.742
## .x4 0.369 NA 0.369 0.273
## .x5 0.446 NA 0.446 0.269
## .x6 0.359 NA 0.359 0.300
## .x7 0.786 NA 0.786 0.664
## .x8 0.475 NA 0.475 0.465
## .x9 0.584 NA 0.584 0.575
## .visual 0.786 NA 0.768 0.768
## textual 0.982 NA 1.000 1.000
## speed 0.397 NA 1.000 1.000
#visualize output with standard estimates
semPaths(fit, "std", edge.label.cex = 1.0, curvePivot = FALSE, rotation = 3)
semPaths(fit, what = "path", whatLabels = "std", style = "lisrel", edge.label.cex = .9, rotation = 2, curve = 2, layoutSplit = FALSE, normalize = FALSE, height = 9, width = 6.5, residScale = 10)
The output indicates a satisfactory fitting model despite the significant chi-square result (CFI = .93, RMSEA = .092, p = .001). Visual factors x1 and x3 loaded as expected, while x2 is approaching the lower bound for acceptable factor loading. As no items were non-significant, we might consider removing this item in a subsequent analysis to improve overall model fit. The textual and speed factor loadings were each satisfactory with the textual model holding strong consistency across each measured item (all loaded onto factor with a standardized estimate of .85 or .84). Covariances were satisfactory for this model (visual to textual is .46, visual to speed is .47, textual to speed is .28).
When we remove x2 from the model the CFI is improved at the expense of
the RMSEA value. As there are no other benefits to removing this item,
it should be retained. This model can be visualized using the
semPlot()
package. The semPath()
function in particular is what we
use to plot our SEM diagram. The semPaths()
function takes the
following arguments. For a description of each, see
https://www.rdocumentation.org/packages/semPlot/versions/1.1.2/topics/semPaths
semPaths(object, what = "paths", whatLabels, style, layout = "tree",
intercepts = TRUE, residuals = TRUE, thresholds = TRUE, intStyle = "multi",
rotation = 1, curve, curvature = 1, nCharNodes = 3, nCharEdges = 3, sizeMan = 5,
sizeLat = 8, sizeInt = 2, sizeMan2, sizeLat2, sizeInt2, shapeMan, shapeLat,
shapeInt = "triangle", ask, mar, title, title.color = "black", title.adj = 0.1,
title.line = -1, title.cex = 0.8, include, combineGroups = FALSE, manifests,
gui = FALSE, allVars = FALSE, edge.color,
latents, groups, color, residScale, reorder = TRUE, structural = FALSE, ThreshAtSide = FALSE, thresholdColor,
thresholdSize = 0.5, fixedStyle = 2, freeStyle = 1,
as.expression = character(0), optimizeLatRes = FALSE, inheritColor = TRUE,
pastel = FALSE, rainbowStart = 0, intAtSide,
levels, nodeLabels, edgeLabels, springLevels = FALSE, nDigits = 2, exoVar, exoCov = TRUE, centerLevels = TRUE,
panelGroups = FALSE, layoutSplit = FALSE, measurementLayout = "tree", subScale,
subRes = 4, subLinks, modelOpts = list(mplusStd = "std"),
subScale2, curveAdjacent = '<->', edge.label.cex = 0.6, cardinal = "none",
equalizeManifests = FALSE, covAtResiduals = TRUE, bifactor, optimPoints = 1:8 * (pi/4),
...)
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
6.1.1.2 Full SEM
We will be using another lavaan
dataset for a Full SEM example. The
PoliticalDemocracy datasets includes 75 observations of 11 variables
that measure political democracy and industrialization in developing
countries. Variables are labeled with the following descriptions:
Political Democracy in 1960
y1: Expert ratings of the freedom of the press in 1960
y2: The freedom of political opposition in 1960
y3: The fairness of elections in 1960
y4: The effectiveness of the elected legislature in 1960
Political Democracy in 1965
y5: Expert ratings of the freedom of the press in 1965
y6: The freedom of political opposition in 1965
y7: The fairness of elections in 1965
y8: The effectiveness of the elected legislature in 1965
Level of Industrialization in 1960
x1: The gross national product (GNP) per capita in 1960
x2: The inanimate energy consumption per capita in 1960
x3: The percentage of the labor force in industry in 1960
The bold headers represent latent, unobservable variables that are quantified by the observable bullet points. Take a quick look at the dataset using any of the methods covered thus far.
str(PoliticalDemocracy)
## 'data.frame': 75 obs. of 11 variables:
## $ y1: num 2.5 1.25 7.5 8.9 10 7.5 7.5 7.5 2.5 10 ...
## $ y2: num 0 0 8.8 8.8 3.33 ...
## $ y3: num 3.33 3.33 10 10 10 ...
## $ y4: num 0 0 9.2 9.2 6.67 ...
## $ y5: num 1.25 6.25 8.75 8.91 7.5 ...
## $ y6: num 0 1.1 8.09 8.13 3.33 ...
## $ y7: num 3.73 6.67 10 10 10 ...
## $ y8: num 3.333 0.737 8.212 4.615 6.667 ...
## $ x1: num 4.44 5.38 5.96 6.29 5.86 ...
## $ x2: num 3.64 5.06 6.26 7.57 6.82 ...
## $ x3: num 2.56 3.57 5.22 6.27 4.57 ...
::describe(PoliticalDemocracy) psych
## vars n mean sd median trimmed mad min max range skew kurtosis se
## y1 1 75 5.46 2.62 5.40 5.46 3.11 1.25 10.00 8.75 -0.09 -1.15 0.30
## y2 2 75 4.26 3.95 3.33 4.09 4.94 0.00 10.00 10.00 0.32 -1.47 0.46
## y3 3 75 6.56 3.28 6.67 6.92 4.94 0.00 10.00 10.00 -0.59 -0.72 0.38
## y4 4 75 4.45 3.35 3.33 4.33 4.94 0.00 10.00 10.00 0.12 -1.21 0.39
## y5 5 75 5.14 2.61 5.00 5.23 3.71 0.00 10.00 10.00 -0.23 -0.78 0.30
## y6 6 75 2.98 3.37 2.23 2.51 3.31 0.00 10.00 10.00 0.89 -0.47 0.39
## y7 7 75 6.20 3.29 6.67 6.47 4.94 0.00 10.00 10.00 -0.55 -0.73 0.38
## y8 8 75 4.04 3.25 3.33 3.82 4.40 0.00 10.00 10.00 0.45 -0.96 0.37
## x1 9 75 5.05 0.73 5.08 5.03 0.82 3.78 6.74 2.95 0.25 -0.75 0.08
## x2 10 75 4.79 1.51 4.96 4.85 1.53 1.39 7.87 6.49 -0.35 -0.57 0.17
## x3 11 75 3.56 1.41 3.57 3.53 1.51 1.00 6.42 5.42 0.08 -0.94 0.16
A model is specified below. Run, interpret, and visualize the model.
<- '
model # measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
<- sem(model, data=PoliticalDemocracy)
fit summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 68 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 31
##
## Number of observations 75
##
## Model Test User Model:
##
## Test statistic 38.125
## Degrees of freedom 35
## P-value (Chi-square) 0.329
##
## Model Test Baseline Model:
##
## Test statistic 730.654
## Degrees of freedom 55
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.995
## Tucker-Lewis Index (TLI) 0.993
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1547.791
## Loglikelihood unrestricted model (H1) -1528.728
##
## Akaike (AIC) 3157.582
## Bayesian (BIC) 3229.424
## Sample-size adjusted Bayesian (BIC) 3131.720
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.035
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.092
## P-value RMSEA <= 0.05 0.611
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.044
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ind60 =~
## x1 1.000 0.670 0.920
## x2 2.180 0.139 15.742 0.000 1.460 0.973
## x3 1.819 0.152 11.967 0.000 1.218 0.872
## dem60 =~
## y1 1.000 2.223 0.850
## y2 1.257 0.182 6.889 0.000 2.794 0.717
## y3 1.058 0.151 6.987 0.000 2.351 0.722
## y4 1.265 0.145 8.722 0.000 2.812 0.846
## dem65 =~
## y5 1.000 2.103 0.808
## y6 1.186 0.169 7.024 0.000 2.493 0.746
## y7 1.280 0.160 8.002 0.000 2.691 0.824
## y8 1.266 0.158 8.007 0.000 2.662 0.828
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dem60 ~
## ind60 1.483 0.399 3.715 0.000 0.447 0.447
## dem65 ~
## ind60 0.572 0.221 2.586 0.010 0.182 0.182
## dem60 0.837 0.098 8.514 0.000 0.885 0.885
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .y1 ~~
## .y5 0.624 0.358 1.741 0.082 0.624 0.296
## .y2 ~~
## .y4 1.313 0.702 1.871 0.061 1.313 0.273
## .y6 2.153 0.734 2.934 0.003 2.153 0.356
## .y3 ~~
## .y7 0.795 0.608 1.308 0.191 0.795 0.191
## .y4 ~~
## .y8 0.348 0.442 0.787 0.431 0.348 0.109
## .y6 ~~
## .y8 1.356 0.568 2.386 0.017 1.356 0.338
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.082 0.019 4.184 0.000 0.082 0.154
## .x2 0.120 0.070 1.718 0.086 0.120 0.053
## .x3 0.467 0.090 5.177 0.000 0.467 0.239
## .y1 1.891 0.444 4.256 0.000 1.891 0.277
## .y2 7.373 1.374 5.366 0.000 7.373 0.486
## .y3 5.067 0.952 5.324 0.000 5.067 0.478
## .y4 3.148 0.739 4.261 0.000 3.148 0.285
## .y5 2.351 0.480 4.895 0.000 2.351 0.347
## .y6 4.954 0.914 5.419 0.000 4.954 0.443
## .y7 3.431 0.713 4.814 0.000 3.431 0.322
## .y8 3.254 0.695 4.685 0.000 3.254 0.315
## ind60 0.448 0.087 5.173 0.000 1.000 1.000
## .dem60 3.956 0.921 4.295 0.000 0.800 0.800
## .dem65 0.172 0.215 0.803 0.422 0.039 0.039
semPaths(fit, what = "path", whatLabels = "std", style = "lisrel", edge.label.cex = .9, rotation = 2, curve = 2, layoutSplit = FALSE, normalize = FALSE, height = 9, width = 6.5, residScale = 10)
The results suggest that the model has excellent fit with a non significant chi square, a CFI of .995, and a RMSEA value .035. All factors (Ind60 x1 = .92, x2 = .97, x3 = .87; Dem60 y1 = .85, y2 = .71, y3 = .72, y4 = .85; Dem65 y5 = .81, y6 = .75, y7 = .82, y = .83) and paths (Dem60 ~ ind60 = .45; Dem65 ~ ind60(.18) + dem60(.89)) were significant and there was no evidence of colinearity in the model. Pruning would not be advisable as the model demonstrates outstanding parsimony. How do you interpret these findings on a practical level?
6.1.1.3 Mediation
We can create mediation models where intermediate variables are considered and called the mediator. We use these variables to attempt to explain the impact of an independent variable on an outcome. For instance, it is logical to deduce that past democratization of a country (d60) would predict future democratization (d65). However, it is questionable whether the industrialization of a country predicts a move toward democratization over time.
Let’s set up a mediation model to answer the research question: “To what extent does a country’s level of industrialization impact the development of political democracy over time?” Retain the same latent variables from the previous model. To create a mediation model, use the following formula:
direct effect Y ~ cX # mediator M ~ aX Y ~ bM # indirect effect (ab) ab := ab # total effect total := c + (ab)
The :=
operator “defines” new parameters which take on values that are
an arbitrary function of the original model parameters”
(https://lavaan.ugent.be/tutorial/mediation.html).
<- '
model # measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# mediation
dem65 ~ c*dem60
ind60 ~ a*dem60
dem65 ~ b*ind60
# indirect effect (a*b)
ab := a*b
# total effect
total := c + (a*b)
'
<- sem(model, data=PoliticalDemocracy)
fit summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 37 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 25
##
## Number of observations 75
##
## Model Test User Model:
##
## Test statistic 72.462
## Degrees of freedom 41
## P-value (Chi-square) 0.002
##
## Model Test Baseline Model:
##
## Test statistic 730.654
## Degrees of freedom 55
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.953
## Tucker-Lewis Index (TLI) 0.938
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1564.959
## Loglikelihood unrestricted model (H1) -1528.728
##
## Akaike (AIC) 3179.918
## Bayesian (BIC) 3237.855
## Sample-size adjusted Bayesian (BIC) 3159.062
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.101
## 90 Percent confidence interval - lower 0.061
## 90 Percent confidence interval - upper 0.139
## P-value RMSEA <= 0.05 0.021
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.055
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ind60 =~
## x1 1.000 0.669 0.920
## x2 2.182 0.139 15.714 0.000 1.461 0.973
## x3 1.819 0.152 11.956 0.000 1.218 0.872
## dem60 =~
## y1 1.000 2.201 0.845
## y2 1.354 0.175 7.755 0.000 2.980 0.760
## y3 1.044 0.150 6.961 0.000 2.298 0.705
## y4 1.300 0.138 9.412 0.000 2.860 0.860
## dem65 =~
## y5 1.000 2.084 0.803
## y6 1.258 0.164 7.651 0.000 2.623 0.783
## y7 1.282 0.158 8.137 0.000 2.673 0.819
## y8 1.310 0.154 8.529 0.000 2.730 0.847
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dem65 ~
## dem60 (c) 0.864 0.113 7.671 0.000 0.913 0.913
## ind60 ~
## dem60 (a) 0.136 0.036 3.738 0.000 0.448 0.448
## dem65 ~
## ind60 (b) 0.453 0.220 2.064 0.039 0.146 0.146
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.082 0.020 4.180 0.000 0.082 0.154
## .x2 0.118 0.070 1.689 0.091 0.118 0.053
## .x3 0.467 0.090 5.174 0.000 0.467 0.240
## .y1 1.942 0.395 4.910 0.000 1.942 0.286
## .y2 6.490 1.185 5.479 0.000 6.490 0.422
## .y3 5.340 0.943 5.662 0.000 5.340 0.503
## .y4 2.887 0.610 4.731 0.000 2.887 0.261
## .y5 2.390 0.447 5.351 0.000 2.390 0.355
## .y6 4.343 0.796 5.456 0.000 4.343 0.387
## .y7 3.510 0.668 5.252 0.000 3.510 0.329
## .y8 2.940 0.586 5.019 0.000 2.940 0.283
## .ind60 0.358 0.071 5.020 0.000 0.799 0.799
## dem60 4.845 1.088 4.453 0.000 1.000 1.000
## .dem65 0.115 0.200 0.575 0.565 0.026 0.026
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ab 0.062 0.031 2.012 0.044 0.065 0.065
## total 0.926 0.114 8.097 0.000 0.978 0.978
semPaths(fit, "std", edge.label.cex = 1.0, curvePivot = TRUE)
The chi-square is non significant now. The CFI is .95, and RMSEA is .101 and statistically significant. Factor structures remain intact and all paths are significant. The total effect of ind60 and dem60 explains 84% (beta weights are equivalent to r (then squared for variance explained) of the variance in dem65, however less than 1% of that variance is attributable to industrialization (p = .044). Depending on the measurement validity of this dataset, industrialization should be ruled out as a strong contributor to any change in political democracy. An additional model might be specified with other explanatory variables such as ?
6.1.1.4 Multiple Groups
If your sample size is large enough you can use lavaan
to conduct
multiple group analysis. To do this add the name of the grouping
variable to group argument. The same model will be fitted to each group.
<- ' visual =~ x1 + x3
HS.model textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~ textual'
<- cfa(HS.model, data = HolzingerSwineford1939, group = "school") fit
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 53 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 52
##
## Number of observations per group:
## Pasteur 156
## Grant-White 145
##
## Model Test User Model:
##
## Test statistic 107.318
## Degrees of freedom 36
## P-value (Chi-square) 0.000
## Test statistic for each group:
## Pasteur 48.124
## Grant-White 59.194
##
## Model Test Baseline Model:
##
## Test statistic 893.935
## Degrees of freedom 56
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.915
## Tucker-Lewis Index (TLI) 0.868
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3236.131
## Loglikelihood unrestricted model (H1) -3182.472
##
## Akaike (AIC) 6576.262
## Bayesian (BIC) 6769.032
## Sample-size adjusted Bayesian (BIC) 6604.118
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.115
## 90 Percent confidence interval - lower 0.090
## 90 Percent confidence interval - upper 0.140
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.079
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##
## Group 1 [Pasteur]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x1 1.000 1.411 1.195
## x3 0.311 0.166 1.870 0.061 0.439 0.379
## textual =~
## x4 1.000 0.949 0.826
## x5 1.177 0.101 11.668 0.000 1.117 0.855
## x6 0.870 0.076 11.455 0.000 0.825 0.836
## speed =~
## x7 1.000 0.622 0.575
## x8 1.080 0.269 4.014 0.000 0.672 0.689
## x9 0.821 0.202 4.057 0.000 0.511 0.517
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual ~
## textual 0.566 0.101 5.591 0.000 0.380 0.380
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## textual ~~
## speed 0.195 0.073 2.684 0.007 0.330 0.330
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 4.941 0.095 52.249 0.000 4.941 4.183
## .x3 2.487 0.093 26.778 0.000 2.487 2.144
## .x4 2.823 0.092 30.689 0.000 2.823 2.457
## .x5 3.995 0.105 38.183 0.000 3.995 3.057
## .x6 1.922 0.079 24.320 0.000 1.922 1.947
## .x7 4.432 0.087 51.181 0.000 4.432 4.098
## .x8 5.563 0.078 71.214 0.000 5.563 5.702
## .x9 5.418 0.079 68.440 0.000 5.418 5.480
## .visual 0.000 0.000 0.000
## textual 0.000 0.000 0.000
## speed 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 -0.597 0.974 -0.613 0.540 -0.597 -0.428
## .x3 1.153 0.161 7.162 0.000 1.153 0.857
## .x4 0.419 0.069 6.096 0.000 0.419 0.317
## .x5 0.460 0.086 5.367 0.000 0.460 0.269
## .x6 0.293 0.050 5.862 0.000 0.293 0.301
## .x7 0.783 0.128 6.113 0.000 0.783 0.669
## .x8 0.501 0.119 4.199 0.000 0.501 0.526
## .x9 0.716 0.103 6.922 0.000 0.716 0.733
## .visual 1.704 0.994 1.715 0.086 0.855 0.855
## textual 0.901 0.150 5.998 0.000 1.000 1.000
## speed 0.387 0.135 2.867 0.004 1.000 1.000
##
##
## Group 2 [Grant-White]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual =~
## x1 1.000 0.777 0.676
## x3 0.885 0.210 4.217 0.000 0.687 0.663
## textual =~
## x4 1.000 0.965 0.861
## x5 0.998 0.087 11.436 0.000 0.964 0.832
## x6 0.967 0.085 11.338 0.000 0.934 0.825
## speed =~
## x7 1.000 0.727 0.705
## x8 1.155 0.175 6.594 0.000 0.839 0.802
## x9 0.927 0.145 6.399 0.000 0.673 0.657
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## visual ~
## textual 0.471 0.101 4.677 0.000 0.585 0.585
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## textual ~~
## speed 0.243 0.078 3.132 0.002 0.347 0.347
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 4.930 0.095 51.696 0.000 4.930 4.293
## .x3 1.996 0.086 23.195 0.000 1.996 1.926
## .x4 3.317 0.093 35.625 0.000 3.317 2.958
## .x5 4.712 0.096 48.986 0.000 4.712 4.068
## .x6 2.469 0.094 26.277 0.000 2.469 2.182
## .x7 3.921 0.086 45.819 0.000 3.921 3.805
## .x8 5.488 0.087 63.174 0.000 5.488 5.246
## .x9 5.327 0.085 62.571 0.000 5.327 5.196
## .visual 0.000 0.000 0.000
## textual 0.000 0.000 0.000
## speed 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.716 0.159 4.514 0.000 0.716 0.543
## .x3 0.601 0.127 4.741 0.000 0.601 0.560
## .x4 0.325 0.065 5.036 0.000 0.325 0.259
## .x5 0.413 0.072 5.770 0.000 0.413 0.308
## .x6 0.408 0.069 5.924 0.000 0.408 0.319
## .x7 0.534 0.092 5.811 0.000 0.534 0.503
## .x8 0.390 0.099 3.932 0.000 0.390 0.357
## .x9 0.598 0.091 6.537 0.000 0.598 0.569
## .visual 0.396 0.140 2.839 0.005 0.657 0.657
## textual 0.932 0.152 6.138 0.000 1.000 1.000
## speed 0.528 0.127 4.156 0.000 1.000 1.000
semPaths(fit, "std", edge.label.cex = 1.0, rotation = 2, curvePivot = TRUE)
The x2 variable at the Pasteur school did not load onto the visual achievement factor. We can retool our questionnaire or consider the study validity to fix this, but for now let’s just remove it from the model again. Once we do that we can see that the x1 and x3 items are loading very differently onto the visual factor, though both items meet the threshold of .40. What would you do if this was your study?
6.1.1.5 Measurement Invariance
You can also test for measurement invariance across the grouping
variable using the measurmentInvariance()
function which “performs a
number of multiple group analyses in a particular sequence, with
increasingly more restrictions on the
parameters”(https://lavaan.ugent.be/tutorial/groups.html). Load the
semTools
library to use this function.
There are four types of measurment invariance.
The first is equal form configural invariance, which assumes the factor structures (number of items) and patterns (relationships between factors) are identical across groups. The second is equal loadings (metric invariance), which assumes that factor loadings (extent which each item loads onto factors) are equal across groups. The third is equal intercepts (scalar invariance), which assumes that the intercepts of observed scores are equal across groups. Finally, equal residual variances (strict invariance) assumes that the residual variances of observed paths are equal across groups.
#install.packages('semTools')
library(semTools)
measurementInvariance(model = HS.model,
data = HolzingerSwineford1939,
group = "school")
##
## Measurement invariance models:
##
## Model 1 : fit.configural
## Model 2 : fit.loadings
## Model 3 : fit.intercepts
## Model 4 : fit.means
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit.configural 36 6576.3 6769.0 107.32
## fit.loadings 41 6576.3 6750.5 117.35 10.030 5 0.07439 .
## fit.intercepts 46 6600.7 6756.4 151.78 34.431 5 1.954e-06 ***
## fit.means 49 6638.2 6782.7 195.21 43.431 3 1.993e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Fit measures:
##
## cfi rmsea cfi.delta rmsea.delta
## fit.configural 0.915 0.115 NA NA
## fit.loadings 0.909 0.111 0.006 0.003
## fit.intercepts 0.874 0.124 0.035 0.012
## fit.means 0.826 0.141 0.048 0.017
This model assesses the four measurement invariance types across the grouping variable ‘school’ in the HS dataset. No model met the acceptable threshold for RMSEA. Judging by CFI alone, which is not always advisable, form, loadings, and intercept models demonstrate good fit. Thus, each school represented in the HS dataset can be assumed to be equal in these respects, but fail to demonstrate strict invariance. These statistics might be used to validate this particular measure as congruent across different groups of students.
6.2 Case Study
In our case study we will be modeling data from Holzer et al.’s (2021) published data on self regulated learning and well being. The abstract and source are pasted below.
“In the wake of COVID-19, university students have experienced fundamental changes of their learning and their lives as a whole. The present research identifies psychological characteristics associated with students’ well-being in this situation. We investigated relations of basic psychological need satisfaction (experienced competence, autonomy, and relatedness) with positive emotion and intrinsic learning motivation, considering self-regulated learning as a moderator. Self-reports were collected from 6,071 students in Austria (Study 1) and 1,653 students in Finland (Study 2). Structural equation modeling revealed competence as the strongest predictor for positive emotion. Intrinsic learning motivation was predicted by competence and autonomy in both countries and by relatedness in Finland. Moderation effects of self-regulated learning were inconsistent, but main effects on intrinsic learning motivation were identified. Surprisingly, relatedness only exerted a minor effect on positive emotion. The results inform strategies to promote students’ well-being through distance-learning, mitigating negative effects of the situation.”
The authors saved the data in an SPSS file (.sav). Let’s use the haven library to read and convert the data file into a dataset.
#install.packages('haven')
library(haven)
= file.path("semcasestudy.sav")
path = read_sav(path) dataset
Examine the dataset using str(), summary() or DataExplorer::create_report()
dataset
## # A tibble: 7,724 × 39
## country gender age sr1 sr2 sr3 comp1 comp2 comp3 auto1
## <dbl+l> <dbl+l> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
## 1 0 [Aus… 1 [fem… 36 5 [str… 4 [dis… 2 [agr… 2 [agr… 3 [som… 3 [som… 3 [som…
## 2 0 [Aus… 2 [mal… 36 4 [dis… 4 [dis… 1 [str… 4 [dis… 4 [dis… 3 [som… 4 [dis…
## 3 0 [Aus… 2 [mal… 36 1 [str… 1 [str… 1 [str… 3 [som… 4 [dis… 2 [agr… 5 [str…
## 4 0 [Aus… 2 [mal… 36 1 [str… 2 [agr… 1 [str… 2 [agr… 2 [agr… 2 [agr… 4 [dis…
## 5 0 [Aus… 1 [fem… 36 1 [str… 1 [str… 3 [som… 3 [som… 2 [agr… 3 [som… 4 [dis…
## 6 0 [Aus… 1 [fem… 36 1 [str… 1 [str… 1 [str… 2 [agr… 2 [agr… 2 [agr… 3 [som…
## 7 0 [Aus… 1 [fem… 36 2 [agr… 2 [agr… 2 [agr… 4 [dis… 3 [som… 3 [som… 4 [dis…
## 8 0 [Aus… 1 [fem… 36 2 [agr… 1 [str… 1 [str… 2 [agr… 1 [str… 2 [agr… 3 [som…
## 9 0 [Aus… 3 [oth… 36 5 [str… 4 [dis… 2 [agr… 3 [som… 1 [str… 3 [som… 4 [dis…
## 10 0 [Aus… 1 [fem… 36 4 [dis… 3 [som… 2 [agr… 3 [som… 2 [agr… 3 [som… 3 [som…
## # … with 7,714 more rows, and 29 more variables: auto2 <dbl+lbl>,
## # auto3 <dbl+lbl>, pa1 <dbl+lbl>, pa2 <dbl+lbl>, pa3 <dbl+lbl>,
## # lm1 <dbl+lbl>, lm2 <dbl+lbl>, lm3 <dbl+lbl>, gp1 <dbl+lbl>, gp2 <dbl+lbl>,
## # gp3 <dbl+lbl>, sr1.rec <dbl+lbl>, sr2.rec <dbl+lbl>, sr3.rec <dbl+lbl>,
## # comp1.rec <dbl+lbl>, comp2.rec <dbl+lbl>, comp3.rec <dbl+lbl>,
## # auto1.rec <dbl+lbl>, auto2.rec <dbl+lbl>, auto3.rec <dbl+lbl>,
## # pa1.rec <dbl+lbl>, pa2.rec <dbl+lbl>, pa3.rec <dbl+lbl>, …
Subjective Well-Being (SWB) describes “how people experience the quality of their lives, which includes both emotional reactions and cognitive judgements”, encompassing moods, emotions, and satisfaction in general and specific areas of life (Martela & Ryan, 2015). Situations outside of our control, such as natural disasters or pandemics, have a significant impact on SWB. For instance, the COVID-19 outbreak has been associated with a 74% dip in SWB in China, a phenomenon also felt across Europe and the United States.
Self regulation style is represented by a motivational spectrum, developed by Deci & Ryan (2000) where students exhibit behavior regulation styles as they perceive motivational influences from either external or internal sources. This spectrum ranges from extrinsic to intrinsic. For instance, financial gain may serve as an extrinsic regulator, as in the salary a person earns motivates them to come to work every day, while the pressure of a duty or a job is an example of introjected regulation, where a person feels they must complete an action due to external pressure, often from other people or the environment. Identified regulation occurs when a person’s identity intersects with an expected set of tasks for success, for instance, I am a musician, so I have to practice. On the far end of the spectrum, intrinsic regulation is self-sustaining, and catalyzed by an identified interest and curiosity for a task. Hundreds of studies across multiple continents suggest that the satisfaction of basic psychological needs can encourage students to move along the spectrum towards intrinsic regulation, where growth, learning, and skill acquisition become self-regulated processes. To practice these skills, follow the directions below.
Considering the theory behind these data, start drafting some research questions. I tend to find that when constructing initial models, it is imperative to draw my thinking on paper and construct model syntax.
Use the code box below to specify your own model based on Holzer et al.’s (2021) dataset. You may need to create composite indexes using multiple subscales of observed variables before specifying your model. If you aren’t sure where to start, go back to the abstract at the start of the case study for evidence of Holzer et al.’s model specifications. Also see the codebook titled ‘HolzerReadme’ at https://osf.io/6jb9t/ within this short course’s folder. Hint: Most items were collected with strongly agree coded as 1, with strongly disagree coded as 5. All items have been recoded to match magnitude with numerical order. These items are labeled as item.rec.
Run the model and interpret the results.
Visualize the model.
Do you need to specify a new model? Do any paths need to be pruned? Write new formulas below. Then run, visualize and interpret the model.
Does your model hold across Austrian and Finnish populations? Group your model by country to see the output of each model, then run a test of measurement invariance on your model.
Make a few notes here regarding potential next steps for research. Consider how the COVID-19 pandemic might impact the measured relationships.
Specify and run a mediation model including variables in the Holzer et al. (2021) dataset.
6.3 Review
In this chapter we introduced SEM concepts and methods. To make sure you understand this material, there is a practice assessment to go along with this chapter at https://jayholster.shinyapps.io/SEMinRAssessment/
6.4 References
Deci, E. L., & Ryan, R. M. (2000) The “What and”Why” of goal pursits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227–268.
Epskamp, S. (2015). semPlot: Unified visualizations of structural equation models. Structural Equation Modeling, 22(3), 474–483. <https://doi.org/10.1080/10705511.2014.937847>
Holzer, J., Lüftenegger, M., Korlat, S., Pelikan, E., Salmela-Aro, K., Spiel, C., & Schober, B. (2021). Higher dducation in times of COVID-19: University students’ basic need satisfaction, self-regulated learning and well-being. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. https://doi.org/10.3886/E133601V1
Martela, F, & Ryan, R. (2015). The benefits of benevolence: Basic psychological needs, beneficence, and the enhancement of well-being. Journal of Personality, 84(6), 750–764. https://doi.org/10.1111/jopy.12215
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36. [https://doi.org/[10.18637/jss.v048.i02](https://doi.org/[10.18637/jss.v048.i02)](https://doi.org/%5B10.18637/jss.v048.i02){.uri}.
6.4.1 R Short Course Series
Video lectures of each guidebook chapter can be found at https://osf.io/6jb9t/. For this chapter, find the follow the folder path Structural Equation Modeling in R -> AY 2021-2022 Spring and access the video files, r markdown documents, and other materials for each short course.
6.4.2 Acknowledgements
This guidebook was created with support from the Center for Research Data and Digital Scholarship and the Laboratory for Interdisciplinary Statistical Analaysis at the University of Colorado Boulder, as well as the U.S. Agency for International Development under cooperative agreement #7200AA18CA00022. Individuals who contributed to materials related to this project include Jacob Holster, Eric Vance, Michael Ramsey, Nicholas Varberg, and Nickoal Eichmann-Kalwara.