2 The key concept behind SEM: model-implied covariance matrix
In the following sections of this chapter, we illustrate the key concept of SEM by comparing it with conventional linear regression in a typical multiple regression scenario.
2.1 No mean structure
In SEM, we study the relationships among variables by analyzing their covariances/correlations. The covariance between two given variables remains unchanged if we shift their means, so we assume the means of all variables are zero; the resulting structure is called a no mean structure. If the data do not already have zero means, we can achieve this by centering or standardizing them. This assumption is appropriate in most cases (see Chapter 5 for more information). Analysis of the covariance structure without considering means is often called covariance analysis.
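The claim that shifting the means leaves the covariance untouched is easy to verify numerically. A minimal sketch with NumPy (the variables and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # mean far from zero
y = 0.5 * x + rng.normal(size=1000)

cov_raw = np.cov(x, y)                 # covariance matrix of the raw data
x_c, y_c = x - x.mean(), y - y.mean()  # mean-center both variables
cov_centered = np.cov(x_c, y_c)        # covariance matrix after centering

# The two covariance matrices agree (up to floating-point error):
print(np.allclose(cov_raw, cov_centered))  # True
```

This is why we lose nothing by working under the no mean structure assumption: centering changes the means but not the covariance structure.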
2.2
If we are interested in the relationship between two variables, we normally quantify it with the covariance or the correlation (the Pearson product-moment correlation coefficient). In doing so, we assume these two variables are random variables (more specifically, that they are bivariate normally distributed).
However, in multiple regression, we have one dependent variable and a set of predictors; only the dependent variable is treated as a random variable, while the predictors are treated as fixed.
But in SEM, we treat all variables, both predictors and outcomes, as random variables.
When using sample data to estimate the specified model, we are actually trying to find the parameter estimates that jointly satisfy the following 6 equations:
In comparison with conventional regression with fixed predictors, SEM estimates the parameters by matching the model-implied covariance matrix to the sample covariance matrix.
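With three observed variables, the covariance matrix has 3(3+1)/2 = 6 unique elements, which is where six equations come from. The following sketch writes out the model-implied covariances for a regression y = b1*x1 + b2*x2 + e (all parameter values here are illustrative assumptions, not taken from the text), and checks them against the matrix form T Phi T':

```python
import numpy as np

# Assumed illustrative parameter values for y = b1*x1 + b2*x2 + e
b1, b2 = 0.5, 0.3
var_x1, var_x2, cov_x1x2, var_e = 1.0, 2.0, 0.4, 0.8

# Model-implied covariances written out element by element
# (three of the six equations the parameters must satisfy jointly):
var_y   = b1**2 * var_x1 + b2**2 * var_x2 + 2 * b1 * b2 * cov_x1x2 + var_e
cov_yx1 = b1 * var_x1 + b2 * cov_x1x2
cov_yx2 = b1 * cov_x1x2 + b2 * var_x2

# The same quantities obtained in one step as T @ Phi @ T.T,
# where (x1, x2, y) = T @ (x1, x2, e) and Phi = Cov((x1, x2, e)):
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [b1,  b2,  1.0]])
Phi = np.array([[var_x1,   cov_x1x2, 0.0],
                [cov_x1x2, var_x2,   0.0],
                [0.0,      0.0,      var_e]])
Sigma = T @ Phi @ T.T   # full model-implied covariance matrix

# The last row of Sigma is [Cov(y,x1), Cov(y,x2), Var(y)]:
print(np.allclose(Sigma[2], [cov_yx1, cov_yx2, var_y]))  # True
```

The remaining three equations are simply the variances and covariance of the predictors themselves, which appear unchanged in the upper-left block of Sigma.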
2.3 Model all random variables jointly using multivariate normal distribution
In multiple regression with fixed predictors, only the dependent variable is treated as random, so a univariate normal distribution suffices to model it.
In SEM, when treating all variables (both IVs and DVs) as random variables, we cannot use a univariate normal distribution to model each random variable separately, because doing so effectively treats them as independent of each other, which runs contrary to the main goal of SEM. Instead, we use a multivariate normal distribution to model all random variables jointly while taking their relationships into consideration.
Let’s denote all variables jointly as a random vector z. Under the no mean structure assumption, z follows a multivariate normal distribution with mean vector 0 and covariance matrix Σ.
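A quick numerical illustration of joint modeling: draws from a multivariate normal with a zero mean vector recover the specified covariance matrix. The 3×3 matrix below is an assumed example, not one from the text:

```python
import numpy as np

# An assumed joint covariance matrix for three variables; the mean vector
# is all zeros under the no mean structure assumption.
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 2.0, 0.4],
                  [0.3, 0.4, 1.5]])

rng = np.random.default_rng(42)
z = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)

# The sample covariance of the joint draws recovers Sigma (approximately),
# including the off-diagonal relationships a univariate model would discard:
S = np.cov(z, rowvar=False)
print(np.max(np.abs(S - Sigma)) < 0.05)  # True
```

Modeling each variable with its own univariate normal would reproduce the diagonal of Sigma but would force the off-diagonal covariances to zero, which is exactly the information SEM is built to analyze.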
2.4 Model identifiability
2.4.1 Number of pieces of information
For a univariate normal distribution, one only needs two pieces of information to determine the distribution precisely: its mean and its variance. Under the no mean structure assumption, the mean is fixed at zero, so the variance is the single piece of information.
If we have two normally distributed random variables, each has its own variance, so we have two unique pieces of information. But these two variables can also be correlated, quantified by their covariance or correlation, so we need one more piece of information to delineate the joint distribution, three in total.
In general, if we have p variables, we have p variances and p(p − 1)/2 covariances, giving p(p + 1)/2 unique pieces of information in total.
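The counting rule above can be sketched as a one-line function (the function name is mine, chosen for illustration):

```python
def n_unique_info(p):
    """Unique pieces of information in a p-by-p covariance matrix:
    p variances plus p*(p-1)/2 covariances, i.e. p*(p+1)/2 in total."""
    return p * (p + 1) // 2

# 1 variable -> 1 (its variance); 2 -> 3; 3 -> 6; 4 -> 10
print([n_unique_info(p) for p in (1, 2, 3, 4)])  # [1, 3, 6, 10]
```

These counts match the cases worked out above: one variance for a single variable, and three pieces of information for two variables.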
2.4.2 Number of parameters
The number of parameters is the number of unknown quantities in the model that we need to estimate from data. The number of parameters should not exceed the number of unique pieces of information; otherwise the model is unsolvable (see the following section for more detail).
2.4.3 Degrees of freedom
- Just-identified model/unrestricted model/saturated model
The degrees of freedom equal the number of unique pieces of information left after estimating the unknown parameters. In a multiple regression model, the number of parameters exactly equals the number of unique pieces of information, so the degrees of freedom are zero; such a model is just-identified.
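The bookkeeping behind the just-identified claim can be sketched for a regression with k predictors treated as random: the parameters are k slopes, 1 residual variance, k predictor variances, and k(k − 1)/2 predictor covariances, while the information count is (k + 1)(k + 2)/2. The function below (names are illustrative) shows the two tallies cancel for every k:

```python
def df_multiple_regression(k):
    """Degrees of freedom when a multiple regression with k predictors
    is cast as an SEM with all variables treated as random."""
    p = k + 1                                  # k predictors + 1 outcome
    n_info = p * (p + 1) // 2                  # unique covariance elements
    n_params = (k                              # regression slopes
                + 1                            # residual variance
                + k                            # predictor variances
                + k * (k - 1) // 2)            # predictor covariances
    return n_info - n_params

# A multiple regression model is just-identified for any number of predictors:
print([df_multiple_regression(k) for k in (1, 2, 3, 10)])  # [0, 0, 0, 0]
```

With k = 2, for example, there are 6 pieces of information and 6 parameters, which is exactly the six-equation system discussed earlier in this chapter.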
Note that the model-data fit of a just-identified model cannot be evaluated using the likelihood ratio test (LRT) or any LRT-based method, because it is the full model, i.e., the best model we can fit to our data, and will always demonstrate perfect model-data fit (see the likelihood ratio test chapter for more details).
- Over-identified model/restricted model
If the number of parameters is smaller than the number of unique pieces of information, the degrees of freedom are positive and the model is over-identified.
- Unidentifiable model
If the number of parameters exceeds the number of unique pieces of information, the degrees of freedom are negative and the model is unidentifiable: there is no unique solution for the parameter estimates.
Note that, although the reported
For example, the number of information of ex3.11 (see the figure below) is

The corresponding element of
Given the fact that there are
2.5 Homework
Derive the covariance structure of the following model, assuming