2  The key concept behind SEM: the model-implied covariance matrix $\Sigma(\theta)$

In the following sections of this chapter, we illustrate the key concept of SEM by comparing it with conventional linear regression in a typical multiple regression scenario.

2.1 No mean structure

In SEM analysis, we study the relationships among variables by analyzing covariances/correlations. However, the covariance between two given variables remains unchanged if we modify their means, because $\mathrm{Cov}(X + a, Y + b) = \mathrm{Cov}(X, Y)$ for any constants $a$ and $b$. We therefore assume the means of all variables are zero; the resulting structure of interest is called the no-mean structure. If this is not the case, one can achieve zero means by centering or standardizing the data. This holds in most cases. Analysis of the covariance structure without considering means is often called analysis of covariance structures.
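A one-line numerical check of this invariance, a sketch using R's built-in `trees` data as an arbitrary example:

```r
# Shifting means (centering) leaves the covariance matrix unchanged:
# Cov(X + a, Y + b) = Cov(X, Y) for any constants a and b.
all.equal(cov(trees), cov(scale(trees, scale = FALSE)),
          check.attributes = FALSE)
#> [1] TRUE
```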

2.2 $\Sigma(\theta)$

If we are interested in the relationship between two variables, we normally use covariance/correlation (the Pearson product-moment correlation coefficient) to quantify it. In doing so, we assume these two variables are random variables (more specifically, bivariate normally distributed).

However, in multiple regression we have one dependent variable and a set of predictors, $y_i = \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \epsilon_i$, where $x_1, \ldots, x_p$ all take fixed values; that is, we treat them as variables, but not as random variables. This is the so-called multiple regression with fixed $x$. The only random variables in multiple regression with fixed $x$ are $y$ and $\epsilon$, where the randomness of $y$ is arguably inherited from $\epsilon$. Therefore, theoretically, we are not allowed to quantify the relationships among the $x$s and $y$ using either covariance or correlation.

But in SEM, we treat all $x$s and $y$s (yes, we can have more than one $y$ in SEM) as random variables. If $x_1, \ldots, x_p$ are random variables, the above model becomes multiple regression with random $x$ (unless stated otherwise, multiple regression in this book refers to multiple regression with random $x$s), and now we can discuss the relationships among all variables using their covariances. The resulting covariance matrix is

$$\mathrm{Cov}(x_1, \ldots, x_p, y) = \begin{bmatrix} \sigma^2_{x_1} & \cdots & \sigma_{x_1 y} \\ \vdots & \ddots & \vdots \\ \sigma_{y x_1} & \cdots & \sigma^2_{y} \end{bmatrix}.$$

Because we are expressing the relationships among all variables in terms of a multiple regression model, we can use the specified model to re-express the covariance matrix above and derive a model-implied covariance matrix $\Sigma(\theta)$, where $\theta$ is a vector hosting all unknown parameters of the specified model. For example, for a multiple regression with 2 predictors $x_1$ and $x_2$, $\Sigma(\theta)$ is

$$\begin{bmatrix} \sigma^2_{x_1} & \sigma_{x_1 x_2} & \sigma_{x_1 y} \\ \sigma_{x_2 x_1} & \sigma^2_{x_2} & \sigma_{x_2 y} \\ \sigma_{y x_1} & \sigma_{y x_2} & \sigma^2_{y} \end{bmatrix},$$

where

$$\begin{aligned}
\sigma_{x_1 y} &= \mathrm{Cov}(x_1,\ \beta_1 x_1 + \beta_2 x_2 + \epsilon) = \beta_1 \sigma^2_{x_1} + \beta_2 \sigma_{x_1 x_2}, \\
\sigma_{x_2 y} &= \mathrm{Cov}(x_2,\ \beta_1 x_1 + \beta_2 x_2 + \epsilon) = \beta_1 \sigma_{x_1 x_2} + \beta_2 \sigma^2_{x_2}, \\
\sigma^2_{y} &= \mathrm{Cov}(\beta_1 x_1 + \beta_2 x_2 + \epsilon,\ \beta_1 x_1 + \beta_2 x_2 + \epsilon) = \beta_1^2 \sigma^2_{x_1} + \beta_2^2 \sigma^2_{x_2} + 2 \beta_1 \beta_2 \sigma_{x_1 x_2} + \sigma^2_{\epsilon}.
\end{aligned}$$

What we have now is the model-implied covariance structure of $y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$. If we assume $x_1$ is correlated with $x_2$, we allow $\sigma_{x_1 x_2}$ to be estimated freely; otherwise, we can impose a restriction and fix it at 0.
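These identities are easy to verify numerically. Below is a minimal sketch in R; the population values $\beta_1 = 0.4$, $\beta_2 = 0.3$, and the dependence of $x_2$ on $x_1$ are arbitrary assumptions for illustration:

```r
set.seed(1)
n  <- 1e5
b1 <- 0.4; b2 <- 0.3                 # assumed population slopes
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)            # makes x1 and x2 correlated
y  <- b1 * x1 + b2 * x2 + rnorm(n)   # no intercept: no mean structure

# Sample covariance vs. the model-implied expression for sigma_{x1 y}
cov(x1, y)                           # ~ 0.55
b1 * var(x1) + b2 * cov(x1, x2)      # beta1 * var(x1) + beta2 * cov(x1, x2)
```

The same check works for $\sigma_{x_2 y}$ and $\sigma^2_y$.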

When using sample data to estimate the specified model, we are actually trying to find the parameter estimates that jointly satisfy the following 6 equations:

$$\begin{cases}
s^2_{x_1} = \hat\sigma^2_{x_1} \\
s_{x_2 x_1} = \hat\sigma_{x_2 x_1} \\
s^2_{x_2} = \hat\sigma^2_{x_2} \\
s_{y x_1} = \hat\beta_1 \hat\sigma^2_{x_1} + \hat\beta_2 \hat\sigma_{x_1 x_2} \\
s_{y x_2} = \hat\beta_1 \hat\sigma_{x_1 x_2} + \hat\beta_2 \hat\sigma^2_{x_2} \\
s^2_{y} = \hat\beta_1^2 \hat\sigma^2_{x_1} + \hat\beta_2^2 \hat\sigma^2_{x_2} + 2 \hat\beta_1 \hat\beta_2 \hat\sigma_{x_1 x_2} + \hat\sigma^2_{\epsilon}
\end{cases}$$
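In lavaan, a minimal sketch of fitting this model with all six parameters freely estimated; `fixed.x = FALSE` tells lavaan to treat $x_1$ and $x_2$ as random variables (reusing the simulated data from above):

```r
library(lavaan)

dat <- data.frame(x1, x2, y)
model <- '
  y  ~ x1 + x2   # beta1 and beta2 (residual variance of y added automatically)
  x1 ~~ x2       # freely estimated covariance between the predictors
'
fit <- sem(model, data = dat, fixed.x = FALSE)
summary(fit)     # 6 sample moments, 6 free parameters: df = 0
```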

In comparison, with conventional regression with fixed $x$, when modeling $x_1$, $x_2$, and $y$ with sample size $n$, we are actually trying to fit a multiple regression with fixed $x$s that satisfies $n$ equations simultaneously:

$$\begin{cases}
y_1 = \hat\beta_0 + \hat\beta_1 x_{11} + \hat\beta_2 x_{12} \\
y_2 = \hat\beta_0 + \hat\beta_1 x_{21} + \hat\beta_2 x_{22} \\
y_3 = \hat\beta_0 + \hat\beta_1 x_{31} + \hat\beta_2 x_{32} \\
\quad\vdots \\
y_n = \hat\beta_0 + \hat\beta_1 x_{n1} + \hat\beta_2 x_{n2}
\end{cases}$$

However, when $n$ exceeds the number of coefficients, this equation system generally has no exact solution, so we minimize the sum of squared residuals instead,

$$\min_{\hat\beta_0,\, \hat\beta_1,\, \hat\beta_2} \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} \right)^2,$$

which yields the ordinary least squares (OLS) estimator for multiple linear regression with fixed $x$s.
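The random-$x$ and fixed-$x$ routes agree on the slopes: solving the last two joint equations above for $\hat\beta_1$ and $\hat\beta_2$ amounts to $\hat\beta = S_{xx}^{-1} s_{xy}$, which reproduces the OLS estimates. A quick sketch, reusing the simulated data:

```r
S    <- cov(cbind(x1, x2, y))
beta <- solve(S[1:2, 1:2], S[1:2, 3])   # solve S_xx %*% beta = s_xy
beta
coef(lm(y ~ x1 + x2))[-1]               # OLS slopes (intercept dropped), same values
```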

2.3 Model all random variables jointly using the multivariate normal distribution

In multiple regression with fixed $x$, we only assume $\epsilon \sim N(0, \sigma^2)$, indicating that we are using a univariate normal distribution as the underlying distribution when modeling; that is, $y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) = \epsilon_i \sim N(0, \sigma^2)$.

In SEM, where we treat all variables (both IVs and DVs) as random variables, we cannot use a univariate normal distribution to model each random variable separately, because doing so effectively treats them as independent of one another, which runs against the main goal of SEM. Instead, we use a multivariate normal distribution to model all random variables jointly while taking their relationships into consideration.

Let's denote the $p$ independent variables as $x_1, \ldots, x_p$ and the $m$ dependent variables as $y_1, \ldots, y_m$; jointly they form a random vector that follows a multivariate normal distribution,

$$\begin{pmatrix} x_1 \\ \vdots \\ x_p \\ y_1 \\ \vdots \\ y_m \end{pmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Sigma),$$

where

$$\Sigma = \begin{bmatrix}
\sigma^2_{x_1} & \cdots & \sigma_{x_1 x_p} & \sigma_{x_1 y_1} & \cdots & \sigma_{x_1 y_m} \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
\sigma_{x_p x_1} & \cdots & \sigma^2_{x_p} & \sigma_{x_p y_1} & \cdots & \sigma_{x_p y_m} \\
\sigma_{y_1 x_1} & \cdots & \sigma_{y_1 x_p} & \sigma^2_{y_1} & \cdots & \sigma_{y_1 y_m} \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
\sigma_{y_m x_1} & \cdots & \sigma_{y_m x_p} & \sigma_{y_m y_1} & \cdots & \sigma^2_{y_m}
\end{bmatrix}.$$

If we impose a structure, i.e., a statistical model with unknown parameters, upon all random variables to detail the relationships among them, we are effectively stating that we believe the relationships among the variables of interest can be explained by $\Sigma(\theta)$ at the population level, i.e., $\Sigma = \Sigma(\theta)$. Then we have

$$\begin{pmatrix} x_1 \\ \vdots \\ x_p \\ y_1 \\ \vdots \\ y_m \end{pmatrix} \sim \mathrm{MVN}(\mathbf{0}, \Sigma(\theta)).$$
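To make the joint-modeling idea concrete, here is a sketch that builds $\Sigma(\theta)$ for the two-predictor model of Section 2.2 from assumed parameter values and draws data from $\mathrm{MVN}(\mathbf{0}, \Sigma(\theta))$; all numbers are illustrative:

```r
library(MASS)   # for mvrnorm()

b1 <- 0.4; b2 <- 0.3                         # assumed slopes
vx1 <- 1; vx2 <- 1.25; cx12 <- 0.5; ve <- 1  # assumed variances/covariance

# Model-implied covariance matrix Sigma(theta) for (x1, x2, y)
sx1y  <- b1 * vx1  + b2 * cx12
sx2y  <- b1 * cx12 + b2 * vx2
vy    <- b1^2 * vx1 + b2^2 * vx2 + 2 * b1 * b2 * cx12 + ve
Sigma <- matrix(c(vx1,  cx12, sx1y,
                  cx12, vx2,  sx2y,
                  sx1y, sx2y, vy), nrow = 3, byrow = TRUE)

# Draw jointly normal data; the sample covariance approaches Sigma(theta)
dat_mvn <- mvrnorm(n = 5000, mu = rep(0, 3), Sigma = Sigma)
round(cov(dat_mvn) - Sigma, 2)
```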

2.4 Model identifiability

2.4.1 Number of pieces of information

For a univariate normal distribution, one only needs two pieces of information to determine the distribution precisely: $\mu$ and $\sigma^2$. Because we are using the no-mean structure, we only need $\sigma^2$ to pinpoint a normal distribution. Put another way, we need only one piece of information to describe one normally distributed random variable.

If we have two normally distributed random variables, each has its own variance, so we have two pieces of unique information. But these two variables can also be correlated, as quantified by a covariance or correlation, so we need one more piece of information to delineate the joint distribution.

In general, if we have $p$ random variables, we have $p(p+1)/2$ unique variances and covariances, that is, $p(p+1)/2$ pieces of unique information.
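For example, for $p = 3$ observed variables:

```r
# p variances plus choose(p, 2) covariances = p * (p + 1) / 2 pieces of information
p <- 3
p + choose(p, 2)   # 6
p * (p + 1) / 2    # 6, same count
```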

2.4.2 Number of parameters

The number of parameters is the number of unknown quantities in the model that we need to estimate from data. The number of parameters should not exceed the number of pieces of unique information; otherwise we will have an unsolvable model (see more detail in the following section).

2.4.3 Degrees of freedom

  • Just-identified model/unrestricted model/saturated model

The degrees of freedom (df) is the number of pieces of unique information left after estimating the unknown parameters. For the multiple regression model $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$, the number of pieces of unique information in the resulting $\Sigma(\theta)$ is 6 (all unrepeated elements in the lower triangle and the diagonal of $\Sigma(\theta)$), and the number of parameters is also 6 ($\hat\sigma^2_{x_1}$, $\hat\sigma^2_{x_2}$, $\hat\sigma_{x_1 x_2}$, $\hat\beta_1$, $\hat\beta_2$, and $\hat\sigma^2_{\epsilon}$). The degrees of freedom is $3(3+1)/2 - 6 = 0$. A model with $df = 0$ is called a just-identified model, or an unrestricted model, meaning no restriction is imposed on the unknown parameters (see more details in the following text), or a saturated model, meaning it uses all available information (see the lavaan sketch after this list).

Note that the model-data fit of a just-identified model cannot be evaluated using the likelihood ratio test (LRT) or any LRT-based method, because it is the full model, i.e., the best model we can fit to the data, and will always demonstrate perfect model-data fit (see more details in the likelihood ratio test chapter).

  • Over-identified model/restricted model

If $df > 0$, the number of freely estimated parameters is less than the number of unique equations; this is the so-called over-identified model. For example, if we assume $x_2$ has no impact on $y$ and fix $\hat\beta_{x_2 y} = 0$ (i.e., impose a restriction on $\hat\beta_{x_2 y}$), the number of freely estimated parameters decreases by 1, and we end up with 1 more df (see the sketch after this list).

  • Unidentifiable model

If $df < 0$, the model is called under-identified and cannot be solved.
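As referenced above, a lavaan sketch of the just-identified and over-identified cases, reusing the simulated data from Section 2.2; pre-multiplying by `0*` fixes a slope at 0:

```r
# Just-identified: 6 pieces of information, 6 free parameters
fit0 <- sem('y ~ x1 + x2',   data = dat, fixed.x = FALSE)

# Over-identified: beta2 fixed at 0, so only 5 free parameters remain
fit1 <- sem('y ~ x1 + 0*x2', data = dat, fixed.x = FALSE)

fitMeasures(fit0, "df")   # 0
fitMeasures(fit1, "df")   # 1
```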

Note that, although the df reported by Mplus is the same as in other software (e.g., lavaan), the way Mplus calculates df is slightly different. In Mplus, information is counted differently for dependent and independent variables. Because the model does not impose restrictions on the parameters of the independent variables, their means, variances, and covariances can be estimated separately as the sample values. Therefore, the unique elements in the covariance matrix of the independent variables are excluded when counting the pieces of information.

For example, the number of pieces of information for ex3.11 (see the figure below) is $3 \times (3+1)/2 + 3 \times 3 = 15$, where 6 is the number of unique elements in the covariance matrix of the dependent variables $y_1$, $y_2$, and $y_3$, and 9 is the number of covariances between $x_1$, $x_2$, $x_3$ and $y_1$, $y_2$, $y_3$.

The corresponding elements of $\Sigma(\theta)$ (originally marked in red) are those in the last three rows of the lower triangle; the $x$-only block in the first three rows is excluded:

$$\begin{bmatrix}
\sigma^2_{x_1} & & & & & \\
\sigma_{x_2 x_1} & \sigma^2_{x_2} & & & & \\
\sigma_{x_3 x_1} & \sigma_{x_3 x_2} & \sigma^2_{x_3} & & & \\
\sigma_{y_1 x_1} & \sigma_{y_1 x_2} & \sigma_{y_1 x_3} & \sigma^2_{y_1} & & \\
\sigma_{y_2 x_1} & \sigma_{y_2 x_2} & \sigma_{y_2 x_3} & \sigma_{y_2 y_1} & \sigma^2_{y_2} & \\
\sigma_{y_3 x_1} & \sigma_{y_3 x_2} & \sigma_{y_3 x_3} & \sigma_{y_3 y_1} & \sigma_{y_3 y_2} & \sigma^2_{y_3}
\end{bmatrix}.$$

Given that there are 9 slopes and 3 residual variances to be estimated, $df = 15 - 12 = 3$.
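Both counting schemes yield the same df for ex3.11, as a quick arithmetic check shows:

```r
# Mplus-style counting: exclude the 6 unique x-moments
info_mplus <- 3 * (3 + 1) / 2 + 3 * 3   # 6 y-moments + 9 x-y covariances = 15
par_mplus  <- 9 + 3                     # 9 slopes + 3 residual variances = 12
info_mplus - par_mplus                  # df = 3

# Full counting (e.g., lavaan): include the x-moments on both sides
info_full <- 6 * (6 + 1) / 2            # 21 unique moments of all 6 variables
par_full  <- 9 + 3 + 3 * (3 + 1) / 2    # plus 6 free var/cov among x1, x2, x3 = 18
info_full - par_full                    # df = 3
```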

2.5 Homework

Derive the covariance structure of the following model, assuming $x$ is correlated with $w$:

$$y = \beta_x x + \beta_w w + \beta_z z + e$$