An easy and convenient way to represent the relationships among a number of variables is a path diagram, which we have seen many times in past chapters. A path diagram can be viewed as a hypothesized, theory-based model specifying the structure among the variables of interest. We collect data to test whether our sample supports the proposed model. Basically, path analysis is the analysis of these “paths”.
When doing path analysis, we impose a theoretical structure upon our variables and derive \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\). In \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\), each covariance/correlation between an independent variable and a dependent variable can be partitioned into a sum of different parts expressed in the model parameters. This is called effects decomposition. There are four types of possible resulting effects.
Let’s look at an example with four variables:
We specify the relationships among these four variables as
The corresponding regressions are \[\begin{align*} nACH&=\beta_{31}SES+\beta_{32}IQ+\epsilon_{nACH}\\ GPA&=\beta_{41}SES+\beta_{42}IQ+\beta_{43}nACH+\epsilon_{GPA}. \end{align*}\] The resultant \(\boldsymbol{\Sigma(\theta)}\) is \[\begin{align*} \bSigma(\bs{\theta})=\begin{bmatrix} \sigma_{SES}^2 & \sigma_{SES,IQ} & \sigma_{SES,nACH} & \sigma_{SES,GPA} \\ \sigma_{IQ,SES} & \sigma_{IQ}^2 & \sigma_{IQ,nACH} & \sigma_{IQ,GPA}\\ \sigma_{nACH,SES} & \sigma_{nACH,IQ} & \sigma_{nACH}^2 & \sigma_{nACH,GPA}\\ \sigma_{GPA,SES} & \sigma_{GPA,IQ} & \sigma_{GPA,nACH} & \sigma_{GPA}^2 \end{bmatrix}. \end{align*}\]
In the following figures, red represents direct effects, blue represents indirect effects, green represents undecomposed effects, and grey represents spurious covariance/correlation.
\[\begin{align*} \sigma_{SES,nACH}&=\Cov(SES,\beta_{31}SES+\beta_{32}IQ+\epsilon_{nACH})\\ &=\color{red}\beta_{31}\sigma_{SES}^2+\color{green}\beta_{32}\sigma_{SES,IQ}, \end{align*}\]
\[\begin{align*} \sigma_{IQ,nACH}&=\Cov(IQ,\beta_{31}SES+\beta_{32}IQ+\epsilon_{nACH})\\ &=\color{red}\beta_{32}\sigma_{IQ}^2+\color{green}\beta_{31}\sigma_{IQ,SES} \end{align*}\]
\[\begin{align*} \sigma_{SES,GPA}&=\Cov(SES,\beta_{41}SES+\beta_{42}IQ+\beta_{43}nACH+\epsilon_{GPA})\\ &=\beta_{41}\sigma_{SES}^2+\beta_{42}\sigma_{SES,IQ}+\beta_{43}\sigma_{SES,nACH}\\ &=\color{red}\beta_{41}\sigma_{SES}^2+\color{blue}\beta_{43}\beta_{31}\sigma_{SES}^2+\color{green}\beta_{42}\sigma_{SES,IQ}+\beta_{43}\beta_{32}\sigma_{SES,IQ}. \end{align*}\]
\[\begin{align*} \sigma_{IQ,GPA}&=\Cov(IQ,\beta_{41}SES+\beta_{42}IQ+\beta_{43}nACH+\epsilon_{GPA})\\ &=\beta_{41}\sigma_{IQ,SES}+\beta_{42}\sigma_{IQ}^2+\beta_{43}\sigma_{IQ,nACH}\\ &=\color{red}\beta_{42}\sigma_{IQ}^2+\color{blue}\beta_{43}\beta_{32}\sigma_{IQ}^2+\color{green}\beta_{41}\sigma_{IQ,SES}+\beta_{43}\beta_{31}\sigma_{IQ,SES}. \end{align*}\]
\[\begin{align*} \sigma_{nACH,GPA}&=\Cov(nACH, \beta_{41}SES+\beta_{42}IQ+\beta_{43}nACH+\epsilon_{GPA})\\ &=\beta_{41}\sigma_{nACH,SES}+\beta_{42}\sigma_{nACH,IQ}+\beta_{43}\sigma_{nACH}^2\\ &=\color{red}\beta_{43}\sigma_{nACH}^2+\\ &\color{Grey}\quad\text{ }\beta_{41}\beta_{31}\sigma_{SES}^2+\beta_{41}\beta_{32}\sigma_{SES,IQ}+\\ &\color{Grey}\quad\text{ }\beta_{42}\beta_{32}\sigma_{IQ}^2+\beta_{42}\beta_{31}\sigma_{IQ,SES} \end{align*}\]
If we standardize all variables, the variance terms become 1 and drop out, the covariance terms become correlations, and the regression coefficients become standardized coefficients, giving
\[\begin{align*} \rho_{SES,nACH}&=\color{red}\beta_{31}+\color{green}\beta_{32}\rho_{SES,IQ}\\ \rho_{IQ,nACH}&=\color{red}\beta_{32}+\color{green}\beta_{31}\rho_{IQ,SES}\\ \rho_{SES,GPA}&=\color{red}\beta_{41}+\color{blue}\beta_{43}\beta_{31}+\color{green}\beta_{42}\rho_{SES,IQ}+\beta_{43}\beta_{32}\rho_{SES,IQ}\\ \rho_{IQ,GPA}&=\color{red}\beta_{42}+\color{blue}\beta_{43}\beta_{32}+\color{green}\beta_{41}\rho_{IQ,SES}+\beta_{43}\beta_{31}\rho_{IQ,SES}\\ \rho_{nACH,GPA}&=\color{red}\beta_{43}+\color{Grey}\beta_{41}\beta_{31}+\beta_{42}\beta_{32}+\beta_{41}\beta_{32}\rho_{SES,IQ}+\beta_{42}\beta_{31}\rho_{IQ,SES} \end{align*}\]
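A model like this is usually estimated with software rather than by hand. As a sketch, the snippet below specifies the same path model with the lavaan package, assuming a hypothetical data frame `dat` containing the four variables; the `:=` operator defines the indirect effect of \(SES\) on \(GPA\) from the labeled paths.

# Minimal lavaan sketch; `dat` is a hypothetical data frame with SES, IQ, nACH, GPA
library(lavaan)
model <- '
  nACH ~ b31*SES + b32*IQ
  GPA  ~ b41*SES + b42*IQ + b43*nACH
  # indirect effect of SES on GPA via nACH
  ind_SES_GPA := b31*b43
'
fit <- sem(model, data = dat)
summary(fit, standardized = TRUE)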
Once the parameter estimates are available, it is easy to conduct the effects decomposition for any two given variables by hand. For example, with the estimated paths below, the indirect effect between \(SES\) and \(GPA\) is \(0.398\times0.416\).
After specifying a theoretical structure, the next step of an SEM analysis (including path analysis) is modeling, i.e., collecting data and evaluating the model. When dealing with multiple variables, it is very likely that we have a group of competing models and do not know which one is best, so we perform model evaluation in roughly two steps:
At the population level, when studying a set of mutually correlated variables, the covariance matrix of these variables is \(\boldsymbol{\Sigma}\). If we impose a structure with unknown parameters \(\boldsymbol{\theta}\) upon this group of variables, we can derive the model-implied covariance matrix \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\). If the structure we specified is true, i.e., it correctly represents the relationships among the variables of interest, then \(\boldsymbol{\Sigma}=\boldsymbol{\Sigma}(\boldsymbol{\theta})\).
At the sample level, the sample estimator of \(\boldsymbol{\Sigma}\) is the sample covariance matrix \(\boldsymbol{S}\), and the sample estimator of \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\) is \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\). If we specified the true model, \(\boldsymbol{S}\) should be closely approximated by \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\), but due to sampling error they will not be exactly the same.
If we misspecify the model, the discrepancy between \(\boldsymbol{\Sigma}\) and \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\) will increase, and so will that between \(\boldsymbol{S}\) and \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\). Therefore, when performing model evaluation, we are effectively trying to find the best model, with \(\hat{\boldsymbol{\theta}}\) chosen to minimize the discrepancy between \(\boldsymbol{S}\) and \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\).
The purpose of model estimation is to find the best sample estimates of the unknown parameters in a given model. “Best” here means minimizing the discrepancy in the sample. For example (this example is taken from the lecture notes of the SEM class taught by Professor Zhang Zhiyong at the University of Notre Dame), suppose we have two random variables \(x\) and \(y\) with
\[\begin{align} \boldsymbol{S}=\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \end{align}\]
We assume the true relationship between \(x\) and \(y\) is \(y=\beta x+\epsilon\), thus we have
\[\begin{align} \boldsymbol{\Sigma}(\boldsymbol{\theta})= \begin{bmatrix} \sigma^2_x & \beta\sigma^2_x \\ \beta\sigma^2_x & \beta^2\sigma^2_x+\sigma^2_\epsilon \end{bmatrix}, \end{align}\]
where the unknown parameters are \(\beta\), \(\sigma^2_x\), and \(\sigma^2_\epsilon\). We can try different sets of \(\hat{\boldsymbol{\theta}}\) to see how \(\boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\) changes. When \(\hat{\bs{\theta}}_1=(0,1,1)\),
\[\begin{align} \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}_1)= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \end{align}\]
then
\[\begin{align} \boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}_1)= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. \end{align}\]
When \(\hat{\bs{\theta}}_2=(1,1,1)\),
\[\begin{align} \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}_2)= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \end{align}\]
and
\[\begin{align} \boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}_2)= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \end{align}\]
It is clear that for this model \(\hat{\boldsymbol{\theta}}_2\) is better, but we do not know whether \(\hat{\boldsymbol{\theta}}_2\) is the best sample estimate of \(\bs{\theta}\). We need a way to quantify the difference between two matrices of identical size, and an algorithm that minimizes this difference to find the best \(\hat{\boldsymbol{\theta}}\).
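These calculations are easy to reproduce. The sketch below simply plugs the two candidate parameter vectors \((\beta,\sigma^2_x,\sigma^2_\epsilon)\) into \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\) and subtracts the result from \(\boldsymbol{S}\).

# S and the model-implied covariance matrix from the two-variable example
S <- matrix(c(2, 1, 1, 2), nrow = 2)
Sigma_theta <- function(theta) {
  beta <- theta[1]; var_x <- theta[2]; var_e <- theta[3]
  matrix(c(var_x, beta * var_x,
           beta * var_x, beta^2 * var_x + var_e), nrow = 2)
}
S - Sigma_theta(c(0, 1, 1))  # theta_hat_1: every unique element is off by 1
S - Sigma_theta(c(1, 1, 1))  # theta_hat_2: only the variance of y is off by 1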
Note that \(\boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\) is actually a matrix-valued function of \(\hat{\boldsymbol{\theta}}\), the discrepancy function \(\boldsymbol{F}\). Minimizing a matrix function quickly becomes unmanageable as the size of the matrix increases. To that end, we usually summarize the difference between \(\boldsymbol{S}\) and \(\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\) in a scalar discrepancy function \(F\); different estimation methods differ only in how they do the summarizing.
\(F_{OLS}(\boldsymbol{\theta})=[\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\theta})]'[\boldsymbol{s}-\boldsymbol{\sigma}(\boldsymbol{\theta})],\)
where \(\boldsymbol{s}\) and \(\boldsymbol{\sigma}(\boldsymbol{\theta})\) are the vectors of the \(p(p+1)/2\) unique elements of \(\boldsymbol{S}\) and \(\boldsymbol{\Sigma}(\boldsymbol{\theta})\), respectively.
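A small sketch of \(F_{OLS}\), reusing `S` and `Sigma_theta` from the sketch above: the unique elements are taken from the lower triangle (including the diagonal) and the squared differences are summed.

# OLS discrepancy: sum of squared differences between the unique elements
F_ols <- function(S, Sigma) {
  s <- S[lower.tri(S, diag = TRUE)]
  sigma <- Sigma[lower.tri(Sigma, diag = TRUE)]
  sum((s - sigma)^2)
}
F_ols(S, Sigma_theta(c(0, 1, 1)))  # 3: three unique elements, each off by 1
F_ols(S, Sigma_theta(c(1, 1, 1)))  # 1: only the variance of y is off by 1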
\(F_{NML}=\log|\boldsymbol{\Sigma}(\boldsymbol{\theta})|+\text{tr}(\boldsymbol{S}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\theta}))-\log|\boldsymbol{S}|-p.\)
NML (normal-theory maximum likelihood) is usually the default estimation method in most SEM software. Note that \((n-1)F_{NML}=-2\log\lambda=T\).
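The ML discrepancy can be coded in the same way; the sketch below again reuses `S` and `Sigma_theta` from above, with a hypothetical sample size to show the conversion to \(\text{T}\).

# Normal-theory ML discrepancy and the corresponding test statistic T
F_nml <- function(S, Sigma) {
  p <- nrow(S)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
}
n <- 100                                      # hypothetical sample size
(n - 1) * F_nml(S, Sigma_theta(c(1, 1, 1)))   # T = (n - 1) * F_NML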
After fitting a specified model to the data, we answer the question “how good is our model?” by assessing model-data fit. Model fit indices abound; most of them are directly based on the likelihood ratio test (LRT).
In SEM, the best model we can fit to the data is one for which \(\boldsymbol{\Sigma}(\boldsymbol{\theta})=\boldsymbol{S}\), that is, the saturated model/unrestricted model/just-identified model. It is worth noting that in path analysis, the model is unrestricted in the sense that all possible paths are allowed, and saturated because no extra parameter can be added to the existing model, so \(df=0\). The following is an example of a saturated model.
The worst model we can fit to the data is the baseline model/null model. In this model, all \(p\) variables are assumed to be independent of each other, so the resulting \(\boldsymbol{\Sigma}(\boldsymbol{\theta})=I_p\).
In SEM, \(\text{T}=(n-1)F_{NML}\) is the LRT test statistic of the tested model (the \(H_0\) model); it asymptotically follows a \(\chi^2\) distribution with \(df=p(p+1)/2-q\), where \(q\) is the number of free parameters. For example, the four-variable path model above has \(p(p+1)/2=10\) unique elements in \(\boldsymbol{S}\) and \(q=10\) free parameters (five regression coefficients, two exogenous variances, one exogenous covariance, and two residual variances), so its \(df=0\).
\[\begin{align*} \text{Comparative Fit Index, CFI}=1-\frac{\max(\text{T}-df,0)}{\max(\text{T}_0-df_0,\text{T}-df)}, \end{align*}\] where \(\text{T}_0\) is the LRT test statistic obtained when fitting the null model to the data, and \(df_0=p(p+1)/2\). CFI measures the extent to which the fitted model improves upon the null model. It is restricted to lie between 0 and 1.
\[\begin{align*} \text{Tucker Lewis Index, TLI/Non-normed Fit Index, NNFI}=1-\frac{df_0}{df}\left(\frac{\text{T}-df}{\text{T}_0-df_0}\right). \end{align*}\] TLI adds a penalty for model complexity and thus tends to favor models with fewer parameters. Note that CFI and TLI should be very close to each other. If the CFI is less than one, it is always greater than the TLI, so only one of the two should be reported to avoid redundancy. The TLI can, however, exceed 1 under certain conditions; if so, it is capped at 1 in most SEM software, with the exception of Mplus.
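Both indices are simple functions of the two test statistics. The sketch below uses hypothetical values of \(\text{T}\), \(df\), \(\text{T}_0\), and \(df_0\); with these values the CFI comes out above the TLI, consistent with the note above.

# CFI and TLI from hypothetical test statistics of the fitted and null models
T_stat <- 45;  df  <- 20   # fitted model (hypothetical values)
T0     <- 400; df0 <- 28   # null model (hypothetical values)
cfi <- 1 - max(T_stat - df, 0) / max(T0 - df0, T_stat - df)
tli <- 1 - (df0 / df) * (T_stat - df) / (T0 - df0)
c(CFI = cfi, TLI = tli)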
For CFI and TLI:
\[\begin{align*} \text{Root Mean Square Error of Approximation, RMSEA}=\sqrt{\frac{\text{T}-df}{df(n-1)}}. \end{align*}\] RMSEA measures the average model misfit per \(df\). It is always non-negative. Just like the TLI, RMSEA tends to favor smaller models. There is greater sampling error for models with small \(df\) and low \(n\), especially the former; thus, such models can have artificially large RMSEA values. For instance, a \(\text{T}\) of 2.098 (a value that is not statistically significant), with a \(df\) of 1 and an \(n\) of 70, yields an RMSEA of 0.126. For this reason Kenny & Kaniskan (2014) argue that the RMSEA should not even be computed for low-\(df\) models.
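The value quoted above is easy to verify directly:

# RMSEA for the example above: T = 2.098, df = 1, n = 70
T_stat <- 2.098; df <- 1; n <- 70
sqrt((T_stat - df) / (df * (n - 1)))  # approximately 0.126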
A confidence interval can be computed for the RMSEA. Its formula is based on the non-central \(\chi^2\) distribution, and usually the 90% interval is used. Ideally the lower bound of the 90% confidence interval includes or is very near zero (or at worst no more than 0.05) and the upper bound is not very large, i.e., less than 0.08 or perhaps 0.10. The width of the confidence interval is very informative about the precision of the RMSEA estimate.
Note that CFI, TLI, and RMSEA treat \(\text{T}=df\) as the best possible model.
\[\begin{align*} \text{Standardized Root Mean Square Residual, SRMR}=\sqrt{\frac{2\sum_{i=1}^p\sum_{j=1}^{i}\left[\frac{s_{ij}-\hat{\sigma}_{ij}}{\sqrt{s_{ii}s_{jj}}}\right]^2}{p(p+1)}}. \end{align*}\]
The SRMR is an absolute measure of fit and is defined as the standardized difference between the observed correlation and the predicted correlation. It is a positively biased measure and that bias is greater for small \(N\) and for low \(df\) studies. Because the SRMR is an absolute measure of fit, a value of zero indicates perfect fit. The SRMR has no penalty for model complexity. A value less than .08 is generally considered a good fit (Hu & Bentler, 1999).
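In practice these indices are rarely computed by hand. Assuming a fitted lavaan object such as `fit` from the earlier sketch, they can all be requested at once:

# Extract common fit indices from a fitted lavaan model (hypothetical `fit` object)
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli",
                   "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))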
Following are three commonly used comparative measures of fit: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the sample-size adjusted BIC (SABIC).
\[\begin{align*} \text{AIC}&=\text{T}+2q\\ \text{BIC}&=\text{T}+\log(n)q\\ \text{SABIC}&=\text{T}+\log(\frac{n+2}{24})q. \end{align*}\] Because AIC and BIC have no fixed scale, no cut-off value is available.
Lower values of AIC indicate a better fit and so the model with the lowest AIC is the best fitting model. There are somewhat different formulas given for the AIC in the literature, but those differences are not really meaningful as it is the difference in AIC that really matters. The AIC makes the researcher pay a penalty of two for every parameter that is estimated.
The BIC increases the penalty as the sample size increases and thus places a high value on parsimony. The SABIC, like the BIC, places a sample-size-based penalty on adding parameters, but the penalty is not as high as that of the BIC, as the plot below shows.
# SABIC's per-parameter penalty, log((n + 2)/24), is always smaller than BIC's penalty, log(n)
n <- seq(100, 1000, 100)
plot(log(n), log((n + 2)/24))
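Given the formulas above, the three criteria are straightforward to compute; the sketch below uses hypothetical values of \(\text{T}\), \(q\), and \(n\).

# AIC, BIC, and SABIC from hypothetical T, number of parameters q, and sample size n
T_stat <- 45; q <- 12; n <- 300
c(AIC = T_stat + 2 * q,
  BIC = T_stat + log(n) * q,
  SABIC = T_stat + log((n + 2) / 24) * q)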
In SEM, there are two types of model comparison:
Nested model comparison is usually conducted using the LRT-based \(\chi^2\) difference test.
A model can be seen as a special case of another model obtained by imposing constraints on parameters (e.g., forcing them to be 0). If the fit of the more complex model is good, constraints can then be imposed to test the resulting simpler model.
\[\Delta_{\text{T}}=\text{T}_{\text{simpler}}-\text{T}_{\text{larger}}\] \(\Delta_{\text{T}}\) asymptotically follows a \(\chi^2\) distribution with \(df\) equal to the difference in \(df\) between the two models. If \(\Delta_{\text{T}}\) is significant, the constraints are not appropriate; otherwise, the simpler model can be used, as sketched below.
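In lavaan, this comparison is usually carried out with `anova()`, which reports the \(\chi^2\) difference and its \(df\); the same test can also be done by hand from the two test statistics. The sketch below assumes two hypothetical fitted models, `fit_simple` nested within `fit_large`, and hypothetical values for the by-hand version.

# Chi-square difference test between two nested lavaan models (hypothetical fits)
anova(fit_simple, fit_large)

# Or by hand, with hypothetical test statistics and dfs:
delta_T  <- 58.3 - 45.0   # T_simpler - T_larger
delta_df <- 24 - 20       # df_simpler - df_larger
pchisq(delta_T, df = delta_df, lower.tail = FALSE)  # p-value of the difference test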
Non-nested model comparison (e.g., a 1-factor model vs. a 2-factor model) is usually conducted using fit indices.
It is usually recommended to report multiple fit indices when comparing models (nested or non-nested), so that we have more information. The problem is that fit indices can disagree with each other, and we do not know which one is right.
Lai’s paper
Some recommendations related to what we just learned: