3  Path analysis

A convenient way to represent the relationships among a number of variables is a path diagram; we have seen many in past chapters. A path diagram can be viewed as a hypothesized, theory-based model that specifies the structure among the variables of interest. We collect data to test whether our sample supports the proposed model. Basically, path analysis is the analysis of the “path”.

3.1 Path diagram

  • Rectangle: observed or manifest variable
  • Circle: latent variable (e.g. error, factor)
  • Single-headed arrow: a linear relationship between two variables; it starts from an independent variable and ends at a dependent variable.
  • Double-headed arrow: the variance of a variable or the covariance between two variables.
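In the lavaan R package (one common tool for path analysis), these diagram elements map directly onto model syntax: `~` encodes a single-headed arrow (regression), `~~` a double-headed arrow (variance or covariance), and `=~` defines a latent variable. A minimal sketch with hypothetical variable names:

```r
# Hypothetical lavaan model syntax; the variable names (x, m, y) are illustrative.
model <- '
  y ~ x + m     # single-headed arrows: x -> y, m -> y
  m ~ x         # single-headed arrow:  x -> m
  x ~~ x        # double-headed arrow:  variance of x
  y ~~ y        # residual variance of y
'
# With lavaan installed, the model would be fit with:
# fit <- lavaan::sem(model, data = mydata)
```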

3.2 What is the difference between multiple regression and path analysis

  • In path analysis, a variable can be both a dependent variable and an independent variable at the same time (e.g. a mediator); in multiple regression, a dependent variable cannot be an independent variable.
  • Multiple regression is more restricted in the types of hypotheses it can test. We can only test whether an independent variable effectively predicts the dependent variable while controlling for the other predictors, whereas path analysis can do more than that, e.g. test a mediation mechanism.

3.3 Effects Decomposition

When doing path analysis, we impose a theoretical structure upon our variables and derive Σ(θ). In Σ(θ), each covariance/correlation between an independent variable and a dependent variable can be partitioned into a sum of different parts using model parameters. This is called effects decomposition. There are four types of possible resulting effects.

3.3.1 Four types of effects

  1. Causal effect (most important)
  • Direct effect: a direct effect is represented by a single causal arrow from an independent variable to a dependent variable, i.e. x leads to y, and a change in x will “cause” a change in y;
  • Indirect effect: an indirect effect is represented by two or more spliced causal arrows between an independent variable and a dependent variable, i.e. x leads to m and m leads to y, so the causal effect of x on y is transmitted via m;
  2. Non-causal effect
  • Undecomposed effect: when there is a causal effect between an independent variable and a dependent variable, the causal effect accounts for part of the covariance/correlation between the two variables; the remaining part is the undecomposed effect, because it cannot be decomposed clearly. This usually arises when another independent variable in the same regression model is correlated with the independent variable in question, so we do not know which independent variable should be credited with the leftover covariance/correlation. E.g. x1 and x2 are two predictors of y; a change in x1 can spread into x2 (because they are correlated), and the change in x2 leads to a change in y;
  • Spurious covariance/correlation: the part of the covariance/correlation between two dependent variables that is rooted in their sharing the same independent variable or having correlated independent variables. E.g. x is a predictor of both y1 and y2; a change in x will lead to changes in y1 and y2, making y1 and y2 seemingly correlated.

3.3.2 Effects decomposition via tracing diagram and Σ(θ)

Let’s look at an example with four variables:

  • SES: social economic status
  • IQ: intelligence quotient
  • nACH: need for achievement
  • GPA: grade-point average

We specify the relationships among these four variables as

The corresponding regressions are
$$\begin{aligned} \text{nACH} &= \beta_{31}\,\text{SES} + \beta_{32}\,\text{IQ} + \epsilon_{\text{nACH}} \\ \text{GPA} &= \beta_{41}\,\text{SES} + \beta_{42}\,\text{IQ} + \beta_{43}\,\text{nACH} + \epsilon_{\text{GPA}}. \end{aligned}$$
The resultant Σ(θ) is
$$\Sigma(\theta) = \begin{bmatrix} \sigma^2_{\text{SES}} & \sigma_{\text{SES},\text{IQ}} & \sigma_{\text{SES},\text{nACH}} & \sigma_{\text{SES},\text{GPA}} \\ \sigma_{\text{IQ},\text{SES}} & \sigma^2_{\text{IQ}} & \sigma_{\text{IQ},\text{nACH}} & \sigma_{\text{IQ},\text{GPA}} \\ \sigma_{\text{nACH},\text{SES}} & \sigma_{\text{nACH},\text{IQ}} & \sigma^2_{\text{nACH}} & \sigma_{\text{nACH},\text{GPA}} \\ \sigma_{\text{GPA},\text{SES}} & \sigma_{\text{GPA},\text{IQ}} & \sigma_{\text{GPA},\text{nACH}} & \sigma^2_{\text{GPA}} \end{bmatrix}.$$

In the following figures, red represents direct effects, blue represents indirect effects, green represents undecomposed effects, and grey represents spurious covariance/correlation.

  • The decomposition of σSES,nACH

$$\sigma_{\text{SES},\text{nACH}} = \mathrm{Cov}(\text{SES},\ \beta_{31}\text{SES} + \beta_{32}\text{IQ} + \epsilon_{\text{nACH}}) = \beta_{31}\sigma^2_{\text{SES}} + \beta_{32}\sigma_{\text{SES},\text{IQ}}.$$

  • The decomposition of σIQ,nACH

$$\sigma_{\text{IQ},\text{nACH}} = \mathrm{Cov}(\text{IQ},\ \beta_{31}\text{SES} + \beta_{32}\text{IQ} + \epsilon_{\text{nACH}}) = \beta_{32}\sigma^2_{\text{IQ}} + \beta_{31}\sigma_{\text{IQ},\text{SES}}.$$

  • The decomposition of σSES,GPA

$$\begin{aligned} \sigma_{\text{SES},\text{GPA}} &= \mathrm{Cov}(\text{SES},\ \beta_{41}\text{SES} + \beta_{42}\text{IQ} + \beta_{43}\text{nACH} + \epsilon_{\text{GPA}}) \\ &= \beta_{41}\sigma^2_{\text{SES}} + \beta_{42}\sigma_{\text{SES},\text{IQ}} + \beta_{43}\sigma_{\text{SES},\text{nACH}} \\ &= \beta_{41}\sigma^2_{\text{SES}} + \beta_{43}\beta_{31}\sigma^2_{\text{SES}} + \beta_{42}\sigma_{\text{SES},\text{IQ}} + \beta_{43}\beta_{32}\sigma_{\text{SES},\text{IQ}}. \end{aligned}$$

  • The decomposition of σIQ,GPA

$$\begin{aligned} \sigma_{\text{IQ},\text{GPA}} &= \mathrm{Cov}(\text{IQ},\ \beta_{41}\text{SES} + \beta_{42}\text{IQ} + \beta_{43}\text{nACH} + \epsilon_{\text{GPA}}) \\ &= \beta_{41}\sigma_{\text{IQ},\text{SES}} + \beta_{42}\sigma^2_{\text{IQ}} + \beta_{43}\sigma_{\text{IQ},\text{nACH}} \\ &= \beta_{42}\sigma^2_{\text{IQ}} + \beta_{43}\beta_{32}\sigma^2_{\text{IQ}} + \beta_{41}\sigma_{\text{IQ},\text{SES}} + \beta_{43}\beta_{31}\sigma_{\text{IQ},\text{SES}}. \end{aligned}$$

  • The decomposition of σnACH,GPA

$$\begin{aligned} \sigma_{\text{nACH},\text{GPA}} &= \mathrm{Cov}(\text{nACH},\ \beta_{41}\text{SES} + \beta_{42}\text{IQ} + \beta_{43}\text{nACH} + \epsilon_{\text{GPA}}) \\ &= \beta_{41}\sigma_{\text{nACH},\text{SES}} + \beta_{42}\sigma_{\text{nACH},\text{IQ}} + \beta_{43}\sigma^2_{\text{nACH}} \\ &= \beta_{43}\sigma^2_{\text{nACH}} + \beta_{41}\beta_{31}\sigma^2_{\text{SES}} + \beta_{41}\beta_{32}\sigma_{\text{SES},\text{IQ}} + \beta_{42}\beta_{32}\sigma^2_{\text{IQ}} + \beta_{42}\beta_{31}\sigma_{\text{IQ},\text{SES}}. \end{aligned}$$
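These decompositions can be checked numerically. Writing the model in matrix form as x = Bx + e with Cov(e) = Ψ, the model-implied covariance matrix is (I − B)⁻¹Ψ(I − B)⁻ᵀ. The sketch below uses made-up parameter values (not estimates from any data set):

```r
# Variables in order: SES, IQ, nACH, GPA; all parameter values are made up.
b31 <- 0.4; b32 <- 0.3               # paths into nACH
b41 <- 0.2; b42 <- 0.5; b43 <- 0.6   # paths into GPA
B <- matrix(0, 4, 4)
B[3, 1:2] <- c(b31, b32)
B[4, 1:3] <- c(b41, b42, b43)
# Psi holds the exogenous (co)variances and the residual variances
Psi <- diag(c(2, 3, 1, 1))           # var(SES)=2, var(IQ)=3, residual vars = 1
Psi[1, 2] <- Psi[2, 1] <- 1          # cov(SES, IQ) = 1
A <- solve(diag(4) - B)
Sigma <- A %*% Psi %*% t(A)          # model-implied covariance matrix
# Check sigma_{SES,nACH} = b31*var(SES) + b32*cov(SES,IQ)
Sigma[1, 3]                          # 0.4*2 + 0.3*1 = 1.1
# Check sigma_{SES,GPA} = b41*var(SES) + b43*b31*var(SES)
#                         + b42*cov(SES,IQ) + b43*b32*cov(SES,IQ)
Sigma[1, 4]                          # 0.4 + 0.48 + 0.5 + 0.18 = 1.56
```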

By standardizing all variables, the variance terms become 1 and disappear, the covariance terms become correlations, and the regression coefficients become standardized ones; we have

$$\begin{aligned} \rho_{\text{SES},\text{nACH}} &= \beta_{31} + \beta_{32}\rho_{\text{SES},\text{IQ}} \\ \rho_{\text{IQ},\text{nACH}} &= \beta_{32} + \beta_{31}\rho_{\text{IQ},\text{SES}} \\ \rho_{\text{SES},\text{GPA}} &= \beta_{41} + \beta_{43}\beta_{31} + \beta_{42}\rho_{\text{SES},\text{IQ}} + \beta_{43}\beta_{32}\rho_{\text{SES},\text{IQ}} \\ \rho_{\text{IQ},\text{GPA}} &= \beta_{42} + \beta_{43}\beta_{32} + \beta_{41}\rho_{\text{IQ},\text{SES}} + \beta_{43}\beta_{31}\rho_{\text{IQ},\text{SES}} \\ \rho_{\text{nACH},\text{GPA}} &= \beta_{43} + \beta_{41}\beta_{31} + \beta_{42}\beta_{32} + \beta_{41}\beta_{32}\rho_{\text{SES},\text{IQ}} + \beta_{42}\beta_{31}\rho_{\text{IQ},\text{SES}} \end{aligned}$$

Once the estimated parameters are available, it is very easy to conduct the effects decomposition for any given pair of variables by hand. For example, with the estimated paths below, the indirect effect between SES and GPA is 0.398×0.416.
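With standardized estimates in hand, an indirect effect is just the product of the path coefficients along the tracing. Assuming, as in the example above, that 0.398 is the SES → nACH path and 0.416 the nACH → GPA path:

```r
a <- 0.398           # estimated path SES -> nACH (from the figure above)
b <- 0.416           # estimated path nACH -> GPA (from the figure above)
indirect <- a * b
round(indirect, 3)   # 0.166
```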

3.3.3 Homework

ex3.11
  1. Decompose $r_{x_1 y_3}$;

3.4 Modeling process

After specifying a theoretical structure, the next step of an SEM analysis (including path analysis) is modeling, i.e. collecting data and evaluating the model. When dealing with multiple variables, it is very likely that we have a group of competing models and do not know which one is the best, so we have to perform model evaluation in roughly two steps:

  1. model estimation: obtain the optimal estimates of the unknown parameters and the model-data fit for each model;
  2. model comparison: compare the model-data fit to find the best model.

3.4.1 Model estimation

3.4.1.1 Four basic matrices in SEM

At the population level, when studying a set of mutually correlated variables, the covariance matrix of these variables is Σ. If we impose a structure with unknown parameters θ upon this group of variables, we can derive the model-implied covariance matrix Σ(θ). If the structure we specified is true, i.e. it correctly represents the relationships among the variables of interest, then Σ = Σ(θ).

At the sample level, the sample estimator of Σ is the sample covariance matrix $S$, and the sample estimator of Σ(θ) is $\Sigma(\hat\theta)$. If we specified the true model, $S$ should be closely approximated by $\Sigma(\hat\theta)$, but due to sampling error they won’t be exactly the same.

If we misspecify the model, the discrepancy between Σ and Σ(θ) will undoubtedly increase, and so will that between $S$ and $\Sigma(\hat\theta)$. Therefore, when performing model evaluation, we are effectively trying to find the best model, with the $\hat\theta$ that minimizes the discrepancy between $S$ and $\Sigma(\hat\theta)$.

3.4.1.2 Discrepancy function

The purpose of model estimation is to find the best sample estimates of the unknown parameters in a given model. The “best” is achieved by minimizing the discrepancy using the sample. For example (this example is taken from the lecture notes of the SEM class taught by Professor Zhang Zhiyong at the University of Notre Dame), we have two random variables x and y with

$$S = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

We assume the true relationship between x and y is y=βx+ϵ, thus we have

$$\Sigma(\theta) = \begin{bmatrix} \sigma_x^2 & \beta\sigma_x^2 \\ \beta\sigma_x^2 & \beta^2\sigma_x^2 + \sigma_\epsilon^2 \end{bmatrix},$$

where the unknown parameters are $\beta$, $\sigma_x^2$, and $\sigma_\epsilon^2$. We can try different sets of $\hat\theta$ to see how $S - \Sigma(\hat\theta)$ differs. When $\hat\theta_1 = (0, 1, 1)$,

$$\Sigma(\hat\theta_1) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

then

$$S - \Sigma(\hat\theta_1) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$

When $\hat\theta_2 = (1, 1, 1)$,

$$\Sigma(\hat\theta_2) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},$$

and

$$S - \Sigma(\hat\theta_2) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$

It is clear that for this given model, $\hat\theta_2$ is better, but we do not know whether $\hat\theta_2$ is the best sample estimate of θ. We need a way to quantify the difference between two matrices of identical size, and an algorithm to minimize that difference to find the best $\hat\theta$.

Note that $S - \Sigma(\hat\theta)$ is actually a matrix function of $\hat\theta$. Minimizing a matrix-valued function quickly becomes unmanageable as the size of the matrices increases. To that end, we usually summarize the difference between $S$ and $\Sigma(\hat\theta)$ as a scalar discrepancy function $F$; different estimation methods differ only in the way they summarize it.

3.4.1.3 Common estimation methods (skimming)

  • Ordinary least squares (OLS)

$$F_{OLS}(\theta) = [s - \sigma(\theta)]'[s - \sigma(\theta)],$$

where $s$ and $\sigma(\theta)$ are vectors of the $p(p+1)/2$ unique elements of $S$ and $\Sigma(\theta)$, respectively.
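As a sketch, we can implement $F_{OLS}$ for the two-variable example above, verify that it prefers $\hat\theta_2$, and hand it to a general-purpose optimizer (base R's `optim`; this is an illustration, not how SEM software is actually implemented):

```r
s <- c(2, 1, 2)   # unique elements of S: s11, s21, s22
# theta = (beta, var_x, var_e); sigma(theta) stacks the unique elements of Sigma(theta)
sigma_theta <- function(theta) {
  beta <- theta[1]; vx <- theta[2]; ve <- theta[3]
  c(vx, beta * vx, beta^2 * vx + ve)
}
fols <- function(theta) sum((s - sigma_theta(theta))^2)
fols(c(0, 1, 1))   # theta_hat_1: F = 1^2 + 1^2 + 1^2 = 3
fols(c(1, 1, 1))   # theta_hat_2: F = 1^2 + 0 + 0 = 1
# Minimize F_OLS; this saturated model can reproduce S exactly:
fit <- optim(c(0, 1, 1), fols)
round(fit$par, 2)  # approximately beta = 0.5, var_x = 2, var_e = 1.5
```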

  • Multivariate normal distribution maximum likelihood (NML)

$$F_{NML} = \log|\Sigma(\theta)| + \mathrm{tr}\left(S\,\Sigma^{-1}(\theta)\right) - \log|S| - p.$$

NML is usually the default estimation method in most SEM software. Note that $(n-1)F_{NML} = -2\log\lambda = T$.
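The NML discrepancy for the same toy example can be sketched similarly (again with `optim` as a stand-in for a real SEM estimator). At the minimizer the saturated model reproduces $S$, so $F_{NML} = 0$:

```r
S <- matrix(c(2, 1, 1, 2), 2, 2)
p <- nrow(S)
Sigma_theta <- function(theta) {   # theta = (beta, var_x, var_e)
  beta <- theta[1]; vx <- theta[2]; ve <- theta[3]
  matrix(c(vx, beta * vx, beta * vx, beta^2 * vx + ve), 2, 2)
}
fnml <- function(theta) {
  if (theta[2] <= 0 || theta[3] <= 0) return(Inf)  # keep variances positive
  Sig <- Sigma_theta(theta)
  log(det(Sig)) + sum(diag(S %*% solve(Sig))) - log(det(S)) - p
}
fit <- optim(c(0.3, 1.5, 1), fnml)
round(fit$par, 2)       # approximately (0.5, 2, 1.5), where Sigma(theta) = S
round(fit$value, 4)     # approximately 0
(100 - 1) * fit$value   # T = (n - 1) * F_NML, e.g. with n = 100
```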

3.4.2 Model fit

After fitting a specified model to data, we answer the question “how good is our model?” by the model-data fit. Model fit indices abound; most of them are directly based on the likelihood ratio test (LRT).

3.4.2.1 The best model and the worst model

In SEM, the best model we can fit to data is the one with $\Sigma(\hat\theta) = S$, i.e. the saturated model/unrestricted model/just-identified model. It is worth noting that in path analysis, “unrestricted” means all possible paths are allowed, and the model is “saturated” because no extra parameter can be added to it, so $df = 0$. The following is an example of a saturated model.

The worst model we can fit to data is the baseline model/null model. In this model, all p variables are assumed to be independent of each other, so the resulting Σ(θ) is diagonal (equal to the identity matrix $I_p$ when the variables are standardized).

3.4.2.2 Incremental measure of fit

In SEM, $T = (n-1)F_{ML}$ is the LRT test statistic of the tested model (the H0 model); it asymptotically follows a $\chi^2$ distribution with $df = p(p+1)/2 - q$, where $q$ is the number of free parameters.

Comparative Fit Index, $CFI = 1 - \frac{\max(T - df,\, 0)}{\max(T_0 - df_0,\ T - df)}$, where $T_0$ is the resulting LRT test statistic if we fit the null model to the data, and $df_0 = p(p+1)/2$. CFI measures the extent to which the fitted model improves upon the null model. It is restricted between 0 and 1.

Tucker–Lewis Index (TLI), also known as the Non-normed Fit Index (NNFI): $TLI = 1 - \frac{df_0}{df}\cdot\frac{T - df}{T_0 - df_0}$. TLI adds a penalty for model complexity and thus tends to endorse models with fewer parameters. Note that CFI and TLI should be very close to each other. If the CFI is less than one, then the CFI is always greater than the TLI, so only one of the two should be reported to avoid redundancy. TLI can exceed 1 under certain conditions; if so, it is capped at 1 in most SEM software, except for Mplus.

For CFI and TLI:

  • >0.95 indicates good model-data fit
  • >0.9 indicates acceptable model-data fit

Root Mean Square Error of Approximation, $RMSEA = \sqrt{\frac{T - df}{df\,(n-1)}}$. RMSEA measures the average model misfit per df. It is always non-negative. Just like TLI, RMSEA tends to endorse smaller models. There is greater sampling error for small-df and low-n models, especially the former; thus, models with small df and low n can have artificially large RMSEA values. For instance, a T of 2.098 (a value not statistically significant), with a df of 1 and n of 70, yields an RMSEA of 0.126. For this reason Kenny & Kaniskan () argue against even computing the RMSEA for low-df models.

A confidence interval can be computed for the RMSEA. Its formula is based on the non-central χ2 distribution and usually the 90% interval is used. Ideally the lower value of the 90% confidence interval includes or is very near zero (or no worse than 0.05) and the upper value is not very large, i.e., less than 0.08 or perhaps a 0.10. The width of the confidence interval can be very informative about the precision in the estimate of the RMSEA.

  • RMSEA ≤ 0.05 indicates a close fit of a model
  • RMSEA ≤ 0.08 indicates a reasonable model
  • RMSEA > 0.10 indicates a bad model

Note that CFI, TLI, and RMSEA treat a model with $T = df$ as the best possible model.
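These formulas are easy to compute by hand. The sketch below reuses the RMSEA example from above (T = 2.098, df = 1, n = 70); the null-model values T0 and df0 are made up for illustration:

```r
Tstat <- 2.098; df <- 1; n <- 70   # from the RMSEA example above
T0 <- 180; df0 <- 6                # hypothetical null-model values
cfi   <- 1 - max(Tstat - df, 0) / max(T0 - df0, Tstat - df)
tli   <- 1 - (df0 / df) * (Tstat - df) / (T0 - df0)
rmsea <- sqrt(max(Tstat - df, 0) / (df * (n - 1)))
round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)  # 0.994, 0.962, 0.126
```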

Standardized Root Mean Square Residual, $SRMR = \sqrt{\dfrac{2\sum_{i=1}^{p}\sum_{j=1}^{i}\left[\frac{s_{ij} - \hat\sigma_{ij}}{\sqrt{s_{ii}s_{jj}}}\right]^2}{p(p+1)}}$.

The SRMR is an absolute measure of fit and is defined as the standardized difference between the observed and the model-predicted correlations. It is a positively biased measure, and the bias is greater for small n and for low-df studies. Because the SRMR is an absolute measure of fit, a value of zero indicates perfect fit. The SRMR has no penalty for model complexity. A value less than .08 is generally considered a good fit ().
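As a quick numeric sketch, the SRMR for the earlier toy example ($S$ from section 3.4.1.2 and $\Sigma(\hat\theta_2)$) can be computed directly from the formula:

```r
S        <- matrix(c(2, 1, 1, 2), 2, 2)
SigmaHat <- matrix(c(1, 1, 1, 2), 2, 2)   # Sigma(theta_hat_2) from the earlier example
p <- nrow(S)
# sum of squared standardized residuals over the lower triangle (incl. diagonal)
total <- 0
for (i in 1:p) {
  for (j in 1:i) {
    total <- total + ((S[i, j] - SigmaHat[i, j]) / sqrt(S[i, i] * S[j, j]))^2
  }
}
srmr <- sqrt(2 * total / (p * (p + 1)))
round(srmr, 3)   # 0.289
```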

3.4.2.3 Comparative measure of fit

The following are three commonly used comparative measures of fit: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the sample-size-adjusted BIC (SABIC).

$$\begin{aligned} AIC &= T + 2q \\ BIC &= T + \log(n)\,q \\ SABIC &= T + \log\!\left(\frac{n+2}{24}\right) q. \end{aligned}$$
Because AIC and BIC have no fixed scale, no cut-off value is available.

Lower values of AIC indicate a better fit and so the model with the lowest AIC is the best fitting model. There are somewhat different formulas given for the AIC in the literature, but those differences are not really meaningful as it is the difference in AIC that really matters. The AIC makes the researcher pay a penalty of two for every parameter that is estimated.

The BIC increases the penalty as the sample size increases, so it places a high value on parsimony. The SABIC, like the BIC, places a penalty for adding parameters based on sample size, but not as high a penalty as the BIC.

# compare the BIC and SABIC penalty weights across sample sizes
n <- seq(100, 1000, 100)
plot(log(n), log((n + 2)/24))
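To illustrate how the penalties play out, the sketch below compares two hypothetical models using made-up T, q, and n values; note how the AIC and BIC can disagree:

```r
ic <- function(Tstat, q, n) {
  c(AIC   = Tstat + 2 * q,
    BIC   = Tstat + log(n) * q,
    SABIC = Tstat + log((n + 2) / 24) * q)
}
n <- 200
ic(Tstat = 12, q = 5, n = n)   # simpler model (made-up values)
ic(Tstat = 4,  q = 8, n = n)   # more complex model (made-up values)
# AIC favors the complex model (20 < 22); BIC favors the simpler one.
```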

3.4.3 Model comparison

In SEM, there are two types of model comparison:

  • nested model comparison,
  • unnested model comparison,

Nested model comparison is usually conducted using LRT-based χ2 difference test.

A model can be seen as a special case of another model, obtained by imposing constraints on parameters (e.g. forcing them to be 0). If the model fit of the more complex model is good, the constraints can then be imposed to test the resulting simpler model.

$$\Delta T = T_{simpler} - T_{larger},$$
which under the null follows a $\chi^2$ distribution with $\Delta df = df_{simpler} - df_{larger}$. If ΔT is significant, then the constraints are not appropriate. Otherwise, the simpler model can be used.
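In R, the χ² difference test is a one-liner with `pchisq`; the T and df values below are made up for illustration:

```r
T_simpler <- 18.3; df_simpler <- 6   # made-up fit of the constrained model
T_larger  <- 10.1; df_larger  <- 4   # made-up fit of the more complex model
dT  <- T_simpler - T_larger          # 8.2
ddf <- df_simpler - df_larger        # 2
p_value <- pchisq(dT, ddf, lower.tail = FALSE)
round(p_value, 3)   # 0.017: the constraints significantly worsen the fit
```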

Unnested model comparison is usually conducted using fit indices (e.g., a 1-factor model vs. a 2-factor model).

It is usually recommended to report multiple fit indices when comparing models (nested and non-nested), so that we can have more information. But the problem is that fit indices can disagree with each other and we do not know which one is right.

Lai’s paper

3.4.4 JARS Reporting Standards

Some recommendations related to what we just learned:

  1. State the software (including version) used in the analysis. Also state the estimation method used and justify its use (i.e., whether its assumptions are supported by the data; if a method assuming multivariate normality was used, report statistics that measure univariate or multivariate skewness and kurtosis to support the assumption of normal distributions; otherwise, state the strategy used to address nonnormality, such as use of a different estimation method that does not assume normality or use of normalizing transformations of the scores).
  2. Disclose any default criteria in the software, such as the maximum number of iterations or level of tolerance, that were adjusted in order to achieve a converged and admissible solution.
  3. Report fit statistics or indices about global (omnibus) fit interpreted using criteria justified by citation of most recent evidence-based recommendations for all models to be interpreted.
  4. State the strategy or criteria used to select one model over another if alternative models were compared. Report results of difference tests for comparisons between alternative models.
  5. Indicate whether one or more interpreted models was a product of respecification. If so, then describe the method used to search for misspecified parameters. State which parameters were fixed or freed to produce the interpreted model. Also provide a theoretical or conceptual rationale for parameters that were fixed or freed after specification searching.
  6. Report both unstandardized and standardized estimates for all estimated parameters.

3.5 Real data example