7.1 Multivariate regression
A complete presentation of this model is given in Section 3.4; here we restate the setting and the posterior distributions for ease of exposition. There are M dependent variables that share the same set of regressors, and their stochastic errors are contemporaneously correlated. Specifically, \boldsymbol{Y} = \left[ \boldsymbol{y}_{1} \ \boldsymbol{y}_{2} \ \ldots \ \boldsymbol{y}_{M} \right] is an N \times M matrix generated by \boldsymbol{Y} = \boldsymbol{X} \boldsymbol{B} + \boldsymbol{U}, where \boldsymbol{X} is an N \times K matrix of regressors, \boldsymbol{B} = \left[ \boldsymbol{\beta}_{1} \ \boldsymbol{\beta}_{2} \ \ldots \ \boldsymbol{\beta}_{M} \right] is a K \times M matrix of parameters, and \boldsymbol{U} is an N \times M matrix of stochastic errors whose rows \boldsymbol{u}_i^{\top} satisfy \boldsymbol{u}_i \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}), for i = 1, 2, \dots, N.
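As a quick illustration of this data-generating process, the following R snippet (a self-contained sketch using arbitrary values for N, K, M, \boldsymbol{B}, and \boldsymbol{\Sigma}) simulates one dataset in which each row of \boldsymbol{U} is drawn from N(\boldsymbol{0}, \boldsymbol{\Sigma}):

```r
# Simulate Y = X B + U where each row of U ~ N(0, Sigma)
set.seed(10)
N <- 100; K <- 3; M <- 2                              # arbitrary dimensions
X <- cbind(1, matrix(rnorm(N * (K - 1)), N, K - 1))   # regressors with intercept
B <- matrix(c(1, 0.5, -0.2, 2, -1, 0.8), K, M)        # arbitrary coefficients
Sigma <- matrix(c(1, 0.5, 0.5, 2), M, M)              # error covariance
U <- matrix(rnorm(N * M), N, M) %*% chol(Sigma)       # rows have covariance Sigma
Y <- X %*% B + U
```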
The prior is given by \boldsymbol{B} \mid \boldsymbol{\Sigma} \sim N(\boldsymbol{B}_0, \boldsymbol{V}_0, \boldsymbol{\Sigma}) and \boldsymbol{\Sigma} \sim IW(\boldsymbol{\Psi}_0, \alpha_0). Therefore, the conditional posterior distributions are:
\boldsymbol{B} \mid \boldsymbol{\Sigma}, \boldsymbol{Y}, \boldsymbol{X} \sim N(\boldsymbol{B}_n, \boldsymbol{V}_n, \boldsymbol{\Sigma}),
\boldsymbol{\Sigma} \mid \boldsymbol{Y}, \boldsymbol{X} \sim IW(\boldsymbol{\Psi}_n, \alpha_n),
where
\boldsymbol{V}_n = (\boldsymbol{X}^{\top} \boldsymbol{X} + \boldsymbol{V}_0^{-1})^{-1}, \quad \boldsymbol{B}_n = \boldsymbol{V}_n (\boldsymbol{V}_0^{-1} \boldsymbol{B}_0 + \boldsymbol{X}^{\top} \boldsymbol{X} \hat{\boldsymbol{B}}),
\hat{\boldsymbol{B}} = (\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \boldsymbol{X}^{\top} \boldsymbol{Y}, \quad \boldsymbol{S} = (\boldsymbol{Y} - \boldsymbol{X}\hat{\boldsymbol{B}})^{\top}(\boldsymbol{Y} - \boldsymbol{X}\hat{\boldsymbol{B}}), \quad \boldsymbol{\Psi}_n = \boldsymbol{\Psi}_{0} + \boldsymbol{S} + \boldsymbol{B}_{0}^{\top} \boldsymbol{V}_{0}^{-1} \boldsymbol{B}_{0} + \hat{\boldsymbol{B}}^{\top} \boldsymbol{X}^{\top} \boldsymbol{X} \hat{\boldsymbol{B}} - \boldsymbol{B}_n^{\top} \boldsymbol{V}_n^{-1} \boldsymbol{B}_n,
and
\alpha_n = \alpha_0 + N.
We can use a Gibbs sampling algorithm in this model since the conditional posterior distributions are standard.
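Since \boldsymbol{\Sigma} \mid \boldsymbol{Y}, \boldsymbol{X} is inverse-Wishart and \boldsymbol{B} \mid \boldsymbol{\Sigma}, \boldsymbol{Y}, \boldsymbol{X} is matrix normal, each draw reduces to two standard steps. The base-R sketch below shows one such draw, assuming the posterior parameters \boldsymbol{B}_n, \boldsymbol{V}_n, \boldsymbol{\Psi}_n, and \alpha_n have already been computed; the inverse-Wishart draw is obtained by inverting a Wishart draw, and the matrix normal draw uses Cholesky factors of the row and column covariances:

```r
# One draw from the conditional posteriors of (B, Sigma),
# assuming Bn (K x M), Vn (K x K), Psin (M x M), and an are precomputed
draw_posterior <- function(Bn, Vn, Psin, an) {
  K <- nrow(Bn); M <- ncol(Bn)
  # Sigma | Y ~ IW(Psin, an): invert a Wishart(an, Psin^{-1}) draw
  Sigma <- solve(stats::rWishart(1, df = an, Sigma = solve(Psin))[, , 1])
  # B | Sigma, Y ~ MN(Bn, Vn, Sigma): Bn + A Z C with A A' = Vn, C' C = Sigma
  Z <- matrix(rnorm(K * M), K, M)
  B <- Bn + t(chol(Vn)) %*% Z %*% chol(Sigma)
  list(B = B, Sigma = Sigma)
}
```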
Example: The effect of institutions on per capita gross domestic product
To illustrate multivariate regression models, we use the dataset provided by Acemoglu, Johnson, and Robinson (2001), who analyzed the effect of property rights on economic growth.
We begin with the following simultaneous structural economic model:^{36} \begin{align} \log(\text{pcGDP95}_i) &= \beta_1 + \beta_2 \text{PAER}_i + \beta_3 \text{Africa}_i + \beta_4 \text{Asia}_i + \beta_5 \text{Other}_i + u_{1i}, \tag{7.1} \end{align} \begin{align} \text{PAER}_i &= \alpha_1 + \alpha_2 \log(\text{pcGDP95}_i) + \alpha_3 \log(\text{Mort}_i) + u_{2i}, \tag{7.2} \end{align} where pcGDP95, PAER, and Mort represent the per capita gross domestic product (GDP) in 1995, the average index of protection against expropriation between 1985 and 1995, and the settler mortality rate during the period of colonization, respectively. Africa, Asia, and Other are indicator variables for continents, with America serving as the baseline group.
In this model, there is reverse (simultaneous) causality due to the contemporaneous effect of GDP on PAER, and vice versa.^{37} Therefore, estimating Equations (7.1) and (7.2) without accounting for this phenomenon results in posterior mean estimates that are biased and inconsistent from a sampling (frequentist) perspective.^{38} A potential strategy to address this issue is to estimate the reduced-form model, i.e., a model without simultaneous causality, where all endogenous variables are functions of exogenous variables. The former are determined within the model (e.g., \log(\text{pcGDP95}_i) and \text{PAER}_i in this example), while the latter are determined outside the model (e.g., \log(\text{Mort}_i), Africa, Asia, and Other in this example).
Substituting Equation (7.2) into Equation (7.1), and solving for \log(\text{pcGDP95}), we obtain \begin{align} \log(\text{pcGDP95}_i)=\pi_1+\pi_2\log(\text{Mort}_i)+\pi_3 \text{Africa}_i+\pi_4 \text{Asia}_i+\pi_5 \text{Other}_i+e_{1i}. \tag{7.3} \end{align} Then, substituting Equation (7.3) into Equation (7.2), and solving for \text{PAER}, we obtain \begin{align} \text{PAER}_i = \gamma_1 + \gamma_2 \log(\text{Mort}_i) + \gamma_3 \text{Africa}_i + \gamma_4 \text{Asia}_i + \gamma_5 \text{Other}_i + e_{2i}, \tag{7.4} \end{align} where \pi_2 = \frac{\beta_2\alpha_3}{1 - \beta_2\alpha_2} and \gamma_2 = \frac{\alpha_3}{1 - \beta_2\alpha_2}, provided that \beta_2 \alpha_2 \neq 1, i.e., the equations are independent (see Exercise 2).
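A quick numerical check of these mappings, using hypothetical values for the structural coefficients, confirms that the ratio \pi_2/\gamma_2 recovers \beta_2:

```r
# Hypothetical structural values (for illustration only)
beta2 <- 0.9; alpha2 <- 0.3; alpha3 <- -0.6
pi2    <- beta2 * alpha3 / (1 - beta2 * alpha2)  # reduced-form slope in Eq. (7.3)
gamma2 <- alpha3 / (1 - beta2 * alpha2)          # reduced-form slope in Eq. (7.4)
pi2 / gamma2                                     # recovers beta2 = 0.9
```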
Observe that Equations (7.3) and (7.4) have the form of a multivariate regression model, where the common set of regressors is \boldsymbol{X} = \left[\boldsymbol{1} \ \log(\text{Mort}) \ \text{Africa} \ \text{Asia} \ \text{Other}\right] (including the intercept) and the common set of dependent variables is \boldsymbol{Y} = \left[\log(\text{pcGDP95}) \ \text{PAER}\right]. Therefore, we can estimate this model using the setup outlined in this section.
In the first stage, we estimate the parameters of the reduced-form model (Equations (7.3) and (7.4)), but the main interest lies in estimating the parameters of the structural model (Equations (7.1) and (7.2)). A valid question is whether we can recover (identify) the structural parameters from the reduced-form parameters. There are two criteria to answer this question: the order condition, which is necessary, and the rank condition, which is both necessary and sufficient.
The order condition
Given a system of equations with M endogenous variables, and K exogenous variables (including the intercept), there are two ways to assess the order condition:
- The parameters of an equation in the system are identified if there are at least M-1 variables excluded from the equation (exclusion restrictions). The equation is exactly identified if the number of excluded variables equals M-1, and over-identified if it exceeds M-1.
- The parameters of equation m in the system are identified if K-K_m\geq M_m-1, where K_m and M_m are the number of exogenous and endogenous variables in equation m, respectively. The m-th equation is exactly identified if K-K_m = M_m-1, and over-identified if K-K_m > M_m-1.
We can see from Equations (7.1) and (7.2) in this example that K=5, M=2, K_1=4, K_2=2, M_1=2, and M_2=2. This means that K-K_1=1=M_1-1 and K-K_2=3>M_2-1=1; that is, the order condition says that both equations satisfy the necessary condition of identification: the first equation would be exactly identified, and the second equation would be over-identified. Observe that there is one variable excluded from the first equation, and three variables excluded from the second equation.
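The second version of the order condition is easy to mechanize. The helper below (a hypothetical function written for this illustration, not part of the GUI) classifies equation m from K, K_m, and M_m:

```r
# Order condition: compare excluded exogenous variables (K - Km) with Mm - 1
order_condition <- function(K, Km, Mm) {
  excluded <- K - Km
  if (excluded < Mm - 1)       "not identified"
  else if (excluded == Mm - 1) "exactly identified"
  else                         "over-identified"
}
order_condition(K = 5, Km = 4, Mm = 2)  # Equation (7.1): "exactly identified"
order_condition(K = 5, Km = 2, Mm = 2)  # Equation (7.2): "over-identified"
```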
The rank condition
The rank condition (necessary and sufficient) says that given a structural model with M equations (M endogenous variables), an equation is identified if and only if there is at least one determinant different from zero from a (M-1)\times(M-1) matrix built using the excluded variables in the analyzed equation, but included in any other equation of the system.
It is useful to build the identification matrix to implement the rank condition. The following table shows this matrix for this example.
\log(\text{pcGDP95}) | \text{PAER} | Constant | \log(\text{Mort}) | Africa | Asia | Other |
1 | -\beta_2 | -\beta_1 | 0 | -\beta_3 | -\beta_4 | -\beta_5 |
-\alpha_2 | 1 | -\alpha_1 | -\alpha_3 | 0 | 0 | 0 |
The only excluded variable in the \log(\text{pcGDP95}) equation is \log(\text{Mort}). Therefore, there is only one matrix that can be constructed using the excluded variables from this equation, which is [-\alpha_3] (see column 4 in the Table). The determinant of this matrix is -\alpha_3, and as long as this coefficient is nonzero (i.e., \alpha_3 \neq 0), meaning that the mortality rate is relevant in the PAER equation, the coefficients in the \log(\text{pcGDP95}) equation are exactly identified. For example, \beta_2 = \frac{\pi_2}{\gamma_2}, which represents the effect of property rights on GDP, is exactly identified.
It is crucial to observe the importance of excluding \log(\text{Mort}) from the \log(\text{pcGDP95}) equation, while including \log(\text{Mort}) in the PAER equation. This is known as the exclusion restriction, which requires the presence of an exogenous source of variability in the PAER equation to help identify the \log(\text{pcGDP95}) equation. The presence of relevant exogenous sources of variability is an essential factor in the identification, estimation, and inference of structural parameters.
As for the identification of the structural parameters in the PAER equation, there are three potential matrices that can be constructed: [-\beta_3], [-\beta_4], and [-\beta_5] (see columns 5, 6, and 7 in the Table). As long as any of these parameters are relevant in the \log(\text{pcGDP95}) equation, the PAER equation is identified. In this case, the PAER equation is over-identified, meaning there are multiple ways to estimate the parameters in this equation. For example, \alpha_2 = \gamma_3/\pi_3 = \gamma_4/\pi_4 = \gamma_5/\pi_5 (see Exercise 2).
In general, recovering the structural parameters from the reduced-form parameters can be challenging due to the need for relevant identification restrictions, which can be difficult to find in some applications.^{39}
For this example, we set non-informative priors: \boldsymbol{B}_0 = \left[\boldsymbol{0}_5 \ \boldsymbol{0}_5\right], \boldsymbol{V}_0 = 100 \boldsymbol{I}_K, \boldsymbol{\Psi}_0 = 5 \boldsymbol{I}_2, and \alpha_0 = 5.^{40} Once our GUI is displayed (see the beginning of this chapter), we should follow the Algorithm below to run multivariate linear models in the GUI (see Chapter 5 for details, particularly on how to set the dataset).
Algorithm: Multivariate Linear Model
Select Multivariate Models on the top panel
Select Simple Multivariate model using the left radio button
Upload the dataset, first selecting whether the file has a header and the type of separator in the csv file (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend
Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
Select the number of dependent variables in the box Number of endogenous variables: m
Select the number of independent variables (including the intercept) in the box Number of exogenous variables: k
Set the hyperparameters: mean vectors, covariance matrix, degrees of freedom, and the scale matrix. This step is not necessary as by default our GUI uses non-informative priors
Click the Go! button
Analyze results
Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons
The following R code shows how to perform the Gibbs sampling algorithm in this example using the dataset 4Institutions.csv. As an exercise, we ask the reader to run this example using the rmultireg command from the bayesm package. We find that the posterior mean of the structural effect of property rights on GDP is 0.98, and the 95% credible interval is (0.56, 2.87). This means that there is evidence supporting a positive effect of property rights on gross domestic product.
rm(list = ls())
set.seed(12345)
DataInst <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/4Institutions.csv", sep = ",", header = TRUE, quote = "")
attach(DataInst)
Y <- cbind(logpcGDP95, PAER)
X <- cbind(1, logMort, Africa, Asia, Other)
M <- dim(Y)[2]
K <- dim(X)[2]
N <- dim(Y)[1]
# Hyperparameters
B0 <- matrix(0, K, M)
c0 <- 100
V0 <- c0*diag(K)
Psi0 <- 5*diag(M)
a0 <- 5
# Posterior parameters
Bhat <- solve(t(X)%*%X)%*%t(X)%*%Y
S <- t(Y - X%*%Bhat)%*%(Y - X%*%Bhat)
Vn <- solve(solve(V0) + t(X)%*%X)
Bn <- Vn%*%(solve(V0)%*%B0 + t(X)%*%X%*%Bhat)
Psin <- Psi0 + S + t(B0)%*%solve(V0)%*%B0 + t(Bhat)%*%t(X)%*%X%*%Bhat - t(Bn)%*%solve(Vn)%*%Bn
an <- a0 + N
#Posterior draws
s <- 10000 #Number of posterior draws
SIGs <- replicate(s, LaplacesDemon::rinvwishart(an, Psin))
BsCond <- sapply(1:s, function(s) {MixMatrix::rmatrixnorm(n = 1, mean=Bn, U = Vn,V = SIGs[,,s])})
summary(coda::mcmc(t(BsCond)))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 10.3789 0.44935 0.0044935 0.0044935
## [2,] -0.4170 0.09972 0.0009972 0.0009972
## [3,] -0.7318 0.24032 0.0024032 0.0024032
## [4,] -0.5840 0.28959 0.0028959 0.0028959
## [5,] 0.2972 0.48248 0.0048248 0.0048248
## [6,] 8.5156 0.75222 0.0075222 0.0075222
## [7,] -0.4283 0.16725 0.0016725 0.0016725
## [8,] -0.2677 0.40099 0.0040099 0.0040361
## [9,] 0.3475 0.48749 0.0048749 0.0048749
## [10,] 1.2454 0.81787 0.0081787 0.0081787
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 9.4953 10.07783 10.3766 10.6768666 11.27717
## var2 -0.6136 -0.48366 -0.4158 -0.3496649 -0.22143
## var3 -1.1985 -0.89381 -0.7327 -0.5695042 -0.25578
## var4 -1.1527 -0.77805 -0.5827 -0.3900056 -0.01057
## var5 -0.6587 -0.02514 0.2945 0.6224207 1.24241
## var6 7.0210 8.01999 8.5116 9.0111246 10.00396
## var7 -0.7558 -0.53971 -0.4285 -0.3165409 -0.10184
## var8 -1.0600 -0.53703 -0.2676 0.0002646 0.52081
## var9 -0.6090 0.02377 0.3500 0.6740956 1.30291
## var10 -0.3446 0.69699 1.2418 1.7939884 2.86752
SIGMs <- t(sapply(1:s, function(l) {gdata::lowerTriangle(SIGs[,,l], diag=TRUE, byrow=FALSE)}))
summary(coda::mcmc(SIGMs))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 0.5381 0.09524 0.0009524 0.0009813
## [2,] 0.5372 0.13144 0.0013144 0.0013144
## [3,] 1.5225 0.26763 0.0026763 0.0026763
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 0.3853 0.4695 0.5267 0.5942 0.7545
## var2 0.3181 0.4449 0.5234 0.6137 0.8359
## var3 1.0863 1.3324 1.4924 1.6799 2.1354
hdiBs <- HDInterval::hdi(t(BsCond), credMass = 0.95) # Highest posterior density credible interval
hdiBs
## [,1] [,2] [,3] [,4] [,5] [,6]
## lower 9.44900 -0.6131229 -1.1769984 -1.16535714 -0.670223 7.041383
## upper 11.22169 -0.2212954 -0.2393919 -0.02786525 1.227650 10.017329
## [,7] [,8] [,9] [,10]
## lower -0.74678771 -1.0419989 -0.6358514 -0.3253181
## upper -0.09525447 0.5322414 1.2748474 2.8767518
## attr(,"credMass")
## [1] 0.95
hdiSIG <- HDInterval::hdi(SIGMs, credMass = 0.95) # Highest posterior density credible interval
hdiSIG
## [,1] [,2] [,3]
## lower 0.3735344 0.2960049 1.023421
## upper 0.7318534 0.8022998 2.034439
## attr(,"credMass")
## [1] 0.95
# Structural effect beta_2 = pi_2/gamma_2: ratio of the reduced-form draws
# (element 2 of each column of BsCond holds pi_2, element 7 holds gamma_2)
summary(coda::mcmc(BsCond[2, ] / BsCond[7, ]))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.9796 16.8430 0.1684 0.1684
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.5604 0.7984 0.9677 1.2329 2.8709
Footnotes

36. This model captures the potential underlying economic relationship between the variables.

37. Simultaneous causality is one of the most controversial causation issues from a philosophy of science perspective. The root of the issue is that causation is typically based on the time sequence of cause and effect.

38. Note that \mathbb{E}[u_{1i}\text{PAER}_i]\neq 0, which means failing to meet a necessary condition for obtaining unbiased and consistent estimators of the structural parameters. See Exercise 1.

39. Good introductory-level textbooks on identification in linear systems include Gujarati and Porter (2009) and Wooldridge (2016).

40. Note that we are setting the priors in the reduced-form model. This may have unintended consequences for the posterior distributions of the structural parameters, which are ultimately the parameters of interest to researchers. For further discussion, see Koop (2003).