7.1 Multivariate regression
A complete presentation of this model is given in Section 3.4; here we restate the setting and the posterior distributions for ease of exposition. There are M dependent variables that share the same set of regressors, and their stochastic errors are contemporaneously correlated. Specifically, \boldsymbol{Y} = \left[ \boldsymbol{y}_{1} \ \boldsymbol{y}_{2} \ \ldots \ \boldsymbol{y}_{M} \right] is an N \times M matrix generated by \boldsymbol{Y} = \boldsymbol{X} \boldsymbol{B} + \boldsymbol{U}, where \boldsymbol{X} is an N \times K matrix of regressors, \boldsymbol{B} = \left[ \boldsymbol{\beta}_{1} \ \boldsymbol{\beta}_{2} \ \ldots \ \boldsymbol{\beta}_{M} \right] is a K \times M matrix of parameters, and \boldsymbol{U} is an N \times M matrix of stochastic errors whose rows \boldsymbol{u}_i^{\top} satisfy \boldsymbol{u}_i \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}), for i = 1, 2, \dots, N.
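As a quick illustration of this data-generating process, the following R snippet (a self-contained sketch using arbitrary values for N, K, M, \boldsymbol{B}, and \boldsymbol{\Sigma}) simulates one dataset in which each row of \boldsymbol{U} is drawn from N(\boldsymbol{0}, \boldsymbol{\Sigma}):

```r
# Simulate Y = X B + U where each row of U ~ N(0, Sigma)
set.seed(10)
N <- 100; K <- 3; M <- 2                              # arbitrary dimensions
X <- cbind(1, matrix(rnorm(N * (K - 1)), N, K - 1))   # regressors with intercept
B <- matrix(c(1, 0.5, -0.2, 2, -1, 0.8), K, M)        # arbitrary coefficients
Sigma <- matrix(c(1, 0.5, 0.5, 2), M, M)              # error covariance
U <- matrix(rnorm(N * M), N, M) %*% chol(Sigma)       # rows have covariance Sigma
Y <- X %*% B + U
```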
The prior is given by \boldsymbol{B} \mid \boldsymbol{\Sigma} \sim N(\boldsymbol{B}_0, \boldsymbol{V}_0, \boldsymbol{\Sigma}) and \boldsymbol{\Sigma} \sim IW(\boldsymbol{\Psi}_0, \alpha_0). Therefore, the conditional posterior distributions are:
\boldsymbol{B} \mid \boldsymbol{\Sigma}, \boldsymbol{Y}, \boldsymbol{X} \sim N(\boldsymbol{B}_n, \boldsymbol{V}_n, \boldsymbol{\Sigma}),
\boldsymbol{\Sigma} \mid \boldsymbol{Y}, \boldsymbol{X} \sim IW(\boldsymbol{\Psi}_n, \alpha_n),
where
\boldsymbol{V}_n = (\boldsymbol{X}^{\top} \boldsymbol{X} + \boldsymbol{V}_0^{-1})^{-1}, \quad \boldsymbol{B}_n = \boldsymbol{V}_n (\boldsymbol{V}_0^{-1} \boldsymbol{B}_0 + \boldsymbol{X}^{\top} \boldsymbol{X} \hat{\boldsymbol{B}}),
\hat{\boldsymbol{B}} = (\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \boldsymbol{X}^{\top} \boldsymbol{Y}, \quad \boldsymbol{S} = (\boldsymbol{Y} - \boldsymbol{X}\hat{\boldsymbol{B}})^{\top}(\boldsymbol{Y} - \boldsymbol{X}\hat{\boldsymbol{B}}), \quad \boldsymbol{\Psi}_n = \boldsymbol{\Psi}_{0} + \boldsymbol{S} + \boldsymbol{B}_{0}^{\top} \boldsymbol{V}_{0}^{-1} \boldsymbol{B}_{0} + \hat{\boldsymbol{B}}^{\top} \boldsymbol{X}^{\top} \boldsymbol{X} \hat{\boldsymbol{B}} - \boldsymbol{B}_n^{\top} \boldsymbol{V}_n^{-1} \boldsymbol{B}_n,
and
\alpha_n = \alpha_0 + N.
We can use a Gibbs sampling algorithm in this model since the conditional posterior distributions are standard.
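Since \boldsymbol{\Sigma} \mid \boldsymbol{Y}, \boldsymbol{X} is inverse-Wishart and \boldsymbol{B} \mid \boldsymbol{\Sigma}, \boldsymbol{Y}, \boldsymbol{X} is matrix normal, each draw reduces to two standard steps. The base-R sketch below shows one such draw, assuming the posterior parameters \boldsymbol{B}_n, \boldsymbol{V}_n, \boldsymbol{\Psi}_n, and \alpha_n have already been computed; the inverse-Wishart draw is obtained by inverting a Wishart draw, and the matrix normal draw uses Cholesky factors of the row and column covariances:

```r
# One draw from the conditional posteriors of (B, Sigma),
# assuming Bn (K x M), Vn (K x K), Psin (M x M), and an are precomputed
draw_posterior <- function(Bn, Vn, Psin, an) {
  K <- nrow(Bn); M <- ncol(Bn)
  # Sigma | Y ~ IW(Psin, an): invert a Wishart(an, Psin^{-1}) draw
  Sigma <- solve(stats::rWishart(1, df = an, Sigma = solve(Psin))[, , 1])
  # B | Sigma, Y ~ MN(Bn, Vn, Sigma): Bn + A Z C with A A' = Vn, C' C = Sigma
  Z <- matrix(rnorm(K * M), K, M)
  B <- Bn + t(chol(Vn)) %*% Z %*% chol(Sigma)
  list(B = B, Sigma = Sigma)
}
```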
Example: The effect of institutions on per capita gross domestic product
To illustrate multivariate regression models, we use the dataset provided by Acemoglu, Johnson, and Robinson (2001), who analyzed the effect of property rights on economic growth.
We begin with the following simultaneous structural economic model:^{36} \begin{align} \log(\text{pcGDP95}_i) &= \beta_1 + \beta_2 \text{PAER}_i + \beta_3 \text{Africa}_i + \beta_4 \text{Asia}_i + \beta_5 \text{Other}_i + u_{1i}, \tag{7.1} \end{align} \begin{align} \text{PAER}_i &= \alpha_1 + \alpha_2 \log(\text{pcGDP95}_i) + \alpha_3 \log(\text{Mort}_i) + u_{2i}, \tag{7.2} \end{align} where pcGDP95, PAER, and Mort represent the per capita gross domestic product (GDP) in 1995, the average index of protection against expropriation between 1985 and 1995, and the settler mortality rate during the period of colonization, respectively. Africa, Asia, and Other are indicator variables for continents, with America serving as the baseline group.
In this model, there is reverse (simultaneous) causality due to the contemporaneous effect of GDP on PAER, and vice versa.^{37} Therefore, estimating Equations (7.1) and (7.2) without accounting for this phenomenon results in posterior mean estimates that are biased and inconsistent from a sampling (frequentist) perspective.^{38} A potential strategy to address this issue is to estimate the reduced-form model, i.e., a model without simultaneous causality, where all endogenous variables are functions of exogenous variables. The former are determined within the model (e.g., \log(\text{pcGDP95}_i) and \text{PAER}_i in this example), while the latter are determined outside the model (e.g., \log(\text{Mort}_i), Africa, Asia, and Other in this example).
Substituting Equation (7.2) into Equation (7.1), and solving for \log(\text{pcGDP95}), we obtain \begin{align} \log(\text{pcGDP95}_i)=\pi_1+\pi_2\log(\text{Mort}_i)+\pi_3 \text{Africa}_i+\pi_4 \text{Asia}_i+\pi_5 \text{Other}_i+e_{1i}. \tag{7.3} \end{align} Then, substituting Equation (7.3) into Equation (7.2), and solving for \text{PAER}, we obtain \begin{align} \text{PAER}_i = \gamma_1 + \gamma_2 \log(\text{Mort}_i) + \gamma_3 \text{Africa}_i + \gamma_4 \text{Asia}_i + \gamma_5 \text{Other}_i + e_{2i}, \tag{7.4} \end{align} where \pi_2 = \frac{\beta_2\alpha_3}{1 - \beta_2\alpha_2} and \gamma_2 = \frac{\alpha_3}{1 - \beta_2\alpha_2}, provided that \beta_2 \alpha_2 \neq 1, i.e., the equations are independent (see Exercise 2).
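A quick numerical check of these mappings, using hypothetical values for the structural coefficients, confirms that the ratio \pi_2/\gamma_2 recovers \beta_2:

```r
# Hypothetical structural values (for illustration only)
beta2 <- 0.9; alpha2 <- 0.3; alpha3 <- -0.6
pi2    <- beta2 * alpha3 / (1 - beta2 * alpha2)  # reduced-form slope in Eq. (7.3)
gamma2 <- alpha3 / (1 - beta2 * alpha2)          # reduced-form slope in Eq. (7.4)
pi2 / gamma2                                     # recovers beta2 = 0.9
```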
Observe that Equations (7.3) and (7.4) have the form of a multivariate regression model, where the common set of regressors is \boldsymbol{X} = \left[\boldsymbol{1} \ \log(\text{Mort}) \ \text{Africa} \ \text{Asia} \ \text{Other}\right] (including the intercept) and the common set of dependent variables is \boldsymbol{Y} = \left[\log(\text{pcGDP95}) \ \text{PAER}\right]. Therefore, we can estimate this model using the setup outlined in this section.
In the first stage, we estimate the parameters of the reduced-form model (Equations (7.3) and (7.4)), but the main interest lies in estimating the parameters of the structural model (Equations (7.1) and (7.2)). A valid question is whether we can recover (identify) the structural parameters from the reduced-form parameters. There are two criteria to answer this question: the order condition, which is necessary, and the rank condition, which is both necessary and sufficient.
The order condition
Given a system of equations with M endogenous variables, and K exogenous variables (including the intercept), there are two ways to assess the order condition:
- The parameters of an equation in the system are identified if there are at least M-1 variables excluded from the equation (exclusion restrictions). The equation is exactly identified if the number of excluded variables equals M-1, and over-identified if it exceeds M-1.
- The parameters of equation m in the system are identified if K-K_m\geq M_m-1, where K_m and M_m are the number of exogenous and endogenous variables in equation m, respectively. The m-th equation is exactly identified if K-K_m = M_m-1, and over-identified if K-K_m > M_m-1.
We can see from Equations (7.1) and (7.2) in this example that K=5, M=2, K_1=4, K_2=2, M_1=2, and M_2=2. This means that K-K_1=1=M_1-1 and K-K_2=3>M_2-1=1; that is, the order condition says that both equations satisfy the necessary condition of identification: the first equation would be exactly identified, and the second equation would be over-identified. Observe that there is one variable excluded from the first equation, and three variables excluded from the second equation.
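The second version of the order condition is easy to mechanize. The helper below (a hypothetical function written for this illustration, not part of the GUI) classifies equation m from K, K_m, and M_m:

```r
# Order condition: compare excluded exogenous variables (K - Km) with Mm - 1
order_condition <- function(K, Km, Mm) {
  excluded <- K - Km
  if (excluded < Mm - 1)       "not identified"
  else if (excluded == Mm - 1) "exactly identified"
  else                         "over-identified"
}
order_condition(K = 5, Km = 4, Mm = 2)  # Equation (7.1): "exactly identified"
order_condition(K = 5, Km = 2, Mm = 2)  # Equation (7.2): "over-identified"
```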
The rank condition
The rank condition (necessary and sufficient) says that given a structural model with M equations (M endogenous variables), an equation is identified if and only if there is at least one determinant different from zero from a (M-1)\times(M-1) matrix built using the excluded variables in the analyzed equation, but included in any other equation of the system.
It is useful to build the identification matrix to implement the rank condition. The following table shows this matrix for this example.
\log(\text{pcGDP95}) | \text{PAER} | Constant | \log(\text{Mort}) | Africa | Asia | Other |
1 | -\beta_2 | -\beta_1 | 0 | -\beta_3 | -\beta_4 | -\beta_5 |
-\alpha_2 | 1 | -\alpha_1 | -\alpha_3 | 0 | 0 | 0 |
The only excluded variable in the \log(\text{pcGDP95}) equation is \log(\text{Mort}). Therefore, there is only one matrix that can be constructed using the excluded variables from this equation, which is [-\alpha_3] (see column 4 in the Table). The determinant of this matrix is -\alpha_3, and as long as this coefficient is nonzero (i.e., \alpha_3 \neq 0), meaning that the mortality rate is relevant in the PAER equation, the coefficients in the \log(\text{pcGDP95}) equation are exactly identified. For example, \beta_2 = \frac{\pi_2}{\gamma_2}, which represents the effect of property rights on GDP, is exactly identified.
It is crucial to observe the importance of excluding \log(\text{Mort}) from the \log(\text{pcGDP95}) equation, while including \log(\text{Mort}) in the PAER equation. This is known as the exclusion restriction, which requires the presence of an exogenous source of variability in the PAER equation to help identify the \log(\text{pcGDP95}) equation. The presence of relevant exogenous sources of variability is an essential factor in the identification, estimation, and inference of structural parameters.
As for the identification of the structural parameters in the PAER equation, there are three potential matrices that can be constructed: [-\beta_3], [-\beta_4], and [-\beta_5] (see columns 5, 6, and 7 in the Table). As long as any of these parameters are relevant in the \log(\text{pcGDP95}) equation, the PAER equation is identified. In this case, the PAER equation is over-identified, meaning there are multiple ways to estimate the parameters in this equation. For example, \alpha_2 = \gamma_3/\pi_3 = \gamma_4/\pi_4 = \gamma_5/\pi_5 (see Exercise 2).
In general, recovering the structural parameters from the reduced-form parameters can be challenging due to the need for relevant identification restrictions, which can be difficult to find in some applications.^{39}
For this example, we set non-informative priors: \boldsymbol{B}_0 = \left[\boldsymbol{0}_5 \ \boldsymbol{0}_5\right], \boldsymbol{V}_0 = 100 \boldsymbol{I}_K, \boldsymbol{\Psi}_0 = 5 \boldsymbol{I}_2, and \alpha_0 = 5.^{40} Once our GUI is displayed (see the beginning of this chapter), we should follow the Algorithm below to run multivariate linear models in the GUI (see Chapter 5 for details, particularly on how to set the dataset).
Algorithm: Multivariate Linear Model
Select Multivariate Models on the top panel
Select Simple Multivariate model using the left radio button
Upload the dataset, first selecting whether the file has a header and the type of separator in the csv file (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend
Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
Select the number of dependent variables in the box Number of endogenous variables: m
Select the number of independent variables (including the intercept) in the box Number of exogenous variables: k
Set the hyperparameters: mean vectors, covariance matrix, degrees of freedom, and the scale matrix. This step is not necessary as by default our GUI uses non-informative priors
Click the Go! button
Analyze results
Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons
The following R code shows how to perform the Gibbs sampling algorithm in this example using the dataset 4Institutions.csv. As an exercise, we ask the reader to run this example using the rmultireg command from the bayesm package. We find that the posterior mean of the structural effect of property rights on GDP is 0.98, and the 95% credible interval is (0.56, 2.87). This means that there is evidence supporting a positive effect of property rights on gross domestic product.
rm(list = ls())
set.seed(12345)
DataInst <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/4Institutions.csv", sep = ",", header = TRUE, quote = "")
attach(DataInst)
Y <- cbind(logpcGDP95, PAER)
X <- cbind(1, logMort, Africa, Asia, Other)
M <- dim(Y)[2]
K <- dim(X)[2]
N <- dim(Y)[1]
# Hyperparameters
B0 <- matrix(0, K, M)
c0 <- 100
V0 <- c0*diag(K)
Psi0 <- 5*diag(M)
a0 <- 5
# Posterior parameters
Bhat <- solve(t(X)%*%X)%*%t(X)%*%Y
S <- t(Y - X%*%Bhat)%*%(Y - X%*%Bhat)
Vn <- solve(solve(V0) + t(X)%*%X)
Bn <- Vn%*%(solve(V0)%*%B0 + t(X)%*%X%*%Bhat)
Psin <- Psi0 + S + t(B0)%*%solve(V0)%*%B0 + t(Bhat)%*%t(X)%*%X%*%Bhat - t(Bn)%*%solve(Vn)%*%Bn
an <- a0 + N
#Posterior draws
s <- 10000 #Number of posterior draws
SIGs <- replicate(s, LaplacesDemon::rinvwishart(an, Psin))
BsCond <- sapply(1:s, function(s) {MixMatrix::rmatrixnorm(n = 1, mean=Bn, U = Vn,V = SIGs[,,s])})
summary(coda::mcmc(t(BsCond)))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 10.3789 0.44935 0.0044935 0.0044935
## [2,] -0.4170 0.09972 0.0009972 0.0009972
## [3,] -0.7318 0.24032 0.0024032 0.0024032
## [4,] -0.5840 0.28959 0.0028959 0.0028959
## [5,] 0.2972 0.48248 0.0048248 0.0048248
## [6,] 8.5156 0.75222 0.0075222 0.0075222
## [7,] -0.4283 0.16725 0.0016725 0.0016725
## [8,] -0.2677 0.40099 0.0040099 0.0040361
## [9,] 0.3475 0.48749 0.0048749 0.0048749
## [10,] 1.2454 0.81787 0.0081787 0.0081787
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 9.4953 10.07783 10.3766 10.6768666 11.27717
## var2 -0.6136 -0.48366 -0.4158 -0.3496649 -0.22143
## var3 -1.1985 -0.89381 -0.7327 -0.5695042 -0.25578
## var4 -1.1527 -0.77805 -0.5827 -0.3900056 -0.01057
## var5 -0.6587 -0.02514 0.2945 0.6224207 1.24241
## var6 7.0210 8.01999 8.5116 9.0111246 10.00396
## var7 -0.7558 -0.53971 -0.4285 -0.3165409 -0.10184
## var8 -1.0600 -0.53703 -0.2676 0.0002646 0.52081
## var9 -0.6090 0.02377 0.3500 0.6740956 1.30291
## var10 -0.3446 0.69699 1.2418 1.7939884 2.86752
SIGMs <- t(sapply(1:s, function(l) {gdata::lowerTriangle(SIGs[,,l], diag=TRUE, byrow=FALSE)}))
summary(coda::mcmc(SIGMs))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 0.5381 0.09524 0.0009524 0.0009813
## [2,] 0.5372 0.13144 0.0013144 0.0013144
## [3,] 1.5225 0.26763 0.0026763 0.0026763
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 0.3853 0.4695 0.5267 0.5942 0.7545
## var2 0.3181 0.4449 0.5234 0.6137 0.8359
## var3 1.0863 1.3324 1.4924 1.6799 2.1354
hdiBs <- HDInterval::hdi(t(BsCond), credMass = 0.95) # Highest posterior density credible interval
hdiBs
## [,1] [,2] [,3] [,4] [,5] [,6]
## lower 9.44900 -0.6131229 -1.1769984 -1.16535714 -0.670223 7.041383
## upper 11.22169 -0.2212954 -0.2393919 -0.02786525 1.227650 10.017329
## [,7] [,8] [,9] [,10]
## lower -0.74678771 -1.0419989 -0.6358514 -0.3253181
## upper -0.09525447 0.5322414 1.2748474 2.8767518
## attr(,"credMass")
## [1] 0.95
hdiSIG <- HDInterval::hdi(SIGMs, credMass = 0.95) # Highest posterior density credible interval
hdiSIG
## [,1] [,2] [,3]
## lower 0.3735344 0.2960049 1.023421
## upper 0.7318534 0.8022998 2.034439
## attr(,"credMass")
## [1] 0.95
# Structural effect beta_2 = pi_2/gamma_2: ratio of the reduced-form draws
# (element 2 of each column of BsCond holds pi_2, element 7 holds gamma_2)
summary(coda::mcmc(BsCond[2, ] / BsCond[7, ]))
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.9796 16.8430 0.1684 0.1684
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.5604 0.7984 0.9677 1.2329 2.8709
Footnotes

36. This model captures the potential underlying economic relationship between the variables.

37. Simultaneous causality is one of the most controversial causation issues from a philosophy of science perspective. The root of the issue is that causation is typically based on the time sequence of cause and effect.

38. Note that \mathbb{E}[u_{1i}\text{PAER}_i]\neq 0, which means failing to meet a necessary condition for obtaining unbiased and consistent estimators of the structural parameters. See Exercise 1.

39. Good introductory-level textbooks on identification in linear systems include Gujarati and Porter (2009) and Wooldridge (2016).

40. Note that we are setting the priors in the reduced-form model. This may have unintended consequences for the posterior distributions of the structural parameters, which are ultimately the parameters of interest to researchers. For further discussion, see Koop (2003).