9.1 Normal model
The longitudinal/panel normal model is given by \boldsymbol{y}_i=\boldsymbol{X}_i\boldsymbol{\beta}+\boldsymbol{W}_i\boldsymbol{b}_i+\boldsymbol{\mu}_i, where \boldsymbol{y}_i is a T_i-dimensional vector corresponding to unit i=1,2,\dots,N, and \boldsymbol{X}_i and \boldsymbol{W}_i are T_i\times K_1 and T_i\times K_2 matrices, respectively. In the statistical literature, \boldsymbol{\beta} is a K_1-dimensional vector of fixed effects, and \boldsymbol{b}_i is a K_2-dimensional vector of unit-specific random effects that allows for unit-specific means and captures the marginal dependence among the observations within each cross-sectional unit. We assume normal stochastic errors, \boldsymbol{\mu}_i\sim{N}(\boldsymbol{0},\sigma^2\boldsymbol{I}_{T_i}), which means that the likelihood function is
\begin{align*} p(\boldsymbol{\beta},\boldsymbol{b},\sigma^2\mid \boldsymbol{y}, \boldsymbol{X},\boldsymbol{W}) & \propto \prod_{i=1}^N |\sigma^2\boldsymbol{I}_{T_i}|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)\right\}\\ & = (\sigma^2)^{-\frac{\sum_{i=1}^N T_i}{2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)\right\}, \end{align*}
where \boldsymbol{b}=[\boldsymbol{b}_1^{\top}, \boldsymbol{b}_2^{\top},\dots, \boldsymbol{b}_N^{\top}]^{\top}.
Panel data modeling in the Bayesian approach assumes a hierarchical structure for the random effects. Following S. Chib and Carlin (1999), in a first stage \boldsymbol{b}_i\sim{N}(\boldsymbol{0},\boldsymbol{D}), where \boldsymbol{D} allows for serial correlation within each cross-sectional unit i, and in a second stage \boldsymbol{D}\sim{I}{W}(d_0,d_0\boldsymbol{D}_0). Thus, there is an additional layer of priors, since we place a prior on the hyperparameter \boldsymbol{D}.
In addition, we have standard conjugate prior distributions for \boldsymbol{\beta} and \sigma^2: \boldsymbol{\beta} \sim {N}(\boldsymbol{\beta}_0,\boldsymbol{B}_0) and \sigma^2 \sim {I}{G}(\alpha_0, \delta_0).
S. Chib and Carlin (1999) propose a blocking algorithm to perform inference in longitudinal hierarchical models by considering the distribution of \boldsymbol{y}_i marginalized over the random effects. Given that \boldsymbol{y}_i\mid \boldsymbol{\beta},\boldsymbol{b}_i,\sigma^2,\boldsymbol{X}_i,\boldsymbol{W}_i\sim N(\boldsymbol{X}_i\boldsymbol{\beta}+\boldsymbol{W}_i\boldsymbol{b}_i,\sigma^2\boldsymbol{I}_{T_i}), we have \boldsymbol{y}_i\mid \boldsymbol{\beta},\boldsymbol{D},\sigma^2,\boldsymbol{X}_i,\boldsymbol{W}_i\sim{N}(\boldsymbol{X}_i\boldsymbol{\beta},\boldsymbol{V}_i), where \boldsymbol{V}_i=\sigma^2\boldsymbol{I}_{T_i}+\boldsymbol{W}_i\boldsymbol{D}\boldsymbol{W}_i^{\top}, given that \mathbb{E}[\boldsymbol{b}_i]=\boldsymbol{0} and Var[\boldsymbol{b}_i]=\boldsymbol{D}. If we have just random intercepts, then \boldsymbol{W}_i=\boldsymbol{i}_{T_i}, where \boldsymbol{i}_{T_i} is a T_i-dimensional vector of ones. Thus, \boldsymbol{V}_i=\sigma^2\boldsymbol{I}_{T_i}+\sigma_{b}^2\boldsymbol{i}_{T_i}\boldsymbol{i}_{T_i}^{\top}, so that each observation has variance \sigma^2+\sigma^2_{b} and any two observations within the same cross-sectional unit have covariance \sigma^2_{b} through time.
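As a quick illustration of this covariance structure, the following R snippet builds \boldsymbol{V}_i for the random-intercept case and computes the implied intra-class correlation; the values of \sigma^2, \sigma^2_{b} and T_i are arbitrary and chosen only for illustration.
# Illustration only: marginal covariance V_i in the random-intercept case
sig2 <- 0.5; sig2b <- 1.5; Ti <- 4       # arbitrary values for sigma^2, sigma^2_b and T_i
iTi <- rep(1, Ti)                        # T_i-dimensional vector of ones
Vi <- sig2*diag(Ti) + sig2b*iTi%*%t(iTi)
Vi                                       # diagonal: sig2 + sig2b; off-diagonal: sig2b
sig2b/(sig2 + sig2b)                     # implied intra-class (within-unit) correlation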
We can deduce the posterior distribution of \boldsymbol{\beta} given \sigma^2 and \boldsymbol{D},
\begin{align*}
\pi(\boldsymbol{\beta}\mid \sigma^2, \boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W}) & \propto \exp\left\{-\frac{1}{2}\sum_{i=1}^N(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta})^{\top}\boldsymbol{V}_i^{-1}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta})\right\}\\
&\times \exp\left\{-\frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)^{\top}\boldsymbol{B}_0^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)\right\}.
\end{align*}
This implies that (see Exercise 1)
\begin{equation*}
\boldsymbol{\beta}\mid \sigma^2,\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {N}(\boldsymbol{\beta}_n,\boldsymbol{B}_n),
\end{equation*}
where \boldsymbol{B}_n = (\boldsymbol{B}_0^{-1} +\sum_{i=1}^N \boldsymbol{X}_i^{\top}\boldsymbol{V}_i^{-1}\boldsymbol{X}_i)^{-1}, \boldsymbol{\beta}_n= \boldsymbol{B}_n(\boldsymbol{B}_0^{-1}\boldsymbol{\beta}_0 + \sum_{i=1}^N\boldsymbol{X}_i^{\top}\boldsymbol{V}_i^{-1}\boldsymbol{y}_i).
We can use the likelihood p(\boldsymbol{\beta},\boldsymbol{b},\sigma^2\mid \boldsymbol{y}, \boldsymbol{X},\boldsymbol{W}) to get the posterior distributions of \boldsymbol{b}_i, \sigma^2 and \boldsymbol{D}. Since both the likelihood and the prior of the random effects factorize over the cross-sectional units, the conditional posterior of \boldsymbol{b}_i involves only the terms of unit i. In particular, \begin{align*} \pi(\boldsymbol{b}_i\mid \boldsymbol{\beta},\sigma^2,\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W})&\propto \exp\left\{-\frac{1}{2\sigma^2}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)\right\}\\ &\times \exp\left\{-\frac{1}{2} \boldsymbol{b}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{b}_i\right\}\\ &\propto\exp\left\{-\frac{1}{2}\left(-2\boldsymbol{b}_i^{\top}(\sigma^{-2}\boldsymbol{W}_i^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}))+ \boldsymbol{b}_i^{\top}(\sigma^{-2}\boldsymbol{W}_i^{\top}\boldsymbol{W}_i+\boldsymbol{D}^{-1})\boldsymbol{b}_i\right)\right\}\\ &\propto\exp\left\{-\frac{1}{2}\left(-2\boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{B}_{ni}(\sigma^{-2}\boldsymbol{W}_i^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}))+ \boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_i\right)\right\}\\ &=\exp\left\{-\frac{1}{2}\left(-2\boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_{ni}+ \boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_i\right)\right\}, \end{align*} where \boldsymbol{B}_{ni}=(\sigma^{-2}\boldsymbol{W}_i^{\top}\boldsymbol{W}_i+\boldsymbol{D}^{-1})^{-1} and \boldsymbol{b}_{ni}=\boldsymbol{B}_{ni}(\sigma^{-2}\boldsymbol{W}_i^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta})).
We can complete the square in this expression by adding and subtracting \boldsymbol{b}_{ni}^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_{ni}. Thus, \begin{align*} \pi(\boldsymbol{b}_i\mid \boldsymbol{\beta},\sigma^2,\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W})&\propto \exp\left\{-\frac{1}{2}(-2\boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_{ni}+ \boldsymbol{b}_i^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_i+\boldsymbol{b}_{ni}^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_{ni}-\boldsymbol{b}_{ni}^{\top}\boldsymbol{B}_{ni}^{-1}\boldsymbol{b}_{ni})\right\}\\ &\propto \exp\left\{-\frac{1}{2}(\boldsymbol{b}_i-\boldsymbol{b}_{ni})^{\top}\boldsymbol{B}_{ni}^{-1}(\boldsymbol{b}_i-\boldsymbol{b}_{ni})\right\}. \end{align*} This is the kernel of a multivariate normal distribution with mean \boldsymbol{b}_{ni} and variance \boldsymbol{B}_{ni}. Thus, \begin{equation*} \boldsymbol{b}_i\mid \boldsymbol{\beta},\sigma^2,\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {N}(\boldsymbol{b}_{ni},\boldsymbol{B}_{ni}). \end{equation*} Let us now obtain the conditional posterior distribution of \sigma^2, \begin{align*} \pi(\sigma^2\mid \boldsymbol{\beta},\boldsymbol{b},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W})&\propto (\sigma^2)^{-\frac{\sum_{i=1}^N T_i}{2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)\right\}\\ &\times (\sigma^2)^{-\alpha_0-1}\exp\left\{-\frac{\delta_0}{\sigma^2}\right\}\\ &=(\sigma^2)^{-\frac{\sum_{i=1}^N T_i}{2}-\alpha_0-1}\\ &\times \exp\left\{-\frac{1}{\sigma^2}\left(\delta_0+\sum_{i=1}^N\frac{(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)}{2}\right)\right\}. \end{align*} Thus, \begin{equation*} \sigma^2\mid \boldsymbol{\beta}, \boldsymbol{b}, \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {I}{G}(\alpha_n, \delta_n), \end{equation*} where \alpha_n=\alpha_0+\frac{1}{2}\sum_{i=1}^N T_i and \delta_n=\delta_0+\frac{1}{2}\sum_{i=1}^N(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i).
The posterior distribution of \boldsymbol{D} is the following,
\begin{align*}
\pi(\boldsymbol{D}\mid \boldsymbol{b})&\propto |\boldsymbol{D}|^{-N/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^N \boldsymbol{b}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{b}_i\right\}\\
&\times |\boldsymbol{D}|^{-(d_0+K_2+1)/2}\exp\left\{-\frac{1}{2}tr(d_0\boldsymbol{D}_0\boldsymbol{D}^{-1})\right\}\\
&=|\boldsymbol{D}|^{-(d_0+N+K_2+1)/2}\exp\left\{-\frac{1}{2}tr\left(\left(d_0\boldsymbol{D}_0+\sum_{i=1}^N\boldsymbol{b}_i\boldsymbol{b}_i^{\top}\right)\boldsymbol{D}^{-1}\right) \right\}.
\end{align*}
This is the kernel of an inverse Wishart distribution with degrees of freedom d_n=d_0+N and scale matrix \boldsymbol{D}_n=d_0\boldsymbol{D}_0+\sum_{i=1}^N\boldsymbol{b}_i\boldsymbol{b}_i^{\top}. Thus,
\begin{equation*}
\boldsymbol{D}\mid \boldsymbol{b} \sim {I}{W}(d_n, \boldsymbol{D}_n).
\end{equation*}
Observe that the posterior distribution of \boldsymbol{D} depends only on \boldsymbol{b}.
All the conditional posterior distributions belong to standard families, which implies that we can use a Gibbs sampling algorithm to perform inference in these hierarchical normal models.
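To make the Gibbs sampler concrete, the following sketch performs one iteration of the algorithm for the homoskedastic model using the conditional posteriors above. It is a minimal template, not the implementation used by our GUI: it assumes that \boldsymbol{y}_i, \boldsymbol{X}_i and \boldsymbol{W}_i are stored in lists y, X and W of length N, and that the hyperparameters b0, B0 (prior mean and covariance of \boldsymbol{\beta}), a0, d0 (inverse gamma) and r0, R0 (which play the role of d_0 and \boldsymbol{D}_0 in the inverse Wishart prior) are already defined. The function name GibbsStep is ours, introduced only for exposition.
# Sketch: one Gibbs iteration for the hierarchical longitudinal normal model
GibbsStep <- function(Beta, bs, sig2, D, y, X, W){
N <- length(y); K1 <- length(Beta)
# 1. beta | sig2, D: uses the marginal covariance V_i = sig2*I + W_i D W_i'
XVX <- matrix(0, K1, K1); XVy <- matrix(0, K1, 1)
for(i in 1:N){
ViInv <- solve(sig2*diag(nrow(X[[i]])) + W[[i]]%*%D%*%t(W[[i]]))
XVX <- XVX + t(X[[i]])%*%ViInv%*%X[[i]]
XVy <- XVy + t(X[[i]])%*%ViInv%*%y[[i]]
}
Bn <- solve(solve(B0) + XVX)
Beta <- MASS::mvrnorm(1, Bn%*%(solve(B0)%*%b0 + XVy), Bn)
# 2. b_i | beta, sig2, D, one unit at a time
for(i in 1:N){
Bni <- solve(t(W[[i]])%*%W[[i]]/sig2 + solve(D))
bni <- Bni%*%(t(W[[i]])%*%(y[[i]] - X[[i]]%*%Beta)/sig2)
bs[i, ] <- MASS::mvrnorm(1, bni, Bni)
}
# 3. sig2 | beta, b (inverse gamma)
an <- a0 + 0.5*sum(sapply(y, length))
dn <- d0 + 0.5*sum(sapply(1:N, function(i){
ei <- y[[i]] - X[[i]]%*%Beta - W[[i]]%*%bs[i, ]; sum(ei^2)}))
sig2 <- MCMCpack::rinvgamma(1, shape = an, scale = dn)
# 4. D | b: inverse Wishart with d_n = r0 + N and D_n = r0*R0 + sum_i b_i b_i'
D <- MCMCpack::riwish(v = r0 + N, S = r0*R0 + t(bs)%*%bs)
return(list(Beta = Beta, bs = bs, sig2 = sig2, D = D))
}
Iterating GibbsStep many times and discarding a burn-in period yields draws from the joint posterior distribution.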
Example: The relation between productivity and public investment
We use the dataset 8PublicCap.csv, also employed by Ramírez Hassan (2017), to analyze the relation between public investment and gross state product in the setting of a spatial panel dataset consisting of 48 US states from 1970 to 1986. In particular, we perform inference based on the following equation \begin{equation*} \log(\text{gsp}_{it})=b_i+\beta_1+\beta_2\log(\text{pcap}_{it})+\beta_3\log(\text{pc}_{it})+\beta_4\log(\text{emp}_{it})+\beta_5\text{unemp}_{it}+\mu_{it}, \end{equation*}
where gsp is the gross state product, pcap is public capital, and pc is private capital, all in USD; emp is employment (people), and unemp is the unemployment rate in percent.
The following Algorithm shows how to perform inference in hierarchical longitudinal normal models in our GUI. See also Chapter 5 for details regarding the dataset structure.
Algorithm: Hierarchical Longitudinal Normal Models
1. Select Hierarchical Longitudinal Model on the top panel.
2. Select Normal model using the left radio button.
3. Upload the dataset, indicating first whether the file has a header and the kind of separator used in the csv file (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend.
4. Select the MCMC iterations, burn-in, and thinning parameters using the Range sliders.
5. Write down the formula of the fixed effects equation in the Main Equation: Fixed Effects box. This formula must be written using the syntax of the formula command in R. The intercept is included by default, so do not add it to the equation.
6. Write down the formula of the random effects equation in the Main Equation: Random Effects box without the dependent variable, that is, starting the equation with the tilde ("~") symbol. This formula must be written using the syntax of the formula command in R. The intercept is included by default, so do not add it to the equation. If there are only random intercepts, leave this box empty.
7. Write down the name of the grouping variable, that is, the variable that indicates the cross-sectional units.
8. Set the hyperparameters of the fixed effects: mean vector, covariance matrix, shape, and scale parameters. This step is not necessary, as our GUI uses non-informative priors by default.
9. Set the hyperparameters of the random effects: degrees of freedom and scale matrix of the inverse Wishart distribution. This step is not necessary, as our GUI uses non-informative priors by default.
10. Click the Go! button.
11. Analyze results.
12. Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons.
We ask in Exercise 2 to run this application in our GUI using 10,000 MCMC iterations plus a burn-in equal to 5,000 iterations, and a thinning parameter equal to 1. We also used the default values for the hyperparameters of the prior distributions, that is, \boldsymbol{\beta}_0=\boldsymbol{0}_5, \boldsymbol{B}_0=\boldsymbol{I}_5, \alpha_0=\delta_0=0.001, d_0=5 and \boldsymbol{D}_0=\boldsymbol{I}_1. It seems that all posterior draws come from stationary distributions, as suggested by the diagnostics and posterior plots (see Exercise 2).
The following code uses the command MCMChregress from the package MCMCpack to run this application. This command is also used by our GUI to perform inference in hierarchical longitudinal normal models.
We can see that the 95% symmetric credible intervals for public capital, private capital, employment, and unemployment are (-2.54e-02, -2.06e-02), (2.92e-01, 2.96e-01), (7.62e-01, 7.67e-01) and (-5.47e-03, -5.31e-03), respectively. The posterior mean estimate of the elasticity of GSP with respect to public capital is -0.023, that is, a 1% increase in public capital is associated with a 0.023% decrease in gross state product. The posterior mean estimates of the private capital and employment elasticities are 0.294 and 0.765, respectively. In addition, a 1 percentage point increase in the unemployment rate is associated with a decrease of approximately 0.54% in GSP. All these variables appear to be statistically relevant, since none of the 95% credible intervals includes zero.
In addition, the posterior mean estimates of the variance associated with the unobserved heterogeneity and the variance of the stochastic errors are 1.06e-01 and 1.45e-03, respectively. We also obtained the posterior chain of the proportion of the total variance attributable to the unobserved heterogeneity; the 95% symmetric credible interval for this proportion is (0.98, 0.99), that is, unobserved heterogeneity accounts for most of the total variability. A short sketch computing this proportion from the posterior chains is shown after the code output below.
rm(list = ls())
set.seed(12345)
DataGSP <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/8PublicCap.csv", sep = ",", header = TRUE, quote = "")
attach(DataGSP)
K1 <- 5; K2 <- 1                  # number of fixed and random effects
b0 <- rep(0, K1); B0 <- diag(K1)  # prior mean and covariance of the fixed effects
r0 <- 5; R0 <- diag(K2)           # prior degrees of freedom and scale matrix of the inverse Wishart
a0 <- 0.001; d0 <- 0.001          # prior shape and scale of the inverse gamma for sigma^2
# Pass the prior mean and covariance of the fixed effects stated above (mubeta, Vbeta)
Resultshreg <- MCMCpack::MCMChregress(fixed = log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, random = ~1, group = "id", data = DataGSP, burnin = 5000, mcmc = 10000, thin = 1, mubeta = b0, Vbeta = B0, r = r0, R = R0, nu = a0, delta = d0)
Betas <- Resultshreg[["mcmc"]][,1:K1]      # posterior chains of the fixed effects
Sigma2RanEff <- Resultshreg[["mcmc"]][,54] # chain of the random-effects variance (after the 5 fixed effects and 48 random intercepts)
Sigma2 <- Resultshreg[["mcmc"]][,55]       # chain of the error variance sigma^2
summary(Betas)
##
## Iterations = 5001:15000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 2.330139 7.940e-03 7.940e-05 7.940e-05
## beta.log(pcap) -0.023082 1.225e-03 1.225e-05 1.225e-05
## beta.log(pc) 0.293729 1.007e-03 1.007e-05 1.007e-05
## beta.log(emp) 0.764645 1.333e-03 1.333e-05 1.340e-05
## beta.unemp -0.005387 4.114e-05 4.114e-07 4.071e-07
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) 2.314556 2.324656 2.330193 2.33567 2.345548
## beta.log(pcap) -0.025442 -0.023926 -0.023104 -0.02227 -0.020655
## beta.log(pc) 0.291738 0.293068 0.293721 0.29441 0.295716
## beta.log(emp) 0.761969 0.763755 0.764650 0.76554 0.767220
## beta.unemp -0.005468 -0.005414 -0.005387 -0.00536 -0.005307
summary(Sigma2RanEff)
##
## Iterations = 5001:15000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.1058028 0.0215133 0.0002151 0.0002151
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.07208 0.09086 0.10331 0.11751 0.15600
summary(Sigma2)
##
## Iterations = 5001:15000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 1.453e-03 7.361e-05 7.361e-07 7.698e-07
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.001316 0.001403 0.001451 0.001501 0.001606
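The proportion of the total variance attributable to unobserved heterogeneity mentioned above can be recovered directly from the posterior chains. A short sketch, assuming the objects Sigma2RanEff and Sigma2 created earlier:
PropHet <- Sigma2RanEff/(Sigma2RanEff + Sigma2)  # share of total variance due to unobserved heterogeneity
quantile(PropHet, c(0.025, 0.5, 0.975))          # 95% symmetric credible interval and posterior median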
There are many extensions of this model. For instance, S. Chib and Carlin (1999) propose introducing heteroskedasticity by assuming \mu_{it} \mid \tau_{it} \sim N(0, \sigma^2/\tau_{it}), \tau_{it} \sim G(v/2,v/2); marginalizing over \tau_{it}, this is equivalent to assuming Student's t errors with v degrees of freedom. We ask in Exercise 2 to perform inference on the relationship between productivity and public investment using this setting.
Another potential extension is to allow dependence between \boldsymbol{b}_i and some controls, let’s say \boldsymbol{z}_i, a K_3-dimensional vector, and assume \boldsymbol{b}_i \sim N(\boldsymbol{Z}_i \boldsymbol{\gamma}, \boldsymbol{D}) where \boldsymbol{Z}_i = \mathbf{I}_{K_2} \otimes \boldsymbol{z}_i^{\top}, and complete the model using a prior for \boldsymbol{\gamma}, \boldsymbol{\gamma} \sim N(\boldsymbol{\gamma}_0, \boldsymbol{\Gamma}_0). We ask to perform a simulation using this setting in Exercise 3.
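Although the details are left for Exercise 3, applying the same completing-the-square argument used for \boldsymbol{\beta} and \boldsymbol{b}_i suggests that this extension adds one more standard conditional posterior,
\begin{equation*}
\boldsymbol{\gamma}\mid \boldsymbol{b},\boldsymbol{D} \sim {N}(\boldsymbol{\gamma}_n,\boldsymbol{\Gamma}_n),
\end{equation*}
where \boldsymbol{\Gamma}_n=\left(\boldsymbol{\Gamma}_0^{-1}+\sum_{i=1}^N\boldsymbol{Z}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{Z}_i\right)^{-1} and \boldsymbol{\gamma}_n=\boldsymbol{\Gamma}_n\left(\boldsymbol{\Gamma}_0^{-1}\boldsymbol{\gamma}_0+\sum_{i=1}^N\boldsymbol{Z}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{b}_i\right), while the conditional posterior mean of \boldsymbol{b}_i becomes \boldsymbol{b}_{ni}=\boldsymbol{B}_{ni}\left(\sigma^{-2}\boldsymbol{W}_i^{\top}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta})+\boldsymbol{D}^{-1}\boldsymbol{Z}_i\boldsymbol{\gamma}\right) with \boldsymbol{B}_{ni} as before. The remaining conditionals are unchanged except that \boldsymbol{D}_n=d_0\boldsymbol{D}_0+\sum_{i=1}^N(\boldsymbol{b}_i-\boldsymbol{Z}_i\boldsymbol{\gamma})(\boldsymbol{b}_i-\boldsymbol{Z}_i\boldsymbol{\gamma})^{\top}.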
Example: Simulation exercise of the longitudinal normal model with heteroskedasticity
Let’s perform a simulation exercise to assess some potential extensions of the longitudinal hierarchical normal model. The point of departure is to assume that y_{it}=\beta_0+\beta_1x_{it1}+\beta_2x_{it2}+\beta_3x_{it3}+b_i+w_{it1}b_{i1}+\mu_{it}, where x_{itk}\sim N(0,1), k=1,2,3, w_{it1}\sim N(0,1), b_i\sim N(0, 0.7), b_{i1}\sim N(0, 0.6), \mu_{it}\mid \tau_{it}\sim N(0, 0.1/\tau_{it}), \tau_{it}\sim G(v/2,v/2) and \boldsymbol{\beta}=[0.5 \ 0.4 \ 0.6 \ -0.6]^{\top}, i=1,2,\dots,50. The sample size is 2,000 observations in an unbalanced panel structure.
Following the same steps as in this section and Exercise 1, the conditional posterior distributions, assuming that \mu_{it}\mid \tau_{it}\sim N(0, \sigma^2/\tau_{it}) and \tau_{it}\sim G(v/2,v/2), are given by
\begin{equation*}
\boldsymbol{\beta}\mid \sigma^2,\boldsymbol{\tau},\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {N}(\boldsymbol{\beta}_n,\boldsymbol{B}_n),
\end{equation*}
where \boldsymbol{\tau}=[\tau_{it}]^{\top}, \boldsymbol{B}_n = (\boldsymbol{B}_0^{-1} +\sum_{i=1}^N \boldsymbol{X}_i^{\top}\boldsymbol{V}_i^{-1}\boldsymbol{X}_i)^{-1}, \boldsymbol{\beta}_n= \boldsymbol{B}_n(\boldsymbol{B}_0^{-1}\boldsymbol{\beta}_0 + \sum_{i=1}^N\boldsymbol{X}_i^{\top}\boldsymbol{V}_i^{-1}\boldsymbol{y}_i), \boldsymbol{V}_i=\sigma^2\boldsymbol{\Psi}_i+\boldsymbol{W}_i\boldsymbol{D}\boldsymbol{W}_i^{\top} and \boldsymbol{\Psi}_i=diag\left\{\tau_{it}^{-1}\right\}.
\begin{equation*}
\boldsymbol{b}_i\mid \boldsymbol{\beta},\sigma^2,\boldsymbol{\tau},\boldsymbol{D},\boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {N}(\boldsymbol{b}_{ni},\boldsymbol{B}_{ni}),
\end{equation*}
where \boldsymbol{B}_{ni}=(\sigma^{-2}\boldsymbol{W}_i^{\top}\boldsymbol{\Psi}_i^{-1}\boldsymbol{W}_i+\boldsymbol{D}^{-1})^{-1} and \boldsymbol{b}_{ni}=\boldsymbol{B}_{ni}(\sigma^{-2}\boldsymbol{W}_i^{\top}\boldsymbol{\Psi}_i^{-1}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta})).
\begin{equation*}
\sigma^2\mid \boldsymbol{\beta}, \boldsymbol{b}, \boldsymbol{\tau}, \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {I}{G}(\alpha_n, \delta_n),
\end{equation*}
where \alpha_n=\alpha_0+\frac{1}{2}\sum_{i=1}^N T_i and \delta_n=\delta_0+\frac{1}{2}\sum_{i=1}^N(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i)^{\top}\boldsymbol{\Psi}_i^{-1}(\boldsymbol{y}_i-\boldsymbol{X}_i\boldsymbol{\beta}-\boldsymbol{W}_i\boldsymbol{b}_i).
\begin{equation*}
\boldsymbol{D}\mid \boldsymbol{b} \sim {I}{W}(d_n, \boldsymbol{D}_n),
\end{equation*}
where d_n=d_0+N and \boldsymbol{D}_n=d_0\boldsymbol{D}_0+\sum_{i=1}^N\boldsymbol{b}_i\boldsymbol{b}_i^{\top}. And
\begin{equation*}
\tau_{it}\mid \sigma^2, \boldsymbol{\beta}, \boldsymbol{b}, \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W} \sim {G}(v_{1n}/2, v_{2ni}/2),
\end{equation*}
where v_{1n}=v+1 and v_{2ni}=v+\sigma^{-2}(y_{it}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^2.
The following code implements this simulation and obtains draws from the posterior distributions. We set the MCMC iterations, burn-in, and thinning parameters equal to 5,000, 1,000, and 1, respectively. In addition, \boldsymbol{\beta}_0=\boldsymbol{0}_4, \boldsymbol{B}_0=\boldsymbol{I}_4, \alpha_0=\delta_0=0.001, d_0=2, \boldsymbol{D}_0=\boldsymbol{I}_2 and v=5.
rm(list = ls()); set.seed(010101)
NT <- 2000; N <- 50
id <- c(1:N, sample(1:N, NT - N,replace=TRUE))
table(id)
## id
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 42 35 48 53 44 36 43 34 42 38 40 39 41 43 40 30 27 56 39 51 36 30 45 35 33 48
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## 37 51 32 46 47 41 44 39 40 48 37 44 37 42 42 34 42 34 41 35 36 34 31 38
x1 <- rnorm(NT); x2 <- rnorm(NT); x3 <- rnorm(NT)
X <- cbind(1, x1, x2, x3); K1 <- dim(X)[2]
w1 <- rnorm(NT); W <- cbind(1, w1)
K2 <- dim(W)[2]; B <- c(0.5, 0.4, 0.6, -0.6)
D <- c(0.7, 0.6)
b1 <- rnorm(N, 0, sd = D[1]^0.5)
b2 <- rnorm(N, 0, sd = D[2]^0.5)
b <- cbind(b1, b2)
v <- 5; tau <- rgamma(NT, shape = v/2, rate = v/2)
sig2 <- 0.1; u <- rnorm(NT, 0, sd = (sig2/tau)^0.5)
y <- NULL
for(i in 1:NT){
yi <- X[i,]%*%B + W[i,]%*%b[id[i],] + u[i]
y <- c(y, yi)
}
Data <- as.data.frame(cbind(y, x1, x2, x3, w1, id))
mcmc <- 5000; burnin <- 1000; thin <- 1; tot <- mcmc + burnin
b0 <- rep(0, K1); B0 <- diag(K1); B0i <- solve(B0)
r0 <- K2; R0 <- diag(K2); a0 <- 0.001; d0 <- 0.001
# Conditional posterior draw of beta, marginalizing over the random effects
PostBeta <- function(sig2, D, tau){
XVX <- matrix(0, K1, K1)
XVy <- matrix(0, K1, 1)
for(i in 1:N){
ids <- which(id == i)
Ti <- length(ids)
Wi <- W[ids, ]
taui <- tau[ids]
Vi <- sig2*diag(1/taui) + Wi%*%D%*%t(Wi) # V_i = sigma^2*Psi_i + W_i D W_i', with Psi_i = diag(1/tau_i)
ViInv <- solve(Vi)
Xi <- X[ids, ]
XVXi <- t(Xi)%*%ViInv%*%Xi
XVX <- XVX + XVXi
yi <- y[ids]
XVyi <- t(Xi)%*%ViInv%*%yi
XVy <- XVy + XVyi
}
Bn <- solve(B0i + XVX)
bn <- Bn%*%(B0i%*%b0 + XVy)
Beta <- MASS::mvrnorm(1, bn, Bn)
return(Beta)
}
# Conditional posterior draw of the random effects b_i, one unit at a time
Postb <- function(Beta, sig2, D, tau){
Di <- solve(D); bis <- matrix(0, N, K2)
for(i in 1:N){
ids <- which(id == i)
Wi <- W[ids, ]; Xi <- X[ids, ]
yi <- y[ids]; taui <- tau[ids]
Taui <- solve(diag(1/taui))
Wtei <- sig2^(-1)*t(Wi)%*%Taui%*%(yi - Xi%*%Beta)
Bni <- solve(sig2^(-1)*t(Wi)%*%Taui%*%Wi + Di)
bni <- Bni%*%Wtei
bi <- MASS::mvrnorm(1, bni, Bni)
bis[i, ] <- bi
}
return(bis)
}
# Conditional posterior draw of sigma^2 (inverse gamma)
PostSig2 <- function(Beta, bs, tau){
an <- a0 + 0.5*NT
ete <- 0
for(i in 1:N){
ids <- which(id == i)
Xi <- X[ids, ]; yi <- y[ids]
Wi <- W[ids, ]; taui <- tau[ids]
Taui <- solve(diag(1/taui))
ei <- yi - Xi%*%Beta - Wi%*%bs[i, ]
etei <- t(ei)%*%Taui%*%ei
ete <- ete + etei
}
dn <- d0 + 0.5*ete
sig2 <- MCMCpack::rinvgamma(1, shape = an, scale = dn)
return(sig2)
}
# Conditional posterior draw of D (inverse Wishart)
PostD <- function(bs){
rn <- r0 + N
btb <- matrix(0, K2, K2)
for(i in 1:N){
bsi <- bs[i, ]
btbi <- bsi%*%t(bsi)
btb <- btb + btbi
}
Rn <- r0*R0 + btb # scale matrix D_n = d_0*D_0 + sum_i b_i b_i', with d_0 = r0
Sigma <- MCMCpack::riwish(v = rn, S = Rn)
return(Sigma)
}
# Conditional posterior draw of the mixture weights tau_it (gamma)
PostTau <- function(sig2, Beta, bs){
v1n <- v + 1
v2n <- NULL
for(i in 1:NT){
Xi <- X[i, ]; yi <- y[i]
Wi <- W[i, ]; bi <- bs[id[i],]
v2ni <- v + sig2^(-1)*(yi - Xi%*%Beta - Wi%*%bi)^2
v2n <- c(v2n, v2ni)
}
tau <- rgamma(NT, shape = rep(v1n/2, NT), rate = v2n/2)
return(tau)
}
PostBetas <- matrix(0, tot, K1); PostDs <- matrix(0, tot, K2*(K2+1)/2)
PostSig2s <- rep(0, tot); Postbs <- array(0, c(N, K2, tot))
PostTaus <- matrix(0, tot, NT); RegLS <- lm(y ~ X - 1)
SumLS <- summary(RegLS)
Beta <- SumLS[["coefficients"]][,1]
sig2 <- SumLS[["sigma"]]^2; D <- diag(K2)
tau <- rgamma(NT, shape = v/2, rate = v/2)
pb <- txtProgressBar(min = 0, max = tot, style = 3) # progress bar that also works outside Windows
for(s in 1:tot){
bs <- Postb(Beta = Beta, sig2 = sig2, D = D, tau = tau)
D <- PostD(bs = bs)
Beta <- PostBeta(sig2 = sig2, D = D, tau = tau)
sig2 <- PostSig2(Beta = Beta, bs = bs, tau = tau)
tau <- PostTau(sig2 = sig2, Beta = Beta, bs = bs)
PostBetas[s,] <- Beta
PostDs[s,] <- matrixcalc::vech(D)
PostSig2s[s] <- sig2
Postbs[, , s] <- bs
PostTaus[s,] <- tau
setTxtProgressBar(pb, s)
}
close(pb)
keep <- seq((burnin+1), tot, thin)
Bs <- PostBetas[keep,]; Ds <- PostDs[keep,]
bs <- Postbs[, , keep]; sig2s <- PostSig2s[keep]
taus <- PostTaus[keep,]
summary(coda::mcmc(Bs))
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 0.3215 0.12222 0.0017284 0.0017051
## [2,] 0.3696 0.01471 0.0002081 0.0001595
## [3,] 0.6254 0.01555 0.0002199 0.0001687
## [4,] -0.6068 0.01498 0.0002118 0.0001636
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 0.07833 0.2412 0.3232 0.4022 0.5619
## var2 0.34101 0.3598 0.3697 0.3793 0.3988
## var3 0.59596 0.6150 0.6251 0.6351 0.6574
## var4 -0.63722 -0.6165 -0.6067 -0.5966 -0.5785
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## [1,] 0.70746 0.14999 0.002121 0.002121
## [2,] -0.08457 0.09101 0.001287 0.001316
## [3,] 0.54517 0.11304 0.001599 0.001647
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## var1 0.4720 0.5995 0.68858 0.79206 1.05285
## var2 -0.2721 -0.1405 -0.08185 -0.02482 0.09186
## var3 0.3689 0.4644 0.52978 0.60946 0.81999
##
## Iterations = 1:5000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.1531713 0.0592737 0.0008383 0.0014390
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.1022 0.1157 0.1324 0.1683 0.3217
We can see that all the 95% credible intervals encompass the population parameters, except those for the second fixed effect and the variance of the model, both of which miss by a tiny margin.