9.1 Normal model

The longitudinal/panel normal model establishes $y_i = X_i\beta + W_i b_i + \mu_i$, where $y_i$ is a $T_i$-dimensional vector corresponding to unit $i=1,2,\dots,N$, and $X_i$ and $W_i$ are $T_i\times K_1$ and $T_i\times K_2$ matrices, respectively. In the statistical literature, $\beta$ is a $K_1$-dimensional vector of fixed effects, and $b_i$ is a $K_2$-dimensional vector of unit-specific random effects that allows unit-specific means and captures marginal dependence among the observations of each cross-sectional unit. We assume normal stochastic errors, $\mu_i\sim N(0,\sigma^2 I_{T_i})$, which means that the likelihood function is

$$p(\beta,b,\sigma^2\mid y,X,W)\propto\prod_{i=1}^{N}|\sigma^2 I_{T_i}|^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)\right\}$$
$$=(\sigma^2)^{-\sum_{i=1}^{N}T_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)\right\},$$

where $b=[b_1^{\top}, b_2^{\top}, \dots, b_N^{\top}]^{\top}$.

Panel data modeling in the Bayesian approach assumes a hierarchical structure in the random effects. Following S. Chib and Carlin (1999), there is a first stage where $b_i\sim N(0,D)$, $D$ allows serial correlation within each cross-sectional unit $i$, and then there is a second stage where $D\sim IW(d_0,d_0D_0)$. Thus, we can see that there is an additional layer of priors, as there is a prior on the hyperparameter $D$.

In addition, we have standard conjugate prior distributions for $\beta$ and $\sigma^2$: $\beta\sim N(\beta_0,B_0)$ and $\sigma^2\sim IG(\alpha_0,\delta_0)$.

S. Chib and Carlin (1999) propose a blocking algorithm to perform inference in longitudinal hierarchical models by considering the distribution of $y_i$ marginalized over the random effects. Given that $y_i\mid\beta,b_i,\sigma^2,X_i,W_i\sim N(X_i\beta+W_ib_i,\sigma^2 I_{T_i})$, we can see that $y_i\mid\beta,D,\sigma^2,X_i,W_i\sim N(X_i\beta,V_i)$, where $V_i=\sigma^2 I_{T_i}+W_iDW_i^{\top}$, given that $E[b_i]=0$ and $Var[b_i]=D$. If we have just random intercepts, then $W_i=i_{T_i}$, where $i_{T_i}$ is a $T_i$-dimensional vector of ones. Thus, $V_i=\sigma^2 I_{T_i}+\sigma_b^2 i_{T_i}i_{T_i}^{\top}$, so the variance of each observation is $\sigma^2+\sigma_b^2$ and the covariance within each cross-sectional unit through time is $\sigma_b^2$.
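
For instance, a minimal sketch in R of the marginal covariance matrix $V_i$ for the random-intercept case, using illustrative values of $\sigma^2$, $\sigma_b^2$ and $T_i$:

sig2 <- 0.5; sig2b <- 0.3; Ti <- 4           # illustrative values
iTi <- rep(1, Ti)                            # T_i-dimensional vector of ones
Vi <- sig2*diag(Ti) + sig2b*iTi%*%t(iTi)     # V_i = sig2*I + sig2b*ii'
Vi                                           # diagonal = sig2 + sig2b, off-diagonal = sig2b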

We can deduce the posterior distribution of $\beta$ given $\sigma^2$ and $D$,
$$\pi(\beta\mid\sigma^2,D,y,X,W)\propto\exp\left\{-\frac{1}{2}\sum_{i=1}^{N}(y_i-X_i\beta)^{\top}V_i^{-1}(y_i-X_i\beta)\right\}\times\exp\left\{-\frac{1}{2}(\beta-\beta_0)^{\top}B_0^{-1}(\beta-\beta_0)\right\}.$$
This implies that (see Exercise 1)
$$\beta\mid\sigma^2,D,y,X,W\sim N(\beta_n,B_n),$$
where $B_n=\left(B_0^{-1}+\sum_{i=1}^{N}X_i^{\top}V_i^{-1}X_i\right)^{-1}$ and $\beta_n=B_n\left(B_0^{-1}\beta_0+\sum_{i=1}^{N}X_i^{\top}V_i^{-1}y_i\right)$.
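
As an illustration, a minimal sketch of one draw from this conditional, assuming the lists y, X and Vi (one element per cross-sectional unit), the dimensions N and K1, and the hyperparameters b0 and B0 are already defined:

# One draw of beta | sigma2, D: accumulate X_i'V_i^{-1}X_i and X_i'V_i^{-1}y_i over units
XVX <- matrix(0, K1, K1); XVy <- matrix(0, K1, 1)
for(i in 1:N){
    ViInv <- solve(Vi[[i]])
    XVX <- XVX + t(X[[i]])%*%ViInv%*%X[[i]]
    XVy <- XVy + t(X[[i]])%*%ViInv%*%y[[i]]
}
Bn <- solve(solve(B0) + XVX)          # posterior covariance B_n
bn <- Bn%*%(solve(B0)%*%b0 + XVy)     # posterior mean beta_n
Beta <- MASS::mvrnorm(1, bn, Bn)      # draw from N(beta_n, B_n)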

We can use the likelihood $p(\beta,b_i,\sigma^2\mid y,X,W)$ to get the posterior distributions of $b_i$, $\sigma^2$ and $D$. In particular,
$$\begin{aligned}\pi(b_i\mid\beta,\sigma^2,D,y,X,W)&\propto\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)\right\}\times\exp\left\{-\frac{1}{2}\sum_{i=1}^{N}b_i^{\top}D^{-1}b_i\right\}\\&\propto\exp\left\{-\frac{1}{2}\sum_{i=1}^{N}\left(-2b_i^{\top}\left(\sigma^{-2}W_i^{\top}(y_i-X_i\beta)\right)+b_i^{\top}\left(\sigma^{-2}W_i^{\top}W_i+D^{-1}\right)b_i\right)\right\}\\&\propto\exp\left\{-\frac{1}{2}\left(-2b_i^{\top}B_{ni}^{-1}B_{ni}\left(\sigma^{-2}W_i^{\top}(y_i-X_i\beta)\right)+b_i^{\top}B_{ni}^{-1}b_i\right)\right\}\\&=\exp\left\{-\frac{1}{2}\left(-2b_i^{\top}B_{ni}^{-1}b_{ni}+b_i^{\top}B_{ni}^{-1}b_i\right)\right\},\end{aligned}$$
where $B_{ni}=\left(\sigma^{-2}W_i^{\top}W_i+D^{-1}\right)^{-1}$ and $b_{ni}=B_{ni}\left(\sigma^{-2}W_i^{\top}(y_i-X_i\beta)\right)$.

We can complete the square in this expression by adding and subtracting $b_{ni}^{\top}B_{ni}^{-1}b_{ni}$. Thus,
$$\pi(b_i\mid\beta,\sigma^2,D,y,X,W)\propto\exp\left\{-\frac{1}{2}\left(-2b_i^{\top}B_{ni}^{-1}b_{ni}+b_i^{\top}B_{ni}^{-1}b_i+b_{ni}^{\top}B_{ni}^{-1}b_{ni}-b_{ni}^{\top}B_{ni}^{-1}b_{ni}\right)\right\}\propto\exp\left\{-\frac{1}{2}(b_i-b_{ni})^{\top}B_{ni}^{-1}(b_i-b_{ni})\right\}.$$
This is the kernel of a multivariate normal distribution with mean $b_{ni}$ and variance $B_{ni}$. Thus, $b_i\mid\beta,\sigma^2,D,y,X,W\sim N(b_{ni},B_{ni})$.

Let's now see the posterior distribution of $\sigma^2$,
$$\begin{aligned}\pi(\sigma^2\mid\beta,b,y,X,W)&\propto(\sigma^2)^{-\sum_{i=1}^{N}T_i/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)\right\}\times(\sigma^2)^{-\alpha_0-1}\exp\left\{-\frac{\delta_0}{\sigma^2}\right\}\\&=(\sigma^2)^{-\sum_{i=1}^{N}T_i/2-\alpha_0-1}\exp\left\{-\frac{1}{\sigma^2}\left(\delta_0+\frac{\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)}{2}\right)\right\}.\end{aligned}$$
Thus, $\sigma^2\mid\beta,b,y,X,W\sim IG(\alpha_n,\delta_n)$, where $\alpha_n=\alpha_0+\frac{1}{2}\sum_{i=1}^{N}T_i$ and $\delta_n=\delta_0+\frac{1}{2}\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}(y_i-X_i\beta-W_ib_i)$.
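
Similarly, a minimal sketch of one draw of each $b_i$ followed by a draw of $\sigma^2$, assuming the lists y, X and W (one element per unit), the current values of Beta, sig2 and D, the dimensions N, K2 and NT (total number of observations), and the hyperparameters a0 and d0 of the inverse-gamma prior are already defined:

# Draw b_i | beta, sigma2, D for each unit, accumulating the residual sum of squares
b <- matrix(0, N, K2); SSE <- 0
for(i in 1:N){
    Bni <- solve(t(W[[i]])%*%W[[i]]/sig2 + solve(D))           # B_ni
    bni <- Bni%*%(t(W[[i]])%*%(y[[i]] - X[[i]]%*%Beta)/sig2)   # b_ni
    b[i, ] <- MASS::mvrnorm(1, bni, Bni)                       # b_i ~ N(b_ni, B_ni)
    ei <- y[[i]] - X[[i]]%*%Beta - W[[i]]%*%b[i, ]
    SSE <- SSE + sum(ei^2)
}
# Draw sigma2 | beta, b ~ IG(alpha_n, delta_n)
sig2 <- MCMCpack::rinvgamma(1, shape = a0 + NT/2, scale = d0 + SSE/2)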

The posterior distribution of $D$ is the following,
$$\begin{aligned}\pi(D\mid b)&\propto|D|^{-N/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^{N}b_i^{\top}D^{-1}b_i\right\}\times|D|^{-(d_0+K_2+1)/2}\exp\left\{-\frac{1}{2}tr\left(d_0D_0D^{-1}\right)\right\}\\&=|D|^{-(d_0+N+K_2+1)/2}\exp\left\{-\frac{1}{2}tr\left(\left(d_0D_0+\sum_{i=1}^{N}b_ib_i^{\top}\right)D^{-1}\right)\right\}.\end{aligned}$$
This is the kernel of an inverse Wishart distribution with degrees of freedom $d_n=d_0+N$ and scale matrix $D_n=d_0D_0+\sum_{i=1}^{N}b_ib_i^{\top}$. Thus,
$D\mid b\sim IW(d_n,D_n)$. Observe that the posterior distribution of $D$ depends only on $b$.
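
A minimal sketch of one draw of $D$, assuming the $N\times K_2$ matrix b of current random-effects draws and the hyperparameters d0 and D0 are available:

# Draw D | b ~ IW(d_n, D_n), with d_n = d0 + N and D_n = d0*D0 + sum_i b_i b_i'
Dn <- d0*D0 + t(b)%*%b                      # t(b)%*%b equals the sum of b_i b_i'
D <- MCMCpack::riwish(v = d0 + N, S = Dn)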

All the posterior conditional distributions belong to standard families, which implies that we can use a Gibbs sampling algorithm to perform inference in these hierarchical normal models.
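
A minimal sketch of the resulting Gibbs sampler, assuming hypothetical functions DrawBeta, DrawB, DrawSig2 and DrawD that implement the conditional draws sketched above:

# Gibbs sampler skeleton for the hierarchical longitudinal normal model
S <- 6000                        # total iterations (including burn-in)
for(s in 1:S){
    Beta <- DrawBeta(sig2, D)    # beta | sigma2, D, y
    b <- DrawB(Beta, sig2, D)    # b_i | beta, sigma2, D, y
    sig2 <- DrawSig2(Beta, b)    # sigma2 | beta, b, y
    D <- DrawD(b)                # D | b
    # store the draws, discarding the burn-in and thinning if desired
}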

Example: The relation between productivity and public investment

We use the dataset 8PublicCap.csv, which was used by Ramírez Hassan (2017) to analyze the relationship between public investment and gross state product in the setting of a spatial panel dataset consisting of 48 US states from 1970 to 1986. In particular, we perform inference based on the following equation:
$$\log(gsp_{it})=b_i+\beta_1+\beta_2\log(pcap_{it})+\beta_3\log(pc_{it})+\beta_4\log(emp_{it})+\beta_5 unemp_{it}+\mu_{it},$$

where $gsp$ is the gross state product, $pcap$ is public capital, and $pc$ is private capital, all in USD; $emp$ is employment (number of people), and $unemp$ is the unemployment rate in percent.

The following Algorithm shows how to perform inference in hierarchical longitudinal normal models in our GUI. See also Chapter 5 for details regarding the dataset structure.

Algorithm: Hierarchical Longitudinal Normal Models

  1. Select Hierarchical Longitudinal Model on the top panel

  2. Select Normal model using the left radio button

  3. Upload the dataset: first indicate whether the file has a header and which separator the csv file uses (comma, semicolon, or tab), then use the Browse button under the Choose File legend

  4. Select MCMC iterations, burn-in, and thinning parameters using the Range sliders

  5. Write down the formula of the fixed effects equation in the Main Equation: Fixed Effects box. This formula must be written using the syntax of the formula command of R software. An intercept is included by default, so do not include it in the equation

  6. Write down the formula of the random effects equation in the Main Equation: Random Effects box without writing the dependent variable, that is, starting the equation with the tilde (“~”) symbol. This formula must be written using the syntax of the formula command of R software. An intercept is included by default, so do not include it in the equation. If there are only random intercepts, leave this box empty

  7. Write down the name of the grouping variable, that is, the variable that indicates the cross-sectional units

  8. Set the hyperparameters of the fixed effects: mean vector, covariance matrix, shape and scale parameters. This step is not necessary as by default our GUI uses non-informative priors

  9. Set the hyperparameters of the random effects: degrees of freedom and scale matrix of the inverse Wishart distribution. This step is not necessary as by default our GUI uses non-informative priors

  10. Click the Go! button

  11. Analyze results

  12. Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

We ask in Exercise 2 to run this application in our GUI using 10,000 MCMC iterations plus a burn-in equal to 5,000 iterations, and a thinning parameter equal to 1. We also used the default values for the hyperparameters of the prior distributions, that is, $\beta_0=\mathbf{0}_5$, $B_0=I_5$, $\alpha_0=\delta_0=0.001$, $d_0=5$ and $D_0=I_1$. It seems that all posterior draws come from stationary distributions, as suggested by the diagnostics and posterior plots (see Exercise 2).

The following code uses the command MCMChregress from the package MCMCpack to run this application. This command is also used by our GUI to perform inference in hierarchical longitudinal normal models.

We can see that the 95% symmetric credible intervals for public capital, private capital, employment, and unemployment are (-2.54e-02, -2.06e-02), (2.92e-01, 2.96e-01), (7.62e-01, 7.67e-01) and (-5.47e-03, -5.31e-03), respectively. The posterior mean elasticity estimate of public capital to GSP is -0.023, that is, an increase by 1% in public capital means a 0.023% decrease in gross state product. The posterior mean estimates of private capital and employment elasticities are 0.294 and 0.765, respectively. In addition, a 1 percentage point increase in the unemployment rate means a decrease of 0.54% in GSP. It seems that all these variables are statistically relevant.

In addition, the posterior mean estimates of the variances associated with the unobserved heterogeneity and the stochastic errors are 1.06e-01 and 1.45e-03, respectively. We also obtained the posterior chain of the proportion of total variance associated with the unobserved heterogeneity (see the short sketch after the code output below). The 95% symmetric credible interval is (0.98, 0.99) for this proportion, that is, unobserved heterogeneity is very important to explain the total variability.

rm(list = ls())
set.seed(12345)
DataGSP <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/8PublicCap.csv", sep = ",", header = TRUE, quote = "")
attach(DataGSP)
K1 <- 5; K2 <- 1
b0 <- rep(0, K1); B0 <- diag(K1)
r0 <- 5; R0 <- diag(K2)
a0 <- 0.001; d0 <- 0.001
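# Fit the hierarchical model with MCMChregress: r and R are the hyperparameters of the
# inverse Wishart prior on the random-effects covariance matrix, and nu and delta are the
# hyperparameters of the inverse gamma prior on sigma2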
Resultshreg <- MCMCpack::MCMChregress(fixed = log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, random = ~1, group = "id", data = DataGSP, burnin = 5000, mcmc = 10000, thin = 1, r = r0, R = R0, nu = a0, delta = d0)
## 
## Running the Gibbs sampler. It may be long, keep cool :)
## 
## **********:10.0%
## **********:20.0%
## **********:30.0%
## **********:40.0%
## **********:50.0%
## **********:60.0%
## **********:70.0%
## **********:80.0%
## **********:90.0%
## **********:100.0%
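# The mcmc object stacks all the chains: columns 1:K1 are the fixed effects, followed by
# the 48 random intercepts; column 54 is the random-effects variance and column 55 is sigma2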
Betas <- Resultshreg[["mcmc"]][,1:K1]
Sigma2RanEff <- Resultshreg[["mcmc"]][,54]
Sigma2 <- Resultshreg[["mcmc"]][,55]
summary(Betas)
## 
## Iterations = 5001:15000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                       Mean        SD  Naive SE Time-series SE
## beta.(Intercept)  2.330139 7.940e-03 7.940e-05      7.940e-05
## beta.log(pcap)   -0.023082 1.225e-03 1.225e-05      1.225e-05
## beta.log(pc)      0.293729 1.007e-03 1.007e-05      1.007e-05
## beta.log(emp)     0.764645 1.333e-03 1.333e-05      1.340e-05
## beta.unemp       -0.005387 4.114e-05 4.114e-07      4.071e-07
## 
## 2. Quantiles for each variable:
## 
##                       2.5%       25%       50%      75%     97.5%
## beta.(Intercept)  2.314556  2.324656  2.330193  2.33567  2.345548
## beta.log(pcap)   -0.025442 -0.023926 -0.023104 -0.02227 -0.020655
## beta.log(pc)      0.291738  0.293068  0.293721  0.29441  0.295716
## beta.log(emp)     0.761969  0.763755  0.764650  0.76554  0.767220
## beta.unemp       -0.005468 -0.005414 -0.005387 -0.00536 -0.005307
summary(Sigma2RanEff)
## 
## Iterations = 5001:15000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      0.1058028      0.0215133      0.0002151      0.0002151 
## 
## 2. Quantiles for each variable:
## 
##    2.5%     25%     50%     75%   97.5% 
## 0.07208 0.09086 0.10331 0.11751 0.15600
summary(Sigma2)
## 
## Iterations = 5001:15000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      1.453e-03      7.361e-05      7.361e-07      7.698e-07 
## 
## 2. Quantiles for each variable:
## 
##     2.5%      25%      50%      75%    97.5% 
## 0.001316 0.001403 0.001451 0.001501 0.001606
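
As a complement, a minimal sketch that computes, from the Sigma2RanEff and Sigma2 chains defined above, the posterior chain of the proportion of total variance due to unobserved heterogeneity, together with its 95% symmetric credible interval:

# Proportion of total variance attributable to unobserved heterogeneity
PropVar <- as.numeric(Sigma2RanEff)/(as.numeric(Sigma2RanEff) + as.numeric(Sigma2))
mean(PropVar)                               # posterior mean of the proportion
quantile(PropVar, probs = c(0.025, 0.975))  # 95% symmetric credible interval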

There are many extensions of this model. For instance, S. Chib and Carlin (1999) propose introducing heteroskedasticity in this model by assuming $\mu_{it}\mid\tau_{it}\sim N(0,\sigma^2/\tau_{it})$, $\tau_{it}\sim G(v/2,v/2)$. We ask in Exercise 2 to perform inference on the relationship between productivity and public investment using this setting.

Another potential extension is to allow dependence between $b_i$ and some controls, let’s say $z_i$, a $K_3$-dimensional vector, and assume $b_i\sim N(Z_i\gamma,D)$ where $Z_i=I_{K_2}\otimes z_i^{\top}$, and complete the model using a prior for $\gamma$, $\gamma\sim N(\gamma_0,\Gamma_0)$. We ask to perform a simulation using this setting in Exercise 3.
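
For instance, a minimal sketch of how $Z_i$ can be constructed and a draw of $b_i$ obtained under this extension, using illustrative dimensions and hyperparameter values:

# Z_i = I_{K2} (Kronecker product) z_i' maps gamma into the K2-dimensional mean of b_i
K2 <- 2; K3 <- 3                          # illustrative dimensions
zi <- rnorm(K3)                           # unit-specific controls
Zi <- kronecker(diag(K2), t(zi))          # K2 x (K2*K3) matrix
gamma <- rep(0.5, K2*K3); D <- diag(K2)   # illustrative values of gamma and D
bi <- MASS::mvrnorm(1, Zi%*%gamma, D)     # b_i ~ N(Z_i gamma, D)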

Example: Simulation exercise of the longitudinal normal model with heteroskedasticity

Let’s perform a simulation exercise to assess some potential extensions of the longitudinal hierarchical normal model. The point of departure is to assume that $y_{it}=\beta_0+\beta_1x_{it1}+\beta_2x_{it2}+\beta_3x_{it3}+b_i+w_{it1}b_{i1}+\mu_{it}$, where $x_{itk}\sim N(0,1)$, $k=1,2,3$, $w_{it1}\sim N(0,1)$, $b_i\sim N(0,0.7)$, $b_{i1}\sim N(0,0.6)$, $\mu_{it}\mid\tau_{it}\sim N(0,0.1/\tau_{it})$, $\tau_{it}\sim G(v/2,v/2)$ and $\beta=[0.5\ 0.4\ 0.6\ -0.6]^{\top}$, $i=1,2,\dots,50$. The sample size is 2,000 in an unbalanced panel structure.

Following the same stages as in this section and Exercise 1, the posterior conditional distributions, assuming $\mu_{it}\mid\tau_{it}\sim N(0,\sigma^2/\tau_{it})$ and $\tau_{it}\sim G(v/2,v/2)$, are given by
$$\beta\mid\sigma^2,\tau,D,y,X,W\sim N(\beta_n,B_n),$$
where $\tau=[\tau_{it}]$, $B_n=\left(B_0^{-1}+\sum_{i=1}^{N}X_i^{\top}V_i^{-1}X_i\right)^{-1}$, $\beta_n=B_n\left(B_0^{-1}\beta_0+\sum_{i=1}^{N}X_i^{\top}V_i^{-1}y_i\right)$, $V_i=\sigma^2\Psi_i+W_iDW_i^{\top}$ and $\Psi_i=diag\{\tau_{it}^{-1}\}$;
$$b_i\mid\beta,\sigma^2,\tau,D,y,X,W\sim N(b_{ni},B_{ni}),$$
where $B_{ni}=\left(\sigma^{-2}W_i^{\top}\Psi_i^{-1}W_i+D^{-1}\right)^{-1}$ and $b_{ni}=B_{ni}\left(\sigma^{-2}W_i^{\top}\Psi_i^{-1}(y_i-X_i\beta)\right)$;
$$\sigma^2\mid\beta,b,\tau,y,X,W\sim IG(\alpha_n,\delta_n),$$
where $\alpha_n=\alpha_0+\frac{1}{2}\sum_{i=1}^{N}T_i$ and $\delta_n=\delta_0+\frac{1}{2}\sum_{i=1}^{N}(y_i-X_i\beta-W_ib_i)^{\top}\Psi_i^{-1}(y_i-X_i\beta-W_ib_i)$;
$$D\mid b\sim IW(d_n,D_n),$$
where $d_n=d_0+N$ and $D_n=d_0D_0+\sum_{i=1}^{N}b_ib_i^{\top}$; and
$$\tau_{it}\mid\sigma^2,\beta,b,y,X,W\sim G(v_{1n}/2,v_{2ni}/2),$$
where $v_{1n}=v+1$ and $v_{2ni}=v+\sigma^{-2}(y_{it}-x_{it}^{\top}\beta-w_{it}^{\top}b_i)^2$.

The following code implements this simulation and obtains draws from the posterior distributions. We set the MCMC iterations, burn-in and thinning parameters equal to 5,000, 1,000 and 1, respectively. In addition, $\beta_0=\mathbf{0}_4$, $B_0=I_4$, $\alpha_0=\delta_0=0.001$, $d_0=2$, $D_0=I_2$ and $v=5$.

rm(list = ls()); set.seed(010101)
NT <- 2000; N <- 50
id <- c(1:N, sample(1:N, NT - N,replace=TRUE))
table(id)
## id
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
## 42 35 48 53 44 36 43 34 42 38 40 39 41 43 40 30 27 56 39 51 36 30 45 35 33 48 
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
## 37 51 32 46 47 41 44 39 40 48 37 44 37 42 42 34 42 34 41 35 36 34 31 38
x1 <- rnorm(NT); x2 <- rnorm(NT); x3 <- rnorm(NT) 
X <- cbind(1, x1, x2, x3); K1 <- dim(X)[2]
w1 <- rnorm(NT); W <- cbind(1, w1)
K2 <- dim(W)[2]; B <- c(0.5, 0.4, 0.6, -0.6)
D <- c(0.7, 0.6)
b1 <- rnorm(N, 0, sd = D[1]^0.5)
b2 <- rnorm(N, 0, sd = D[2]^0.5)
b <- cbind(b1, b2)
v <- 5; tau <- rgamma(NT, shape = v/2, rate = v/2)
sig2 <- 0.1; u <- rnorm(NT, 0, sd = (sig2/tau)^0.5)
y <- NULL
for(i in 1:NT){
    yi <- X[i,]%*%B + W[i,]%*%b[id[i],] + u[i] 
    y <- c(y, yi)
}
Data <- as.data.frame(cbind(y, x1, x2, x3, w1, id))
mcmc <- 5000; burnin <- 1000; thin <- 1; tot <- mcmc + burnin
b0 <- rep(0, K1); B0 <- diag(K1); B0i <- solve(B0) 
r0 <- K2; R0 <- diag(K2); a0 <- 0.001; d0 <- 0.001
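# Conditional posterior of the fixed effects: beta | sig2, D, tau ~ N(bn, Bn), marginalizing
# the random effects through Vi = sig2*Psi_i + Wi D Wi', with Psi_i = diag(1/tau_it)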
PostBeta <- function(sig2, D, tau){
    XVX <- matrix(0, K1, K1)
    XVy <- matrix(0, K1, 1)
    for(i in 1:N){
        ids <- which(id == i)
        Ti <- length(ids)
        Wi <- W[ids, ]
        taui <- tau[ids]
        Vi <- sig2*diag(1/taui) + Wi%*%D%*%t(Wi) # V_i = sig2*Psi_i + Wi D Wi'
        ViInv <- solve(Vi)
        Xi <- X[ids, ]
        XVXi <- t(Xi)%*%ViInv%*%Xi
        XVX <- XVX + XVXi
        yi <- y[ids]
        XVyi <- t(Xi)%*%ViInv%*%yi
        XVy <- XVy + XVyi
    }
    Bn <- solve(B0i + XVX)
    bn <- Bn%*%(B0i%*%b0 + XVy)
    Beta <- MASS::mvrnorm(1, bn, Bn)
    return(Beta)
}
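# Conditional posterior of the random effects: bi | beta, sig2, D, tau ~ N(bni, Bni)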
Postb <- function(Beta, sig2, D, tau){
    Di <- solve(D);     bis <- matrix(0, N, K2)
    for(i in 1:N){
        ids <- which(id == i)
        Wi <- W[ids, ]; Xi <- X[ids, ]
        yi <- y[ids]; taui <- tau[ids]
        Taui <- solve(diag(1/taui))
        Wtei <- sig2^(-1)*t(Wi)%*%Taui%*%(yi - Xi%*%Beta)
        Bni <- solve(sig2^(-1)*t(Wi)%*%Taui%*%Wi + Di)
        bni <- Bni%*%Wtei
        bi <- MASS::mvrnorm(1, bni, Bni)
        bis[i, ] <- bi
    }
    return(bis)
}
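# Conditional posterior of the variance of the stochastic errors: sig2 | beta, b, tau ~ IG(an, dn)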
PostSig2 <- function(Beta, bs, tau){
    an <- a0 + 0.5*NT
    ete <- 0
    for(i in 1:N){
        ids <- which(id == i)
        Xi <- X[ids, ]; yi <- y[ids]
        Wi <- W[ids, ]; taui <- tau[ids]
        Taui <- solve(diag(1/taui))
        ei <- yi - Xi%*%Beta - Wi%*%bs[i, ]
        etei <- t(ei)%*%Taui%*%ei
        ete <- ete + etei
    }
    dn <- d0 + 0.5*ete 
    sig2 <- MCMCpack::rinvgamma(1, shape = an, scale = dn)
    return(sig2)
}
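# Conditional posterior of the random-effects covariance matrix: D | b ~ IW(rn, Rn)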
PostD <- function(bs){
    rn <- r0 + N
    btb <- matrix(0, K2, K2)
    for(i in 1:N){
        bsi <- bs[i, ]
        btbi <- bsi%*%t(bsi)
        btb <- btb + btbi
    }
    Rn <- r0*R0 + btb # scale matrix D_n = d0*D0 + sum_i b_i b_i' (here r0 and R0 play the roles of d0 and D0)
    Sigma <- MCMCpack::riwish(v = rn, S = Rn)
    return(Sigma)
}
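# Conditional posterior of the heteroskedasticity parameters: tau_it | . ~ G(v1n/2, v2ni/2)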
PostTau <- function(sig2, Beta, bs){
    v1n <- v + 1
    v2n <- NULL
    for(i in 1:NT){
        Xi <- X[i, ]; yi <- y[i]
        Wi <- W[i, ]; bi <- bs[id[i],]
        v2ni <- v + sig2^(-1)*(yi - Xi%*%Beta - Wi%*%bi)^2
        v2n <- c(v2n, v2ni)
    }
    tau <- rgamma(NT, shape = rep(v1n/2, NT), rate = v2n/2)
    return(tau)
}
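# Storage for the posterior draws; initial values of beta and sig2 from OLS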
PostBetas <- matrix(0, tot, K1); PostDs <- matrix(0, tot, K2*(K2+1)/2)
PostSig2s <- rep(0, tot); Postbs <- array(0, c(N, K2, tot))
PostTaus <- matrix(0, tot, NT); RegLS <- lm(y ~ X - 1)
SumLS <- summary(RegLS)
Beta <- SumLS[["coefficients"]][,1]
sig2 <- SumLS[["sigma"]]^2; D <- diag(K2)
tau <- rgamma(NT, shape = v/2, rate = v/2) 
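# Gibbs sampler: cycle through the conditional posteriors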
pb <- winProgressBar(title = "progress bar", min = 0, max = tot, width = 300)
for(s in 1:tot){
    bs <- Postb(Beta = Beta, sig2 = sig2, D = D, tau = tau)
    D <- PostD(bs = bs)
    Beta <- PostBeta(sig2 = sig2, D = D, tau = tau)
    sig2 <- PostSig2(Beta = Beta, bs = bs, tau = tau)
    tau <- PostTau(sig2 = sig2, Beta = Beta, bs = bs)
    PostBetas[s,] <- Beta
    PostDs[s,] <- matrixcalc::vech(D)
    PostSig2s[s] <- sig2
    Postbs[, , s] <- bs
    PostTaus[s,] <- tau
    setWinProgressBar(pb, s, title=paste( round(s/tot*100, 0),"% done"))
}
close(pb)
## NULL
keep <- seq((burnin+1), tot, thin)
Bs <- PostBetas[keep,]; Ds <- PostDs[keep,]
bs <- Postbs[, , keep]; sig2s <- PostSig2s[keep]
taus <- PostTaus[keep,]
summary(coda::mcmc(Bs))
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean      SD  Naive SE Time-series SE
## [1,]  0.3215 0.12222 0.0017284      0.0017051
## [2,]  0.3696 0.01471 0.0002081      0.0001595
## [3,]  0.6254 0.01555 0.0002199      0.0001687
## [4,] -0.6068 0.01498 0.0002118      0.0001636
## 
## 2. Quantiles for each variable:
## 
##          2.5%     25%     50%     75%   97.5%
## var1  0.07833  0.2412  0.3232  0.4022  0.5619
## var2  0.34101  0.3598  0.3697  0.3793  0.3988
## var3  0.59596  0.6150  0.6251  0.6351  0.6574
## var4 -0.63722 -0.6165 -0.6067 -0.5966 -0.5785
summary(coda::mcmc(Ds))
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean      SD Naive SE Time-series SE
## [1,]  0.70746 0.14999 0.002121       0.002121
## [2,] -0.08457 0.09101 0.001287       0.001316
## [3,]  0.54517 0.11304 0.001599       0.001647
## 
## 2. Quantiles for each variable:
## 
##         2.5%     25%      50%      75%   97.5%
## var1  0.4720  0.5995  0.68858  0.79206 1.05285
## var2 -0.2721 -0.1405 -0.08185 -0.02482 0.09186
## var3  0.3689  0.4644  0.52978  0.60946 0.81999
summary(coda::mcmc(sig2s))
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      0.1531713      0.0592737      0.0008383      0.0014390 
## 
## 2. Quantiles for each variable:
## 
##   2.5%    25%    50%    75%  97.5% 
## 0.1022 0.1157 0.1324 0.1683 0.3217

We can see that all the 95% credible intervals encompass the population parameters, except those for the second fixed effect and the variance of the model, and both miss by only a tiny margin.

References

Chib, S., and B. Carlin. 1999. “On MCMC Sampling in Hierarchical Longitudinal Models.” Statistics and Computing 9: 17–26.
Ramírez Hassan, A. 2017. “The Interplay Between the Bayesian and Frequentist Approaches: A General Nesting Spatial Panel Data Model.” Spatial Economic Analysis 12 (1): 92–112.