9.3 Poisson model

We can use same ideas as in Section 9.2 to perform inference in longitudinal/panel datasets where the dependent variable takes non-negative integers. Let’s assume that $y_{it}\sim P(\lambda_{it})$ where $\log(\lambda_{it})=y_{it}^*$ such that $y_{it}^*\sim N(\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}+\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i,\sigma^2)$ . We can augment the model with the latent variable $y_{it}^{*}$ , and again use a Metropolis-within-Gibbs algorithm to perform inference in this model.

The posterior distribution of this model is $\begin{align*} \pi(\boldsymbol{\beta},\sigma^2, \boldsymbol{b}_i, \boldsymbol{D}, \boldsymbol{y}^*\mid \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W})&\propto \prod_{i=1}^N \prod_{t=1}^{T_i}\left\{\lambda_{it}^{y_{it}}\exp\left\{-\lambda_{it}\right\}\right.\\ &\left.\times (\sigma^2)^{-1}\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^*-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^{\top}(y_{it}^*-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)\right\}\right\}\\ &\times \exp\left\{-\frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)^{\top}\boldsymbol{B}_0^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)\right\}\\ &\times \exp\left\{-\frac{1}{2}\sum_{i=1}^N \boldsymbol{b}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{b}_i\right\}\\ &\times (\sigma^2)^{-\alpha_0-1}\exp\left\{-\frac{\delta_0}{\sigma^2}\right\}\\ &\times |\boldsymbol{D}|^{-(d_0+K_2+1)/2}\exp\left\{-\frac{1}{2}tr(d_0\boldsymbol{D}_0\boldsymbol{D}^{-1})\right\}. \end{align*}$

We can get samples of $y_{it}^*$ from a normal distribution with mean equal to $\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}+\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i$ and variance $\sigma^2$ , and use these samples to get $\lambda_{it}=\exp(y_{it}^*)$ , $y_{it}^{*c}=y_{it}^{*}+\epsilon_{it}$ , where $\epsilon_{it}\sim\mathcal{N}(0,v^2)$ , $v$ is a tuning parameter to get good acceptance rates, and $\lambda_{it}^c=\exp(y_{it}^{*c})$ . The acceptance rate of the Metropolis-Hastings algorithm is $\begin{align*} \alpha=\min\left(1,\frac{ \lambda_{it}^{cy_{it}}\exp(-\lambda_{it}^c)\times\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^{c*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^{\top}(y_{it}^{c*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)\right\}}{\lambda_{it}^{y_{it}}\exp(-\lambda_{it})\times\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^{*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^{\top}(y_{it}^{*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)\right\}}\right). \end{align*}$

In addition, we should use the posterior conditional distributions from Section 9.1 to complete the algorithm getting samples of $\boldsymbol{\beta}$ , $\sigma^2$ , $\boldsymbol{b}_i$ and $\boldsymbol{D}$ replacing $y_{it}$ by ${y}_{it}^*$ .

We should take into account for doing predictions that $\mathbb{E}[{\lambda}_{it}]=\exp\left\{\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}+\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i+0.5\sigma^2\right\}$ (Diggle et al. 2002).

Example: Simulation exercise

Let’s perform a simulation exercise to assess the performance of the hierarchical longitudinal Poisson model. The point of departure is to assume that $y_{it}^* = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \beta_3 x_{it3} + b_i + w_{it1} b_{i1},$ where $x_{itk} \sim N(0,1)$ for $k = 1, 2, 3$ , $w_{it1} \sim N(0,1)$ , $b_i \sim N(0, 0.7^{1/2})$ , $b_{i1} \sim N(0, 0.6^{1/2})$ , and $\boldsymbol{\beta} = [0.5 \ 0.4 \ 0.6 \ -0.6]^{\top}$ , with $i = 1, 2, \dots, 50$ . Additionally, $y_{it} \sim P(\lambda_{it})$ , where $\lambda_{it} = \exp(y_{it}^*)$ . The sample size is 1000 in an unbalanced panel structure.

We set the priors as $\boldsymbol{\beta}_0 = \boldsymbol{0}_4$ , $\boldsymbol{B}_0 = \boldsymbol{I}_4$ , $\alpha_0 = \delta_0 = 0.001$ , $d_0 = 2$ , and $\boldsymbol{D}_0 = \boldsymbol{I}_2$ . The number of MCMC iterations, burn-in, and thinning parameters are 15,000, 5,000, and 10, respectively.

The following code shows how to perform inference in the hierarchical longitudinal Poisson model programming the Metropolis-within-Gibbs sampler.

rm(list = ls()); set.seed(010101)
NT <- 1000; N <- 50
id <- c(1:N, sample(1:N, NT - N,replace=TRUE))
x1 <- rnorm(NT); x2 <- rnorm(NT); x3 <- rnorm(NT) 
X <- cbind(1, x1, x2, x3)
K1 <- dim(X)[2]; w1 <- rnorm(NT) 
W <- cbind(1, w1); K2 <- dim(W)[2]
B <- c(0.5, 0.4, 0.6, -0.6)
D <- c(0.7, 0.6); sig2 <- 0.1
b1 <- rnorm(N, 0, sd = D[1]^0.5)
b2 <- rnorm(N, 0, sd = D[2]^0.5)
b <- cbind(b1, b2)
yl <- NULL
for(i in 1:NT){
    ylmeani <- X[i,]%*%B + W[i,]%*%b[id[i],]
    yli <- rnorm(1, ylmeani, sig2^0.5)
    yl <- c(yl, yli)
}
lambdait <- exp(yl); y <- rpois(NT, lambdait)
Data <- as.data.frame(cbind(y, x1, x2, x3, w1, id))
mcmc <- 15000; burnin <- 5000; thin <- 10; tot <- mcmc + burnin
b0 <- rep(0, K1); B0 <- diag(K1); B0i <- solve(B0) 
r0 <- K2; R0 <- diag(K2); a0 <- 0.001; d0 <- 0.001
LatentMHV1 <- function(tuning, Beta, bs, sig2){
    ylhat <- rep(0, NT)
    accept <- NULL
    for(i in 1:NT){
        ids <- which(id == i)
        yi <- y[i]
        ylhatmeani <- X[i,]%*%Beta + W[i,]%*%bs[id[i],]
        ylhati <- rnorm(1, ylhatmeani, sd = sig2^0.5)
        lambdahati <- exp(ylhati)
        ei <- rnorm(1, 0, sd = tuning)
        ylpropi <- ylhati + ei
        lambdapropi <- exp(ylpropi)
        logPosthati <- sum(dpois(yi, lambdahati, log = TRUE) + dnorm(ylhati, ylhatmeani, sig2^0.5, log = TRUE))
        logPostpropi <- sum(dpois(yi, lambdapropi, log = TRUE) + dnorm(ylpropi, ylhatmeani, sig2^0.5, log = TRUE))
        alphai <- min(1, exp(logPostpropi - logPosthati))
        ui <- runif(1)
        if(ui <= alphai){
            ylhati <- ylpropi; accepti <- 1
        }else{
            ylhati <- ylhati; accepti <- 0
        }
        ylhat[i] <- ylhati
        accept <- c(accept, accepti)
    }
    res <- list(ylhat = ylhat, accept = mean(accept))
    return(res)
}
PostBeta <- function(D, ylhat, sig2){
    XVX <- matrix(0, K1, K1); XVy <- matrix(0, K1, 1)
    for(i in 1:N){
        ids <- which(id == i); Ti <- length(ids)
        Wi <- W[ids, ]
        Vi <- diag(Ti)*sig2 + Wi%*%D%*%t(Wi)
        ViInv <- solve(Vi); Xi <- X[ids, ]
        XVXi <- t(Xi)%*%ViInv%*%Xi
        XVX <- XVX + XVXi
        yi <- ylhat[ids]
        XVyi <- t(Xi)%*%ViInv%*%yi
        XVy <- XVy + XVyi
    }
    Bn <- solve(B0i + XVX); bn <- Bn%*%(B0i%*%b0 + XVy)
    Beta <- MASS::mvrnorm(1, bn, Bn)
    return(Beta)
}
Postb <- function(Beta, D, ylhat, sig2){
    Di <- solve(D); bis <- matrix(0, N, K2)
    for(i in 1:N){
        ids <- which(id == i)
        Wi <- W[ids, ]; Xi <- X[ids, ]
        yi <- ylhat[ids]
        Wtei <- sig2^(-1)*t(Wi)%*%(yi - Xi%*%Beta)
        Bni <- solve(sig2^(-1)*t(Wi)%*%Wi + Di)
        bni <- Bni%*%Wtei
        bi <- MASS::mvrnorm(1, bni, Bni)
        bis[i, ] <- bi
    }
    return(bis)
}
PostD <- function(bs){
    rn <- r0 + N; btb <- matrix(0, K2, K2)
    for(i in 1:N){
        bsi <- bs[i, ]; btbi <- bsi%*%t(bsi)
        btb <- btb + btbi
    }
    Rn <- d0*R0 + btb
    Sigma <- MCMCpack::riwish(v = rn, S = Rn)
    return(Sigma)
}
PostSig2 <- function(Beta, bs, ylhat){
    an <- a0 + 0.5*NT; ete <- 0
    for(i in 1:N){
        ids <- which(id == i)
        Xi <- X[ids, ]
        yi <- ylhat[ids]
        Wi <- W[ids, ]
        ei <- yi - Xi%*%Beta - Wi%*%bs[i, ]
        etei <- t(ei)%*%ei
        ete <- ete + etei
    }
    dn <- d0 + 0.5*ete 
    sig2 <- MCMCpack::rinvgamma(1, shape = an, scale = dn)
    return(sig2)
}
PostBetas <- matrix(0, tot, K1); PostDs <- matrix(0, tot, K2*(K2+1)/2)
Postbs <- array(0, c(N, K2, tot)); PostSig2s <- rep(0, tot)
Accepts <- rep(NULL, tot)
RegPois <- glm(y ~ X - 1, family = poisson(link = "log"))
SumPois <- summary(RegPois)
Beta <- SumPois[["coefficients"]][,1]
sig2 <- sum(SumPois[["deviance.resid"]]^2)/SumPois[["df.residual"]]
D <- diag(K2); bs1 <- rnorm(N, 0, sd = D[1,1]^0.5)
bs2 <- rnorm(N, 0, sd = D[2,2]^0.5); bs <- cbind(bs1, bs2)
tuning <- 0.1; ropt <- 0.44
tunepariter <- seq(round(tot/10, 0), tot, round(tot/10, 0));   l <- 1
pb <- winProgressBar(title = "progress bar", min = 0, max = tot, width = 300)
for(s in 1:tot){
    LatY <- LatentMHV1(tuning = tuning, Beta = Beta, bs = bs, sig2 = sig2)
    ylhat <- LatY[["ylhat"]]
    bs <- Postb(Beta = Beta, D = D, ylhat=ylhat, sig2 = sig2)
    D <- PostD(bs = bs)
    Beta <- PostBeta(D = D, ylhat = ylhat, sig2 = sig2)
    sig2 <- PostSig2(Beta = Beta, bs = bs, ylhat = ylhat)
    PostBetas[s,] <- Beta
    PostDs[s,] <- matrixcalc::vech(D)
    Postbs[, , s] <- bs; PostSig2s[s] <- sig2
    AcceptRate <- LatY[["accept"]]
    Accepts[s] <- AcceptRate
    if(AcceptRate > ropt){
        tuning = tuning*(2-(1-AcceptRate)/(1-ropt))
    }else{
        tuning = tuning/(2-AcceptRate/ropt)
    }
    if(s == tunepariter[l]){
        print(AcceptRate); l <- l + 1
    }
    setWinProgressBar(pb, s, title=paste( round(s/tot*100, 0),"% done"))
}

## [1] 0.434
## [1] 0.447
## [1] 0.448
## [1] 0.451
## [1] 0.439
## [1] 0.46
## [1] 0.44
## [1] 0.409
## [1] 0.477
## [1] 0.479

close(pb)

## NULL

keep <- seq((burnin+1), tot, thin)
Bs <- PostBetas[keep,]; Ds <- PostDs[keep,]
bs <- Postbs[, , keep]; sig2s <- PostSig2s[keep]
summary(coda::mcmc(Bs))

## 
## Iterations = 1:1500
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean      SD Naive SE Time-series SE
## [1,]  0.5169 0.20869 0.005388       0.004732
## [2,]  0.3610 0.06215 0.001605       0.002049
## [3,]  0.5450 0.06383 0.001648       0.001999
## [4,] -0.5713 0.06544 0.001690       0.001903
## 
## 2. Quantiles for each variable:
## 
##         2.5%     25%     50%     75%   97.5%
## var1  0.1038  0.3803  0.5199  0.6534  0.9259
## var2  0.2432  0.3166  0.3608  0.4003  0.4796
## var3  0.4213  0.5017  0.5453  0.5885  0.6682
## var4 -0.7038 -0.6149 -0.5729 -0.5269 -0.4459

summary(coda::mcmc(Ds))

## 
## Iterations = 1:1500
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean      SD Naive SE Time-series SE
## [1,]  0.5965 0.16852 0.004351       0.005085
## [2,] -0.1323 0.09418 0.002432       0.003072
## [3,]  0.2885 0.09854 0.002544       0.003397
## 
## 2. Quantiles for each variable:
## 
##         2.5%     25%     50%      75%   97.5%
## var1  0.3331  0.4788  0.5732  0.69316 0.99135
## var2 -0.3354 -0.1926 -0.1277 -0.06692 0.03674
## var3  0.1252  0.2182  0.2780  0.34731 0.51055

We can see that all 95% credible intervals encompass the population parameters of the fixed effects, the posterior medians are relatively near the population values. However, we do not get good posterior estimates of the covariance matrix of the random effects as the 95% credible intervals do not encompass the second element of the diagonal of this matrix. In addition, the posterior draws of this algorithm over-estimates the over-dispersion parameter.

We can perform inference for the hierarchical longitudinal Poisson model in our GUI using the following Algorithm. Our GUI is based on the MCMChpoisson command from the MCMCpack package.⁵³

Algorithm: Hierarchical Longitudinal Poisson Model

Select Hierarchical Longitudinal Model on the top panel
Select Poisson model using the left radio button
Upload the dataset selecting first if there is a header in the file and the kind of separator in the csv file of the dataset (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend
Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
Write down the formula of the fixed effects equation in the Main Equation: Fixed Effects box. This formula must be written using the syntax of the formula command of R software. This equation includes the intercept by default, so do not include it in the equation
Write down the formula of the random effects equation in the Main Equation: Random Effects box without writing the dependent variable, that is, starting the equation with the tilde (~) symbol. This formula must be written using the syntax of the formula command of R software. This equation includes the intercept by default, so do not include it in the equation. If there are just random intercepts, do not write anything in this box
Write down the name of the grouping variable, that is, the variable that indicates the cross-sectional units
Set the hyperparameters of the fixed effects: mean vector, covariance matrix, shape, and scale parameters. This step is not necessary as by default our GUI uses non-informative priors
Set the hyperparameters of the random effects: degrees of freedom and scale matrix of the inverse Wishart distribution. This step is not necessary as by default our GUI uses non-informative priors
Click the Go! button
Analyze results
Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

References

Diggle, Peter., P. Heagerty, Liang K-Y., and S. Zeger. 2002. Analysis of Longitudinal Data. Oxford University Press.

At the time of writing this book there was an issue with the function MCMChpoisson from MCMCpack. We contact the maintainer, but users may have issues running this algorithm or running this function directly in R.↩︎