6.7 Negative binomial model

The dependent variable in the negative binomial model is a nonnegative integer, that is, a count. In contrast to the Poisson model, which assumes equi-dispersion (the mean and variance are equal), the negative binomial model accommodates over-dispersion.

We assume that y_i \stackrel{ind}{\sim} NB(\gamma, \theta_i), where the density function for individual i is \frac{\Gamma(y_i + \gamma)}{\Gamma(\gamma) \, y_i!} (1-\theta_i)^{y_i} \theta_i^{\gamma}, with success probability \theta_i = \frac{\gamma}{\lambda_i + \gamma}, where \lambda_i = \exp\left\{\mathbf{x}_i^{\top} \boldsymbol{\beta}\right\} is the mean, \gamma = \exp\left\{\alpha\right\} is the target for the number of successful trials, or the dispersion parameter, and \mathbf{x}_i is a K-dimensional vector of regressors.
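
To make the parameterization concrete, the following snippet, a minimal check with arbitrary illustrative values of \lambda_i and \gamma (not taken from any application), verifies that the density written above in terms of \theta_i coincides with R's dnbinom under its (size, mu) parameterization, and computes the implied mean and variance, \lambda_i and \lambda_i + \lambda_i^2/\gamma, which illustrate the over-dispersion that the Poisson model rules out.

lam <- 2.5                 # Illustrative mean lambda_i (assumed value)
gam <- exp(1.2)            # Dispersion parameter gamma = exp(alpha)
theta <- gam / (lam + gam) # Success probability theta_i = gamma/(lambda_i + gamma)
y_i <- 3                   # An arbitrary count
# Density as written in the text
densManual <- gamma(y_i + gam) / (gamma(gam) * factorial(y_i)) * (1 - theta)^y_i * theta^gam
# Same value from R's (size, mu) parameterization
densR <- dnbinom(y_i, size = gam, mu = lam)
all.equal(densManual, densR)
# Over-dispersion: Var(y_i) = lambda_i + lambda_i^2/gamma > E(y_i) = lambda_i
c(mean = lam, variance = lam + lam^2 / gam)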

We assume independent priors for this model: \boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \mathbf{B}_0) and \alpha \sim G(\alpha_0, \delta_0).

This model does not have standard conditional posterior distributions. Therefore, P. E. Rossi, Allenby, and McCulloch (2012) propose using a random-walk Metropolis–Hastings algorithm where the proposal distribution for \boldsymbol{\beta} is Gaussian, centered at the current value, with covariance matrix s_{\boldsymbol{\beta}}^2 \hat{\mathbf{\Sigma}}_{\boldsymbol{\beta}}, where s_{\boldsymbol{\beta}} is a tuning parameter and \hat{\mathbf{\Sigma}}_{\boldsymbol{\beta}} is the maximum likelihood covariance estimator. Additionally, the proposal for \alpha is normal, centered at the current value, with variance s_{\alpha}^2 \hat{\sigma}_{\alpha}^2, where s_{\alpha} is a tuning parameter and \hat{\sigma}_{\alpha}^2 is the maximum likelihood variance estimator.
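
The proposal mechanics can be sketched in a few lines. The snippet below is a schematic single update, not the full sampler given later in this section; it assumes a function logPost(beta, alpha) that evaluates the log posterior, a maximum likelihood covariance estimate SigmaBetaHat for \boldsymbol{\beta} and variance estimate varAlphaHat for \alpha, and the tuning parameters sbeta and salpha, all of which are hypothetical names. Because both proposals are symmetric random walks, the proposal densities cancel, and the acceptance probability reduces to the ratio of posterior densities at the candidate and current values, truncated at one.

# Schematic M-H update; logPost, SigmaBetaHat, varAlphaHat, sbeta, and salpha are assumed to exist
MHstep <- function(beta, alpha){
    # Candidate for beta: Gaussian random walk centered at the current value
    betaC <- c(MASS::mvrnorm(1, mu = beta, Sigma = sbeta^2*SigmaBetaHat))
    if(log(runif(1)) <= logPost(betaC, alpha) - logPost(beta, alpha)){beta <- betaC}
    # Candidate for alpha: normal random walk centered at the current value
    alphaC <- rnorm(1, mean = alpha, sd = salpha*sqrt(varAlphaHat))
    if(log(runif(1)) <= logPost(beta, alphaC) - logPost(beta, alpha)){alpha <- alphaC}
    return(list(beta = beta, alpha = alpha))
}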

Example: Simulation exercise

Let’s do a simulation exercise to check the performance of the M-H algorithms in the negative binomial model. There are two regressors, x_{i1} \sim U(0,1) and x_{i2} \sim N(0,1), plus an intercept. The dispersion parameter is \gamma = \exp\left\{1.2\right\}, and \boldsymbol{\beta} = \left[1 \ 1 \ 1\right]^{\top}. The sample size is 2,000.

We run this simulation using 10,000 MCMC iterations, a burn-in equal to 1,000, and a thinning parameter equal to 5. We set vague priors for the location parameters, in particular, \boldsymbol{\beta}_0 = \mathbf{0}_{3} and \mathbf{B}_0 = 1000\mathbf{I}_{3}, together with \alpha_0 = 0.5 and \delta_0 = 0.1, which are the default values in the rnegbinRw command from the bayesm package in R. In addition, the tuning parameters of the Metropolis–Hastings algorithms are s_{\boldsymbol{\beta}} = 2.93/k^{1/2} and s_{\alpha} = 2.93, which are also the default values in rnegbinRw, where k is the number of location parameters.

We can run the negative binomial model in our GUI by following the steps in the next algorithm.

Algorithm: Negative binomial models

  1. Select Univariate Models on the top panel

  2. Select Negative Binomial (Poisson) model using the left radio button

  3. Upload the dataset by first selecting whether there is a header in the file, and the kind of separator in the csv file of the dataset (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend. You should see a preview of the dataset

  4. Select MCMC iterations, burn-in, and thinning parameters using the Range sliders

  5. Select dependent and independent variables using the Formula builder table

  6. Click the Build formula button to generate the formula in R syntax. You can modify the formula in the Main equation box using valid arguments of the formula command structure in R

  7. Set the hyperparameters: mean vector, covariance matrix, shape, and scale parameters. This step is not necessary, as our GUI uses non-informative priors by default

  8. Select the tuning parameters for the Metropolis-Hastings algorithms

  9. Click the Go! button

  10. Analyze results

  11. Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

The following R code shows how to perform inference in the negative binomial model by programming the M-H algorithms from scratch. In Exercise 8, we ask the reader to estimate this example using the rnegbinRw command.

We observe from the results that all 95% credible intervals encompass the population parameters, and the posterior means are very close to the population parameters.

rm(list = ls())
set.seed(010101)
N <- 2000 # Sample size
x1 <- runif(N); x2 <- rnorm(N)
X <- cbind(1, x1, x2); k <- dim(X)[2]; B <- rep(1, k)
alpha <- 1.2; gamma <- exp(alpha); lambda <- exp(X%*%B)
y <- rnbinom(N, mu = lambda, size = gamma)
# Minus log likelihood (optim minimizes; the M-H steps below use -logLik to recover the log likelihood)
logLik <- function(par){
    alpha <- par[1]; beta <- par[2:(k+1)]
    gamma <- exp(alpha)
    lambda <- exp(X%*%beta)
    logLikNB <- sum(sapply(1:N, function(i){dnbinom(y[i], size = gamma, mu = lambda[i], log = TRUE)}))
    return(-logLikNB)
}
# Parameters: Proposal
par0 <- rep(0.5, k+1)
res.optim <- suppressWarnings(optim(par0, logLik, method="BFGS", hessian=TRUE))
res.optim$par
## [1] 1.3173049 1.0267103 0.9981069 0.9669848
res.optim$convergence
## [1] 0
Covar <- solve(res.optim$hessian) 
CovarBetas <- Covar[2:(k+1),2:(k+1)]
VarAlpha <- Covar[1, 1]
# Hyperparameters: Priors
B0 <- 1000*diag(k); b0 <- rep(0, k)
alpha0 <- 0.5; delta0 <- 0.1
# Metropolis-Hastings function 
MHfunction <- function(iter, sbeta, salpha){
    Beta <- rep(0, k);  Acept1 <- NULL; Acept2 <- NULL
    BetasPost <- matrix(NA, iter, k); alpha <- 1
    alphaPost <- rep(NA, iter); par <- c(alpha, Beta)
    pb <- txtProgressBar(min = 0, max = iter, style = 3) # Cross-platform progress bar
    for(s in 1:iter){
        LogPostBeta <- -logLik(par) + dgamma(alpha, shape = alpha0, scale = delta0, log = TRUE) + mvtnorm::dmvnorm(Beta, mean = b0, sigma = B0, log = TRUE)
        BetaC <- c(MASS::mvrnorm(1, mu = Beta, Sigma = sbeta^2*CovarBetas))
        parC <- c(alpha, BetaC)
        LogPostBetaC <- -logLik(parC) + dgamma(alpha, shape = alpha0, scale = delta0, log = TRUE) +  mvtnorm::dmvnorm(BetaC, mean = b0, sigma = B0, log = TRUE)
        alpha1 <- min(exp(LogPostBetaC - LogPostBeta), 1) # Symmetric random-walk proposal: the proposal densities cancel in the M-H ratio
        u1 <- runif(1)
        if(u1 <= alpha1){Acept1i <- 1; Beta <- BetaC}else{
            Acept1i <- 0; Beta <- Beta
        }
        par <- c(alpha, Beta)
        LogPostBeta <- -logLik(par) + dgamma(alpha, shape = alpha0, scale = delta0, log = TRUE) + mvtnorm::dmvnorm(Beta, mean = b0, sigma = B0, log = TRUE)
        alphaC <- rnorm(1, mean = alpha, sd = salpha*VarAlpha^0.5)
        parC <- c(alphaC, Beta)
        LogPostBetaC <- -logLik(parC) + dgamma(alphaC, shape = alpha0, scale = delta0, log = TRUE) +  mvtnorm::dmvnorm(Beta, mean = b0, sigma = B0, log = TRUE)
        alpha2 <- min(exp(LogPostBetaC - LogPostBeta), 1) # Symmetric proposal for alpha: densities cancel
        u2 <- runif(1)
        if(u2 <= alpha2){Acept2i <- 1; alpha <- alphaC}else{
            Acept2i <- 0; alpha <- alpha
        }
        
        BetasPost[s, ] <- Beta; alphaPost[s] <- alpha
        Acept1 <- c(Acept1, Acept1i); Acept2 <- c(Acept2, Acept2i)
        setTxtProgressBar(pb, s)
    }
    close(pb)
    AcepRateBeta <- mean(Acept1); AcepRateAlpha <- mean(Acept2)
    Results <- list(AcepRateBeta = AcepRateBeta, AcepRateAlpha = AcepRateAlpha, BetasPost = BetasPost, alphaPost = alphaPost)
    return(Results)
}
# MCMC parameters
mcmc <- 10000
burnin <- 1000
thin <- 5
iter <- mcmc + burnin
keep <- seq(burnin, iter, thin)
sbeta <- 2.93/sqrt(k); salpha <- 2.93
# Run M-H
ResultsPost <- MHfunction(iter = iter, sbeta = sbeta, salpha = salpha)
ResultsPost$AcepRateBeta
## [1] 0.3892727
ResultsPost$AcepRateAlpha
## [1] 0.4021818
summary(coda::mcmc(ResultsPost$BetasPost[keep[-1], ]))
## 
## Iterations = 1:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 2000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##        Mean      SD  Naive SE Time-series SE
## [1,] 1.0269 0.04612 0.0010314      0.0014371
## [2,] 0.9973 0.07593 0.0016978      0.0023584
## [3,] 0.9676 0.02343 0.0005239      0.0006485
## 
## 2. Quantiles for each variable:
## 
##        2.5%    25%    50%    75% 97.5%
## var1 0.9429 0.9946 1.0276 1.0567 1.118
## var2 0.8529 0.9454 0.9967 1.0525 1.138
## var3 0.9221 0.9525 0.9677 0.9831 1.014
summary(coda::mcmc(ResultsPost$alphaPost[keep[-1]]))
## 
## Iterations = 1:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 2000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##       1.280156       0.058052       0.001298       0.001491 
## 
## 2. Quantiles for each variable:
## 
##  2.5%   25%   50%   75% 97.5% 
## 1.169 1.242 1.279 1.317 1.396
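
As an additional check on the statement about the credible intervals, the following lines, which assume the objects ResultsPost and keep created above are still in the workspace, compute the 95% credible intervals, to be compared with the population values, and some standard convergence diagnostics from the coda package.

# 95% credible intervals versus the population values
BetaDraws <- coda::mcmc(ResultsPost$BetasPost[keep[-1], ])
alphaDraws <- coda::mcmc(ResultsPost$alphaPost[keep[-1]])
apply(ResultsPost$BetasPost[keep[-1], ], 2, quantile, probs = c(0.025, 0.975)) # Population betas: 1, 1, 1
quantile(ResultsPost$alphaPost[keep[-1]], probs = c(0.025, 0.975)) # Population alpha: 1.2
# Convergence diagnostics
coda::geweke.diag(BetaDraws)
coda::effectiveSize(BetaDraws)
coda::effectiveSize(alphaDraws)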

References

Rossi, Peter E., Greg M. Allenby, and Rob McCulloch. 2012. Bayesian Statistics and Marketing. John Wiley & Sons.