7.5 Negative Binomial Regression

When modeling count data, Poisson regression assumes that the mean and variance are equal:

$\text{Var}(Y_i) = E(Y_i) = \mu_i$

However, in many real-world datasets, the variance exceeds the mean—a phenomenon known as overdispersion. When overdispersion is present, the Poisson model underestimates the variance, leading to:

Inflated test statistics (small p-values).
Overconfident predictions.
Poor model fit.

7.5.1 Negative Binomial Distribution

To address overdispersion, Negative Binomial regression introduces an extra dispersion parameter $\theta$ to allow variance to be greater than the mean: $\text{Var}(Y_i) = \mu_i + \theta \mu_i^2$ where:

$\mu_i = \exp(\mathbf{x_i' \theta})$ is the expected count.
$\theta$ is the dispersion parameter.
When $\theta \to 0$ , the NB model reduces to the Poisson model.

Thus, Negative Binomial regression is a generalization of Poisson regression that accounts for overdispersion.

7.5.2 Application: Negative Binomial Regression

We apply Negative Binomial regression to the bioChemists dataset to model the number of research articles (Num_Article) as a function of several predictors.

7.5.2.1 Fitting the Negative Binomial Model

# Load necessary package
library(MASS)

# Fit Negative Binomial model
NegBinom_Mod <- MASS::glm.nb(Num_Article ~ ., data = bioChemists)

# Model summary
summary(NegBinom_Mod)
#> 
#> Call:
#> MASS::glm.nb(formula = Num_Article ~ ., data = bioChemists, init.theta = 2.264387695, 
#>     link = log)
#> 
#> Coefficients:
#>                  Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      0.256144   0.137348   1.865 0.062191 .  
#> SexWomen        -0.216418   0.072636  -2.979 0.002887 ** 
#> MarriedMarried   0.150489   0.082097   1.833 0.066791 .  
#> Num_Kid5        -0.176415   0.052813  -3.340 0.000837 ***
#> PhD_Quality      0.015271   0.035873   0.426 0.670326    
#> Num_MentArticle  0.029082   0.003214   9.048  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(2.2644) family taken to be 1)
#> 
#>     Null deviance: 1109.0  on 914  degrees of freedom
#> Residual deviance: 1004.3  on 909  degrees of freedom
#> AIC: 3135.9
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  2.264 
#>           Std. Err.:  0.271 
#> 
#>  2 x log-likelihood:  -3121.917

Interpretation:

The coefficients are on the log scale.
The dispersion parameter $\theta$ (also called size parameter in some contexts) is estimated as 2.264 with a standard error of 0.271. Check Over-Dispersion for more detail.
Since $\theta$ is significantly different from 1, this confirms overdispersion, validating the choice of the Negative Binomial model over Poisson regression.

7.5.2.2 Model Comparison: Poisson vs. Negative Binomial

7.5.2.2.1 Checking Overdispersion in Poisson Model

Before using NB regression, we confirm overdispersion by computing:

$\hat{\phi} = \frac{\text{deviance}}{\text{degrees of freedom}}$

# Overdispersion check for Poisson model
Poisson_Mod$deviance / Poisson_Mod$df.residual
#> [1] 1.797988

If $\hat{\phi} > 1$ , overdispersion is present.
A large value suggests that Poisson regression underestimates variance.

7.5.2.2.2 Likelihood Ratio Test: Poisson vs. Negative Binomial

We compare the Poisson and Negative Binomial models using a likelihood ratio test, where: $G^2 = 2 \times ( \log L_{NB} - \log L_{Poisson})$ with $\text{df} = 1$

# Likelihood ratio test between Poisson and Negative Binomial
pchisq(2 * (logLik(NegBinom_Mod) - logLik(Poisson_Mod)),
       df = 1,
       lower.tail = FALSE)
#> 'log Lik.' 4.391728e-41 (df=7)

Small p-value (< 0.05) → Negative Binomial model is significantly better.
Large p-value (> 0.05) → Poisson model is adequate.

Since overdispersion is confirmed, the Negative Binomial model is preferred.

7.5.2.3 Model Diagnostics and Evaluation

7.5.2.3.1 Checking Dispersion Parameter $\theta$

The Negative Binomial dispersion parameter $\theta$ can be retrieved:

# Extract dispersion parameter estimate
NegBinom_Mod$theta
#> [1] 2.264388

A large $\theta$ suggests that overdispersion is not extreme, while a small $\theta$ (close to 0) would indicate the Poisson model is reasonable.

7.5.2.4 Predictions and Rate Ratios

In Negative Binomial regression, exponentiating the coefficients gives rate ratios:

# Convert coefficients to rate ratios
data.frame(`Odds Ratios` = exp(coef(NegBinom_Mod)))
#>                 Odds.Ratios
#> (Intercept)       1.2919388
#> SexWomen          0.8053982
#> MarriedMarried    1.1624030
#> Num_Kid5          0.8382698
#> PhD_Quality       1.0153884
#> Num_MentArticle   1.0295094

A rate ratio of:

> 1 $\to$ Increases expected article count.
< 1 $\to$ Decreases expected article count.
= 1 $\to$ No effect.

For example:

If PhD_Quality has an exponentiated coefficient of 1.5, individuals from higher-quality PhD programs are expected to publish 50% more articles.
If Sex has an exponentiated coefficient of 0.8, females publish 20% fewer articles than males, all else equal.

7.5.2.5 Alternative Approach: Zero-Inflated Models

If a dataset has excess zeros (many individuals publish no articles), Zero-Inflated Negative Binomial (ZINB) models may be required.

$\text{P}(Y_i = 0) = p + (1 - p) f(Y_i = 0 | \mu, \theta)$

where:

$p$ is the probability of always being a zero (e.g., inactive researchers).
$f(Y_i)$ follows the Negative Binomial distribution.

7.5.3 Fitting a Zero-Inflated Negative Binomial Model

# Load package for zero-inflated models
library(pscl)

# Fit ZINB model
ZINB_Mod <- zeroinfl(Num_Article ~ ., data = bioChemists, dist = "negbin")

# Model summary
summary(ZINB_Mod)
#> 
#> Call:
#> zeroinfl(formula = Num_Article ~ ., data = bioChemists, dist = "negbin")
#> 
#> Pearson residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2942 -0.7601 -0.2909  0.4448  6.4155 
#> 
#> Count model coefficients (negbin with log link):
#>                   Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      0.4167465  0.1435966   2.902  0.00371 ** 
#> SexWomen        -0.1955068  0.0755926  -2.586  0.00970 ** 
#> MarriedMarried   0.0975826  0.0844520   1.155  0.24789    
#> Num_Kid5        -0.1517325  0.0542061  -2.799  0.00512 ** 
#> PhD_Quality     -0.0007001  0.0362697  -0.019  0.98460    
#> Num_MentArticle  0.0247862  0.0034927   7.097 1.28e-12 ***
#> Log(theta)       0.9763565  0.1354695   7.207 5.71e-13 ***
#> 
#> Zero-inflation model coefficients (binomial with logit link):
#>                 Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)     -0.19169    1.32282  -0.145  0.88478   
#> SexWomen         0.63593    0.84892   0.749  0.45379   
#> MarriedMarried  -1.49947    0.93867  -1.597  0.11017   
#> Num_Kid5         0.62843    0.44278   1.419  0.15582   
#> PhD_Quality     -0.03771    0.30801  -0.122  0.90254   
#> Num_MentArticle -0.88229    0.31623  -2.790  0.00527 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> Theta = 2.6548 
#> Number of iterations in BFGS optimization: 43 
#> Log-likelihood: -1550 on 13 Df

This model accounts for:

Structural zero inflation.
Overdispersion.

ZINB is often preferred when many observations are zero. However, since ZINB does not fall under the GLM framework, we will discuss it further in Nonlinear and Generalized Linear Mixed Models.

Why ZINB is Not a GLM?

Unlike GLMs, which assume a single response distribution from the exponential family, ZINB is a mixture model with two components:
- Count model – A negative binomial regression for the main count process.
- Inflation model – A logistic regression for excess zeros.

Because ZINB combines two distinct processes rather than using a single exponential family distribution, it does not fit within the standard GLM framework.

ZINB is part of finite mixture models and is sometimes considered within generalized linear mixed models (GLMMs) or semi-parametric models.