Chapter 28 A Generalized Linear Model for Poisson Response Data

For all \(i = 1, \ldots, n\), \(y_i \sim \text{Poisson}(\lambda_i)\) with \(\log(\lambda_i) = x_i'\beta\). The Poisson log likelihood is \[ \ell(\beta\mid y) = \sum_{i=1}^n \left[y_ix_i'\beta - \exp(x_i'\beta) - \log(y_i!) \right] \] The log likelihood \(\ell(\beta\mid y)\) can be maximized using Fisher’s scoring method to obtain the MLE \(\hat\beta\).
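
Fisher scoring is the iteratively weighted least squares algorithm that glm() runs internally. As a sanity check, the same MLE can be obtained by handing \(\ell(\beta\mid y)\) to a generic optimizer; below is a minimal sketch on simulated data, where x.sim, y.sim, the sample size, and the starting values are all illustrative assumptions.

# sketch: reproduce the glm() MLE by maximizing the log likelihood directly
set.seed(1)
x.sim = runif(100)
y.sim = rpois(100, exp(1 + 2 * x.sim))
X = cbind(1, x.sim)                    # design matrix with an intercept column
negll = function(b) -sum(y.sim * (X %*% b) - exp(X %*% b) - lfactorial(y.sim))
optim(c(0, 0), negll, method = "BFGS")$par
coef(glm(y.sim ~ x.sim, family = poisson(link = "log")))  # same values via Fisher scoring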

Let \(\lambda = \exp(x'\beta)\) and \(\tilde{\lambda} = \exp(\tilde{x}'\beta)\), where \(\tilde{x} = [x_1, \ldots, x_{j-1}, x_{j} + 1, x_{j+1}, \ldots, x_p ]'\). Then \(\tilde{\lambda}/\lambda = \exp(\beta_j)\). This means that, with all other explanatory variables held constant, the mean response at \(x_j + 1\) is \(\exp(\beta_j)\) times the mean response at \(x_j\) (see the exp(coef(o)) computation after the model fit below).

o = glm(y ~ x, family = poisson(link = "log"))
summary(o) 

# likelihood ratio test 
anova(o, test = "Chisq")
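
The multiplicative interpretation of \(\beta_j\) can be read off by exponentiating the estimated coefficients; confint.default() gives Wald (normal-approximation) intervals.

# estimated multiplicative effects exp(beta_j), with Wald confidence limits
exp(coef(o))
exp(confint.default(o))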

Lack of Fit: Under the saturated model, the MLE of each \(\lambda_i\) is \(y_i\). The likelihood ratio statistic for testing the Poisson GLM as the reduced model vs. the saturated model as the full model is \[ 2 \sum_{i = 1}^n \left[y_i\log\left(\frac{y_i}{\hat\lambda_i}\right) - (y_i - \hat\lambda_i) \right] \] where \(\hat\lambda_i = \exp(x_i'\hat\beta)\) is the fitted mean under the reduced model; this is the Deviance Statistic for the Poisson case.
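
The statistic can be computed from its definition and checked against deviance(o). A sketch assuming the fit o from above; the ifelse() applies the convention \(0 \log 0 = 0\) when \(y_i = 0\).

# deviance statistic from its definition (y*log(y/lambda) taken as 0 when y = 0)
lambda.hat = fitted(o)
2 * sum(ifelse(y > 0, y * log(y / lambda.hat), 0) - (y - lambda.hat))
deviance(o)  # should match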

The deviance residuals are given by \[ d_i \equiv \text{sign}(y_i - \hat\lambda_i)\sqrt{2\left[y_i\log\left(\frac{y_i}{\hat\lambda_i} \right) - (y_i - \hat\lambda_i)\right]} \] Pearson’s chi-square statistic is \(X^2 = \sum_{i=1}^{n}\left(\frac{y_{i}-\widehat{E}\left(y_{i}\right)}{\sqrt{\widehat{\operatorname{Var}}\left(y_{i}\right)}}\right)^{2} = \sum_{i=1}^n \left(\frac{y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}}\right)^2\), and the Pearson residual is \(r_i = (y_i - \hat\lambda_i)/\sqrt{\hat\lambda_i}\).

d = resid(o, type = "deviance")
r = resid(o, type = "pearson")
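
Both types can be verified against the formulas above; a quick check, reusing lambda.hat = fitted(o).

# residuals recomputed by hand and compared with resid()
lambda.hat = fitted(o)
d.byhand = sign(y - lambda.hat) *
  sqrt(2 * (ifelse(y > 0, y * log(y / lambda.hat), 0) - (y - lambda.hat)))
r.byhand = (y - lambda.hat) / sqrt(lambda.hat)
all.equal(d, d.byhand, check.attributes = FALSE)
all.equal(r, r.byhand, check.attributes = FALSE)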

For the Poisson case, \(Var(y) = E(y) = \lambda\), so the variance is completely determined by the mean. If either the Deviance Statistic or the Pearson Chi-Square Statistic suggests a lack of fit that cannot be explained by other causes (e.g., a poorly specified mean model or a few extreme outliers), overdispersion may be the problem.
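
When the \(\hat\lambda_i\) are not too small, both statistics are approximately \(\chi^2_{n-p}\) under a correctly specified Poisson GLM, which yields a rough lack-of-fit test:

# approximate lack-of-fit p-values (chi-square approximation needs moderately large means)
pchisq(deviance(o), df.residual(o), lower.tail = FALSE)
pchisq(sum(r^2), df.residual(o), lower.tail = FALSE)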

Quasi-likelihood: Suppose \(Var(y_i) = \phi \lambda_i\) for some unknown dispersion parameter \(\phi > 1\). Then \(\phi\) can be estimated by \(\hat\phi = \sum_{i = 1}^n d_i^2/(n-p)\) (deviance-based) or \(\hat\phi = \sum_{i=1}^n r_i^2/(n-p)\) (Pearson-based), and inference is adjusted as follows:

  • The estimated variance of \(\hat\beta\) is multiplied by \(\hat\phi\).
  • For Wald type inferences, the standard normal null distribution is replaced by \(t\) with \(n-p\) degrees of freedom.
  • Any test statistic \(T\) that was assumed to follow a \(\chi_q^2\) distribution under \(H_0\) is replaced with \(T/(q\hat\phi)\) and compared to an \(F\) distribution with \(q\) and \(n-p\) degrees of freedom.

# estimates of the dispersion parameter (deviance-based and Pearson-based)
deviance(o)/df.residual(o)

sum(r^2)/df.residual(o)
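
All three adjustments are available in one step through R’s quasipoisson family, which keeps the same coefficient estimates but estimates \(\phi\) with the Pearson-based formula and scales the standard errors by \(\sqrt{\hat\phi}\).

# quasi-likelihood fit: same point estimates as o, SEs inflated by sqrt(phi.hat)
oq = glm(y ~ x, family = quasipoisson(link = "log"))
summary(oq)  # Wald inference now uses t with n - p degrees of freedom

# F tests replace the chi-square tests
anova(oq, test = "F")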