# Chapter 28 A Generalized Linear Model for Poisson Response Data

For all \(i = 1, \ldots, n\), \(y_i \sim \text{Poisson}(\lambda_i)\) with \(\log(\lambda_i) = x_i'\beta\). The Poisson log-likelihood is
\[
\ell(\beta\mid y) = \sum_{i=1}^n \left[y_ix_i'\beta - \exp(x_i'\beta) - \log(y_i!) \right]
\]
\(\ell(\beta\mid y)\) can be maximized numerically using *Fisher's scoring method* to obtain the MLE \(\hat\beta\).
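The score vector and Fisher information that drive the scoring iterations are
\[
U(\beta) = \sum_{i=1}^n \left(y_i - \exp(x_i'\beta)\right)x_i, \qquad
\mathcal{I}(\beta) = \sum_{i=1}^n \exp(x_i'\beta)\, x_i x_i',
\]
and Fisher's scoring updates \(\beta^{(t+1)} = \beta^{(t)} + \mathcal{I}(\beta^{(t)})^{-1} U(\beta^{(t)})\) until convergence. Because the log link is the canonical link for the Poisson family, the observed and expected information coincide, so Fisher scoring is identical to Newton-Raphson here.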

Let \(\lambda = \exp(x'\beta)\) and \(\tilde{\lambda} = \exp(\tilde{x}'\beta)\), where \(\tilde{x} = [x_1, \ldots, x_{j-1}, x_{j} + 1, x_{j+1}, \ldots, x_p ]'\). Then \(\tilde{\lambda}/\lambda = \exp(\beta_j)\): with all other explanatory variables held constant, increasing \(x_j\) by one unit multiplies the mean response by \(\exp(\beta_j)\).
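This multiplicative interpretation can be checked numerically. Below is a minimal sketch with simulated data; the coefficient values 1 and 0.5 and the object names are illustrative, not from the text:

```
# simulate counts with log(lambda) = 1 + 0.5 * x (illustrative values)
set.seed(1)
x = rnorm(500)
y = rpois(500, exp(1 + 0.5 * x))
o = glm(y ~ x, family = poisson(link = "log"))
# exp(beta_j): multiplicative change in the mean per unit increase in x
exp(coef(o)["x"])
```

With a large sample, `exp(coef(o)["x"])` should land near the true rate ratio \(\exp(0.5) \approx 1.65\).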

```
o = glm(y ~ x, family = poisson(link = "log"))
summary(o)
# likelihood ratio test
anova(o, test = "Chisq")
```

**Lack of Fit**: Under the saturated model, the MLE of \(\lambda_i\) is \(y_i\). The likelihood ratio statistic for testing the Poisson GLM (reduced model) against the saturated model (full model) is then
\[
2 \sum_{i = 1}^n \left[y_i\log\left(\frac{y_i}{\hat\lambda_i}\right) - (y_i - \hat\lambda_i) \right]
\]
which is the *Deviance Statistic* for the Poisson case.

The *deviance residuals* are given by
\[
d_i \equiv \text{sign}(y_i - \hat\lambda_i)\sqrt{2\left[y_i\log\left(\frac{y_i}{\hat\lambda_i} \right) - (y_i - \hat\lambda_i)\right]}
\]
The Pearson chi-square statistic is \(X^2 = \sum_{i=1}^{n}\left(\frac{y_{i}-\widehat{E}\left(y_{i}\right)}{\sqrt{\widehat{\operatorname{Var}}\left(y_{i}\right)}}\right)^{2} = \sum_{i=1}^n \left(\frac{y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}}\right)^2\). The Pearson residual is \(r_i = (y_i - \hat\lambda_i)/\sqrt{\hat\lambda_i}\).

```
d = resid(o, type = "deviance")
r = resid(o, type = "pearson")
```
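These residuals tie directly back to the statistics above: the squared deviance residuals sum to the deviance statistic, and the squared Pearson residuals sum to \(X^2\). A self-contained check with simulated data (object names and parameter values are illustrative):

```
# fit a Poisson GLM to simulated counts and recover both statistics
set.seed(2)
x = rnorm(200)
y = rpois(200, exp(0.5 + 0.3 * x))
o = glm(y ~ x, family = poisson(link = "log"))
d = resid(o, type = "deviance")
r = resid(o, type = "pearson")
sum(d^2)   # equals deviance(o), the deviance statistic
sum(r^2)   # the Pearson chi-square statistic X^2
```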

For the Poisson case, \(Var(y) = E(y) = \lambda\), so the variance is completely determined by the mean. If either the Deviance Statistic or the Pearson Chi-Square Statistic suggests a lack of fit that cannot be explained by other reasons (e.g., a poor model for the mean or a few extreme outliers), **overdispersion** may be the problem.

**Quasi-likelihood**: Suppose \(Var(y_i) = \phi \lambda_i\) for some unknown dispersion parameter \(\phi > 1\). \(\phi\) can be estimated by \(\hat\phi = \sum_{i = 1}^n d_i^2/(n-p)\) or \(\hat\phi = \sum_{i=1}^n r_i^2/(n-p)\).

- The estimated variance of \(\hat\beta\) is multiplied by \(\hat\phi\).
- For Wald type inferences, the standard normal null distribution is replaced by \(t\) with \(n-p\) degrees of freedom.
- Any test statistic \(T\) that was assumed \(\chi_q^2\) under \(H_0\) is replaced with \(T/(q\hat\phi)\) and compared to an \(F\) distribution with \(q\) and \(n-p\) degrees of freedom.
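In R, all three adjustments are applied automatically by refitting with the `quasipoisson` family. A sketch using overdispersed counts simulated from a negative binomial distribution (parameter values illustrative):

```
# negative binomial counts: Var(y) = mu + mu^2/2 > mu, i.e., overdispersed
set.seed(3)
x = rnorm(300)
y = rnbinom(300, mu = exp(1 + 0.2 * x), size = 2)
oq = glm(y ~ x, family = quasipoisson(link = "log"))
summary(oq)$dispersion   # Pearson-based estimate of phi (> 1 here)
anova(oq, test = "F")    # F tests replace the chi-square tests
```

The coefficient estimates match the ordinary Poisson fit; only the standard errors and test statistics are rescaled by \(\hat\phi\).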

```
# estimates of the dispersion parameter
deviance(o)/df.residual(o)
sum(r^2)/df.residual(o)
```