# Chapter 27 A Generalized Linear Model for Binomial Response Data

For all $$i = 1, \ldots, n$$, $$y_i\sim \text{Binomial}(m_i, \pi_i)$$, where $$m_i$$ is the known number of trials for observation $$i$$, $$\pi_i = \frac{\exp(x_i'\beta)}{1+\exp(x_i'\beta)}$$, and $$y_1, \ldots, y_n$$ are independent. The binomial log likelihood is $\ell(\beta\mid y) = \sum_{i = 1}^n [y_ix_i'\beta - m_i\log(1 + \exp(x_i'\beta))] + \text{constant.}$ We can compare the fit of the logistic regression model to that of the saturated model, which allows a separate success probability for each observation. The MLE of $$\pi_i$$ under the logistic regression model is $$\hat\pi_i = \frac{\exp(x_i'\hat\beta)}{1 + \exp(x_i'\hat\beta)}$$, and the MLE of $$\pi_i$$ under the saturated model is $$y_i/m_i$$. The likelihood ratio statistic for testing the logistic regression model (the reduced model) against the saturated model (the full model) is $2 \sum_{i=1}^{n}\left[y_{i} \log \left(\frac{y_{i} / m_{i}}{\hat{\pi}_{i}}\right)+\left(m_{i}-y_{i}\right) \log \left(\frac{1-y_{i} / m_{i}}{1-\hat{\pi}_{i}}\right)\right],$ which is called the deviance statistic, the residual deviance, or simply the deviance.
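To see where the deviance comes from, substitute the two sets of MLEs into the likelihood ratio statistic $$2[\ell_{\text{saturated}} - \ell_{\text{reduced}}]$$ and collect terms:

\begin{aligned} 2 \sum_{i=1}^{n}&\left[y_{i} \log \left(\frac{y_{i}}{m_{i}}\right)+\left(m_{i}-y_{i}\right) \log \left(1-\frac{y_{i}}{m_{i}}\right)-y_{i} \log \hat{\pi}_{i}-\left(m_{i}-y_{i}\right) \log \left(1-\hat{\pi}_{i}\right)\right] \\ &=2 \sum_{i=1}^{n}\left[y_{i} \log \left(\frac{y_{i} / m_{i}}{\hat{\pi}_{i}}\right)+\left(m_{i}-y_{i}\right) \log \left(\frac{1-y_{i} / m_{i}}{1-\hat{\pi}_{i}}\right)\right], \end{aligned}

so the deviance is exactly twice the gap in log likelihood between the saturated model and the fitted logistic regression model.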

A Lack-of-Fit Test: when $$n$$ is large, and/or $$m_1, \ldots, m_n$$ are each suitably large, the deviance statistic is approximately $$\chi_{n-p}^2$$ if the logistic regression model is correct.
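Although the notes use R, the test itself is easy to sketch in Python. The deviance and degrees of freedom below are hypothetical numbers, not from the text; a deviance much larger than its $$\chi_{n-p}^2$$ reference suggests lack of fit.

```python
from scipy.stats import chi2

# Hypothetical output from a fitted logistic regression:
deviance = 26.8   # residual deviance
df = 18           # n - p residual degrees of freedom

# Upper-tail p-value of the chi-square lack-of-fit test.
# A small p-value is evidence against the logistic regression model.
p_value = chi2.sf(deviance, df)
print(p_value)
```

A deviance close to its degrees of freedom (here, 26.8 vs. 18) is only mild evidence of lack of fit, which is exactly what the moderate p-value reflects.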

Deviance Residual: $d_{i} \equiv \operatorname{sign}\left(y_{i} / m_{i}-\hat{\pi}_{i}\right) \sqrt{2\left[y_{i} \log \left(\frac{y_{i}}{m_{i} \hat{\pi}_{i}}\right)+\left(m_{i}-y_{i}\right) \log \left(\frac{m_{i}-y_{i}}{m_{i}-m_{i} \hat{\pi}_{i}}\right)\right]}$ The residual deviance statistic is the sum of the squared deviance residuals $$(\sum_{i = 1}^n d_i^2)$$.
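A minimal numeric check of this identity, with hypothetical data and fitted probabilities (the notes' own examples use R; this Python sketch is only an illustration):

```python
import numpy as np

y = np.array([3.0, 7.0, 12.0])        # successes (hypothetical)
m = np.array([20.0, 20.0, 20.0])      # numbers of trials (hypothetical)
pihat = np.array([0.20, 0.35, 0.55])  # fitted pi_i (hypothetical)

# Deviance residuals d_i, following the formula above.
sign = np.sign(y / m - pihat)
d = sign * np.sqrt(2 * (y * np.log(y / (m * pihat))
                        + (m - y) * np.log((m - y) / (m - m * pihat))))

# The residual deviance is the sum of the squared deviance residuals.
deviance = float(np.sum(d ** 2))
print(deviance)
```

Summing $$d_i^2$$ reproduces exactly twice the log-likelihood gap between the saturated and fitted models, as the deviance definition requires.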

Pearson’s Chi-Square Statistic: another lack-of-fit statistic that is approximately $$\chi_{n-p}^2$$ under the null is Pearson’s chi-square statistic: \begin{aligned} X^{2} = \sum_{i=1}^{n}\left(\frac{y_{i}-\widehat{E}\left(y_{i}\right)}{\sqrt{\widehat{\operatorname{Var}}\left(y_{i}\right)}}\right)^{2} = \sum_{i=1}^{n}\left(\frac{y_{i}-m_{i} \hat{\pi}_{i}}{\sqrt{m_{i} \hat{\pi}_{i}\left(1-\hat{\pi}_{i}\right)}}\right)^{2} . \end{aligned} The term $$r_i = \frac{y_{i}-m_{i} \hat{\pi}_{i}}{\sqrt{m_{i} \hat{\pi}_{i}\left(1-\hat{\pi}_{i}\right)}}$$ is known as the Pearson residual.
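The Pearson statistic is a direct computation once the fitted probabilities are in hand. Again using hypothetical data (the same made-up values as above, not from the text):

```python
import numpy as np

y = np.array([3.0, 7.0, 12.0])        # successes (hypothetical)
m = np.array([20.0, 20.0, 20.0])      # numbers of trials (hypothetical)
pihat = np.array([0.20, 0.35, 0.55])  # fitted pi_i (hypothetical)

# Pearson residuals: observed minus fitted mean, scaled by the
# estimated binomial standard deviation.
r = (y - m * pihat) / np.sqrt(m * pihat * (1 - pihat))

# Pearson's chi-square statistic is the sum of squared Pearson residuals.
X2 = float(np.sum(r ** 2))
print(X2)
```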

Residual Diagnostics: For large $$m_i$$ values, both $$d_i$$ and $$r_i$$ should be approximately distributed as standard normal random variables if the logistic regression model is correct.

```r
o <- glm(cbind(tumor, total - tumor) ~ dose, family = binomial)
summary(o)
```
Overdispersion: in the GLM framework, it is often the case that $$Var(y_i)$$ is a function of $$E(y_i)$$. That is the case for logistic regression, where $$Var(y_i) = m_i\pi_i(1-\pi_i) = m_i\pi_i - (m_i\pi_i)^2/m_i = E(y_i) - [E(y_i)]^2/m_i$$. If the variability of the response is greater than we would expect based on our estimates of the mean, we say that there is overdispersion.
Quasi-likelihood Inference: in the binomial case, we make all the same assumptions as before except that we assume $$Var(y_i) = \phi m_i\pi_i(1-\pi_i)$$ for some unknown dispersion parameter $$\phi > 1$$. The dispersion parameter can be estimated by $$\hat\phi = \sum_{i=1}^n d_i^2/(n-p)$$ or $$\hat\phi = \sum_{i=1}^n r_i^2/(n-p)$$.
• The estimated variance of $$\hat\beta$$ is multiplied by $$\hat\phi$$.
• For Wald type inferences, the standard normal null distribution is replaced by $$t$$ with $$n - p$$ degrees of freedom.
• Any test statistic $$T$$ that would be approximately $$\chi_q^2$$ under $$H_0$$ is replaced with $$T/(q\hat\phi)$$ and compared to an $$F$$ distribution with $$q$$ and $$n-p$$ degrees of freedom.
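The adjustments above amount to a few lines of arithmetic. A Python sketch with hypothetical numbers (the Pearson statistic, test statistic, and dimensions are made up for illustration):

```python
from scipy.stats import f

n, p, q = 20, 2, 1       # observations, model parameters, parameters tested
pearson_X2 = 54.0        # sum of squared Pearson residuals (hypothetical)
T = 12.0                 # statistic that would be chi^2_q without overdispersion

# Estimate the dispersion parameter from the Pearson residuals.
phi_hat = pearson_X2 / (n - p)

# Scale the chi-square statistic and refer it to an F distribution.
F_stat = T / (q * phi_hat)
p_value = f.sf(F_stat, q, n - p)
print(phi_hat, F_stat, p_value)
```

With $$\hat\phi = 3$$, the nominal statistic of 12 shrinks to an $$F$$ statistic of 4, so evidence that looked overwhelming under the $$\chi_q^2$$ reference becomes much weaker once overdispersion is accounted for.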
```r
oq <- glm(cbind(tumor, total - tumor) ~ dosef, family = quasibinomial)
summary(oq)
```