Chapter 26: A Generalized Linear Model for Bernoulli Response Data

Example: for each \(i = 1, \ldots, n\), \(y_i \sim \text{Bernoulli}(\pi_i)\) with \(\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}\), and \(y_1, \ldots, y_n\) are independent. This model is called a logistic regression model.
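As a concrete illustration, here is a minimal sketch (with hypothetical data and parameter values) that simulates from this model and fits it with R's glm():

```r
set.seed(1)                                 # hypothetical simulated example
n    <- 200
x    <- rnorm(n)
beta <- c(-0.5, 1.2)                        # assumed true (intercept, slope)
pi_i <- 1 / (1 + exp(-(beta[1] + beta[2] * x)))   # inverse logit
y    <- rbinom(n, size = 1, prob = pi_i)    # independent Bernoulli responses

fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)                                   # MLE of beta
```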

The function \(g(\pi) = \log(\frac{\pi}{1-\pi})\) is called the logit function, and \(\log(\frac{\pi}{1-\pi})\) is called the log odds.

Note that \(g(\pi_i) = x_i'\beta\). In GLM terminology, the logit is called the link function: the mean of \(y_i\) need not itself be a linear function of \(\beta\); rather, a transformation of the mean (the link) is linear in \(\beta\). Other common link functions for Bernoulli response data are (see the code sketch after this list):

  • probit: \(\Phi^{-1}(\pi_i) = x_i'\beta\)
  • complementary log-log (cloglog in R): \(\log(-\log(1-\pi_i)) = x_i'\beta\)
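Under the assumptions of the simulated example above, fitting with these alternative links in R is a one-line change:

```r
# Same Bernoulli data, alternative link functions
fit_probit  <- glm(y ~ x, family = binomial(link = "probit"))
fit_cloglog <- glm(y ~ x, family = binomial(link = "cloglog"))
```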

For GLMs, Fisher’s Scoring Method is typically used to obtain the MLE of \(\beta\), denoted \(\hat\beta\). Fisher’s Scoring Method is a variation of the Newton-Raphson algorithm in which the Hessian matrix (the matrix of second partial derivatives of the log-likelihood) is replaced by its expected value, the negative of the Fisher information matrix.
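To make the algorithm concrete, here is a minimal sketch of Fisher scoring for the logit link (where it coincides with Newton-Raphson, since the logit is the canonical link); the function name, tolerance, and iteration cap are illustrative:

```r
# X: n x p design matrix (including an intercept column); y: 0/1 responses
fisher_scoring <- function(X, y, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, ncol(X))                  # starting value
  for (iter in seq_len(max_iter)) {
    eta   <- drop(X %*% beta)              # linear predictor x_i'beta
    pi_i  <- 1 / (1 + exp(-eta))           # fitted probabilities
    w     <- pi_i * (1 - pi_i)             # Var(y_i) under the model
    score <- t(X) %*% (y - pi_i)           # score vector
    info  <- t(X) %*% (w * X)              # Fisher information I(beta)
    step  <- solve(info, score)
    beta  <- beta + drop(step)
    if (max(abs(step)) < tol) break        # declare convergence
  }
  list(coefficients = beta, fisher_info = info)
}

# e.g. fisher_scoring(cbind(1, x), y)$coefficients matches coef(fit) above
```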

For sufficiently large samples, \(\hat\beta\) is approximately normal with mean \(\beta\) and a variance-covariance matrix that can be estimated by the inverse of the Fisher information matrix evaluated at \(\hat\beta\), i.e. \(\hat\beta \;\dot\sim\; N\!\left(\beta,\, I^{-1}(\hat\beta)\right)\).
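In R, this estimated variance-covariance matrix of a fitted glm is returned by vcov(), so (continuing with fit from the sketch above) approximate standard errors are:

```r
V  <- vcov(fit)        # estimate of I^{-1}(beta-hat)
se <- sqrt(diag(V))    # approximate standard errors of the beta-hat_j
```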

The odds ratio: let \(\pi\) be the success probability at covariate vector \(x\), and let \(\tilde{\pi}\) be the success probability when the \(j\)th explanatory variable is increased by one unit with all other explanatory variables held constant. Then \(\frac{\tilde{\pi}/(1-\tilde{\pi})}{\pi/(1-\pi)} = \exp(\beta_j)\). Interpretation: a one-unit increase in the \(j\)th explanatory variable (all others held constant) is associated with a multiplicative change in the odds of success by the factor \(\exp(\beta_j)\).
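Continuing the sketch, the estimated odds ratios are simply the exponentiated coefficients:

```r
exp(coef(fit))   # estimated multiplicative change in odds per one-unit increase
```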

If \((L_j, U_j)\) is a \(100(1-\alpha)\%\) confidence interval for \(\beta_j\), then \((\exp(L_j), \exp(U_j))\) is a \(100(1-\alpha)\%\) confidence interval for \(\exp(\beta_j)\). Similarly, if \((L, U)\) is a \(100(1-\alpha)\%\) confidence interval for \(x'\beta\), then \(\left(\frac{1}{1+\exp(-L)},\frac{1}{1+\exp(-U)}\right)\) is a \(100(1-\alpha)\%\) confidence interval for \(\pi\).
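A sketch of both transformations in R, using Wald intervals from the fit above (the 95% level and the new covariate value x = 1 are illustrative):

```r
z  <- qnorm(0.975)                        # 95% Wald multiplier
se <- sqrt(diag(vcov(fit)))
ci_beta <- cbind(coef(fit) - z * se, coef(fit) + z * se)   # (L_j, U_j)
exp(ci_beta)                              # CI for exp(beta_j), the odds ratio

# CI for pi at a new x: interval for x'beta on the link scale, then invert
pr <- predict(fit, newdata = data.frame(x = 1), se.fit = TRUE)
L  <- pr$fit - z * pr$se.fit
U  <- pr$fit + z * pr$se.fit
c(1 / (1 + exp(-L)), 1 / (1 + exp(-U)))   # CI for pi at that x
```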