Chapter 5 Generalized linear models

As we saw in Chapter 2, linear regression assumes that the response variable \(Y\) is such that

\[\begin{align*} Y|(X_1=x_1,\ldots,X_p=x_p)\sim \mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_px_p,\sigma^2) \end{align*}\]

and hence

\[\begin{align*} \mathbb{E}[Y|X_1=x_1,\ldots,X_p=x_p]=\beta_0+\beta_1x_1+\cdots+\beta_px_p. \end{align*}\]

This, in particular, implies that \(Y\) is continuous. In this chapter we will see how generalized linear models can deal with other kinds of distributions for \(Y|(X_1=x_1,\ldots,X_p=x_p),\) particularly with discrete responses, by modeling a transformation of the conditional expectation. The simplest generalized linear model is logistic regression, which arises when \(Y\) is a binary response, that is, a variable encoding two categories as \(0\) and \(1.\) This model would be useful, for example, to predict \(Y\) given \(X\) from the sample \(\{(X_i,Y_i)\}_{i=1}^n\) in Figure 5.1.
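
To fix ideas, the following is a minimal sketch of how a sample like the one in Figure 5.1 could be simulated. The sample size, the distribution of \(X,\) and the coefficients \(\beta_0=-1\) and \(\beta_1=2\) are hypothetical choices for illustration, not the ones behind the figure; Python is used here, which may differ from the language employed elsewhere in these notes.

```python
# Simulate a binary response from a logistic regression model:
# P(Y = 1 | X = x) = 1 / (1 + exp(-(beta0 + beta1 * x))).
import numpy as np

rng = np.random.default_rng(42)
n = 100

# Single predictor X, drawn uniformly (an illustrative choice)
x = rng.uniform(-3, 3, size=n)

# Hypothetical coefficients
beta0, beta1 = -1.0, 2.0
p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))

# Binary response: Y | X = x ~ Bernoulli(p(x))
y = rng.binomial(n=1, p=p)

print(np.column_stack((x, y))[:5])  # first five (X_i, Y_i) pairs
```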

Figure 5.1: Scatterplot of a sample \(\{(X_i,Y_i)\}_{i=1}^n\) drawn from a logistic regression model.
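
As a preview of how such a sample can be used to predict \(Y\) given \(X,\) the sketch below fits a logistic regression to simulated data by the GLM machinery developed in this chapter. It reuses the hypothetical coefficients \((-1, 2)\) from the previous sketch and relies on the statsmodels package, which is an assumption about the tooling rather than the approach used in these notes.

```python
# Fit a logistic regression (a binomial GLM) to a simulated binary sample
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=100)
p = 1 / (1 + np.exp(-(-1.0 + 2.0 * x)))   # same hypothetical coefficients
y = rng.binomial(n=1, p=p)

X = sm.add_constant(x)                    # design matrix with intercept column
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)                         # estimates should be close to (-1, 2)
```

The estimated coefficients recover the values used in the simulation, up to sampling variability, which is the sense in which the model "learns" \(\mathbb{P}(Y=1|X=x)\) from the sample.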