Chapter 5 Generalized linear models

As we saw in Chapter 2, linear regression assumes that the response variable Y is such that

Y|(X1=x1,,Xp=xp)N(β0+β1x1++βpxp,σ2)

and hence

E[Y|X1=x1,,Xp=xp]=β0+β1x1++βpxp.

This, in particular, implies that Y is continuous. In this chapter we will see how generalized linear models can deal with other kinds of distributions for Y|(X1=x1,,Xp=xp), particularly with discrete responses, by modeling the transformed conditional expectation. The simplest generalized linear model is logistic regression, which arises when Y is a binary response, that is, a variable encoding two categories with 0 and 1. This model would be useful, for example, to predict Y given X from the sample {(Xi,Yi)}ni=1 in Figure 5.1.

Scatterplot of a sample \(\{(X_i,Y_i)\}_{i=1}^n\) sampled from a logistic regression.

Figure 5.1: Scatterplot of a sample {(Xi,Yi)}ni=1 sampled from a logistic regression.