4.4 Assumptions of the model

Some probabilistic assumptions are required for performing inference on the model parameters \(\boldsymbol\beta\) from the sample \((\mathbf{X}_1, Y_1),\ldots,(\mathbf{X}_n, Y_n)\). These assumptions are somewhat simpler than the ones for linear regression.

Figure 4.7: The key concepts of the logistic model.

The assumptions of the logistic model are the following:

  1. Linearity in the logit[31]: \(\mathrm{logit}(p(\mathbf{x}))=\log\frac{ p(\mathbf{x})}{1-p(\mathbf{x})}=\beta_0+\beta_1x_1+\ldots+\beta_kx_k\).
  2. Binariness: \(Y_1,\ldots,Y_n\) are binary variables.
  3. Independence: \(Y_1,\ldots,Y_n\) are independent.
A good one-line summary of the logistic model is the following (independence is assumed) \[\begin{align} Y|(X_1=x_1,\ldots,X_k=x_k)&\sim\mathrm{Ber}\left(\mathrm{logistic}(\beta_0+\beta_1x_1+\ldots+\beta_kx_k)\right)\nonumber\\ &=\mathrm{Ber}\left(\frac{1}{1+e^{-(\beta_0+\beta_1x_1+\ldots+\beta_kx_k)}}\right).\tag{4.9} \end{align}\]
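
The summary (4.9) can be read as a recipe for generating data. The following is a minimal sketch of that reading, assuming NumPy is available; the function names, the coefficient values, and the choice of normally distributed predictors are hypothetical and only serve the illustration.

```python
import numpy as np


def logistic(eta):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))


def simulate_logistic_model(X, beta, rng):
    """Draw independent Y_i ~ Ber(logistic(beta_0 + beta_1 x_i1 + ... + beta_k x_ik)).

    X has shape (n, k); beta has shape (k + 1,), intercept first.
    """
    eta = beta[0] + X @ beta[1:]   # linear predictor (linearity in the logit)
    p = logistic(eta)              # success probabilities p(x_i)
    y = rng.binomial(n=1, p=p)     # independent Bernoulli draws (binariness, independence)
    return y, p


rng = np.random.default_rng(42)
n = 1000
beta = np.array([-0.5, 1.0, 2.0])  # hypothetical coefficients (beta_0, beta_1, beta_2)
X = rng.normal(size=(n, 2))        # hypothetical predictors; the model does not require normality
y, p = simulate_logistic_model(X, beta, rng)
```

Each of the three assumptions appears as one line of `simulate_logistic_model`: the linear predictor encodes linearity in the logit, and the Bernoulli draws are binary and independent given the predictors.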

Three important elements of the linear model assumptions are absent from those of the logistic model:

  • Why is homoscedasticity not required? As seen in the previous section, a Bernoulli variable is completely determined by its probability of success, in this case \(p(\mathbf{x})\). That probability also determines the variance, \(p(\mathbf{x})(1-p(\mathbf{x}))\), which changes with \(\mathbf{x}\), so the model is heteroscedastic by construction. In the linear model, we have to control \(\sigma^2\) explicitly due to the higher flexibility of the normal distribution.
  • Where are the errors? The errors played a fundamental role in the linear model assumptions, but they are not employed in logistic regression. Errors are not essential for building the linear model; they are just a helpful concept related to least squares. The linear model can be constructed without errors, as in (3.5), which has a logistic analogue in (4.9).
  • Why is normality not present? A normal distribution is not adequate to replace the Bernoulli distribution in (4.9): the response \(Y\) has to be binary, and the normal or any other continuous distribution would yield illegal values for \(Y\).
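
The point about homoscedasticity can be checked numerically. The sketch below assumes a one-predictor logistic model with hypothetical coefficients `beta0` and `beta1`, and evaluates the Bernoulli variance \(p(x)(1-p(x))\) on a small grid of \(x\) values.

```python
import numpy as np


def logistic(eta):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-eta))


beta0, beta1 = 0.0, 1.0                    # hypothetical coefficients
x = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])  # hypothetical grid of predictor values
p = logistic(beta0 + beta1 * x)            # p(x) under the logistic model
var = p * (1.0 - p)                        # variance of a Ber(p(x)) variable

for xi, pi, vi in zip(x, p, var):
    print(f"x = {xi:+.1f}   p(x) = {pi:.3f}   Var[Y | X = x] = {vi:.3f}")
```

The variance is largest where \(p(x)=1/2\) and shrinks as \(p(x)\) approaches 0 or 1, so there is no single \(\sigma^2\) to assume or estimate.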

Recall that:

  • Nothing is said about the distribution of \(X_1,\ldots,X_k\). They could be deterministic or random. They could be discrete or continuous.
  • \(X_1,\ldots,X_k\) are not required to be independent of each other.

  [31] An equivalent way of stating the linearity-in-the-logit assumption is \(p(\mathbf{x})=\mathrm{logistic}(\beta_0+\beta_1x_1+\ldots+\beta_kx_k)\).