B.2 Logistic regression
B.2.1 Model formulation
When the response $Y$ can take two values only, codified for convenience as $1$ (success) and $0$ (failure), $Y$ is called a binary variable. A binary variable, known also as a Bernoulli variable, is a $\mathrm{Ber}(p)$ random variable.292
If $Y$ is a binary variable and $X_1,\ldots,X_k$ are predictors associated with $Y$, the purpose in logistic regression is to estimate

$$p(x_1,\ldots,x_k):=\mathbb{P}(Y=1\,|\,X_1=x_1,\ldots,X_k=x_k),\tag{B.7}$$
that is, how the probability of $Y=1$ changes according to particular values, denoted by $x_1,\ldots,x_k$, of the predictors $X_1,\ldots,X_k$. A tempting possibility is to consider a linear model for (B.7), $p(x_1,\ldots,x_k)=\beta_0+\beta_1x_1+\cdots+\beta_kx_k$. However, such a model will inevitably run into serious problems: negative probabilities and probabilities greater than one will arise.
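As a quick illustration (with simulated data, not taken from the text), fitting a plain linear model to a binary response readily produces “probabilities” outside of $[0,1]$:

# Simulated binary data and a (misguided) linear fit for the probability
set.seed(1)
x0 <- rnorm(100)
y0 <- rbinom(100, size = 1, prob = plogis(2 * x0))
lin <- lm(y0 ~ x0)
predict(lin, newdata = data.frame(x0 = c(-3, 3)))  # Predictions escape [0, 1]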
A solution is to consider a function that encapsulates the value of $p(x_1,\ldots,x_k)$, in $[0,1]$, and maps it back to $\mathbb{R}$. There are several alternatives to do so, based on distribution functions $F:\mathbb{R}\longrightarrow[0,1]$ that deliver $p(x_1,\ldots,x_k)=F(\beta_0+\beta_1x_1+\cdots+\beta_kx_k)$. Different choices of $F$ give rise to different models, the most common one being the logistic distribution function:

$$\mathrm{logistic}(t):=\frac{e^t}{1+e^t}=\frac{1}{1+e^{-t}}.$$
Its inverse, known as the logit function, is

$$\mathrm{logit}(p):=\mathrm{logistic}^{-1}(p)=\log\frac{p}{1-p}.$$
This is a link function, that is, a function that maps a given space (in this case $[0,1]$) onto $\mathbb{R}$. The term link function is used in generalized linear models, which follow exactly the same philosophy as logistic regression – mapping the domain of the response to $\mathbb{R}$ in order to apply there a linear model. As said above, different link functions are possible, but we will concentrate here exclusively on the logit link.
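Both functions are immediate to code in R; the following sketch (with illustrative evaluation points) defines them manually and checks the inverse relation. They coincide with base R’s plogis() and qlogis() under their default arguments.

# The logistic and logit functions, defined manually
logistic <- function(t) 1 / (1 + exp(-t))
logit <- function(p) log(p / (1 - p))
# logit undoes logistic: the difference below should be numerically zero
t_grid <- seq(-3, 3, by = 0.5)
max(abs(logit(logistic(t_grid)) - t_grid))
# logistic maps the real line onto (0, 1)
curve(logistic(x), from = -6, to = 6, xlab = "t", ylab = "logistic(t)")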
The logistic model is defined as the following parametric form for (B.7):

$$p(x_1,\ldots,x_k)=\mathrm{logistic}(\beta_0+\beta_1x_1+\cdots+\beta_kx_k)=\frac{1}{1+e^{-(\beta_0+\beta_1x_1+\cdots+\beta_kx_k)}}.\tag{B.8}$$
The linear form inside the exponent, $\beta_0+\beta_1x_1+\cdots+\beta_kx_k$, has a clear interpretation:
- If $\beta_0+\beta_1x_1+\cdots+\beta_kx_k=0$, then $p(x_1,\ldots,x_k)=\frac{1}{2}$ ($Y=1$ and $Y=0$ are equally likely).
- If $\beta_0+\beta_1x_1+\cdots+\beta_kx_k<0$, then $p(x_1,\ldots,x_k)<\frac{1}{2}$ ($Y=1$ is less likely than $Y=0$).
- If $\beta_0+\beta_1x_1+\cdots+\beta_kx_k>0$, then $p(x_1,\ldots,x_k)>\frac{1}{2}$ ($Y=1$ is more likely than $Y=0$).
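These three cases can be checked numerically with base R’s plogis(), the logistic distribution function; the evaluation points $-2$ and $2$ below are merely illustrative.

# Sign of the linear predictor vs. the probability of Y = 1
plogis(0)    # Exactly 0.5: both outcomes equally likely
plogis(-2)   # Below 0.5: Y = 1 less likely
plogis(2)    # Above 0.5: Y = 1 more likely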
To be more precise on the interpretation of the coefficients $\beta_0,\beta_1,\ldots,\beta_k$, we need to introduce the concept of odds. The odds is an equivalent form of expressing the distribution of probabilities in a binary variable. Since $\mathbb{P}(Y=1)=p$ and $\mathbb{P}(Y=0)=1-p$, both the success and failure probabilities can be inferred from $p$. Instead of using $p$ to characterize the distribution of $Y$, we can use

$$\mathrm{odds}(Y):=\frac{p}{1-p}=\frac{\mathbb{P}(Y=1)}{\mathbb{P}(Y=0)}.\tag{B.9}$$
The odds is the ratio between the probability of success and the probability of failure. It is extensively used in betting due to its better interpretability.293 Conversely, if the odds of $Y$ is given, we can easily obtain the probability of success using the inverse294 of (B.9):

$$p=\frac{\mathrm{odds}(Y)}{1+\mathrm{odds}(Y)}.$$
Remark. Recall that the odds is a number in $[0,+\infty]$. The $0$ and $+\infty$ values are attained for $p=0$ and $p=1$, respectively. The log-odds (or logit) is a number in $[-\infty,+\infty]$.
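The transformation between probabilities and odds is straightforward to code; the snippet below uses $p=2/3$ (the horse example of the footnote) and an ad hoc helper, odds_to_p(), defined here only for illustration.

# Probability to odds and back
p <- 2 / 3
odds <- p / (1 - p)
odds              # 2: winning is twice as likely as losing
odds_to_p <- function(odds) odds / (1 + odds)
odds_to_p(odds)   # Recovers p = 2 / 3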
We can rewrite (B.8) in terms of the odds (B.9). If we do so, then

$$\mathrm{odds}(Y\,|\,X_1=x_1,\ldots,X_k=x_k)=\frac{p(x_1,\ldots,x_k)}{1-p(x_1,\ldots,x_k)}=e^{\beta_0+\beta_1x_1+\cdots+\beta_kx_k}=e^{\beta_0}e^{\beta_1x_1}\cdots e^{\beta_kx_k}.$$
This provides the following interpretation of the coefficients:
- $e^{\beta_0}$: is the odds of $Y=1$ when $x_1=\cdots=x_k=0$.
- $e^{\beta_j}$, $1\leq j\leq k$: is the multiplicative increment of the odds for an increment of one unit in $x_j$, provided that the remaining variables do not change. If the increment in $x_j$ is of $r$ units, then the multiplicative increment in the odds is $(e^{\beta_j})^r$.
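As a small numerical sketch of these interpretations (the coefficient values below are made up, not taken from any fit):

# Hypothetical coefficient values, for illustration only
beta0 <- -0.5
beta1 <- 0.7
exp(beta0)      # Odds of Y = 1 when x1 = 0
exp(beta1)      # Multiplicative change in the odds per one-unit increase in x1
exp(beta1)^3    # Multiplicative change for an increase of r = 3 units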
B.2.2 Model assumptions and estimation
Some probabilistic assumptions are required to perform inference on the model parameters $\boldsymbol{\beta}=(\beta_0,\beta_1,\ldots,\beta_k)$ from a sample $\{(X_{i1},\ldots,X_{ik},Y_i)\}_{i=1}^n$. These assumptions are somewhat simpler than the ones for linear regression.

Figure B.4: The key concepts of the logistic model. The blue bars represent the conditional probability distribution of $Y$ for each cut in the $x$ axis. The red points represent a sample following the model.
The assumptions of the logistic model are the following:
- Linearity in the logit:295 $\mathrm{logit}(p(x_1,\ldots,x_k))=\log\frac{p(x_1,\ldots,x_k)}{1-p(x_1,\ldots,x_k)}=\beta_0+\beta_1x_1+\cdots+\beta_kx_k$.
- Binariness: $Y_1,\ldots,Y_n$ are binary variables.
- Independence: $Y_1,\ldots,Y_n$ are independent.
A good one-line summary of the logistic model is the following (independence is assumed):

$$Y_i\,|\,(X_{i1}=x_{i1},\ldots,X_{ik}=x_{ik})\sim\mathrm{Ber}\big(\mathrm{logistic}(\beta_0+\beta_1x_{i1}+\cdots+\beta_kx_{ik})\big),\quad i=1,\ldots,n.$$
Since $Y_i\sim\mathrm{Ber}(p_i)$ with $p_i:=\mathrm{logistic}(\beta_0+\beta_1x_{i1}+\cdots+\beta_kx_{ik})$, the log-likelihood of $\boldsymbol{\beta}=(\beta_0,\beta_1,\ldots,\beta_k)$ is

$$\ell(\boldsymbol{\beta})=\sum_{i=1}^n\left[Y_i\log(p_i)+(1-Y_i)\log(1-p_i)\right].$$

The maximum likelihood estimate is $\hat{\boldsymbol{\beta}}:=\arg\max_{\boldsymbol{\beta}}\ell(\boldsymbol{\beta})$.
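For concreteness, the log-likelihood of a single-predictor logistic model can be coded directly; the sketch below is illustrative and its argument names are not part of the text.

# Log-likelihood of a single-predictor logistic model
# beta = c(beta0, beta1); x and y stand for a generic sample
log_lik <- function(beta, x, y) {
  p <- plogis(beta[1] + beta[2] * x)  # p_i = logistic(beta0 + beta1 * x_i)
  sum(y * log(p) + (1 - y) * log(1 - p))
}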
Unfortunately, due to the nonlinearity of the optimization problem, there are no explicit expressions for $\hat{\boldsymbol{\beta}}$. The estimates have to be obtained numerically by means of an iterative procedure, which may run into problems in low-sample-size situations with perfect classification. Unlike in the linear model, inference is not exact from the assumptions, but rather approximate in terms of maximum likelihood theory. We do not explore this further and refer the interested reader to, e.g., Section 5.3 in García-Portugués (2025).
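The perfect-classification issue can be reproduced with a small sketch on made-up, perfectly separated data; in such cases glm() typically warns about fitted probabilities numerically 0 or 1 and the slope estimate becomes arbitrarily large.

# Perfectly separated (made-up) data: all successes lie to the right of all
# failures, so the likelihood increases without bound and the MLE does not exist
x_sep <- c(-2, -1.5, -1, 1, 1.5, 2)
y_sep <- c(0, 0, 0, 1, 1, 1)
mod_sep <- glm(y_sep ~ x_sep, family = "binomial")
coef(mod_sep)  # Huge slope: the iterative procedure is chasing infinity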
Figure B.5: The logistic regression fit and its dependence on $\beta_0$ (horizontal displacement) and $\beta_1$ (steepness of the curve). Recall the effect of the sign of $\beta_1$ on the curve: if positive, the logistic curve has an “s” form; if negative, the form is a reflected “s”. Application available here.
Figure B.5 shows how the log-likelihood changes with respect to the values of $(\beta_0,\beta_1)$ for three data patterns. The data of the illustration have been generated with the following code.
# Data
set.seed(34567)
x <- rnorm(50, sd = 1.5)
y1 <- -0.5 + 3 * x
y2 <- 0.5 - 2 * x
y3 <- -2 + 5 * x
y1 <- rbinom(50, size = 1, prob = 1 / (1 + exp(-y1)))
y2 <- rbinom(50, size = 1, prob = 1 / (1 + exp(-y2)))
y3 <- rbinom(50, size = 1, prob = 1 / (1 + exp(-y3)))
# Gather the data in a single data frame
data_mle <- data.frame(x = x, y1 = y1, y2 = y2, y3 = y3)
Let’s check that the coefficients given by R’s glm are indeed the ones that maximize the likelihood shown in the animation of Figure B.5. We do so for y1 ~ x.
# Call glm
# glm employs formula = response ~ predictor1 + predictor2 + ...
# (names according to the data frame names) for denoting the regression
# to be done. We need to specify family = "binomial" to make a
# logistic regression
mod <- glm(y1 ~ x, family = "binomial", data = data_mle)
summary(mod)
##
## Call:
## glm(formula = y1 ~ x, family = "binomial", data = data_mle)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.47853 -0.40139 0.02097 0.38880 2.12362
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1692 0.4725 -0.358 0.720274
## x 2.4282 0.6599 3.679 0.000234 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 69.315 on 49 degrees of freedom
## Residual deviance: 29.588 on 48 degrees of freedom
## AIC: 33.588
##
## Number of Fisher Scoring iterations: 6
# mod is a list with a lot of information
# str(mod) # Long output
# Coefficients
mod$coefficients
## (Intercept) x
## -0.1691947 2.4281626
# Plot the fitted regression curve
x_grid <- seq(-5, 5, l = 200)
y_grid <- 1 / (1 + exp(-(mod$coefficients[1] + mod$coefficients[2] * x_grid)))
plot(x_grid, y_grid, type = "l", col = 2, xlab = "x", ylab = "y")
points(x, y1)
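As a complementary check (a sketch, not part of the original analysis), the log-likelihood can be maximized numerically with optim() and compared with glm()’s estimates; the starting values below are arbitrary.

# Negative log-likelihood of (beta0, beta1) for the y1 ~ x data
nll <- function(beta) {
  p <- plogis(beta[1] + beta[2] * data_mle$x)
  -sum(data_mle$y1 * log(p) + (1 - data_mle$y1) * log(1 - p))
}
# Numerical minimization (Nelder-Mead by default) from an arbitrary start
opt <- optim(par = c(0, 0), fn = nll)
opt$par           # Should essentially match the glm estimates
mod$coefficients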
References
292. Recall that a Bernoulli variable $Y\sim\mathrm{Ber}(p)$ is such that $\mathbb{P}(Y=1)=p$ and $\mathbb{P}(Y=0)=1-p$.
293. For example, if a horse has a probability of winning a race of $p=2/3$, then the odds of the horse is $\mathrm{odds}=\frac{p}{1-p}=\frac{2/3}{1/3}=2$. This means that the horse has a probability of winning that is twice as large as the probability of losing. This is sometimes written as a $2:1$ or $2\times1$ (spelled “two-to-one”).
294. For example, if the odds of the horse were $o$, that would correspond to a probability of winning $p=\frac{o}{1+o}$.
295. An equivalent way of stating this assumption is $p(x_1,\ldots,x_k)=\mathrm{logistic}(\beta_0+\beta_1x_1+\cdots+\beta_kx_k)$.