17.6 Types of Marginal Effect
17.6.1 Average Marginal Effect
The Average Marginal Effect (AME) measures the expected change in the predicted probability when an independent variable increases by a small amount while holding all other variables constant. Unlike marginal effects at the mean (MEM), AMEs average the marginal effects across all observations, providing a more representative measure.
Applications of AMEs
- Marketing: How much does increasing ad spend change the probability of a customer purchase?
- Finance: How does an interest rate change impact the probability of loan approval?
- Econometrics: What is the effect of education on the probability of employment?
Since nonlinear models like logit and probit do not have constant marginal effects, AMEs are typically computed by numerical differentiation. There are two common approaches:
- One-Sided Numerical Derivative: Uses a small forward step h to estimate the derivative.
- Two-Sided Numerical Derivative: Takes both a forward step and a backward step to improve accuracy.
17.6.1.1 One-Sided Numerical Derivative
To estimate $\frac{\partial p(X, \beta)}{\partial X}$ numerically:
Algorithm
- Estimate the model using logistic (or probit) regression.
- For each observation $i$:
  - Compute the predicted probability using the observed data: $\hat{Y}_{i0} = p(X_i, \hat{\beta})$.
  - Increase $X$ by a small step $h$, where:
    - If $X$ is continuous, choose $h = (|\bar{X}| + 0.001) \times 0.001$.
    - If $X$ is discrete, set $h = 1$.
  - Compute the new predicted probability: $\hat{Y}_{i1} = p(X_i + h, \hat{\beta})$.
  - Compute the numerical derivative: $\frac{\hat{Y}_{i1} - \hat{Y}_{i0}}{h}$.
- Average across all observations: $E\left[ \frac{\hat{Y}_{i1} - \hat{Y}_{i0}}{h} \right] \approx \frac{\partial p(Y|X, \beta)}{\partial X}$.
# Load necessary packages
library(margins)
library(sandwich)
library(lmtest)
# Simulate data
set.seed(123)
n <- 100
X <- rnorm(n)
Y <- rbinom(n, 1, plogis(0.5 + 0.8 * X)) # Logistic function
# Logistic regression
logit_model <- glm(Y ~ X, family = binomial(link = "logit"))
# Define step size h for continuous variable
X_mean <- mean(X)
h <- (abs(X_mean) + 0.001) * 0.001
# Compute predicted probabilities at original X
pred_Y0 <- predict(logit_model, type = "response")
# Compute predicted probabilities at X + h
X_new <- X + h
data_new <- data.frame(X = X_new)
pred_Y1 <-
predict(logit_model, newdata = data_new, type = "response")
# Compute marginal effects
marginal_effects <- (pred_Y1 - pred_Y0) / h
# Compute Average Marginal Effect (AME)
AME_one_sided <- mean(marginal_effects)
# Display results
data.frame(Method = "One-Sided AME", Estimate = AME_one_sided)
#> Method Estimate
#> 1 One-Sided AME 0.1921614
The AME is the average effect of $X$ on the probability of $Y = 1$. Since logistic regression is nonlinear, this effect varies across observations, so the AME summarizes them with a single number. The method assumes that a small $h$ gives a good approximation to the derivative.
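Because the one-sided estimate depends on the step size, it is worth checking that the result is stable across a range of $h$ values. A minimal sketch, reusing `logit_model`, `X`, and `pred_Y0` from above (the grid of step sizes is an arbitrary choice for illustration):
# Check sensitivity of the one-sided AME to the step size h
h_grid <- c(1e-2, 1e-3, 1e-4, 1e-5)
ame_by_h <- sapply(h_grid, function(h) {
  pred_h <- predict(logit_model,
                    newdata = data.frame(X = X + h),
                    type = "response")
  mean((pred_h - pred_Y0) / h)
})
data.frame(h = h_grid, AME = ame_by_h)
If the estimates barely move as $h$ shrinks, the approximation is adequate.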
17.6.1.2 Two-Sided Numerical Derivative
To improve accuracy, we use the two-sided derivative:
Algorithm
- Estimate the model using logistic (or probit) regression.
- For each observation $i$:
  - Compute the original predicted probability: $\hat{Y}_{i0} = p(X_i, \hat{\beta})$.
  - Compute the new predicted probabilities:
    - Increase $X$ by $h$: $\hat{Y}_{i1} = p(X_i + h, \hat{\beta})$.
    - Decrease $X$ by $h$: $\hat{Y}_{i2} = p(X_i - h, \hat{\beta})$.
  - Compute the numerical derivative: $\frac{\hat{Y}_{i1} - \hat{Y}_{i2}}{2h}$.
- Average across all observations: $E\left[ \frac{\hat{Y}_{i1} - \hat{Y}_{i2}}{2h} \right] \approx \frac{\partial p(Y|X, \beta)}{\partial X}$.
# Compute predicted probabilities at X - h
X_new_minus <- X - h
data_new_minus <- data.frame(X = X_new_minus)
pred_Y2 <-
predict(logit_model, newdata = data_new_minus, type = "response")
# Compute two-sided marginal effects
marginal_effects_2sided <- (pred_Y1 - pred_Y2) / (2 * h)
# Compute Average Marginal Effect (AME) - Two-Sided
AME_two_sided <- mean(marginal_effects_2sided)
# Display results
data.frame(Method = "Two-Sided AME", Estimate = AME_two_sided)
#> Method Estimate
#> 1 Two-Sided AME 0.1921633
Comparison of One-Sided vs. Two-Sided AME
Method | Accuracy | Computational Cost | Bias |
---|---|---|---|
One-Sided | Lower | Faster | Higher |
Two-Sided | Higher | Slightly Slower | Lower |
One-sided AME is computationally simpler but can introduce bias.
Two-sided AME reduces bias but requires two function evaluations per observation.
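For the logit model there is also a closed-form benchmark: since $\partial p / \partial X = \beta_1 \, p(X)(1 - p(X))$, the AME can be computed analytically and compared with both numerical versions. A short check in base R, reusing `logit_model` (packages such as `margins` automate this computation):
# Analytic AME for a logit model: average beta * p * (1 - p) over observations
p_hat <- predict(logit_model, type = "response")
AME_analytic <- mean(coef(logit_model)["X"] * p_hat * (1 - p_hat))
# All three estimates should agree closely
data.frame(
  Method = c("One-Sided AME", "Two-Sided AME", "Analytic AME"),
  Estimate = c(AME_one_sided, AME_two_sided, AME_analytic)
)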
17.6.2 Marginal Effects at the Mean
Marginal effects in nonlinear models are not constant, as they depend on the values of independent variables. One way to summarize them is by computing Marginal Effects at the Mean (MEM), which estimates marginal effects at the average values of the independent variables.
MEM is commonly used in:
- Econometrics: Evaluating the effect of education on wages at the average level of experience.
- Finance: Assessing the impact of credit scores on loan approval probability for a typical applicant.
- Marketing: Estimating the effect of price on purchase probability for an average customer.
Unlike the Average Marginal Effect, which averages marginal effects over all observations, MEM computes the effect at a single point—the mean of the explanatory variables.
Let $p(X, \beta)$ be the predicted probability in a nonlinear model (e.g., logistic regression). The MEM is computed as:
$$\frac{\partial p(\bar{X}, \beta)}{\partial X}$$
where $\bar{X}$ is the vector of mean values of all explanatory variables.
For a logistic regression model:
$$E[Y|X] = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
the MEM for a continuous variable $X$ is:
$$\left. \frac{\partial E[Y|X]}{\partial X} \right|_{X = \bar{X}} = \beta_1 \cdot p(\bar{X}) \cdot \left( 1 - p(\bar{X}) \right).$$
MEM vs. AME
Method | Description | Pros | Cons |
---|---|---|---|
MEM | Marginal effect computed at the mean values of predictors. | Easy to interpret; based on a single reference point. | Not representative if the data are skewed or effects are highly nonlinear. |
AME | Average of marginal effects across all observations. | More generalizable, considers full distribution of data. | Computationally more expensive. |
Step 1: Estimate the Model
# Load necessary packages
library(margins)
library(sandwich)
library(lmtest)
# Simulate data
set.seed(123)
n <- 100
X <- rnorm(n)
Y <- rbinom(n, 1, plogis(0.5 + 0.8 * X)) # Logistic function
# Logistic regression
logit_model <- glm(Y ~ X, family = binomial(link = "logit"))
Step 2: Compute Marginal Effects at the Mean
# Compute mean of X
X_mean <- mean(X)
# Compute predicted probability at mean X
p_mean <-
predict(logit_model,
newdata = data.frame(X = X_mean),
type = "response")
# Compute MEM for X
MEM <- coef(logit_model)["X"] * p_mean * (1 - p_mean)
# Display result
data.frame(Method = "MEM", Estimate = MEM)
#> Method Estimate
#> X MEM 0.2146628
Interpretation
- The MEM tells us the effect of $X$ on the probability of $Y = 1$ at the mean value of $X$.
- It provides a simple interpretation but may not capture the variability in marginal effects across different values of $X$.
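As a sanity check, the analytic MEM can be compared with the two-sided numerical derivative from the previous subsection, evaluated at $\bar{X}$ rather than averaged over observations. A minimal sketch, reusing `logit_model` and `X_mean` (the step-size rule follows the one used earlier):
# Two-sided numerical derivative evaluated at the mean of X
h <- (abs(X_mean) + 0.001) * 0.001
p_plus <- predict(logit_model,
                  newdata = data.frame(X = X_mean + h),
                  type = "response")
p_minus <- predict(logit_model,
                   newdata = data.frame(X = X_mean - h),
                   type = "response")
MEM_numeric <- (p_plus - p_minus) / (2 * h)
# Should closely match the analytic MEM computed above
c(analytic = unname(MEM), numeric = unname(MEM_numeric))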
A third approach is Marginal Effects at Representative Values (MER), where we calculate marginal effects at specific percentiles (e.g., median, quartiles). Below is a comparison:
Method | Description | Pros | Cons |
---|---|---|---|
MEM | Marginal effect at mean values of predictors. | Simple to compute; useful if data is symmetric. | Not informative if the mean is not representative. |
AME | Average of marginal effects over all observations. | More generalizable. | More computationally expensive. |
MER | Marginal effect at specific values (e.g., median, percentiles). | Captures effects at different levels of X. | Requires choosing relevant reference values. |
When to Use Each Method
- Use MEM when you need a quick, interpretable summary for an “average” individual.
- Use AME when marginal effects vary widely across individuals.
- Use MER when you need to understand effects at specific values of interest, as in the sketch below.
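MER has not been shown in code yet, so here is a minimal sketch computing marginal effects of $X$ at its quartiles, reusing `logit_model` from above (the quartiles are our illustrative choice of representative values):
# Marginal effects at representative values (MER): quartiles of X
X_rep <- quantile(X, probs = c(0.25, 0.50, 0.75))
p_rep <- predict(logit_model,
                 newdata = data.frame(X = X_rep),
                 type = "response")
MER <- coef(logit_model)["X"] * p_rep * (1 - p_rep)
data.frame(Quantile = names(X_rep),
           X = as.numeric(X_rep),
           MER = as.numeric(MER))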
17.6.3 Marginal Effects at the Average
Marginal effects summarize how an independent variable influences the probability of an outcome in nonlinear models (e.g., logistic regression). We have already discussed:
- Marginal Effects at the Mean (MEM): marginal effects computed at the mean values of the independent variables.
- Average Marginal Effects (AME): the mean of marginal effects computed at each observation.
A third approach is Marginal Effects at the Average (MAE), where we first average the independent variables across all observations and then compute the marginal effect at that single averaged observation.
Let $p(X, \beta)$ be the probability function of a model (e.g., a logistic regression). The Marginal Effect at the Average is computed as:
$$\frac{\partial p(\bar{X}, \beta)}{\partial X}$$
where $\bar{X}$ is the vector of averaged independent variables across all observations.
Key Differences Between AME and MAE
- AME answers a general question: “How does X affect Y across the entire dataset?”
- MAE answers a more specific question: “How does X affect Y for a typical (average) person in our dataset?”
MAE is particularly relevant when we want a single, interpretable effect for a representative individual.
Use Cases for MAE
- Policy & Business Decision-Making: If policymakers or business leaders want to know the effect of a tax increase on a “typical” consumer, MAE gives an effect for a single representative individual.
- Marketing Campaigns: If a marketing team wants to know how much increasing ad spend affects the purchase probability of an “average” customer, MAE provides this insight.
- Simplified Reporting: AMEs vary across individuals, which can make reporting complex. MAE condenses everything into one easy-to-interpret number.
Comparison: MAE vs. MEM vs. AME
Method | Definition | Pros | Cons |
---|---|---|---|
MEM | Compute marginal effects at the mean values of X. | Simple and interpretable. | Mean values may not represent actual observations. |
AME | Compute marginal effects for each observation, then take the average. | More robust, accounts for variability. | Computationally more expensive. |
MAE | Compute probability at averaged X values, then compute the marginal effect. | Accounts for interactions better than MEM. | Less commonly used, can be misleading if X values are skewed. |
Intuition Behind MAE
- Instead of computing individual marginal effects (as in AME), MAE computes the marginal effect for a single averaged observation.
- This method is somewhat similar to MEM, but instead of taking the mean of each independent variable separately, it first computes a single averaged observation and then derives the marginal effect at that observation.
Step 1: Estimate the Model
# Load necessary packages
library(margins)
# Simulate data
set.seed(123)
n <- 100
X1 <- rnorm(n) # Continuous variable
X2 <- rbinom(n, 1, 0.5) # Binary variable
# Logistic function
Y <-
rbinom(n, 1, plogis(0.5 + 0.8 * X1 - 0.5 * X2))
# Logistic regression
logit_model <- glm(Y ~ X1 + X2, family = binomial(link = "logit"))
Step 2: Compute MAE
# Compute the average of independent variables
X_mean <- data.frame(X1 = mean(X1), X2 = mean(X2))
# Compute predicted probability at averaged X
p_mean <- predict(logit_model, newdata = X_mean, type = "response")
# Compute MAE for X1
MAE_X1 <- coef(logit_model)["X1"] * p_mean * (1 - p_mean)
# Compute MAE for X2 (a linear approximation; X2 is binary, see the note below)
MAE_X2 <- coef(logit_model)["X2"] * p_mean * (1 - p_mean)
# Display results
data.frame(
Method = "MAE",
Variable = c("X1", "X2"),
Estimate = c(MAE_X1, MAE_X2)
)
#> Method Variable Estimate
#> X1 MAE X1 0.20280618
#> X2 MAE X2 -0.06286593
- The MAE for X1 represents the change in the probability of $Y = 1$ for a small increase in X1, evaluated at the average values of X1 and X2.
- The MAE for X2 is only a linear approximation: because X2 is binary, the quantity of interest is the discrete change in probability when switching from X2 = 0 to X2 = 1, holding X1 at its average, as computed below.
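A minimal sketch of that discrete change, reusing `logit_model` and holding `X1` at its mean (consistent with the MAE setup above):
# Discrete change for binary X2: P(Y=1 | X2=1) - P(Y=1 | X2=0), with X1 at its mean
p_X2_0 <- predict(logit_model,
                  newdata = data.frame(X1 = mean(X1), X2 = 0),
                  type = "response")
p_X2_1 <- predict(logit_model,
                  newdata = data.frame(X1 = mean(X1), X2 = 1),
                  type = "response")
p_X2_1 - p_X2_0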
Method | Computes Marginal Effect... | Accounts for Variability? | Best for... |
---|---|---|---|
MEM | At the mean of each independent variable. | No | Quick interpretation at a reference point, when individual means are meaningful (e.g., symmetric data). |
AME | At each observation, then averages. | Yes | Generalizable results. |
MAE | At a single averaged observation. | No | A single interpretable summary, especially when interactions exist. |
When to Use MAE
- When you want a single-number summary that reflects a realistic scenario.
- When there are interaction effects, and you want to account for the joint impact of predictors.
However, if predictor distributions are skewed, AME is usually preferred.
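To make the comparison concrete, here is a short sketch computing AME and MAE for `X1` from the two-predictor model above, using the analytic logit derivative $\beta_1 \, p(1 - p)$ (note that with no interaction terms in the model, MEM and MAE coincide):
# Compare AME and MAE for X1
p_i <- predict(logit_model, type = "response") # one probability per observation
AME_X1 <- mean(coef(logit_model)["X1"] * p_i * (1 - p_i))
data.frame(Method = c("AME", "MAE"),
           Estimate = c(AME_X1, unname(MAE_X1)))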