15 Marginal Effects

In cases without polynomials or interactions, it can be easy to interpret the marginal effect.

For example,

\[ Y = \beta_1 X_1 + \beta_2 X_2 \]

where \(\beta\) are the marginal effects.

Numerical derivation is easier than analytical derivation.

We need to choose values for all the variables to calculate the marginal effect of \(X\)

Analytical derivation

\[ f'(x) \equiv \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]

E.g., \(f(x) = X^2\)

\[ \begin{aligned} f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\ &= \frac{x^2 + 2xh + h^2 - x^2}{h} \\ &= \frac{2xh + h^2}{h} \\ &= 2x + h \\ &= 2x \end{aligned} \]

For numerically approach, we “just” need to find a small \(h\) to plug in our function. However, you also need a large enough \(h\) to have numerically accurate computation (Gould, Pitblado, and Poi 2010, chap. 1)

Numerically approach

One-sided derivative

\[ \begin{aligned} f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\ & \approx \frac{f(x+h) -f(x)}{h} \end{aligned} \]

Alternatively, two-sided derivative

\[ f'_2(x) \approx \frac{f(x+h) - f(x- h)}{2h} \]

Marginal effects for

discrete variables (also known as incremental effects) are the change in \(E[Y|X]\) for a one unit change in \(X\)
continuous variables are the change in \(E[Y|X]\) for very small changes in \(X\) (not unit changes), because it’s a derivative, which is a limit when \(h \to 0\)

	Analytical derivation	Numerical derivation
Marginal Effects	Rules of expectations	Approximate analytical solution
Standard Errors	Rules of variances	Delta method using the asymptotic errors (vcov matrix)

15.1 Delta Method

approximate the mean and variance of a function of random variables using a first-order Taylor approximation
A semi-parametric method
Alternative approaches:
- Analytically derive a probability function for the margin
- Simulation/Bootstrapping
Resources:
- Advanced: modmarg
- Intermediate: UCLA stat
- Simple: Another one

Let \(G(\beta)\) be a function of the parameters \(\beta\), then

\[ var(G(\beta)) \approx \nabla G(\beta) cov (\beta) \nabla G(\beta)' \]

where

\(\nabla G(\beta)\) = the gradient of the partial derivatives of \(G(\beta)\) (also known as the Jacobian)

15.2 Average Marginal Effect Algorithm

For one-sided derivative \(\frac{\partial p(\mathbf{X},\beta)}{\partial X}\) in the probability scale

Estimate your model
For each observation \(i\)
1. Calculate \(\hat{Y}_{i0}\) which is the prediction in the probability scale using observed values
2. Increase \(X\) (variable of interest) by a “small” amount \(h\) (\(X_{new} = X + h\))
  - When \(X\) is continuous, \(h = (|\bar{X}| + 0.001) \times 0.001\) where \(\bar{X}\) is the mean value of \(X\)
  - When \(X\) is discrete, \(h = 1\)
3. Calculate \(\hat{Y}_{i1}\) which is the prediction in the probability scale using new \(X\) and other variables’ observed vales.
4. Calculate the difference between the two predictions as fraction of \(h\): \(\frac{\bar{Y}_{i1} - \bar{Y}_{i0}}{h}\)
Average numerical derivative is \(E[\frac{\bar{Y}_{i1} - \bar{Y}_{i0}}{h}] \approx \frac{\partial p (Y|\mathbf{X}, \beta)}{\partial X}\)

Two-sided derivatives

Estimate your model
For each observation \(i\)
1. Calculate \(\hat{Y}_{i0}\) which is the prediction in the probability scale using observed values
2. Increase \(X\) (variable of interest) by a “small” amount \(h\) (\(X_{1} = X + h\)) and decrease \(X\) (variable of interest) by a “small” amount \(h\) (\(X_{2} = X - h\))
  - When \(X\) is continuous, \(h = (|\bar{X}| + 0.001) \times 0.001\) where \(\bar{X}\) is the mean value of \(X\)
  - When \(X\) is discrete, \(h = 1\)
3. Calculate \(\hat{Y}_{i1}\) which is the prediction in the probability scale using new \(X_1\) and other variables’ observed vales.
4. Calculate \(\hat{Y}_{i2}\) which is the prediction in the probability scale using new \(X_2\) and other variables’ observed vales.
5. Calculate the difference between the two predictions as fraction of \(h\): \(\frac{\bar{Y}_{i1} - \bar{Y}_{i2}}{2h}\)
Average numerical derivative is \(E[\frac{\bar{Y}_{i1} - \bar{Y}_{i2}}{2h}] \approx \frac{\partial p (Y|\mathbf{X}, \beta)}{\partial X}\)

library(margins)
library(tidyverse)

data("mtcars")
mod <- lm(mpg ~ cyl * disp * hp, data = mtcars)
margins::margins(mod) %>% summary()
#>  factor     AME     SE       z      p    lower   upper
#>     cyl -4.0592 3.7614 -1.0792 0.2805 -11.4313  3.3130
#>    disp -0.0350 0.0132 -2.6473 0.0081  -0.0610 -0.0091
#>      hp -0.0284 0.0185 -1.5348 0.1248  -0.0647  0.0079

# function for variable
get_mae <- function(mod, var = "disp") {
    data = mod$model
    
    pred <- predict(mod)
    
    if (class(mod$model[[{
        {
            var
        }
    }]]) == "numeric") {
        h = (abs(mean(mod$model[[var]])) + 0.01) * 0.01
    } else {
        h = 1
    }
    
    data[[{
        {
            var
        }
    }]] <- data[[{
{
var
}
}]] - h

    pred_new <- predict(mod, newdata = data)

    mean(pred - pred_new) / h
}

get_mae(mod, var = "disp")
#> [1] -0.03504546

15.3 Packages

15.3.1 MarginalEffects

MarginalEffects package is a successor of margins and emtrends (faster, more efficient, more adaptable). Hence, this is advocated to be used.

A limitation is that there is no readily function to correct for multiple comparisons. Hence, one can use the p.adjust function to overcome this disadvantage.

Definitions from the package:

Marginal effects are slopes or derivatives (i.e., effect of changes in a variable on the outcome)
- margins package defines marginal effects as “partial derivatives of the regression equation with respect to each variable in the model for each unit in the data.”
Marginal means are averages or integrals (i.e., marginalizing across rows of a prediction grid)

To customize your plot using plot_cme (which is a ggplot class), you can check this post by the author of the MarginalEffects package

Causal inference with the parametric g-formula

Because the plug-in g estimator is equivalent to the average contrast in the marginaleffects package.

To get predicted values

library(marginaleffects)
library(tidyverse)
data(mtcars)

mod <- lm(mpg ~ hp * wt * am, data = mtcars)
predictions(mod) %>% head()
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>      22.5      0.884 25.4   <0.001 471.7  20.8   24.2
#>      20.8      1.194 17.4   <0.001 223.3  18.5   23.1
#>      25.3      0.709 35.7   <0.001 922.7  23.9   26.7
#>      20.3      0.704 28.8   <0.001 601.5  18.9   21.6
#>      17.0      0.712 23.9   <0.001 416.2  15.6   18.4
#>      19.7      0.875 22.5   <0.001 368.8  17.9   21.4
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am
# for specific reference values
predictions(mod, newdata = datagrid(am = 0, wt = c(2, 4)))
#> 
#>  am wt Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %  hp
#>   0  2     22.0       2.04 10.8   <0.001  87.4  18.0   26.0 147
#>   0  4     16.6       1.08 15.3   <0.001 173.8  14.5   18.7 147
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, am, wt
plot_cap(mod, condition = c("hp","wt"))

# Average Margianl Effects
mfx <- marginaleffects(mod, variables = c("hp", "wt"))
summary(mfx)
#> 
#>  Term    Contrast Estimate Std. Error     z Pr(>|z|)   2.5 % 97.5 %
#>    hp mean(dY/dX)  -0.0381     0.0128 -2.98  0.00291 -0.0631 -0.013
#>    wt mean(dY/dX)  -3.9391     1.0858 -3.63  < 0.001 -6.0672 -1.811
#> 
#> Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

# Group-Average Marginal Effects
marginaleffects::marginaleffects(mod, by = "hp", variables = "am")
#> 
#>  Term          Contrast  hp Estimate Std. Error      z Pr(>|z|)   S  2.5 %
#>    am mean(1) - mean(0)  52    3.976       5.20  0.764    0.445 1.2  -6.22
#>    am mean(1) - mean(0)  62   -2.774       2.51 -1.107    0.268 1.9  -7.68
#>    am mean(1) - mean(0)  65    2.999       4.13  0.725    0.468 1.1  -5.10
#>    am mean(1) - mean(0)  66    2.025       3.48  0.582    0.561 0.8  -4.80
#>    am mean(1) - mean(0)  91    1.858       2.76  0.674    0.500 1.0  -3.54
#>    am mean(1) - mean(0)  93    1.201       2.35  0.511    0.609 0.7  -3.40
#>    am mean(1) - mean(0)  95   -1.832       1.97 -0.931    0.352 1.5  -5.69
#>    am mean(1) - mean(0)  97    0.708       2.04  0.347    0.728 0.5  -3.28
#>    am mean(1) - mean(0) 105   -2.682       2.37 -1.132    0.258 2.0  -7.32
#>    am mean(1) - mean(0) 109   -0.237       1.59 -0.149    0.881 0.2  -3.35
#>    am mean(1) - mean(0) 110   -0.640       1.57 -0.407    0.684 0.5  -3.73
#>    am mean(1) - mean(0) 113    4.081       3.94  1.037    0.300 1.7  -3.63
#>    am mean(1) - mean(0) 123   -2.098       2.10 -0.998    0.318 1.7  -6.22
#>    am mean(1) - mean(0) 150   -1.429       1.90 -0.753    0.452 1.1  -5.15
#>    am mean(1) - mean(0) 175   -0.416       1.56 -0.266    0.790 0.3  -3.48
#>    am mean(1) - mean(0) 180   -1.381       2.47 -0.560    0.576 0.8  -6.22
#>    am mean(1) - mean(0) 205   -2.873       6.24 -0.460    0.645 0.6 -15.11
#>    am mean(1) - mean(0) 215   -2.534       6.95 -0.364    0.716 0.5 -16.16
#>    am mean(1) - mean(0) 230   -1.477       7.07 -0.209    0.835 0.3 -15.34
#>    am mean(1) - mean(0) 245    1.115       2.28  0.488    0.625 0.7  -3.36
#>    am mean(1) - mean(0) 264    2.106       2.29  0.920    0.358 1.5  -2.38
#>    am mean(1) - mean(0) 335    4.027       3.24  1.243    0.214 2.2  -2.32
#>  97.5 %
#>   14.18
#>    2.14
#>   11.10
#>    8.85
#>    7.26
#>    5.80
#>    2.02
#>    4.70
#>    1.96
#>    2.87
#>    2.45
#>   11.79
#>    2.02
#>    2.29
#>    2.64
#>    3.46
#>    9.36
#>   11.09
#>   12.39
#>    5.59
#>    6.59
#>   10.38
#> 
#> Columns: term, contrast, hp, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted

# Marginal effects at representative values
marginaleffects::marginaleffects(mod, 
                                 newdata = datagrid(am = 0, 
                                                    wt = c(2, 4)))
#> 
#>  Term Contrast am wt Estimate Std. Error      z Pr(>|z|)   S   2.5 %   97.5 %
#>    am    1 - 0  0  2   2.5465     2.7860  0.914   0.3607 1.5 -2.9139  8.00694
#>    am    1 - 0  0  4  -2.9661     3.0381 -0.976   0.3289 1.6 -8.9207  2.98852
#>    hp    dY/dX  0  2  -0.0598     0.0283 -2.115   0.0344 4.9 -0.1153 -0.00439
#>    hp    dY/dX  0  4  -0.0309     0.0187 -1.654   0.0981 3.3 -0.0676  0.00572
#>    wt    dY/dX  0  2  -2.6762     1.4194 -1.885   0.0594 4.1 -5.4582  0.10587
#>    wt    dY/dX  0  4  -2.6762     1.4199 -1.885   0.0595 4.1 -5.4591  0.10676
#> 
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, wt, predicted_lo, predicted_hi, predicted, mpg, hp

# Marginal Effects at the Mean
marginaleffects::marginaleffects(mod, newdata = "mean")
#> 
#>  Term Contrast Estimate Std. Error      z Pr(>|z|)    S  2.5 %  97.5 %
#>    am    1 - 0  -0.8086    1.52383 -0.531  0.59568  0.7 -3.795  2.1781
#>    hp    dY/dX  -0.0323    0.00956 -3.375  < 0.001 10.4 -0.051 -0.0135
#>    wt    dY/dX  -3.7959    1.21310 -3.129  0.00175  9.2 -6.174 -1.4183
#> 
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am

# counterfactual
comparisons(mod, variables = list(am = 0:1)) %>% summary()
#> 
#>  Term          Contrast Estimate Std. Error      z Pr(>|z|) 2.5 % 97.5 %
#>    am mean(1) - mean(0)  -0.0481       1.85 -0.026    0.979 -3.68   3.58
#> 
#> Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

15.3.2 margins

Marginal effects are partial derivative of the regression equation with respect to each variable in the model for each unit in the data

Average Partial Effects: the contribution of each variable the outcome scale, conditional on the other variables involved in the link function transformation of the linear predictor
Average Marginal Effects: the marginal contribution of each variable on the scale of the linear predictor.
Average marginal effects are the mean of these unit-specific partial derivatives over some sample

margins package gives the marginal effects of models (a replication of the margins command in Stata).

prediction package gives the unit-specific and sample average predictions of models (similar to the predictive margins in Stata).

library(margins)

# examples by the package's authors
mod <- lm(mpg ~ cyl * hp + wt, data = mtcars)
summary(mod)
#> 
#> Call:
#> lm(formula = mpg ~ cyl * hp + wt, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.3440 -1.4144 -0.6166  1.2160  4.2815 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 52.017520   4.916935  10.579 4.18e-11 ***
#> cyl         -2.742125   0.800228  -3.427  0.00197 ** 
#> hp          -0.163594   0.052122  -3.139  0.00408 ** 
#> wt          -3.119815   0.661322  -4.718 6.51e-05 ***
#> cyl:hp       0.018954   0.006645   2.852  0.00823 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.242 on 27 degrees of freedom
#> Multiple R-squared:  0.8795, Adjusted R-squared:  0.8616 
#> F-statistic: 49.25 on 4 and 27 DF,  p-value: 5.065e-12

In cases where you have interaction or polynomial terms, the coefficient estimates cannot be interpreted as the marginal effects of X on Y. Hence, if you want to know the average marginal effects of each variable then

summary(margins(mod))
#>  factor     AME     SE       z      p   lower   upper
#>     cyl  0.0381 0.5999  0.0636 0.9493 -1.1376  1.2139
#>      hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
#>      wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236

# equivalently 
margins_summary(mod)
#>  factor     AME     SE       z      p   lower   upper
#>     cyl  0.0381 0.5999  0.0636 0.9493 -1.1376  1.2139
#>      hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
#>      wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236

plot(margins(mod))

Marginal effects at the mean (MEM):

Marginal effects at the mean values of the sample
For discrete variables, it’s called average discrete change (ADC)

Average Marginal Effect (AME)

An average of the marginal effects at each value of the sample

Marginal Effects at representative values (MER)

margins(mod, at = list(hp = 150))
#>  at(hp)    cyl       hp    wt
#>     150 0.1009 -0.04632 -3.12

margins(mod, at = list(hp = 150)) %>% summary()
#>  factor       hp     AME     SE       z      p   lower   upper
#>     cyl 150.0000  0.1009 0.6128  0.1647 0.8692 -1.1001  1.3019
#>      hp 150.0000 -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
#>      wt 150.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236

15.3.3 mfx

Works well with Generalized Linear Models/glm package

For technical details, see the package vignette

Model	Dependent Variable	Syntax
Probit	Binary	`probitmfx`
Logit	Binary	`logitmfx`
Poisson	Count	`poissonmfx`
Negative Binomial	Count	`negbinmfx`
Beta	Rate	`betamfx`

library(mfx)
data("mtcars")
poissonmfx(formula = vs ~ mpg * cyl * disp, data = mtcars)
#> Call:
#> poissonmfx(formula = vs ~ mpg * cyl * disp, data = mtcars)
#> 
#> Marginal Effects:
#>                    dF/dx   Std. Err.       z  P>|z|
#> mpg           1.4722e-03  8.7531e-03  0.1682 0.8664
#> cyl           6.6420e-03  3.9263e-02  0.1692 0.8657
#> disp          1.5899e-04  9.4555e-04  0.1681 0.8665
#> mpg:cyl      -3.4698e-04  2.0564e-03 -0.1687 0.8660
#> mpg:disp     -7.6794e-06  4.5545e-05 -0.1686 0.8661
#> cyl:disp     -3.3837e-05  1.9919e-04 -0.1699 0.8651
#> mpg:cyl:disp  1.6812e-06  9.8919e-06  0.1700 0.8650

This package can only give the marginal effect for each variable in the glm model, but not the average marginal effect that we might look for.

14 Hypothesis Testing

16 Prediction and Estimation