6.21 Log-binomial regression to estimate a risk ratio or prevalence ratio

Logistic regression is a special case of a family of models known as generalized linear models. Each member of this family has an assumed distribution for the outcome and a link function that connects the mean outcome to a linear combination of predictors $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K$ (the linear predictor). In logistic regression, the outcome is assumed to have a binomial distribution and the link function is the logit function $\ln(p/(1-p))$. Linear regression is also a special case, with a normal distribution and an identity link function (the mean is assumed to be equal to the linear predictor).

Another special case of a generalized linear model is the log-binomial regression model which, like logistic regression, assumes a binomial distribution for a binary outcome but, unlike logistic regression, uses a log link function as shown in Equation (6.2).

$\begin{equation} \ln{p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K \tag{6.2} \end{equation}$

With logistic regression, the left-hand side is the log of the odds, whereas in log-binomial regression it is the log of the probability. Exponentiating a regression coefficient in logistic regression results in an odds ratio. Similarly, exponentiating a regression coefficient in log-binomial regression results in a risk ratio (RR) or prevalence ratio (PR): for a predictor $X_k$, the RR or PR is $e^{\beta_k}$. The model described by Equation (6.2) can be used to estimate an RR from incidence data or a PR from prevalence data.
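The reason exponentiating gives a ratio of probabilities can be seen in a short derivation. The symbols $p_0$ and $p_1$ are introduced here only for illustration: let $p_0$ be the probability of the outcome when $X_k = x$ and $p_1$ the probability when $X_k = x + 1$, with all other predictors held fixed. Evaluating Equation (6.2) at the two values and subtracting, every term except $\beta_k$ cancels:

$\begin{equation*} \ln{p_1} - \ln{p_0} = \beta_k \quad \Longrightarrow \quad \frac{p_1}{p_0} = e^{\beta_k} \end{equation*}$

The same steps applied to the logit link leave a ratio of odds rather than a ratio of probabilities, which is why logistic regression yields an odds ratio.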

A disadvantage of log-binomial regression is that the left-hand side $(\ln{p})$ is constrained to be non-positive (since $p \le 1$), while the right-hand side can be anything from $-\infty$ to $\infty$. Nothing in the model prevents the linear predictor from being positive, which would imply a probability greater than 1, and this leads to convergence issues at times (Williamson, Eliasziw, and Fick 2013). One method for fitting a log-binomial model is to use glm() with family = binomial(link = "log"). Alternatively, use the logbin() function in the logbin package (Donoghoe and Marschner 2018), which may converge even in cases where glm() fails.
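As a minimal sketch of the glm() approach, the following fits a log-binomial model to simulated data. The variable names (male, y), the sample size, and the true prevalences are assumptions made up for illustration; this is not the nsduh data.

```r
# Simulated data (hypothetical; not the nsduh data)
set.seed(2024)
n    <- 2000
male <- rbinom(n, size = 1, prob = 0.5)
# Assumed true prevalence of 0.45 among females and a true PR of 1.2 for males
p    <- exp(log(0.45) + log(1.2) * male)
y    <- rbinom(n, size = 1, prob = p)
sim  <- data.frame(y = y, male = male)

# Log-binomial regression: binomial family with a log link
fit.glm <- glm(y ~ male, family = binomial(link = "log"), data = sim)

# The exponentiated slope is the estimated PR
PR.glm <- exp(coef(fit.glm))[["male"]]

# With a single binary predictor, the estimated PR should match
# the ratio of the two sample prevalences
PR.raw <- mean(sim$y[sim$male == 1]) / mean(sim$y[sim$male == 0])
c(PR.glm = PR.glm, PR.raw = PR.raw)
```

With one binary predictor the model is saturated, so convergence problems are unlikely here; they tend to arise with continuous predictors or many covariates, when fitted probabilities approach 1.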

Example 6.2 (continued): Logistic regression estimated an OR of 1.44 comparing lifetime marijuana use between males and females. Use log-binomial regression to compute the corresponding prevalence ratio.

# Fit the log-binomial model using logbin()'s EM method
library(logbin)
fit.ex6.2.logbin <- logbin(mj_lifetime ~ demog_sex,
                           data = nsduh,
                           method = "em")

# Summary of model
round(summary(fit.ex6.2.logbin)$coef, 4)

##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -0.7629     0.0463 -16.479   0.0000
## demog_sexMale   0.1794     0.0620   2.894   0.0038

# PR, and 95% CI for PR
PR.CI <- cbind("PR" = exp(coef(fit.ex6.2.logbin)),
                      exp(confint(fit.ex6.2.logbin)))[-1,]
round(PR.CI, 3)

##     PR  2.5 % 97.5 % 
##  1.197  1.060  1.351

Although not needed for this example, if the predictor were categorical with more than two levels, you could obtain a Type III multiple df test as usual.

# Type III test
car::Anova(fit.ex6.2.logbin, type = 3, test.statistic = "Wald")

Conclusion: Males are 1.20 times as likely as females to have ever used marijuana (PR = 1.20; 95% CI = 1.06, 1.35; p = .004).

In the interpretation, we used the phrase “times as likely” rather than “times the odds” because log-binomial regression models the log of the probability, not the log-odds. We could also say that the prevalence of marijuana use is 20% greater among males. If this were incidence data, we could say that males have 20% greater risk. To compute an adjusted RR or PR, simply add the confounding variables to the model formula.
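As a check on the numbers in this example, the fitted prevalences, the PR, and the implied OR can be recovered from the two coefficients reported in the model summary. This is a sketch that assumes the logistic model behind the OR of 1.44 used the same data and the same single predictor; the object names (b0, b1, p.female, p.male) are introduced here for illustration.

```r
# Coefficients as reported in the model summary (rounded to 4 decimals)
b0 <- -0.7629   # (Intercept): log prevalence among females
b1 <-  0.1794   # demog_sexMale: log prevalence ratio, males vs. females

# Fitted prevalences: exponentiate the linear predictor (log link)
p.female <- exp(b0)
p.male   <- exp(b0 + b1)

# Prevalence ratio: identical to exp(b1)
PR <- p.male / p.female

# Odds ratio implied by the same two prevalences
OR <- (p.male / (1 - p.male)) / (p.female / (1 - p.female))

round(c(p.female = p.female, p.male = p.male, PR = PR, OR = OR), 2)
```

The PR comes out at about 1.20 and the implied OR at about 1.44. The two differ noticeably because the outcome is common (roughly half of each group); the OR approximates the PR only when the outcome is rare.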

NOTES:

If you use predict() or gmodels::estimable() to estimate a probability from a log-binomial model, use exp() rather than ilogit() when transforming the prediction to the probability scale.
logbin() does not allow interaction terms using the : notation. If glm() with family = binomial(link = "log") converges, then that is the simplest way to include an interaction since it does allow the : notation. To include an interaction with logbin(), you must create variables corresponding to the interaction terms outside the model and then include those variables in the model (see Section 9.6.4.2 for an example, from a different context, of how to do this).
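The two notes above can be sketched on simulated data. Everything here is an assumption for illustration: the variable names (x1, x2, x1_x2, y), the true coefficients, and the use of glm() for both fits so that the example runs without the logbin package. The manually created variable x1_x2 is the kind of term you would pass to logbin() in place of x1:x2; this is not necessarily the approach used in Section 9.6.4.2.

```r
# Simulated data with two binary predictors (hypothetical)
set.seed(42)
n   <- 4000
x1  <- rbinom(n, size = 1, prob = 0.5)
x2  <- rbinom(n, size = 1, prob = 0.5)
p   <- exp(log(0.20) + log(1.5) * x1 + log(1.3) * x2 + log(1.2) * x1 * x2)
y   <- rbinom(n, size = 1, prob = p)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Interaction using the : notation (allowed by glm())
fit.colon <- glm(y ~ x1 + x2 + x1:x2,
                 family = binomial(link = "log"), data = sim)

# Interaction using a variable created outside the model
sim$x1_x2  <- sim$x1 * sim$x2
fit.manual <- glm(y ~ x1 + x2 + x1_x2,
                  family = binomial(link = "log"), data = sim)

# The two parameterizations give the same coefficients
cbind(colon = unname(coef(fit.colon)), manual = unname(coef(fit.manual)))

# Predicted probability: a link-scale prediction is a log probability,
# so transform with exp() (not an inverse logit)
newdat <- data.frame(x1 = 1, x2 = 1, x1_x2 = 1)
p.link <- exp(predict(fit.manual, newdata = newdat, type = "link"))
p.resp <- predict(fit.manual, newdata = newdat, type = "response")
c(p.link = unname(p.link), p.resp = unname(p.resp))
```

When predicting from a model with a manually created interaction variable, the new data must supply a value for that variable that is consistent with the main-effect values, as newdat does here.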

References

Donoghoe, Mark W., and Ian C. Marschner. 2018. “logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model.” Journal of Statistical Software 86 (9): 1–22. https://doi.org/10.18637/jss.v086.i09.

Williamson, Tyler, Misha Eliasziw, and Gordon Hilton Fick. 2013. “Log-Binomial Models: Exploring Failed Convergence.” Emerging Themes in Epidemiology 10 (1): 14. https://doi.org/10.1186/1742-7622-10-14.