Chapter 2 Background and Motivation

2.1 The Problem

Loss to follow-up (LTFU) is ubiquitous in observational studies. To the extent that participants who are lost to follow-up differ systematically from those who remain in the sample at the end of the study period, complete-case estimates of the causal effect may be biased (Howe et al. 2016; Westreich et al. 2012). Asymptotically unbiased causal effect estimates can be obtained using a range of imputation methods if the probability that an outcome is missing depends only on observed data (i.e., when outcome data are missing at random [MAR]). However, it is unlikely that MAR mechanisms operate alone to produce missing outcome data. In reality, they may operate alongside mechanisms under which data are missing completely at random (MCAR) or missing not at random (MNAR). The task at hand is to identify which mechanism matters most in producing the LTFU that may bias causal effect estimates. This leads to an even more fundamental challenge: how to assess the bias introduced by MNAR mechanisms when missingness is driven by unmeasured variables that lie outside the observed data. While methods for handling outcome data that are MAR rely on the observed data alone, methods that account for outcome data that are MNAR must rely on additional external assumptions about the relationship between outcome missingness and unmeasured variables.
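The distinction between the three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from any cited study: the data-generating model, the retention probabilities, and all names (`A` for exposure, `L` for an observed covariate, `U` for an unmeasured variable, `naive_estimate`, `adjusted_estimate`) are assumptions chosen for the example. A complete-case regression adjusted for `L` stands in for methods that rely only on observed data.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
TRUE_EFFECT = 1.0  # assumed true causal effect of A on Y

# Hypothetical data-generating model (an assumption for illustration).
A = rng.binomial(1, 0.5, n)   # randomized binary exposure
L = rng.normal(size=n)        # observed covariate
U = rng.normal(size=n)        # unmeasured variable
Y = TRUE_EFFECT * A + L + U + rng.normal(size=n)


def naive_estimate(observed):
    """Complete-case difference in mean outcome between exposure groups."""
    return Y[observed & (A == 1)].mean() - Y[observed & (A == 0)].mean()


def adjusted_estimate(observed):
    """Complete-case least-squares coefficient on A, adjusting for L."""
    X = np.column_stack([np.ones(observed.sum()), A[observed], L[observed]])
    coef, *_ = np.linalg.lstsq(X, Y[observed], rcond=None)
    return coef[1]


# Probability of remaining in the study under each assumed mechanism.
retain_prob = {
    "MCAR": np.full(n, 0.7),           # unrelated to any variable
    "MAR": 0.9 - 0.6 * A * (L > 0),    # depends on observed A and L only
    "MNAR": 0.9 - 0.6 * A * (U > 0),   # depends on the unmeasured U
}

results = {}
for name, p in retain_prob.items():
    observed = rng.random(n) < p
    results[name] = (naive_estimate(observed), adjusted_estimate(observed))
    print(f"{name}: naive={results[name][0]:.2f}, adjusted={results[name][1]:.2f}")
```

Under these assumptions, both estimates recover the true effect under MCAR, adjusting for `L` removes the bias under MAR, and neither estimate recovers the true effect under MNAR, because the variable driving dropout is not in the observed data.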

2.2 Why did we write this tutorial?

For the reasons described above, sensitivity analyses are essential for assessing bias in causal effect estimates in the presence of LTFU. In particular, Stef van Buuren outlines three situations in which researchers should be concerned that missing data are non-ignorable, that is, MNAR (van Buuren et al. 2018):

1. If important variables that govern the missing data process are not available;

2. If there is reason to believe that responders differ from non-responders, even after accounting for the observed information;

3. If the data are truncated.

Methods to identify and account for MCAR and MAR mechanisms are relatively straightforward and widely implemented (van Buuren et al. 2018; Hayati et al. 2015). In contrast, methods to assess the threat posed by MNAR mechanisms are not widely applied. This is understandable, given that modeling MNAR data requires assumptions about mechanisms that cannot be observed, but it is problematic if MNAR is the most important mechanism producing the missing data, even when MAR mechanisms are also present. A practical guide to sensitivity analysis for bias in causal effect estimates in the presence of non-ignorable missing data is greatly needed in epidemiology. Here, we focus on one method that can be used to assess the potential bias caused by outcome data missing due to LTFU: pattern-mixture modeling.