Chapter 5 Methods
5.1 Modeling framework
We propose a combination of data-driven and Bayesian approaches to modeling the impact of MNAR mechanisms on the causal effect of interest. Figure 2 details the inputs, processes, and outputs of this applied pattern-mixture modeling technique when combined with multiple imputation.
To use the code supplied in this practical guide, users first load their data, from which the programs calculate the proportion of individuals missing outcome data. We then propose four analyses, one for each of the four scenarios described above. The first is a complete case analysis, which assumes that data are MCAR. The second is a multiple imputation (MI) analysis that imputes the missing outcome data based on observed predictors of missingness (exposure and/or covariates). The third and fourth analyses build upon this multiple imputation step. Since the third scenario assumes that the only MNAR mechanism is the outcome predicting missingness, the only parameter varied after the MI step is the strength of the association between the outcome and an indicator variable of LTFU. The fourth scenario is more complex because an unmeasured confounder (\(U\)) is assumed to cause LTFU. Thus, three parameters must be varied after the MI step: the associations between \(U\) and \(LTFU\), \(U\) and exposure, and \(U\) and outcome.
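The difference in the number of parameters varied in the third and fourth analyses can be sketched as follows. This is only an illustration of how the sensitivity grids might be laid out: the parameter names and all numeric values are hypothetical placeholders, not values used in this guide.

```python
from itertools import product

# Hypothetical placeholder values; real grids should come from subject-matter knowledge.
outcome_ltfu_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]   # scenario 3: outcome -> LTFU
u_ltfu_grid = [0.5, 1.0]                          # scenario 4: U -> LTFU
u_exposure_grid = [0.5, 1.0]                      # scenario 4: U -> exposure
u_outcome_grid = [0.5, 1.0]                       # scenario 4: U -> outcome

# Scenario 3 varies a single parameter after the MI step.
scenario3_runs = [{"outcome_ltfu": value} for value in outcome_ltfu_grid]

# Scenario 4 varies three parameters, so every combination is one run.
scenario4_runs = [
    {"u_ltfu": ul, "u_exposure": ue, "u_outcome": uo}
    for ul, ue, uo in product(u_ltfu_grid, u_exposure_grid, u_outcome_grid)
]

print(len(scenario3_runs), "runs for scenario 3")
print(len(scenario4_runs), "runs for scenario 4")
```

Because the scenario 4 grid is the product of three lists, the number of runs grows quickly as values are added to any one of them.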
From each of the four analyses, we obtain an estimate of the causal effect of the exposure on the outcome, conditional on covariates, along with its 95% confidence interval and the outcome prevalence. Uncertainty estimates from scenarios 2-4 will reflect the additional variability introduced by MI. The results for scenarios 3 and 4 differ from the rest in that we report a range of possible causal effect estimates under different assumptions about missingness in our post-MI pattern-mixture model.
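The additional variability introduced by MI can be made explicit with the standard form of Rubin's rules. The notation here is an assumption for illustration and is not taken from the guide: \(m\) is the number of imputed datasets, \(\hat{Q}_i\) is the effect estimate from the \(i\)-th imputed dataset, and \(\hat{V}_i\) is its estimated variance.

\[
\begin{aligned}
\bar{Q} &= \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i, &
W &= \frac{1}{m} \sum_{i=1}^{m} \hat{V}_i, \\
B &= \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2, &
T &= W + \left( 1 + \frac{1}{m} \right) B.
\end{aligned}
\]

Here \(W\) is the within-imputation variance, \(B\) is the between-imputation variance, and \(T\) is the total variance of the pooled estimate \(\bar{Q}\). The term \(\left(1 + \frac{1}{m}\right) B\) is the extra uncertainty that a complete case analysis does not carry.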
5.2 Assumptions
The proposed technique and modeling framework require several assumptions, although some assumptions could be modified or removed if this technique were extended in future applications. In our worked example, we assume that:
Our exposure is continuous
Our outcome is continuous
We have only one unmeasured confounder
Missingness is only due to LTFU
Exposure and covariate data are complete
For MI, we assume:
Causes of the outcome do not differ between those LTFU and those not LTFU
For pattern-mixture modeling, we assume:
Our DAG is correct, i.e., there are no other causes of LTFU
Our original MI model was correctly specified
5.3 Application of pattern-mixture modeling
As shown in Figure 2, we first implement multiple imputation to handle missing outcome data under the MAR assumption. We then use a sensitivity parameter, either an additive offset \(δ\) or a multiplicative scale factor \(c\), to specify how the distribution of the missing outcome values (\(Y_{miss}\)) differs from the conditional distribution of the observed data (\(Y_{obs}\)), and we modify the imputed values accordingly: by adding \(δ\) or by multiplying by \(c\). Repeating this over a range of parameter values produces multiple versions of the MAR-imputed dataset, each reflecting a plausible MNAR scenario. We analyze each resulting dataset as one would a usual multiply imputed dataset, fitting the analysis model to each imputed dataset and combining the results using Rubin’s rules.
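A minimal sketch of this adjust-then-pool workflow is given below, using only the Python standard library. It is not the code supplied with this guide. Several simplifications are assumptions made for brevity: the data are simulated, the analysis model is a simple linear regression without covariates, the imputation step is a crude regression-plus-noise stand-in for a proper MI procedure, and the function names (`shift_imputed`, `fit_line`, `pool_rubin`) and the \(δ\) values are hypothetical.

```python
import random
import statistics

def shift_imputed(y_imputed, is_missing, delta=0.0, c=1.0):
    """Apply c * y + delta to imputed values only; observed values are untouched."""
    return [c * y + delta if miss else y for y, miss in zip(y_imputed, is_missing)]

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, var_b, resid_sd)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid_var = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, resid_var / sxx, resid_var ** 0.5

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules; returns (estimate, total variance)."""
    m = len(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    return statistics.fmean(estimates), within + (1 + 1 / m) * between

# Simulated toy data (assumption: not the example dataset used in this guide).
rng = random.Random(0)
n, m = 300, 5
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
is_missing = [rng.random() < 0.15 for _ in range(n)]

# Crude stand-in for the MI step: predict from the observed-data fit and add noise.
x_obs = [xi for xi, miss in zip(x, is_missing) if not miss]
y_obs = [yi for yi, miss in zip(y, is_missing) if not miss]
a_obs, b_obs, _, sd_obs = fit_line(x_obs, y_obs)
imputed_sets = [
    [a_obs + b_obs * xi + rng.gauss(0, sd_obs) if miss else yi
     for xi, yi, miss in zip(x, y, is_missing)]
    for _ in range(m)
]

# For each hypothetical delta: adjust the imputed values, refit, and pool.
results = {}
for delta in (0.0, 1.0, 2.0):
    fits = [fit_line(x, shift_imputed(y_imp, is_missing, delta=delta))
            for y_imp in imputed_sets]
    estimate, total_var = pool_rubin([f[1] for f in fits], [f[2] for f in fits])
    results[delta] = (estimate, total_var ** 0.5)
    print(f"delta={delta}: pooled slope={estimate:.3f}, pooled SE={results[delta][1]:.3f}")
```

Setting `delta=0.0` and `c=1.0` leaves the imputed values unchanged and so reproduces the MAR analysis, which makes it a useful reference point when reading the range of estimates.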
5.4 Example dataset
Data for this tutorial are based on the U.S. Health and Retirement Study (HRS), a nationally representative, longitudinal panel study of U.S. residents ≥50 years of age and their spouses. As of the 2016 data release, the HRS included data collected from 42,515 individuals in 26,600 households. For the present analysis, we use data from a subset of participants who provided blood samples in the 2016 wave of data collection.
We are interested in the relationship between total household wealth (measured in $10,000s) and biological-age advancement (a measure of accelerated aging, denominated in years).
To construct our example dataset, we restrict the sample to participants with complete exposure, outcome, and covariate data. We then randomly remove outcome data for 15% of these participants. Note: This step is needed only to construct a dataset for demonstrating sensitivity analyses for missing data mechanisms. Researchers will generally begin their analyses with a dataset that already has missing outcome data.
Exposure: Total household wealth ($10,000s)
Outcome: Biological-age advancement (years)
Measured covariates: age, sex, race
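The construction of the example dataset can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the dataset: the records are simulated rather than drawn from the HRS, and the function name `remove_outcomes_at_random` and the field names `wealth_10k`, `bio_age_adv`, `age`, `sex`, and `race` are hypothetical.

```python
import random

def remove_outcomes_at_random(rows, outcome_key, fraction, seed=0):
    """Return copies of the rows with the outcome set to None for a random subset."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(rows))
    missing_idx = set(rng.sample(range(len(rows)), n_missing))
    result = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the complete data stay intact
        if i in missing_idx:
            new_row[outcome_key] = None
        result.append(new_row)
    return result

# Simulated complete-case records (made-up values, not HRS data).
rng = random.Random(1)
complete = [
    {
        "wealth_10k": rng.uniform(0, 100),   # exposure, in $10,000s
        "bio_age_adv": rng.gauss(0, 5),      # outcome, in years
        "age": rng.randint(50, 90),
        "sex": rng.choice(["F", "M"]),
        "race": rng.choice(["A", "B", "C"]),
    }
    for _ in range(200)
]

# Remove the outcome for 15% of records, leaving exposure and covariates complete.
with_missing = remove_outcomes_at_random(complete, "bio_age_adv", 0.15, seed=42)
n_missing = sum(row["bio_age_adv"] is None for row in with_missing)
print(f"{n_missing} of {len(with_missing)} outcomes removed")
```

Because the rows to blank out are chosen independently of every variable, the missingness imposed here is MCAR by construction.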