Chapter 6 Discussion

We have proposed and developed new strategies and ad hoc measures for dose–response meta-analysis, including tools for evaluating the goodness-of-fit, a new measure for quantifying the impact of heterogeneity, a strategy to deal with differences in the exposure range across studies, and a one-stage approach to estimate complex models without excluding relevant studies. The developed methodologies have been implemented in user-friendly R packages freely available on CRAN. Code for reproducing the results of this thesis and of the corresponding papers can be found on my website https://alecri.github.io/software and on GitHub at https://github.com/alecri.

6.1 Goodness-of-fit

An evaluation of the goodness-of-fit should be a natural step in a dose–response meta-analysis, and in Paper II we discussed how to carry it out. Flexible parametric curves are estimated in order to summarize and represent the aggregated data in a synthetic format. It is therefore important to check whether the fitted meta-analytical model actually provides an adequate description of the data at hand.

In practice, the goodness-of-fit is usually evaluated by measuring the degree of agreement between the fitted and observed data. We have presented and discussed three tools (deviance, coefficient of determination, and decorrelated residuals-versus-exposure plot) specifically designed for assessing the goodness-of-fit in meta-analysis of aggregated dose–response data. In particular, the deviance can be employed for testing whether the chosen meta-analytic model is properly specified, while the \(R^2\) can be useful for quantifying, from a descriptive point of view, the proportion of variability accounted for by the dose–response model. The fit of the dose–response analysis can be visually checked by inspecting the scatter plot of the decorrelated residuals versus the quantitative exposure.
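As a rough illustration of how the three tools fit together, the following Python sketch computes them for a fixed-effect linear trend. It is a minimal sketch under stated assumptions, not the implementation used in Paper II or in the R packages: the two studies and their covariance matrices are hypothetical, the model is a simple no-intercept linear trend, and the \(R^2\) uses the usual no-intercept definition on the decorrelated scale, which may differ in detail from the definition in Paper II.

```python
import math

def cholesky(a):
    """Lower-triangular factor L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L, i.e. z = L^{-1} b."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def goodness_of_fit(studies):
    """Deviance, R^2 and decorrelated residuals for the model logRR = beta * dose."""
    xs, ys = [], []  # decorrelated doses and log relative risks, stacked over studies
    for s in studies:
        L = cholesky(s["cov"])
        xs += forward_solve(L, s["dose"])
        ys += forward_solve(L, s["logrr"])
    # Least squares on the decorrelated scale equals GLS on the original scale.
    beta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    resid = [y - beta * x for x, y in zip(xs, ys)]
    deviance = sum(e * e for e in resid)
    doses = [d for s in studies for d in s["dose"]]
    return {
        "beta": beta,
        "deviance": deviance,
        "df": len(ys) - 1,  # data points minus one estimated coefficient
        "r2": 1.0 - deviance / sum(y * y for y in ys),  # assumed no-intercept R^2
        "residuals": list(zip(doses, resid)),  # points for the residual plot
    }

# Hypothetical aggregated data: non-referent doses, log RRs, and their covariances.
studies = [
    {"dose": [1.0, 2.0, 3.0], "logrr": [0.10, 0.22, 0.28],
     "cov": [[0.010, 0.004, 0.004], [0.004, 0.012, 0.004], [0.004, 0.004, 0.015]]},
    {"dose": [0.5, 1.5], "logrr": [0.02, 0.18],
     "cov": [[0.020, 0.008], [0.008, 0.025]]},
]

fit = goodness_of_fit(studies)
print(fit["beta"], fit["deviance"], fit["df"], fit["r2"])
```

The deviance would be compared with a chi-square distribution on `df` degrees of freedom, and the `(dose, residual)` pairs are what one would plot to look for systematic patterns.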

The practical examples in Paper II and Section 5.2 illustrated the use of the proposed tools in evaluating the fit of the candidate dose–response models. In particular, we have shown how they can be useful for identifying specific dose–response patterns, investigating possible sources of heterogeneity, and generally evaluating if the combined dose–response association can be an adequate summary of the observed data. Implementation of the proposed tools in applied works can strengthen the results or, on the contrary, raise doubts about the ability of the selected model in summarizing the available evidence.

As with any summary measure, one should be aware of the possible limitations of the developed tools. We have already seen that while a small \(p\) value for the deviance test for model specification indicates that the posited model failed to account for the observed variation in the log relative risks, a large \(p\) value cannot be interpreted as evidence that the model adequately explains the observed variability. In addition, a test-based approach is generally unsatisfactory because it does not provide information about the actual fit of the analysis and suffers from low power due to the typically small number of data points in meta-analyses. Lastly, the \(p\) values for the global test of goodness-of-fit are not valid when the specification of the meta-analytical dose–response model is driven by the observed data.

A low value of the \(R^2\) may have several explanations. An \(R^2\) close to zero may indicate that the selected model fits the data poorly, but also that there is no association between the quantitative exposure and the relative risk for the health outcome, or that the model is correctly specified but the residual variability is still close to the overall variability. Finally, the visual inspection of the goodness-of-fit can reveal dose–response patterns in the modeled data, but its judgment can be quite subjective. In case of sparse data, almost any pattern can be detected in the decorrelated residuals-versus-exposure plot.

More generally, the tools have been presented in a fixed-effect framework. The decorrelated residuals-versus-exposure plot can be directly extended to the case of a random-effects analysis by including the covariance matrix of the random effects in the Cholesky decomposition. The other two measures do not have an explicit extension. Their usage as diagnostic tools, however, should be independent of the inclusion of the random effects in the final model.

6.2 A new measure of heterogeneity

Another relevant aspect in a quantitative review, which is also related to the assessment of goodness-of-fit, is the evaluation of the impact of heterogeneity. Indeed, a high variability in the reported effect sizes may undermine the appropriateness of presenting the combined effect as a summary measure. The common measures of heterogeneity have been developed under the unrealistic assumption of constant error variances. In Paper III we have proposed a new measure of heterogeneity, \(\hat R_b\), that overcomes this limitation of the previous measures.

The \(\hat R_b\) quantifies the impact of heterogeneity as the proportion of the variance of the combined effect due to the between-study variability. We have shown how \(\hat R_b\) satisfies the properties required for a measure of heterogeneity without making any assumptions about the distribution of the within-study error terms. It can be expressed as the average of the study-specific intraclass correlation terms, i.e. the ratios of \(\tau^2\) to the overall study-specific variance \(\tau^2 + v_i\). Like \(I^2\) and \(\hat R_I\), the proposed measure tends to its upper limit 1 in case of meta-analysis of very precise estimates (small \(v_i\)). The between-study coefficient of variation can give additional information about the magnitude of heterogeneity, compensating for this shortcoming of the available measures. The proposed measure of heterogeneity requires an estimate of \(\tau^2\), the between-study variability. Thus, confidence intervals should accompany the point estimates of \(\hat R_b\) to reflect the sampling uncertainty. We have proposed Wald-type confidence intervals using the delta method, based on the relation between \(\hat R_b\) and \(Q\). The performance of the confidence intervals was evaluated through an extensive simulation study presented in Crippa, Khudyakov, Wang, Orsini, & Spiegelman (2016 a).
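The definition of \(\hat R_b\) as an average of intraclass correlation terms can be made concrete with a short Python sketch. The effect sizes and variances below are hypothetical, and the DerSimonian-Laird moment estimator of \(\tau^2\) is an assumption chosen for simplicity; any other estimator of \(\tau^2\) could be plugged in. The \(I^2\) is computed from Cochran's \(Q\) for comparison.

```python
def heterogeneity(y, v):
    """Return (tau2, I2, Rb) for effect sizes y with within-study variances v."""
    k = len(y)
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw  # fixed-effect pooled estimate
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    # DerSimonian-Laird moment estimator of tau^2 (assumed choice of estimator)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # R_b: average of the study-specific ratios tau^2 / (tau^2 + v_i)
    rb = sum(tau2 / (tau2 + vi) for vi in v) / k
    return tau2, i2, rb

# Hypothetical effect sizes (e.g. log relative risks)
y = [0.0, 0.5, 1.0, -0.3]

# Equal within-study variances: R_b and I^2 coincide
tau2_eq, i2_eq, rb_eq = heterogeneity(y, [0.04, 0.04, 0.04, 0.04])

# Unequal within-study variances: the two measures can diverge
tau2_un, i2_un, rb_un = heterogeneity(y, [0.01, 0.04, 0.20, 0.50])

print(i2_eq, rb_eq)
print(i2_un, rb_un)
```

With constant \(v_i\) and the moment estimator used here, the two measures are algebraically identical, which is why differences only emerge as the within-study variances become more heterogeneous.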

We have shown how to present and interpret the new measure of heterogeneity by reanalyzing both univariate meta-analyses (in the illustrative examples of Crippa et al. (2016 a)) and, more specifically, a dose–response meta-analysis (in Section 5.3). As expected, the \(\hat R_b\) gave results similar to both \(I^2\) and \(\hat R_I\) when the within-study error variances were homogeneous across the effect sizes. Differences became more evident as the variability of the \(v_i\) increased, with values of \(\hat R_b\) generally lower than the corresponding \(I^2\) and \(\hat R_I\).

6.3 A point-wise approach

In Paper IV we have extended a point-wise approach originally presented for meta-analysis of individual patient data to the case of meta-analysis of aggregated dose–response data. The proposed strategy consists of combining the predicted log relative risks for a fine grid of exposure values arising from different study-specific dose–response analyses instead of combining the regression coefficients for a common dose–response model.

A point-wise approach has the potential advantage of improving the individual dose–response analyses, since the study-specific models can be defined separately across the studies. Although the aim of a dose–response meta-analysis should be to estimate a common curve that uniformly fits the study-specific results, estimation of a single functional form may worsen the fit of some individual analyses. We have illustrated in Sections 4.4 and 5.4 the case of second-degree fractional polynomials. In a two-stage approach, a single pair of power terms needs to be defined for all the studies so that the pooled dose–response curve can be derived by pooling the study-specific regression coefficients. In a point-wise approach, a possibly different combination of power terms can be chosen for each study to better fit the observed data. In this way, the predicted log relative risks will be closer to the observed ones. The combined curve can then be derived by pooling the individual predicted log relative risks point by point.

Another important advantage relates to the meta-analysis of heterogeneous exposure distributions, where the quantitative exposure may differ not only in definition and measurement but also in range. The solution in a two-stage analysis could be to limit the prediction of the pooled curve to a subset of the observed exposure values. Depending on the extent of the diversity of the exposure ranges, this might not be sufficient. We have illustrated this feature by reanalyzing aggregated data on the association between milk consumption and all-cause mortality in the results of Crippa, Thomas, & Orsini (2018) and between red meat consumption and bladder cancer in Section 5.4. In the point-wise strategy, the predicted log relative risks of each study can be limited to its observed exposure range. The combined curve is thus obtained by combining, point by point, a potentially different number of log relative risks. Neglecting this type of heterogeneity may have important consequences in terms of both point and interval estimates for the combined dose–response association. We have seen in Section 5.4 how a two-stage analysis may provide overconfident results for moderate to high values of red meat consumption, whereas a point-wise strategy limited the number of studies participating in the corresponding prediction and thus produced wider confidence intervals, reflecting the uncertainty associated with the lower number of results. Finally, additional results from the univariate meta-analytic models can also be presented point by point, providing a richer description of the quantities of interest over the exposure range.
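The point-wise strategy can be sketched in a few lines of Python. This is an illustrative sketch only: the three studies, their functional forms, coefficients, covariance matrices and exposure ranges are all hypothetical, and the predictions are pooled with a fixed-effect inverse-variance average for brevity, whereas the analyses described above may rely on random-effects univariate meta-analyses.

```python
import math

# Hypothetical study-specific models: each study has its own functional form
# (basis), estimated coefficients with covariance, and observed exposure range.
studies = [
    {"basis": lambda x: [x], "beta": [0.08],
     "cov": [[0.0004]], "range": (0.0, 4.0)},
    {"basis": lambda x: [x, x ** 2], "beta": [0.15, -0.01],
     "cov": [[0.0025, -0.0004], [-0.0004, 0.0001]], "range": (0.0, 8.0)},
    {"basis": lambda x: [math.sqrt(x)], "beta": [0.20],
     "cov": [[0.0009]], "range": (0.0, 2.5)},
]

def predict(study, x):
    """Predicted log RR and its variance at dose x from one study-specific model."""
    f = study["basis"](x)
    est = sum(b * fj for b, fj in zip(study["beta"], f))
    var = sum(f[i] * study["cov"][i][j] * f[j]
              for i in range(len(f)) for j in range(len(f)))
    return est, var

def pointwise_pool(studies, grid):
    """Pool predictions at each grid value, using only the studies whose
    observed exposure range covers that value."""
    curve = []
    for x in grid:
        preds = [predict(s, x) for s in studies
                 if s["range"][0] <= x <= s["range"][1]]
        if not preds:
            continue  # no study observed this exposure level
        weights = [1.0 / var for _, var in preds]
        pooled = sum(w * est for w, (est, _) in zip(weights, preds)) / sum(weights)
        curve.append({"dose": x, "n_studies": len(preds),
                      "logrr": pooled, "se": math.sqrt(1.0 / sum(weights))})
    return curve

# The grid starts above the referent dose 0, where every prediction is exactly 0.
curve = pointwise_pool(studies, [1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
for point in curve:
    print(point)
```

In this toy example the number of contributing studies decreases along the grid, so the standard error of the pooled log relative risk at high doses reflects the single study that observed them.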

A possible limitation of the proposed approach is that the combined curve is obtained by means of separate univariate meta-analyses that are all based on the same set of study-specific analyses. As a consequence, the standard errors and confidence intervals may no longer be valid. A potential remedy would be to incorporate the covariance matrix of the study-specific predictions in a multivariate meta-analytic model. However, given the number and the nature of the multivariate predictions, the estimation algorithms typically fail to converge.

6.4 A one-stage model

In Paper V we have formalized and presented a one-stage model for meta-analysis of heterogeneous non-linear curves. The two steps of a two-stage approach, dose–response and pooling, can be written as a single procedure in terms of a linear mixed-effects model. The mixed-effects framework is particularly suitable for inferential procedures, marginal and conditional predictions, quantification of heterogeneity, goodness-of-fit and model comparison. The same questions frequently answered in a two-stage approach can be similarly addressed using a one-stage methodology.
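To make the single-procedure formulation concrete, a linear mixed-effects model of this kind is commonly written as below. The notation is an assumption introduced here for illustration and may differ from that of Paper V: \(y_i\) is the vector of non-referent log relative risks of study \(i\), \(X_i\) the design matrix of the \(p\) dose transformations, \(S_i\) the covariance matrix of \(y_i\) (treated as known), \(\beta\) the fixed effects defining the pooled curve, and \(b_i\) the study-specific random effects with covariance \(\Psi\).

```latex
\[
  y_i = X_i \left( \beta + b_i \right) + \varepsilon_i ,
  \qquad b_i \sim N(0, \Psi), \qquad \varepsilon_i \sim N(0, S_i),
\]
\[
  \text{so that marginally} \qquad
  y_i \sim N\!\left( X_i \beta ,\; X_i \Psi X_i^{\top} + S_i \right).
\]
```

Written this way, a study contributes to the likelihood through its own \(X_i\) and \(S_i\), whatever the number of rows, which is what allows studies with few data points to be retained.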

The technique was initially presented in a fixed-effect analysis as a more flexible alternative to the two-stage methodology. Extensions to random-effects meta-analysis of non-linear curves have typically been framed in a two-stage framework because of the developments related to multivariate meta-analysis and for simplicity of implementation in common statistical software. A one-stage model has often been regarded as equivalent. Although we proved that a one-stage and a two-stage approach give the same point estimates and inference, the one-stage methodology is more flexible and allows one to answer more elaborate research questions. Flexible curves can also be estimated based on the results from studies reporting a limited number of relative risks. In a two-stage meta-analysis, on the other hand, a typical requirement is that each study provides enough data for the individual dose–response analyses. For example, using either second-order fractional polynomials or restricted cubic splines with 3 knots, \(p = 2\) transformations are required for modeling non-linear associations. As a consequence, only studies providing at least 2 non-referent relative risks can be included in the non-linear analysis. Cases where studies report results after dichotomizing the quantitative exposure are not rare. The data from these studies will be excluded in a two-stage meta-analysis. One important objective of a quantitative review, however, is to consider and analyze the whole body of evidence for a research question of interest. Systematic exclusion of studies because of an insufficient number of data points will necessarily discard useful information and thus provide only a partial summary. Furthermore, the assessment and investigation of between-study variability will also be distorted, so that residual heterogeneity might go undetected.
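The inclusion rule described above can be illustrated with a small Python sketch. The study labels and counts of non-referent relative risks are hypothetical; the sketch only shows which studies would be dropped by a two-stage analysis with \(p = 2\) transformations and retained by a one-stage analysis.

```python
# Hypothetical studies with their number of non-referent relative risks.
studies = [
    {"id": "A", "n_nonref": 4},
    {"id": "B", "n_nonref": 1},  # exposure dichotomized: one non-referent RR
    {"id": "C", "n_nonref": 2},
    {"id": "D", "n_nonref": 1},
]

def eligible_two_stage(studies, p):
    """Studies with at least p non-referent RRs, i.e. enough data points to
    estimate p study-specific coefficients in the first stage."""
    return [s["id"] for s in studies if s["n_nonref"] >= p]

def eligible_one_stage(studies):
    """Any study reporting at least one non-referent RR can contribute."""
    return [s["id"] for s in studies if s["n_nonref"] >= 1]

p = 2  # e.g. restricted cubic splines with 3 knots
two_stage = eligible_two_stage(studies, p)
one_stage = eligible_one_stage(studies)
excluded = [sid for sid in one_stage if sid not in two_stage]

print(two_stage, one_stage, excluded)
```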

Another advantage of a one-stage model is that many methodological aspects are greatly facilitated by using a single linear mixed-effects model. The tools presented in Paper II, for instance, were developed using the equivalence between the one- and two-stage approaches in a fixed-effect analysis. The comparison of the fit of different dose–response analyses is also simplified by information criteria such as the AIC, which are based on a common, comparable likelihood.

Multiple routines implement linear mixed-effects models in different statistical packages. However, several aspects are specific to dose–response meta-analysis, and it may be cumbersome to specify them using general commands for mixed-effects models. Therefore, we have implemented the one-stage methodology in the updated version of the dosresmeta package. Several example data sets and code are available to facilitate applications of the proposed methodology.