5 Current and Future Directions
Monte Carlo methodology is being actively developed. Indeed, it is likely that many of the students attending this module will themselves be working on aspects of computer-intensive statistics. No module of this sort can hope to start from the beginning of the discipline and reach all of the frontiers of current research. This chapter contains a few words about some current research directions and attempts to provide references so that the interested reader can easily find out more. It isn't an exhaustive summary of interesting directions in this area, but we have attempted to provide a little information about at least the most widespread such topics (and, of course, those in which we are particularly interested).
5.1 Ensemble-based Methods and Sequential Monte Carlo
Ensemble- (or particle-) based methods use a collection of samples to approximate a distribution within an algorithm. Operations are performed on the whole ensemble, rather than on a single sample at a time as in MCMC algorithms. Many of these methods come originally from the signal-processing literature, in which a class of algorithms known as particle filters was introduced by Gordon, Salmond, and Smith (1993) to approximate the solution of the discrete-time filtering problem. Consider the problem of inferring the location of an object using noisy radar measurements taken at regular times. The filtering problem in this setting asks for the posterior distribution of the true location of the object (which is treated as a hidden or latent variable) at a time \(t\) given the noisy observations up to time \(t\). A particle filter approximates this distribution by simulating a collection of particles whose weighted empirical distribution hopefully provides a good approximation to this filtering distribution. Particles are evolved through time and reweighted in order to update the filtering distribution as new observations arrive. See Doucet and Johansen (2011) for a survey of these and some related techniques.
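To make the recursion concrete, the following is a minimal sketch of a bootstrap particle filter for a toy one-dimensional linear-Gaussian model; the model, its parameters, and the sample sizes are illustrative choices rather than anything prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian state-space model (illustrative choices):
#   x_t = 0.9 * x_{t-1} + N(0, 1)      latent state
#   y_t = x_t + N(0, 0.5^2)            noisy observation
T, N = 50, 1000

# Simulate synthetic data from the model.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(scale=0.5, size=T)

# Bootstrap particle filter in the style of Gordon, Salmond and Smith (1993).
particles = rng.normal(size=N)                # initial particle cloud
filter_means = np.zeros(T)
for t in range(T):
    if t > 0:
        # Propagate each particle through the state dynamics.
        particles = 0.9 * particles + rng.normal(size=N)
    # Reweight by the likelihood of the new observation.
    logw = -0.5 * ((y[t] - particles) / 0.5) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filter_means[t] = np.sum(w * particles)   # estimate of E[x_t | y_{1:t}]
    # Multinomial resampling to combat weight degeneracy.
    particles = rng.choice(particles, size=N, p=w)
```

In more serious implementations one would typically use a lower-variance resampling scheme (systematic resampling, for example) and resample only when the weights become sufficiently degenerate.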
Amongst others, R. M. Neal (2001) and Chopin (2001) proposed algorithms based around this type of methodology (from quite different perspectives) which are applicable to more general problems. The particle filtering framework can be generalised to a rather more abstract problem of sampling from a growing collection of measures, in which case the algorithm is known as Sequential Monte Carlo. See Del Moral, Doucet, and Jasra (2006) for an important special case known as Sequential Monte Carlo samplers, and Del Moral (2004) and Del Moral (2013) for book-length studies of the theoretical behaviour of this type of algorithm.
5.2 Pseudo-marginal Methods and Particle MCMC
One area which has attracted a lot of attention is that in which one has access to a joint distribution but is interested in inference for only a (relatively low-dimensional) marginal of that distribution. It was demonstrated in Beaumont (2003) that with a clever extended-space scheme one could justify an approximation to the ideal marginal scheme. Such approaches were further analysed by Andrieu and Roberts (2009), who termed them pseudo-marginal methods (and there is a trail of more recent work). The idea is motivated as follows. Suppose you wanted to run an MCMC algorithm on a model parameter but could not compute the acceptance probability at each step because the likelihoods appearing in the numerator and denominator of the Metropolis–Hastings ratio are intractable. However, at some additional cost, you are able to obtain an unbiased Monte Carlo estimate of each likelihood. If you were simply to plug those estimates into the Metropolis–Hastings ratio, you would have no guarantee that the MCMC algorithm retains the correct target distribution. The essence of the pseudo-marginal method is to view the act of Monte Carlo estimation as an instance of data augmentation; the augmentation comes from all the extra randomness needed to perform the Monte Carlo step. By writing down this augmentation explicitly and treating the randomness associated with the Monte Carlo steps carefully, one is able to show that the MCMC algorithm still has the correct target.
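As an illustration of the construction, here is a minimal sketch of a pseudo-marginal Metropolis–Hastings algorithm for a deliberately simple toy model in which the likelihood is in fact tractable but is estimated unbiasedly by simple Monte Carlo; the model, prior, proposal scale, and function names (loglik_hat and so on) are illustrative assumptions. The crucial implementation detail is that the likelihood estimate attached to the current state is stored and reused, never recomputed, which is exactly what the extended-space argument requires.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustrative): z | theta ~ N(theta, 1), y | z ~ N(z, 1).
# The true likelihood L(theta) = N(y; theta, 2) is tractable here, but we
# pretend otherwise and estimate it unbiasedly by simple Monte Carlo:
#   L_hat(theta) = (1/M) * sum_i N(y; z_i, 1),   z_i ~ N(theta, 1).
y, M = 1.5, 32

def loglik_hat(theta):
    z = rng.normal(theta, 1.0, size=M)
    logw = -0.5 * (y - z) ** 2 - 0.5 * np.log(2 * np.pi)
    return np.logaddexp.reduce(logw) - np.log(M)  # log of an unbiased estimate

def logprior(theta):
    return -0.5 * theta ** 2                      # N(0, 1) prior

theta = 0.0
ll = loglik_hat(theta)       # estimate attached to the *current* state...
chain = []
for _ in range(5000):
    prop = theta + rng.normal(scale=0.5)          # random-walk proposal
    ll_prop = loglik_hat(prop)
    # ...which is reused, never refreshed in place, in the acceptance ratio;
    # this is what keeps the extended-space construction exact.
    log_alpha = (ll_prop + logprior(prop)) - (ll + logprior(theta))
    if np.log(rng.uniform()) < log_alpha:
        theta, ll = prop, ll_prop
    chain.append(theta)
```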
A closely related idea is the particle MCMC (PMCMC) approach of Andrieu, Doucet, and Holenstein (2010). Here, SMC algorithms are used within an MCMC algorithm to integrate out large collections of latent variables. A number of schemes can be justified based upon a common extended-space view of these algorithms.
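In the particle marginal Metropolis–Hastings variant of PMCMC, the unbiased likelihood estimate is produced by a particle filter run over the latent variables. Under the same illustrative conventions as above (the state-space model, particle number, and the name smc_loglik are arbitrary choices), a sketch of such an estimator, which could be substituted for the simple Monte Carlo estimator in the pseudo-marginal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

def smc_loglik(theta, y, N=200):
    """Unbiased likelihood estimate for the toy model
    x_t = theta * x_{t-1} + N(0, 1),  y_t = x_t + N(0, 1),
    computed by a bootstrap particle filter with N particles."""
    particles = rng.normal(size=N)
    ll = 0.0
    for t in range(len(y)):
        if t > 0:
            particles = theta * particles + rng.normal(size=N)
        logw = -0.5 * (y[t] - particles) ** 2 - 0.5 * np.log(2 * np.pi)
        ll += np.logaddexp.reduce(logw) - np.log(N)  # running evidence estimate
        w = np.exp(logw - logw.max())
        w /= w.sum()
        particles = rng.choice(particles, size=N, p=w)
    return ll
```

Substituting this estimator for loglik_hat in the pseudo-marginal loop above gives a crude version of particle marginal Metropolis–Hastings; the unbiasedness of the particle filter's evidence estimate is what allows the extended-space argument to go through.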
5.3 Approximate Approximate Methods
Pseudo-marginal and related methods are often referred to as exact approximate methods: they are approximate in the sense that they emulate an idealised algorithm by invoking additional randomness, but exact in the sense that they retain the correct target distribution. In recent years there has been some interest in making still further approximations and considering approximation schemes which also perturb the target distribution. There is some, albeit weak, theoretical support for the claim that such methods can work under certain circumstances (see, for example, Alquier et al. 2016; Medina-Aguayo, Lee, and Roberts 2016; Everitt et al. 2017), but it remains difficult to justify their use in realistic settings.
5.4 Quasi-Monte Carlo
In Section 2.1.1 the idea of quasi-random number generators was briefly mentioned. The use of quasi-random numbers within simulation procedures has received a burst of recent attention in large part due to the paper of Mathieu Gerber and Chopin (2015) which presented an elegant approach to their incorporation within the SMC framework. See also M. Gerber, Chopin, and Whiteley (2019) for further connections between quasi-Monte Carlo and SMC.
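As a small illustration of the basic idea (independent of the SMC setting), the sketch below compares a plain Monte Carlo estimate of a simple integral with one based on a scrambled Sobol sequence, using the qmc module of scipy.stats; the integrand and sample size are arbitrary choices made for this example.

```python
import numpy as np
from scipy.stats import qmc

# Estimate the integral over [0, 1]^2 of f(u) = exp(-(u_1^2 + u_2^2)),
# comparing plain Monte Carlo with a scrambled Sobol sequence (a
# randomised quasi-Monte Carlo point set).
def f(u):
    return np.exp(-np.sum(u ** 2, axis=1))

n = 2 ** 12                                # powers of two suit Sobol points

rng = np.random.default_rng(3)
mc_estimate = f(rng.uniform(size=(n, 2))).mean()

sobol = qmc.Sobol(d=2, scramble=True, seed=3)
qmc_estimate = f(sobol.random(n)).mean()   # typically much lower variance
```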
5.5 Hamiltonian/Hybrid MCMC
Hamiltonian Monte Carlo (HMC) is another approach to constructing MCMC chains, one which gives the evolution of those chains a physical interpretation. We augment the state space to incorporate a momentum variable and use some clever numerical technology borrowed from the physics literature (like the HMC method itself) in order to produce transitions which can exhibit much better mixing than the small steps allowed by Metropolis–Hastings type algorithms in high dimensions. Techniques adapting these ideas to a statistical setting, in order to make use of higher-order structure of the target distribution, were developed by Girolami and Calderhead (2011). See R. Neal (2011) for an introduction to these methods.
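A minimal sketch of an HMC transition, assuming a differentiable log-density and using the standard leapfrog integrator; the Gaussian target, step size, number of leapfrog steps, and the name hmc_step are illustrative choices, and no adaptation or diagnostics are included.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: a standard bivariate Gaussian (illustrative choice); any target
# with a tractable gradient of its log-density could be substituted.
def log_pi(x):
    return -0.5 * np.sum(x ** 2)

def grad_log_pi(x):
    return -x

def hmc_step(x, eps=0.1, L=20):
    p = rng.normal(size=x.shape)             # refresh the momentum variable
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics: reversible and
    # volume-preserving -- the numerical technology borrowed from physics.
    p_new += 0.5 * eps * grad_log_pi(x_new)
    for _ in range(L - 1):
        x_new += eps * p_new
        p_new += eps * grad_log_pi(x_new)
    x_new += eps * p_new
    p_new += 0.5 * eps * grad_log_pi(x_new)
    # The accept/reject step corrects the integrator's discretisation error.
    log_alpha = (log_pi(x_new) - 0.5 * p_new @ p_new) \
              - (log_pi(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_alpha else x

x, samples = np.zeros(2), []
for _ in range(1000):
    x = hmc_step(x)
    samples.append(x)
```

Because the leapfrog scheme is reversible and volume-preserving, a simple Metropolis accept/reject step suffices to correct its discretisation error, so the chain targets the right distribution exactly.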
5.6 Methods for Big Data
An enormous research effort is currently being dedicated to the development of methods which scale sufficiently well with the size of a set of data that they allow inference with truly enormous data sets. This is too large, and too specialised, an area to dedicate much space to here, but Bardenet, Doucet, and Holmes (2017) provide an excellent comparative summary of the current state of the art.
Standard MCMC algorithms suffer in high dimensions partly because their reversible nature slows down exploration of the state space. There has been a flurry of recent work developing non-reversible MCMC algorithms with good scaling properties. These are often best expressed as continuous-time algorithms, in contrast to the discrete-time updates of standard MCMC. This has something of the flavour of HMC, but these methods rely heavily on dynamics driven by piecewise-deterministic Markov processes; see Fearnhead et al. (2018) for an introduction.
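To give a flavour of such processes, the following is a minimal sketch of a one-dimensional Zig-Zag process (one of the piecewise-deterministic samplers discussed by Fearnhead et al. 2018) targeting a standard Gaussian, for which the event times can be simulated exactly; the target and run length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# One-dimensional Zig-Zag process targeting N(0, 1): the state moves with
# constant velocity v in {-1, +1} and v flips at the events of a Poisson
# process with rate max(0, v * x). For a standard Gaussian target these
# event times can be simulated exactly by inverting the integrated rate.
x, v, t = 0.0, 1.0, 0.0
events = [(t, x)]
for _ in range(10_000):
    e = rng.exponential()
    a = v * x                   # along the segment the rate is max(0, a + s)
    if a >= 0:
        tau = -a + np.sqrt(a * a + 2.0 * e)
    else:                       # rate is zero until s = -a, then grows linearly
        tau = -a + np.sqrt(2.0 * e)
    x += v * tau                # deterministic motion between events
    t += tau
    v = -v                      # flip the velocity at the event time
    events.append((t, x))
# Estimates are formed by averaging along the continuous-time trajectory,
# e.g. by reading off x at an equally spaced grid of times.
```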