2.4 Assessing Prior Sensitivity
Recall that inference under discrete-geographic models—where many parameters must be inferred from minimal information—is inherently prior sensitive; i.e., the posterior probability distributions of the discrete-geographic model parameters that we infer from our geographic data are apt to be influenced by the prior probability distributions that we assume for those parameters. In this section, we describe some of the features implemented in PrioriTree (with an emphasis on describing the theory underlying those features) that are intended to help you identify prior sensitivity in your biogeographic analyses.
2.4.1 Estimating the Prior
A simple but effective way to identify prior sensitivity is to compare the (specified) prior to the (inferred) posterior probability distributions for each parameter: if a parameter is prior sensitive, its inferred posterior probability distribution will be (virtually) identical to whatever prior probability distribution we specified for that parameter.
PrioriTree allows users to visualize the prior distributions for the geographic model parameters—including the number of dispersal routes, \(\Delta\), the average dispersal rate, \(\mu\), and the resulting prior distribution on the expected number of dispersal events—which we can then compare to their corresponding posterior distributions to assess prior sensitivity.
However, a possible limitation of this approach involves induced priors that arise from parameter interactions. Imagine, for example, that we specify (and visualize) a uniform prior for a hypothetical parameter, \(\theta\), in PrioriTree, but (unforeseen) parameter interactions induce an exponential prior on \(\theta\). That is, the independent uniform prior that we initially specified for \(\theta\)—when marginalized over the joint prior probability distribution of all model parameters—is marginally exponential. After performing our MCMC simulation, we observe that the inferred marginal posterior for \(\theta\) resembles an exponential distribution, which departs strongly from the uniform prior distribution that we specified; this would lead us to incorrectly conclude that the parameter is unlikely to be prior sensitive, when in fact the posterior may simply be mirroring the induced exponential prior.
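The induced-prior pitfall can be demonstrated with a toy simulation. The sketch below is a hypothetical example (it is not the PrioriTree model): a parameter defined as the product of two components, each with an independent flat prior, nevertheless has a non-flat induced marginal prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy model: theta is not a free parameter but the product of
# two other parameters, each given an independent Uniform(0, 1) prior.
# Although each component prior is flat, the *induced* marginal prior on
# theta is not uniform; it piles up near zero.
a = rng.uniform(0.0, 1.0, size=100_000)
b = rng.uniform(0.0, 1.0, size=100_000)
theta = a * b

# A Uniform(0, 1) prior would have mean 0.5 and place probability 0.1 on
# values below 0.1; the induced prior behaves very differently.
print(theta.mean())          # ~0.25, not 0.5
print((theta < 0.1).mean())  # ~0.33, versus 0.10 under a flat prior
```

Visualizing only the flat priors on the two components would give no hint of this skewed marginal prior on the product, which is why estimating the induced (marginal) prior directly, as described next, is safer.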
Accordingly, it may be safer to estimate the joint prior probability distribution of our model parameters using MCMC, and then compare the inferred marginal prior probability distribution for each parameter to its corresponding inferred marginal posterior probability distribution.
To understand how we estimate the joint prior probability distribution using MCMC, first recall how the Metropolis–Hastings (M–H) algorithm estimates the joint posterior probability distribution. Central to the M–H algorithm is the acceptance probability, \(R\)—the probability that we accept a move to a proposed state (set of parameter values)—which is essentially based on the ratio of the posterior probabilities of the proposed (\(\theta^\prime\)) and current (\(\theta\)) states:
\[\begin{align*} R \propto \Bigg[ \frac{f(G \mid \theta^{\prime}) \cdot f(\theta^{\prime})} {f(G \mid \theta) \cdot f(\theta)} \Bigg ] = \underbrace{\frac{f(\theta^{\prime} \mid G)}{f(\theta \mid G)}}_{\text{posterior ratio}}. \end{align*}\]
Because we have replaced all of our geographic data, \(G\), with "?", the likelihood is identical for every parameter value, such that the first term of the acceptance probability (the likelihood ratio of the proposed and current states) cancels out:
\[\begin{align*} \require{cancel} R \propto \Bigg[ \underbrace{\cancel{\frac{f(G \mid \theta^{\prime})}{f(G \mid \theta)}}}_\text{likelihood ratio} \cdot \underbrace{\frac{f(\theta^{\prime})}{f(\theta)}}_\text{prior ratio} \Bigg ] = \underbrace{\frac{f(\theta^{\prime})}{f(\theta)}}_{\text{prior ratio}}, \end{align*}\]
which makes it clear that the MCMC simulation will visit states (parameter values) proportional to their relative prior probability. We can then query the joint prior sample from the MCMC simulation to summarize the marginal prior probability distribution for any parameter: e.g., we might infer the marginal prior probability density for the average dispersal rate parameter, \(\mu\), by constructing a histogram (frequency distribution) of sampled values from the corresponding column in our log file. These inferred marginal prior probability distributions can then be compared to their corresponding marginal posterior probability distributions to assess prior sensitivity.
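The cancellation above implies that an M–H sampler with a constant likelihood is simply a sampler of the prior. The following minimal sketch (a hypothetical one-parameter model, not the full discrete-geographic model) makes this concrete: with the likelihood ratio removed from the acceptance probability, the chain's samples recover the specified prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-parameter model: an average dispersal rate mu with an
# Exponential(rate = 1) prior. Replacing the data with "?" makes the
# likelihood equal for every value of mu, so the M-H acceptance ratio
# reduces to the prior ratio and the chain samples the prior itself.
def log_prior(mu):
    # Exponential(1) log-density, up to an additive constant
    return -mu if mu > 0 else -np.inf

mu = 1.0
samples = []
for _ in range(100_000):
    mu_prime = mu + rng.normal(0.0, 0.5)          # symmetric proposal
    log_r = log_prior(mu_prime) - log_prior(mu)   # likelihood ratio cancels
    if np.log(rng.uniform()) < log_r:
        mu = mu_prime
    samples.append(mu)

samples = np.array(samples[20_000:])  # discard burn-in
print(samples.mean())  # ~1.0, the Exponential(1) prior mean
```

A histogram of `samples` would approximate the Exponential(1) density, which is exactly the marginal-prior summary described above (here trivially equal to the specified prior, because this toy model has a single parameter and no interactions).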
2.4.2 Robust Bayesian Inference
We can assess the prior sensitivity of our biogeographic inferences using an approach called robust Bayesian inference. The fancy name belies the simplicity of this approach; we perform a series of MCMC analyses—of the same dataset under the same inference model—where we iteratively change one (or more) (hyper)priors of our inference model for each separate analysis. We then compare the resulting series of marginal posterior probability distributions for a given parameter to assess whether (or how much) our estimates change under different priors. We usually make this comparison visually, by plotting distributions for a given parameter under the range of candidate priors that we explored.
If the inferred marginal posterior probability distributions are (more or less) identical under a range of corresponding priors, we can safely conclude that our estimates of this parameter are robust to the choice of prior. Conversely, if the marginal posterior probability distributions vary substantially (and resemble their corresponding marginal prior probability distributions), then we would conclude that this parameter exhibits prior sensitivity (i.e., that there is little information in our study data to estimate this parameter). The latter scenario indicates that we need to take further steps; for example, by removing this parameter from our inference model (if possible), or (if not) by making an effort to objectively choose among alternative priors (e.g., by assessing the relative and/or absolute fit of the data to alternative priors).
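As a sketch of the robust-Bayesian workflow, consider a hypothetical conjugate Gamma–Poisson model (not the discrete-geographic model itself), where the posterior under each candidate prior is available analytically, so the prior-by-prior comparison can be shown without MCMC. The counts and prior values below are invented for illustration.

```python
# Hypothetical data: counts of dispersal events on 5 branches (made up).
counts = [3, 1, 4, 2, 2]
n, total = len(counts), sum(counts)

# Robust Bayesian inference: hold the data and model fixed, vary the prior.
# With a Poisson likelihood and a Gamma(shape = a, rate = b) prior on the
# rate, the posterior is Gamma(a + total, b + n), so we can compare
# posteriors under a range of candidate priors analytically.
for a, b in [(0.1, 0.1), (1.0, 1.0), (10.0, 10.0), (100.0, 10.0)]:
    post_mean = (a + total) / (b + n)
    post_sd = ((a + total) ** 0.5) / (b + n)
    print(f"Gamma({a}, {b}) prior -> posterior mean {post_mean:.2f} (sd {post_sd:.2f})")
```

In this toy example, the posterior means under the first two (diffuse) priors nearly coincide, indicating robustness, whereas the strongly informative Gamma(100, 10) prior pulls the posterior far from the others, which is the signature of prior sensitivity described above.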
2.4.3 Data Cloning
We can also assess the prior sensitivity of our biogeographic inferences using an approach called data cloning (Robert 1993; Lele, Dennis, and Lutscher 2007; Ponciano et al. 2009; Ponciano et al. 2012). Under this approach, we perform a series of MCMC analyses—under the same inference model with identical priors—where we iteratively increment the number of copies (“clones”) of our original dataset used in each separate analysis. We then explore the resulting series of marginal posterior probability distributions for a given parameter to assess how our estimates change as the level of information in the data increases (i.e., as we increment the number of data clones).
We might think of data cloning as the inverse of robust Bayesian inference; as described above, robust Bayesian inference involves a series of analyses where we hold the inference model and data constant, but iteratively change the prior probability distribution for a parameter to explore how the choice of prior impacts the corresponding marginal posterior probability distribution. By contrast, data cloning involves a series of analyses where we hold the inference model and prior constant, but iteratively change the number of copies of the original data to explore how the level of information in the data impacts the inferred marginal posterior probability distribution.
A particular MCMC analysis in a sequence of data-cloning analyses is defined by the number of replicate copies of our original dataset, \({\beta_i \geq 1}\), with the resulting posterior distribution being:
\[\begin{align*} f(\theta \mid G)_{\beta_i} \propto f(G \mid \theta)^{\beta_i} \cdot f(\theta). \end{align*}\]
If we were to set \({\beta_i = 0}\), we would be targeting the joint prior probability distribution (i.e., we would be running the MCMC without data); when \({\beta_i = 1}\), we are targeting the joint posterior probability distribution (i.e., running the MCMC on our original dataset). As \(\beta_i \rightarrow \infty\), the marginal posterior distribution for the parameter under consideration will converge to a point value that is identical to the maximum-likelihood estimate (MLE) for that parameter.1
Of interest here is the relative rate at which the marginal posterior probability distribution for the parameter under scrutiny—given the prior specified for that parameter—converges to the MLE as we increase the clone number. If the prior is very informative (i.e., focused on a narrow range of parameter values) and the prior mean is far from the MLE value, the rate of convergence will be slow. Conversely, if a prior is more diffuse (i.e., spread over a relatively wide range of parameter values) and the prior mean is rather close to the MLE value, the rate of convergence will be relatively fast. When the information in the data is limited, we would generally prefer a prior that has a faster convergence rate. We usually assess the convergence rate visually, by plotting posterior distributions for a given parameter under the range of \({\beta_i}\) values that we explored.
Note that the posterior probability density will converge to the MLE as we increase the amount of information in the data only if: (1) our inference model is identifiable (i.e., each unique set of parameter values has a corresponding unique likelihood value), and (2) we specify priors with soft bounds for all of our model parameters (i.e., no prior assigns zero prior probability to, or “boxes out”, the corresponding maximum-likelihood estimate for that parameter).
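The data-cloning sequence can be sketched with the same kind of hypothetical conjugate Gamma–Poisson model (again, not the discrete-geographic model itself), where the cloned posterior is available analytically: replicating the data \(\beta_i\) times raises the likelihood to the power \(\beta_i\). The counts and prior values below are invented for illustration.

```python
# Hypothetical data: counts of dispersal events on 5 branches (made up);
# the MLE of the Poisson rate is the sample mean, total / n.
counts = [3, 1, 4, 2, 2]
n, total = len(counts), sum(counts)
mle = total / n  # 2.4

# Data cloning with a conjugate Gamma(a, b) prior on the Poisson rate:
# cloning the data beta times gives the analytic "cloned" posterior
# Gamma(a + beta * total, b + beta * n).
a, b = 10.0, 1.0  # a deliberately informative prior centered far from the MLE
for beta in [1, 2, 10, 100]:
    post_mean = (a + beta * total) / (b + beta * n)
    post_sd = ((a + beta * total) ** 0.5) / (b + beta * n)
    print(f"beta = {beta:>3}: posterior mean {post_mean:.3f}, sd {post_sd:.3f}")
```

In this toy example, the cloned posterior mean moves from well above the MLE at \(\beta_i = 1\) (reflecting the informative prior) toward the MLE of 2.4 as \(\beta_i\) grows, while the posterior standard deviation shrinks toward zero; the speed of that march is the convergence rate discussed above.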