4 Model Fitting
4.1 Introduction
What is a model in a fisheries context? It typically means one or more equations that describe a biological process of interest. There are two good reasons for developing a model. The first is simulation: a model that simulates a field study can be used for planning purposes, before field work is conducted. Such a model might be used to determine the number of fish to tag, or the level of electrofishing effort, needed for the field work to produce a useful result. A simulation model can also be used to test an analytical approach; for example, we can generate simulated data and check whether our analysis recovers the known true value(s).

The second main use of a model is to analyze data from field work. One example from Chapter 2 is a two-sample mark-recapture study to estimate population size. Fish in the first sample are tagged, and the fraction of marked fish in the second sample provides the information needed to estimate population size. The model in that case is the equation for population size (Nhat), along with some assumptions about sampling. The equation is based on the assumption that the marked fraction of the second sample should (on average) be the same as the marked fraction in the underlying population. The method also requires other, more practical assumptions that help ensure our field methods approximate the underlying model structure. For example, we assume that tagged and untagged fish are well mixed; in practice, this means that tagging effort should be well distributed throughout the study area so that our recapture effort does not encounter pockets with too many or too few tagged fish. We also assume that tags are not shed and that there are no deaths due to tagging. The mark-recapture model assumes that we know how many fish are in the tagged subpopulation, so tag loss or tagging mortality would affect that number and result in a biased estimate.
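In equation form, equating the marked fraction of the second sample with that of the population, \(m_2/n_2 \approx n_1/N\), and solving for \(N\) gives the estimator used in the code below: \(\hat{N} = n_1 n_2 / m_2\), where \(n_1\) is the number of fish marked in the first sample, \(n_2\) the number examined for marks in the second sample, and \(m_2\) the number of marked fish in the second sample.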
4.2 Using simulation
The equation for a two-sample mark-recapture study provides a point estimate of population size, but we could consider an improved version of the model that would allow us to more fully characterize the study results. For example, we could assume that the number of marked fish in the second sample comes from a binomial distribution. That distributional assumption allows us to generate estimates of uncertainty as well as the point estimate. The key to this sort of extended model is to think about the process generating the field data and determine which statistical distribution is appropriate. Let’s start with a simulation then consider an alternative (Bayesian) approach.
rm(list=ls()) # Clear Environment
# Two-sample population estimate
N.true <- 400 # Arbitrary assumed population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- n2 * p.true # Number of marked fish in second sample
N.hat <- n1*n2/m2 # Store estimate as variable N.hat
This example uses the same estimator as in Chapter 2. The two sample sizes could be set to arbitrary fixed values and need not be equal, but here we assume they are equal, each determined by the capture probability (p.true). The number of recaptures in the second sample equals the expected number, based on the fraction of the population that is marked (p.true). Because n1, n2, and m2 take on their expected values based on p.true, the estimate (Nhat) equals the true value (see Environment window).
Now, let’s extend the model by introducing some randomness (stochasticity) in the number of recaptures. We assume that recaptures are the successes in a binomial trial. The size of the binomial trial is n2. The probability of success equals p.true here but we calculate it as n1/N.true in case n1 and n2 were given other fixed values.
# Extend model to allow for stochasticity
Reps <- 1000
m2.vec <- rbinom(n=Reps, prob=n1/N.true, size=n2)
N.hat <- n1 * n2 / m2.vec
hist(m2.vec, xlab="Number of recaptures", main="")
hist(N.hat, xlab="Population estimate", main="")
mean(N.hat)
quantile(N.hat, probs=c(0.025, 0.5, 0.975), na.rm=TRUE) # Empirical confidence bounds and median
The rbinom()
function produces a vector of recaptures, simulating a mark-recapture study repeated Reps times. Using that vector in the equation for Nhat now produces a vector of population estimates. Plotting those estimates shows us that estimates are generally close to the true value but tend to have a long tail to the right (overestimates Nhat when a low value is drawn for m2). The mean is higher than the true value (due to the tail of extreme values) but the median should be very close to N.true. There are less biased modifications to the basic two-sample estimator but this version is more intuitive and sufficient for our purposes. Also, the point estimate is less important than characterizing the uncertainty. Here we obtain an empirical estimate of the 95% bounds using the quantile()
function. We provide a vector of probabilities and the function returns the estimates at those points. The 0.025 and 0.975 points provide the empirical 95% bounds, and the 0.5 point is the median. When interpreting a field study of this sort, the confidence bounds and overall shape of the distribution should be given more emphasis than the point estimate (which reduces concern about any statistical bias of the estimator).
4.3 Bayesian approach
Next, we compare this simulation result to a formal statistical method. We use the same binomial distribution to describe the process of obtaining recaptures. Our results will be similar, and hopefully the comparison will provide a clearer picture of how statistical methods work. Our use of simulation in the remainder of the book will be limited to generating “sample” data. Model fitting will be done using formal statistical methods that work well for simple or complex models.
The two paradigms for statistical methods are frequentist and Bayesian. Frequentist methods are dominant in statistics courses. They are so named because the focus is on how frequently a result would be expected to be observed under the null hypothesis. For example, we might compare two groups that differ in mean size by 10 cm, and estimate that a difference of that magnitude occurs rarely (say two percent of the time) under the null hypothesis. That indication of frequency (2%) might lead us to reject the null hypothesis and to conclude that the difference is meaningful and not due to chance. Bayesian statistics differs from frequentist statistics in that it takes into account not only the new data but also what is known before the study is conducted (McCarthy 2007; Doll and Jacquemin 2018). This “prior” information is described using a statistical distribution, such as a uniform or normal. For example, we might know from a pilot study that the tag reporting rate is between 0.2 and 0.6. We could conduct the Bayesian analysis using a uniform prior distribution over that range rather than the default range of 0-1 (which would indicate no prior information). This aspect of Bayesian analysis generates a good bit of controversy, and it is often unclear that valid prior information can be found and characterized objectively. We avoid that controversy in this book by using uninformative prior distributions (e.g., uniform 0-1 for probabilities) to the extent possible. Typically the results we obtain will be very similar to results of a frequentist statistical approach. If there are situations where an informative prior distribution can be justified (e.g., Doll and Jacquemin 2018), then it is straightforward to include that in the Bayesian analysis.
It is reasonable to ask why a Bayesian approach would be recommended if we are going to avoid informative prior distributions. The answer is a practical one – to take advantage of the “BUGS” family of software (Lunn et al. 2000; Lunn et al. 2009). This open-source software began with BUGS, followed by WinBUGS (for Windows PCs) then OpenBUGS (intended for multiple operating systems). There are several newer sources of Bayesian software that use the same or very similar code. Here we use R packages for JAGS (Plummer 2003), a popular and well-supported version. The code should transfer with little or no modification to other BUGS-family software versions. The code in the remainder of the book will generally follow the approach demonstrated by Kéry (2010) and Kéry and Schaub (2011), in that R simulation code will be used to generate sample data and JAGS Bayesian code will be used to fit the model.
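To make the prior-distribution idea concrete, the tag-reporting-rate example from the previous paragraph could be written in JAGS syntax (introduced below) in either of the following ways. This is only a sketch: reporting.rate is a hypothetical parameter name, and only one of the two priors would appear in an actual model.
reporting.rate ~ dunif(0, 1)     # Uninformative prior (no prior knowledge)
reporting.rate ~ dunif(0.2, 0.6) # Informative prior based on the pilot study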
The following code provides for a Bayesian model fit to the mark-recapture data:
rm(list=ls()) # Clear Environment
# Arbitrary 'observed' values for analysis
N.true <- 400 # Population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- n2 * p.true # Number of marked fish in second sample
# Load necessary library packages
library(rjags) # Package for fitting JAGS models from within R
library(R2jags) # Package for fitting JAGS models. Requires rjags
# JAGS code
sink("TwoSampleCR.txt")
cat("
model {
# Prior
N.hat ~ dlnorm(0, 1E-6) # Uninformative prior (N.hat>0)
MarkedFraction <- n1/N.hat
# Likelihood
# Binomial distribution for observed recaptures
m2 ~ dbin(MarkedFraction, n2)
}
",fill = TRUE)
sink()
# Bundle data
jags.data <- list("n1", "n2", "m2")

# Initial values.
jags.inits <- function(){ list(N.hat=runif(n=1, min=max(n1,n2), max=20000))}

model.file <- 'TwoSampleCR.txt'

# Parameters monitored
jags.params <- c("N.hat", "MarkedFraction")

# Call JAGS from R
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
  n.chains = 3, n.thin = 1, n.iter = 2000, n.burnin = 1000,
  model.file)
print(jagsfit, digits=3)
plot(jagsfit)
The first section of code sets up the experiment. The “observed” values for n1, n2 and m2 could be replaced by real data from a field study; in that case, N would be unknown.
Before running JAGS code for the first time, you will need to download and install JAGS. Next, use RStudio to install the packages for running JAGS. There are several options; this book uses rjags and R2jags. Both of these steps are done only once. Then, each time you want to access JAGS from R, you load the two packages using the library()
function.
The cat()
function concatenates (merges together) the lines of JAGS code, which are written out as a text file using the sink()
function. The name of the external file is arbitrary but just needs to match the model.file code further below. It can be helpful to use an informative name for the text file rather than something generic that might get overwritten or be hard to locate at a later date. The JAGS code can be thought of as consisting of two main parts: prior information and analysis of the new data. These two parts can come in any order but it seems logical to put the prior information first. In this case, our prior distribution is an uninformative lognormal (McCarthy 2007) for the population estimate N.hat. This ensures that the population estimate cannot be negative. The analysis of new data uses the likelihood (see Section 4.3.3) and is calculated in the final line of JAGS code. The observed number of recaptures is assumed to be binomially distributed. The parameter N.hat is used to estimate the fraction of the population that is marked, which is the probability for the binomial function dbin()
.
The next few lines of R code provide settings for the JAGS analysis (data to pass in, initial values for parameters, file name for the JAGS code, parameters to return). JAGS is sometimes tolerant of a wide range of initial values, but in this case, we need to provide a large enough initial value to avoid a run-time JAGS error. Population size must be at least as large as (and is likely much larger than) the sizes of the two samples, so for simplicity we draw the initial value from a broad uniform distribution ranging from max(n1,n2)
to an arbitrarily large value (20,000). Parameters (true model parameters and functions of those parameters) for which we want JAGS to monitor and return results are listed in jags.params. The function call to jags()
links to the data, initial values, etc., and provides some settings for the estimation process.
BUGS software uses an iterative method of parameter estimation called Markov Chain Monte Carlo or MCMC (McCarthy 2007). The MCMC method produces a sequence of autocorrelated estimates. Each sequence or chain uses a different stream of pseudo-random numbers, and comparing multiple independent chains is necessary for judging convergence. Three chains is a good practical choice. Thinning (e.g., n.thin=10 to retain only every tenth MCMC result) is sometimes done to reduce the autocorrelation, but it is not generally necessary (Link and Eaton 2012) and is not done here. The number of MCMC iterations is arbitrary, but it is better to err on the side of having too many (and a long run can provide a valuable opportunity to stretch legs or get coffee). R returns a measure of convergence (Rhat) that can be useful in deciding whether a larger number of iterations is needed. Rhat values less than 1.05 can indicate acceptable convergence (Lunn et al. 2012), with values closer to 1 being preferred. More details about judging convergence are given in Section 4.3.2. The final MCMC setting is for the “burn-in” phase, in which initial updates are discarded so that the retained values can be considered unaffected by the initial values. If n.burnin is not specified, the default is to discard half the total number of updates. The best way to develop experience and intuition about MCMC settings (and Bayesian analysis in general) is to run and re-run code using different settings. For example, are the results different if n.iter is set to 10000?
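As a minimal sketch (reusing the objects defined above; jagsfit.long is an arbitrary name), a longer run could be requested and its convergence summary compared with the shorter run:
jagsfit.long <- jags(data=jags.data, inits=jags.inits, jags.params,
  n.chains = 3, n.thin = 1, n.iter = 10000, n.burnin = 5000,
  model.file)
print(jagsfit.long, digits=3) # Compare Rhat values and intervals to the 2000-iteration run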
Results for this case are contained in the jagsfit object, which can be printed to the Console and viewed as plots. Summary statistics are produced for the two monitored parameters and deviance (See Section 4.3.4), which is a measure of fit. Convergence was very good based on estimated Rhat values less than 1.01. The 95% Bayesian credible interval (comparable to a frequentist confidence interval) from one run was 277 to 670, with a median of 410 and mean of 425. Your results will differ slightly because of the pseudo-randomness of the MCMC process. The credible interval was not too different from the estimate obtained through simulation (three simulation runs produced 278-711, 278-640, and 278-640). (The upper bound estimates of 640 and 711 occur with ten and nine recaptures, respectively.)
More information about Nhat can be obtained by plotting values from the jagsfit object:
hist(jagsfit$BUGSoutput$sims.list$N.hat, main="", xlab="Population estimates")
The jagsfit object is a list that can be inspected in the Environment window. Note that the sims.list arrays contain 3000 elements, which is the combined length of the 1000 retained values (n.iter - n.burnin) from each of three chains. (When entering a jagsfit object in the Source window or Console, typing $ after each level of the object will bring up an RStudio prompt for the next choice.)
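For example (a sketch using the object names above), the retained draws can be counted and summarized directly:
length(jagsfit$BUGSoutput$sims.list$N.hat) # 3000 = 3 chains x 1000 retained draws
median(jagsfit$BUGSoutput$sims.list$N.hat) # Posterior median of N.hat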
The model results include pD, which is an estimate of the number of effective parameters in the model. In simple models, we can often count the number of nominal parameters, but the number of effective parameters may be less; for example, parameters may be correlated or constrained by an informative prior distribution (McCarthy 2007). JAGS uses numerical methods to estimate pD. For this model, we would expect pD to be close to 1 for the estimate of N.hat; MarkedFraction is a calculated value and not a model parameter.
JAGS also provides an estimate of the Deviance Information Criterion (DIC), which is a relative measure of how well a model fits the data (McCarthy 2007; Lunn et al. 2009). It is the Bayesian analog of Akaike’s Information Criterion (AIC), which is widely used in a frequentist setting to compare alternative models. Like AIC, DIC reflects a trade-off between model fit (likelihood) and complexity (number of parameters) (McCarthy 2007). The DIC score is calculated using a measure of fit (deviance at the mean of the posterior distribution) and a measure of model complexity (pD); lower values are better when comparing alternative models. DIC scores should be interpreted with caution because of the difficulty in estimating pD for some models (Lunn et al. 2009).
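Both quantities appear in the printed summary and, if needed (for example, for the model comparison in Section 4.3.6), can be pulled directly from the output object; a sketch:
jagsfit$BUGSoutput$DIC # Deviance Information Criterion
jagsfit$BUGSoutput$pD  # Estimated effective number of parameters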
It is worth noting that this model may also be fit by using MarkedFraction as the model parameter. It is easy to specify the uninformative prior distribution (and initial values) in this case, as MarkedFraction is a proportion between 0 and 1. Now N.hat is a calculated value. One major advantage of Bayesian software is that we automatically get full information about parameters that are calculated functions of model parameters. In this case, the calculated value (N.hat) is of primary interest, rather than the actual model parameter MarkedFraction. Results for Nhat characterize its posterior distribution, which takes into account the prior information (none here) and the new data.
rm(list=ls()) # Clear Environment
# Load necessary library packages
library(rjags) # Package for fitting JAGS models from within R
library(R2jags) # Package for fitting JAGS models. Requires rjags
# Arbitrary 'observed' values for analysis
N.true <- 400 # Population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- n2 * p.true # Number of marked fish in second sample
# JAGS code
sink("TwoSampleCR.txt")
cat("
model {
# Priors
MarkedFraction ~ dunif(0, 1)
# Calculated value
N.hat <- n1 /MarkedFraction
# Likelihood
# Binomial distribution for observed recaptures
m2 ~ dbin(MarkedFraction, n2)
}
",fill = TRUE)
sink()
# Bundle data
jags.data <- list("n1", "n2", "m2")

# Initial values.
jags.inits <- function(){ list(MarkedFraction=runif(1, min=0, max=1))}

model.file <- 'TwoSampleCR.txt'

# Parameters monitored
jags.params <- c("N.hat", "MarkedFraction")

# Call JAGS from R
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
  n.chains = 3, n.thin = 1, n.iter = 2000, n.burnin = 1000,
  model.file)
print(jagsfit, digits=3)
plot(jagsfit)
Another advantage of this way of parameterizing the model is that we avoid specifying an arbitrary upper bound for initial values for N.hat. Results of the two approaches appear to be comparable.
4.3.1 Debugging code
Anyone who develops new code for fisheries data analysis will wrestle with the inevitable coding errors, not only in JAGS but also the R code that surrounds it. Let’s begin by looking at a few examples of R coding errors:
rm(list=ls()) # Clear Environment
#1
x <- 2
lg(x) # Incorrect function name

#2
y <- 4
z <- x+y # Typo (should have been x*y)

#3
RandLength <- rnorm(1, 10, 2) # Correct but poorly documented
RandLength <- rnorm(10,2) # Sample size omitted
RandLength <- rnorm(1,10) # sd omitted so default used (incorrectly)
RandLength <- rnorm(n=1, mean=10, sd=2) # Correct and fully documented

#4
x2 <- rnorm(n=1000, mean=10, sd=5)
x3 <- rnorm(n=1000, mean=20, sd=10)
x4 <- x2+x3
mean(x4)
var(x4)
The first example is easy to resolve, as we get a helpful error message in the Console. Rechecking the code, or doing an internet search for the correct function name, are approaches for addressing this type of error. The second example, if our intent was to assign x*y to z, is much more dangerous. The code will run but will give the wrong answer because of a typing error. It is always essential to review new code, line by line. Having a colleague (fresh set of eyes) check code can also be productive. R code can often be executed line by line, and checking calculated values in the Environment window can help in finding this sort of logic error, at least for simpler analyses.
In the third example, we are simulating a random length observation, drawn from a normal distribution with a mean of 10 and standard deviation of 2. The first version of the code works correctly but is poorly documented. The second example runs but does not produce the intended result. We omitted the first argument and did not include parameter names, so R assumes incorrectly that the first argument is the sample size and the second one is the mean. The third argument is missing so R uses the default value (sd=1), which in this case is incorrect. Even if we wanted to use an sd of 1, it is better to specify it for clarity. RStudio is helpful in working with functions in that it displays the function arguments and any defaults, once the function name and “(” have been entered. The fourth version works as intended, and makes clear the values that are being used.
The fourth example is coded correctly but illustrates another way of checking code. Simulation makes it easy to use a large sample size, to check sample statistics for x4. We verify that the mean is close to the expected value (10+20). The variance of a sum equals the sum of the variances so we confirm that the sample variance is close to the expected value (25+100).
How many errors do you find in the following section of code? Review line by line and correct errors before executing the code. Once the code is working correctly, confirm that the sample median is very close to the expected population size:
rm(list=ls()) # Clear Environment
# Two-sample population estimate
N.true <- 400 # Arbitrary assumed population size
p.true <-- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true X p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample

Reps <- 1000
m2.vec <- rbinom(n=Rep, prob=n1/N.true, size=n2)
N.hat <- n1 * n2 / m2.vec
hist(N.hat, xlab="Population estimate", main="")
mean(Nhat)
quantile(N.hat, probs=c(0.5), na.rm=TRUE)) # Sample median
Debugging JAGS code can be more challenging. There is no option for executing single lines of code as is possible with R. It is usually a matter of fixing one error at a time until the code runs, then checking and testing the code to be confident that it is working correctly. Consider the following (correctly coded) example for a tagging study to estimate the exploitation rate (fraction of the population harvested). Our simulation generates the single observation for the number of tags returned (e.g., over a year). This observation is drawn from a binomial distribution and we use the same JAGS function (dbin()
) as in the population estimate examples:
rm(list=ls()) # Clear Environment
# Arbitrary 'observed' values for analysis
N.tagged <- 100
Exp.rate <- 0.4
Tags.returned <- rbinom(n=1, prob=Exp.rate, size=N.tagged)
# Load necessary library packages
library(rjags) # Package for fitting JAGS models from within R
library(R2jags) # Package for fitting JAGS models. Requires rjags
# JAGS code
sink("ExpRate.txt")
cat("
model {
# Priors
Exp.rate.est ~ dunif(0,1)
# Likelihood
Tags.returned ~ dbin(Exp.rate.est, N.tagged)
}
",fill = TRUE)
sink()
# Bundle data
jags.data <- list("N.tagged", "Tags.returned")

# Initial values.
jags.inits <- function(){ list(Exp.rate.est=runif(n=1, min=0, max=1))}

model.file <- 'ExpRate.txt'

# Parameters monitored
jags.params <- c("Exp.rate.est")

# Call JAGS from R
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
  n.chains = 3, n.thin = 1, n.iter = 2000, n.burnin = 1000,
  model.file)
print(jagsfit, digits=3)
plot(jagsfit)
Begin by running the code as is, then try substituting each of the following (incorrect) lines, one at a time.
ExpRate.est ~ dunif(0,1)
Exp.rate.est ~ unif(0,1)
Tags.returned ~ dbin(Exp.rate.est)
The first error message points to line 8. JAGS correctly reports that the variable Exp.rate.est has not been previously defined. This is correct, but the error is actually on line 5, where the intended parameter name (Exp.rate.est) was mistyped. The second error is easy to diagnose because it specifies that the “unif” distribution on line 5 is unknown. The third error indicates that a discrete-valued parameter for function dbin()
is missing. This makes sense because the size of the experiment (N.tagged) was omitted.
As with R code, the first step is always to review the code line by line. Fitting a model to simulated data is always useful (even if there are field observations) because the correct answer is known. Another strategy is to start with the simplest possible analysis, then add complexity one piece at a time. For example, Kéry and Schaub (2011) discuss Cormack-Jolly-Seber models for estimating survival, beginning with constant parameters then gradually adding more complexity (e.g., models that allow for time and individual variation).
Skill in debugging comes with experience. It is hoped that the above strategies will be useful in working through the inevitable coding errors.
4.3.2 Judging convergence
The MCMC process is iterative so it is important to ensure that the number of updates (iterations) is sufficient to achieve convergence. One practical approach is to fit the model with some arbitrary (but reasonably large) number of updates, then refit the model using larger numbers of updates. Results that are stable suggest that convergence has been achieved. The convergence statistic Rhat (Gelman and Rubin 1992; Brooks and Gelman 1998; Lunn et al. 2012) is also generally reliable as a measure of convergence. It compares between- and within-chain variance; their ratio should be close to 1 once chains have converged. Especially for models for which convergence appears to be slow, it can also be helpful to view “trace” plots that show how estimates change as updating occurs (Kéry 2010). Add the following lines of code to the two-sample mark-recapture model used in Section 4.3:
par(mfrow=c(3,1)) # Multi-frame plot, 3 rows, 1 col
matplot(jagsfit$BUGSoutput$sims.array[,,1], type = "l", ylab="Marked fraction")
matplot(jagsfit$BUGSoutput$sims.array[,,2], type = "l", ylab="N hat")
matplot(jagsfit$BUGSoutput$sims.array[,,3], type = "l", ylab="Deviance")
jagsfit.mcmc <- as.mcmc(jagsfit) # Creates an MCMC object for plotting
plot(jagsfit.mcmc) # Trace and density plots
The first approach follows an example from Kéry (2010). We use the par()
graphical function to set up a multi-frame plot (by row), with three rows and one column. This results in a compact set of three plots within a single RStudio plot window. The next three lines use the built-in matplot()
function to plot columns of a matrix as overlaid lines. The Environment window can be viewed to see the structure of sims.array, an array of type num[1:1000, 1:3, 1:3]. The first dimension is for the 1000 retained updates, the second is for chains 1-3, and the third is for the variable (MarkedFraction, Nhat, Deviance). Omitting the ranges for updates and chains results in a plot containing all updates, with a different colored line for each chain. We could also specify a subset of the range to see the initial pattern (e.g., …sims.array[1:100,,1], …). For this simple model, the estimates converge almost immediately so there is no obvious transitory pattern. Note that there is a lot of wasted space, and it may be necessary to increase the size of the Plots window to see the plots adequately.
The next two lines of code provide a more automated way of generating trace and density plots, using methods contained in the coda package (loaded automatically by rjags). The as.mcmc()
function converts the original output object (jagsfit) into an MCMC object. The plot function here produces a nicely formatted set of trace and density plots that are very helpful in confirming convergence. If convergence has been achieved, the trace plots should appear to be a “grassy lawn” or “fuzzy caterpillar” without any trend, with overlapping (fully mixed) chains. Density plots show the estimated posterior distributions, and given convergence, should have an essentially identical pattern for the three chains. Convergence occurs very quickly here but we will see in later chapters with more complex models that convergence can sometimes be more difficult to achieve.
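The coda package also provides numerical diagnostics to complement the visual checks. For example (a sketch using the MCMC object created above), gelman.diag() reports the potential scale reduction factor, the statistic underlying Rhat, which should be close to 1 after convergence:
gelman.diag(jagsfit.mcmc) # Values near 1 indicate convergence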
4.3.3 What is a likelihood?
As mentioned above, Bayesian software uses the likelihood and prior information to estimate the posterior distribution. But what exactly is a likelihood? It is an expression or function that is used to estimate the likelihood of obtaining the sample observation(s), given some specific value for each model parameter. As an example, let’s return to the second version of the mark-recapture analysis, with the single model parameter MarkedFraction (the fraction of the population that is marked). We use the binomial distribution to model the number of recaptures. The likelihood function for the binomial is the same as the probability distribution: \(L(p | n, k) = {n \choose k} p^{k} (1-p)^{n-k}\), except that here, we estimate the likelihood of a certain value for p, given k successes in a trial of size n. We can look at likelihoods using the same parameter values as in the simulation:
rm(list=ls()) # Clear Environment
# Arbitrary 'observed' values for analysis
N.true <- 400 # Population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- n2 * p.true # Number of marked fish in second sample
# Calculate likelihood for specific value of p
p <- 0.2
choose(n2,m2) * p^m2 * (1-p)^(n2-m2) # Likelihood for single value of p
dbinom(x=m2, size=n2, prob=p) # x successes, built-in function to obtain same result
# Create vector of possible p values, determine likelihood for each
p <- seq(from=0.02, to=1, by=0.02)
l.vec <- choose(n2,m2) * p^m2 * (1-p)^(n2-m2)
par(mfrow=c(1,1)) # Reset plot frame
plot(p, l.vec, ylab="Likelihood", col="red")
# Examine change in likelihood at higher number of recaptures
new.m2 <- 18
new.l.vec <- choose(n2,new.m2) * p^new.m2 * (1-p)^(n2-new.m2)
points(p, new.l.vec, col="blue")
The likelihood calculation at p=0.2 uses the choose()
function, to determine how many ways m2=16 recaptures can be drawn from a sample of size 80. Type choose(n2,m2)
in the Console to see that it is a very large number! The remainder of the expression determines the probability of 16 successes and 80-16 failures. We can obtain the same result from the built-in function dbinom()
, although it seems more instructive to write out the full expression.
The calculated likelihood (0.11) is really only of interest as a relative value, compared to other potential values for p. We use the seq()
function to create a vector of potential p values, and obtain a vector of likelihoods from the p vector. Plotting the points shows that the likelihood peaks sharply at 0.2 (not surprisingly since that is the value that generated the 16 recaptures) and decreases to near 0 at about 0.08 and 0.32.
We have assumed thus far that we obtained the expected number of recaptures (80*p), but in reality, the number of recaptures would be a binomially-distributed random variate. A different observed value for m2 would shift the likelihood function. The next lines of code calculate the likelihood for a slightly higher number of recaptures. The points function adds the new likelihood values to the plot, using the col=“blue” plot option. The new values are shifted to the right because the number of recaptured fish was higher.
If multiple independent samples are taken, the combined likelihood is a product. For our two assumed samples of 80 fish, with 16 recaptures in one and 18 in the other, the combined likelihood at each level of p would be the product l.vec * new.l.vec.
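A minimal sketch of that calculation, using the likelihood vectors created above (the object name joint.l.vec is arbitrary):
joint.l.vec <- l.vec * new.l.vec # Joint likelihood for the two independent samples
plot(p, joint.l.vec, ylab="Joint likelihood", col="purple")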
We wrote out the expression for the likelihood for this example, but that is not needed when using the BUGS language. We need only specify the function name for the distribution describing our sample data. JAGS does the behind-the-scenes work to provide the code for the likelihood function. Working at a higher level allows us to focus on the study design and assumptions, which guide us to our choice of sample distribution.
4.3.4 Deviance as a measure of model fit
Likelihood calculations were introduced in Section 4.3.3. Here we show two related measures of model fit (log-likelihood and deviance) using the same mark-recapture example. There can be computational advantages to working with the log-likelihood compared to the likelihood (McCarthy 2007).
rm(list=ls()) # Clear Environment
# Arbitrary 'observed' values for analysis
N.true <- 400 # Population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- n2 * p.true # Number of marked fish in second sample
# Create vector of possible p values, determine likelihood for each
p <- seq(from=0.02, to=1, by=0.02)
l.vec <- choose(n2,m2) * p^m2 * (1-p)^(n2-m2)
plot(p, l.vec, ylab="Likelihood", col="red")
log_l.vec <- log(l.vec) # Log-likelihood
plot(p, log_l.vec, ylab="log-Likelihood", col="red")
dev.vec <- -2 * log_l.vec # Deviance (-2 * ln_L)
plot(p, dev.vec, ylab="Deviance", col="red")
Compared to the likelihood, the log-likelihood curve has a different shape but the same peak at p=0.2 (the most likely value given 16 successes in a trial size of 80). The deviance is obtained by multiplying the log-likelihood by -2. The plotted pattern is now inverted so that the curve has a minimum at p=0.2; i.e., a larger value for deviance indicates a poorer fit (McCarthy 2007). JAGS works with the deviance when fitting models, but it is nice to know that we would obtain the same estimate from the maximum of the likelihood or log-likelihood, or from the minimum of the deviance (McCarthy 2007).
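A quick check (a sketch using the vectors created above) confirms that all three curves point to the same value of p:
p[which.max(l.vec)]     # p at the maximum likelihood
p[which.max(log_l.vec)] # Same p from the log-likelihood
p[which.min(dev.vec)]   # Same p from the minimum deviance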
4.3.5 Model checking
It is easy to fit the correct model to simulated data, but that is not the case for real (field) data. Any model fit to field data should be viewed as an approximation of the underlying ecological processes. We cannot prove that the model is correct, but we can examine whether it is a useful approximation. One method for doing this is the posterior predictive check (Kéry 2010; Kéry and Schaub 2011; Gelman et al. 2013; Conn et al. 2018). This method compares a measure of fit calculated for replicated data generated under the fitted model with the same measure calculated for the observed data. If the measure of fit (e.g., sum of squares or chi-square) is markedly different (usually higher) for the observed data, that suggests that the observed data contain some features not captured by the fitted model.
Our first example uses simulated catch data from a Poisson distribution, as might be obtained by standardized electrofishing or trawling. Keep in mind that these are our “observed” catches, to be compared with replicate data obtained by simulation. We arbitrarily set the mean for the Poisson distribution at 7 and the sample size at 30. Before viewing the extended JAGS code that includes the posterior predictive check, let’s examine a version that simply fits the model:
# Fit model to catch data from Poisson distribution
rm(list=ls()) # Clear Environment
N <- 30
mu <- 7
Catch <- rpois(n=N, lambda=mu) # Catches drawn from Poisson distribution
Freq <- table(Catch) # Distribution of simulated catches
barplot(Freq, main="", xlab="Catch", ylab="Frequency")
# Load necessary library
library(rjags)
library(R2jags)
sink("PoissonFit.txt")
cat("
model {
# Prior
lambda.est ~ dunif(0, 100)
# Likelihood
for(i in 1:N) {
Catch[i] ~ dpois(lambda.est)
} #y
}
",fill=TRUE)
sink()
# Bundle data
jags.data <- list("N", "Catch")

# Initial values
jags.inits <- function(){ list(lambda.est=runif(n=1, min=0, max=100))}

model.file <- 'PoissonFit.txt'

# Parameters monitored
jags.params <- c("lambda.est")

# Call JAGS from R
jagsfit <- jags(data=jags.data, jags.params, inits=jags.inits,
  n.chains = 3, n.thin=1, n.iter = 10000,
  model.file)
print(jagsfit)
plot(jagsfit)
There are only a few lines of JAGS code. We use an uninformative uniform prior distribution for the single Poisson parameter lambda.est. The catches are assumed (correctly) to be drawn from a Poisson distribution, and the likelihood calculations use the Poisson distribution function dpois()
. The initial value is obtained from the same uniform distribution as the prior. JAGS returns the estimated mean, which varies around the true value due to the relatively modest sample size. Try rerunning the code using a large sample size to see the improvement in the sample catch plot and the estimated mean.
Next we add a few lines of code to carry out the posterior predictive check:
# Fit model to catch data from Poisson distribution
# Includes code for posterior predictive check
rm(list=ls()) # Clear Environment
N <- 30
mu <- 7
Catch <- rpois(n=N, lambda=mu) # Counts drawn from Poisson distribution
Freq <- table(Catch) # Distribution of simulated catches
barplot(Freq, main="", xlab="Catch", ylab="Frequency")
# Load necessary library
library(rjags)
library(R2jags)
sink("PPC_Example.txt")
cat("
model {
# Prior
lambda.est ~ dunif(0, 100)
# Likelihood
for(i in 1:N) {
Catch[i] ~ dpois(lambda.est)
Catch.new[i] ~ dpois(lambda.est) # Generating replicated new data for PPC
chi.obs[i] <- pow(Catch[i]-lambda.est,2)/lambda.est # chi-square discrepancy for obs
chi.new[i] <- pow(Catch.new[i]-lambda.est,2)/lambda.est # chi-square for replicate new data
} #y
Total.obs <- sum(chi.obs[])
Total.new <- sum(chi.new[])
Bayesian.p <- step(Total.new - Total.obs)
}
",fill=TRUE)
sink()
# Bundle data
jags.data <- list("N", "Catch")

# Initial values
jags.inits <- function(){ list(lambda.est=runif(n=1, min=0, max=100))}

model.file <- 'PPC_Example.txt'

# Parameters monitored
jags.params <- c("lambda.est", "Bayesian.p")

# Call JAGS from R
jagsfit <- jags(data=jags.data, jags.params, inits=jags.inits,
  n.chains = 3, n.thin=1, n.iter = 10000,
  model.file)
print(jagsfit)
plot(jagsfit)
Within the likelihood looping section, we draw new replicate observations from the posterior distribution (Catch.new), and calculate chi-square values for observed and replicate data. The Bayesian p value is obtained using the step()
function and the total chi-square values for the observed and replicate data sets. The Bayesian p values tend to be quite variable; ten trials produced values of 0.34, 0.42, 0.18, 0.16, 0.32, 0.30, 0.32, 0.67, 0.69, and 0.18. We would expect the total chi-square for the observed and replicate data to be similar on average because the fitted model is known to be correct (i.e., our “observed” catches and the replicate data sets are both drawn from Poisson distributions).
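The value reported for Bayesian.p in the JAGS summary is the posterior mean of that 0/1 indicator, i.e., the proportion of MCMC draws in which the replicate discrepancy was at least as large as the observed one. As a sketch, it could also be pulled directly from the output object:
jagsfit$BUGSoutput$mean$Bayesian.p # Proportion of draws with Total.new >= Total.obs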
Next consider a case where the “observed” data are obtained from something other than a Poisson distribution. The negative binomial distribution is a good choice for ecological data because it can allow for a skewed distribution (e.g. mostly low but a few large catches). The two negative binomial parameters are mu (mean catch) and the overdispersion parameter k (Bolker 2008). A small value for k produces more clumping or aggregation, whereas extreme values of k (>10*mu) produce a Poisson-like distribution (Bolker 2008). Our JAGS code for model fitting remains unchanged but we replace the Poisson simulation code with the following:
# Fit model to negative binomial data
# Includes code for posterior predictive check
rm(list=ls()) # Clear Environment
N <- 30
mu <- 7
k <- 1
variance <- mu+(mu^2)/k
Catch <- rnbinom(n=N, mu=mu, size=k) # Catches drawn from negative binomial
Freq <- table(Catch) # Distribution of simulated counts
barplot(Freq, main="", xlab="Count", ylab="Frequency")
Now Bayesian p values are consistently extreme (0.00 for five replicate trials). The total chi-square for the replicate data sets (which are Poisson distributed) is always less than for the observed data (which are not). The fitted Poisson model lacks the flexibility to mimic the skew of the observed catch distribution. More intermediate Bayesian p values are only obtained if k takes on very large values (e.g., 1000), which results in a Poisson-like catch distribution.
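One quick way to see why the Poisson model fails here is to compare the sample mean and variance of the simulated catches (a sketch using the objects above); a Poisson sample has a variance close to its mean, whereas the negative binomial sample with small k is strongly overdispersed:
mean(Catch) # Close to mu
var(Catch)  # Much larger than the mean when k is small (overdispersion)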
Kéry and Schaub (2011) list some concerns about the posterior predictive check. One is that the observed data are used twice: in fitting the model and in producing the Bayesian p value. Another is that the method is qualitative, in that there is not an objective p-value range that indicates an acceptable fit. Despite those concerns, we agree with Kéry and Schaub (2011) that the approach is a useful way of detecting cases where a model fits poorly. Simulation is helpful in gaining experience with the approach, because it is straightforward to obtain Bayesian p values for correct and incorrect fitted models.
4.3.6 Model selection
It is often the case that more than one model may be considered as plausible. For example, survey catches could be modeled using either a Poisson or negative binomial distribution. A model for a tag-return study could include separate parameters for reporting rate of tags from commercial and recreational fishers, or just a single parameter if the sector reporting rates were similar. A third example is a model for survey catch rate, which might include covariates such as location, year, depth, or water temperature. Determining which covariates affected catch rate would make it possible to produce an improved survey index that was adjusted for environmental variation. For example, were low survey catches in the current year due to anomalously low water temperatures on sampling dates or to a real decline in abundance? The suite of candidate models should be chosen before data analysis begins (Burnham and Anderson 2004) to avoid the tendency to examine the study results and to choose candidate models that match observed patterns. For example, observing low recruitment in a recent year might spur us to search for environmental covariates with a similar temporal pattern.
Choosing a preferred model among the suite of candidates (model selection) is a vast and complex topic (Link and Barker 2010), and there is no consensus among statisticians as to the best approach (Kéry and Schaub 2011). Kéry and Schaub (2011) suggest that one practical approach is to decide on a model that is biologically plausible and stick to that. They also sometimes eliminate candidate parameters that have credible intervals that include zero, similar to a backward stepwise regression. Hooten and Hobbs (2015) recommend model selection based on predictive ability, either in-sample (data used in fitting the model) or out-of-sample (new data). Out-of-sample validation is considered the gold standard (Hooten and Hobbs 2015). The main disadvantage of that approach is the requirement for additional data not used in fitting the model.
We illustrate here a simple approach of separately fitting candidate models and comparing DIC scores. Our “observed” data are simulated counts from a negative binomial distribution. Next we fit Poisson and negative binomial candidate models to the same observed data and compare DIC scores. The negative binomial distribution in JAGS uses the probability of success (p) and size (our overdispersion parameter k). We return the estimated mean for the negative binomial distribution, which is calculated internally using p.est and k.est.
# DIC comparison for model selection, using "observed" data from a negative binomial distribution
rm(list=ls()) # Clear Environment
N <- 30
mu <- 7
k <- 1
variance <- mu+(mu^2)/k
Count <- rnbinom(n=N, mu=mu, size=k) # Counts drawn from negative binomial
Freq <- table(Count) # Distribution of simulated counts
barplot(Freq, main="", xlab="Count", ylab="Frequency")
# Load necessary library
library(rjags)
library(R2jags)
sink("PoissonFit.txt")
cat("
model {
# Prior
lambda.est ~ dunif(0, 100)
# Likelihood
for(i in 1:N) {
Count[i] ~ dpois(lambda.est)
} #y
}
",fill=TRUE)
sink()
# Bundle data
jags.data <- list("N", "Count")

# Initial values
jags.inits <- function(){ list(lambda.est=runif(n=1, min=0, max=100))}

model.file <- 'PoissonFit.txt'

# Parameters monitored
jags.params <- c("lambda.est")

# Call JAGS from R
jagsfit <- jags(data=jags.data, jags.params, inits=jags.inits,
  n.chains = 3, n.thin=1, n.iter = 10000,
  model.file)
print(jagsfit)
#plot(jagsfit)
Poisson.DIC <- jagsfit$BUGSoutput$DIC
sink("NBFit.txt")
cat("
model {
# Prior
p.est ~ dunif(0, 1) # JAGS uses p (probability of success) for negative binomial
k.est ~ dunif(0, 1000)
mu.est <- k.est*(1-p.est)/p.est
# Likelihood
for(i in 1:N) {
Count[i] ~ dnbinom(p.est, k.est)
} #y
}
",fill=TRUE)
sink()
# Bundle data
jags.data <- list("N", "Count")

# Initial values
jags.inits <- function(){ list(p.est=runif(n=1, min=1E-6, max=1),
                               k.est=runif(n=1, min=0, max=1000))}

model.file <- 'NBFit.txt'

# Parameters monitored
jags.params <- c("mu.est", "k.est")

# Call JAGS from R
jagsfit <- jags(data=jags.data, jags.params, inits=jags.inits,
  n.chains = 3, n.thin=1, n.iter = 10000,
  model.file)
print(jagsfit)
#plot(jagsfit)
NB.DIC <- jagsfit$BUGSoutput$DIC
Poisson.DIC - NB.DIC
The difference in DIC scores (printed to the Console for convenience) ranged from 50.4 to 90.4 in five trials using the default simulation settings. The substantially higher DIC score for the (incorrect) Poisson model is clear evidence of a poorer fit. Burnham and Anderson (2004) suggested that an alternative model with an AIC score within 1-2 of the best model merits consideration whereas a difference of 3-7 suggests much less support. Spiegelhalter et al. (2002) noted that the Burnham and Anderson (2004) criteria appear to work well for DIC scores.
Comparing DIC scores worked well for this case. Also, using simulated counts from a negative binomial distribution is convenient because we can adjust the overdispersion parameter (k) to obtain a more “Poisson-like” distribution. At what value for k do you obtain similar DIC scores for the two models? Observe the change in shape of the count frequency distribution as you vary k.
If considering this approach to model selection, keep in mind that DIC scores are not reliable for some more complex models; for example, ones with categorical parameters (Lunn et al. 2009). This is an advantage of starting with simulated data, in that the underlying distribution is known. Confirming that DIC scores reliably identify the correct model with simulated data is a useful first step before any analysis of field data.
4.4 Exercises
Estimate uncertainty around a two-sample mark-recapture estimate Nhat using the simulation and Bayesian approaches for a capture probability of 0.4. How similar are the results and how do they compare to results using the original capture probability of 0.2?
Robson and Regier (1964) provided practical advice on acceptable levels of accuracy and precision. They suggested that estimates within 10% of the true value are acceptable for research studies, versus 25% for management and 50% for preliminary studies. Bayesian credible intervals could be used as a proxy for their accuracy and precision targets. What capture probabilities (approximately) would correspond to those targets for the mark-recapture example?
How many errors can you locate before running the following code? Fix all errors that you spotted then run the code to fix any remaining errors.
rm(list=ls()) # Clear Environment
# Load necessary library packages
library(rjags) # Package for fitting JAGS models from within R
library(R2jags) # Package for fitting JAGS models. Requires rjags
# Arbitrary 'observed' values for analysis
Reps <- 20
AveC <- 3
Count <- rpois(n=Reps lambda=AveC)
# JAGS code
sink("PoissonSim.txt")
cat("
model {
# Priors
AveC.est ~ dunif(0, MaxCount)
# Likelihood
for (i in 1:Rep){
Count[i] ~ dpois(AveCest)
} #i
}
",fill = TRUE)
sink()
# Bundle data
jags.data <- list("Count")

# Initial values.
jags.inits <- function(){ list(AveC.est=runif(n=1, min=0, max=100))}

model.file <- 'PoisonSim.txt'

# Parameters monitored
jags.params <- c("AveC.est")

# Call JAGS from R
jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params,
  n.chains = 3, n.thin = 1, n.iter = 2000, n.burnin = 1000,
  model.file)
print(jagsfit, digits=3)
plot(jagsfit)
Consider a mark-recapture study with n1 = 60 and two independent samples for recaptures (n2=90, m2=40; n3=55, m3=30). Use a vector for p at 0.01 intervals and produce a vector and plot of joint likelihoods (vector given the name “both”). Assume that only the initial sample (n1) is marked; samples n2 and n3 provide two independent snapshots of the marked fraction. Explain in detail how p[which.max(both)] works and what it produces.
Use the approach from Section 4.3.6 (Model selection) to compare model fits for a more “Poisson-like” negative binomial distribution (e.g., k=10).