5 Abundance

Information about abundance can be invaluable for making policy decisions. For fish populations that support commercial or recreational fishing, abundance estimates are highly useful for setting an appropriate level of harvest. For rare or declining populations, abundance data can guide the restoration process. In this chapter, we will consider a few simple field approaches for estimating population size. These methods can be used in freshwater or marine systems, but they do require that the study area be closed for the duration of the study: no migration in or out, no recruitment (additions through reproduction), and no mortality. Later chapters will consider open models that jointly estimate population abundance and mortality over longer time frames.

5.1 Removal

Removal models are a good option when a substantial fraction of the study population can be captured in a single sample. One common example is the use of backpack electrofishing in streams that can be blocked at both ends to provide a temporarily closed system. Fish captured in each sample (or “pass”) are held aside temporarily, and the declining catches from the diminishing population provide the data for estimating abundance. Consider a case with a true (study area) population of 100 and a capture probability (fraction of the population captured in each sample) of 0.4. The expected sequence of catches would be 40 (100 * 0.4), 24 ((100-40) * 0.4), 14.4 ((60-24) * 0.4), and so on. A model fit to these data provides estimates of initial population size and capture probability that best mimic the pattern of catches. The fit would be very good if we used the above expected values, but estimation is more challenging (and realistic) when the catch sequence includes random variation.
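The expected catch sequence can be reproduced with a few lines of R (a quick deterministic sketch using the example values above, separate from the simulation code that follows):

```r
# Expected (deterministic) removal catches: N = 100, capture probability = 0.4
N <- 100
p <- 0.4
expected.catch <- numeric(5)
for (j in 1:5){
  expected.catch[j] <- N * p # Expected catch from the remaining population
  N <- N - expected.catch[j] # Deplete the population
}
round(expected.catch, 1) # 40.0 24.0 14.4 8.6 5.2
```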

We illustrate the removal method using a simulation to generate sample data. Here the true values are arbitrary, but roughly similar to estimates obtained by Bryant (2000) for Alaska streams. If planning a field study, your assumed population level should be based on prior work or literature review. The assumed capture probability should also be realistic, but it is a bit different from population size in that it is under the investigator’s control. Thus, a better estimate can be achieved by increasing sampling effort; for example, by using more electrofishing units or setting more traps. The other study design variable controlled by the investigator is the number of samples. Here we use five removal samples, which should provide high quality results.

rm(list=ls()) # Clear Environment

# Generate simulated removals, assuming constant catchability
N.removals <- 5 # Number of removals
N.true <- 100
CapProb <- 0.4
N.remaining <- N.true # Initialize N.remaining, before removals begin
Catch <- numeric() # Creates empty vector for storing catch data
for (j in 1:N.removals){
  Catch[j] <- rbinom(1, N.remaining, CapProb)
  N.remaining <- N.remaining - Catch[j]
  }

Running the above lines of code generates the sequence of simulated catches. (Note that these lines of simulation code would be replaced by an assignment statement (e.g., Catch <- c(14, 4, 6, 2, 3)) if we were analyzing data from an actual field study). A ‘for’ loop is used to step through the simulated sampling sequence. For each value of j, we generate a binomially-distributed random catch using the rbinom() function, then remove that number of fish from the remaining population. One example run produced a catch vector of 36, 25, 20, 6 and 6. The high assumed value for capture probability produces very good data; thus, these values are close to what would be expected. Try running the code using other values for capture probability. Is there a lower bound below which you would not expect the method to work well?

The next step is to fit the removal model using JAGS. The prior distribution for capture probability is straightforward (uniform 0-1) but the choice is a bit more subjective for population size (N.est). We can use the total catch as a lower bound and an arbitrary high value for the upper bound. That should ensure that the prior distribution does not affect the solution, but that could be confirmed by comparing fits using different upper bounds. Knowledge of your study system and the literature would be useful in setting the upper bound for analyzing a real data set.

# Load necessary library
library(rjags)   # Package for fitting JAGS models from within R
library(R2jags)  # Package for fitting JAGS models. Requires rjags

# JAGS code for fitting model
sink("RemovalModel.txt")
  cat("
model{

# Priors
 CapProb.est ~ dunif(0,1)
 N.est ~ dunif(TotalCatch, 2000)

 N.remaining[1] <- trunc(N.est)
 for(j in 1:N.removals){
      Catch[j] ~ dbin(CapProb.est, N.remaining[j]) # jth removal
      N.remaining[j+1] <- N.remaining[j]-Catch[j] # Remaining population after removal
  } #j

}
      ",fill=TRUE)
  sink()

  TotalCatch <- sum(Catch) # Lower bound for N.est prior

  # Bundle data
  jags.data <- list("Catch", "N.removals", "TotalCatch")

  # Initial values
  jags.inits <- function(){ list(CapProb.est=runif(1, min=0, max=1),
                                  N.est=runif(n=1, min=TotalCatch, max=2000))}

  model.file <- 'RemovalModel.txt'

  # Parameters monitored
  jags.params <- c("N.est", "CapProb.est")

   # Call JAGS from R
  jagsfit <- jags(data=jags.data, parameters.to.save=jags.params, inits=jags.inits,
                  n.chains = 3, n.thin = 1, n.iter = 40000,
                  model.file)
  print(jagsfit)
  plot(jagsfit)

JAGS requires an integer size variable in dbin(), so we use the trunc() function to truncate the starting value. The looping code is the same as in the simulation, except that JAGS does not allow a value to be repeatedly changed in a loop, so we set up the remaining population size as a vector. We provide to JAGS the simulated catch vector, the number of removal samples, and the total catch. Initial values are generated using the same distributions as the priors. We monitor the estimates of population size and capture probability.

Initial runs using fewer updates occasionally resulted in poor convergence, so n.iter was increased to 40000. Omitting n.burnin invokes the default of discarding the first half of the updates. Running the simulation and analysis code multiple times for these simulation parameters shows that the estimates are consistently good, with narrow credible intervals.

As a fishery biologist, you could use this combination of simulation and analysis to plan your field study. You could examine different choices for the two design variables (number of removal samples, capture probability) to decide what would produce reliable results. Try runs using the above code, modified to test different numbers of removal samples and assumed capture probabilities. It is important to acknowledge that this is an optimistic case, in that the data exactly follow binomial sampling. In a real field study, the model will be an approximation, so any design choices should be viewed as lower bounds. For example, if four removal samples appear to be sufficient, treat that number as a minimum. Another consideration is that we have made the simplifying assumption that capture probability is constant across samples. Bryant (2000) found that capture probability was generally constant when using traps, but it may decline with disruptive methods such as electrofishing or seining, or if vulnerability varies among individuals (Mäntyniemi et al. 2005). If capture probability declines as additional samples are taken, a more complex function for capture probability would be needed.
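As a rough screening step before running full simulations, the expected fraction of the population removed after k passes can be computed directly: under constant capture probability p and binomial depletion, that fraction is 1 - (1-p)^k. A sketch with illustrative design values:

```r
# Expected fraction of the population removed after k passes, given constant
# capture probability p (values chosen for illustration, not recommendations)
p <- c(0.1, 0.2, 0.4)
k <- 3:5
frac.removed <- outer(p, k, function(p, k) 1 - (1 - p)^k)
dimnames(frac.removed) <- list(paste0("p=", p), paste0("k=", k))
round(frac.removed, 2)
```

For p = 0.4 and five passes, over 90% of the population is expected to be removed, which is why the simulated catch sequence declines so clearly.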

5.2 Mark-recapture

We briefly revisit the two-sample mark-recapture experiment from Sections 2 and 4. In this section, we change a single line of code to produce a random number of recaptured fish rather than the expected value. Now the combined code (simulation plus analysis) can be rerun multiple times to build intuition about the reliability of the experimental results. This is especially useful when the assumed population size or capture probability is low. The true value can be compared to the 95% credible interval (2.5 and 97.5 percentiles, printed to the Console) or the 80% credible interval shown in the plots. You can look at either the marked fraction or Nhat, as they covary (a low estimate for the marked fraction produces a high estimate for Nhat). The last four lines of code illustrate that it can be instructive to examine not only the posterior distributions but also the underlying relationship(s) (see the bivariate plot). Here it is clear that Nhat is inversely related to the model parameter for the marked fraction, but this may be less obvious in situations with more complex models.

rm(list=ls()) # Clear Environment

# Simulate experiment
N.true <- 400  # Population size
p.true <- 0.2 # Capture probability (fraction of population sampled)
n1 <- N.true * p.true # Number caught and marked in first sample
n2 <- N.true * p.true # Number caught and examined for marks in second sample
m2 <- rbinom(1, n2, n1/N.true) # Number of marked fish in second sample (marked fraction = n1/N.true)

# Load necessary library
library(rjags)   # Package for fitting JAGS models from within R
library(R2jags)  # Package for fitting JAGS models. Requires rjags

# JAGS code
sink("TwoSampleCR.txt")
cat("
    model {

    # Priors
    MarkedFraction ~ dunif(0, 1)

    # Calculated value
    Nhat <- n1 /MarkedFraction

    # Likelihood
    # Binomial distribution for observed recaptures
    m2 ~ dbin(MarkedFraction, n2)
    }

    ",fill = TRUE)
sink()

# Bundle data
jags.data <- list("n1", "n2", "m2")

# Initial values.
jags.inits <- function(){ list(MarkedFraction=runif(1, min=0, max=1))}

model.file <- 'TwoSampleCR.txt'

# Parameters monitored
jags.params <- c("Nhat", "MarkedFraction")

# Call JAGS from R
jagsfit <- jags(data=jags.data, inits=jags.inits, parameters.to.save=jags.params,
                n.chains = 3, n.thin = 1, n.iter = 2000, n.burnin = 1000,
                model.file)
print(jagsfit, digits=3)
plot(jagsfit)

par(mfrow=c(3,1))
hist(jagsfit$BUGSoutput$sims.list$MarkedFraction, main="", xlab="Marked fraction")
hist(jagsfit$BUGSoutput$sims.list$Nhat, main="", xlab="Population estimate")
plot(jagsfit$BUGSoutput$sims.list$MarkedFraction, jagsfit$BUGSoutput$sims.list$Nhat,
     xlab="Marked fraction", ylab="Population estimate")

If you were managing the fishery for this lake population, how might you use the estimate of abundance? Try a few runs (simulation and model fitting) to see whether this level of uncertainty would be tolerable for management. In doing so, keep in mind that this assumed capture probability (0.2) is quite high and would be difficult to achieve in most field situations. Is there a rough lower bound for capture probability, below which the cost and effort of doing a study would not be justified?
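To build that intuition without refitting the JAGS model each time, one can also look directly at the sampling distribution of the classical Petersen estimate (Nhat = n1 * n2 / m2). This sketch assumes a capture probability of 0.1, lower than the 0.2 used above:

```r
# Sampling distribution of the Petersen estimate across repeated experiments
set.seed(1)                          # For reproducibility (illustrative)
N.true <- 400
p <- 0.1                             # Capture probability in each sample
n1 <- n2 <- N.true * p               # 40 fish marked, 40 examined for marks
m2 <- rbinom(10000, n2, n1/N.true)   # Recaptures in each replicate experiment
Nhat <- n1 * n2 / pmax(m2, 1)        # Guard against zero recaptures
quantile(Nhat, c(0.025, 0.5, 0.975)) # Wide spread at this capture probability
```

Even the simple guard for m2 = 0 matters here; experiments with no recaptures at all are not rare at this sampling intensity.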

5.3 Binomial-mixture

A population estimate can sometimes be obtained from count data when replicate survey samples are taken (Kéry and Schaub 2011). One fisheries example is the use of side-scan sonar to count adult Atlantic sturgeon, which are large and distinctive enough to be identifiable on side-scan images (Flowers and Hightower 2013, 2015). Sites were assumed to be closed (no movement in or out, no mortality) for the brief duration of the survey. Each site had an unknown number of sturgeon, but it was not assumed that every sturgeon was counted (detected). Counts could vary among survey passes due to a variety of factors, including a fish’s orientation, overlap with other fish or structure, or distortion in the image. Thus the counting process was modeled using a binomial distribution, where “size” is the number of sturgeon present at the site and “prob” is the probability of being detected on that survey pass. The probability is comparable to a capture probability but here sampling was unobtrusive. This type of model is referred to as a binomial mixture model (or N-mixture model). It attempts to separate the biological parameter (site abundance) from the observational parameter (detection probability) (Kéry and Schaub 2011).

Consider a site with twenty individuals and a detection probability of 0.5. We can simulate a large sample of replicates to get a smooth distribution of counts, although in an actual field survey, there would typically be only a few replicate passes per site.

rm(list=ls()) # Clear Environment

# Simulate experiment
N.true <- 20  # Number of individuals at site
p.true <- 0.5 # Detection probability (fraction of individuals detected)
Replicates <- 1000
Counts <- rbinom(n=Replicates, size=N.true, prob=p.true) # Vector of replicate counts

par(mfrow=c(1,1)) # Ensure plot window is reset to default setting
hist(Counts, main="")
table(Counts)

The expected count is 10 (N.true * p.true); the histogram and table show the frequency of different values. One run produced minimum and maximum counts of 3 and 16. Counting all 20 would be possible but extremely unlikely. How might you estimate empirically the probability of counting 15 or more? Experiment with different detection probabilities to find a level that produces occasional counts of 20 individuals.
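One way to answer the first question (a sketch): compare the empirical frequency of large counts against the exact binomial tail probability from pbinom():

```r
# Empirical versus exact probability of counting 15 or more of 20 individuals
Counts <- rbinom(n = 1000, size = 20, prob = 0.5)
mean(Counts >= 15)                    # Empirical estimate; varies run to run
1 - pbinom(14, size = 20, prob = 0.5) # Exact tail probability, about 0.021
```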

It is instructive to compare the default distribution (N.true=20 and p.true=0.5) to other combinations that would have the same expected count (e.g., N.true=40 and p.true=0.25). In the following code, we use even more replicates, and add upper and lower limits for the x axis in order to compare the two histograms. The differences in the shape of the distribution are subtle, illustrating that it can be challenging to get a reliable abundance estimate using this approach.

N.true <- 20  # Number of individuals at site
p.true <- 0.5 # Detection probability (fraction of individuals detected)
Replicates <- 10000
Counts <- rbinom(n=Replicates, size=N.true, prob=p.true) # Vector of replicate counts
table(Counts)

par(mfrow=c(2,1)) # Two rows, one column
hist(Counts, main="N.true=20", xlim=c(0,25))

N.true <- 40  # Number of individuals at site
p.true <- 0.25 # Detection probability (fraction of individuals detected)
Counts <- rbinom(n=Replicates, size=N.true, prob=p.true) # Vector of replicate counts
hist(Counts, main="N.true=40", xlim=c(0,25))
table(Counts)

Try a more extreme second case (N.true=100 and p.true=0.1) to see if there is a more pronounced difference compared to the default (remember to change the second plot title!). It may be necessary to change the x-axis limits if N.true is increased.
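The subtlety of the comparison has a simple explanation: the two cases share the same mean (N.true * p.true) but differ only modestly in variance (N.true * p.true * (1 - p.true)). A quick check:

```r
# Mean and variance of the two binomial count distributions compared above
mean1 <- 20 * 0.5;  var1 <- 20 * 0.5 * 0.5    # N.true=20, p.true=0.50
mean2 <- 40 * 0.25; var2 <- 40 * 0.25 * 0.75  # N.true=40, p.true=0.25
c(mean1, mean2) # Both 10
c(var1, var2)   # 5 versus 7.5
```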

To simulate a full field study, we need to add code for multiple sites. We assume 30 sites and 3 replicate visits to each site. Kéry and Schaub (2011) recommended at least 10-20 sites and at least two observations per site. It is assumed that the number of sturgeon per site varies about an underlying mean, so having a large sample of sites will aid in estimating that mean. There are a variety of ways to model the variation among sites. Here we chose the simplest option of using a Poisson distribution, which has only one parameter to estimate:

rm(list=ls()) # Clear Environment

Sites <- 30
Replicates <- 3
lam.true <- 20 # Mean number per site
p.true <- 0.5 # Detection probability

Counts <- matrix(NA, nrow=Sites, ncol=Replicates)
Site.N <- rpois(n = Sites, lambda = lam.true) # True abundance at each site
for (i in 1:Sites){
  Counts[i,] <- rbinom(n = Replicates, size = Site.N[i], prob = p.true)
  } #i
head(Site.N)
head(Counts)

The matrix() function creates an empty matrix, which will hold the replicate counts for each site. Next, the function rpois() generates the Poisson-distributed random number of sturgeon at each site. We then loop over sites to generate the binomially distributed counts at each site. This illustrates the hierarchical nature of this model: site abundance is set first, then replicate observations are drawn given the site abundance. We can use the head() function to look at the relationship between true abundance (Site.N) and counts for the first few sites. In one example run, the first site had 25 sturgeon and produced replicate counts of 11, 14, and 13.

# Load necessary library
library(rjags)
library(R2jags)

# Specify model in BUGS language
sink("N-mix.txt")
cat("
model {

# Priors
  lam.est ~ dunif(0, 100)  # uninformative prior for mean abundance at each site
  p.est ~ dunif(0, 1)

# Likelihood
# Biological model for true abundance
for (i in 1:Sites) {
   N.est[i] ~ dpois(lam.est)
   # Observation model for replicated counts
   for (j in 1:Replicates) {
      Counts[i,j] ~ dbin(p.est, N.est[i])
      } # j
   } # i
totalN <- sum(N.est[])     # Calculated variable: total pop. size across all sites
}
",fill = TRUE)
sink()

# Bundle data
jags.data <- list("Counts", "Sites", "Replicates")

# Initial values
Nst <- apply(Counts, 1, max) + 1    # Kery and Schaub
jags.inits <- function(){ list(lam.est=runif(n=1, min=0, max=100),
                               N.est=Nst, p.est=runif(1, min=0, max=1))}

model.file <- 'N-mix.txt'

# Parameters monitored
jags.params <- c("lam.est", "p.est", "totalN")

# Call JAGS from R
jagsfit <- jags(data=jags.data, parameters.to.save=jags.params, inits=jags.inits,
                n.chains = 3, n.thin=1, n.iter = 100000,
                model.file)
print(jagsfit)
par(mfrow=c(1,1)) # Reset plot panel
plot(jagsfit)

# Estimated posterior distribution for total population size
hist(jagsfit$BUGSoutput$sims.list$totalN, xlab="Estimated total pop size", main="")
abline(v=sum(Site.N), col="red")

The JAGS code includes uninformative prior distributions for mean site abundance and the detection probability. The choice for mean abundance is more subjective but can be based on the observed counts and knowledge or experience about the possible range for true abundance or detection probability. The JAGS code is again structurally similar to the simulation code. The first step is to generate estimates of the true (unknown) abundance at each site. The observations are binomially distributed samples, given the estimated abundance at each site. It is sometimes of interest to look at total abundance over sites, so the site estimates are summed (totalN).

It can be helpful to provide initial values for site abundance. JAGS will terminate the run if a nonsensical event occurs (e.g., a count greater than the current estimate of abundance at a site). Kéry and Schaub (2011) provide example code using the apply() function, which is a generic function for applying (hence the name) other functions. In this case, the max() function is applied to the Counts matrix by row (MARGIN argument = 1). The returned maximum value for each row is incremented by 1, implying that there is likely to be at least one more sturgeon present than the maximum observed count. The remaining parameters are initialized using settings that match the prior distributions. The final lines of R code show the estimated posterior distribution for total abundance (summed over sites). The abline() function is used to add the true total abundance to the plot (vertical red line).
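The apply() idiom can be checked on a toy matrix (hypothetical counts, not the simulated data):

```r
# Row maxima plus one, as used for the N.est initial values
toy <- matrix(c(3, 5,
                7, 2), nrow = 2, byrow = TRUE)
apply(toy, 1, max) + 1 # 6 8
```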

Estimates are generally reliable for these assumed parameter values and study design settings, although convergence was sometimes slow so a large number of updates was used. Note that the model is hierarchical in structure, in that the overall (Poisson) mean is estimated using all site data, then site-specific estimates are drawn using that overall mean. It is worthwhile to run the combined code (simulation plus analysis) several times to see variation in the estimates and their credible intervals. Median estimates of mean site abundance and detection probability tend to be close to the assumed values although credible intervals are fairly wide. The posterior distribution for total abundance appears to be reliable although it is skewed and has a high upper bound for the credible interval (451-1294 for one example run).

This model is a good illustration of the potentially problematic nature of pD, the estimated number of effective parameters. One would expect an estimate of around 32, to account for detection probability, mean site abundance, and the 30 site-specific abundance estimates drawn using that mean. However, estimates from several runs were around 60. Kéry and Schaub (2011) note that hierarchical models (such as this one) can be problematic for estimating the number of effective parameters (and therefore for DIC). We can look at pD for the various models in this book to get some experience about when to trust pD and DIC estimates.

As usual, keep in mind that this is a very optimistic scenario. We are estimating mean abundance and detection probability using 30 sites that function as spatial replicates. The data are generated using a detection probability that is high and constant over sites. In an actual field study, there would be additional variation in detection probability among sites, and a binomial distribution would only be an approximation to the field sampling process that generates the observed counts. We have also assumed that variation among sites follows a Poisson distribution. Other distributions that allow for more variation among sites, such as a negative binomial, are possible and can be much more challenging to fit.
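The extra among-site variation allowed by a negative binomial can be seen in a quick simulation (a sketch; the size parameter of 5 is an arbitrary illustration, not an estimate):

```r
# Poisson versus negative binomial site abundances with the same mean
set.seed(42)
lam <- 20
pois.N <- rpois(n = 10000, lambda = lam)
nb.N <- rnbinom(n = 10000, mu = lam, size = 5) # Variance = mu + mu^2/size = 100
c(var(pois.N), var(nb.N)) # Poisson variance near 20; negative binomial near 100
```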

5.4 Exercises

  1. Create a table of estimated credible intervals for population size, using different numbers of removal samples and a range of values for capture probability. You will need several replicate simulations to get a clear picture of results for each setting. What are the lower limits below which estimates are judged not to be reliable?

  2. For a two-sample mark-recapture experiment with a true population size of 1000, produce a table of credible intervals for Nhat for different assumed capture probabilities. What sampling intensity would produce an estimate with uncertainty close to the Robson and Regier target for management (+/- 25%)?

  3. Modify the binomial-mixture example to compare credible intervals for totalN using fewer sites but more replicates, preserving the same total number of observations (e.g., 10 sites and 9 replicate survey visits). Can you suggest potential advantages and disadvantages of using fewer sites?

References

Bryant, M. D. 2000. Estimating Fish Populations by Removal Methods with Minnow Traps in Southeast Alaska Streams. North American Journal of Fisheries Management 20(4):923–930.
Flowers, H. J., and J. E. Hightower. 2013. A novel approach to surveying sturgeon using side-scan sonar and occupancy modeling. Marine and Coastal Fisheries: Dynamics, Management, and Ecosystem Science 5(1):211–223.
Flowers, H. J., and J. E. Hightower. 2015. Estimating Sturgeon Abundance in the Carolinas Using Side-Scan Sonar. Marine and Coastal Fisheries 7(1):1–9.
Kéry, M., and M. Schaub. 2011. Bayesian Population Analysis using WinBUGS: A Hierarchical Perspective, 1st edition. Academic Press, Boston.
Mäntyniemi, S., A. Romakkaniemi, and E. Arjas. 2005. Bayesian removal estimation of a population size under unequal catchability. Canadian Journal of Fisheries and Aquatic Sciences 62:291–300.