15  Bayesian Analysis of Poisson Count Data

Example 15.1 Let \(y\) be the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game.

  1. In what ways is this like the Binomial situation? (What is a trial? What is “success”?)




  2. In what ways is this NOT like the Binomial situation?




Example 15.2 Suppose the number of home runs hit per game (by both teams in total) at a particular Major League Baseball park follows a Poisson distribution with parameter \(\theta\).

  1. Sketch your prior distribution for \(\theta\) and describe its features. What are the possible values of \(\theta\)? Does \(\theta\) take values on a discrete or continuous scale?




  2. Suppose \(y\) represents a home run count for a single game. What are the possible values of \(y\)? Does \(y\) take values on a discrete or continuous scale?




(a) Rate parameter 1, and different shape parameters.
(b) Shape parameter 3, and different rate parameters.
Figure 15.1: Gamma densities

Example 15.3 Figure 15.1 shows a few examples of Gamma distributions.

  1. The plot on the left above contains a few different Gamma densities, all with rate parameter \(\lambda=1\). Match each density to its shape parameter \(\alpha\); the choices are 1, 2, 5, 10.




  2. The plot on the right above contains a few different Gamma densities, all with shape parameter \(\alpha=3\). Match each density to its rate parameter \(\lambda\); the choices are 1, 2, 3, 4.




Example 15.4 Suppose the number of home runs hit per game (by both teams in total) at a particular Major League Baseball park follows a Poisson distribution with parameter \(\theta\). We’ll first use grid approximation to approximate the posterior distribution of \(\theta\) given sample data. We’ll start with a discrete prior for \(\theta\) to illustrate ideas. Suppose that \(\theta\) can only take values 0.5, 1.5, 2.5, 3.5, 4.5, with respective probabilities 0.13, 0.45, 0.28, 0.11, 0.03.

  1. Suppose a single game with 3 home runs is observed. Find the posterior distribution of \(\theta\). In particular, how do you determine the likelihood column?




  2. Suppose that we observe 4 games with a total of 6 home runs. Find the posterior distribution of \(\theta\). In particular, how do you determine the likelihood column?




Example 15.5 Now let’s consider some real data. Assume home runs per game at Citizens Bank Park (Phillies!) follow a Poisson distribution with parameter \(\theta\). Assume as a prior for \(\theta\) a Gamma distribution with shape parameter \(\alpha = 4\) and rate parameter \(\lambda = 2\).

Section 15.1.3 summarizes data for the 2020 season1. There were 97 home runs in 32 games.

Use grid approximation to compute the posterior distribution of \(\theta\). Plot the prior, (scaled) likelihood, and posterior. (Note: you will need to cut the grid off at some point. While \(\theta\) can take any value greater than 0, the interval [0, 8] accounts for 99.99% of the prior probability.)






Example 15.6 Continuing Example 15.5.

  1. Write an expression for the prior distribution of \(\theta\).




  2. Write an expression for the likelihood function given the sample data (97 home runs in 32 games).




  3. Write an expression for the posterior distribution of \(\theta\) given the sample data (97 home runs in 32 games). Identify by the name the posterior distribution and the values of relevant parameters. Compare to the results of the grid approximation.




  4. Find and interpret posterior 50%, 80%, and 98% credible intervals for \(\theta\).




  5. Express the posterior mean as a weighted average of the prior mean and the sample mean. How might you interpret the \(\alpha\) and \(\beta\) parameters in the Gamma prior distribution?




Prior Data Posterior
Total count \(\alpha\) \(n\bar{y}\) \(\alpha+n\bar{y}\)
Sample size \(\lambda\) \(n\) \(\lambda + n\)
Mean \(\frac{\alpha}{\lambda}\) \(\bar{y}\) \(\frac{\alpha+n\bar{y}}{\lambda+n}\)

\[ \frac{\alpha+n\bar{y}}{\lambda+n} = \frac{\lambda}{\lambda+n}\left(\frac{\alpha}{\lambda}\right) + \frac{n}{\lambda+n}\bar{y} \]

Example 15.7 Continuing Example 15.5. Continuing the previous example, assume home runs per game at Citizens Bank Park follow a Poisson distribution with parameter \(\theta\). Assume for \(\theta\) a Gamma prior distribution with shape parameter \(\alpha = 4\) and rate parameter \(\lambda = 2\). Consider the 2020 data in which there were 97 home runs in 32 games.

  1. Describe in full detail how you could use naive simulation to approximate the posterior distribution of \(\theta\).




  2. Write brms code to approximate the posterior distribution. Run the code and compare to the previous results.




Example 15.8 Continuing Example 15.5.

  1. How could you use simulation to approximate the posterior predictive distribution of home runs in a game?




  2. Use the simulation from the previous part to find and interpret a 95% posterior prediction interval.




  3. Is a Poisson model a reasonable model for the data? How could you use posterior predictive simulation to simulate what a sample of 32 games might look like under this model. Simulate many such samples. Does the observed sample seem consistent with the model?




  4. Regarding the appropriateness of a Poisson model, we might be concerned that there are no games in the sample with 0 home runs. Use simulation to approximate the posterior predictive distribution of the number of games in a sample of 32 with 0 home runs. From this perspective, does the observed value of the statistic seem consistent with the Gamma-Poisson model?




15.1 Notes

15.1.1 Discrete prior - single game

# grid
theta = 0.5 + 0:4

# prior
prior = c(0.13, 0.45, 0.28, 0.11, 0.03)
prior = prior / sum(prior)

# data
y = 3 # single observed value

# likelihood, using Poisson
likelihood = dpois(y, theta) # function of theta

# posterior
product = likelihood * prior
posterior = product / sum(product)

# bayes table
bayes_table = data.frame(theta,
                         prior,
                         likelihood,
                         product,
                         posterior)

# display table
bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |> kable_styling()
theta prior likelihood product posterior
0.5 0.13 0.0126 0.0016 0.0112
1.5 0.45 0.1255 0.0565 0.3848
2.5 0.28 0.2138 0.0599 0.4078
3.5 0.11 0.2158 0.0237 0.1617
4.5 0.03 0.1687 0.0051 0.0345
Total 1.00 0.7364 0.1468 1.0000

15.1.2 Discrete prior - multiple games

# grid
theta = 0.5 + 0:4

# prior
prior = c(0.13, 0.45, 0.28, 0.11, 0.03)
prior = prior / sum(prior)

# data
n = 4 # sample size 
y = 7 # total sample count

# likelihood, using Poisson(n * theta) for sample size n
likelihood = dpois(y, n * theta) # function of theta

# posterior
product = likelihood * prior
posterior = product / sum(product)

# bayes table
bayes_table = data.frame(theta,
                         prior,
                         likelihood,
                         product,
                         posterior)

# display table
bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |> kable_styling()
theta prior likelihood product posterior
0.5 0.13 0.0034 0.0004 0.0050
1.5 0.45 0.1377 0.0620 0.6915
2.5 0.28 0.0901 0.0252 0.2815
3.5 0.11 0.0174 0.0019 0.0214
4.5 0.03 0.0019 0.0001 0.0006
Total 1.00 0.2504 0.0896 1.0000

15.1.3 Sample data

data = read_csv("citizens-bank-hr-2020.csv")

head(data)
# A tibble: 6 × 1
     hr
  <dbl>
1     1
2     1
3     1
4     1
5     1
6     1
data |>
  count(hr) |>
  adorn_totals("row") |>
  kbl(col.names = c("Number of HRs",
                    "Number of games")) |>
  kable_styling()
Number of HRs Number of games
1 8
2 8
3 5
4 4
5 3
6 2
7 1
8 1
Total 32
sum(data$hr)
[1] 97
ggplot(data,
       aes(x = hr)) +
  geom_bar(aes(y = after_stat(prop)),
           col = "black",
           fill = "black",
           width = 0.1) +
  scale_x_continuous(breaks = 0:(max(data$hr) + 1),
                     limits = c(0, max(data$hr) + 1)) +
  labs(x = "Number of HRs",
       y = "Proportion of games") +
  theme_bw()

15.1.4 Grid approximation

# grid
theta = seq(0, 8, 0.001)

# prior
prior = dgamma(theta, shape = 4, rate = 2)
prior = prior / sum(prior)

# data
n = length(data$hr) # sample size (32)
y = sum(data$hr) # total sample count (97)

# likelihood, using Poisson(n * theta) for sample size n
likelihood = dpois(y, n * theta) # function of theta

# posterior
product = likelihood * prior
posterior = product / sum(product)

# bayes table
bayes_table = data.frame(theta,
                         prior,
                         likelihood,
                         product,
                         posterior)

15.1.5 Mathematical derivation of posterior

The prior Gamma(4, 2) distribution has pdf \[ \pi(\theta) \propto \theta^{4-1}e^{-2\theta}, \qquad \theta > 0. \]

By Poisson aggregation, the total number of home runs in 32 games follows a Poisson(\(32\theta\)) distribution. The likelihood is the probability of observing a value of 97 (for the total number of home runs in 32 games) from a Poisson(\(32\theta\)) distribution.

\[ f(\bar{y} = 97/32|\theta) \propto e^{-32\theta}\theta^{97}, \qquad \theta > 0 \]

The likelihood is centered at the sample mean of 97/32 = 3.03.

The posterior satisfies \[\begin{align*} \pi(\theta|\bar{y} = 97/32) & \propto \left(e^{-32\theta}\theta^{97}\right)\left(\theta^{4-1}e^{-2\theta}\right), \qquad \theta > 0,\\ & \propto \theta^{(4 + 97) - 1}e^{-(2+32)\theta}, \qquad \theta > 0. \end{align*}\]

We recognize the above as the Gamma density with shape parameter \(\alpha=4+97\) and rate parameter \(\lambda = 2 + 32\).

# prior
alpha_prior = 4
lambda_prior = 2

# data
n = length(data$hr)
y = sum(data$hr)

# posterior
alpha_posterior = alpha_prior + y
lambda_posterior = lambda_prior + n


# scaled likelihood, depends on data: n, y
likelihood_scaled <- function(theta) {
  likelihood <- function(theta) {
    dpois(x = y, n * theta)
  }
  scaling_constant <- integrate(likelihood, lower = 0, upper = 10)[[1]]
  likelihood(theta) / scaling_constant
}

# Plot
ggplot(data.frame(x = c(0, 6)),
       aes(x = x)) +
  # prior
  stat_function(fun = dgamma,
                args = list(shape = alpha_prior,
                            rate = lambda_prior),
                lty = bayes_lty["prior"],
                linewidth = 1,
                aes(color = "prior", linetype = "prior")) +
  # (scaled) likelihood
  stat_function(fun = likelihood_scaled,
                lty = bayes_lty["likelihood"],
                linewidth = 1,
                aes(color = "likelihood", linetype = "likeihood")) +
  # posterior
  stat_function(fun = dgamma,
                args = list(shape = alpha_posterior,
                            rate = lambda_posterior),
                lty = bayes_lty["posterior"],
                linewidth = 1,
                aes(color = "posterior", linetype = "posterior")) +
  # Define color and add a legend
  scale_color_manual(name = "",
                     breaks = c("prior", "likelihood", "posterior"),
                     values = bayes_col[c("prior", "likelihood", "posterior")]) +
  scale_linetype_manual(name = "",
                        breaks = c("prior", "likelihood", "posterior"),
                        values = bayes_lty[c("prior", "likelihood", "posterior")]) +
  labs(x = "theta",
       y = "") +
  theme_bw()
Warning: No shared levels found between `names(values)` of the manual scale and the
data's linetype values.

# posterior mean
alpha_posterior / lambda_posterior
[1] 2.970588
# posterior sd
sqrt(alpha_posterior / lambda_posterior ^ 2)
[1] 0.2955846
# posterior quantiles are endpoints of credible intervals
qgamma(c(0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99),
       shape = alpha_posterior, rate = lambda_posterior)
[1] 2.326468 2.598636 2.766239 2.960790 3.164259 3.355133 3.701137

15.1.6 Posterior simulation with brms

library(brms)
library(bayesplot)
library(tidybayes)

Next we fit the model to the data using the function brm. The three inputs are:

  • Input 1: the data
    data = data

  • Input 2: model for the data (likelihood)
    family = poisson(link = identity)
    y ~ 1 # linear model with intercept only

  • Input 3: prior distribution for parameters
    prior(gamma(4, 2), lb = 0, class = Intercept)

  • Other arguments specify the details of the numerical algorithm; our specifications below will result in 10000 \(\theta\) values simulated from the posterior distribution. (It will simulate 3500 values (iter), discard the first 1000 (warmup), and then repeat 4 times (chains).)

fit <-
  brm(data = data, 
      family = poisson(link = identity),
      formula = hr ~ 1,
      prior(gamma(4, 2), lb = 0, class = Intercept), # gamma(shape, rate)
      iter = 3500,
      warmup = 1000,
      chains = 4,
      refresh = 0)
Compiling Stan program...
Start sampling
plot(fit)

posterior = fit |>
  spread_draws(b_Intercept) |>
  rename(theta = b_Intercept)
posterior |>
  ggplot(aes(x = theta)) +
  geom_histogram(aes(y = after_stat(density)),
                 col = bayes_col["posterior"], fill = "white") +
  geom_density(linewidth = 1, col = bayes_col["posterior"]) +
  theme_bw()

# posterior mean
mean(posterior$theta)
[1] 2.976134
# posterior sd
sd(posterior$theta)
[1] 0.295366
# posterior quantiles are endpoints of credible intervals
quantile(posterior$theta, c(0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99))
      1%      10%      25%      50%      75%      90%      99% 
2.324613 2.608651 2.771067 2.967198 3.173287 3.364563 3.696229 

15.1.7 Posterior prediction

  1. brms has already simulated \(\theta\) from the posterior distribution.
  2. For each value of \(\theta\), simulate a \(y\) value from a Poisson(\(\theta\)) distribution.
  3. Repeat many times and summarize the simulated \(y\) values.
n_rep = length(posterior$theta)

y_sim = rpois(n_rep, posterior$theta)

ggplot(data.frame(y_sim),
       aes(x = y_sim)) +
  geom_bar(aes(y = after_stat(prop)),
           col = bayes_col["posterior_predict"],
           fill = bayes_col["posterior_predict"],
           width = 0.1) + 
  labs(x = "Predicted HRs") +
  theme_bw()

Posterior predictive interval, with a lower bound of 0.

quantile(y_sim, 0.95)
95% 
  6 

15.1.8 Posterior predictive checking: samples

Here is a plot of some simulated samples based on the posterior model, compared with the sample data.

pp_check(fit, type = "hist")
Using 10 posterior draws for ppc type 'hist' by default.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here is a bar plot of the observed data, with 95% intervals

pp_check(fit, type = "bars", prob = 0.95)
Using 10 posterior draws for ppc type 'bars' by default.

For count data, a rootogram can be helpful.

pp_check(fit, type = "rootogram")
Using all posterior draws for ppc type 'rootogram' by default.

15.1.9 Posterior predictive checking: count 0s

Posterior predictive distribution of number of games with 0 HRs in samples of size 32

  1. Simulate \(\theta\) from posterior distribution (brms already did this)
  2. Given \(\theta\) simulate a sample of size 32 from a Poisson(\(\theta\)) distribution
  3. Count the number of games (out of 32) in the sample with 0 HRs, call this \(T\)
  4. Repeat many times and summarize the values of \(T\), then compare with the observed \(T\) (which is 0).

Below we simulate 312 samples of size 32.

# Group the 10000 simulated values of y into 312 samples of size 32
y_sim_samples = matrix(y_sim,
                       ncol = n,
                       nrow = trunc(length(y_sim) / n))
Warning in matrix(y_sim, ncol = n, nrow = trunc(length(y_sim)/n)): data length
[10000] is not a sub-multiple or multiple of the number of rows [312]
# define function that counts the 0s in a sample
count0 <- function(u) sum(u == 0)

# posterior predictive check
ppc_stat(data$hr, y_sim_samples, stat = "count0") 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


  1. Source: https://www.baseball-reference.com/teams/PHI/2020.shtml↩︎