We have covered in some detail the problem of estimating a population proportion for a binary categorical variable. In these situations we assumed a Binomial likelihood for the count of “successes” in the sample.
However, a Binomial model has several restrictive assumptions that might not be satisfied in practice.
Poisson models provide a more flexible alternative for count data.
Example 15.1 Let \(y\) be the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game.
In what ways is this like the Binomial situation? (What is a trial? What is “success”?)
In what ways is this NOT like the Binomial situation?
Poisson distributions are often used to model the distribution of variables that count the number of "relatively rare" events that occur over a certain interval of time or in a certain location.
A discrete random variable \(Y\) has a Poisson distribution with parameter \(\theta>0\) if its probability mass function satisfies \[\begin{align*}
f(y|\theta) & = \frac{e^{-\theta}\theta^y}{y!}, \quad y=0,1,2,\ldots
\end{align*}\]
If \(Y\) has a Poisson(\(\theta\)) distribution then \[\begin{align*}
\text{E}(Y) & = \theta\\
\text{Var}(Y) & = \theta
\end{align*}\]
Poisson aggregation. If \(Y_1\) and \(Y_2\) are independent, \(Y_1\) has a Poisson(\(\theta_1\)) distribution, and \(Y_2\) has a Poisson(\(\theta_2\)) distribution, then \(Y_1+Y_2\) has a Poisson(\(\theta_1+\theta_2\)) distribution.
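These properties can be checked by simulation. The sketch below uses arbitrary illustrative rates (1.2 and 0.8); the sum of the two independent Poisson variables should behave like a Poisson with rate 2.0.

```r
# Sketch: check Poisson mean/variance and Poisson aggregation by simulation
# (the rates 1.2 and 0.8 are arbitrary illustrative values)
set.seed(123)
y1 <- rpois(100000, 1.2)   # Poisson(1.2) draws
y2 <- rpois(100000, 0.8)   # independent Poisson(0.8) draws

mean(y1)        # close to 1.2
var(y1)         # also close to 1.2 (mean = variance for Poisson)

# Y1 + Y2 should behave like Poisson(1.2 + 0.8) = Poisson(2.0)
mean(y1 + y2)   # close to 2.0
var(y1 + y2)    # close to 2.0
```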
Example 15.2 Suppose the number of home runs hit per game (by both teams in total) at a particular Major League Baseball park follows a Poisson distribution with parameter \(\theta\).
Sketch your prior distribution for \(\theta\) and describe its features. What are the possible values of \(\theta\)? Does \(\theta\) take values on a discrete or continuous scale?
Suppose \(y\) represents a home run count for a single game. What are the possible values of \(y\)? Does \(y\) take values on a discrete or continuous scale?
Gamma distributions can be used as prior distributions for parameters that take positive values, \(\theta>0\).
A continuous RV \(U\) has a Gamma distribution with shape parameter \(\alpha>0\) and rate parameter \(\lambda>0\) if its density satisfies \[
f(u) \propto u^{\alpha-1}e^{-\lambda u}, \quad u>0.
\]
It can be shown that a Gamma(\(\alpha\), \(\lambda\)) density has \[\begin{align*}
\text{Mean (EV)} & = \frac{\alpha}{\lambda}\\
\text{Variance} & = \frac{\alpha}{\lambda^2}
\\
\text{Mode} & = \frac{\alpha -1}{\lambda}, \qquad \text{if $\alpha\ge 1$}
\end{align*}\]
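These formulas can be verified numerically with a fine grid; the sketch below uses the arbitrary illustrative values \(\alpha = 3\) and \(\lambda = 2\), for which the mean is 1.5, the variance 0.75, and the mode 1.

```r
# Sketch: numerically check the Gamma(alpha, lambda) mean, variance, and mode
# (alpha = 3, lambda = 2 are arbitrary illustrative values)
alpha <- 3
lambda <- 2
u <- seq(0, 20, 0.001)   # fine grid; Gamma(3, 2) has negligible mass beyond 20
f <- dgamma(u, shape = alpha, rate = lambda)

sum(u * f) * 0.001              # approximates alpha / lambda = 1.5
sum((u - 1.5)^2 * f) * 0.001    # approximates alpha / lambda^2 = 0.75
u[which.max(f)]                 # approximates (alpha - 1) / lambda = 1.0
```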
Example 15.3 Figure 15.1 shows a few examples of Gamma distributions.
The plot on the left above contains a few different Gamma densities, all with rate parameter \(\lambda=1\). Match each density to its shape parameter \(\alpha\); the choices are 1, 2, 5, 10.
The plot on the right above contains a few different Gamma densities, all with shape parameter \(\alpha=3\). Match each density to its rate parameter \(\lambda\); the choices are 1, 2, 3, 4.
Example 15.4 Suppose the number of home runs hit per game (by both teams in total) at a particular Major League Baseball park follows a Poisson distribution with parameter \(\theta\). We’ll first use grid approximation to approximate the posterior distribution of \(\theta\) given sample data. We’ll start with a discrete prior for \(\theta\) to illustrate ideas. Suppose that \(\theta\) can only take values 0.5, 1.5, 2.5, 3.5, 4.5, with respective probabilities 0.13, 0.45, 0.28, 0.11, 0.03.
Suppose a single game with 3 home runs is observed. Find the posterior distribution of \(\theta\). In particular, how do you determine the likelihood column?
Suppose that we observe 4 games with a total of 6 home runs. Find the posterior distribution of \(\theta\). In particular, how do you determine the likelihood column?
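One possible way to organize the computation is a Bayes table; the sketch below handles the single game with 3 home runs, where the likelihood column is \(P(Y=3|\theta)\) evaluated at each grid value.

```r
# Sketch: Bayes table for the discrete prior in Example 15.4,
# after observing a single game with y = 3 home runs
theta <- c(0.5, 1.5, 2.5, 3.5, 4.5)
prior <- c(0.13, 0.45, 0.28, 0.11, 0.03)
likelihood <- dpois(3, theta)          # P(Y = 3 | theta) for each grid value
product <- prior * likelihood
posterior <- product / sum(product)
data.frame(theta, prior, likelihood, posterior)
```

For the 4 games with a total of 6 home runs, Poisson aggregation suggests replacing the likelihood line with `dpois(6, 4 * theta)`, since the total over 4 games follows a Poisson(\(4\theta\)) distribution.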
Example 15.5 Now let’s consider some real data. Assume home runs per game at Citizens Bank Park (Phillies!) follow a Poisson distribution with parameter \(\theta\). Assume as a prior for \(\theta\) a Gamma distribution with shape parameter \(\alpha = 4\) and rate parameter \(\lambda = 2\).
Section 15.1.3 summarizes data for the 2020 season. There were 97 home runs in 32 games.
Use grid approximation to compute the posterior distribution of \(\theta\). Plot the prior, (scaled) likelihood, and posterior. (Note: you will need to cut the grid off at some point. While \(\theta\) can take any value greater than 0, the interval [0, 8] accounts for 99.99% of the prior probability.)
Write an expression for the prior distribution of \(\theta\).
Write an expression for the likelihood function given the sample data (97 home runs in 32 games).
Write an expression for the posterior distribution of \(\theta\) given the sample data (97 home runs in 32 games). Identify by name the posterior distribution and the values of the relevant parameters. Compare to the results of the grid approximation.
Find and interpret posterior 50%, 80%, and 98% credible intervals for \(\theta\).
Express the posterior mean as a weighted average of the prior mean and the sample mean. How might you interpret the \(\alpha\) and \(\lambda\) parameters in the Gamma prior distribution?
Gamma-Poisson model. Consider a measured variable \(y\) which, given \(\theta\), follows a Poisson\((\theta)\) distribution. Let \(\bar{y}\) be the sample mean for a random sample of size \(n\). Suppose \(\theta\) has a Gamma\((\alpha, \lambda)\) prior distribution. Then the posterior distribution of \(\theta\) given \(\bar{y}\) is the Gamma\((\alpha+n\bar{y}, \lambda+n)\) distribution.
In a sense, you can interpret \(\alpha\) as “prior total count” and \(\lambda\) as “prior sample size”, but these are only “pseudo-observations”. Also, \(\alpha\) and \(\lambda\) are not necessarily integers.
Note that if \(\bar{y}\) is the sample mean count, then \(n\bar{y} = \sum_{i=1}^n y_i\) is the sample total count.
| | Prior | Data | Posterior |
|---|---|---|---|
| Total count | \(\alpha\) | \(n\bar{y}\) | \(\alpha+n\bar{y}\) |
| Sample size | \(\lambda\) | \(n\) | \(\lambda + n\) |
| Mean | \(\frac{\alpha}{\lambda}\) | \(\bar{y}\) | \(\frac{\alpha+n\bar{y}}{\lambda+n}\) |
The posterior total count is the sum of the “prior total count” \(\alpha\) and the sample total count \(n\bar{y}\).
The posterior sample size is the sum of the “prior sample size” \(\lambda\) and the observed sample size \(n\).
The posterior mean is a weighted average of the prior mean and the sample mean, with weights proportional to the “sample sizes”.
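This weighted-average identity can be checked numerically with the home run example values (\(\alpha = 4\), \(\lambda = 2\), \(n = 32\), 97 total home runs).

```r
# Sketch: posterior mean as a weighted average (home run example values)
alpha <- 4; lambda <- 2      # prior Gamma(alpha, lambda)
n <- 32; ybar <- 97 / 32     # 97 home runs in 32 games

post_mean <- (alpha + n * ybar) / (lambda + n)
weighted  <- (lambda / (lambda + n)) * (alpha / lambda) +
             (n / (lambda + n)) * ybar
c(post_mean, weighted)       # both equal 101 / 34
```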
Example 15.7 Continuing Example 15.5, assume home runs per game at Citizens Bank Park follow a Poisson distribution with parameter \(\theta\). Assume for \(\theta\) a Gamma prior distribution with shape parameter \(\alpha = 4\) and rate parameter \(\lambda = 2\). Consider the 2020 data in which there were 97 home runs in 32 games.
Describe in full detail how you could use naive simulation to approximate the posterior distribution of \(\theta\).
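One possible naive (rejection-style) simulation, assuming we condition on the total count via Poisson aggregation: simulate \(\theta\) from the prior, simulate a total count for 32 games, and keep only the \(\theta\) values whose simulated total matches the observed 97.

```r
# Sketch of naive simulation: draw theta from the Gamma(4, 2) prior,
# draw a total HR count for 32 games, keep theta when the total equals 97
set.seed(111)
n_rep <- 1000000
theta_sim <- rgamma(n_rep, shape = 4, rate = 2)     # prior draws
y_total_sim <- rpois(n_rep, 32 * theta_sim)         # total HRs in 32 games
theta_post <- theta_sim[y_total_sim == 97]          # condition on observed data
mean(theta_post)   # approximates the posterior mean 101 / 34
```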
Write brms code to approximate the posterior distribution. Run the code and compare to the previous results.
How could you use simulation to approximate the posterior predictive distribution of home runs in a game?
Use the simulation from the previous part to find and interpret a 95% posterior prediction interval.
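A sketch of one possible approach, simulating directly from the conjugate Gamma(101, 34) posterior rather than from the brms draws: draw \(\theta\) from the posterior, then draw a home run count for one future game from Poisson(\(\theta\)).

```r
# Sketch: posterior predictive simulation for HRs in a single future game,
# using the conjugate Gamma(101, 34) posterior
set.seed(222)
n_rep <- 10000
theta_sim <- rgamma(n_rep, shape = 101, rate = 34)  # posterior draws of theta
y_pred <- rpois(n_rep, theta_sim)                   # one future game per draw

table(y_pred) / n_rep               # approximate posterior predictive distribution
quantile(y_pred, c(0.025, 0.975))   # a central 95% prediction interval
```

The interval reported later in the text instead uses a lower bound of 0 and the 0.95 quantile as the upper bound; the central interval above is an alternative.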
Is a Poisson model a reasonable model for the data? How could you use posterior predictive simulation to simulate what a sample of 32 games might look like under this model? Simulate many such samples. Does the observed sample seem consistent with the model?
Regarding the appropriateness of a Poisson model, we might be concerned that there are no games in the sample with 0 home runs. Use simulation to approximate the posterior predictive distribution of the number of games in a sample of 32 with 0 home runs. From this perspective, does the observed value of the statistic seem consistent with the Gamma-Poisson model?
# grid
theta = seq(0, 8, 0.001)

# prior
prior = dgamma(theta, shape = 4, rate = 2)
prior = prior / sum(prior)

# data
n = length(data$hr) # sample size (32)
y = sum(data$hr) # total sample count (97)

# likelihood, using Poisson(n * theta) for sample size n
likelihood = dpois(y, n * theta) # function of theta

# posterior
product = likelihood * prior
posterior = product / sum(product)

# bayes table
bayes_table = data.frame(theta, prior, likelihood, product, posterior)
15.1.5 Mathematical derivation of posterior
The prior Gamma(4, 2) distribution has pdf \[
\pi(\theta) \propto \theta^{4-1}e^{-2\theta}, \qquad \theta > 0.
\]
By Poisson aggregation, the total number of home runs in 32 games follows a Poisson(\(32\theta\)) distribution. The likelihood is the probability of observing a value of 97 (for the total number of home runs in 32 games) from a Poisson(\(32\theta\)) distribution.
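Combining the prior and likelihood, the posterior density satisfies \[\begin{align*}
\pi(\theta|y) & \propto \left(\theta^{4-1}e^{-2\theta}\right)\left(e^{-32\theta}(32\theta)^{97}\right)\\
& \propto \theta^{101-1}e^{-34\theta}, \qquad \theta>0,
\end{align*}\] which we recognize as the Gamma(101, 34) density, consistent with the general Gamma(\(\alpha+n\bar{y}\), \(\lambda+n\)) result.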
Next we fit the model to the data using the function brm. The three inputs are:
Input 1: the data, data = data
Input 2: the model for the data (likelihood), family = poisson(link = identity) with formula hr ~ 1 (a linear model with intercept only)
Input 3: the prior distribution for parameters, prior(gamma(4, 2), lb = 0, class = Intercept)
Other arguments specify the details of the numerical algorithm; our specifications below will result in 10000 \(\theta\) values simulated from the posterior distribution. (Each of 4 chains (chains) simulates 3500 values (iter) and discards the first 1000 (warmup), leaving \(4\times 2500 = 10000\) values.)
fit <- brm(data = data,
           family = poisson(link = identity),
           formula = hr ~ 1,
           prior(gamma(4, 2), lb = 0, class = Intercept), # gamma(shape, rate)
           iter = 3500,
           warmup = 1000,
           chains = 4,
           refresh = 0)
Compiling Stan program...
Start sampling
plot(fit)
posterior = fit |>
  spread_draws(b_Intercept) |>
  rename(theta = b_Intercept)
posterior |>
  ggplot(aes(x = theta)) +
  geom_histogram(aes(y = after_stat(density)),
                 col = bayes_col["posterior"], fill = "white") +
  geom_density(linewidth = 1, col = bayes_col["posterior"]) +
  theme_bw()
# posterior mean
mean(posterior$theta)
[1] 2.976134
# posterior sd
sd(posterior$theta)
[1] 0.295366
# posterior quantiles are endpoints of credible intervals
quantile(posterior$theta, c(0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99))
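Since the posterior is available in closed form here, the simulation-based quantiles can be checked against the exact Gamma(101, 34) posterior.

```r
# Sketch: exact posterior quantiles from the conjugate Gamma(101, 34) posterior,
# for comparison with the simulation-based quantiles
qgamma(c(0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99),
       shape = 101, rate = 34)
```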
A 95% posterior prediction interval, taking a lower bound of 0 and the 0.95 quantile as the upper bound.
quantile(y_sim, 0.95)
95%
6
15.1.8 Posterior predictive checking: samples
Here is a plot of some simulated samples based on the posterior model, compared with the sample data.
pp_check(fit, type ="hist")
Using 10 posterior draws for ppc type 'hist' by default.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here is a bar plot of the observed data, with 95% posterior predictive intervals.
pp_check(fit, type ="bars", prob =0.95)
Using 10 posterior draws for ppc type 'bars' by default.
For count data, a rootogram can be helpful.
pp_check(fit, type ="rootogram")
Using all posterior draws for ppc type 'rootogram' by default.
15.1.9 Posterior predictive checking: count 0s
Posterior predictive distribution of number of games with 0 HRs in samples of size 32
1. Simulate \(\theta\) from the posterior distribution (brms already did this).
2. Given \(\theta\), simulate a sample of size 32 from a Poisson(\(\theta\)) distribution.
3. Count the number of games (out of 32) in the sample with 0 HRs; call this \(T\).
4. Repeat many times and summarize the values of \(T\), then compare with the observed \(T\) (which is 0).
Below we group the 10000 simulated values into 312 samples of size 32. (Since \(10000/32 = 312.5\), the last 16 simulated values are dropped, which triggers the warning below.)
# Group the 10000 simulated values of y into 312 samples of size 32
y_sim_samples = matrix(y_sim,
                       ncol = n,
                       nrow = trunc(length(y_sim) / n))
Warning in matrix(y_sim, ncol = n, nrow = trunc(length(y_sim)/n)): data length
[10000] is not a sub-multiple or multiple of the number of rows [312]
# define function that counts the 0s in a sample
count0 <- function(u) sum(u == 0)

# posterior predictive check
ppc_stat(data$hr, y_sim_samples, stat = "count0")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.