5 September 1
5.1 Announcements
- Office hours 9:30 - 10:30 today or by appointment
- Lots of good questions are being asked and answered!
- Poll
- Questions about assignment 2?
5.2 Bayesian hierarchical models
- Review of the Bayesian hierarchical modeling framework
\[\text{Data model:} \;\;[\mathbf{z}|\mathbf{y},\boldsymbol{\theta}_{D}]\]
\[\text{Process model:} \;\;[\mathbf{y}|\boldsymbol{\theta}_{P}]\]
\[\text{Parameter model:} \;\;[\boldsymbol{\theta}]\]
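Bayes' theorem combines the three levels into the posterior distribution of all unknowns given the data:
\[[\mathbf{y},\boldsymbol{\theta}|\mathbf{z}] \propto [\mathbf{z}|\mathbf{y},\boldsymbol{\theta}_{D}]\,[\mathbf{y}|\boldsymbol{\theta}_{P}]\,[\boldsymbol{\theta}]\]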
5.2.1 Motivating data example
- Whooping cranes
- Data set
- Write out a Bayesian hierarchical model for the whooping crane data (one possible specification is sketched below)
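One possible specification, consistent with the simulation code in the live demonstration below (the exponential growth curve and the prior ranges are modeling assumptions, not features of the data):
\[\text{Data model:} \;\; z_{t} \sim \text{Binomial}(y_{t}, p)\]
\[\text{Process model:} \;\; y_{t} \sim \text{Poisson}(\lambda_{t}), \quad \lambda_{t} = \lambda_{0}e^{\gamma(t - t_{0})}\]
\[\text{Parameter model:} \;\; \gamma \sim \text{Unif}(0, 0.1), \quad \lambda_{0} \sim \text{Discrete Unif}(2, 50), \quad p \sim \text{Unif}(0, 1)\]
where z_t is the number of whooping cranes counted in winter t, y_t is the true population size, p is the detection probability, and t0 = 1949 is the baseline year.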
5.2.2 Simulating data from the prior predictive distribution
- The prior predictive distribution can provide predictions/forecasts without using any data
- Other fields call this “simulation modeling” or a “sensitivity analysis”
- It is basically data-free statistics (i.e., predictions, forecasts, and inference are 100% assumption driven)
- Used as a form of model (assumption) checking in Bayesian statistics
- Super easy to do and helps us “prototype” our statistical model before we put in any more work
- Live demonstration in R for the whooping crane data example (see the minimal prior predictive sketch after this list)
- Warning about using the data twice
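A minimal sketch of a single draw from the prior predictive distribution for the crane example (the prior ranges mirror the demonstration code below, while the span of winters is an assumption for illustration; no observed data are used):

# One draw from the prior predictive distribution (no data required)
t0 <- 1949
t <- 1950:2011  # Assumed span of winters, for illustration only
gamma.try <- runif(1, 0, 0.1)  # Growth rate drawn from its prior
lambda0.try <- sample(2:50, 1)  # Initial expected abundance drawn from a discrete uniform prior
lambda.try <- lambda0.try * exp(gamma.try * (t - t0))  # Expected abundance (process model)
y.try <- rpois(length(t), lambda.try)  # True population size (process model)
p.try <- runif(1, 0, 1)  # Detection probability drawn from its prior
z.try <- rbinom(length(t), y.try, p.try)  # Simulated counts (data model)
plot(t, z.try, type = "b", xlab = "Year", ylab = "Simulated count (z)")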
5.2.3 Model fitting
- Class discussion
- Markov chain Monte Carlo
- Algorithm 4.2 (see pg. 169 in Wikle et al. 2019)
- Next time in class
- Rejection sampling and approximate Bayesian computation (ABC; sketched briefly below)
- Live demonstration in R for the whooping crane data example
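In the demonstration below, the prior predictive simulation doubles as a simple rejection (ABC) sampler: a draw of the unknowns is kept as an approximate sample from the posterior whenever its simulated data land close enough to the observed data,
\[\text{accept draw } k \;\; \text{if} \;\; \sum_{t}|z_{t} - z_{t}^{(k)}| < \epsilon\]
where z^(k) is the kth simulated data set (z.try in the code) and the tolerance ε is error in the code.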
# Download and view data
url <- "https://www.dropbox.com/s/ihs3as87oaxvmhq/Butler%20et%20al.%20Table%201.csv?dl=1"
df1 <- read.csv(url)
head(df1)
##   Winter  N
## 1   1950 31
## 2   1951 25
## 3   1952 21
## 4   1953 24
## 5   1954 21
## 6   1955 28
# Required functions
# Expected abundance under exponential growth: lambda_t = lambda_0 * exp(gamma * (t - t0))
calc.lambda <- function(t, t0, gamma, lambda0) {
  lambda0 * exp(gamma * (t - t0))
}

# Draw n samples from a discrete uniform distribution on {min, ..., max}
rdunif <- function(n, min, max) {
  sample(min:max, size = n, replace = TRUE)
}
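# Quick sanity check of the helper functions (hypothetical values):
calc.lambda(t = 1950:1952, t0 = 1949, gamma = 0.04, lambda0 = 20)  # roughly 20.8, 21.7, 22.5
rdunif(5, 2, 50)  # Five draws from the discrete uniform on {2, ..., 50}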
# Algorithm settings
K.tries <- 10^6  # Number of simulated data sets to make
diff <- matrix(NA, K.tries, 1)  # Vector to save the discrepancy between each simulated data set and the real data
error <- 1000  # Allowable difference between simulated data and real data
# Known random variables and parameters
t0 <- 1949
t <- df1$Winter
z <- df1$N
n <- length(z)
# Unknown random variables and parameters
posterior.samp.parameters <- matrix(NA, K.tries, 3)  # Matrix to save samples of the unknown parameters
colnames(posterior.samp.parameters) <- c("gamma", "lambda0", "p")
y <- matrix(NA, K.tries, n)  # Matrix to save samples of the unknown number of whooping cranes
for (k in 1:K.tries) {
  # Simulate from the prior predictive distribution
  gamma.try <- runif(1, 0, 0.1)  # Parameter model
  lambda0.try <- rdunif(1, 2, 50)  # Parameter model
  lambda.try <- calc.lambda(t = t, t0 = t0, gamma = gamma.try, lambda0 = lambda0.try)  # Mathematical equation for the process model
  y.try <- rpois(n, lambda.try)  # Process model
  p.try <- runif(1, 0, 1)  # Parameter model (prior for the detection probability)
  z.try <- rbinom(n, y.try, p.try)  # Data model

  # Record the difference between the draw of z from the prior predictive
  # distribution and the observed data
  diff[k, ] <- sum(abs(z - z.try))

  # Save unknown random variables and parameters
  y[k, ] <- y.try
  posterior.samp.parameters[k, ] <- c(gamma.try, lambda0.try, p.try)
}
# Calculate acceptance rate
length(which(diff < error))/K.tries
## [1] 0.015703
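# A quick numeric summary of the accepted draws (a sketch using the objects
# created above; 'keep' is a new helper index introduced here)
keep <- which(diff < error)
colMeans(posterior.samp.parameters[keep, ])  # Approximate posterior means of gamma, lambda0, p
apply(posterior.samp.parameters[keep, ], 2, quantile, probs = c(0.025, 0.975))  # 95% credible intervals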
# Plot approximate posterior distribution of parameters
library(latex2exp)
hist(posterior.samp.parameters[which(diff < error), 1], col = "grey", freq = FALSE,
     xlim = c(0, 0.1), main = "", xlab = TeX("$\\gamma | \\mathbf{z}$"),
     ylab = TeX("$\\lbrack\\gamma | \\mathbf{z}\\rbrack$"))
curve(dunif(x, 0, 0.1), col = "deepskyblue", lwd = 3, add = TRUE)
hist(posterior.samp.parameters[which(diff < error), 2], col = "grey", freq = FALSE,
     xlim = c(2, 50), main = "", breaks = seq(2, 50, by = 1),
     xlab = TeX("$\\lambda_0 | \\mathbf{z}$"),
     ylab = TeX("$\\lbrack\\lambda_0 | \\mathbf{z}\\rbrack$"))
curve(dunif(x, 2, 50), col = "deepskyblue", lwd = 3, add = TRUE)
hist(posterior.samp.parameters[which(diff < error), 3], col = "grey", freq = FALSE,
     xlim = c(0, 1), main = "", xlab = TeX("$p | \\mathbf{z}$"),
     ylab = TeX("$\\lbrack p | \\mathbf{z}\\rbrack$"))
curve(dunif(x, 0, 1), col = "deepskyblue", lwd = 3, add = TRUE)
# Plot data (z) and approximate posterior distribution of y
plot(df1$Winter, df1$N, xlab = "Year", ylab = "True population size (y)",
     xlim = c(1940, 2010), ylim = c(0, 1000), type = "b", cex = 0.8, pch = 20,
     col = rgb(0.7, 0.7, 0.7, 0.9))
e.y <- colMeans(y[which(diff < error), ])
points(df1$Winter, e.y, type = "l", lwd = 2)
lwr.CI <- apply(y[which(diff < error), ], 2, FUN = quantile, probs = 0.025)
upper.CI <- apply(y[which(diff < error), ], 2, FUN = quantile, probs = 0.975)
polygon(c(df1$Winter, rev(df1$Winter)), c(lwr.CI, rev(upper.CI)),
        col = rgb(0.5, 0.5, 0.5, 0.3), border = NA)
legend(x = 1940, y = 1000, cex = 1.3,
       legend = c(expression("E(" * y[pred] * "|" * z * ")"), "95% CI"),
       bty = "n", lty = 1, lwd = 2,
       col = c("black", rgb(0.5, 0.5, 0.5, 0.5)))
5.3 Summary and comments
- The Bayesian hierarchical modeling framework is incredibly flexible
- Motivated by a data set or practical problem, you can build your own “custom” statistical models
- You can always “turn the Bayesian crank” for whatever model you develop (with some warnings of course!)
- Practice the process of model specification (what we did today in class) at home on a different data set