6 September 3

6.1 Announcements

  • Questions about assignment 2?
  • Presentations on Tuesday
    • What to expect
  • Reading assignment
  • Please talk to me or send a proposal for applied problems we can work on in class
    • Phase II and problem-based learning!

6.2 Bayesian hierarchical models

6.2.1 Model fitting

##   Winter  N
## 1   1950 31
## 2   1951 25
## 3   1952 21
## 4   1953 24
## 5   1954 21
## 6   1955 28
# Required functions
calc.lambda <- function(t, t0, gamma, lambda0) {
    lambda0 * exp(gamma * (t - t0))
}
rdunif <- function(n, min, max) {
    sample(min:max, size = n, replace = TRUE)
}

# Algorithm settings
K.tries <- 10^6  # Number of simulated data sets to make
diff <- matrix(, K.tries, 1)  # Vector to save the measure of discrepancy between simulated data and real data
error <- 1000  # Allowable difference between simulated data and real data

# Known random variables and parameters
t0 <- 1949
t <- df1$Winter
z <- df1$N
n <- length(z)

# Unknown random variables and parameters
posterior.samp.parameters <- matrix(, K.tries, 3)  # Matrix to save samples of unknown parameters
colnames(posterior.samp.parameters) <- c("gamma", "lambda0", "p")

y <- matrix(, K.tries, n)  # Matrix to save samples of unknown number of whooping cranes

for (k in 1:K.tries) {
    # Simulate from the prior predictive distribution
    gamma.try <- runif(1, 0, 0.1)  # Parameter model
    lambda0.try <- rdunif(1, 2, 50)  # Parameter model 
    lambda.try <- calc.lambda(t = t, t0 = t0, gamma = gamma.try, lambda0 = lambda0.try)  # Mathematical equation for process model
    y.try <- rpois(n, lambda.try)  # Process model
    p.try <- runif(1, 0, 1)  # Parameter model (prior for the detection probability in the data model)
    z.try <- rbinom(n, y.try, p.try)  # Data model
    
    # Record difference between draw of z from prior predictive distribution and
    # observed data
    diff[k, ] <- sum(abs(z - z.try))
    
    # Save unknown random variables and parameters
    y[k, ] <- y.try
    posterior.samp.parameters[k, ] <- c(gamma.try, lambda0.try, p.try)
}

# Calculate acceptance rate
length(which(diff < error))/K.tries
## [1] 0.015888
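
The acceptance rate above is the fraction of prior predictive draws whose discrepancy falls below `error`; the accepted draws make up the approximate posterior sample. A minimal, self-contained sketch of that retention step follows. It uses a hypothetical toy Poisson data set (`z.toy`), an assumed uniform prior, and a hypothetical tolerance (`error.toy`), not the whooping crane model, so the numbers it produces are illustrative only.

```r
set.seed(1)

# Hypothetical toy data (not the whooping crane counts)
z.toy <- rpois(10, lambda = 20)

# Algorithm settings (assumed values for illustration)
K.toy <- 10000     # Number of simulated data sets
error.toy <- 60    # Allowable difference between simulated and toy data

# Draws from an assumed uniform prior on the Poisson rate
lambda.try <- runif(K.toy, 0, 50)

# Discrepancy between each simulated data set and the toy data
diff.toy <- numeric(K.toy)
for (k in 1:K.toy) {
    z.try <- rpois(length(z.toy), lambda.try[k])
    diff.toy[k] <- sum(abs(z.toy - z.try))
}

# Keep only the draws that fall within the tolerance
keep <- which(diff.toy < error.toy)
posterior.samp.toy <- lambda.try[keep]

# Acceptance rate and summaries of the approximate posterior
acceptance.rate <- length(keep) / K.toy
acceptance.rate
mean(posterior.samp.toy)
quantile(posterior.samp.toy, c(0.025, 0.975))
```

The same indexing applies to the objects saved in the loop above, for example `posterior.samp.parameters[which(diff < error), ]` and `y[which(diff < error), ]`.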

  • Recall the goals of the whooping crane study
    • Predictions and forecasts of the true population size
    • Statistical inference on the date when the population will be larger than 1000 individuals
      • The posterior predictive distribution
      • Derived quantity (function of the posterior distribution)
      • Live demonstration in R for the whooping crane data example
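
As a rough sketch of the two quantities listed above, the block below computes a derived quantity and a posterior predictive distribution from a set of parameter draws. The draws (`gamma.samp`, `lambda0.samp`) are placeholder values standing in for accepted posterior samples; they are not results from the whooping crane analysis. The derived quantity is taken to be the year at which the expected population size `lambda` reaches 1000, obtained by solving `lambda0 * exp(gamma * (t - t0)) = 1000` for `t`; the forecast year `t.new` is also an assumption.

```r
set.seed(2)

# Mathematical equation for the process model (same form as in the model-fitting code)
calc.lambda <- function(t, t0, gamma, lambda0) {
    lambda0 * exp(gamma * (t - t0))
}

t0 <- 1949

# Placeholder draws standing in for accepted posterior samples
# (hypothetical values, not output from the real analysis)
K.keep <- 5000
gamma.samp <- runif(K.keep, 0.03, 0.05)
lambda0.samp <- sample(15:25, size = K.keep, replace = TRUE)

# Derived quantity: year when the expected population size reaches 1000
year.1000 <- t0 + log(1000 / lambda0.samp) / gamma.samp

# Posterior predictive draws of the true population size in an assumed future year
t.new <- 2030
lambda.new <- calc.lambda(t = t.new, t0 = t0, gamma = gamma.samp, lambda0 = lambda0.samp)
y.new <- rpois(K.keep, lambda.new)

# Summaries: median and 95% equal-tailed intervals
quantile(year.1000, c(0.025, 0.5, 0.975))
quantile(y.new, c(0.025, 0.5, 0.975))
```

Because each derived value is computed draw by draw, the resulting vector `year.1000` is itself a sample from the distribution of the derived quantity, so uncertainty in `gamma` and `lambda0` carries through automatically.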

  • Markov chain Monte Carlo
    • One of the most common algorithms used for fitting statistical models in this class
    • What to expect in this class
    • Algorithm 4.2 (see pg. 169 in Wikle et al. 2019)
    • Next time in class
  • Other popular approaches for fitting spatio-temporal models
    • Computing Bayes: Bayesian Computation from 1763 to the 21st Century (see Martin et al. 2020, https://arxiv.org/pdf/2004.06425.pdf)
    • Maximum likelihood (non-Bayesian)
      • Most spatio-temporal models have too many parameters
      • Parameters with discrete and continuous support can be difficult to estimate
      • Nuisance parameters are difficult to deal with
    • Integrated nested Laplace approximation
      • Limited to models that use a Gaussian process as the “process model”
      • Rue et al. (2009)
    • Hamiltonian Monte Carlo
      • Just a different flavor of MCMC
      • Stan
    • Particle filtering or sequential Monte Carlo
    • Statistical emulators
    • Approaches that (usually) will not work
      • Least squares (or similar loss function based estimation approaches)
      • Machine learning
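
To give a flavor of the Markov chain Monte Carlo approach listed above, here is a minimal random-walk Metropolis sketch in R. It is a generic illustration and is not meant to reproduce Algorithm 4.2 of Wikle et al. (2019). The toy data (`z.toy`), the uniform prior bounds, the proposal standard deviation, the starting value, and the burn-in length are all assumptions chosen for the example.

```r
set.seed(3)

# Hypothetical toy data: 20 Poisson counts (not the whooping crane data)
z.toy <- rpois(20, lambda = 20)

# Log of the unnormalized posterior for a Poisson rate with an assumed
# uniform(0, 50) prior (the prior contributes only a constant inside its support)
log.post <- function(lambda, z) {
    if (lambda <= 0 || lambda >= 50) {
        return(-Inf)
    }
    sum(dpois(z, lambda, log = TRUE))
}

# Algorithm settings (assumed values)
K.iter <- 5000    # Number of MCMC iterations
prop.sd <- 1      # Standard deviation of the random-walk proposal
burn.in <- 1000   # Iterations discarded as burn-in

lambda.chain <- numeric(K.iter)
lambda.chain[1] <- 10  # Starting value

for (k in 2:K.iter) {
    # Propose a new value centered on the current value
    lambda.prop <- lambda.chain[k - 1] + rnorm(1, 0, prop.sd)

    # Log acceptance ratio (symmetric proposal, so proposal densities cancel)
    log.ratio <- log.post(lambda.prop, z.toy) - log.post(lambda.chain[k - 1], z.toy)

    # Accept the proposal with probability min(1, exp(log.ratio))
    if (log(runif(1)) < log.ratio) {
        lambda.chain[k] <- lambda.prop
    } else {
        lambda.chain[k] <- lambda.chain[k - 1]
    }
}

# Posterior summaries after discarding burn-in
post.samp <- lambda.chain[-(1:burn.in)]
post.mean <- mean(post.samp)
post.mean
quantile(post.samp, c(0.025, 0.975))
```

Unlike the accept/reject scheme used in the model-fitting code, each proposal here depends on the current state of the chain, so consecutive samples are correlated and early iterations are discarded as burn-in.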

6.3 The path forward!

  • What we have covered so far
    • Review of matrix algebra and distribution theory
    • Philosophy of statistical modeling
    • Hierarchical modeling framework
      • Technical note 1.1 on pg. 13 of Wikle et al. (2019)
    • Building our first statistical model!
      • Whooping crane data example
      • The model building process:
        • 1). Choose appropriate PDFs or PMFs for the data, process, and parameter models
        • 2). Choose appropriate mathematical models for the “parameters” or moments of the PDFs/PMFs chosen in step 1
        • 3). Choose a feasible algorithm to obtain samples from the posterior distribution
        • 4). Make statistical inference (e.g., calculate derived quantities and summarize the posterior distribution)
      • What are the most important skills you need for the model building process?
  • What is next
    • Intro to spatial statistics
      • Motivated by data from assignment 2
      • This will rely heavily on Chapters 3 and 4 and lightly on Chapter 2 of Wikle et al. (2019)
    • First spatio-temporal example
      • Feel free to suggest ideas/data sets