14 Day 14 (March 9)
14.1 Announcements
Class projects
- Proposal (submitted Feb. 17; All proposals have been retured)
- Presentations will occur between May 2 and May 9
- Peer review is due May 5
- Report and reproducible analysis is due May 10
Assignment 7 is posted and due Wednesday (March 22)
- Short in-class demonstration
Read Ch. 17 on spatial models
Question about the Bayesian view of missing data
14.2 Bayesian Prediction
Review how to do prediction using linear model
- Difference between prediction intervals and confidence intervals?
- Do we need an example using R?
Derived derived quantities
- Functions of one or more posterior distribution
Predictive/forecast distributions vs. point estimate
- Popular measures of predictive accuracy
Posterior prediction vs. prior prediction
- Remember the retirement example? (see here)
What do we mean and what do we want to show when we make predictions?
Posterior prediction
- Use data/process model to determine the full-conditional distribution of the unobserved data given the parameters and observed data
- Use composition sampling to perform integration in Eq. 12.2 to BBM2L
# Preliminary steps library(mvnfast) # Required package to sample from multivariate normal distribution <- "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_annmean_mlo.txt" url <- read.table(url,header=FALSE) df colnames(df) <- c("year","meanCO2","unc") <- df$meanCO2 # Response variable y <- model.matrix(~year, data=df) # Design matrix X <- length(y) # Number of observations n <- 2 # Dimensions of beta p <- 5000 #Number of MCMC samples to draw K <- as.data.frame(matrix(,K,3)) # Samples of the parameters we will save samples colnames(samples) <- c(colnames(X),"sigma2") <- 1 # Starting values for sigma2 sigma2.e <- 10^6 #Prior variance for beta sigma2.beta <- 2.01 #Inverse gamma prior with E() = 1/(r*(q-1)) and Var()= 1/(r^2*(q-1)^2*(q-2)) q <- 1 r # MCMC algorithm for(k in 1:K){ # Sample beta from full-conditional distribution <- solve(t(X)%*%X + diag(1/sigma2.beta,p)) H <- t(X)%*%y h <- t(rmvn(1,H%*%h,sigma2.e*H)) beta # Sample sigma from full-conditional distribution <- q+n/2 q.temp <- (1/r+t(y-X%*%beta)%*%(y-X%*%beta)/2)^-1 r.temp <- 1/rgamma(1,q.temp,,r.temp) sigma2.e # Save samples of beta and sigma <- c(beta,sigma2.e) samples[k,] } <- 1000 burn.in # Use composition sampling to take samples from # the posterior predictive distribution <- 2050 year <- dim(samples)[1] K <- samples[,1] beta0.y <- samples[,2] beta1.y <-samples[,3] sigma2.y <- rnorm(K,beta0.y + beta1.y*year, sqrt(sigma2.y)) ppd.samples # Histogram of samples from the posterior predictive distribution hist(ppd.samples[-c(1:burn.in)],,xlab=expression(tilde(y)*"|"*bold(y)),ylab=expression("["*tilde(y)*"|"*bold(y)*"]"),freq=FALSE,col="grey",main="",breaks=30)
# Expected value of the posterior predictive distribution mean(ppd.samples[-c(1:burn.in)])
## [1] 455.0655
# 95 % equal-tailed credible interval quantile(ppd.samples[-c(1:burn.in)],prob=c(0.025,0.975))
## 2.5% 97.5% ## 446.2166 464.1382
# Compare to results obtained from lm function <- lm(meanCO2~year,data=df) m1 predict(m1,newdata=data.frame(year=2050),interval="prediction",level=0.95)
## fit lwr upr ## 1 455.1518 446.2043 464.0992