11 Day 11 (February 27)

11.1 Announcements

  • I have to leave at 9:15 today!
    • I will have office hours today in Dickens Hall room 109 from 2-3pm.

11.2 Extreme precipitation in Kansas

  • On September 3, 2018 there was an extreme precipitation event that resulted in flooding in Manhattan, KS and the surrounding areas. If you would like to know more about this, check out this link and this video.

  • What we may need to learn

    • How to use R as a geographic information system
    • New general tools from statistics
      • Gaussian process
    • How to use the hierarchical modeling framework to describe Kriging
      • Hierarchical Bayesian model vs. “empirical” hierarchical model
    • Specialized language used in spatial statistics (e.g., range, nugget, variogram)

11.3 Gaussian process

  • See bottom of pg. 139 in Wikle et al. (2019)
  • A Gaussian process is a probability distribution over functions
  • If the function is observed at a finite number of points or “locations,” then the vector of values follows a multivariate normal distribution.

11.3.1 Multivariate normal distribution

  • Multivariate normal distribution

  • \(\boldsymbol{\eta}\sim\text{N}(\mathbf{0},\sigma^{2}\mathbf{R})\)

    • Definitions
  • Correlation matrix – A positive semi-definite matrix whose elements are the correlation between observations

  • Correlation function – A function that describes the correlation between observations

  • Example correlation matrices

  • Compound symmetry \[\mathbf{R}=\left[\begin{array}{cccccc} 1 & 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 & 1 \end{array}\right]\]

    • AR(1) \[\mathbf{R(\phi})=\left[\begin{array}{ccccc} 1 & \phi^{1} & \phi^{2} & \cdots & \phi^{n-1}\\ \phi^{1} & 1 & \phi^{1} & \cdots & \phi^{n-2}\\ \phi^{2} & \phi^{1} & 1 & \vdots & \phi^{n-3}\\ \vdots & \vdots & \vdots & \ddots & \ddots\\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \ddots & 1 \end{array}\right]\:\]

      • Example simulating from \(\boldsymbol{\eta}\sim\text{N}(\mathbf{0},\sigma^{2}\mathbf{R})\) in R
        n <- 200
        x <- 1:n
        I <- diag(1, n)
        sigma2 <- 1
      
        library(MASS)
        set.seed(2034)
        eta <- mvrnorm(1, rep(0, n), sigma2 * I)
        cor(eta[1:(n - 1)], eta[2:n])
      ## [1] -0.06408623
        acf(eta)

        par(mfrow = c(2, 1))
        par(mar = c(0, 2, 0, 0), oma = c(5.1, 3.1, 2.1, 2.1))
        plot(x, eta, typ = "l", xlab = "", xaxt = "n")
        abline(a = 0, b = 0)
      
        n <- 200
        x <- 1:n # Must be equally spaced
        phi <- 0.7
        R <- phi^abs(outer(x, x, "-"))
        sigma2 <- 1
        set.seed(1330)
        eta <- mvrnorm(1, rep(0, n), sigma2 * R)
        cor(eta[1:(n - 1)], eta[2:n])
      ## [1] 0.7016481
        plot(x, eta, typ = "l")
        abline(a = 0, b = 0)

        acf(eta)

    • Example correlation functions

    • Gaussian correlation function \[r_{ij}(\phi)=e^{-\frac{d_{ij}^{2}}{\phi}}\] where \(d_{ij}\) is the “distance” between locations i and j (note that \(d_{ij}=0\) for \(i=j\)) and \(r_{ij}(\phi)\) is the element in the \(i^{\textrm{th}}\) row and \(j^{\textrm{th}}\) column of \(\mathbf{R}(\phi)\).

        library(fields)
        n <- 200
        x <- 1:n
        phi <- 40
        d <- rdist(x)
        R <- exp(-d^2/phi)
        sigma2 <- 1
      
        set.seed(4673)
        eta <- mvrnorm(1, rep(0, n), sigma2 * R)
        plot(x, eta, typ = "l")
        abline(a = 0, b = 0)

        cor(eta[1:(n-1)], eta[2:n])
      ## [1] 0.9717508
  • Linear correlation function \[r_{ij}(\phi)=\begin{cases} 1-\frac{d_{ij}}{\phi} &\text{if}\ d_{ij}<0\\ 0 &\text{if}\ d_{ij}>0 \end{cases}\]

      library(fields)
      n <- 200
      x <- 1:n
      phi <- 40
      d <- rdist(x)
      R <- ifelse(d<phi,1-d/phi,0)
      sigma2 <- 1
    
      set.seed(4803)
      eta <- mvrnorm(1, rep(0, n), sigma2 * R)
      plot(x, eta, typ = "l")
      abline(a = 0, b = 0)

      cor(eta[1:(n-1)], eta[2:n])
    ## [1] 0.9778878

11.4 Linear models for correlated errors

  • Motivating example
ISIT <- read.table("https://www.dropbox.com/scl/fi/n5ku9crpsr384p9z2rc8b/bioluminescence.txt?rlkey=airzs2ukql4g0hqxjr72re53j&dl=1", header = TRUE)

Sources16 <- ISIT$Sources[ISIT$Station == 16]
Depth16 <- ISIT$SampleDepth[ISIT$Station == 16]
Sources16 <- Sources16[order(Depth16)]
Depth16 <- sort(Depth16)
plot(Depth16, Sources16, las = 1, ylim = c(0, 65), col = rgb(0, 0, 0, 0.25),
pch = 19,ylab="Density of bioluminescence",xlab="Depth in meters")

  • Linear model with correlated errors (aka Kriging)

    • \(\mathbf{\textbf{y}}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\eta}+\boldsymbol{\varepsilon}\) where \(\boldsymbol{\eta}\sim\text{N}(\mathbf{0},\sigma_{\eta}^{2}\mathbf{R}(\phi))\) and \(\boldsymbol{\varepsilon}\sim\text{N}(\mathbf{0},\sigma_{\varepsilon}^{2}\mathbf{I})\)
    • The model above is the same as \(\mathbf{y}\sim\text{N}(\mathbf{X}\boldsymbol{\beta},\sigma_{\eta}^{2}\mathbf{R}(\phi)+\sigma^{2}\mathbf{I})\)
    • Estimation
      • Regression coefficients \[\hat{\boldsymbol{\beta}}=(\mathbf{X}^{\prime}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{V}^{-1}\mathbf{y}\] where \(\mathbf{V=}\sigma_{\eta}^{2}\mathbf{R}(\phi)+\sigma_{\varepsilon}^{2}\mathbf{I}\)
      • Numerical maximization to estimate \(\sigma_{\eta}^{2}\), \(\sigma^{2}\), and \(\phi\)
    • Error terms \[\hat{\boldsymbol{\eta}}=\hat{\sigma}_{\eta}^{2}\mathbf{R}(\hat{\phi})(\hat{\sigma}_{\varepsilon}^{2}\mathbf{I}+\hat{\sigma}_{\eta}^{2}\mathbf{R}(\hat{\phi}))^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})\] \[\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\eta}}\]
  • Bioluminescence example

    • Intercept only linear model
    ISIT <- read.table("https://www.dropbox.com/scl/fi/n5ku9crpsr384p9z2rc8b/bioluminescence.txt?rlkey=airzs2ukql4g0hqxjr72re53j&dl=1", header = TRUE)
    
    Sources16 <- ISIT$Sources[ISIT$Station == 16]
    Depth16 <- ISIT$SampleDepth[ISIT$Station == 16]
    Sources16 <- Sources16[order(Depth16)]
    Depth16 <- sort(Depth16)
    
    m1 <- lm(Sources16~1)
    e.hat.m1 <- residuals(m1)
    
    plot(Depth16, e.hat.m1, 
     xlab="Depth (m)",ylab=expression(hat(epsilon)),ylim=c(-15,60),las=1,col=rgb(0,0,0,0.25))
    abline(a=0,b=0)

    • Intercept only linear model with spatial random effect
    library(nlme)
    m2 <- gls(Sources16~1,correlation=corGaus(form=~Depth16,nugget=TRUE,value=c(100,0.05),fixed=FALSE),method="ML")
    
    summary(m2)
    ## Generalized least squares fit by maximum likelihood
    ##   Model: Sources16 ~ 1 
    ##   Data: NULL 
    ##        AIC      BIC    logLik
    ##   296.3003 304.0276 -144.1502
    ## 
    ## Correlation Structure: Gaussian spatial correlation
    ##  Formula: ~Depth16 
    ##  Parameter estimate(s):
    ##        range       nugget 
    ## 610.73574394   0.01300607 
    ## 
    ## Coefficients:
    ##                Value Std.Error  t-value p-value
    ## (Intercept) 17.73824  8.981483 1.974978  0.0538
    ## 
    ## Standardized residuals:
    ##        Min         Q1        Med         Q3        Max 
    ## -0.8937320 -0.8247052 -0.6866517  0.3991341  1.7952883 
    ## 
    ## Residual standard error: 19.84738 
    ## Degrees of freedom: 51 total; 50 residual
    # estimated range & nugget parameters from gls
    phi <- exp(m2$model[1]$corStruct[1])^2      
    nugget <- 1/(1+exp(-m2$model[1]$corStruct[2])) 
    
    
    X <- rep(1,length(Sources16))
    y <- Sources16
    
    sigma2.eta <- (m2$sigma^2)*(1-nugget)
    R <- exp(-as.matrix(dist(Depth16,diag=TRUE,upper=TRUE))^2/phi)                                                                                      
    sigma2.e <- (m2$sigma^2)*nugget  
    I <- diag(1,length(y))
    V <- sigma2.eta*R + sigma2.e*I                       
    beta <- solve(t(X)%*%solve(V)%*%X)%*%t(X)%*%solve(V)%*%y 
    
    eta <- sigma2.eta*R%*%solve(V)%*%(y-X%*%beta)
    
    par(mar=c(5.1,4.1,2.1,2.1),oma=c(0,0,0,0))  # reset plot margins
    par(mfrow=c(1,2))  # plot layout
    par(pch=19)  # plotting symbol
    
    # plot predictions for second-order spatial model
    plot(Depth16, Sources16, 
     xlab="Depth (m)",ylab="Density of bioluminescence",ylim=c(-15,60),las=1,col=rgb(0,0,0,0.25))
    lines(Depth16,X%*%beta + eta,col="black")
    lines(Depth16,X%*%beta,col="purple")
    
    e.hat.m2 <- y - (X%*%beta + eta)
    plot(Depth16, e.hat.m2, 
     xlab="Depth (m)",ylab=expression(hat(epsilon)),las=1,col=rgb(0,0,0,0.25))

  • Homework assignment!

    • Go back to the R code for the bioluminescence example and allow the expected value of bioluminescence density to depend on depth (i.e., use a linear model with a slope parameter instead of just an intercept)

11.5 Extreme precipitation in Kansas