第 2 章 空间统计学

在过去的几十年当中,SGLMM 模型的似然分析一直是主要的研究内容,而 N. E. Breslow 和 D. G. Clayton 关于惩罚拟似然和边际拟似然的研究 (Breslow and Clayton 1993) 可视为 SGLMM 模型似然分析的起点。Charles J. Geyer 用蒙特卡罗积分近似对数似然函数,以极大似然法估计指数族中的参数,提出了蒙特卡罗极大似然算法,并证明了该算法的收敛性以及参数估计的相合性和渐近正态性 (Geyer 1994),这为算法的应用打开了局面。随后,似然估计的统计性质和随机模拟算法的收敛性和应用成为研究的重点。

近年来,在大数据的背景下,寻求高效的算法成为一个新的方向。2009 年,Rue 等人提出基于近似贝叶斯推断的集成嵌套拉普拉斯算法,简称 INLA (Rue et al. 2009),并将其应用于空间数据建模 (Lindgren and Rue 2015),还推广到一般的贝叶斯计算 (Rue et al. 2017)。2013 年,Liang 等人将重抽样技术用于大规模地质统计数据分析,相比贝叶斯方法 (Diggle, Tawn, and Moyeed 1998),它可以更快地获得准确的结果 (Liang et al. 2013)。同时,涉及空间数据分析和建模的书籍也越来越多,如介绍用于空间数据分析的分层模型的 (Banerjee, Carlin, and Gelfand 2015) 和介绍基于 R-INLA 软件的空间和时空贝叶斯模型的 (Marta Blangiardo 2015)。2016 年,Bonat 和 Ribeiro Jr. 综合比较了 MCML、贝叶斯 MCMC 和近似拉普拉斯算法 (Bonat and Ribeiro 2016)。

贝叶斯方法需要构造马尔科夫链并进行大量迭代,收敛速度慢,计算耗时。贝叶斯推断通常依赖 MCMC 算法,常见的有 Metropolis-Hastings 算法 (Diggle, Tawn, and Moyeed 1998; Diggle et al. 2002)、Langevin-Hastings 算法(geoRglm/geoCount 实现)和汉密尔顿蒙特卡罗算法(Stan/PrevMap 实现)。另一类方法将极大似然和重要性采样相结合,称为蒙特卡罗极大似然法,简称 MCML。Charles J. Geyer (1994年) 首先从理论上证明了 MCML 方法的收敛性及相关似然估计的渐近正态性,为后续算法的开发、改进以及应用提供了理论支持 (Geyer 1994)。2002 年,Hao Zhang 在做模型的估计和预测时提出蒙特卡罗期望极大梯度 (Monte Carlo EM Gradient,MCEMG) 算法 (Zhang 2002);2016 年,Fatemeh Hosseini 在 MCEMG 的基础上提出近似蒙特卡罗期望极大梯度 (Approximate Monte Carlo EM Gradient) 算法 (Hosseini 2016)。2004 年,Ole F. Christensen 将 MCML 方法用于 SGLMM 模型 (Christensen 2004);2016 年,Peter J. Diggle 和 Emanuele Giorgi 将 MCML 方法应用于分析西非流行病调查数据 (Diggle and Giorgi 2016)。低秩近似的想法则源于 Trevor Hastie (1996年) 的工作 (Hastie 1996)。

在提出 SGLMM 模型之前,先介绍空间高斯过程,然后是平稳空间高斯过程和 SGLMM 模型结构,以步步推进的方式引入各成分的假设条件,其后着重阐述空间效应的自相关函数,它决定空间效应的结构,并包含模型中的待估参数。

高斯过程是连续的随机过程,可解释为连续随机变量函数上的概率分布,可看作不可数无穷多个随机变量的集合。我们知道简单的多元正态分布由均值向量和协方差矩阵决定,与此不同的是高斯过程由均值函数和协方差函数决定。

空间高斯过程在模型 (2.1) 中作为随机效应存在,不同于一般随机效应的地方在于它与样本的空间位置有关。一般地,假定 \(S(x)\) 服从 \(N\) 元高斯分布 \(N(\mu_{S},G)\),其中 \(x = (d_1,d_2) \in \mathbb{R}^2\),\(\mu_{S} = \mathbf{0}_{N\times1}\),\(G_{(ij)} = \mathrm{Cov}(S(x_i),S(x_j)) = \sigma^2\rho(u_{ij})\),\(S(x)\) 的相关函数 \(\rho(u_{ij}) = \exp(-u_{ij}/\phi)\),\(u_{ij} \equiv \|x_{i}-x_{j}\|_2\),其中 \(\phi\) 和 \(\sigma^2\) 都是未知待估参数。可见采样点的位置 \((d_1,d_2)\) 和相关函数 \(\rho(u)\) 一起决定空间高斯过程的形式,并且 \(S(x)\) 的维度就是采样点的数目 \(N\)。这通常导致空间效应的维度比协变量的数目大很多,模型 (2.1) 的估计问题也比一般的广义线性混合效应模型困难。

\[\begin{equation} g(\mu_i) = d(x_i)'\beta + S(x_i) + Z_i \tag{2.1} \end{equation}\]
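上述 \(S(x)\) 的协方差结构可以用几行 R 代码构造出来。下面是一个最小示意,采样点位置与参数 \(\sigma^2,\phi\) 的取值均为演示用的假设:

```r
# 按 G_ij = sigma^2 * exp(-u_ij / phi) 构造空间效应 S(x) 的协方差矩阵
# 采样点位置随机生成,sigma2、phi 的取值仅作演示(假设值)
set.seed(2018)
N <- 25
coords <- cbind(runif(N), runif(N))  # 采样点位置 (d1, d2)
u <- as.matrix(dist(coords))         # 采样点之间的欧氏距离 u_ij
sigma2 <- 1
phi <- 0.3
G <- sigma2 * exp(-u / phi)          # 指数型相关函数对应的协方差矩阵
# 按定义模拟 S(x) ~ N(0, G):对 G 做 Cholesky 分解
S <- drop(t(chol(G)) %*% rnorm(N))
```

可以验证 \(G\) 的对角元恰为 \(\sigma^2\),且 \(G\) 对称,\(S\) 的维度等于采样点数目 \(N\),与正文的讨论一致。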

一般地,我们假定在给定 \(\beta\) 和 \(S(x_i)\) 的条件下,响应变量 \(Y_i\) 服从指数族分布,这里不失一般性地讨论空间广义线性混合效应模型的极大似然估计。\(Y_i|\beta,S(x_i)\) 的条件概率密度函数为 \[f(y_i|\beta,S(x_i))=\exp\{ [y_{i}\mu_i-b(\mu_i)] + c(y_i) \}\] 其中,\(\mu_i\) 是典则参数,条件均值 \(\mathrm{E}\big[Y_i|\beta,S(x_i)\big] = b'(\mu_i)\);当 \(Y_i \sim N(\mu_i,\sigma^2)\) 时,典则联系函数为恒等函数。

\(T = (T_1,T_2,\ldots,T_n)'\) 服从多元正态分布 \(N(D\beta,\Sigma(\boldsymbol{\theta}))\),参数 \(\beta,\boldsymbol{\theta}\) 的似然函数为

\[\begin{equation} L(\beta',\boldsymbol{\theta}';y) = \int \prod_{i=1}^{n}f(y_i|t_i)f(t|\beta',\boldsymbol{\theta}')\,dt \tag{2.2} \end{equation}\]
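似然函数 (2.2) 中的高维积分一般没有解析解,蒙特卡罗方法用模拟样本的平均来近似它。下面用一个假设的一维玩具模型做最小示意:\(Y|T \sim N(T,1)\),\(T \sim N(0,\sigma_T^2)\),此时边际分布有解析解 \(N(0, 1+\sigma_T^2)\),可用来检验近似效果:

```r
# 一维玩具模型:Y|T ~ N(T, 1),T ~ N(0, sigma_T^2)
# 此时边际密度有解析解 N(0, 1 + sigma_T^2),可检验蒙特卡罗近似
set.seed(2018)
mc_loglik <- function(y, sigma_T, M = 1e5) {
  t_sim <- rnorm(M, mean = 0, sd = sigma_T)  # 从随机效应的分布中抽样
  log(mean(dnorm(y, mean = t_sim, sd = 1)))  # 用样本平均近似积分
}
y0 <- 0.7
exact <- dnorm(y0, mean = 0, sd = sqrt(1 + 1.5^2), log = TRUE)
approx <- mc_loglik(y0, sigma_T = 1.5)
```

MCML 正是基于这种近似,对近似后的对数似然做极大化;实际算法中还会结合重要性采样以降低方差。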

Bonat 和 Ribeiro Jr. 注意到参数 \(\tau\) 和 \(\psi\) 是不可识别的 (Bonat and Ribeiro 2016)。模型中的 \(S(x_i)\) 表示潜变量 (latent variable),又称随机效应 (random effect)。

R 语言在空间数据分析与可视化方面呈现越来越流行的趋势,从早些年的 lattice 图形 (Sarkar 2008) 到如今的 ggplot2 图形 (Wickham 2016),操作空间数据的 sp 对象也发展为 sf 对象,还整合了不少第三方软件和服务,如基于 Google Maps 和 Google Earth 的交互式空间可视化。gstat 包迁移自 S 语言,提供了克里金插值方法 (Gräler, Pebesma, and Heuvelink 2016),它其实是空间线性混合效应模型的特例。

2.1 随机过程

以高斯过程为例介绍随机过程的有关知识。

2.1.1 从随机变量过渡到随机过程

考虑一个来自连续型概率分布(如正态分布)的样本,样本观察值以均值 \(\mu\) 为中心、以标准差 \(\sigma\) 度量的分散程度随机分布。事实上,样本来自以 \(\mu\) 和 \(\sigma^2\) 为参数的概率分布,对连续型概率分布而言,可能的样本取值构成一个不可数无穷的集合。现在,我们把这个定义往前推广一步:想象我们获取的不是随机数,而是随机函数。类似于给定均值和方差,我们通过给定超参数来确定随机函数所在的空间,即随机函数的集合。超参数由协方差函数表示,即常说的核,它定义了给定数据集上两个观测之间的关系。

本章使用 MCMC 方法,估计用来描述数据的随机函数的集合。

为何随机过程优于其它非线性建模技术

2.1.2 高斯过程的表示

许多数学分支都自备一套符号表示,这里的符号采纳自 Rasmussen and Williams (2006)。

高斯过程的计算主要集中在近似如下目标积分,也就是我们所说的边际似然

\[p(y|X) = \iint p(y|f,X)\,p(f|X,\theta)\,p(\theta)\,df\,d\theta\]

其中 \(\theta\) 表示核包含的超参数的集合,\(X\) 表示设计矩阵。比如幂二次指数核 (squared exponential kernel),也就是熟知的径向基函数 (Radial Basis Function);在 Stan 中,我们用 gp_exp_quad_cov 来表示高斯过程的幂二次指数协方差函数

\[k(x',x) = \sigma^2\exp\big(-\frac{d(x',x)^2}{2\ell^2}\big)\] 在这里 \(\theta \triangleq \{\sigma,\ell\}\),\(d(x',x)^2\) 通常是欧氏距离的平方,我们当然也可以采用其他度量。 \(x'\) 和 \(x\) 表示观测 \(X\) 所在的不同位置。这个核会生成非常光滑的函数。我稍后会生成来自不同核的函数,如果你想先一睹为快,看看这个核函数生成的一些函数,请跳转至第 2.1.3 节。
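这个核函数可以直接写成 R 函数。下面是一个最小实现,函数名 sq_exp_cov 与参数取值均为演示用的假设:

```r
# 幂二次指数核 k(x', x) = sigma^2 * exp(-d(x', x)^2 / (2 * l^2))
# 函数名与参数取值均为演示用的假设
sq_exp_cov <- function(x1, x2, sigma = 1, l = 1) {
  d2 <- outer(x1, x2, function(a, b) (a - b)^2)  # 距离的平方 d(x', x)^2
  sigma^2 * exp(-d2 / (2 * l^2))
}
x <- seq(0, 1, length.out = 5)
K <- sq_exp_cov(x, x)
```

矩阵 \(K\) 对称,对角元等于 \(\sigma^2\),所有元素不超过 \(\sigma^2\),这正是相关函数随距离衰减的体现。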

联合高斯过程的协方差矩阵定义如下

\[ K_{joint} = \begin{pmatrix} K_{X,X} & K_{X,X_{\star}} \\ K_{X_{\star},X} & K_{X_{\star},X_{\star}} \end{pmatrix} \]

\(K_{joint}\) 左上和右下的子矩阵是方阵,它们的大小分别由训练集和测试集的大小决定,即依赖于数据集的拆分。

在给定核函数的情况下,高斯过程(简写为 GP)的观察值可以从多元高斯分布中产生。下式中,\(\mu\) 表示均值函数,\(K_{\theta}\) 表示给定超参数 \(\theta\) 时的核(协方差函数)

\[y \mid f \sim N(f,\sigma^2 I), \qquad f \mid X,\theta \sim GP(\mu,K_{\theta})\]

我们对样本外的位置做均值预测,预测均值的估计为

\[ f_{\star} = K_{X_{\star},X}K_{X,X}^{-1}f \]

生成带噪声的预测

\[ f_{\star} = K_{X_{\star},X}(K_{X,X} + \sigma^2_{n}I)^{-1}y \]

其中 \(\sigma^2_n\) 表示观测噪声的方差(下标 \(n\) 表示 noise),\(I\) 表示单位矩阵。

类似地,由多元高斯分布的条件协方差,我们可以获得高斯过程预测的协方差矩阵,不带噪声的版本为

\[\mathrm{Cov}(f_{\star},f_{\star}) = K_{f_{\star},f_{\star}} - K_{f_{\star},f}K_{f,f}^{-1}K_{f,f_{\star}}\]

带噪声的版本

\[ \mathrm{Cov}(f_{\star},f_{\star}) = K_{f_{\star},f_{\star}} - K_{f_{\star},f}(K_{f,f}+\sigma^{2}_{n}I)^{-1}K_{f,f_{\star}} \]
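下面按标准的高斯过程回归公式,用 R 实现预测均值和协方差的计算。这只是一个最小示意:核取幂二次指数核,数据与参数均为演示用的假设:

```r
# 计算高斯过程的预测均值和协方差(带噪声版本)
# 核函数、数据与参数均为演示用的假设
gp_posterior <- function(x_train, y_train, x_test,
                         sigma = 1, l = 0.3, sigma_n = 1e-4) {
  k <- function(a, b) sigma^2 * exp(-outer(a, b, "-")^2 / (2 * l^2))
  K_noisy <- k(x_train, x_train) + sigma_n^2 * diag(length(x_train))  # K_{X,X} + sigma_n^2 I
  K_s  <- k(x_test, x_train)   # K_{X*,X}
  K_ss <- k(x_test, x_test)    # K_{X*,X*}
  K_inv <- solve(K_noisy)
  list(
    mean = drop(K_s %*% K_inv %*% y_train),  # 预测均值
    cov  = K_ss - K_s %*% K_inv %*% t(K_s)   # 预测协方差
  )
}
x <- c(0, 0.5, 1)
y <- c(-0.5, 1.2, 0.3)
post <- gp_posterior(x, y, x_test = x)
# 噪声方差很小时,训练点处的预测均值应几乎等于观测值
```

当 \(\sigma_n^2\) 取很小的值时,在训练点处预测均值几乎穿过观测值,预测方差接近 0,与公式的直观含义一致。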

2.1.3 不同的核函数
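以幂二次指数核为例,从先验 \(N(0,K)\) 中抽取函数样本,可以直观看到长度尺度 \(\ell\) 对函数光滑程度的影响。以下是一个最小示意,参数取值均为演示用的假设:

```r
# 从 GP 先验 N(0, K) 中抽取函数样本,比较不同长度尺度 l 下函数的光滑程度
# 核为幂二次指数核,参数取值均为演示用的假设
set.seed(2018)
x_grid <- seq(0, 1, length.out = 100)
draw_gp <- function(l, sigma = 1, jitter = 1e-6) {
  K <- sigma^2 * exp(-outer(x_grid, x_grid, "-")^2 / (2 * l^2)) +
    jitter * diag(length(x_grid))  # 加微小抖动保证 Cholesky 分解数值稳定
  drop(t(chol(K)) %*% rnorm(length(x_grid)))
}
f_smooth <- draw_gp(l = 0.5)   # 长度尺度大,函数平缓
f_wiggly <- draw_gp(l = 0.05)  # 长度尺度小,函数波动剧烈
matplot(x_grid, cbind(f_smooth, f_wiggly), type = "l", lty = 1,
        xlab = "x", ylab = "f(x)")
```

长度尺度越小,相邻两点的相关性衰减越快,抽出的函数在相同区间内波动越剧烈。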

2.2 高斯分布

library(geoR)

以数据集 s100 为例,展示空间数据建模过程

关于测量误差(块金效应)估计的相合性(一致性)研究,参见 https://github.com/LuZhangstat/nugget_consistency

2.2.1 探索性分析

参考材料来自 http://www.leg.ufpr.br/geoR/

geoR 包内的数据集都被打包成 geodata 类型的对象, s100 是一个模拟的数据集

data(s100)
class(s100)
## [1] "geodata"
summary(s100)
## Number of data points: 100 
## 
## Coordinates summary
##         Coord.X    Coord.Y
## min 0.005638006 0.01091027
## max 0.983920544 0.99124979
## 
## Distance summary
##         min         max 
## 0.007640962 1.278175109 
## 
## Data summary
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.1676955  0.2729882  1.1045936  0.9307179  1.6101707  2.8678969 
## 
## Other elements in the geodata object
## [1] "cov.model" "nugget"    "cov.pars"  "kappa"     "lambda"

geoR 包提供了 geodata 对象的绘图函数 plot.geodata

plot(s100)

geoR 还为 geodata 对象写了 points.geodata 函数

par(mfrow = c(2, 2), mar = c(3, 3, 1, 1), mgp = c(2, 0.8, 0))
points(s100, xlab = "Coord X", ylab = "Coord Y")
points(s100, xlab = "Coord X", ylab = "Coord Y", pt.divide = "rank.prop")
points(s100,
  xlab = "Coord X", ylab = "Coord Y", cex.max = 1.7,
  col = gray(seq(1, 0.1, l = 100)), pt.divide = "equal"
)
points(s100, pt.divide = "quintile", xlab = "Coord X", ylab = "Coord Y")

经验变差函数 (empirical variograms)

cloud1 <- variog(s100, option = "cloud", max.dist = 1)
cloud2 <- variog(s100, option = "cloud", estimator.type = "modulus", max.dist = 1)
bin1 <- variog(s100, uvec = seq(0, 1, l = 11))
bin2 <- variog(s100, uvec = seq(0, 1, l = 11), estimator.type = "modulus")
par(mfrow = c(2, 2), mar = c(3, 3, 2, 2), mgp = c(2, 0.8, 0))
plot(cloud1, main = "classical estimator")
plot(cloud2, main = "modulus estimator")
plot(bin1, main = "classical estimator")
plot(bin2, main = "modulus estimator")

bin1 <- variog(s100, uvec = seq(0, 1, l = 11), bin.cloud = T)
bin2 <- variog(s100, uvec = seq(0, 1, l = 11), estimator.type = "modulus", bin.cloud = T)
par(mfrow = c(1, 2), mar = c(3, 3, 2, 2), mgp = c(2, .8, 0))
plot(bin1, bin.cloud = T, main = "classical estimator")
plot(bin2, bin.cloud = T, main = "modulus estimator")

bin1 <- variog(s100, uvec = seq(0, 1, l = 11))
par(mar = c(3, 3, .2, .2), mgp = c(2, .8, 0))
plot(bin1)
lines.variomodel(
  cov.model = "exp", cov.pars = c(1, 0.3),
  nugget = 0, max.dist = 1, lwd = 3
)
smooth <- variog(s100,
  option = "smooth", max.dist = 1,
  n.points = 100, kernel = "normal", band = 0.2
)
lines(smooth, type = "l", lty = 2)
legend(0.4, 0.3, c("empirical", "exponential model", "smooth"),
  lty = c(1, 1, 2), lwd = c(1, 3, 1), cex = 0.7
)

vario60 <- variog(s100, max.dist = 1, direction = pi / 3)
vario.4 <- variog4(s100, max.dist = 1)
par(mfrow = c(1, 2), mar = c(3, 3, 1, 1), mgp = c(2, .8, 0))
plot(vario60)
title(main = expression(paste("directional, angle = ", 60 * degree)))
plot(vario.4)

2.2.2 参数估计

par(mar = c(3.5, 3.5, .2, .2), mgp = c(2, .8, 0))
plot(variog(s100, max.dist = 1))
lines.variomodel(cov.model = "exp", cov.pars = c(1, .3), nug = 0, max.dist = 1)
lines.variomodel(cov.model = "mat", cov.pars = c(.85, .2), nug = 0.1, 
                 kappa = 1, max.dist = 1, lty = 2)
lines.variomodel(cov.model = "sph", cov.pars = c(.8, .8), nug = 0.1, 
                 max.dist = 1, lwd = 2)

块金效应固定为 0

ml <- likfit(s100, ini = c(1, 0.5), fix.nugget = T)
reml <- likfit(s100, ini = c(1, 0.5), fix.nugget = T, lik.method = "RML")
ols <- variofit(bin1, ini = c(1, 0.5), fix.nugget = T, weights = "equal")
wls <- variofit(bin1, ini = c(1, 0.5), fix.nugget = T)

块金效应固定为某一个确定的值

ml.fn <- likfit(s100, ini = c(1, 0.5), fix.nugget = T, nugget = 0.15)
reml.fn <- likfit(s100, ini = c(1, 0.5), fix.nugget = T, nugget = 0.15, lik.method = "RML")
ols.fn <- variofit(bin1, ini = c(1, 0.5), fix.nugget = T, nugget = 0.15, weights = "equal")
wls.fn <- variofit(bin1, ini = c(1, 0.5), fix.nugget = T, nugget = 0.15)

使用估计的块金效应值

ml.n <- likfit(s100, ini = c(1, 0.5), nug = 0.5)
reml.n <- likfit(s100, ini = c(1, 0.5), nug = 0.5, lik.method = "RML")
ols.n <- variofit(bin1, ini = c(1, 0.5), nugget = 0.5, weights = "equal")
wls.n <- variofit(bin1, ini = c(1, 0.5), nugget = 0.5)

不同方法拟合经验变差

par(mfrow = c(1, 3), mar = c(3, 3, 2, 0.5), mgp = c(2, .8, 0), cex = 1)
plot(bin1, main = expression(paste("fixed ", tau^2 == 0)))
lines(ml, max.dist = 1)
lines(reml, lwd = 2, max.dist = 1)
lines(ols, lty = 2, max.dist = 1)
lines(wls, lty = 2, lwd = 2, max.dist = 1)
legend(0.55, 0.3, legend = c("ML", "REML", "OLS", "WLS"), 
       lty = c(1, 1, 2, 2), lwd = c(1, 2, 1, 2), cex = 0.7)
##
plot(bin1, main = expression(paste("fixed ", tau^2 == 0.15)))
lines(ml.fn, max.dist = 1)
lines(reml.fn, lwd = 2, max.dist = 1)
lines(ols.fn, lty = 2, max.dist = 1)
lines(wls.fn, lty = 2, lwd = 2, max.dist = 1)
legend(0.55, 0.3, legend = c("ML", "REML", "OLS", "WLS"), 
       lty = c(1, 1, 2, 2), lwd = c(1, 2, 1, 2), cex = 0.7)
##
plot(bin1, main = expression(paste("estimated  ", tau^2)))
lines(ml.n, max.dist = 1)
lines(reml.n, lwd = 2, max.dist = 1)
lines(ols.n, lty = 2, max.dist = 1)
lines(wls.n, lty = 2, lwd = 2, max.dist = 1)
legend(0.55, 0.3, legend = c("ML", "REML", "OLS", "WLS"), 
       lty = c(1, 1, 2, 2), lwd = c(1, 2, 1, 2), cex = 0.7)

两类变差函数的包络

env.mc <- variog.mc.env(s100, obj.var = bin1)
env.model <- variog.model.env(s100, obj.var = bin1, model = wls)

par(mfrow = c(1, 2), mar = c(3, 3, 0.2, 0.2), mgp = c(2, .8, 0))
plot(bin1, envelope = env.mc)
plot(bin1, envelope = env.model)

用剖面似然 (profile likelihood) 方法(一维和二维)估计模型的协方差参数,不含块金效应

# 这一步可能花费较多时间
prof <- proflik(ml,
  geodata = s100, sill.val = seq(0.48, 2, l = 11),
  range.val = seq(0.1, 0.52, l = 11), uni.only = FALSE
)
par(mfrow = c(1, 3), mar = c(3, 3, 1, 0.2), mgp = c(2, .8, 0), cex = 1.2)
plot(prof, nlevels = 16)

2.2.3 交叉验证

xv.ml <- xvalid(s100, model = ml)
xv.wls <- xvalid(s100, model = wls)
par(mfcol = c(5, 2), mar = c(2.5, 2.5, .5, .5), mgp = c(1.8, 0.8, 0))
plot(xv.wls)

# 这一步时间花费会比较多
xvR.ml <- xvalid(s100, model = ml, reest = TRUE)
xvR.wls <- xvalid(s100, model = wls, reest = TRUE, variog.obj = bin1)

2.2.4 空间插值

  • Simple kriging
  • Ordinary kriging
  • Trend (universal) kriging
  • External trend kriging
par(mar = c(3, 3, 0.2, 0.2), mgp = c(2, .8, 0))
plot(s100$coords, xlim = c(0, 1.2), ylim = c(0, 1.2), 
     xlab = "Coord X", ylab = "Coord Y")
loci <- matrix(c(0.2, 0.6, 0.2, 1.1, 0.2, 0.3, 1.0, 1.1), ncol = 2)
text(loci, as.character(1:4), cex = 1.3, col = "red")
polygon(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1), lty = 2)

ordinary kriging 块金效应为 0

kc4 <- krige.conv(s100, locations = loci, krige = krige.control(obj.m = wls))

这里仍然使用 ordinary kriging 插值

pred.grid <- expand.grid(seq(0, 1, l = 51), seq(0, 1, l = 51))
# 可以考虑更加精细的网格划分
# pred.grid <- expand.grid(seq(0, 1, l = 101), seq(0, 1, l = 101)) 
kc <- krige.conv(s100, loc = pred.grid, krige = krige.control(obj.m = ml))

插值预测的结果

par(mar = c(3, 3, .5, .5), mgp = c(2, .8, 0))
image(kc, loc = pred.grid, col = gray(seq(1, 0.1, l = 30)), 
      xlab = "Coord X", ylab = "Coord Y")

2.2.5 贝叶斯分析

贝叶斯分析需要用 MCMC 算法,下面的计算可能花费较多时间

pr <- prior.control(phi.discrete = seq(0, 5, l = 101), phi.prior = "rec")
bsp4 <- krige.bayes(s100, loc = loci, prior = pr, output = output.control(n.post = 10000))

使用直方图来观察模型参数的后验分布的样本

par(mfrow = c(1, 3), mar = c(3, 3, 1, 0.2), mgp = c(2, .8, 0), cex = 1.2)
hist(bsp4$posterior$sample$beta, main = "", xlab = expression(beta), prob = T)
hist(bsp4$posterior$sample$sigmasq, main = "", xlab = expression(sigma^2), prob = T)
hist(bsp4$posterior$sample$phi, main = "", xlab = expression(phi), prob = T)

比较估计的贝叶斯变差和样本的经验变差

par(mar = c(3, 3, .5, .5), mgp = c(2, .8, 0))
plot(bin1, ylim = c(0, 1.5))
lines(bsp4, max.dist = 1.2, summ = mean)
lines(bsp4, max.dist = 1.2, summ = median, lty = 2)
lines(bsp4, max.dist = 1.2, summ = "mode", post = "par", lwd = 2, lty = 2)
legend(0.25, 0.4,
  legend = c(
    "variogram posterior mean",
    "variogram posterior median",
    "parameters posterior mode"
  ),
  lty = c(1, 2, 2), lwd = c(1, 1, 2), cex = 0.8
)

四个预先给定的位置处的后验预测分布

# curve(dnorm(x, mean = kc4$pred[1], sd = sqrt(kc4$krige.var[1])),
#   from = kc4$pred[1] - 3 * sqrt(kc4$krige.var[1]),
#   to = kc4$pred[1] + 3 * sqrt(kc4$krige.var[1])
# )
par(mfrow = c(2, 2), mar = c(3, 3, .5, .5), mgp = c(1.5, .8, 0))
for (i in 1:4) {
  kpx <- seq(kc4$pred[i] - 3 * sqrt(kc4$krige.var[i]), 
             kc4$pred[i] + 3 * sqrt(kc4$krige.var[i]), l = 100)
  kpy <- dnorm(kpx, mean = kc4$pred[i], sd = sqrt(kc4$krige.var[i]))
  bp <- density(bsp4$predictive$simulations[i, ])
  rx <- range(c(kpx, bp$x))
  ry <- range(c(kpy, bp$y))
  plot(cbind(rx, ry), type = "n", 
       xlab = paste("Location", i), ylab = "density", 
       xlim = c(-4, 4), ylim = c(0, 1.1))
  lines(kpx, kpy, lty = 2)
  lines(bp)
}

贝叶斯预测,这一步时间花费比较大

pred.grid <- expand.grid(seq(0, 1, l = 51), seq(0, 1, l = 51))
# pred.grid <-  expand.grid(seq(0, 1, l = 101), seq(0, 1, l = 101))
## Bayesian prediction
bsp <- krige.bayes(s100, loc = pred.grid, 
                   prior = prior.control(phi.discrete = seq(0, 5, l = 101)), 
                   output = output.control(n.predictive = 2))

将预测结果以图的方式呈现出来

par(mfrow = c(2, 2), mar = c(2.5, 2.5, 3, 0), mgp = c(1.5, .8, 0))
image(bsp, loc = pred.grid, main = "predicted", col = gray(seq(1, 0.1, l = 30)))
image(bsp, val = "variance", loc = pred.grid, main = "prediction variance",
      col = gray(seq(1, 0.1, l = 30)))
image(bsp, val = "simulation", number.col = 1, loc = pred.grid, 
      main = "a simulation from \n the predictive distribution", 
      col = gray(seq(1, 0.1, l = 30)))
image(bsp, val = "simulation", number.col = 2, loc = pred.grid, 
      main = "another simulation from \n the predictive distribution", 
      col = gray(seq(1, 0.1, l = 30)))

2.2.6 模拟高斯随机场

随机抽样

sim1 <- grf(100, cov.pars = c(1, .25))
par(mfrow = c(1, 2), mar = c(2.5, 2.5, 1, 0.2), mgp = c(1.5, .8, 0))
points(sim1, main = "simulated data")
plot(sim1, max.dist = 1, main = "true and empirical variograms")

在网格的格点上采样

sim2 <- grf(441, grid = "reg", cov.pars = c(1, .25))
par(mar = c(4, 4, 1.5, 1.5))
image(sim2, main = "a \"smallish\" simulation", col = gray(seq(1, .1, l = 30)))

模拟大量采样点时,推荐使用 RandomFields 包 (Schlather et al. 2015);下面模拟一个更加精细的随机场

sim3 <- grf(40401, grid = "reg", cov.pars = c(10, .2), method = "RF", RF = TRUE)
par(mar = c(4, 4, 1.5, 1.5))
image(sim3, main = "simulation in a finer grid", col = gray(seq(1, .1, l = 30)))

带边界的模拟,即在一个有限区域内,随机采样

data(parana)
pr1 <- grf(100, cov.pars = c(200, 40), borders = parana$borders, mean = 500)
points(pr1)

或者在规则网格上采样

pr1 <- grf(100, grid = "reg", cov.pars = c(200, 40), borders = parana$borders)
points(pr1)

控制水平或垂直方向上的采样密度

pr1 <- grf(100, grid = "reg", nx = 10, ny = 5, cov.pars = c(200, 40), 
           borders = parana$borders)
points(pr1)

不使用 geoR::grf 和 RandomFields 包,也不使用 mvtnorm::rmvnorm、MASS::mvrnorm、brms::rmulti_normal 产生多元正态分布的随机数,而是根据多元正态分布的定义来模拟:对对称正定的协方差矩阵做 Cholesky 分解,来模拟高斯随机过程

模拟多元正态分布

rmvn <- function(N, mu = 0, V = matrix(1)) {
  # 通过 Cholesky 分解产生 N 组服从 N(mu, V) 的随机向量
  P <- length(mu)
  if (any(is.na(match(dim(V), P)))) {
    stop("Dimension problem!")
  }
  D <- chol(V)
  t(matrix(rnorm(N * P), ncol = P) %*% D + rep(mu, rep(N, P)))
}
matern_fun <- function(u, phi, kappa) {
  # Matern 相关函数,kappa 和 phi 都必须大于 0
  uphi <- u / phi
  uphi <- ifelse(u > 0,
    (2^(-(kappa - 1)) / gamma(kappa)) * (uphi^kappa) * besselK(x = uphi, nu = kappa),
    1)
  # uphi[u > 600 * phi] <- 0 # 比值大于600就设置为0
  return(uphi)
}

# 模拟空间广义线性混合效应模型 stan-SGLMM
sim_data <- function(N = 49, intercept = -1.0, slope1 = 1.0, slope2 = 0.5,
                     lscale = 1, sdgp = 1, cov.model = "exp_quad", type = "binomial"){
  set.seed(2018)

  d <- expand.grid(
    d1 = seq(0, 1, l = sqrt(N)),
    d2 = seq(0, 1, l = sqrt(N))
  )
  D <- as.matrix(dist(d)) # 计算采样点之间的欧氏距离

  switch(cov.model,
    matern = {
      phi <- lscale
      corr_m <- matern_fun(D, phi = phi, kappa = 2)
      # corr_m <- geoR::matern(D, phi = phi, kappa = 2) # kappa = 2 固定的
      # PrevMap::matern.kernel
      m <- sdgp^2 * corr_m
    },
    exp_quad = {
      phi <- 2 * lscale^2 # 二次幂指数核函数 kappa = 2
      m <- sdgp^2 * exp(-D^2 / phi) # 多元高斯分布的协方差矩阵
    },
    exp_linear = {
      phi <- 2 * lscale^2 # 指数核函数,即 kappa = 0.5 的 Matern 函数
      m <- sdgp^2 * exp(-D / phi) # 多元高斯分布的协方差矩阵
    }
  )

  # powered.exponential (or stable)
  # rho(h) = exp[-(h/phi)^kappa] if 0 < kappa <= 2 此处 kappa 固定为 2
  S <- MASS::mvrnorm(1, rep(0, N), m)
  # 模拟两个固定效应
  x1 <- rnorm(N, 0, 1)
  x2 <- rnorm(N, 0, 4)
  switch(type,
    binomial = {
      units.m <- rep(100, N) # N 个 100
      pred <- intercept + slope1 * x1 + slope2 * x2 + S
      mu <- exp(pred) / (1 + exp(pred))
      y <- rbinom(N, size = 100, prob = mu) # 每个采样点抽取100个样本
      data.frame(d, y, units.m, x1, x2)
    },
    poisson = {
      pred <- intercept + slope1 * x1 + slope2 * x2 + S
      y <- rpois(N, lambda = exp(pred)) # lambda 是泊松分布的期望
      # Y ~ Possion(lambda) g(u) = ln(u) u = lambda = exp(g(u))
      data.frame(d, y, x1, x2)
    }
  )
}

模拟数据并恢复参数

binom_data_a1 <- sim_data(
  N = 49, intercept = -1.0, slope1 = 1.0, slope2 = 0.5,
  lscale = 1, sdgp = 1, cov.model = "exp_linear", type = "binomial"
)
fit.binom_data_a1 <- brms::brm(y | trials(units.m) ~ 1 + x1 + x2 + gp(d1, d2),
  data = binom_data_a1,
  chains = 2, thin = 2, iter = 1500, warmup = 500,
  algorithm = "sampling", family = binomial()
)

2.3 泊松分布

调用 geoR 包的 grf 函数模拟高斯随机场 (Gaussian random fields),这里采用 Cholesky 分解法,首先考虑在规则的二维平面的格点上采样。第 \(i\) 个格点的坐标为 \(w_i = (d_{i1}, d_{i2}) \in \mathbb{R}^2\),高斯过程的协方差函数为

\[ \mathrm{Cov}(S(w_i), S(w_j)) = \sigma^2 \exp( -\|w_i - w_j\|_{2} / \phi ) \]

其中 \(\sigma^2 = 0.1, \phi = 1.0\),协方差函数的结构为指数型(即 \(\kappa = 0.5\) 的 Matern 型)

响应变量服从参数为 \(\mu_i\lambda(w_i)\) 的泊松分布,即 \(Y_i \sim \mathrm{Pois}(\mu_i\lambda(w_i))\),其中 \(\mu_i\) 为已知的量(对应代码中的 units.m)

泊松型空间广义线性混合效应模型为

\[ \log\big(\lambda(w_i)\big) = \alpha + S(w_i) \]

library(geoR)

set.seed(2019)
sim.g <- grf(
  grid = expand.grid(x = seq(1, 10, l = 10), y = seq(1, 10, l = 10)),
  cov.pars = c(0.1, 1.0), method = "cholesky",
  kappa = 0.5, nugget = 0, cov.model = "matern"
)

sim_pois <- list(coords = sim.g$coords, units.m = c(rep(1, 50), rep(5, 50)))
attr(sim_pois, "class") <- "geodata"
sim_pois$data <- rpois(100, lambda = sim_pois$units.m * exp(sim.g$data))

par(mar = c(4.1, 4.1, 0.5, 0.5))
plot(sim_pois$coords[, 1], sim_pois$coords[, 2], type = "n", 
     xlab = expression(d[1]), ylab = expression(d[2]))
text(sim_pois$coords[, 1], sim_pois$coords[, 2], format(sim_pois$data))
points(cbind(c(5, 8), c(5, 9)), cex = 5)

geoRglm 包提供 glsm.mcmc 函数实现了 Langevin-Hastings MCMC 算法,用于模拟条件分布,调整参数,使得采样接受率达到 60% 左右,这是 Langevin-Hastings MCMC 算法的最优接受率,调参方式是试错法。

library(geoRglm)
# 固定模型参数
model1 <- list(cov.pars = c(1, 1), beta = 1, family = "poisson")
# # 调接受率
# mcmc1.test <- mcmc.control(S.scale = 0.2, thin = 1)
# test1.tune <- glsm.mcmc(sim_pois, model = model1, mcmc.input = mcmc1.test)
# # 再试一组参数
# mcmc1.tune <- mcmc.control(S.scale = 0.5, thin = 1)
# test1.tune <- glsm.mcmc(sim_pois, model = model1, mcmc.input = mcmc1.tune)
# 现在接受率是 0.612 接近 60%
mcmc2.tune <- mcmc.control(S.scale = 0.4, thin = 1)
test2.tune <- glsm.mcmc(sim_pois, model = model1, mcmc.input = mcmc2.tune)

调用 coda 包(Plummer et al. 2006) 处理模拟结果,主要是评估链条的收敛性和混合效果。推荐用户画出所有随机效应的迭代轨迹图和自相关图,篇幅所限,我们这里观察一个随机效应的情况,实际上我们这里有100个。

library(coda)
# test2.tune.c <- create.mcmc.coda(test2.tune, mcmc.input = mcmc2.tune)
# 第45行即坐标 (5,5)
test2.tune.c <- create.mcmc.coda(test2.tune$simulations[45, ],
                                 mcmc.input = list(S.scale = 0.4, thin = 1))
par(mfrow = c(1, 2))
plot(test2.tune.c, density = FALSE, ask = FALSE, auto.layout = FALSE)
autocorr.plot(test2.tune.c, ask = FALSE, auto.layout = FALSE)

为了减少样本之间的自相关性,我们将抽样间隔设为 10,意味着每 10 次迭代抽样 1 次,并将前 2000 次迭代的结果丢弃。

mcmc1 <- mcmc.control(S.scale = 0.4, thin = 10, burn.in = 2000)
test1 <- glsm.mcmc(sim_pois, model = model1, mcmc.input = mcmc1)

广义线性空间模型预测

out1 <- output.glm.control(sim.predict = TRUE)
pred.test1 <- glsm.krige(mcmc.output = test1, locations = cbind(c(5, 8), c(5, 9)), 
                         output = out1)
## Warning in remove(list = c("v0", "invcov", "b")): object 'invcov' not found

这里用于预测的 kriging 方法是简单 kriging (simple kriging) 而不是普通 kriging (ordinary kriging),因为这里的参数都是固定的。Ordinary 和 universal kriging 在高斯模型中相当于对参数 \(\beta\) 添加均匀先验 (uniform flat prior),在 GLSM 中,上述预测方法还没有实现。在贝叶斯推断中,我们对参数 \(\beta\) 添加了均匀先验,函数 pois.krige 和 binom.krige 也实现了上述预测方法。

预测值和估计的蒙特卡罗标准误差

cbind(pred.test1$predict, pred.test1$mcmc.error)
##           [,1]        [,2]
## [1,] 0.7649463 0.018401031
## [2,] 0.4006933 0.007993234

我们注意到蒙特卡罗标准误差(由蒙特卡罗抽样带来的误差)相比于预测值来说可以忽略,说明抽样过程是令人满意的。

par(mfrow = c(1, 2))
hist(pred.test1$simulations[1, ], main = "(5, 5)")
hist(pred.test1$simulations[2, ], main = "(8, 9)")

2.3.1 贝叶斯 MCMC I

在贝叶斯框架下,函数 pois.krige.bayes 和 binom.krige.bayes 分别实现了泊松对数线性模型和二项逻辑线性模型,模型参数可以设置为固定或随机。

以 geoRglm 包自带的模拟数据集 p50 为例,无块金效应,在规则的二维网格上模拟的,响应变量服从泊松分布,空间效应的核函数是指数型的,即相当于 Matern 型核函数参数 \(\kappa = 0.5\),协方差参数 \(\phi = 0.1, \sigma^2 = 1.0\)

\[\log(\lambda_i) = \beta + S(w_i)\]

\(Y_i \sim \mathrm{Poisson}(\mu_i\lambda_i)\),p50$data 相当于 \(Y_i\) 的观测值,p50$coords 是 \(50\times2\) 的坐标矩阵,p50$units.m 是 \(\mu_i\)。如图所示,橘黄色的点是采样的位置,点的大小代表 p50$units.m 的值,点旁边的数字是响应变量 \(Y_i\) 的取值。

data(p50)
par(mar = c(4.1, 4.1, 0.5, 0.5))
plot(p50$coords[, 1], p50$coords[, 2], type = "n", 
     xlab = expression(d[1]), ylab = expression(d[2]))
points(p50$coords[, 1], p50$coords[, 2], cex = p50$units.m, pch = 16, col = "darkorange")
text(p50$coords[, 1], p50$coords[, 2], format(p50$data), pos = 4, offset = 0)

在这个例子中,我们先不考虑块金效应,只考虑参数 \(\beta\) 和 \(\sigma^2\)。首先还是调参数,使得 MCMC 算法的接受率接近 60%。

prior5 <- prior.glm.control(phi.prior = "fixed", phi = 0.1)
mcmc5.tune <- mcmc.control(S.scale = 0.01, thin = 1)
test5.tune <- pois.krige.bayes(p50, prior = prior5, mcmc.input = mcmc5.tune)

试错法找到参数后,开始做贝叶斯推断,每隔 100 次迭代采一个样本,迭代 100000 次,共获得 1000 个样本

mcmc5 <- mcmc.control(S.scale = 0.075, thin = 100, burn.in = 10000)
out5 <- output.glm.control(threshold = 10, quantile = c(0.05, 0.99))
test5 <- pois.krige.bayes(p50,
  locations = t(cbind(c(2.5, 3), c(-6050, -3270))),
  prior = prior5, mcmc.input = mcmc5, output = out5
)

输出结果 test5 对象主要包含 5 个部分,分别是 posterior、predictive、model、prior 和 mcmc.input。posterior 包含参数的后验分布和给定坐标处 \(g^{-1}(S)\) 的条件模拟(其实就是随机效应的模拟分布)。predictive 包含预测信息:predictive$median 和 predictive$uncertainty 分别是预测的信号值和不确定性。参数 threshold = 10 使输出中包含预测分布小于 10 的概率 test5$predictive$probability,quantile = c(0.05, 0.99) 给出预测分布的两个分位点 test5$predictive$quantiles。

下面是几个点处的后验分布的模拟

par(mfrow = c(1, 3))
hist(test5$posterior$simulations[10, ], main="(9, 0)")
hist(test5$posterior$simulations[23, ], main="(2, 2)")
hist(test5$posterior$simulations[36, ], main="(5, 3)")

接下来,我们考虑带块金效应的模型,且块金效应和方差之比为 \(\tau^2/\sigma^2 = 0.05\),将参数 \(\phi\) 视为随机的,通过参数 phi.discrete 指定参数 \(\phi\) 的离散先验,参数 tausq.rel = 0.05 指定相对块金效应。首先还是调参,使得采样的接受率达到 60% 且 \(\phi\) 的接受率达到 25% 至 35%。

mcmc6.tune <- mcmc.control(S.scale = 0.075, n.iter = 2000, thin = 100, phi.scale = 0.01)
prior6 <- prior.glm.control(phi.prior = "uniform", phi.discrete = seq(0.02, 1, 0.02), tausq.rel = 0.05)
test6.tune <- pois.krige.bayes(p50, prior = prior6, mcmc.input = mcmc6.tune)

调参之后,我们运行模拟,设置 40 万次迭代,采样间隔 200,前 5000 次迭代结果丢弃,这一步模拟需要花费相当长的时间。

mcmc6 <- mcmc.control(
  S.scale = 0.075, n.iter = 400000, thin = 200,
  burn.in = 5000, phi.scale = 0.12, phi.start = 0.5
)
test6 <- pois.krige.bayes(p50,
  locations = t(cbind(c(2.5, 3.5), c(-60, -37))),
  prior = prior6, mcmc.input = mcmc6
)

模型参数 \(\beta, \sigma^2, \phi\) 的后验分布

par(mfrow = c(1, 3), mar = c(0.5, 4.1, 4.1, 0.5))
hist(test6$posterior$beta$sample, xlab = "", main = expression(beta))
hist(test6$posterior$sigmasq$sample, xlab = "", main = expression(sigma^2))
hist(test6$posterior$phi$sample, xlab = "", main = expression(phi))

2.3.2 贝叶斯 MCMC II

library(fields)
library(spBayes)
library(MBA)

参数设置

n <- 50
coords <- cbind(runif(n, 0, 1), runif(n, 0, 1))
phi <- 3/0.5
sigma.sq <- 2
R <- exp(-phi * iDist(coords))
w <- MASS::mvrnorm(1, rep(0, n), sigma.sq * R)
beta.0 <- 0.1
y <- rpois(n, exp(beta.0 + w))

调整初值

pois.nonsp <- glm(y ~ 1, family = "poisson")
beta.starting <- coefficients(pois.nonsp)
beta.tuning <- t(chol(vcov(pois.nonsp)))
beta.starting
## (Intercept) 
##  -0.3011051
beta.tuning
##             (Intercept)
## (Intercept)    0.164399

模型拟合

n.batch <- 300
batch.length <- 50
n.samples <- n.batch * batch.length
pois.sp.chain.1 <- spBayes::spGLM(y ~ 1,
  family = "poisson", coords = coords,
  starting = list(
    beta = beta.starting, phi = 3 / 0.5, sigma.sq = 1,
    w = 0
  ), tuning = list(
    beta = 0.1, phi = 0.5, sigma.sq = 0.1,
    w = 0.1
  ), priors = list("beta.Flat",
    phi.Unif = c(3 / 1, 3 / 0.1),
    sigma.sq.IG = c(2, 1)
  ), amcmc = list(
    n.batch = n.batch,
    batch.length = batch.length, accept.rate = 0.43
  ), cov.model = "exponential",
  verbose = FALSE, n.report = 500
)
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored
## -------------------------------------------------
pois.sp.chain.2 <- spBayes::spGLM(y ~ 1,
  family = "poisson", coords = coords,
  starting = list(
    beta = beta.starting, phi = 3 / 0.5, sigma.sq = 1,
    w = 0
  ), tuning = list(
    beta = 0.1, phi = 0.5, sigma.sq = 0.1,
    w = 0.1
  ), priors = list("beta.Flat",
    phi.Unif = c(3 / 1, 3 / 0.1),
    sigma.sq.IG = c(2, 1)
  ), amcmc = list(
    n.batch = n.batch,
    batch.length = batch.length, accept.rate = 0.43
  ), cov.model = "exponential",
  verbose = FALSE, n.report = 500
)
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored
## -------------------------------------------------
pois.sp.chain.3 <- spBayes::spGLM(y ~ 1,
  family = "poisson", coords = coords,
  starting = list(
    beta = beta.starting, phi = 3 / 0.5, sigma.sq = 1,
    w = 0
  ), tuning = list(
    beta = 0.1, phi = 0.5, sigma.sq = 0.1,
    w = 0.1
  ), priors = list("beta.Flat",
    phi.Unif = c(3 / 1, 3 / 0.1),
    sigma.sq.IG = c(2, 1)
  ), amcmc = list(
    n.batch = n.batch,
    batch.length = batch.length, accept.rate = 0.43
  ), cov.model = "exponential",
  verbose = FALSE, n.report = 500
)
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored
## -------------------------------------------------

调用 coda 包分析 MCMC 迭代的过程,如参数的 traceplot

# 3 条迭代链条
library(coda)
samps <- mcmc.list(
  pois.sp.chain.1$p.beta.theta.samples, 
  pois.sp.chain.2$p.beta.theta.samples,
  pois.sp.chain.3$p.beta.theta.samples
)
plot(samps)

gelman.diag(samps)
## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## (Intercept)       1.02       1.04
## sigma.sq          1.04       1.08
## phi               1.00       1.01
## 
## Multivariate psrf
## 
## 1.02
gelman.plot(samps)

burn.in <- 10000
print(round(summary(window(samps, start = burn.in))$quantiles[, c(3, 1, 5)], 2))
##               50%  2.5% 97.5%
## (Intercept) -0.75 -1.53 -0.17
## sigma.sq     0.96  0.36  2.51
## phi         15.78  4.35 29.27
samps <- as.matrix(window(samps, start = burn.in))
w <- cbind(pois.sp.chain.1$p.w.samples[, burn.in:n.samples],
           pois.sp.chain.2$p.w.samples[, burn.in:n.samples],
           pois.sp.chain.3$p.w.samples[, burn.in:n.samples])
beta.0.hat <- mean(samps[, "(Intercept)"])
w.hat <- apply(w, 1, mean)
y.hat <- exp(beta.0.hat + w.hat)

用 MBA 包绘制预测曲面

par(mfrow = c(1, 2))
surf <- MBA::mba.surf(cbind(coords, y), no.X = 100, no.Y = 100, extend = TRUE)$xyz.est
fields::image.plot(surf, main = "Observed counts")
points(coords)

surf <- MBA::mba.surf(cbind(coords, y.hat), no.X = 100, no.Y = 100, extend = TRUE)$xyz.est
fields::image.plot(surf, main = "Fitted counts")
points(coords)

用后验中位数 (median) 来预测

pred.coords <- as.matrix(expand.grid(
  seq(0.01, 0.99, length.out = 20),
  seq(0.01, 0.99, length.out = 20)
))
pred.covars <- as.matrix(rep(1, nrow(pred.coords)))
pois.pred <- spPredict(pois.sp.chain.1,
  start = 10000, thin = 10,
  pred.coords = pred.coords, pred.covars = pred.covars, verbose = FALSE
)
y.pred <- apply(pois.pred$p.y.predictive.samples, 1, median)

画图

par(mfrow = c(1, 2))
surf <- mba.surf(cbind(coords, y),
  no.X = 100, no.Y = 100, extend = TRUE
)$xyz.est
image.plot(surf, main = "Observed counts")
points(coords)

surf <- mba.surf(cbind(pred.coords, y.pred),
  no.X = 100, no.Y = 100, extend = TRUE
)$xyz.est
image.plot(surf, main = "Predicted counts")
points(pred.coords, pch = "x", cex = 1)
points(coords, pch = 19, cex = 1)

2.4 二项分布

2.4.1 贝叶斯 MCMC I

# 产生随机数
set.seed(2018)
sim <- grf(grid = expand.grid(x = seq(1, 7, l = 7),
                              y = seq(1, 7, l = 7)), 
           cov.pars = c(0.1, 2),
           kappa = 0.5, nugget = 0)
sim$units.m <- rep(100, 49)
sim$data <- rbinom(49, size = rep(100, 49), 
                   prob = exp(sim$data) / (1 + exp(sim$data)))

可视化模拟数据

plot(sim$coords, type = "n")
text(sim$coords[, 1], sim$coords[, 2], format(sim$data), cex = 1.5)

先验和参数设置

sim.pr <- prior.glm.control(phi.discrete = seq(0.05, 3, l = 60))
sim.mcmc <- mcmc.control(S.scale = 0.1, phi.scale = 0.04, burn.in = 10000)

模拟和预测

grid0 <- expand.grid(x = seq(1, 7, l = 51), y = seq(1, 7, l = 51))
run.sim <- binom.krige.bayes(sim, locations = grid0, 
                             prior = sim.pr, mcmc.input = sim.mcmc)

检查输出

names(run.sim)
## [1] "posterior"    "predictive"   "model"        "prior"       
## [5] "mcmc.input"   ".Random.seed" "call"
names(run.sim$posterior)
## [1] "phi"         "acc.rate"    "beta"        "sigmasq"     "simulations"
names(run.sim$posterior$phi)
## [1] "sample" "mean"   "var"
run.sim$posterior$phi$mean # mean(run.sim$posterior$phi$sample)
## [1] 1.7232
run.sim$posterior$phi$var
## [1] 0.1456925
run.sim$posterior$sigmasq$mean
## [1] 0.2385613
run.sim$posterior$sigmasq$var
## [1] 0.01187644

检测链条是否收敛,通常采用快速直观的图检验方法:用自相关系数 (ACF) 来看链条的平稳性。如果平稳,ACF 表现为快速衰减的特征;如果不平稳,就不能用这组数据来估计参数(如 \(\beta_0, \beta_1, \beta_2\))。

par(mfrow = c(2, 2), mar = c(2.3, 2.5, .5, .7), mgp = c(1.5, .6, 0), cex = 0.6)
plot(run.sim$posterior$phi$sample, type = "l")
acf(run.sim$posterior$phi$sample)
plot(run.sim$posterior$sim[2, ], type = "l")
acf(run.sim$posterior$sim[2, ])

可视化预测结果

names(run.sim$predictive)
## [1] "simulations" "median"      "uncertainty" "quantiles"
methods(image)
##  [1] image,ANY-method                image,CHMfactor-method         
##  [3] image,dgCMatrix-method          image,dgRMatrix-method         
##  [5] image,dgTMatrix-method          image,dsparseMatrix-method     
##  [7] image,lsparseMatrix-method      image,Matrix-method            
##  [9] image,nsparseMatrix-method      image,RMmodel-method           
## [11] image,spam-method               image,spam.chol.NgPeyton-method
## [13] image.default                   image.glm.krige.bayes*         
## [15] image.grf*                      image.krige.bayes*             
## [17] image.kriging*                  image.plot                     
## [19] image.smooth                    image.spam                     
## [21] image.SpatialGridDataFrame*     image.SpatialPixels*           
## [23] image.SpatialPixelsDataFrame*  
## see '?methods' for accessing help and source code
image(x = run.sim, locations = grid0, values.to.plot = run.sim$predictive$median) 

2.4.2 Bayesian MCMC II

library(spBayes)
library(fields)
library(MBA)
coords <- as.matrix(expand.grid(
  seq(0, 100, length.out = 8),
  seq(0, 100, length.out = 8)
))
n <- nrow(coords)
phi <- 3/50
sigma.sq <- 2
R <- sigma.sq * exp(-phi * as.matrix(dist(coords)))
w <- MASS::mvrnorm(1, rep(0, n), R)
x <- as.matrix(rep(1, n))
beta <- 0.1
p <- 1/(1 + exp(-(x %*% beta + w)))
weights <- rep(1, n)
weights[coords[, 1] > mean(coords[, 1])] <- 10
y <- rbinom(n, size = weights, prob = p)
fit <- glm((y/weights) ~ x - 1, weights = weights, family = "binomial")
beta.starting <- coefficients(fit)
beta.tuning <- t(chol(vcov(fit)))
n.batch <- 200
batch.length <- 50
n.samples <- n.batch * batch.length

Model fitting

m.1 <- spGLM(y ~ 1,
  family = "binomial", coords = coords, weights = weights,
  starting = list(
    beta = beta.starting, phi = 0.06, sigma.sq = 1,
    w = 0
  ), tuning = list(
    beta = beta.tuning, phi = 0.5, sigma.sq = 0.5,
    w = 0.5
  ), priors = list(beta.Normal = list(0, 10), phi.Unif = c(
    0.03,
    0.3
  ), sigma.sq.IG = c(2, 1)), amcmc = list(
    n.batch = n.batch,
    batch.length = batch.length, accept.rate = 0.43
  ), cov.model = "exponential",
  verbose = TRUE, n.report = 50
)
## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored
## ----------------------------------------
##  General model description
## ----------------------------------------
## Model fit with 64 observations.
## 
## Number of covariates 1 (including intercept if specified).
## 
## Using the exponential spatial correlation model.
## 
## Number of MCMC samples 10000.
## 
## Priors and hyperpriors:
##  beta normal:
##      mu: 0.000   
##      sd: 10.000  
## 
## 
##  sigma.sq IG hyperpriors shape=2.00000 and scale=1.00000
## 
##  phi Unif hyperpriors a=0.03000 and b=0.30000
## 
## Adaptive Metropolis with target acceptance rate: 43.0
## -------------------------------------------------
##      Sampling
## -------------------------------------------------
## Batch: 50 of 200, 25.00%
##  parameter   acceptance  tuning
##  beta[0]     54.0        0.17906
##  sigma.sq    38.0        0.43905
##  phi     70.0        0.83265
## -------------------------------------------------
## Batch: 100 of 200, 50.00%
##  parameter   acceptance  tuning
##  beta[0]     56.0        0.24170
##  sigma.sq    50.0        0.43905
##  phi     62.0        1.37280
## -------------------------------------------------
## Batch: 150 of 200, 75.00%
##  parameter   acceptance  tuning
##  beta[0]     42.0        0.28937
##  sigma.sq    40.0        0.43905
##  phi     38.0        1.92871
## -------------------------------------------------
## Sampled: 10000 of 10000, 100.00%
## -------------------------------------------------

Diagnostics

burn.in <- 0.9 * n.samples
sub.samps <- burn.in:n.samples
print(summary(window(m.1$p.beta.theta.samples, start = burn.in)))
## 
## Iterations = 9000:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1001 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                Mean      SD Naive SE Time-series SE
## (Intercept) -0.3023 0.22325 0.007056       0.036922
## sigma.sq     0.8268 0.34595 0.010935       0.065959
## phi          0.1902 0.06928 0.002190       0.007267
## 
## 2. Quantiles for each variable:
## 
##                 2.5%     25%     50%     75%  97.5%
## (Intercept) -0.70886 -0.4599 -0.2902 -0.1536 0.1555
## sigma.sq     0.31978  0.5732  0.7945  1.0147 1.6695
## phi          0.04671  0.1390  0.1925  0.2519 0.2975

Prediction

beta.hat <- m.1$p.beta.theta.samples[sub.samps, "(Intercept)"]
w.hat <- m.1$p.w.samples[, sub.samps]
p.hat <- 1 / (1 + exp(-(x %*% beta.hat + w.hat)))
y.hat <- apply(p.hat, 2, function(p.col) {
  # draw from the posterior predictive using each sampled probability column,
  # not the true p (the original ignored its argument)
  rbinom(n, size = weights, prob = p.col)
})
y.hat.mu <- apply(y.hat, 1, mean)
y.hat.var <- apply(y.hat, 1, var)

Visualize the prediction results

par(mfrow = c(1, 2))
surf <- mba.surf(cbind(coords, y.hat.mu),
  no.X = 100, no.Y = 100,
  extend = TRUE
)$xyz.est
image.plot(surf, main = "Estimated number of successful trials")
text(coords, label = paste("(", y, ")", sep = ""))

surf <- mba.surf(cbind(coords, y.hat.var),
  no.X = 100, no.Y = 100,
  extend = TRUE
)$xyz.est
image.plot(surf, main = paste(
  "Variance of estimated number of successful trials",
  "(observed # of trials)",
  sep = "\n"
))
text(coords, label = paste("(", weights, ")", sep = ""))

2.5 Distribution of nuclear contamination concentration

library(geoR)
library(geoRglm)

2.5.1 Exploratory analysis

The rongelap dataset for Rongelap Island comes from the geoRglm package.

How do likelihood-based methods estimate the model parameters?

data(rongelap)
points(rongelap, xlab = "Coord X", ylab = "Coord Y",
       cex.max = 0.8, col = gray(seq(1, 0.1, l = 100)),
       pt.divide = "equal", weights.divide = "units.m")

Add the empirical variogram

# distance matrix
dist_m <- as.matrix(dist(rongelap$coords))
max.dist <- 0.25 * max(dist_m)
bin1 <- variog(
  coords = rongelap$coords, data = rongelap$data,
  uvec = seq(0, max.dist, length = 50)
)
plot(bin1, main = "classical estimator")

image(dist_m)

The distances fall into two groups, corresponding to the samples on the two fine grids in the upper-right and lower-left.

hist(as.vector(dist_m[upper.tri(dist_m)]))

Rescale the matrix entries to the interval [0, 1] to remove the effect of the measurement unit:

\[x_i^{new} = \frac{x_i - \min(x)}{\max(x) - \min(x)}\]
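
In R this rescaling is a one-liner; a small sketch (the helper name `scale01` is ours), applied to the upper-triangular distances computed above:

```r
# map values into [0, 1]: (x - min) / (max - min)
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
d <- c(1.2, 3.5, 7.8, 2.0)   # e.g. as.vector(dist_m[upper.tri(dist_m)])
scale01(d)                   # smallest value maps to 0, largest to 1
```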

# cloud1 <- variog(rongelap, option = "cloud", max.dist = 1)
# plot(cloud1, main = "classical estimator")
# bin1 <- variog(rongelap, uvec = seq(0, 1, l = 11))
# plot(bin1, main = "classical estimator")
rongelap_fit <- glm(data ~ 1, family = poisson(link = "log"), offset = log(units.m), data = rongelap)
rongelap_fit
## 
## Call:  glm(formula = data ~ 1, family = poisson(link = "log"), data = rongelap, 
##     offset = log(units.m))
## 
## Coefficients:
## (Intercept)  
##       2.014  
## 
## Degrees of Freedom: 156 Total (i.e. Null);  156 Residual
## Null Deviance:       61570 
## Residual Deviance: 61570     AIC: 63090

Diagnostic plots

plot(rongelap_fit)

2.5.2 Markov chain Monte Carlo

For how to estimate the model parameters, see Ribeiro Jr., Christensen, and Diggle (2003) and Christensen and Ribeiro Jr. (2002).

First tune the proposal so that the acceptance rate of the MCMC algorithm is about 60%, then simulate from the posterior distribution.
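
The effect of the proposal scale (what `S.scale` controls in `mcmc.control`) can be seen on a toy one-dimensional random-walk Metropolis sampler for a standard normal target. This is a conceptual sketch, not geoRglm's internal sampler:

```r
set.seed(2018)
# random-walk Metropolis for a N(0, 1) target; returns the acceptance rate
rw.metropolis <- function(scale, n.iter = 5000) {
  x <- 0
  acc <- 0
  for (i in seq_len(n.iter)) {
    prop <- x + rnorm(1, sd = scale)
    # accept with probability min(1, target(prop) / target(x))
    if (log(runif(1)) < dnorm(prop, log = TRUE) - dnorm(x, log = TRUE)) {
      x <- prop
      acc <- acc + 1
    }
  }
  acc / n.iter
}
# larger proposal steps are rejected more often
sapply(c(0.1, 1, 5), rw.metropolis)
```

Tiny steps are almost always accepted but the chain moves slowly; huge steps are mostly rejected. Tuning aims at an intermediate rate such as the 60% targeted here.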

ron1 <- pois.krige(rongelap,
  krige = list(
    cov.pars = c(0.2654, 151.5885),
    beta = 1.8212, nugget = 0.1337
  ),
  mcmc.input = mcmc.control(
    S.scale = 0.5, thin = 10,
    n.iter = 10000, burn.in = 2000
  )
)
hist(ron1$intensity[1,])

Prediction and prediction error at each site

rongelap_pred <- cbind.data.frame(
  mean = rowMeans(ron1$intensity),
  var = apply(ron1$intensity, 1, var)
)

2.5.3 Monte Carlo maximum likelihood

See Christensen (2004)

Choose a set of initial values (where this set of initial values comes from still needs to be explained) and apply the MCML algorithm; setting the Box-Cox transformation parameter \(\lambda = 1\) means that no transformation is applied to the original data.
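
The Box-Cox family behind the "boxcox" link, sketched in base R (the helper name `boxcox.tr` is ours), confirms that \(\lambda = 1\) is only a shift by one:

```r
# Box-Cox transform: log(y) when lambda = 0, (y^lambda - 1) / lambda otherwise
boxcox.tr <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
boxcox.tr(c(1, 2, 4), lambda = 1)  # 0 1 3: just y - 1, a shift only
boxcox.tr(c(1, 2, 4), lambda = 0)  # log(y)
```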

# sigmasq = 2.40, phi = 340
# tausq = 2.075 * 2.40 = 2.075 * sigmasq
MCmle.input.fixed <- glsm.mcmc(rongelap,
  model = list(
    family = "poisson", link = "boxcox", cov.pars = c(2.40, 340),
    beta = 6.2, nugget = 2.075 * 2.40, lambda = 1
  ),
  mcmc.input = mcmc.control(S.scale = 0.5, thin = 20, burn.in = 10000)
)

Diagnose convergence

library(coda)
MCmle.coda <- create.mcmc.coda(MCmle.input.fixed,
  mcmc.input = list(thin = 20, burn.in = 10000)
)

By default, plot(MCmle.coda) draws the sampled distribution for every site; to save space, we plot only one site

# chain length after thinning: 1000 = 20000 / 20
plot(MCmle.coda[,"S[1]"])

autocorr.plot(MCmle.coda[,"S[1]"])

With a nugget effect

# profile maximum likelihood 
mcmcobj <- prepare.likfit.glsm(MCmle.input.fixed)
# no Box-Cox transformation: lambda = 1
lik.boxcox.1.expon.nugget <- likfit.glsm(mcmcobj, ini.phi = 100, fix.nugget.rel = FALSE)
lik.boxcox.1.expon.nugget
## likfit.glsm: estimated model parameters:
##      beta   sigmasq       phi tausq.rel 
## "  6.188" "  2.404" "338.080" "  2.053" 
## 
##  likfit.glsm : maximised log-likelihood = 0.002419

Two-dimensional profile likelihood contours for the parameters \(\phi\) and \(\tau^2_{rel} = \tau^2/\sigma^2\)

pr.lik.rongelap <- proflik.glsm(mcmcobj, lik.boxcox.1.expon.nugget,
  phi.values = seq(8, 40) * 20,
  nugget.rel.values = seq(1, 3, l = 21)
)

Contour plot of the likelihood surface

par(cex = 0.9, mar = c(4.5, 4.5, 0.5, 0.5))
plot(pr.lik.rongelap, levels = seq(-5, 1, by = 0.05), labcex = .55)

ngx <- 126
ngy <- 75
temp.grid <- polygrid(seq(-6250, 0, l = ngx), seq(-3600, 100, l = ngy), 
                      borders = rongelap$borders, vec.inout = TRUE)
ng <- ngx * ngy
grid <- temp.grid$xypoly
grid.ind <- temp.grid$vec.inout
rm(temp.grid)
# prediction and variance at each site
emp.mean <- apply(MCmle.input.fixed$sim, 1, mean)
emp.var <- cov(t(MCmle.input.fixed$sim))
nug.value <- lik.boxcox.1.expon.nugget$nugget.rel * lik.boxcox.1.expon.nugget$cov.pars[1]
result <- krige.conv(
  data = emp.mean - lik.boxcox.1.expon.nugget$beta, coords = rongelap$coords,
  locations = grid, krige = krige.control(
    type.krige = "sk", beta = 0,
    cov.pars = lik.boxcox.1.expon.nugget$cov.pars,
    nugget = nug.value
  )
)
rongelap.mean <- result$predict + lik.boxcox.1.expon.nugget$beta + 1
d0 <- loccoords(coords = rongelap$coords, locations = grid)
v0 <- ifelse(d0 > 1e-10, cov.spatial(obj = d0, cov.pars = lik.boxcox.1.expon.nugget$cov.pars),
  lik.boxcox.1.expon.nugget$cov.pars[1] + nug.value
)
invcov <- varcov.spatial(
  coords = rongelap$coords, nugget = nug.value,
  cov.pars = lik.boxcox.1.expon.nugget$cov.pars, inv = TRUE
)
AA <- t(v0) %*% invcov$inverse
rongelap.var <- diag(AA %*% emp.var %*% t(AA)) + result$krige.var
par(mfrow = c(2, 1), mar = c(3, 3, 1, .75), mgp = c(2, .6, 0), cex = 0.8)
grid.mat <- rep(NA, ng)
grid.mat[grid.ind == TRUE] <- rongelap.mean
plot(cbind(c(-6200, -100), c(-3500, 50)),
  type = "n",
  xlab = "Coordinate X (m)", ylab = "Coordinate Y (m)"
)
image(result,
  locations = expand.grid(seq(-6250, 0, l = ngx), seq(-3600, 100, l = ngy)),
  values = grid.mat, col = gray(seq(1, 0, l = 30)),
  x.leg = c(-6000, -3000), y.leg = c(-700, -300),
  cex.leg = 0.7, add = TRUE, zlim = c(3, 13)
)

lines(rongelap$borders, lwd = 2.5)
grid.mat[grid.ind == TRUE] <- sqrt(rongelap.var)
plot(cbind(c(-6200, -100), c(-3500, 50)),
  type = "n",
  xlab = "Coordinate X (m)", ylab = "Coordinate Y (m)"
)
image(result,
  locations = expand.grid(seq(-6250, 0, l = ngx), seq(-3600, 100, l = ngy)),
  values = grid.mat, col = gray(seq(1, 0, l = 30)),
  x.leg = c(-6000, -3000), y.leg = c(-700, -300),
  cex.leg = 0.7, add = TRUE, zlim = c(0, 5)
)
lines(rongelap$borders, lwd = 2.5)

## the prediction uncertainties for data locations :

# sqrt(rongelap.var)[sqrt(rongelap.var) < 1]
summary(sqrt(rongelap.var)[sqrt(rongelap.var) < 1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.06271 0.10193 0.14228 0.13541 0.16481 0.19074
## the prediction uncertainties for other locations :

# sqrt(rongelap.var)[sqrt(rongelap.var) > 1]
summary(sqrt(rongelap.var)[sqrt(rongelap.var) > 1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.334   2.468   2.519   2.502   2.549   2.631
### prediction when assuming that tausq is measurement error

result2 <- krige.conv(
  data = emp.mean - lik.boxcox.1.expon.nugget$beta,
  coords = rongelap$coords, locations = grid,
  krige = krige.control(
    type.krige = "sk", beta = 0,
    cov.pars = lik.boxcox.1.expon.nugget$cov.pars,
    nugget = nug.value
  ), output = output.control(signal = TRUE)
)
rongelap.mean2 <- result2$predict + lik.boxcox.1.expon.nugget$beta + 1
v0.2 <- cov.spatial(obj = d0, cov.pars = lik.boxcox.1.expon.nugget$cov.pars)
AA.2 <- t(v0.2) %*% invcov$inverse
rongelap.var2 <- diag(AA.2 %*% emp.var %*% t(AA.2)) + result2$krige.var
par(mfrow = c(2, 1), mar = c(3, 3, 1, .75), mgp = c(2, .6, 0), cex = 0.8)
grid.mat <- rep(NA, ng)
grid.mat[grid.ind == TRUE] <- rongelap.mean2
plot(cbind(c(-6200, -100), c(-3500, 50)),
  type = "n",
  xlab = "Coordinate X (m)", ylab = "Coordinate Y (m)"
)
image(result2,
  locations = expand.grid(seq(-6250, 0, l = ngx), seq(-3600, 100, l = ngy)), 
  values = grid.mat,
  col = gray(seq(1, 0, l = 30)), x.leg = c(-6000, -3000), y.leg = c(-700, -300),
  cex.leg = 0.7, add = TRUE, zlim = c(6, 11)
)
lines(rongelap$borders, lwd = 2.5)
grid.mat[grid.ind == TRUE] <- sqrt(rongelap.var2)
plot(cbind(c(-6200, -100), c(-3500, 50)),
  type = "n",
  xlab = "Coordinate X (m)", ylab = "Coordinate Y (m)"
)
image(result2,
  locations = expand.grid(seq(-6250, 0, l = ngx), seq(-3600, 100, l = ngy)), 
  values = grid.mat,
  col = gray(seq(1, 0, l = 30)), x.leg = c(-6000, -3000), y.leg = c(-700, -300),
  cex.leg = 0.7, add = TRUE, zlim = c(0.6, 1.5)
)
lines(rongelap$borders, lwd = 2.5)

## the prediction uncertainties for data locations :

# sqrt(rongelap.var2)[sqrt(rongelap.var) < 1]
summary(sqrt(rongelap.var2)[sqrt(rongelap.var) < 1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6991  0.9944  1.0895  1.0370  1.1196  1.2143
## the prediction uncertainties for other locations :

# sqrt(rongelap.var2)[sqrt(rongelap.var) > 1]
summary(sqrt(rongelap.var2)[sqrt(rongelap.var) > 1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7154  1.0750  1.1878  1.1433  1.2511  1.4098

2.5.4 Bayesian MCMC I

geoRglm

2.5.5 Bayesian MCMC II

Results computed with the MCMC-based spBayes package and the fields package (Douglas Nychka et al. 2017)

2.5.6 Restricted maximum likelihood

Results computed with the likelihood-based spaMM package (Rousset and Ferdy 2014)

# library(fields)
library(spaMM)
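
A minimal hedged sketch of a spaMM fit for the rongelap counts; `Matern(1 | cX + cY)` is spaMM's formula syntax for a spatial random effect over the coordinates, and `method = "REML"` requests restricted likelihood. The data frame layout below is our own rearrangement of the geodata object:

```r
library(spaMM)
# rearrange the geodata object into a plain data frame
rongelap.df <- data.frame(
  counts = rongelap$data, units.m = rongelap$units.m,
  cX = rongelap$coords[, 1], cY = rongelap$coords[, 2]
)
# Poisson SGLMM with log(units.m) offset and a Matern spatial random effect;
# fitme estimates the Matern range and smoothness together with the fixed effects
fit.spamm <- fitme(counts ~ 1 + offset(log(units.m)) + Matern(1 | cX + cY),
                   family = poisson(link = "log"),
                   method = "REML", data = rongelap.df)
summary(fit.spamm)
```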

2.5.7 Low-rank approximation

Low-rank estimation

The PrevMap package uses a spectral (singular value) decomposition of the random process to reduce its rank, with the aim of reducing the dimension of the spatial random effect.
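
The rank-reduction idea can be sketched with an eigendecomposition: keep the k leading eigenpairs of the covariance matrix, so the N-dimensional spatial effect is represented in k dimensions. This is a conceptual sketch of the principle, not PrevMap's exact algorithm:

```r
# exponential covariance on a 15 x 15 unit grid, sigma^2 = 1, phi = 0.5
coords.lr <- expand.grid(d1 = seq(0, 1, length.out = 15),
                         d2 = seq(0, 1, length.out = 15))
Sigma <- exp(-as.matrix(dist(coords.lr)) / 0.5)
e <- eigen(Sigma, symmetric = TRUE)
rank.k.error <- function(k) {
  Uk <- e$vectors[, 1:k, drop = FALSE]
  Sigma.k <- Uk %*% (e$values[1:k] * t(Uk))   # rank-k reconstruction
  sqrt(sum((Sigma - Sigma.k)^2) / sum(Sigma^2))  # relative Frobenius error
}
sapply(c(10, 25, 50), rank.k.error)  # error shrinks as k grows
```

Keeping k = 25 components already reproduces the 225 x 225 covariance closely, which is the dimension reduction that makes estimation cheaper.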

2.5.8 Laplace approximation INLA

INLA: the integrated nested Laplace approximation algorithm

2.6 Distribution of malaria prevalence

Loa loa (eye worm) infection in Cameroon and surrounding areas

How machine learning algorithms can be brought into spatial statistics: analyzing the loaloa data with random forests, see https://github.com/thengl/GEOSTAT

How does the spatial effect map onto features in machine learning, i.e. what does the spatial random process correspond to in a machine learning model?

2.7 Generalized linear mixed effects models

Compare the three likelihood-based frequentist approaches, in particular their scope of application, covering implementations for both ordinary random effects and spatial effects

2.7.1 Model structure

2.7.2 Parameter estimation

The ordinary generalized linear mixed effects model and the generalized linear mixed effects model with a spatial effect

  1. Outline the GLMM model structure
  2. Issues in parameter estimation

The theoretical basis of the glmmPQL implementation

Different methods use different approximations and express the parameters in different forms. All of them should be written out in a unified notation and then, based on their R implementations, compared in a table: the theoretical advantages and disadvantages, the current state of each implementation, and the problems one may face when applying each algorithm.

Approximating the integral over the random effects: different methods suit different kinds of random effects and different random-effect dimensions

The glmmPQL function in the MASS package estimates parameters by penalized quasi-likelihood (PQL), obtaining a closed-form solution through approximation

The GLMMadaptive package approximates the random-effects integral with adaptive Gauss-Hermite quadrature

The glmer function in the lme4 package offers both the Laplace approximation and the adaptive Gauss-Hermite quadrature approximation
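
For an ordinary GLMM with a single scalar random intercept, the two approximations in glmer are selected by the `nAGQ` argument; a minimal sketch on simulated binomial data (the simulation itself is ours):

```r
library(lme4)
set.seed(2018)
# 20 groups of 10 binary observations, random intercept with sd 1
g <- gl(20, 10)
b <- rnorm(20, sd = 1)
y <- rbinom(200, size = 1, prob = plogis(0.5 + b[g]))
dat.glmm <- data.frame(y = y, g = g)
# nAGQ = 1 is the Laplace approximation (the default)
m.laplace <- glmer(y ~ 1 + (1 | g), data = dat.glmm, family = binomial, nAGQ = 1)
# nAGQ = 25 uses 25 adaptive Gauss-Hermite quadrature points
# (only available for a single scalar random effect)
m.aghq <- glmer(y ~ 1 + (1 | g), data = dat.glmm, family = binomial, nAGQ = 25)
c(laplace = fixef(m.laplace), aghq = fixef(m.aghq))
```

The two intercept estimates are close here; the quadrature approximation matters more for binary data with small groups.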

2.7.3 Numerical simulation

# reproducibility
set.seed(2018)
# spatial locations
coords <- expand.grid(
  d1 = seq(0, 1, length.out = 15),
  d2 = seq(0, 1, length.out = 15)
)
# hyperparameter settings
sigma_sq <- 1.5^2
phi <- 2
# covariance matrix of the spatial effect
D <- sigma_sq * exp(-as.matrix(dist(coords)) / phi)
# regression coefficients
beta0 <- 0.5
beta1 <- 0.3
# covariate
x <- rnorm(15 * 15, 0, 1)
library(MASS)
# one draw of the linear predictor from the multivariate Gaussian
# (mvrnorm(2, ...) would return a 2 x 225 matrix; a single length-225 vector is needed)
eta <- mvrnorm(1, mu = beta0 + beta1 * x, Sigma = D)
# response variable
y <- rpois(15 * 15, lambda = exp(eta))
# assemble the data frame
dat <- cbind.data.frame(coords, x, y, dummy = 1:225)
# estimate the SGLMM parameters by PQL
library(nlme)
fit <- glmmPQL(
  fixed = y ~ 1 + x, random = ~ 1 | dummy,
  family = poisson("log"), data = dat,
  correlation = corExp(1.5, form = ~ d1 + d2, 
                       nugget = TRUE, fixed = FALSE)
)
# initial estimates from a fit without the spatial random effect
fit0 <- glm(y ~ 1 + x, family = poisson("log"), data = dat)
# empirical variogram
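
The empirical variogram hinted at by the comment above can be computed from the Pearson residuals of the non-spatial fit `fit0`: remaining spatial correlation shows up as a variogram that increases with distance. A sketch using geoR's `variog` (geoR was loaded earlier in this chapter):

```r
# variogram of the residuals of the non-spatial Poisson GLM
res <- residuals(fit0, type = "pearson")
vario.res <- variog(coords = as.matrix(coords), data = res, max.dist = 0.7)
plot(vario.res, main = "empirical variogram of the GLM residuals")
```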

Non-Gaussian random effects

https://niasra.uow.edu.au/content/groups/public/@web/@inf/@math/documents/mm/uow236296.pdf

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#spatial-and-temporal-correlation-models-heteroscedasticity-r-side-models

2.8 Software information

The runtime environment for the code in this chapter, for reproducibility

sessionInfo()
## R Under development (unstable) (2019-10-05 r77257)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8.1 x64 (build 9600)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] spaMM_3.0.0     MBA_0.0-9       spBayes_0.4-2   Matrix_1.2-17  
##  [5] Formula_1.2-3   magic_1.5-9     abind_1.4-5     fields_9.9     
##  [9] maps_3.3.0      spam_2.3-0      dotCall64_1.0-0 coda_0.19-3    
## [13] geoRglm_0.9-11  gdtools_0.2.1   geoR_1.7-5.2.1 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2              compiler_4.0.0         
##  [3] tools_4.0.0             digest_0.6.21          
##  [5] nlme_3.1-141            evaluate_0.14          
##  [7] lattice_0.20-38         rlang_0.4.0            
##  [9] parallel_4.0.0          yaml_2.2.0             
## [11] xfun_0.10               stringr_1.4.0          
## [13] knitr_1.25              systemfonts_0.1.1      
## [15] RandomFieldsUtils_0.5.3 svglite_1.2.2          
## [17] pbapply_1.4-2           tcltk_4.0.0            
## [19] rmarkdown_1.16          bookdown_0.14          
## [21] sp_1.3-1                magrittr_1.5           
## [23] codetools_0.2-16        splancs_2.01-40        
## [25] htmltools_0.4.0         RandomFields_3.3.6     
## [27] MASS_7.3-51.4           stringi_1.4.3          
## [29] proxy_0.4-23

References

Banerjee, Sudipto, Bradley P. Carlin, and Alan E. Gelfand. 2015. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. Statistics & Applied Probability. Chapman & Hall/CRC.

Bonat, Wagner Hugo, and Paulo Justiniano Ribeiro. 2016. “Practical Likelihood Analysis for Spatial Generalized Linear Mixed Models.” Environmetrics 27 (2): 83–89. https://onlinelibrary.wiley.com/doi/abs/10.1002/env.2375.

Breslow, N. E., and D. G. Clayton. 1993. “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88 (421): 9–25.

Christensen, O. F., and P. J. Ribeiro Jr. 2002. “geoRglm: A Package for Generalised Linear Spatial Models.” R News 2 (2): 26–28. https://CRAN.R-project.org/doc/Rnews/Rnews_2002-2.pdf.

Christensen, Ole F. 2004. “Monte Carlo Maximum Likelihood in Model-Based Geostatistics.” Journal of Computational and Graphical Statistics 13 (3): 702–18.

Diggle, Peter J., and Emanuele Giorgi. 2016. “Model-Based Geostatistics for Prevalence Mapping in Low-Resource Settings.” Journal of the American Statistical Association 111 (515): 1096–1120. https://doi.org/10.1080/01621459.2015.1123158.

Diggle, Peter, Rana Moyeed, Barry Rowlingson, and Madeleine Thomson. 2002. “Childhood Malaria in the Gambia: A Case-Study in Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 51 (4): 493–506. https://doi.org/10.1111/1467-9876.00283.

Diggle, P. J., J. A. Tawn, and R. A. Moyeed. 1998. “Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (3): 299–350. https://doi.org/10.1111/1467-9876.00113.

Douglas Nychka, Reinhard Furrer, John Paige, and Stephan Sain. 2017. “fields: Tools for Spatial Data.” Boulder, CO, USA: University Corporation for Atmospheric Research. https://doi.org/10.5065/D6W957CT.

Geyer, Charles J. 1994. “On the Convergence of Monte Carlo Maximum Likelihood Calculations.” Journal of the Royal Statistical Society, Series B 56 (1): 261–74.

Gräler, Benedikt, Edzer Pebesma, and Gerard Heuvelink. 2016. “Spatio-Temporal Interpolation Using gstat.” The R Journal 8 (1): 204–18. https://journal.r-project.org/archive/2016/RJ-2016-014/index.html.

Hastie, Trevor. 1996. “Pseudosplines.” Journal of the Royal Statistical Society, Series B 58 (2): 379–96.

Hosseini, Fatemeh. 2016. “A New Algorithm for Estimating the Parameters of the Spatial Generalized Linear Mixed Models.” Environmental and Ecological Statistics 23 (2): 205–17. https://doi.org/10.1007/s10651-015-0335-6.

Liang, Faming, Yichen Cheng, Qifan Song, Jincheol Park, and Ping Yang. 2013. “A Resampling-Based Stochastic Approximation Method for Analysis of Large Geostatistical Data.” Journal of the American Statistical Association 108 (501): 325–39. https://doi.org/10.1080/01621459.2012.746061.

Lindgren, Finn, and Håvard Rue. 2015. “Bayesian Spatial Modelling with R-INLA.” Journal of Statistical Software 63 (19): 1–25.

Blangiardo, Marta, and Michela Cameletti. 2015. Spatial and Spatio-Temporal Bayesian Models with R-INLA. Wiley.

Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “coda: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. https://www.r-project.org/doc/Rnews/Rnews_2006-1.pdf.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. The MIT Press. http://www.gaussianprocess.org/gpml/.

Ribeiro Jr., Paulo J., Ole F. Christensen, and Peter J. Diggle. 2003. “geoR and geoRglm: Software for Model-Based Geostatistics.” Proceedings of DSC. https://www.r-project.org/conferences/DSC-2003/Proceedings/RibeiroEtAl.pdf.

Rousset, François, and Jean-Baptiste Ferdy. 2014. “Testing Environmental and Genetic Effects in the Presence of Spatial Autocorrelation.” Ecography 37 (8): 781–90. http://dx.doi.org/10.1111/ecog.00566.

Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92.

Rue, Håvard, Andrea I. Riebler, Sigrunn H. Sørbye, Janine B. Illian, Daniel P. Simpson, and Finn K. Lindgren. 2017. “Bayesian Computing with INLA: A Review.” Annual Review of Statistics and Its Application 4 (1): 395–421. https://arxiv.org/abs/1604.00860.

Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer. http://lmdvr.r-forge.r-project.org.

Schlather, Martin, Alexander Malinowski, Peter J. Menck, Marco Oesting, and Kirstin Strokorb. 2015. “Analysis, Simulation and Prediction of Multivariate Random Fields with Package RandomFields.” Journal of Statistical Software 63 (8): 1–25. https://www.jstatsoft.org/v63/i08/.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Second. New York, NY: Springer-Verlag. https://doi.org/10.1007/978-3-319-24277-4.

Zhang, Hao. 2002. “On Estimation and Prediction for Spatial Generalized Linear Mixed Models.” Biometrics 58 (1): 129–36.