Part 2 SVCs with GGP-GAMs

The key ideas underpinning the this section are:

  1. Standard linear regression assumes that predictor-to-response relationships to be the same throughout the study area.
  2. This is not the case for many geographic process
  3. Many geographic processes have a Gaussian form when they are examined over 2-dimensional space, as they essentially exhibit distance decay.
  4. The splines introduced in Part 1 can be of different forms. One such form is a Gaussian Process (GP) spline.
  5. GP splines can be specified to model non-linearity (wiggliness) over geographic space if location is included with the covariate.
  6. A GAM with GP splines parameterised by location - a Geographic GP GAM or GGP-GAM - defines a spatially varying coefficient (SVC) model

The first section of this document (Introduction: SVCs) provides a brief introduction to SVCs which seek to quantify how and where statistical relationships vary spatially. The second section (GAMs with GP splines and the GGP-GAM) provides a formal description of GAMs and their extension to SVCs. This has equations and can be skimmed or skipped! The third section (Creating GGP-GAM SVCs) contains the meat of this session, where the Brexit case study is extended an SVC using a Geographical Gaussian Process GAM (GGP-GAM). It describes how to create GAMs with Geographic spline, how to tune the model and how to map and interpret the GGP-GAM SVCs.

Click HERE if you want to jump straight to the coding!

2.1 Introduction: SVCs

As was shown in Part 1, a standard linear regression seeks to model the relationship between a response variable and a series of predictor variables. It estimates a single set of unknown regression coefficients for each predictor (and intercept). It assumes that the response variable has a normal or Gaussian distribution and that the errors assumed independent and identically distributed, It is commonly solved via an ordinary least squares (OLS) approach.

For non-Gaussian response variables (such as integer counts), the OLS regression can be replaced with a Generalized Linear Model (GLM) (Nelder and Wedderburn 1972, McCullagh and Nelder 2019) whose coefficients are typically estimated via maximum likelihood (ML). For greater flexibility in capturing non-linear relationships between the response and predictors, a GLM can be extended by a Generalized Additive Model (GAM) (Hastie 2017) where each regression relationship can be modelled non-linearly,as an arbitrary function, say through splines or some other smoothing tool.

OLS regression and classic forms of GLM assume fixed (constant) coefficient processes. However, in reality, the relationship between the response and predictor variables in a geographical context may change with observation location. There may be a number of reasons for this, such as unaccounted-for local factors or bad measurements (remember that unlike physics, for social geography, urban studies and other related disciplines, there are no universal laws that govern behaviours, preferences, attitudes and sentiment!). Or this could be due to spatial heterogeneity in the process being examined. In this context, Spatially Varying Coefficient (SVC) regression models provide an alternative to ‘whole map’ or global regressions, which may incorrectly assume spatial stationarity in regression coefficients (Openshaw 1996) - i.e. that the relationships between the target variable and the response variables are the same everywhere.

For completeness, the next section provides a full theoretical description of the Geographical Gaussian Process GAM (GGP-GAM) and can be skimmed or skipped and returned to later!

2.2 GAMs with GP splines and the GGP-GAM

GAMs are able to handle many kinds of responses (Wood 2017, Fahrmeir et al. 2021). They generate multiple model terms which are added together and provide an intuitive approach to fit relatively complex relationships in data with complex interactions and non-linearities (Wood 2017). The outputs of GAMs provide readily understood summaries of the relationship between predictor and response variables and how the outcome is modelled. They are able to predict well from complex systems, quantify relationships, and make inferences about these relationships, such as what variables are important and at what range of values they are most influential (Hastie and Tibshirani 1990, Wood 2017). GAMs can perform as well as or better than most machine learning models and they are relatively fast (Friedman 2001, Wood 2017). Importantly, in the context of SVC modelling, GAMs combine predictive power, model accuracy and model transparency and generate “intrinsically understandable white-box machine learning models that provide a technically equivalent, but ethically more acceptable alternative to [machine learning] black-box models(Zschech et al. 2022, p. p2).

Splines are typically used to model non-linear relationships in GAMs, and they can be of different forms depending on the problem and data (Hastie and Tibshirani 1990) . Splines are represented as a linear combination of basis functions, and each basis function is assigned a coefficient that is linearly combined to generate predictions of the outcome variable. Basis functions can be either single or multi-dimensional in terms of predictor variables. As a result, a GAM consists of linear sums of multi-dimensional basis functions, allowing for the modelling of complex relationships. The appropriate degree of “wiggliness” in the spline is determined by the smoothing parameter, which balances over-fitting versus capturing the complexity in the relationship.

In recent work, Fan and Huang (2022) proposed an SVC model via a GLM with reduced-rank thin-plate splines. An alternative approach to SVC modelling is to consider the predictor-to-response relationship over space as a GP. A GP is a random process over functions, and its terms can be modelled using GP splines within a GAM. In this work, low rank GP splines parameterised with location were used as GPs are effective in specifying autocorrelation in spatially-varying random functions (Williams and Rasmussen 2006), and GP-based smoothing using observations at specific locations can identify any spatial trends in the data.

GAMs are able to calibrate regression models where functions of the predictor variables are unknown and of the form:

\[\begin{equation} y = \alpha + f_1(z_1) + f_2(z_2) + \cdots + f_m(z_m) + \epsilon \label{eqn3} \end{equation}\]

where the \(j\)-th predictor \(z_j\) may be a scalar or a vector.

These can be extended such that each \(f_j(z_j)\) is a linear regression coefficient on another scalar predictor \(x_j\):

\[\begin{equation} y = \alpha+f_{0}(z_0) + x_1f_1(z_1) + x_2f_2(z_2) + \cdots + x_mf_m(z_m) + \epsilon \label{eqn4} \end{equation}\]

And, if \(z_0 = z_1 = \cdots z_m = z\) for example, and \(z\) is a vector of spatial coordinates then this specifies a SVC model:

\[\begin{equation} y = \alpha +f_{0}(z) + x_1f_1(z) + x_2f_2(z) + \cdots + x_mf_m(z) + \epsilon \label{eqn5} \end{equation}\] where \(f_{j}(z)\) represents the \(j\)-th SVC.

One way of specifying \(\alpha(z) \cdots f_m(z)\) is that each function is generated from a GP spline with a zero mean defined as follows:

\[\begin{equation} f_{j}(z) = \sum_{p=1}^{P} \kappa_p(z)\gamma_{j,p} \qquad \gamma_{j,p} \sim N(0,s^{2}_{j}) \label{eqn6} \end{equation}\]

where \(\kappa_p(z)\) is a basis function, \(\gamma_{j,p}\) is the corresponding coefficient and \(s^{2}_{j}\) is called the smoothing parameter. The SVC \(f_j(z)\) has a large spatial variation if \(s^{2}_{j}\) is large, while constant if \(s^{2}_{j}\) equals zero. Thus, the smoothing parameter determines “wiggliness” of the SVC.

The basis function \(\kappa_p(z)\) is specified so that \(f_j(z)\) has a spatially smooth map pattern characterized by the following covariance function:

Note that GPs also have a covariance function:

\[\begin{equation} \text{Cov}(f_m(z),f_j(z') = s^{2}_{j}c(d_{z,z'}) \label{eqn7} \end{equation}\]

where \(d_{z,z'}\) is the distance between locations \(z\) and \(z'\). The covariance function \(c(d_{z,z'})\) decreases with \(d_{z,z'}\) increases to model the smooth pattern. This is similar to kriging based models, that also use covariance functions (via semivariograms). It is also similar to MGWR: a covariance function for each \(f_j(z)\) is individually calibrated to optimise model fit, similar to the determination of parameter specific bandwidths. Consequently a core GAM task, in the context of SVCs, is to estimate the smoothness parameters \(s^{2}_{j}\) and so estimate \(f_j(z)\).

A GAM uses smooth functions of the predictor variables in which the values of the explained variable \(y\) are assumed to have an exponential family distribution:

\[\begin{equation} f(y|\theta) = h(y)\exp\left(\nu(\theta)T(x)-A(\theta)\right) \label{eqn8} \end{equation}\]

where \(h(.)\), \(\nu(.)\), \(T(.)\) and \(A(.)\) are known functions, and \(\theta\) is a vector of parameters. This form is flexible and can accommodate a wide range of commonly used distributions such as Gaussian, Poisson, Gamma, or Binomial with a known number of trials.

As explained, GAMs can be used to estimate SVCs if the non-linear relationships are constructed as spatial non-stationary ones. This can be done through GP splines over geographical rather than attribute space: a GP spline with location is specified for each predictor variable \(j\), and because the GP has a mean zero, a further constant \(\alpha_j\) is added as an offset, creating the SVCs \(\beta_j(z)\). In the standard 2D case, \(z_i=(u_i,v_i)\). This is the Geographical Gaussian Process GAM (GGP-GAM).

2.3 Creating GGP-GAM SVCs

If you are in a new R/RStudio session then the first thing to do is to re-load the packages:

# load the packages
library(parlitools) # for UK voting data 
library(tidyverse)  # for data manipulation and plotting
library(sf)         # for spatial data
library(mgcv)       # for GAMs
library(foreach)    # for a repeated operations (used in a single line of code!)
library(cols4all)   # for nice shading in graphs and maps
library(cowplot)    # for managing plots

2.3.1 A simple GGP-SVC

We will load the Brexit data later, but to illustrate GGP-GAMs we will work with a smaller house price dataset for London with fewer variables. These data are projected on OSGB coordinates (ESPSG 27700). The code below downloads two .rda files to your working directory and then loads them into your R session:. These contain an areas of the London boroughs and 316 records of house sales data (from ages ago!). These are both in sp format which has been deprecated.

# download the data files 
download.file("https://github.com/cran/GWmodel/blob/master/data/LondonHP.rda?raw=True",
              "./LondonHP.rda", mode = "wb")
download.file("https://github.com/cran/GWmodel/blob/master/data/LondonBorough.rda?raw=True",
              "./LondonBorough.rda", mode = "wb")
load("LondonHP.rda")
load("LondonBorough.rda")
# check to see what has been loaded
ls()
## [1] "londonborough" "londonhp"

Recall that the aim is to calibrate an SVC model where the spatial coefficient functions are assumed to be realisation of a Gaussian Process (GP) - such as those underlying the models based on Kriging. The models have semivariogram functions - and the semivariogram for each function is individually calibrated to optimise model fit.

The code below creates a new data table (in tibble format) called house with five columns, the price, floor area and easting and northing of each house in the data set, and an intercept column (int) - it is easier to treat the intercept as an addressable term in the model later on, as you will see. It also converts the house price to 1000s of pounds and the coordinates from metres to kilometres.

house <-londonhp %>% 
  st_as_sf() %>% 
  select(price=PURCHASE,floor=FLOORSZ) %>% 
  mutate(price=price/1000) 
## Loading required package: sp
house <- house %>% 
  st_drop_geometry() %>% 
  bind_cols(house %>% st_coordinates() %>% as_tibble()) %>%
  mutate(X=X/1000,Y=Y/1000,int=1)
head(house)
##    price floor     X     Y int
## 0 157.00    77 533.2 159.4   1
## 1 113.50    75 533.3 159.7   1
## 2  81.75    64 532.0 159.8   1
## 3 150.00    95 531.9 160.1   1
## 4 190.00   107 532.8 160.3   1
## 5 159.95   100 535.7 161.7   1

From this, the gam function in mgcv allows a GP varying coefficient model to be fitted.

ggp.gam.1 <- gam(price~ 0 + 
                   int + s(X,Y,bs='gp',by=int) + 
                   floor + s(X,Y,bs='gp',by=floor),
                 data=house)

Notice the 0 + in the model. This indicates that the intercept coefficient is not included implicitly and it is included explicitly as int. Also notice the different form of the splines from those specified in Part 1. Here, for each covariate a GP spline is specified for X and Y (the coordinates in geographic space) and the covariate is passed to the by parameter to generate vary coefficients. The model has 2 key terms:

  • int + (X,Y,bs='gp',by=int) - this is the spatially varying intercept term - the bs='gp' part states that this is a GP model, and by=int suggests the GP should be multiplied by one. The GPs modelled in the gam function all have a mean of zero, so the int (intercept) term indicates that an extra offset is added to the function.
  • floor + s(X,Y,bs='gp',by=floor) - as before this is a GP model, but the by=floor argument suggests that this GP function is multiplied by the floor variable. Again the GP has a mean zero so a fixed term floor is added as an offset here. Together these make a spatially varying coefficient for floor.

The model output can be assessed: the k' and edf parameters are quite close, but the p-values are high and and the k-index is greater than 1 so this looks OK. The diagnostic plots are again generated by the gam.check function as in 2.1:

# check 
gam.check(ggp.gam.1)
The GAM diagnostic plots.

Figure 2.1: The GAM diagnostic plots.

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 0.006960545 .
## The Hessian was positive definite.
## Model rank =  66 / 67 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                k'  edf k-index p-value
## s(X,Y):int   32.0 24.7    1.08    0.94
## s(X,Y):floor 33.0 31.6    1.08    0.94
# model summary
summary(ggp.gam.1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## price ~ 0 + int + s(X, Y, bs = "gp", by = int) + floor + s(X, 
##     Y, bs = "gp", by = floor)
## 
## Parametric coefficients:
##       Estimate Std. Error t value Pr(>|t|)   
## int     18.996      5.966   3.184  0.00163 **
## floor  -18.239     51.423  -0.355  0.72311   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df     F  p-value    
## s(X,Y):int   24.7  27.49 2.278 0.000538 ***
## s(X,Y):floor 31.6  32.17 9.527  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 66/67
## R-sq.(adj) =  0.825   Deviance explained = 97.1%
## GCV = 1216.1  Scale est. = 993.64    n = 316

Here it can be seen that both s(X,Y):int and S(X,Y):floor are significant, suggesting that both terms are spatially varying. Recall that these are referred to as the smooth terms.

We can extract these using predict. For example, to obtain the geographically varying intercept, we create a dummy data set with zero value for the floor area, and the int term set to 1. In the model this sets the regression terms involving the floor variable to zero, so the predicted value is just \(\beta_0\) at each of the house locations.

get_b0 <- house %>% mutate(floor=0,int=1)
house <- house %>% mutate(b0=predict(ggp.gam.1,newdata=get_b0))

Now, if floor is set to 1 instead of 0 in a dummy set, and int is set to zero, the predicted values will be \(\beta_1\).

get_b1 <- house %>% mutate(floor=1,int=0)
house <- house %>% mutate(b1=predict(ggp.gam.1,newdata=get_b1))

So house has two new columns, for b0 and b1. Standard ggplot approaches can be used to map the spatial distribution of the intercept (b0 in the data, formally \(\beta_0\)) and the floor area covariate (b1 in the data, formally \(\beta_1\)) as in Figure 2.2:

tit =expression(paste(""*beta[0]*""))
p1 <- 
  ggplot(data = house, aes(x=X,y=Y,col=b0)) + 
  geom_point() + 
  scale_colour_continuous_c4a_div(palette="brewer.spectral",name=tit) + 
  coord_equal()

tit =expression(paste(""*beta[1]*" "))
p2 <- 
  ggplot(house,aes(x=X,y=Y,col=b1)) + 
  geom_point() + 
  scale_colour_continuous_c4a_div(palette="brewer.prgn",name=tit) + 
  coord_equal()
plot_grid(p1, p2, ncol = 1)
The Intercept and floor area coefficient estimates.

Figure 2.2: The Intercept and floor area coefficient estimates.

Note that for models with more variables, the same technique as used with floor in model specification and varying coefficient extraction may be used.

2.3.2 Further Analysis

The point observations in the house data in this case have limited geographic coverage. Other sf features may be used to illustrate the pattern of geographical variation in a clearer way. The aim is to show estimates of both \(\beta_0\) and \(\beta_1\) using hexagonal grids, and also to show their standard errors. Note that the predict function computes standard errors as well as actual predictions, if the parameter se is TRUE.

Firstly a hexagonal grid is created as an sf object over the area of the londonborough object (in sf format), together with attributes X and Y giving the centroids of each hexagon in OSGB projection (EPSG 27700). These will be used to compute the regression coefficients.

# create the hex grid
hgrid <- st_make_grid(londonborough %>% st_as_sf(),
                      square=FALSE,n=50) %>% 
  st_sf() %>% mutate(id=glue::glue('Hex{sprintf("%03i",row_number())}')) %>%
  st_join(londonborough %>% st_as_sf()) %>% filter(!is.na(B_Name))
# extract and bind the hex grid centriods 
hgrid <- hgrid %>% st_centroid() %>% st_coordinates() %>% 
  {./1000} %>% as_tibble() %>% bind_cols(hgrid,.) %>% 
  # select the attributes of interest
  select(X,Y,id) 
st_crs(hgrid) = 27700

The hgrid object can be used as a framework to generate spatially varying estimates (predictions) of the coefficients using the ggp.gam.1. Notice how each covariate is in turn set to 1 while the others are set to zero to create the SVCs. Essentially this uses the X and Y variables in hgrid to estimate or interpolate each SVC in turn across the geographic space defined by X and Y.

hgrid <- hgrid %>% 
  mutate(b0=predict(ggp.gam.1,
                    newdata=mutate(hgrid,int=1,floor=0)),
         se0=predict(ggp.gam.1,se=TRUE,
                    newdata=mutate(hgrid,int=1,floor=0))$se.fit,
         b1=predict(ggp.gam.1,
                    newdata=mutate(hgrid,int=0,floor=1)),
         se1=predict(ggp.gam.1,se=TRUE,
                  newdata=mutate(hgrid,int=0,floor=1))$se.fit)
head(hgrid)
## Simple feature collection with 6 features and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 502400.4 ymin: 174391.9 xmax: 504736 ymax: 192933
## Projected CRS: OSGB36 / British National Grid
##           X        Y     id                       geometry         b0      se0
## 10 502.9843 175.0661 Hex010 POLYGON ((502984.3 174391.9... -11.923217 184.5767
## 34 503.5682 176.0774 Hex034 POLYGON ((503568.2 175403.2... -12.594749 158.1122
## 41 503.5682 190.2361 Hex041 POLYGON ((503568.2 189561.9...  24.085098 133.3468
## 42 503.5682 192.2588 Hex042 POLYGON ((503568.2 191584.5...  31.325245 161.6515
## 56 504.1521 175.0661 Hex056 POLYGON ((504152.1 174391.9...  -6.817327 169.4032
## 57 504.1521 177.0888 Hex057 POLYGON ((504152.1 176414.6... -13.027121 131.8759
##            b1      se1
## 10 0.05044958 2.959464
## 34 0.46901128 2.466322
## 41 6.12422900 2.268488
## 42 7.86987200 2.762351
## 56 0.24051129 2.586997
## 57 0.80423612 2.002104

Next, the four plots are created as in Figure 2.3:

# Take a colour palette from 'cols4all' 
# 'earth' is a good one as well
pal <-  'classic_orange_white_blue' 
plots <- foreach (v=c('b0','b1','se0','se1')) %do%
  {ggplot(hgrid,aes_string(fill=v,col=v)) + geom_sf() +
      scale_fill_continuous_c4a_div(palette = pal) +
      scale_colour_continuous_c4a_div(palette = pal) +
      theme_bw() +
      theme(axis.title=element_blank(),
            axis.text=element_blank(),
            axis.ticks=element_blank())
      }

plot_grid(plotlist=plots,nrow=2,ncol=2)
The spatially distributed coefficient estimates interpolated over a hexagonal grid, with the standard errors.

Figure 2.3: The spatially distributed coefficient estimates interpolated over a hexagonal grid, with the standard errors.

2.3.3 Back to Brexit

The preceding subsections working with the London house price data highlighted a couple of important considerations when using GGP-GAMs for SVC modelling:

  1. An Intercept attribute needs to be set equal to 1 in the input data, and then explicitly specified in the model in the same was as any other covariate. The model formal syntax should accommodate this to ensure that the model does not fit an intercept as follows
gam(target_variable ~ 0 +
      intercept + s(X,Y, bs='gp', by=intercept) + 
      ...
  1. The input data needs some coordinates which are the covariates in each smooth. These are then multiplied by the variable specified by the by parameter in each smooth s. It is sensible to convert these to projected coordinates rather than degrees and to then think about the units. Here kilometres were specified rather than metres.
  2. Notice how the smooths are specified, with the locational variables as the smooth covariates (first arguments) and the variable as the by parameter:
gam(target_variable ~ 0 +
      intercept + s(X,Y, bs='gp', by=intercept) + 
      x1 + s(X,Y, bs='gp', by=x1) +
      ...
      xp + s(X,Y, bs='gp', by=xp), 
    data = input_data)
  1. The predict function is used to generate the SVCs, from data which has the variable for the SVC being calculated set to 1 and all other model inputs except the locational variables (here X and Y) set to 0:
get_b1 <- house %>% mutate(floor=1,int=0)
house <- house %>% mutate(b1 = predict(ggp.gam.1, newdata = get_b1))
  1. Standard errors (SEs) for the SVCs can be generated from the predict function as well, if the se parameter is set to TRUE:
house <- house %>% 
  mutate(b1_se = predict(ggp.gam.1, se=TRUE, newdata = get_b1)$se.fit)
  1. If the smooths / splines are specified with location, then the predict function can be used to interpolate. In the London house price example this extended the point observations to a hexagonal surface, that had similarly named locational attributes .
hgrid <- 
  hgrid %>% mutate(b1 = predict(ggp.gam.1, newdata=mutate(hgrid,floor=1,int=0)))

All of these consideration can be used to inform an analysis of the Brexit data. The code below loads the Brexit data from the earlier session, transforms the data to OSGB projection (EPSG 27700), creates the X and Y variables and then defines an initial model:

# load the data if needed
hex.sp <- st_read("hex.sp.gpkg", quiet = T)
# create the X and Y variables from transformed data 
coords = st_coordinates(st_centroid(st_transform(hex.sp, 27700)))
## Warning in st_centroid.sf(st_transform(hex.sp, 27700)): st_centroid assumes
## attributes are constant over geometries of x
# head(coords)
hex.sp <- hex.sp %>%
  mutate(Intercept = 1, 
         X = coords[,1], 
         Y = coords[,2])
# create an initial model
ggp.gam.2 = gam(leave~ 0 + 
              Intercept + s(X,Y,bs='gp', by=Intercept) + 
              to15 + s(X,Y,bs='gp', by=to15) +
              over65 + s(X,Y,bs='gp', by=over65) +
              ttu + s(X,Y,bs='gp', by=ttu) +
              lhosp + s(X,Y,bs='gp', by=lhosp) +
              manu + s(X,Y,bs='gp', by=manu) +
              degree + s(X,Y,bs='gp', by=degree) +
              badhealth + s(X,Y,bs='gp', by=badhealth) + 
              bornuk + s(X,Y,bs='gp', by=bornuk), 
            data = hex.sp)            

It is possible to examine the model using the gam.check function, to determine whether it needs to be tuned with a different k parameter.

gam.check(ggp.gam.2)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 8 iterations by steepest
## descent step failure.
## The RMS GCV score gradient at convergence was 0.0660714 .
## The Hessian was not positive definite.
## Model rank =  44 / 305 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                     k'   edf k-index p-value    
## s(X,Y):Intercept 32.00  2.00    0.63  <2e-16 ***
## s(X,Y):to15      33.00  4.47    0.63  <2e-16 ***
## s(X,Y):over65    33.00  3.29    0.63  <2e-16 ***
## s(X,Y):ttu       33.00  3.42    0.63  <2e-16 ***
## s(X,Y):lhosp     33.00  3.02    0.63  <2e-16 ***
## s(X,Y):manu      33.00  3.18    0.63  <2e-16 ***
## s(X,Y):degree    33.00  4.00    0.63  <2e-16 ***
## s(X,Y):badhealth 33.00  3.03    0.63  <2e-16 ***
## s(X,Y):bornuk    33.00  5.21    0.63  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, although the k-index and p-values are low, the k' are not approaching the effective degrees of freedom (edf). This is a reasonable fitted model. You could experiment with different values of k as in the example below):

## not run ##
k.val = 65 # try up around 65
gam.m = gam(leave~ 0 + 
                Intercept + s(X,Y,bs='gp', k = k.val, by=Intercept) + 
                to15 + s(X,Y,bs='gp', k = k.val, by=to15) +
                over65 + s(X,Y,bs='gp', k = k.val, by=over65) +
                ttu + s(X,Y,bs='gp', k = k.val, by=ttu) +
                lhosp + s(X,Y,bs='gp', k = k.val, by=lhosp) +
                manu + s(X,Y,bs='gp', k = k.val, by=manu) +
                degree + s(X,Y,bs='gp', k = k.val, by=degree) +
                badhealth + s(X,Y,bs='gp', k = k.val, by=badhealth) + 
                bornuk + s(X,Y,bs='gp', k = k.val, by=bornuk), 
              data = hex.sp)  
gam.check(gam.m)
summary(gam.m)
## end not run ## 

You can examine the original GGP-GAM model:

summary(ggp.gam.2)

The parametric coefficients and summaries of non-parametric smooths / splines are returned. However we need to do a bit of work to extract the SVCs. This is done below.

The spatially varying coefficients and standard errors can be generated using the function below. This takes either the original data or an interpolation grid (dataset) containing similarly named locations.

# Function for the SVC generation
ggp_gam_svc = function(model, terms, input_data) {
    n_t = length(terms)
    input_data_copy = input_data
    output_data = input_data
    for (i in 1:n_t) {
        zeros = rep(0, n_t)
        zeros[i] = 1
        terms_df = data.frame(matrix(rep(zeros, nrow(input_data)), ncol = n_t, byrow = T))
        names(terms_df) = terms
        input_data_copy[, terms] = terms_df
      betas.i = predict(model, se = TRUE, newdata = input_data_copy)
        b.i = betas.i$fit
        se.i = betas.i$se.fit
        expr1 = paste0("b_", terms[i], "= b.i")
        expr2 = paste0("se_",terms[i], "= se.i")
        output_data = output_data %>% 
            mutate(within(., !!parse(text = expr1))) %>% 
            mutate(within(., !!parse(text = expr2))) 
    }
    output_data$yhat = predict(model, newdata = input_data)
    output_data
}
# the Brexit model
terms = c("Intercept", "to15", "over65", "ttu", "lhosp", "manu", "degree", "badhealth", "bornuk")
hex.svc = ggp_gam_svc(ggp.gam.2, terms, hex.sp)
hex.svc %>% st_drop_geometry() %>% select(starts_with("b_")) %>% summary()
##   b_Intercept            b_to15            b_over65           b_ttu       
##  Min.   :-68.91894   Min.   :-0.66906   Min.   :0.02924   Min.   :0.3186  
##  1st Qu.:-19.93075   1st Qu.:-0.06459   1st Qu.:0.26456   1st Qu.:0.5546  
##  Median :  2.11680   Median : 0.16007   Median :0.35597   Median :0.6525  
##  Mean   :  0.02938   Mean   : 0.16575   Mean   :0.35929   Mean   :0.6418  
##  3rd Qu.: 21.59564   3rd Qu.: 0.40897   3rd Qu.:0.44276   3rd Qu.:0.7282  
##  Max.   : 54.09961   Max.   : 0.90808   Max.   :0.79593   Max.   :1.0531  
##     b_lhosp            b_manu            b_degree        b_badhealth      
##  Min.   :-2.2758   Min.   :-1.04164   Min.   :-1.5642   Min.   :-1.49286  
##  1st Qu.:-0.7892   1st Qu.:-0.29584   1st Qu.:-0.7378   1st Qu.:-0.40824  
##  Median :-0.1053   Median : 0.11466   Median :-0.4468   Median :-0.10344  
##  Mean   :-0.1498   Mean   : 0.09383   Mean   :-0.4750   Mean   :-0.07691  
##  3rd Qu.: 0.4867   3rd Qu.: 0.48447   3rd Qu.:-0.2056   3rd Qu.: 0.28511  
##  Max.   : 1.9637   Max.   : 1.12757   Max.   : 0.6330   Max.   : 1.08972  
##     b_bornuk       
##  Min.   :-0.22077  
##  1st Qu.: 0.07772  
##  Median : 0.25865  
##  Mean   : 0.28928  
##  3rd Qu.: 0.50015  
##  Max.   : 0.88019

We can examine the spatial distribution of the SVCs as in Table 2.1 and see more variation in some covariates than others. Note that many (most) flip signs, showing a positive relationship to the the Brexit vote in some places and a negative relationship in others.

#gam.check(gam.m)
svc.tab <- hex.svc %>% st_drop_geometry() %>% select(starts_with("b_"))
svc.tab = round(t(apply(svc.tab,2,summary)),3)

knitr::kable(svc.tab, booktabs = T, digits = 3, row.names = T, linesep = "",
      caption = "The distributions of the GGP-GAM spatially varying coefficient estimates.")
Table 2.1: The distributions of the GGP-GAM spatially varying coefficient estimates.
Min. 1st Qu. Median Mean 3rd Qu. Max.
b_Intercept -68.919 -19.931 2.117 0.029 21.596 54.100
b_to15 -0.669 -0.065 0.160 0.166 0.409 0.908
b_over65 0.029 0.265 0.356 0.359 0.443 0.796
b_ttu 0.319 0.555 0.652 0.642 0.728 1.053
b_lhosp -2.276 -0.789 -0.105 -0.150 0.487 1.964
b_manu -1.042 -0.296 0.115 0.094 0.484 1.128
b_degree -1.564 -0.738 -0.447 -0.475 -0.206 0.633
b_badhealth -1.493 -0.408 -0.103 -0.077 0.285 1.090
b_bornuk -0.221 0.078 0.259 0.289 0.500 0.880

The SVCs can be mapped to and again a function is define below to do this, resulting in the maps in Figure 2.4. These show some distinct spatial trends. For example

  • Degree % was negatively associated with the Brexit vote but positively associated with it in Scotland.
  • Born UK % was negatively associated in the South East but positively elsewhere.
  • Over 65 % was positively associated throughout the UK but was stronger in the west.
  • etc…
## function for plotting 
# you could explore other palettes
# try entering c4a_gui()
plot_vgam_coef_func = function(svc_data, var.name, tit = NULL) {
  if(is.null(tit)) tit = var.name
  var  = svc_data %>% 
    st_drop_geometry() %>% 
    dplyr::select(all_of(var.name)) %>%
    unlist() %>% 
    as.vector()
  if (sign(max(var)) * sign(min(var)) == 1) flip = F
  if (sign(max(var)) * sign(min(var)) == -1) flip = T
  # test to see if the coefficient is all the same sign
  # need divergent palette if not!
  if(flip) {
    ggplot(svc_data, aes_string(fill=var.name)) + 
    geom_sf(col = NA) + 
    scale_fill_continuous_c4a_div(palette="scico.vik", name= tit, mid = 0, reverse = T ) + 
    theme_bw() +
    theme(legend.position = "bottom", 
          axis.title.x=element_blank(),
          axis.text.x=element_blank(),
          axis.ticks.x=element_blank(), 
          axis.title.y=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks.y=element_blank(),
          text=element_text(size=8))   
  } else {
    ggplot(svc_data, aes_string(fill=var.name)) + 
    geom_sf(col = NA) + 
    scale_fill_continuous_c4a_seq(palette="scico.nuuk", name= tit) + 
    theme_bw() +
    theme(legend.position = "bottom", 
          axis.title.x=element_blank(),
          axis.text.x=element_blank(),
          axis.ticks.x=element_blank(), 
          axis.title.y=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks.y=element_blank(),
          text=element_text(size=8))
  }
}
tit = expression(paste("" * beta[0] * " "))
p1 = plot_vgam_coef_func(hex.svc, "b_Intercept", tit)
p2 = plot_vgam_coef_func(hex.svc, "b_to15")
p3 = plot_vgam_coef_func(hex.svc, "b_over65")
p4 = plot_vgam_coef_func(hex.svc, "b_ttu")
p5 = plot_vgam_coef_func(hex.svc, "b_lhosp")
p6 = plot_vgam_coef_func(hex.svc, "b_manu")
p7 = plot_vgam_coef_func(hex.svc, "b_degree")
p8 = plot_vgam_coef_func(hex.svc, "b_badhealth")
p9 = plot_vgam_coef_func(hex.svc, "b_bornuk")

plot_grid(p1, p2, p3, p4, p5,p6,p7,p8,p9, ncol = 3)
The spatially varying coefficient (SVC) estimates from GGP-GAM.

Figure 2.4: The spatially varying coefficient (SVC) estimates from GGP-GAM.

2.4 Summary

2.4.1 GGP-GAM rationale

The rationale for using GAMs with GP splines is as follows:

  • GAMs with splines or smooths capture non-linear relationships between the response variable and covariates.
  • splines generate a varying coefficient model when they are parameterised with more than one variable, and a spatially varying coefficient model is generated if these include location.
  • different splines are available, but GP splines reflect Tobler’s First Law of geography (spatial autocorrelation, process spatial heterogeneity, etc).
  • GAMs are robust, have a rich theoretical background and been subject to much development.

2.4.2 Undertaking a GGP-GAM

The following steps provide a general sequence of steps for undertaking a GGP-GAM:

  • prepare the data by transforming it to a geographic (map) projection, extracting the coordinates as variables (centroids if areas) and converting them to a reasonably sized unit (km were used above) and setting an intercept term equal to 1.
  • calibrate a GAM regression with GP splines for the intercept and each variable, parameterised with the location variables.
  • inspect the resultant GGP-GAM and check the splines / smooths have sufficient degrees of freedom and increase the \(k\) parameter (knots) if need be and re-recreate the GGP-GAM.
  • generate the SVCs by setting each variable to 1 and the others to 0 and using the predict function (note this can be done over the original observation locations or over new locations such as a grid).
  • summarise and map the resultant SVCs to understand how and where the relationships with the target variable vary spatially.

2.4.3 Some final observations

Initial research has demonstrated the formulation and application of a GAM with GP splines calibrated via observation location as a multiscale spatially varying coefficient model: the Geographical Gaussian Process GAM (GGP-GAM) (Comber, Harris, Murakami, et al. 2022, Comber, Harris, and Brunsdon 2022). The GGP-GAM was compared with the most popular SVC model, Geographically Weighted Regression (GWR) (Brunsdon et al. 1996) and shown to out-perform MGWR. Other work demonstrates GGP-GAMs to have a number of theoretical, inferential and computational advantages over GWR related approaches and a full exposition of the method and comparison with MGWR is in the process of being reviewed by IJGIS (Comber et al. submitted).

References

Brunsdon, C., Fotheringham, A.S., and Charlton, M.E., 1996. Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis, 28 (4), 281–298.
Comber, A., Harris, P., and Brunsdon, C., 2022. Spatially varying coefficient regression with GAM gaussian process splines: GAM (e)-on. AGILE: GIScience Series, 3, 31.
Comber, A., Harris, P., and Brunsdon, C., submitted. Multiscale spatially varying coefficient modelling using a geographical gaussian process GAM. International Journal of Geographical Information Science.
Comber, A., Harris, P., Murakami, D., Tsutsumida, N., and Brunsdon, C., 2022. Geographically varying coefficient regression: GWR-exit and GAM-on?(short paper). In: 15th international conference on spatial information theory (COSIT 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
Fahrmeir, L., Kneib, T., Lang, S., and Marx, B.D., 2021. Regression models. In: Regression. Springer, 23–84.
Fan, Y.-T. and Huang, H.-C., 2022. Spatially varying coefficient models using reduced-rank thin-plate splines. Spatial Statistics, 51, 100654.
Friedman, J.H., 2001. Greedy function approximation: A gradient boosting machine. Annals of statistics, 1189–1232.
Hastie, T.J., 2017. Generalized additive models. In: Statistical models in s. Routledge, 249–307.
Hastie, T. and Tibshirani, R., 1990. Generalized additive models. Chapman hall & CRC. Monographs on Statistics & Applied Probability. Chapman and Hall/CRC, 1.
McCullagh, P. and Nelder, J.A., 2019. Generalized linear models. Routledge.
Nelder, J.A. and Wedderburn, R.W., 1972. Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135 (3), 370–384.
Openshaw, S., 1996. Developing GIS-relevant zone-based spatial analysis methods. Spatial analysis: modelling in a GIS environment, 55–73.
Williams, C.K. and Rasmussen, C.E., 2006. Gaussian processes for machine learning. MIT press Cambridge, MA.
Wood, S.N., 2017. Generalized additive models: An introduction with r. CRC press.
Zschech, P., Weinzierl, S., Hambauer, N., Zilker, S., and Kraus, M., 2022. GAM (e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints. arXiv preprint arXiv:2204.09123.