Chapter 8 Spatially Varying Coefficient Models

8.1 Introduction

This practical extends the regression models that you constructed in Week 4 to the spatial case, and in particular considers Spatially Varying Coefficient (SVC) models.

The coefficients, \(\beta\)’s, in a regression model may vary for a number of reasons:

  • a poor conceptual model
  • social geography is not physics: laws do not govern behaviours, preferences, attitudes and sentiment
  • bad measurements
  • unaccounted for local factors
  • OR process spatial heterogeneity.

Therefore ‘whole map’ regression models may unreasonably assume spatial stationarity in regression.

Thus the basic idea behind SVC models is that the relationship between \(x\) and \(y\) may be *spatially non-stationary* (i.e. may vary in different locations).

A key advantage of SVCs is that because they generate coefficient estimates for different locations in the study area, they can be mapped. The maps indicate how the relationship between an outcome ( \(y\) ) and factors ( \(x\)’s ) varies spatially.

This practical introduces two approaches for handling spatially autocorrelated data:

  1. Geographically Weighted Regression (GWR; Brunsdon, Fotheringham, and Charlton (1996), Fotheringham, Brunsdon, and Charlton (2003)) and Multi-scale GWR (MGWR; Fotheringham, Yang, and Kang (2017)). GWR is the SVC brand leader, and MGWR is now considered the default GWR (Comber et al. 2023).
  2. GGP-GAMs (Generalised Additive Models with Geographical Gaussian Process splines), which have recently been proposed (Comber, Harris, and Brunsdon 2023) and shown to be a faster and more precise alternative SVC approach to GWR, able to handle a variety of response types (i.e. different distributions of \(y\)).

This session will again use the Georgia data that was used in Week 4 and will again construct a regression model of the median income variable (MedInc), but with a slightly different set of predictor variables.

You will need the following packages:

library(sf)         # for spatial data
library(tidyverse)  # for everything!
library(GWmodel)    # to undertake the GWR
library(tmap)       # for mapping
library(mgcv)       # for GAMs

And you will need to load the georgia data as in Week 2 which contains census data for the counties in Georgia, USA.

download.file("https://github.com/gwverse/gw/blob/main/data/georgia.rda?raw=True",
              "./georgia.RData", mode = "wb")
load("georgia.RData")

You should examine this in the usual way - you may want to practice your EDA and spatial EDA outside of the practical - and recall that this data has a number of County level variables including:

  • % with college degree (PctBach)
  • % elderly (PctEld)
  • % foreign born (PctFB)
  • median income of the county (MedInc in dollars)

Recall that the classic OLS regression has the basic form of:

\[ y_i = \beta_{0} + \sum_{k=1}^{m}\beta_{k}x_{ik} + \epsilon_i \]

where for observations indexed by \(i=1 \dots n\), \(y_i\) is the response variable, \(x_{ik}\) is the value of the \(k^{th}\) predictor variable, \(m\) is the number of predictor variables, \(\beta_0\) is the intercept term, \(\beta_k\) is the regression coefficient for the \(k^{th}\) predictor variable and \(\epsilon_i\) is the random error term that is independently normally distributed with zero mean and common variance \(\sigma^2\). OLS is commonly used for model estimation in linear regression models.

And that this is specified using the function lm (linear model). The model constructed in Week 5 is recreated here: the response variable \(y\) is median income (MedInc), and the 3 (i.e. \(m = 3\)) socio-economic variables PctBach, PctEld and PctFB are the predictor variables, the different \(x_k\)'s:

m = lm(MedInc~ PctBach + PctEld + PctFB, data = georgia) 
summary(m)
## 
## Call:
## lm(formula = MedInc ~ PctBach + PctEld + PctFB, data = georgia)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -27979  -4789   -922   2712  34805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  51566.1     3528.2  14.616  < 2e-16 ***
## PctBach        798.0      147.9   5.397 2.49e-07 ***
## PctEld       -1789.0      231.2  -7.738 1.21e-12 ***
## PctFB        -1902.0      691.4  -2.751  0.00665 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7672 on 155 degrees of freedom
## Multiple R-squared:  0.4771, Adjusted R-squared:  0.4669 
## F-statistic: 47.13 on 3 and 155 DF,  p-value: < 2.2e-16

Recall also from Week 2 that we are interested in the following regression model properties:

  • Significance indicated by the p-value in the column \(Pr(>|t|)\) as a measure of how statistically surprising the relationship is;
  • The Coefficient Estimates (the \(\beta_{k}\) values) that describe how change in the \(x_k\) are related (linearly) to changes in \(y\);
  • And the Model fit given by the \(R^2\) value, that tells us how well the model predicts \(y\).

All well and good. However the implicit assumptions in linear regression that the relationships between the various \(x\) terms and \(y\) are linear, that the error terms have identical and independent normal distributions and that the relationships between the \(x_k\)’s and \(y\) are the same everywhere, are quite often a consequence of wishful thinking, rather than a rigorously derived conclusion. To take a step closer to reality, it is a good idea to look beyond these assumptions to obtain more reliable predictions, or gain a better theoretical understanding of the process being considered.

Regression models (and most other models) are aspatial and because they consider all of the data in the study area in one universal model, they are referred to as Global models. For example, the coefficient estimates of the model Median Income in Georgia assume that the contributions made by the different variables in predicting MedInc are the same across the study area. In reality this assumption of spatial invariance is violated in many instances when geographic phenomena are considered.

For example, the association between PctBach and MedInc, might be different in different places. This is a core concept for geographical data analysis, that of process spatial heterogeneity. That is, how and where processes, relationships, and other phenomena vary spatially, as introduced in Week 1 and reinforced in the lecture this week.
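A quick way to see why this matters in the georgia data is to map the residuals from the OLS model above: spatial clustering of over- and under-prediction suggests the model is missing spatially structured processes. This is a sketch using the objects created earlier (the column name ols_resid is ours, added for illustration):

```r
# attach the OLS residuals to the spatial data
georgia$ols_resid = residuals(m)
# map them: runs of similar colours hint at spatially structured error
tm_shape(georgia) +
  tm_fill("ols_resid", midpoint = 0, palette = "RdBu") +
  tm_layout(legend.position = c("right","top"), frame = F)
```

If the residuals were independently distributed we would expect no obvious spatial pattern in this map.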

The standard OLS regression model (above) can be extended to a Spatially Varying Coefficient (SVC) model. The form is the same except that now location is included and a value for each regression coefficient is assigned to a location \((u,v)\) in space (e.g. \(u\) is an Easting and \(v\) a Northing), so that we denote it \(\beta_k(u,v)\):

\[ y_i = \beta_{0}(u_i, v_i) + \sum_{k=1}^{m}\beta_{k}(u_i, v_i)x_{ik} + \epsilon_i \]

In this session you will explore two SVC approaches for solving this equation (and thereby for handling the impacts of any spatial dependencies in the error terms) and for quantifying process spatial heterogeneity using:

  1. Geographically Weighted Regression and Multiscale GWR
  2. GAMs with Gaussian process splines parameterised with location (GGP-GAMs)

These both use relative location (i.e. distance between observations) and, because degrees have different dimensions depending on where you are on the earth, the data need to be converted to a projected coordinate system. Here the georgia data are transformed to the EPSG 102003 USA Contiguous Albers Equal Area Conic projection.

georgia = st_transform(georgia, "ESRI:102003")

8.2 Geographically Weighted Regression

8.2.1 Description

In overview, geographically weighted (GW) approaches use a moving window or kernel that passes through the study area. At each location being considered, data under the kernel are subsetted, weighted by their distance to the kernel centre and used to make a local calculation of some kind. In GWR this is a regression. In this way GW approaches construct a series of local models at discrete locations rather than a single model for the whole study area.

There are a number of options for the shape of the kernel - see page 6 in Gollini et al. (2013) - and the default is to use a bi-square kernel. This generates higher weights at locations very near to the kernel centre relative to those towards the edge.
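The bi-square kernel can be sketched directly: for a distance \(d\) and bandwidth \(h\) it returns \((1-(d/h)^2)^2\) when \(d < h\) and 0 otherwise. The helper below is ours for illustration (it is not a GWmodel function):

```r
# bi-square kernel: weights decay smoothly from 1 at the kernel centre
# to 0 at the bandwidth h; points beyond h get zero weight
bisquare = function(d, h) ifelse(d < h, (1 - (d / h)^2)^2, 0)

# weights for distances 0 to 150 km with a 100 km bandwidth
d = seq(0, 150000, by = 25000)
round(bisquare(d, h = 100000), 3)
## [1] 1.000 0.879 0.562 0.191 0.000 0.000 0.000
```

Note how observations near the kernel centre receive much higher weights than those towards the edge, as described above.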

The size or bandwidth of the kernel is critical as it determines the degree of smoothing and the scale over which the GWR operates. This can and should be optimally determined rather than just picking a bandwidth. An optimal bandwidth for GWR can be found by minimising a model fit diagnostic. Options include a leave-one-out cross-validation (CV) score which optimises model prediction accuracy and the Akaike Information Criterion (AIC) which optimises model parsimony - that is it trades off prediction accuracy with model complexity (number of variables used). Note also that the kernel can be defined as a fixed value (i.e. a distance), or in an adaptive way (i.e. with a specific number of nearby data points). Gollini et al. (2013) provide a description of this.

Thus there are 2 basic steps to any GW approach:

  1. Determine the optimal kernel bandwidth (i.e. how many data points to include in each local model or the kernel / window size);
  2. Fit the GW model.

8.2.2 Undertaking a GWR

These steps are undertaken in the code block below with an adaptive bandwidth, noting that georgia needs to be converted to sp format (the GWmodel package has not been updated to take sf format objects).

The bandwidth - the kernel size - is critical: it determines how many data points / observations are included in each local model. However, determining the optimal bandwidth requires different potential bandwidths to be evaluated, which in turn requires some measure of model fit to be applied. There are 2 options for this in a GWR (as indicated above):

  • leave-one-out Cross Validation (CV). This essentially removes each observation one at a time for a potential bandwidth, and provides a measure of how well the local model with that bandwidth predicts the value of each removed data point;
  • Akaike’s Information Criterion (AIC or corrected AIC, AICc). This provides a measure of model parsimony, a kind of balance between model accuracy and model complexity. A parsimonious model is one that has explanatory power but is as simple as possible.

In determining the optimal GWR bandwidth, AIC is preferred to CV because it reflects model parsimony and its use tends to avoid over-fitting GWR models (thus AIC bandwidths tend to be larger than bandwidths found using CV).
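For reference, the corrected AIC (AICc) adjusts the standard AIC with a small-sample penalty; for \(n\) observations and \(k\) (effective) parameters:

\[ AIC_c = AIC + \frac{2k(k+1)}{n - k - 1} \]

In GWR, \(k\) is the effective number of parameters, which is larger than in the equivalent global model because the coefficients vary locally.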

The GWmodel package (Lu et al. 2014) has tools for determining the optimal GWR bandwidth, but requires the data to be transformed to sp format from sf:

# convert to sp
georgia.sp = as(georgia, "Spatial")
# determine the kernel bandwidth
bw <- bw.gwr(MedInc~ PctBach + PctEld + PctFB,
             approach = "AIC",
             adaptive = T,
             data=georgia.sp) 
## Adaptive bandwidth (number of nearest neighbours): 105 AICc value: 3248.461 
## Adaptive bandwidth (number of nearest neighbours): 73 AICc value: 3239.974 
## Adaptive bandwidth (number of nearest neighbours): 51 AICc value: 3229.892 
## Adaptive bandwidth (number of nearest neighbours): 40 AICc value: 3231.947 
## Adaptive bandwidth (number of nearest neighbours): 60 AICc value: 3234.233 
## Adaptive bandwidth (number of nearest neighbours): 47 AICc value: 3229.298 
## Adaptive bandwidth (number of nearest neighbours): 43 AICc value: 3231.71 
## Adaptive bandwidth (number of nearest neighbours): 48 AICc value: 3228.433 
## Adaptive bandwidth (number of nearest neighbours): 50 AICc value: 3229.04 
## Adaptive bandwidth (number of nearest neighbours): 48 AICc value: 3228.433

You should have a look at the bandwidth: it is an adaptive bandwidth (kernel size), meaning that its physical size varies but the nearest 48 observations will be used to compute each local weighted regression. Here it has a value of 48, indicating that around 30% of the county data will be used in each local calculation (there are 159 records in the data):

bw
## [1] 48
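As a quick sanity check on the "around 30%" claim, the adaptive bandwidth can be expressed as a share of the observations:

```r
# the adaptive bandwidth as a percentage of the 159 counties
round(100 * bw / nrow(georgia))
## [1] 30
```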

Try running the code below to see the fixed distance bandwidth:

# determine the kernel bandwidth
bw2 <- bw.gwr(MedInc~ PctBach + PctEld + PctFB, 
             approach = "AIC",
             adaptive = F,
             data=georgia.sp) 
## Fixed bandwidth: 348973.8 AICc value: 3262.816 
## Fixed bandwidth: 215720.8 AICc value: 3245.745 
## Fixed bandwidth: 133365.9 AICc value: 3240.205 
## Fixed bandwidth: 82467.81 AICc value: 3304.821 
## Fixed bandwidth: 164822.7 AICc value: 3240.651 
## Fixed bandwidth: 113924.6 AICc value: 3245.723 
## Fixed bandwidth: 145381.3 AICc value: 3240.589 
## Fixed bandwidth: 125940 AICc value: 3240.421 
## Fixed bandwidth: 137955.4 AICc value: 3240.394 
## Fixed bandwidth: 130529.5 AICc value: 3240.143 
## Fixed bandwidth: 128776.5 AICc value: 3240.177 
## Fixed bandwidth: 131612.9 AICc value: 3240.155 
## Fixed bandwidth: 129859.9 AICc value: 3240.147 
## Fixed bandwidth: 130943.3 AICc value: 3240.145 
## Fixed bandwidth: 130273.7 AICc value: 3240.143 
## Fixed bandwidth: 130687.5 AICc value: 3240.143 
## Fixed bandwidth: 130431.8 AICc value: 3240.143
bw2
## [1] 130431.8

This indicates a fixed bandwidth of ~130km. For comparison, the distances between the counties are summarised below:

summary(as.vector(st_distance(georgia)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0   85620  156531  166238  237457  517837
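For easier comparison with this distance summary, the fixed bandwidth can be expressed in kilometres:

```r
# the fixed bandwidth in km
round(bw2 / 1000)
## [1] 130
```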

You should now create the GWR model with adaptive bandwidth stored in bw, and have a look at the outputs:

# fit the GWR model
m.gwr <- gwr.basic(MedInc~ PctBach + PctEld + PctFB, 
                adaptive = T,
                data = georgia.sp,
                bw = bw)  
m.gwr
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2023-11-21 17:58:41 
##    Call:
##    gwr.basic(formula = MedInc ~ PctBach + PctEld + PctFB, data = georgia.sp, 
##     bw = bw, adaptive = T)
## 
##    Dependent (y) variable:  MedInc
##    Independent variables:  PctBach PctEld PctFB
##    Number of data points: 159
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -27979  -4789   -922   2712  34805 
## 
##    Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)  51566.1     3528.2  14.616  < 2e-16 ***
##    PctBach        798.0      147.9   5.397 2.49e-07 ***
##    PctEld       -1789.0      231.2  -7.738 1.21e-12 ***
##    PctFB        -1902.0      691.4  -2.751  0.00665 ** 
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 7672 on 155 degrees of freedom
##    Multiple R-squared: 0.4771
##    Adjusted R-squared: 0.4669 
##    F-statistic: 47.13 on 3 and 155 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 9123479259
##    Sigma(hat): 7623.079
##    AIC:  3301.791
##    AICc:  3302.183
##    BIC:  3183.48
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 48 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                  Min.  1st Qu.   Median  3rd Qu.     Max.
##    Intercept 36563.45 51275.92 57838.03 65500.51 87342.20
##    PctBach    -117.33   248.02   493.77   893.53  1937.83
##    PctEld    -3491.73 -2512.30 -2049.18 -1605.08  -793.53
##    PctFB     -8282.57 -3704.10 -1873.14  -408.23  2581.19
##    ************************Diagnostic information*************************
##    Number of data points: 159 
##    Effective number of parameters (2trace(S) - trace(S'S)): 39.54012 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 119.4599 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 3228.433 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 3178.961 
##    BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 3145.552 
##    Residual sum of squares: 3695444020 
##    R-square value:  0.7881856 
##    Adjusted R-square value:  0.7174852 
## 
##    ***********************************************************************
##    Program stops at: 2023-11-21 17:58:41

The printout gives quite a lot of information: the standard OLS regression model, a summary of the GWR coefficients and a whole range of fit measures (\(R^2\), AIC, BIC).
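The fit measures can also be extracted programmatically from the GW.diagnostic element for use in tables. The exact element names may vary between GWmodel versions, so it is safest to inspect the list first:

```r
# inspect the diagnostics returned by gwr.basic
str(m.gwr$GW.diagnostic)
# compare the global and geographically weighted fit (confirm element
# names such as gw.R2 against the str() output above)
summary(m)$r.squared
m.gwr$GW.diagnostic$gw.R2
```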

8.2.3 Evaluating the results

The code above prints out the whole model summary. You should compare the summary of the local (i.e. spatially varying) GWR coefficient estimates with the global (i.e. fixed) coefficient estimates of the ordinary regression. The output of the GWR model contains a number of elements, including a slot called SDF (spatial data frame), which is an sp format object:

# GWR outputs
names(m.gwr)
## [1] "GW.arguments"  "GW.diagnostic" "lm"            "SDF"          
## [5] "timings"       "this.call"     "Ftests"

To access the data table in the SDF slot, @data is appended to this.

gwr.summary = apply(m.gwr$SDF@data[, 1:4], 2, summary)
tab.gwr <- rbind(gwr.summary, coef(m))
rownames(tab.gwr)[7] <- "Global"
tab.gwr <- round(tab.gwr, 1)

And you can inspect this table - note that the t() function transposes the table:

t(tab.gwr)
Table 8.1: A comparison of the GWR and OLS regression coefficient estimates.
Min. 1st Qu. Median Mean 3rd Qu. Max. Global
Intercept 36563.4 51275.9 57838.0 57684.0 65500.5 87342.2 51566.1
PctBach -117.3 248.0 493.8 597.3 893.5 1937.8 798.0
PctEld -3491.7 -2512.3 -2049.2 -2095.0 -1605.1 -793.5 -1789.0
PctFB -8282.6 -3704.1 -1873.1 -2260.0 -408.2 2581.2 -1902.0

You could write this out to a csv file for inclusion in a report or similar:

write.csv(t(tab.gwr), file = "coef_tab.csv")

So the Global coefficient estimates are similar to the Mean and Median values of the GWR output. But you can see the variation in the associations between the predictors and median income (MedInc) across the local coefficient estimates.

So for example, for PctFB (% foreign born):

  • the global coefficient estimate is -1902.0;
  • the local coefficient estimate varies from -3704.1 to -408.2 between the 1st and 3rd quartiles.

8.2.4 Mapping GWR outputs

The variation in the GWR coefficient estimates can be mapped to show how different factors are varying associated with Median Income in different parts of Georgia. These data are held in the SDF of the GWmodel output which is converted to sf format in the code below and examined:

gwr_sf = st_as_sf(m.gwr$SDF)
gwr_sf

There is a lot of information in the GWR output, but here we are interested in the coefficient estimates and the t-values; the coefficient names are the same as the inputs, and the t-values have a _TV suffix.

The code below creates a map of a coefficient (PctEld) where large variation in the relationship between the predictor variable and median income was found, as in Figure 8.1:

tm_shape(gwr_sf) +
  tm_fill( "PctEld", palette = "viridis", style = "kmeans") +
  tm_layout(legend.position = c("right","top"), frame = F)
Figure 8.1: Maps of the GWR coefficient estimates.

It is also possible to create maps showing coefficients whose values flip from positive to negative in different parts of the state, as in Figure 8.2.

tm_shape(gwr_sf) +
  tm_fill(c("PctFB", "PctBach"),midpoint = 0, style = "kmeans") +
  tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"), frame = F) 
Figure 8.2: Maps of GWR coefficient estimates, whose values flip from positive to negative.

So what we see here are the local variations in the degree to which changes in the different variables are associated with changes in Median Income. The maps above are not maps of the predictor variables (PctEld, PctFB and PctBach). Rather, they show how the relationships between the inputs and Median Income vary spatially - the spatially varying strength of the relationships.

Now a critical aspect is that coefficient significance, through the p-value, is now local rather than a global statistic. Here we need to use the t-values from which p-values are derived. Significant coefficient estimates are those whose t-values are less than -1.96 or greater than +1.96. It is possible to identify and map these locations from the GWR outputs, which include the t-values (suffixed _TV) for each input variable.

The code below recreates one of the maps above but highlights areas where PctBach is a significant predictor of MedInc (Figure 8.3). It shows that significant coefficients are highly localised.

# determine which are significant
tval = gwr_sf %>% dplyr::select(all_of("PctBach_TV")) %>% st_drop_geometry()
signif = tval < -1.96 | tval > 1.96
# map the counties
tm_shape(gwr_sf) +
  tm_fill("PctBach",midpoint = 0) + tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"))+
  # now add the tvalues layer
  tm_shape(gwr_sf[signif,]) + tm_borders()
Figure 8.3: A map of the spatial variation of the PctBach coefficient estimate, with significant areas highlighted.
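Before mapping, it can also be useful to tally how many counties have locally significant coefficient estimates; a quick sketch using the t-value columns in the GWR output:

```r
# count the counties where |t| > 1.96 for each coefficient
sum(abs(gwr_sf$PctBach_TV) > 1.96)
sum(abs(gwr_sf$PctFB_TV) > 1.96)
```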

8.2.5 GWR Summary

GWR investigates if and how relationships between response and predictor variables vary geographically. It is underpinned by the idea that whole map (constant coefficient) regressions such as those estimated by ordinary least squares may make unreasonable assumptions about the stationarity of the regression coefficients under investigation. Such approaches may wrongly assume that regression relationships (between predictor and target variables) are the same throughout the study area.

GWR outputs provides measures of process heterogeneity – geographical variation in data relationships – through the generation of mappable and varying regression coefficients, and associated statistical inference. It has been extensively applied in a wide variety of scientific and socio-scientific disciplines.

There are a number of things to note about GWR:

  1. It reflects a desire to shift away from global, whole map (Openshaw 1996) and one-size-fits-all statistics, towards approaches that capture local process heterogeneity. This was articulated by Goodchild (2004), who proposed a 2nd law of geography: the principle of spatial heterogeneity or non-stationarity, noting the lack of a concept of an average place on the Earth’s surface comparable, for example, to the concept of an average human (Goodchild 2004).

  2. There have been a number of criticisms of GWR (see Section 7.6 in Gollini et al. (2013)).

  3. There have been a number of advances to standard GWR notably Multi-Scale which is now considered to be the default GWR. These have all been summarised in a GWR Route map paper (Comber et al. 2023).

Consideration of developments and limitations will be useful for your assignment write up.

Task 1

Create a function for creating a tmap of coefficients held in an sf polygon object, whose values flip, highlighting significant observations. Note that functions are defined in the following way:

my_function_name = function(parameter1, parameter2) { # note the curly brackets 
  # code to do stuff
  # value to return
}

A very simple example is below. This simply takes a variable name from an sf object, creates a vector of values and returns the mean and the median:

median_mean_from_sf = function(x, var_name) {
  # do stuff
  x = x %>% dplyr::select(all_of(var_name)) %>% st_drop_geometry() %>% unlist()
  mean_x = mean(x, na.rm = T)
  median_x = median(x, na.rm = T)
  # values to return
  c(mean_x, median_x)
}
# test it
median_mean_from_sf(x = georgia, var_name = "MedInc")
## [1] 37147.09 34307.00

You should be able to use the code that was used to generate the plot above for the do stuff bit:

# determine which are significant
tval = gwr_sf %>% dplyr::select(all_of("PctBach_TV")) %>% st_drop_geometry()
signif = tval < -1.96 | tval > 1.96
# map the counties
tm_shape(gwr_sf) +
  tm_fill("PctBach",midpoint = 0) + tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"))+
  # now add the tvalues layer
  tm_shape(gwr_sf[signif,]) + tm_borders()

NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.

8.3 Multi-scale GWR

8.3.1 Description

In a standard GWR only a single bandwidth is determined and applied to each predictor variable. In reality, some relationships in spatially heterogeneous processes may operate over larger scales than others. If this is the case, the best-on-average scale of relationship non-stationarity determined by a standard GWR may under- or over-estimate the scale of individual predictor-to-response relationships.

To address this, Multi-scale GWR (MGWR) can be used (Yang 2014; Fotheringham, Yang, and Kang 2017; Oshan et al. 2019). It determines a bandwidth for each predictor variable individually, thereby allowing individual response-to-predictor variable relationships to vary. The individual bandwidths provide insight into the scales of relationships between response and predictor variables, allowing for local to global bandwidths. Recent work has suggested that MGWR should be the default GWR (Comber et al. 2023), with a standard GWR only being used in specific circumstances.

8.3.2 Undertaking a MGWR

The code below undertakes a search for the optimal bandwidths and, having found them, passes them into a second MGWR operation. These will generate a lot of text on your screen!

gw.ms <- gwr.multiscale(MedInc~ PctBach + PctEld + PctFB,
                        data = as(georgia, "Spatial"),
                        adaptive = T, max.iterations = 1000,
                        criterion="CVR",
                        kernel = "bisquare",
                        bws0=c(100,100,100),
                        verbose = F, predictor.centered=rep(F, 3))

We can examine the bandwidths for the individual predictor variables:

bws = gw.ms$GW.arguments$bws
bws
## [1]  80  53 157  18
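To interpret these it helps to express each bandwidth as a share of the \(n = 159\) observations; note that the first value is for the intercept, with the remainder in model formula order:

```r
# each term's bandwidth as a percentage of the 159 counties
data.frame(term = c("Intercept", "PctBach", "PctEld", "PctFB"),
           bw = bws,
           pct_of_n = round(100 * bws / nrow(georgia)))
##        term  bw pct_of_n
## 1 Intercept  80       50
## 2   PctBach  53       33
## 3    PctEld 157       99
## 4     PctFB  18       11
```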

And pass these into a second model.

gw.ms <- gwr.multiscale(MedInc~ PctBach + PctEld + PctFB, 
                        data = as(georgia, "Spatial"),
                        adaptive = T, max.iterations = 1000,
                        criterion="CVR",
                        kernel = "bisquare",
                        bws0=bws,
                        bw.seled = rep(T, length(bws)),
                        verbose = F, predictor.centered=rep(F, 3))

8.3.3 Evaluating the results

The spatial variation of the coefficient estimates can be examined (hint: have a look at what is returned with names(gw.ms)):

coefs_msgwr = apply(gw.ms$SDF@data[, 1:4], 2, summary)
round(coefs_msgwr,1)
##         Intercept PctBach  PctEld    PctFB
## Min.      52701.7   563.0 -2161.5 -14197.2
## 1st Qu.   53320.5   718.9 -2158.7  -6338.5
## Median    54451.5  1011.9 -2152.8  -4054.0
## Mean      55197.2  1092.1 -2149.8  -4525.3
## 3rd Qu.   57567.4  1536.1 -2140.2  -2737.6
## Max.      58228.9  1601.1 -2132.9    482.0

Finally these can be assembled into a table:

tab.mgwr = data.frame(Bandwidth = bws, 
                 t(round(coefs_msgwr,1)))
names(tab.mgwr)[c(3,6)] = c("Q1", "Q3")
tab.mgwr
Table 8.2: The MSGWR variable specific bandwidths and coefficient estimates.
Bandwidth Min. Q1 Median Mean Q3 Max.
Intercept 80 52701.7 53320.5 54451.5 55197.2 57567.4 58228.9
PctBach 53 563.0 718.9 1011.9 1092.1 1536.1 1601.1
PctEld 157 -2161.5 -2158.7 -2152.8 -2149.8 -2140.2 -2132.9
PctFB 18 -14197.2 -6338.5 -4054.0 -4525.3 -2737.6 482.0

You should examine these, particularly the degree to which the coefficient estimates vary in relation to their bandwidths. Notice that a smaller bandwidth (a smaller moving window) results in greater variation, and that this variation reduces as the bandwidth tends towards the global (recall that there are \(n = 159\) observations in the georgia data). So some of the bandwidths indicate highly local relationships with the target variable, while others are effectively global, with bandwidths approaching \(n\).

Notice the coefficients and the degree to which they vary, alongside the bandwidths - high bandwidths of course result in less variation in the coefficient estimates.

These tell us something about the local-ness of the interactions of the different covariates with the target variable MedInc: PctFB has a small, highly local bandwidth (18), using few data points in each local regression; PctBach is intermediate (53); whilst PctEld has a near-global relationship with MedInc (bandwidth of 157 of the 159 observations), with nearly all of the data used in each local regression model.

PctFB flipped sign in the standard GWR and it does so here too, potentially because of the more localised bandwidth (recall that the standard GWR model had a bandwidth of 48, compared to 18 here).

You could write this out to a csv file for inclusion in a report or similar as in the GWR coefficient example above. And the coefficients can be mapped. To a degree there is no point in mapping the coefficients whose bandwidths tend towards the global (i.e. are close to \(n\)), but the others are potentially interesting.

Again these are held in the SDF of the GWmodel output which is converted to sf format in the code below and examined:

mgwr_sf = st_as_sf(gw.ms$SDF)
mgwr_sf

8.3.4 Mapping MGWR outputs

The code below creates the maps for the local covariates:

p1 = tm_shape(mgwr_sf) +
  tm_fill("PctBach", palette = "viridis", style = "kmeans") +
  tm_layout(legend.position = c("right","top"), frame = F)
# coefficients whose values flip 
p2 = tm_shape(mgwr_sf) +
  tm_fill("PctFB",midpoint = 0, style = "kmeans") +
  tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"), frame = F) 
tmap_arrange(p1, p2, nrow = 1)
Figure 8.4: Maps of the MGWR coefficient estimates.

You may have noticed that t-values are also included in the outputs and again significant areas can be identified and highlighted. This suggests a different spatial extent of significance to that indicated by a standard GWR.

# determine which are significant
tval = mgwr_sf$PctFB_TV
signif = tval < -1.96 | tval > 1.96
# map the counties
tm_shape(mgwr_sf) +
  tm_fill("PctFB",midpoint = 0) + tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"))+
  # now add the tvalues layer
  tm_shape(mgwr_sf[signif,]) + tm_borders()
Figure 8.5: A map of the spatial variation of the PctFB coefficient estimate arising from a MGWR, with significant areas highlighted.

8.3.5 Comparing GWR and MGWR

It is instructive to compare the GWR and MGWR outputs, both in terms of the numeric distributions and the spatial ones. The inter-quartile ranges of the numeric distributions give us an idea about the different spreads, and we can examine these with respect to the global coefficient estimates from the OLS model and the MGWR bandwidths:

gwr.iqr <- t(tab.gwr[c(2,5,7),])
mgwr.iqr <- tab.mgwr[, c(1,3,6)]
gwr.iqr
##           1st Qu. 3rd Qu.  Global
## Intercept 51275.9 65500.5 51566.1
## PctBach     248.0   893.5   798.0
## PctEld    -2512.3 -1605.1 -1789.0
## PctFB     -3704.1  -408.2 -1902.0
mgwr.iqr
##           Bandwidth      Q1      Q3
## Intercept        80 53320.5 57567.4
## PctBach          53   718.9  1536.1
## PctEld          157 -2158.7 -2140.2
## PctFB            18 -6338.5 -2737.6

The maps similarly may show large differences both in the values of the coefficient estimates in different places (see Figure 8.6) and in the significant areas (see Figure 8.7).

# gwr
p3 = tm_shape(gwr_sf) +
  tm_fill(c("Intercept", "PctBach"), palette = "viridis", style = "kmeans") +
  tm_layout(legend.position = c("right","top"), frame = F)
# mgwr
p4 = tm_shape(mgwr_sf) +
  tm_fill(c("Intercept", "PctBach"), palette = "viridis", style = "kmeans") +
  tm_layout(legend.position = c("right","top"), frame = F)
tmap_arrange(p3, p4, nrow = 2)

Figure 8.6: Maps of the coefficients arising from a GWR (top) and MGWR (bottom).

# gwr
tval = gwr_sf$PctFB_TV
signif = tval < -1.96 | tval > 1.96
# map the counties
p5 = tm_shape(gwr_sf) +
  tm_fill("PctFB",midpoint = 0, style = "kmeans") + 
  tm_style("col_blind") +
  tm_layout(legend.position = c("right","top"))+
  # now add the tvalues layer
  tm_shape(gwr_sf[signif,]) + tm_borders()
# mgwr
tval = mgwr_sf$PctFB_TV
signif = tval < -1.96 | tval > 1.96
# map the counties
p6 = tm_shape(mgwr_sf) +
  tm_fill("PctFB", midpoint = 0, style = "kmeans") + 
  tm_style("col_blind") +
  tm_layout(legend.position = c("right","top"))+
  # now add the tvalues layer
  tm_shape(mgwr_sf[signif,]) + tm_borders()
tmap_arrange(p5, p6, nrow = 1)

Figure 8.7: Maps of the significant coefficients arising from a GWR (left) and MGWR (right).

8.3.6 MGWR Summary

A single bandwidth is used in standard GWR under the assumption that the response-to-predictor relationships operate over the same scales for all of the variables contained in the model. This may be unrealistic because some relationships can operate at larger scales and others at smaller ones. A standard GWR will nullify these differences and find a best-on-average scale of relationship non-stationarity (geographical variation), as described through the bandwidth.

MGWR was proposed to allow for a mix of relationships between predictor and response variables: from highly local to global. In MGWR, the bandwidth for each relationship is determined separately, allowing the scale of individual response-to-predictor relationships to vary.

For these reasons, MGWR has been recommended as the default GWR. MGWR plus an OLS regression provides a solid basis both for deciding whether to use a GWR and for determining the variation in predictor-to-response relationships: the MGWR bandwidths indicate their different scales. A standard GWR should only be used in circumstances where all the predictors are found to have the same bandwidth.

8.4 GGP-GAM

8.4.1 Description

There are a number of long-standing conceptual criticisms of GWR / MGWR, described in full in Comber et al. (2023) and summarised elsewhere (Comber et al. 2022):

  • GWR and MGWR are computationally slow, whereas the GGP-GAM is quicker (although this can be mitigated by parallelising)
  • GWR and MGWR suffer from the absence of both a generic framework for any form of Geographically Weighted model and a clear theoretical framework
  • more importantly, the GAM framework (including GP splines) has a stronger theoretical background and offers greater flexibility than the family of GWR models (the main theoretical limitation is that the kernel-based approach means GWR and MGWR generate a collection of local models rather than a single model)
  • MGWR can only handle Gaussian responses
  • MGWR has no options for handling or penalizing collinearity, down-weighting outliers, or handling heteroskedastic and autocorrelated error terms
  • there are concerns about GWR’s reliability as a spatial predictor

GAMs are able to handle many kinds of responses (Wood 2017; Fahrmeir et al. 2021). They generate multiple model terms which are added together and provide an intuitive approach to fitting relatively complex relationships in data with complex interactions and non-linearities (Wood 2017). The outputs of GAMs provide readily understood summaries of the relationship between predictor and response variables and of how the outcome is modelled. They are able to predict well from complex systems, quantify relationships, and make inferences about these relationships, such as which variables are important and at what range of values they are most influential (Hastie and Tibshirani 1990; Wood 2017). GAMs can perform as well as or better than most machine learning models and they are relatively fast (Friedman 2001; Wood 2017). Importantly, in the context of SVC modelling, GAMs combine predictive power, model accuracy and model transparency and generate “intrinsically understandable white-box machine learning models that provide a technically equivalent, but ethically more acceptable alternative to [machine learning] black-box models” (Zschech et al. 2022, p2).

GAMs use splines (sometimes called smooths) to model nonlinear relationships which can be of different forms depending on the problem and data (Hastie and Tibshirani 1990). A spline can be represented as a linear combination of functions called basis functions. Each of these is assigned a coefficient and these are linearly combined to generate the predictions of outcome (\(\hat{y}\)). Basis functions can be single or multi-dimensional in terms of predictor variables. Thus, the form of a GAM is composed of linear sums of multi-dimensional basis functions and this is what allows complex relationships to be modelled. The appropriate degree of “wiggliness” in the spline is determined by the smoothing parameter, which balances over-fitting versus capturing the complexity in the relationship.
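The idea of a spline as a linear combination of basis functions, with a smoothing parameter controlling the "wiggliness", can be illustrated with a toy one-dimensional example. Note that this snippet is purely illustrative and is not part of the Georgia analysis: the data are simulated.

```r
library(mgcv)
# simulated data: a noisy non-linear relationship (illustrative only)
set.seed(1)
x = seq(0, 1, length.out = 100)
y = sin(2 * pi * x) + rnorm(100, sd = 0.3)
# fit a 1-dimensional GP spline: y is modelled as a linear
# combination of basis functions of x
m_toy = gam(y ~ s(x, bs = "gp"))
# the estimated smoothing parameter that balances over-fitting
# against capturing the complexity of the relationship
m_toy$sp
# the basis function coefficients that are linearly combined
# to generate the predictions
coef(m_toy)
```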

The key to SVC modelling with GAMs is that the Gaussian process (GP) splines are parameterised with observation location, as proposed in Comber, Harris, and Brunsdon (2023) - hence a Geographical Gaussian Process GAM (GGP-GAM). A GP is a random process over functions, and its terms can be modelled using GP splines within a GAM.

8.4.2 Undertaking a GGP-GAM

The code below extracts the coordinates from the projected georgia data and converts these to kilometres. It also creates a dummy variable for the intercept.

coords = st_coordinates(st_centroid(georgia))
## Warning: st_centroid assumes attributes are constant over geometries
# head(coords)
df.gam = 
  georgia %>% 
  mutate(Intercept = 1, 
         X = coords[,"X"]/1000, 
         Y = coords[,"Y"]/1000) %>%
  st_drop_geometry() %>%
  as_tibble()

First, just for illustration, the gam function from the mgcv package is used to generate a standard parametric model. This generates the same model as that created by lm above and stored in m:

library(mgcv)
gam_lm = gam(MedInc~ PctBach + PctEld + PctFB, data = df.gam)
summary(gam_lm)

Then a GGP-GAM can be fitted, specifying a GP spline, s, for each covariate. Notice the structure of the spline in the code below:

gam.m = gam(MedInc~ 0 + 
              Intercept + s(X,Y,bs='gp', by=Intercept) + 
              PctBach + s(X,Y,bs='gp', by=PctBach) +
              PctEld + s(X,Y,bs='gp', by=PctEld) +
              PctFB + s(X,Y,bs='gp', by=PctFB),
              data = df.gam)            

It is possible to check that the GAM has fitted correctly with the correct number of knots, \(k\):

gam.check(gam.m)

Here it has; the steps for adjusting the GAM parameters if it has not are given in Comber et al. (n.d.).

8.4.3 Evaluating the results

Next we can inspect the model:

summary(gam.m)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## MedInc ~ 0 + Intercept + s(X, Y, bs = "gp", by = Intercept) + 
##     PctBach + s(X, Y, bs = "gp", by = PctBach) + PctEld + s(X, 
##     Y, bs = "gp", by = PctEld) + PctFB + s(X, Y, bs = "gp", by = PctFB)
## 
## Parametric coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## Intercept  56001.1     3787.8  14.785   <2e-16 ***
## PctBach      453.9     3609.9   0.126    0.900    
## PctEld      -273.6     2853.8  -0.096    0.924    
## PctFB      17806.1   107586.4   0.166    0.869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df     F p-value   
## s(X,Y):Intercept 14.807 17.653 1.185 0.29120   
## s(X,Y):PctBach    8.635 10.657 1.433 0.16957   
## s(X,Y):PctEld     7.653  9.268 0.945 0.49627   
## s(X,Y):PctFB     22.984 25.572 2.369 0.00145 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 132/135
## R-sq.(adj) =  0.805   Deviance explained = 99.1%
## GCV = 3.345e+07  Scale est. = 2.1547e+07  n = 159

This returns summaries of the GGP-GAM fixed (parametric) coefficient estimates, which can be interpreted as the linear terms in the model, in a similar way to the results of the standard linear regression model. Here the effect of the sufficiently large \(k\) can be seen: the Intercept has a significant global effect, but the other covariates do not.

Summaries of the spline smooth terms are also reported. The edf (effective degrees of freedom) summarises the complexity of the spline smooths, with an edf value of 1 indicating a straight line, 2 a quadratic curve, and so on. Higher edf values indicate increasing non-linearity in the relationship between the covariate and the response. Here, these are for each covariate over a 2-dimensional space defined by \((X,Y)\). The p-values relate to the splines / smooths defined over this geographic space and their significance can be interpreted as indicating whether they vary locally over space. In contrast to the parametric coefficients, the GGP spline for PctFB is locally significant (i.e. its relationship with the target variable \(y\) varies locally), whereas the GGP splines for the others are not.
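If you want to work with the edf and p-values directly, rather than reading them off the printed summary, they can be extracted as a table (assuming the fitted model is stored in gam.m as above):

```r
# extract the smooth term summary (edf, Ref.df, F, p-value)
# from the fitted GGP-GAM
round(summary(gam.m)$s.table, 3)
```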

This is a very different model to those suggested by GWR and MGWR: the GAM model suggests a large and significant global Intercept and a significant local effect associated with PctFB.

8.4.4 GGP-GAM SVCs

It is also possible to generate the GGP-GAM SVCs and examine how the relationships between \(y\) and the \(x\)s vary. The function below does this for all of the covariates. In outline, it takes each variable in turn, sets it to 1 and all the others to 0, and then uses the model to predict the coefficients for that variable (covariate). The predict function is used with se = TRUE to calculate the standard error of each prediction.

# function
ggp_gam_svc = function(model, terms, input_data) {
    n_t = length(terms)
    input_data_copy = input_data
    output_data = input_data
    for (i in 1:n_t) {
        zeros = rep(0, n_t)
        zeros[i] = 1
        df.j = vector()
        terms_df = data.frame(matrix(rep(zeros, nrow(input_data)), ncol = n_t, byrow = T))
        names(terms_df) = terms
        input_data_copy[, terms] = terms_df
        se.j = predict(model, se = TRUE, newdata = input_data_copy)$se.fit
        b.j  = predict(model,newdata=input_data_copy)
        expr1 = paste0("b_", terms[i], "= b.j")
        expr2 = paste0("se_",terms[i], "= se.j")
        output_data = output_data %>% 
            mutate(within(., !!parse(text = expr1))) %>% 
            mutate(within(., !!parse(text = expr2))) 
    }
    output_data$yhat = predict(model, newdata = input_data)
    output_data
}
# apply it
terms = c("Intercept", "PctBach", "PctEld", "PctFB")
gam_svc = ggp_gam_svc(gam.m, terms, df.gam)
# check the results
gam_svc |>
  select(starts_with("b_")) |> 
  sapply(summary) |> 
  t() |> round(1)
##                 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## b_Intercept  36238.8 51592.3 56266.4 56001.1 62292.5 70493.2
## b_PctBach     -206.2   164.9   514.7   569.7   942.0  1455.1
## b_PctEld     -2827.7 -2294.6 -1809.4 -1872.6 -1613.1  -463.6
## b_PctFB     -12970.0 -5558.8 -1805.4 -2833.9  -135.3  5261.5

The SVC results and areas of significance can be mapped. The code below assigns the georgia geometry to the results of the SVC and maps the SVC of the significant covariate (PctFB) with its standard errors. Note the assignment of the geometry to the SVC object:

# assign geometry 
st_geometry(gam_svc) <- st_geometry(georgia)
p7 = tm_shape(gam_svc) +
  tm_fill("b_PctFB",midpoint = 0, style = "kmeans") +
  tm_style("col_blind")+
  tm_layout(legend.position = c("right","top"), frame = F) 
p8 = tm_shape(gam_svc) +
  tm_fill("se_PctFB", palette = "viridis", style = "kmeans") +
  tm_layout(legend.position = c("right","top"), frame = F)
tmap_arrange(p7, p8, nrow = 1)

Figure 8.8: Map of the GGP-GAM SVC with standard errors.

You could compare the GWR, MGWR and GGP-GAM outputs. They indicate subtly different spatial processes.

tmap_arrange(p5, p6, p7)

8.4.5 GGP-GAM summary

This GGP-GAM generates a very different output to the MGWR, with many more coefficients (try running coef(gam.m)). The GGP-GAM SVC is quick and has a clear theoretical framework. However, it lacks an explicit quantification of the scale of the relationship between predictor and response variables, such as that generated by the GWR and MGWR bandwidths. It does, however, provide a measure of the spatial complexity of the predictor-to-response relationship and generates more reliable models (compare the \(R^2\) fit metrics of the 3 models).
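One way to make this comparison is to pull out the adjusted \(R^2\) values directly. The sketch below assumes the models are stored in m.gwr, gw.ms and gam.m as in the earlier sections; the diagnostic element names may vary between GWmodel versions, so check the structure of the objects with names() if an element is not found.

```r
# adjusted R-squared of the GGP-GAM
summary(gam.m)$r.sq
# adjusted R-squared of the GWR (from the GWmodel diagnostics)
m.gwr$GW.diagnostic$gwR2.adj
# the MGWR object holds similar diagnostics - inspect with
# names(gw.ms$GW.diagnostic) if the element name differs
gw.ms$GW.diagnostic$gwR2.adj
```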

Some useful descriptions of GAMs can be found online, and there are some excellent video introductions if you have a spare hour.

8.5 Answers to Tasks

Task 1

Create a function for creating a tmap of coefficients held in an sf polygon object, whose values flip, highlighting significant observations.

The function below takes three inputs:

  • x an sf object
  • var_name the name of a variable in the GWR output
  • var_name_TV the name of the t-values for the variable in the GWR output
my_map_signif_coefs_func = function(x, var_name, var_name_TV) {
  # determine which are significant
  tval = x %>% dplyr::select(all_of(var_name_TV)) %>% st_drop_geometry()
  signif = tval < -1.96 | tval > 1.96
  # map the counties
  p_out = tm_shape(x) +
    tm_fill(var_name,midpoint = 0) + tm_style("col_blind")+
    tm_layout(legend.position = c("right","top"))+
    # now add the tvalues layer
    tm_shape(x[signif,]) + tm_borders()
  # return the output
  p_out
}

Test it!

my_map_signif_coefs_func(x = gwr_sf, var_name = "PctFB", var_name_TV = "PctFB_TV")
my_map_signif_coefs_func(x = gwr_sf, var_name = "PctBach", var_name_TV = "PctBach_TV")

Appendix: SVCs over interpolation grids.

It is also possible to estimate SVCs over grids. GWR and GGP-GAMs work with point locations. In the examples above these were the polygon centroids, and the results were attached back to the polygons, either within the GWR / MGWR functions or by assigning the geometry after estimating the SVCs in the GGP-GAM approach. However, sometimes we only have points and we want to estimate SVC surfaces. Consider the example below. The points are in red and the locations that we want to estimate SVCs over are in black.

# create georgia outline
gg_outline <- st_sf(st_union(georgia))
# create county points
gg_pt = st_centroid(georgia)
## Warning: st_centroid assumes attributes are constant over geometries
# have a look 
gg_pt
## Simple feature collection with 159 features and 16 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 949905.4 ymin: -674547.7 xmax: 1392468 ymax: -217623.8
## Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
## First 10 features:
##    Latitude  Longitud TotPop90 PctRural PctBach PctEld PctFB PctPov PctBlack
## 1  31.75339 -82.28558    15744     75.6     8.2  11.43  0.64   19.9    20.76
## 2  31.29486 -82.87474     6213    100.0     6.4  11.77  1.58   26.0    26.86
## 3  31.55678 -82.45115     9566     61.7     6.6  11.11  0.27   24.1    15.42
## 4  31.33084 -84.45401     3615    100.0     9.4  13.17  0.11   24.8    51.67
## 5  33.07193 -83.25085    39530     42.7    13.3   8.64  1.43   17.5    42.39
## 6  34.35270 -83.50054    10308    100.0     6.4  11.37  0.34   15.1     3.49
## 7  33.99347 -83.71181    29721     64.6     9.2  10.63  0.92   14.7    11.44
## 8  34.23840 -84.83918    55911     75.2     9.0   9.66  0.82   10.7     9.21
## 9  31.75940 -83.21976    16245     47.0     7.6  12.81  0.33   22.0    31.33
## 10 31.27424 -83.23179    14153     66.2     7.5  11.98  1.19   19.3    11.62
##           X       Y    ID     Name MedInc                      geom class
## 1  941396.6 3521764 13001  Appling  32152 POINT (1288960 -549806.6)     1
## 2  895553.0 3471916 13003 Atkinson  27657 POINT (1240651 -607485.6)     1
## 3  930946.4 3502787 13005    Bacon  29342 POINT (1276765 -573573.3)     1
## 4  745398.6 3474765 13007    Baker  29610 POINT (1093093 -623515.3)     4
## 5  849431.3 3665553 13009  Baldwin  36414 POINT (1179548 -416562.8)     2
## 6  819317.3 3807616 13011    Banks  41783 POINT (1137903 -277270.2)     3
## 7  803747.1 3769623 13013   Barrow  49829 POINT (1123611 -319834.5)     3
## 8  699011.5 3793408 13015   Bartow  47309 POINT (1017758 -305414.5)     3
## 9  863020.8 3520432 13017 Ben Hill  28009 POINT (1201770 -560855.1)     1
## 10 859915.8 3466377 13019  Berrien  33786 POINT (1208139 -614350.9)     3
##        Inc
## 1  low_med
## 2      low
## 3      low
## 4      low
## 5     high
## 6     high
## 7     high
## 8     high
## 9      low
## 10 low_med
# make a 5km grid geometry 
gr_geom = st_centroid(st_make_grid(gg_outline, 5000))
# create an ID for each grid point
gg_gr = data.frame(ID = 1:length(gr_geom))
# assign geometry 
st_geometry(gg_gr) <- gr_geom
# have a look
gg_gr
## Simple feature collection with 9797 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 941720.7 ymin: -698778.2 xmax: 1421721 ymax: -198778.2
## Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
## First 10 features:
##    ID                   geometry
## 1   1 POINT (941720.7 -698778.2)
## 2   2 POINT (946720.7 -698778.2)
## 3   3 POINT (951720.7 -698778.2)
## 4   4 POINT (956720.7 -698778.2)
## 5   5 POINT (961720.7 -698778.2)
## 6   6 POINT (966720.7 -698778.2)
## 7   7 POINT (971720.7 -698778.2)
## 8   8 POINT (976720.7 -698778.2)
## 9   9 POINT (981720.7 -698778.2)
## 10 10 POINT (986720.7 -698778.2)
# map
tm_shape(gg_outline) + tm_borders() +
  tm_shape(gg_gr) + tm_dots() +
    tm_shape(gg_pt) +tm_dots(col = "red",size = 0.2) 

The code to estimate and map the GWR regression surface is:

# create a new GWR model
m.gwr2 <- gwr.basic(MedInc~ PctBach + PctEld + PctFB, 
                adaptive = T,
                data = as(gg_pt, "Spatial"),
                regression.points = as(gg_gr, "Spatial"),
                bw = bw)  
# convert the SDF to sf
gwr_sf2 = st_as_sf(m.gwr2$SDF)
coords = st_coordinates(gwr_sf2)
gwr_sf2 = cbind(gwr_sf2, coords)
# map - I think ggplot is much better for surfaces
# I also like the cols4all palettes 
library(cols4all)
# have a look at them 
# c4a_gui()
ggplot() + 
  geom_raster(data = gwr_sf2, aes(x = X, y = Y, fill = PctBach)) +
    scale_fill_continuous_c4a_seq(palette="kovesi.linear_wh_rd_bk", name = "% Bachelors") +
  geom_sf(data = gg_outline, col = "black", fill = NA) +
  theme_bw() +
  coord_sf() +
  theme(legend.position = "bottom",
          axis.title=element_blank(),
            axis.text=element_blank(),
            axis.ticks=element_blank(),
          legend.text = element_text(size=8))

References

Brunsdon, Chris, A Stewart Fotheringham, and Martin E Charlton. 1996. “Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity.” Geographical Analysis 28 (4): 281–98.
Comber, Alexis, Christopher Brunsdon, Martin Charlton, Guanpeng Dong, Richard Harris, Binbin Lu, Yihe Lü, et al. 2023. “A Route Map for Successful Applications of Geographically Weighted Regression.” Geographical Analysis 55 (1): 155–78.
Comber, Alexis, Martin Callaghan, Paul Harris, Binbin Lu, Nick Malleson, and Chris Brunsdon. 2022. “Gwverse: A Template for a New Generic Geographically Weighted r Package.” Geographical Analysis 54 (3): 685–709.
Comber, Alexis, Paul Harris, Daisuke Murakami, Tomoki Nakaya, Naru Tsutsumida, Takahiro Yoshida, and Chris Brunsdon. n.d. “Spatially Varying Coefficient Modelling with a Geographical Gaussian Process GAM (GGP-GAM).” Paper Submitted to Geographical Analysis.
Comber, Alexis, Paul Harris, and Chris Brunsdon. 2023. “Multiscale Spatially Varying Coefficient Modelling Using a Geographical Gaussian Process GAM.” International Journal of Geographical Information Science, 1–21.
Fahrmeir, Ludwig, Thomas Kneib, Stefan Lang, and Brian D Marx. 2021. “Regression Models.” In Regression, 23–84. Springer.
Fotheringham, A Stewart, Chris Brunsdon, and Martin Charlton. 2003. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley & Sons.
Fotheringham, A Stewart, Wenbai Yang, and Wei Kang. 2017. “Multiscale Geographically Weighted Regression (MGWR).” Annals of the American Association of Geographers 107 (6): 1247–65.
Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.
Gollini, Isabella, Binbin Lu, Martin Charlton, Christopher Brunsdon, and Paul Harris. 2013. “GWmodel: An r Package for Exploring Spatial Heterogeneity Using Geographically Weighted Models.” arXiv Preprint arXiv:1306.0413.
Goodchild, Michael F. 2004. “The Validity and Usefulness of Laws in Geographic Information Science and Geography.” Annals of the Association of American Geographers 94 (2): 300–303.
Hastie, T, and R Tibshirani. 1990. Generalized Additive Models. Monographs on Statistics & Applied Probability. Chapman and Hall/CRC.
Lu, Binbin, Paul Harris, Martin Charlton, and Chris Brunsdon. 2014. “The GWmodel r Package: Further Topics for Exploring Spatial Heterogeneity Using Geographically Weighted Models.” Geo-Spatial Information Science 17 (2): 85–101.
Openshaw, Stan. 1996. “Developing GIS-Relevant Zone-Based Spatial Analysis Methods.” Spatial Analysis: Modelling in a GIS Environment, 55–73.
Oshan, Taylor M, Ziqi Li, Wei Kang, Levi J Wolf, and A Stewart Fotheringham. 2019. “Mgwr: A Python Implementation of Multiscale Geographically Weighted Regression for Investigating Process Spatial Heterogeneity and Scale.” ISPRS International Journal of Geo-Information 8 (6): 269.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with r. CRC press.
Yang, Wenbai. 2014. “An Extension of Geographically Weighted Regression with Flexible Bandwidths.” PhD thesis, University of St Andrews.
Zschech, Patrick, Sven Weinzierl, Nico Hambauer, Sandra Zilker, and Mathias Kraus. 2022. “GAM (e) Changer or Not? An Evaluation of Interpretable Machine Learning Models Based on Additive Model Constraints.” arXiv Preprint arXiv:2204.09123.