Part 1 GAM with Splines
In this session you will:
- Set up your RStudio session, loading data and packages
- Undertake and unpick a standard GAM regression
- Extend this to a GAM with splines (also referred to as “smooths”)
1.1 Data and Packages
The first code chuck loads the packages required, checking for the presence on your computer and installing them if necessary. Remember that you can execute chunks of code by highlighting it and clicking the Run button, or by placing your cursor inside it and pressing Cmd+Shift+Enter.
# package list
<- c("parlitools", "tidyverse","sf", "mgcv", "ggspatial", "cols4all", "cowplot")
packages # check which packages are not installed
<- packages[!packages %in% installed.packages()[, "Package"]]
not_installed # install missing packages
if (length(not_installed) > 1) {
install.packages(not_installed, repos = "https://cran.rstudio.com/", dep = T)
}# load the packages
library(parlitools) # for UK voting data
library(tidyverse) # for data manipulation and plotting
library(sf) # for spatial data
library(mgcv) # for GAMs
library(ggspatial) # for map backdrops
library(cols4all) # for nice shading in graphs and maps
library(cowplot) # for managing plots
Next some data of the 2016 UK Brexit vote are created. Census and voting data can be obtained from the parlitools
R package (Odell 2020). This includes data on voting for the 632 Westminster parliamentary constituencies in Great Britain (i.e. excluding those in Northern Ireland, for which some data is missing). Information on the data and the package can be found at . The voting data were linked to socio-economic factors that were associated with the Leave vote as described in Beecham et al. (2020), from census data also contained in the parlitools
R package. The data are reported over hexagonal cartograms. (NB The true constituency areas are available from the UK Office of National Statistics at https://geoportal.statistics.gov.uk/datasets/d3650c649bff40768badafdf0518a4af_0/.)
## 1.1 create the data
# Great Britain, not NI
<- str_detect(west_hex_map$gss_code,'^N')
NI <- sum(!NI) ## Cheeky recast of logicals to numeric 0/1
n <- west_hex_map[!NI,] ## Subset of map
hex.sp st_crs(hex.sp) = st_crs(hex.sp)
## 1.2 Join the other data tables
<- hex.sp %>% rename(ons_const_id=gss_code) %>%
hex.sp left_join(bes_2015,by = 'ons_const_id') %>%
left_join(leave_votes_west, by = 'ons_const_id') %>%
left_join(census_11, by = 'ons_const_id')
## 1.3 Select variables for analysis
<-
hex.sp %>%
hex.sp mutate(over65 = age_65_to_74+ age_75_to_84 + age_85_to_89 + age_90_plus,
ttu = nssecsemi_routine + nssecroutine + nsseclower_supervisor,
lhosp = industry_accommodation,
manu = industry_manufacturing,
degree = qual_level_4,
badhealth = health_bad + health_very_bad,
bornuk = born_uk,
leave = figure_to_use *100,
to15 = turnout_15) %>%
select(ons_const_id, leave, to15, over65, ttu, lhosp, manu, degree, badhealth, bornuk, region_name)
## 1.4 create regions
# tidy region names
$region_name = gsub(" Euro Region", "", hex.sp$region_name)
hex.sp$region_name = gsub(" and the Humber", "", hex.sp$region_name)
hex.sp$region_name = gsub("Eastern", "East", hex.sp$region_name)
hex.sp<-
regions_hex.sp %>%
hex.sp group_by(region_name) %>%
summarise(geometry = sf::st_union(geometry)) %>%
ungroup()
If you want to examine the data outside or R/RStudio (for example in a GIS), it can be saved off to a GeoPackage format:
st_write(hex.sp, "hex.sp.gpkg", quiet = T, delete_layer = T)
The variables are described in Table 1.1.
Variable | Description |
---|---|
leave | Leave vote (%) |
to15 | Turn out in the 2015 general election (%) |
over65 | Over 65s (%) |
ttu | Transport trade and utilities employment (%) |
lhosp | Leisure & hospitality employment (%) |
manu | Manufacturing employment (%) |
degree | Degree (level 4) qualification (%) |
badhealth | Bad health (%) |
bornuk | UK born (%) |
The spatial distribution of the response variable and the UK regions for reference in shown in Figure 1.1. The voting pattern reflects some well recognised local trends: low leave vote in Scotland and London, high leave vote in the Midlands and North.
<-
p1 ggplot() +
geom_sf(data = hex.sp,aes(fill=(leave-50)), col = NA) +
scale_fill_continuous_c4a_div(palette="tol.sunset",name='Leave (%) - 50%' ) +
theme_bw() +
theme(axis.title=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank())
<-
p2 ggplot() +
geom_sf(data = regions_hex.sp, aes(fill = region_name), col = NA) +
scale_fill_discrete_c4a_cat(palette="brewer.paired",name='Regions') +
theme_bw() +
theme(axis.title=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank())
plot_grid(p1,p2, ncol = 2)

Figure 1.1: The Brexit leave vote majority over parliamentary constituencies with the UK regions, recast into hexagons.
1.2 Regression
The standard form for a linear regression, which are commonly solved using an ordinary least squares approach (OLS) is:
\[y = \beta_{0} + \sum_{m}^{k=1}\beta_{k}x_{k} + \epsilon \]
where \(y\) is the response variable, \(x_{k}\) is the value of the \(k^{th}\) predictor variable, \(m\) is the number of predictor variables, \(\beta_0\) is the intercept term, \(\beta_k\) is the regression coefficient for the \(k^{th}\) predictor variable and \(\epsilon\) is the random error term.
We can make an initial model of land price and examine the result:
= lm(leave~ to15 + over65 + ttu + lhosp + manu + degree + badhealth + bornuk,
reg.mod data = hex.sp)
summary(reg.mod)
##
## Call:
## lm(formula = leave ~ to15 + over65 + ttu + lhosp + manu + degree +
## badhealth + bornuk, data = hex.sp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8099 -2.6869 0.3864 3.0490 13.9272
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 147.18794 5.04424 29.179 < 2e-16 ***
## to15 -0.90338 0.06559 -13.773 < 2e-16 ***
## over65 1.29320 0.07921 16.327 < 2e-16 ***
## ttu -0.25512 0.07267 -3.511 0.000479 ***
## lhosp -1.16661 0.13615 -8.569 < 2e-16 ***
## manu 0.56592 0.08414 6.726 3.95e-11 ***
## degree -0.99318 0.05906 -16.817 < 2e-16 ***
## badhealth -2.05115 0.20333 -10.088 < 2e-16 ***
## bornuk -0.09962 0.03339 -2.984 0.002957 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.088 on 623 degrees of freedom
## Multiple R-squared: 0.8047, Adjusted R-squared: 0.8022
## F-statistic: 320.8 on 8 and 623 DF, p-value: < 2.2e-16
We have a model with a good \(R^2\) value, describing the relationship between the different covariates and the leave vote. And the coefficient estimates can be interpreted in the usual way:
- each additional 1% of the electorate that voted in the 2015 election, is associated with a 0.9% decrease in the leave vote;
- each additional 1% of the population who are 0ver 65, is associated with a 1.3% increase in the leave vote;
- …etc
This should all be relatively familiar, and sets the scene for GAMs in the next section.
1.3 GAM Regression
GAMs are Generalised Additive Models. They were originally proposed by Hastie and Tibshirani (1987). The gam
function in the mcgv
package (Wood 2018) can be used to fit a regression model in the same way,. The model created by the code below generates the same fixed parametric coefficient estimates and leading to the same interpretations:
= gam(leave~ to15 + over65 + ttu + lhosp + manu + degree + badhealth + bornuk,
gam.mod data = hex.sp)
summary(gam.mod)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## leave ~ to15 + over65 + ttu + lhosp + manu + degree + badhealth +
## bornuk
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 147.18794 5.04424 29.179 < 2e-16 ***
## to15 -0.90338 0.06559 -13.773 < 2e-16 ***
## over65 1.29320 0.07921 16.327 < 2e-16 ***
## ttu -0.25512 0.07267 -3.511 0.000479 ***
## lhosp -1.16661 0.13615 -8.569 < 2e-16 ***
## manu 0.56592 0.08414 6.726 3.95e-11 ***
## degree -0.99318 0.05906 -16.817 < 2e-16 ***
## badhealth -2.05115 0.20333 -10.088 < 2e-16 ***
## bornuk -0.09962 0.03339 -2.984 0.002957 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## R-sq.(adj) = 0.802 Deviance explained = 80.5%
## GCV = 26.257 Scale est. = 25.883 n = 632
However, GAMs also allow users to examine non-linear relationships through the uses of smooths or splines (more on these later). The basic ideas behind the GAM framework are that:
- Relationships between the dependent variable and predictor variables follow smooth patterns that can be linear or non-linear.
- It is possible to estimate these smooth relationships individually and simultaneously and then predict the target variable by adding them up.
These ideas are illustrated in the figure below from https://multithreaded.stitchfix.com/blog/2015/07/30/gam/:
where \(Y\) is the target variable \(E(Y)\) is the expected value, \(g(Y)\) is the link function that links the expected value to the predictor variables \(x_1 \dots x_p\), and the terms \(s_1(x_1), \dots , s_p(x_p)\) indicate the smooth, nonparametric (non-linear) functions. Formal descriptions of GAMs and splines are provided in Part 2.
The code below plots the target variable against the predictor variables. Here is is evident that each of the relationships with the target variable are relatively smooth, but importantly that the the assumption of linear relationships estimated under a standard regression may not be appropriate in all cases:
%>% st_drop_geometry() %>%
hex.sp select(-ons_const_id , -region_name) %>%
pivot_longer(-leave) %>%
ggplot(aes(leave, value)) +
#geom_point(alpha = 0.2) +
geom_smooth() +
facet_wrap(~name, scales = "free")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Figure 1.2: Scatterplots of the target variable (Brexit leave vote) with the covariates.
1.4 GAMs with Splines / Smooths
To handle these non-linearities GAMs can be specified with smooths, which are also referred to as splines. These are best illustrated through some simulated data which is created in the code snippet below along with a standard linear model:
# make simulated data
set.seed(1)
<- gamSim(1,n=100,scale=2)[, 1:4] dat
## Gu & Wahba 4 term additive model
head(dat)
## y x0 x1 x2
## 1 14.531995 0.2655087 0.6547239 0.2675082
## 2 16.113160 0.3721239 0.3531973 0.2186453
## 3 9.583515 0.5728534 0.2702601 0.5167968
## 4 15.686696 0.9082078 0.9926841 0.2689506
## 5 8.221554 0.2016819 0.6334933 0.1811683
## 6 9.903443 0.8983897 0.2132081 0.5185761
# create OLS regression model
= lm(y ~ x1 + x2, data = dat)
mod summary(mod)
##
## Call:
## lm(formula = y ~ x1 + x2, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8749 -1.3978 -0.0907 1.6120 7.9196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.2807 0.7534 9.664 7.14e-16 ***
## x1 8.0138 1.0482 7.645 1.51e-11 ***
## x2 -5.9673 1.0222 -5.838 7.01e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.835 on 97 degrees of freedom
## Multiple R-squared: 0.4859, Adjusted R-squared: 0.4753
## F-statistic: 45.84 on 2 and 97 DF, p-value: 9.685e-15
Everything is fine superficially here: we can see a positive effect of \(x_1\) on \(y\), a negative one for \(x_2\), with a reasonable \(R^2\). A similar story is found if some diagnostic plots are generated as in Figure 1.3:
# Residuals vs. Fitted plot
= data.frame(fitted = mod$fitted.values, resids = mod$residuals) %>%
p1 ggplot(aes(x = fitted, y = resids)) +
geom_point(col = "tomato", alpha = 0.5, pch = 19) +
theme_bw() + ggtitle("Residuals vs Fitted") +
geom_smooth(se = FALSE, col = "tomato") +
xlab("Fitted values") + ylab("Residuals")
# Quantiles (QQ) plot
= data.frame(studentised.resids = rstudent(mod)) %>%
p2 ggplot(aes(sample = studentised.resids)) +
stat_qq(col = "dodgerblue", pch = 1, cex = 2.5) +
stat_qq_line(col = "dodgerblue", lty = 2) +
theme_bw() + ggtitle("Normal QQ") +
xlab("Theoertical Quantiles") + ylab("Studentised Residuals")
plot_grid(p1, p2, ncol = 1)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Figure 1.3: Diagnostics pllots of the OLS regression of simulated data.
So far so good. However, if the relationships between the target variable \(y\) and the 2 covariates are examined as in Figure 1.4, then there are perhaps some reasons for concern: although there is a broadly linear and the positive effect for \(x_1\), it is a bit curvy; but the relationship with \(x_2\) is difficult to characterise in a linear way - it is initially sharply positive for low values, and then negative.
%>% select(y, x1, x2) %>%
dat pivot_longer(-y) %>%
ggplot(aes(x = value, y = y)) +
geom_point() +
facet_wrap(~name) +
xlab("")+ ylab("leave") +
geom_smooth(se = FALSE) +
theme_bw()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Figure 1.4: Scatterplots of the simualted target variable agsinst the covariates.
This is a very common situation: the assumptions of linearity of a standard linear model are often violated, or the model does not capture the nuances of the data very well. Options for handling this may be to explore non-linear relationships such as polynomial regression with quadratic or cubic terms, or through more complex specifications involving sines or cosines etc.
However, these may fail with highly complex relationships, particularly they may fail to capture the true amount of ‘wiggliness’ in the data or they may overfit it. And we frequently have to randomly guess about the form of the relationship until we find a decent model form and associated fits.
One way of dealing with this non-linearity and unknown wiggliness is to fit a series of mini models to different parts of the data in a piecewise way. This can be done by dividing the data into chunks at various points (or knots) along the x-axis, and then fit a linear or polynomial model to that subset of data. The following code fits a linear regression for subsets of \(x_2\) identified manually from the figure above and plots them (Figure ??).
# subset - these define the knots
= dat %>% filter(x2< 0.25)
xa = dat %>% filter(x2>= 0.25 & x2 < 0.45)
xb = dat %>% filter(x2>= 0.45 & x2 < 0.75)
xc = dat %>% filter(x2>= 0.75)
xd # plot
= ggplot(data = rbind(xa, xb, xc, xd), aes(x = x2, y = y)) +
p1 geom_point(col = NA) +
geom_point(data = xa, aes(x2, y), col = "tomato") +
stat_smooth(data = xa, aes(x2, y), method = "lm",
se = FALSE, color = "tomato") +
geom_point(data = xb, aes(x2, y), col = "dodgerblue") +
stat_smooth(data = xb, aes(x2, y), method = "lm",
se = FALSE, color = "dodgerblue") +
geom_point(data = xc, aes(x2, y), col = "goldenrod") +
stat_smooth(data = xc, aes(x2, y), method = "lm",
se = FALSE, color = "goldenrod") +
geom_point(data = xd, aes(x2, y), col = "darkgreen") +
stat_smooth(data = xd, aes(x2, y), method = "lm",
se = FALSE, color = "darkgreen")
And it is easy to can see how these could be extended to other relationships for each subset. The code below uses the example of quadratic relationships between the x2
and y
in the subsets and plots them in Figure 1.5:
= ggplot(data = rbind(xa, xb, xc, xd), aes(x = x2, y = y)) + geom_point(col = NA) +
p2 geom_point(data = xa, aes(x2, y), col = "tomato") +
stat_smooth(data = xa, aes(x2, y), method = "lm",
formula = y ~ x + I(x^2),
se = FALSE, color = "tomato") +
geom_point(data = xb, aes(x2, y), col = "dodgerblue") +
stat_smooth(data = xb, aes(x2, y), method = "lm",
formula = y ~ x + I(x^2),
se = FALSE, color = "dodgerblue") +
geom_point(data = xc, aes(x2, y), col = "goldenrod") +
stat_smooth(data = xc, aes(x2, y), method = "lm",
formula = y ~ x + I(x^2),
se = FALSE, color = "goldenrod") +
geom_point(data = xd, aes(x2, y), col = "darkgreen") +
stat_smooth(data = xd, aes(x2, y), method = "lm",
formula = y ~ x + I(x^2),
se = FALSE, color = "darkgreen")
plot_grid(p1, p2)

Figure 1.5: Scatterplot of the x2 covariate, subseted into chunks with ‘within-variable’ quadratic regression trend lines.
You can see from the plots above how, for a single variable - recall we are just examining x2
in the simulated data, many different relationships with the target variable (y
) may be present within the data, and how these can be extracted using knots.
In the code above, the knots - points at which a within-data local relationships were found - were identified manually and mini-models for each subset were only created in within the code for the plot.
However, these ideas of multiple “within the variable” models, underpin the notion of splines or smooths in a Generalized Additive Model. A formal definition is provided in the next session.
For a GAM with splines can be fitted to the data:
= gam(y ~ s(x1) + s(x2), data = dat)
gam.mod gam.check(gam.mod)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 29 iterations.
## The RMS GCV score gradient at convergence was 4.258793e-05 .
## The Hessian was positive definite.
## Model rank = 19 / 19
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(x1) 9.00 5.71 1.16 0.92
## s(x2) 9.00 5.72 0.99 0.43
summary(gam.mod)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x1) + s(x2)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.832 0.213 41.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x1) 5.710 6.859 10.86 <2e-16 ***
## s(x2) 5.723 6.863 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.704 Deviance explained = 73.8%
## GCV = 5.1794 Scale est. = 4.5354 n = 100
The model is well tuned with 9 knots for each covariate and a fixed term for the intercept. The fit has increased from the OLS regression model (mod
) defined earlier and the coefficients can be extracted:
coef(gam.mod)
## (Intercept) s(x1).1 s(x1).2 s(x1).3 s(x1).4 s(x1).5
## 8.8317045 1.3821500 1.4110734 2.5028131 2.2725032 0.9366032
## s(x1).6 s(x1).7 s(x1).8 s(x1).9 s(x2).1 s(x2).2
## -1.3508776 1.1290149 -4.1634269 -1.7907158 -5.0219858 -15.5418200
## s(x2).3 s(x2).4 s(x2).5 s(x2).6 s(x2).7 s(x2).8
## -3.6806831 -5.4304512 -1.4428636 5.1484617 1.4518634 18.6901232
## s(x2).9
## 6.3781799
Notice that there is one coefficient for fixed term and each knot in the smooth. We can see, for example,that the coefficients for x1
have mostly positive slopes but also negative ones in places within the data, and that those for x2
are mostly negative.
plot(coef(gam.mod)[2:10])
abline(h = 0)
Returning to the Brexit data, a GAM with splines / smooths is constructed using the code below. Note how There are many types of spline / smooths (see the help page help(s)
for the full set of options). Here the defaults for the gam
function were accepted.
<- gam(leave~ s(to15) + s(over65) + s(ttu) + s(lhosp) + s(manu) +
gam.bx.mod s(degree) + s(badhealth) + s(bornuk),
data = hex.sp)
It is possible to check the model to see whether the number of knots has been sufficiently optimised using the gam.check
function from the mcgv
package. This prints out a number of diagnostic plots as before: QQ-plot, residuals against predictors, a histogram of residuals and response (\(y\)) vs. fitted values (\(\hat{y})\).
gam.check(gam.bx.mod)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 1.387693e-06 .
## The Hessian was positive definite.
## Model rank = 73 / 73
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(to15) 9.00 2.76 0.97 0.21
## s(over65) 9.00 2.90 0.98 0.28
## s(ttu) 9.00 4.76 1.02 0.67
## s(lhosp) 9.00 2.47 1.04 0.84
## s(manu) 9.00 2.74 1.02 0.67
## s(degree) 9.00 2.80 0.90 <2e-16 ***
## s(badhealth) 9.00 4.41 1.00 0.42
## s(bornuk) 9.00 1.00 0.94 0.09 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Perhaps of most interest is the printed summary of the gam.check
function, which indicates the number of knots, \(k\) in each spline / smooth and values for the effective degrees of freedom (edf
). The edf
values can be interpreted as an indicator of the non-linearity of the relationship of the response (pricel
) with the predictor variables - i.e. the effective degrees of freedom would have a value of ~1 if the model penalized the smooth term to a near linear relationship. Higher edf
values indicate increasing non-linearity in the relationship. Here all edf
values are less than \(k\) indicating that the model is adequately specified. Here the p-values relate to the knots parameter \(k\) and whether it may be too low, especially if the reported edf
is close to the k'
value, the maximum possible EDF for the term. These all look OK.
In the mcgv
package, the parameter which controls the degree of smoothing of the data via spline knots is optimised during GAM fitting. This will be returned to in the context of spatially varying coefficient models with Gaussian Process splines in the next session.
Finally, it is also possible to examine the full summary of the GAM including fixed (parametric) coefficient estimates, which can be can be interpreted as linear terms in the model, and a summary of the smooths. All are significant and the model fit (\(R^2\)) has increased compared to the initial models (reg.mod
and gam.mod
).
summary(gam.bx.mod)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## leave ~ s(to15) + s(over65) + s(ttu) + s(lhosp) + s(manu) + s(degree) +
## s(badhealth) + s(bornuk)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.0583 0.1907 272.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(to15) 2.764 3.532 59.02 < 2e-16 ***
## s(over65) 2.898 3.760 80.85 < 2e-16 ***
## s(ttu) 4.757 5.880 4.25 0.000402 ***
## s(lhosp) 2.474 3.133 12.95 < 2e-16 ***
## s(manu) 2.742 3.517 14.53 < 2e-16 ***
## s(degree) 2.796 3.608 61.85 < 2e-16 ***
## s(badhealth) 4.408 5.469 21.21 < 2e-16 ***
## s(bornuk) 1.000 1.000 10.39 0.001332 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.824 Deviance explained = 83.1%
## GCV = 23.936 Scale est. = 22.996 n = 632
For the smooth terms (splines), the p-values relate to the smooths defined over the variable space, and their significance can be interpreted as indicating whether they vary within that space. The p-values indicate that all of the covariates are significant locally except manu
and badhealth
. Any insignificant p-values can be interpreted as indicating that although there are effects, these do not vary locally within this attribute space.
Again the coefficients can be examined, and here the bornuk
ones are extremely low.
= matrix(coef(gam.bx.mod)[-1], ncol = 8)
mat colnames(mat) = names(hex.sp)[3:10]
mat
## to15 over65 ttu lhosp manu degree
## [1,] -0.03208693 -0.86832884 0.1146659 0.09466284 1.00212782 0.12030573
## [2,] 0.88319856 0.03839976 -3.0721789 0.76683457 0.34803006 0.18855487
## [3,] 0.09169345 -0.35863446 -0.8519068 -0.03125884 -0.09331350 0.04480240
## [4,] 0.69376063 0.17423738 0.6588155 -0.83502142 0.09496868 -0.31274332
## [5,] -0.18046641 0.30819461 -0.1973320 0.29238087 0.02102715 0.07893378
## [6,] -0.65060032 -0.16447298 0.7619501 -0.79444011 -0.11200009 0.46008706
## [7,] -0.21480666 0.31592848 -0.1920892 0.37609755 0.01249560 0.04370047
## [8,] 4.22577112 3.15070352 -3.7124649 -3.86661485 0.49231716 -2.98262376
## [9,] -4.94241406 4.89424920 -1.1810484 -1.08596851 1.14810055 -8.24086777
## badhealth bornuk
## [1,] 1.15025416 -3.596843e-10
## [2,] -3.02825998 -3.876715e-10
## [3,] -0.16663199 1.938178e-11
## [4,] 1.27332722 -2.366977e-10
## [5,] 0.08460121 -1.293325e-12
## [6,] -1.33469348 1.983867e-10
## [7,] -0.20290879 -4.053034e-13
## [8,] 7.41570212 6.654922e-10
## [9,] -4.91423243 -1.370869e+00
1.5 Summary
The aim of this session was to set the scene for spatially varying coefficient models with GAMs in the next session. It aimed to introduce and illustrate the workings of GAMs and particularly to provide a generalised understanding of what GAM smooths do and how they do it. A key consideration of the specification of knots (\(k\)) in splines and thus nature of the smoothing they specify. These are optimised by the gam
function in the mcgv
package.