Chapter 9 Location Allocations
9.1 Setup: Packages and Data
You will need to load the following packages, testing for the presence on your R installation as before (although in a different way!):
<- c("sf", "tidyverse","tmap", "devtools", "gtools")
packages # check which packages are not installed
<- packages[!packages %in% installed.packages()[, "Package"]]
not_installed # install missing packages
if (length(not_installed) > 1) {
install.packages(not_installed, repos = "https://cran.rstudio.com/", dep = T)
}# load the packages
library(sf) # for spatial data
library(tidyverse) # for data wrangling
library(tmap) # for mapping
library(devtools)
library(gtools)
We also need the tbart
package and this needs to be installed in a different way using the install_version
function of the devtools
package and the load the package with the library
function.
install_version("tbart", version = "1.0", dep = T)
library(tbart)
And you will need to load the georgia
data as in previous practicals.
download.file("https://github.com/gwverse/gw/blob/main/data/georgia.rda?raw=True",
"./georgia.RData", mode = "wb")
load("georgia.RData")
You should examine this in the usual way:
# spatial data
georgiasummary(georgia)
tm_shape(georgia) + tm_borders()
tm_shape(georgia) + tm_polygons("MedInc")
You will also need some data from the VLE for later sections:
= st_read("Trafford_WPZsPop.gpkg", quiet = T) %>% `st_crs<-`(27700) wz
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
= read.csv("geog3191grocers2016.csv") stores
9.2 Location allocation overview
Location allocation is a suite of techniques for determining optimal facility siting and locations relative to some measure of spatial demand. It can be used in a number of ways:
- to determine locations for new facilities (shops, hospitals, etc).
- to reorganise or reallocate existing facilities (in order to better service the demand for them).
- to determine which facilities to close based on a set of spatial constraints.
In this sense location allocation relates to supply and demand. What it seeks to do is to identify the locations that best satisfy spatially distributed demand.
Supply can be considered as the provision of facilities at specific locations, either current or future. Often the resources need to supply the demands at any given location are assumed to exist - nurses needed for a hospital, deliveries for a shop, but they also involve some weighting based on supply capacity of some kind, for example floor area as is used in retail geography.
Demand maybe conceived as a simple distance or other cost (financial or travel time for example) and may be weighted maybe by some measure of targeted demand. these may be simple counts of population, or targeted subgroups such existing customers, people with disabilities, children, unemployed etc.
There are a number of important and related considerations in location allocation modelling:
1. The problem of dimensionality In only very few cases is it possible to identify the best set of locations deterministically. That is, by evaluating each set of possible solutions. This is because of the extremely large number of possible sets of solutions to even very small problems. Consider the problem of identifying the best 25 locations from 10000 possible locations (for example locations spaced at 100m in a 10km x 10km study area). The number of different combinations facilities to evaluate is very large:
\[\frac{10000! }{ (25! \times (10000!-25!)}\]
This equals \(4.15e+35684\) permutations of solutions to evaluate. This is impossible to deterministically (i.e. one by one).
2. Searching through the space The dimensionality problem means that all possible sets of solutions cannot be individually evaluated. This is because the search space the number of possible solutions is too great. To overcome this what are called heuristic searches or search heuristics are used to search through the solution space. These look for local optima and try to determine whether these are globally optimal as well. Different approaches tackle this search in different ways. the basic idea is that they find parts of the solution space that are quite good and try to search deeper within these in different ways. These include things you may have heard of such as genetic algorithms, tabu searches. One such approach is the Teitz and Bart heuristic which is used in the examples in this practical.
3. Optimality and the evaluation function Location allocation seeks to determine the best or optimal set of locations for facility given demand. In order to determine the optimal set of facility locations given some spatial distribution of demand, requires that what is meant by best or optimal is defined in some way. This is done through an evaluation function which helps the heuristic search through the solution space. A common objective is, for example, to minimise demand weighted distance to a set of facilities. The evaluation function controls the heuristic search - it determines what is “good” and therefore defines the regions in the solution space that are examined more deeply.
4. Demand Weighting The evaluation function can evaluate just distances (e.g. to the nearest facility) for each demand location, or these can be weighted in some way. Weights might reflect customers, segments of the population, or other characteristics of the demand areas, such as deprivation. Many location allocation analyses seek to minimise measures such as demand weighted distance, measured in units such as person kilometres. It is a concept assumes that facility access and distances are important but the weights could be other things such as spending power. The worked example below demonstrates this.
Worked Example Consider the matrix of distances between demand locations, A, B and C and potential supply locations 1,2,3 and 4 in the table below.
# define a distance matrix
= c(10, 190, 180, 150)
A = c(180, 10, 120, 150)
B = c(150, 150, 10, 150)
C = t(data.frame(A = A,B = B,C=C))
df colnames(df) = c(1:4)
# have a look
df
## 1 2 3 4
## A 10 190 180 150
## B 180 10 120 150
## C 150 150 10 150
If we wanted to identify the best 2 locations based distance alone, then we would evaluates each set of 2 supply locations and find the set that minimises the distance of each demand location to its nearest supply location. The combinations
function is useful in this very simple case. It is used below to create an object, gr
, of all possible sets of locations:
= combinations(n=4, r =2, v= c(1:4))
gr gr
## [,1] [,2]
## [1,] 1 2
## [2,] 1 3
## [3,] 1 4
## [4,] 2 3
## [5,] 2 4
## [6,] 3 4
We can use this to evaluated the simple distance based case. The loop below takes each set of potential locations and evaluates the distance of each demand location to the nearest supply. The results show that locations 1 and 3 provide the optimal set of locations, when “optimal” is defined on distance alone.
= vector()
res.out for (i in 1:nrow(gr)) {
# select the potential locations
= unlist(gr[i,])
index # get their distances
= df[, index]
df.i # extract the min distance for each row (demand)
= sum(apply(df.i, 1, min))
res.i # add the result to the result vector
= append(res.out, res.i)
res.out
}cbind(gr, res.out)
## res.out
## [1,] 1 2 170
## [2,] 1 3 140
## [3,] 1 4 310
## [4,] 2 3 200
## [5,] 2 4 310
## [6,] 3 4 280
We can add some demand weighting:
= data.frame(df, Pop = c(100, 250, 350))
df2 df2
## X1 X2 X3 X4 Pop
## A 10 190 180 150 100
## B 180 10 120 150 250
## C 150 150 10 150 350
And use this in our evaluation of each set. This time the evaluation of each candidate location in each set incorporates distance weighted by demand population, resulting a different set of locations being identified as optimal, locations 2 and 3:
= vector()
res.out for (i in 1:nrow(gr)) {
# select the potential locations
= unlist(gr[i,])
index # get their distances
= df2[, index]
df.i # extract the min distance for each row (demand) * Pop
= sum(apply(df.i, 1, min) * df2$Pop)
res.i # add the result to the result vector
= append(res.out, res.i)
res.out
}cbind(gr, res.out)
## res.out
## [1,] 1 2 56000
## [2,] 1 3 34500
## [3,] 1 4 91000
## [4,] 2 3 24000
## [5,] 2 4 70000
## [6,] 3 4 48500
However this deterministic approach although transparent, will not work with many everyday problems and applications. The next sections provide two case studies based on published work, that are more complex and thus dimensional that this simple example.
9.2.1 Summary
- The aim of location allocation is to optimise the siting of \(n\) facilities given a set of potential locations.
- it is important to note that sets of potential solutions are evaluated together rather identifying the best location, then the next best and so on.
- The computational space can easily be very large and all effective algorithms use some kind of heuristic search to identify the \(n\) best locations.
- Typically approaches use algorithms that seek to minimise some kind of demand weighted distance (e.g. with units such as person kilometres). This results in the locations that are the most easily accessed by the most people being selected.
9.3 Allocation with the tbart
package
The tbart
package provides an implementation of Teitz and Bart’s algorithm. This attempts to find subset of size \(p\) within a set of potential supply points such that summed distances of the demand locations (also points) in the set allocated to the nearest supply point in \(p\) is minimised. The package notes that “Although generally effective, this algorithm does not guarantee that a globally optimal subset is found”.
The tbart
package hasn’t yet been updated to work with sf
format objects and so spatial data needs to be converted to sp
format. The code below uses the Georgia data sp
format so the data iin
sf` format need to be converted. What the code below below does is to determine the best 5 counties out of 159 to site some kind of facility in order to best cover the demands represented by the counties. In this first example there is no weighting by population etc, it is purely evaluated on distance.
# convert georgia to sp
= as(georgia, "Spatial")
georgia_sp
# find the best 5 counties to allocate others to
set.seed(123)
<- allocate(georgia_sp, p=5)
allocations.list
# merge the counties allocated to each point
# this function does this - run all in one go to compile it
= function(geo, group) {
st_union_by = vector()
y2 = 1
count #loop over by groups and merge units
for (i in unique(group)) {
#which units
= geo[which(group == i),]
z #merge
= (st_union(z))
y = rbind(y2, y)
y2 #y2[[count]] = y
= count + 1
count
}# create the output
= st_sfc((y2))
out # set the crs
st_crs(out) = st_crs(geo)
# return
out
}# and then call it to create the zones object
= st_union_by(georgia, allocations.list)
zones # plot the result
tm_shape(georgia) + tm_polygons() +
tm_shape(zones) + tm_borders(col = "red", lty = 2, lwd = 3) +
tm_shape(georgia[unique(allocations.list), ]) + tm_polygons("red")
What this has done is take the counties, convert them into points (their geometric centroids) and use these as both the supply and demand locations. If a distance matrix of the distances between supply and demand locations is not supplied then the allocate
function calculates it.
In the code below the distance matrix is pre calculated and is weighted by population using the attributes in the Georgia data set before being passed to the allocate function.
# convert georgia to pints (not needed but is more transparent)
= st_centroid(georgia) georgia_pt
## Warning in st_centroid.sf(georgia): st_centroid assumes attributes are constant
## over geometries of x
# distance matrix
= st_distance(georgia_pt, georgia_pt)
dmat 1:6, 1:6] dmat[
## Units: [m]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.00 75237.10 26712.93 209277.4 172409.6 311599.8
## [2,] 75237.10 0.00 49540.34 148426.6 200462.4 345831.5
## [3,] 26712.93 49540.34 0.00 190341.1 184671.5 327228.2
## [4,] 209277.45 148426.65 190341.07 0.0 224285.0 349132.7
## [5,] 172409.59 200462.42 184671.54 224285.0 0.0 145384.7
## [6,] 311599.80 345831.51 327228.18 349132.7 145384.7 0.0
# weight dmat by pop
= dmat*georgia_pt$TotPop90 dmatw
You can check that this has done what it should: consider the rows in dmat
as the demand locations and the columns as potential supply points. In the first row of the weighted distance matrix (i.e. demand weighted distance), we would expect the distances to be multiplied by the population of the first county (15744):
# check 1st row first 6 population values
head(georgia_pt$TotPop90)
## [1] 15744 6213 9566 3615 39530 10308
# check 1st row first 6 columns of dmat
1, 1:6] dmat[
## Units: [m]
## [1] 0.00 75237.10 26712.93 209277.45 172409.59 311599.80
# check 1st row first 6 columns times 1st pop value
1, 1:6] * 15744 dmat[
## Units: [m]
## [1] 0 1184532840 420568432 3294864129 2714416565 4905827189
# check 1st row first 6 columns of weighted dmat
1, 1:6] dmatw[
## Units: [m]
## [1] 0 1184532840 420568432 3294864129 2714416565 4905827189
We can now plug this weighted distance matrix into the allocation and this pulls the allocations towards the large centres of population:
set.seed(120)
<- allocate(georgia_sp, metric = dmatw, p=5)
allocations.list = st_union_by(georgia, allocations.list)
zones # plot the results
# plot the result
tm_shape(georgia) + tm_polygons() +
tm_shape(zones) + tm_borders(col = "red", lty = 2, lwd = 3) +
tm_shape(georgia[unique(allocations.list), ]) + tm_polygons("red")
Finally, we can plot the allocations using a star diagram. The function below creates this from the outputs of the allocations
function using the outputs of the allocate
function and converting them to sf
format. This generates the lines representation of the allocations.
# create allocations
<- allocations(georgia_sp,p=5,force=unique(allocations.list))
georgia_allocs # convert to sf
= st_as_sf(georgia_allocs)
georgia_allocs # create star lines function
= function (sf1,sf2, alloc) {
create_star_lines if (missing(sf2))
<- sf1
sf2 if (missing(alloc)) {
<- sf1$allocation
alloc
}<- st_coordinates(st_centroid(sf1))
co1 <- st_coordinates(st_centroid(sf2))
co2 <- matrix(nrow = 0, ncol = 5)
result for (i in 1:nrow(co1)) {
<- rbind(result, c(i, c(co1[i, 1], co2[alloc[i], 1]), c(co1[i, 2], co2[alloc[i], 2]) ) )
result
}= as.data.frame(result)
result <- apply(result, 1, function(x)
ls
{<- as.numeric(x[c(2,3,4,5)])
v <- matrix(v, nrow = 2)
m return(st_sfc(st_linestring(m), crs = st_crs(sf1)))
})= Reduce(c, ls)
ls = data.frame(ID = result$V1)
star_lines st_geometry(star_lines) = ls
return(star_lines)
}<- create_star_lines(georgia_allocs) star_lines
## Warning in st_centroid.sf(sf1): st_centroid assumes attributes are constant over
## geometries of x
## Warning in st_centroid.sf(sf2): st_centroid assumes attributes are constant over
## geometries of x
And the results can be mapped as in Figure 9.3.
# star lines
= tm_shape(georgia) + tm_borders(col = "lightgrey") +
p1 tm_shape(star_lines)+tm_lines(col = "darkred", lwd = 2)
# distance to allocation
= tm_shape(georgia_allocs) +
p2 tm_polygons(col = "allocdist",
title = "Distance",
palette = "Reds", n = 7, style = "kmeans")
tmap_arrange(p1, p2)
9.4 Location-Allocation for supply and demand modelling
The previous section determined the locations that minimised either weighted or unweighed distance between each area to a single location.
In this section a more realistic application is developed: where to locate a new convenience store. Our objective it to identify potential locations for 3 new new convenience stores. We have some data from GEOG3191 of existing grocery stores in the Trafford area of Manchester and some indication of demand through a workplace zone population, which we can use as our target population.
It is instructive to have a quick look at the data you loaded earlier. Figure 9.4 shows the workplace zones and the the head
of the stores data:
# full map with embellishments
tm_shape(wz) + tm_polygons("TotalWP", palette = "YlOrRd", n = 10, type = "kmeans") +
tm_layout(legend.position = c("left", "top")) +
tm_compass(type = "rose", position = c("left", "bottom")) +
tm_scale_bar(position = c("left", "bottom"))
head(stores)
## StoreID Brand Category Town Suburb Postcode
## 1 0253/0014 ALDI Grocery/Supermarkets MANCHESTER TYLDESLEY M29 8EW
## 2 0253/0001 ALDI Grocery/Supermarkets BIRMINGHAM WARD END B8 2AN
## 3 0253/0008 ALDI Grocery/Supermarkets BIRMINGHAM BORDESLEY GREEN B9 5SP
## 4 0253/0002 ALDI Grocery/Supermarkets SOLIHULL SHIRLEY B90 3AE
## 5 0253/6005 ALDI Grocery/Supermarkets WHITEHAVEN CA28 9DL
## 6 0253/6037 ALDI Grocery/Supermarkets WOLVERHAMPTON WV2 3HP
## Grid_East Grid_North Floorspace..sqft.
## 1 368835.00 402119.00 3040
## 2 412535.00 288186.00 4000
## 3 412125.00 286841.00 4000
## 4 411894.00 279150.00 5000
## 5 297192.00 517643.00 5000
## 6 391642.00 296564.00 5900
9.4.1 Data preparation
Here we are interested in convenience store locations as we want to find locations for 3 new ones. We need to filter out records / observations that are not in the Trafford area in Manchester. The code below creates a 2km buffer around the workplace zones in Figure 9.4, a spatial layer from the stores data using Easting and Northing (these are in OSGB projection) and the clips the stores spatial layer with the buffer:
# convert to numeric
$Grid_East = as.numeric(as.character((stores$Grid_East))) stores
## Warning: NAs introduced by coercion
$Grid_North = as.numeric(as.character((stores$Grid_North))) stores
## Warning: NAs introduced by coercion
# get rid of NAs
= !is.na(stores$Grid_North)
index # create spatial layer
= st_as_sf(stores[index,], coords = c("Grid_East", "Grid_North"), crs = 27700)
stores_sf # trim down fields
$type = "existing"
stores_sf= stores_sf[,9]
stores_sf # Create 2km spatial data for Trafford areas
= st_buffer(st_union(wz), 2000)
wz_buffer # clip out Trafford stores
= stores_sf[wz_buffer,] stores_sf
We can examine these existing locations:
tm_shape(stores_sf) + tm_dots(col = "red", size = 1) +
tm_shape(wz) + tm_borders()
Next, we need to create some potential locations for the new stores. The st_make_grid
functions is very handy for this. The code below defines a hexagonal grid based on the extent of the wz
layer, adds a type
variable (potential) and joins this to the stores
layer defined above, before existing and potential store locations are mapped as in Figure 9.5.
= st_make_grid(wz, 500, what = "centers", crs = 27700, square = F)
grid # make an attribute table for grid
= data.frame(type = rep("potential",length(grid)))
sup_pt # add the grid geometry to it
st_geometry(sup_pt) = grid
# join to existing
= rbind(stores_sf,sup_pt)
supply # clip to 2km buffer
= supply[wz_buffer,]
supply # and plot
= tm_shape(wz) + tm_borders() +
p1 tm_shape(supply %>% filter(type == "existing") ) + tm_dots(col = "red", size = 0.3) +
tm_layout(title = "Existing")
# potential sites
= tm_shape(wz) + tm_borders() +
p2 tm_shape(supply %>% filter(type == "potential") ) + tm_dots(col = "blue", size = 0.05) +
tm_layout(title = "Potential")
tmap_arrange(p1,p2)
9.4.2 Weighted Distance
In order to undertake the location allocation we need a create some kind of distance weighted measure between the supply and demand points. What the allocation will do is try to minimise the sum of this (i.e. the sum of the weighted distance between all supply and the allocated demand). This is important to bare in mind as there may be some situations where you may wish to maximise some objective function. This is needed to evaluate potential supply location (as in the existing and potential supply locations) with respect to demand (as in the workplace zones). The code below create a distance matrix between the workplace zones and the supply locations:
# create distance matrix
<- (st_distance(wz, supply)/1000)
d.mat dim(d.mat)
## [1] 256 1051
dim(wz)
## [1] 256 5
dim(supply)
## [1] 1051 2
This is a non-symmetrical matrix of the distances (in km) between each Workplace Zone (rows) and each Supply point (columns). Have a look at the first 6 rows and columns:
1:6, 1:6] d.mat[
## Units: [m]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 4.473153 4.210723 1.5730894 6.833619 7.399387 4.625622
## [2,] 3.000913 2.545830 0.9863612 5.145449 8.485449 5.048200
## [3,] 2.431667 2.231559 1.6593746 4.588194 8.512392 4.911704
## [4,] 2.733473 2.968681 1.9862941 5.232414 7.831771 4.339954
## [5,] 3.055995 3.133407 1.8667712 5.488943 7.708793 4.316246
## [6,] 1.770215 2.719927 2.7966096 4.569147 7.903153 4.235837
Now the distances need to be weighted by the demand, in this case the TotalWP
variable in wz
:
# determine the demand weights
<- wz$TotalWP
w # have a look
head(w)
## [1] 9547 1001 886 335 378 200
And the distance matrix is multiplied such that it records ‘person distances’. In this case these are in person kilometres but the units are not important. Instead is the relative measures that the location-allocation algorithm uses.
<- d.mat * w d.mat
Again have a look at the first 6 rows and columns and consider how they relate to weight data
1:6, 1:6] d.mat[
9.4.3 Undertaking the allocation
We need an index of the existing stores to pass to the allocation function. The idea is that the existing stores are kept in the solution as the algorithm searches to identifies sets of new locations.
= which(supply$type == "existing")
keep.index head(keep.index)
## [1] 1 2 3 4 5 6
length(keep.index)
## [1] 178
There is a small further preparatory step: there are 115 existing stores. If we want to find the locations for 3 new stores
we need to set the parameter p
to be \(115+3\) as well as passing the index of the ones we are are keeping (in keep.index
).
# define how many to keep
= length(keep.index) + 3 p
We can now pass the population / demand weighted distance matrix into the allocate
function, noting that the spatial data have to be converted from SF to SP format for this. This may take some time to run.
# allocate
= allocate(swdf1 = as(wz, "Spatial"),
allocations.list2 swdf2 = as(supply, "Spatial"),
force = keep.index,
p=p,
metric = d.mat,
verbose = F)
# determine the indicates of the new locations
= setdiff(allocations.list2, keep.index) new_store_index
The new store locations can be plotted as well as the areas and supply points:
tm_shape(supply) + tm_dots(col = "white") +
tm_shape(wz) + tm_polygons(col = "lightyellow") +
tm_shape(supply[c(allocations.list2, keep.index),]) +
tm_dots("type", title = "Type of Site", size = 0.3, palette = "Set1") +
tm_layout(title = "New Supply Sites",
frame = T, legend.outside = F,
legend.hist.width = 1,
legend.format = list(digits = 1),
legend.position = c("left", "top"),
legend.text.size = 0.7,
legend.title.size = 1)
In this case it is possible to plot the allocations as before due, but this time including the supply layer as well as the second argument. However, many of the lines are short, due to the high number of existing supply locations and the low number of allocations - they are not that dramatic!
# create allocations
<- allocations(as(wz, "Spatial"), as(supply, "Spatial"),
wz_allocs p=73,force=unique(allocations.list2))
# convert to sf
= st_as_sf(wz_allocs)
wz_allocs # star lines
<- create_star_lines(wz_allocs, supply) star_lines
## Warning in st_centroid.sf(sf1): st_centroid assumes attributes are constant over
## geometries of x
## Warning in st_centroid.sf(sf2): st_centroid assumes attributes are constant over
## geometries of x
= st_union_by(wz, allocations.list2)
zones
= tm_shape(wz) + tm_borders(col = "lightgrey") +
p1 tm_shape(star_lines)+tm_lines(col = "darkred", lwd = 2)
# distance to allocation
= tm_shape(wz_allocs) +
p2 tm_polygons(col = "allocdist",
title = "Distance",
palette = "Reds", n = 7, style = "kmeans")
tmap_arrange(p1, p2)
Finally we can determine the allocation of populations (demand) to the supply. The allocations.list2
object indicates the grouping of the Workplace Zones and they have the customer population
# 1. the original stores
$allocation = allocations.list2
wz%>% st_drop_geometry() %>% group_by(allocation) %>%
wz summarise(Customers = sum(TotalWP)) %>% data.frame() -> store_customers
summary(store_customers$Customers)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 294 702 1130 1780 1885 11224
# 2. the new stores
$allocation = match(allocations.list2, new_store_index)
wz%>% st_drop_geometry() %>% group_by(allocation) %>%
wz summarise(Customers = sum(TotalWP)) %>% na.omit() %>% data.frame() -> new_store_customers
new_store_customers
## allocation Customers
## 1 1 4340
## 2 2 2948
## 3 3 3726
They all look pretty healthy!