Chapter 1 About this book

This is an on-line book written to support the practicals for the GEOG3915 GeoComputation and Spatial Analysis module, delivered by Lex Comber of the School of Geography at the University of Leeds. It is based on An Introduction to R for Spatial Analysis and Mapping by Brunsdon and Comber (2018), which provides a foundation for spatial data handling, GIS-related operations and spatial analysis in R.

1.1 Structure

Each chapter in this book contains an individual practical with a discrete set of activities, with links to any required data and R packages. Each chapter, except Chapter 2, has the same typical structure:

  • a description of a practical technique for undertaking GeoComputation and Spatial Analysis (e.g. Geographically Weighted Regression or Areal Interpolation);
  • an example application (e.g. in the analysis of Public Health or Demography) to illustrate the technique;
  • a summary section with links to further reading, the data (and other) requirements for using the approach, the relative advantages and disadvantages of the approach and alternative techniques that could be used to answer the same question or address the same problem;
  • the code (an R script) for undertaking the practical work, which you should be able to copy and paste into your RStudio session.

Each chapter is self-contained and includes instructions for loading data and packages as needed.

1.2 R Refresh (if you have not used R before)

The aim of this section is to briefly provide an introduction to working with data in R/RStudio. You will need to use R/RStudio each week in this module.

1.2.1 Part 1. Background

1.2.1.1 The RStudio interface

Most of you will be using the RStudio interface to R, although you can of course use R on its own. RStudio provides a good interface to the different things that R can do via four panes: the Console, where code is entered (bottom left); a Source pane with R scripts (top left); the variables in the working Environment (top right); and Files, Plots, Help etc. (bottom right). See the figure of the RStudio interface below.

In the figure above of the RStudio interface, a new script has been opened, and a line of code has been written and then run in the console. The code assigns a value of 100 to x. The file has been saved into the current working directory. You are expected to define a similar set up for each practical as you work through the code. Note that in the script, anything that follows a # is a comment and is ignored by R.

Users can set up their personal preferences for how they like their RStudio interface. As with R itself, there are very few pull-down menus, and therefore you will type lines of code into your script and run them in what is termed a command line interface (the console). As with all command line interfaces, the learning curve is steep, but the interaction with the software is more detailed, which allows greater flexibility and precision in the specification of commands.

Beyond this there are further choices to be made. Commands can be entered in two forms: directly into the R console window, or as a series of commands in a script window. We strongly advise that all code should be written in a script (an .R file) and then run from the script.

1.2.1.2 Ways of working

The first set of considerations relates to how you should work in R/RStudio. The key things to remember are:

  • R has a steep learning curve if you have never done anything like this before. It can be scary. It can be intimidating. But once you have a bit of familiarity with how things work, it is incredibly powerful.

  • You will be working from practical worksheets which will have all the code you need. Your job is to try to understand what the code is doing and not to remember the code. Comments in your code really help.

  • To help you do this, the very strong suggestion is that you use the R scripts that are provided, and that you add your own comments to help you understand what is going on when you return to them. Comments are prefaced by a hash (#), and R ignores everything that follows it on that line. You can then save your code (with comments), run it, and return to it later to modify it at your leisure.

The module places a strong emphasis on learning by doing, which means that you are encouraged to unpick the code that you are given, adapt it and play with it. It is not about remembering or being able to recall each function used, but about understanding what is being done. If you can remember what you did previously (i.e. the operations you undertook) and understand what you did, you will be able to return to your code the next time you want to do something similar. To help you with this you should:

  1. Always run your code from an R script… always! These are provided for each practical;
  2. Annotate your scripts with comments. These are prefixed by a hash (#) in the code;
  3. Save your R script to your folder;
  4. In your RStudio session, you should set the working directory to the folder location.
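As a minimal sketch of these steps, the top of a practical script might look like the following. The folder path is a hypothetical example: replace it with the location of your own folder, or set the working directory through the RStudio menu at Session > Set Working Directory.

```r
# self_directed.R
# GEOG3915 practical script: anything after a hash is a comment and ignored by R

# 1. check which folder R is currently using as its working directory
getwd()

# 2. set the working directory to your practical folder
#    (hypothetical path - edit it to match your own folder, then uncomment)
# setwd("~/GEOG3915/week01")

# 3. your code, with comments explaining what each step does
x <- 100   # assign the value 100 to x
x          # print the value of x to the console
```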

To summarise…

  • You should always write your code into a script which can be saved and then re-run at a later date
  • To load an existing script file, go to File > Open File and navigate to your file, or, if you have recently opened the file, go to File > Recent Files
  • To run code in a script, place the cursor on the line of code and then run by pressing the ‘Run’ icon at the top left of the script pane, or by pressing Ctrl Enter (PC) or Cmd Enter (Mac).

1.2.2 Part 2. Your first R code

1.2.2.1 Assignment

Here you will create a new R script. Open a new script and save it as self_directed.R in the working directory you are using for this practical. I suggest that you create a separate folder for each week’s practical.

The command line prompt in the Console window, the >, is an invitation to start typing in your commands.

Write the following into your script: 2+2 and run it. Recall that this is done either by pressing the Run icon at the top left of the script pane, or by pressing Ctrl Enter (PC) or Cmd Enter (Mac).

2+2 
## [1] 4

Here the result is 4. The [1] that precedes the output indicates that the first element of the result follows. In this case there is just one element. The > indicates that R is ready for another command.
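The [1] is easier to understand with a longer result. In the sketch below, seq() creates a vector of 30 values; when it is printed, each line of output starts with the position (in square brackets) of the first element on that line. Exactly where the lines wrap depends on the width of your Console, so your output may look slightly different.

```r
# create a vector of 30 values: 10, 20, 30, ..., 300
z <- seq(10, 300, by = 10)
# print it: each line starts with the index of its first element, e.g. [1]
z
# the number of elements in z
length(z)
```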

Now type the following in and run it:

y <- 2+2
y
## [1] 4

Here the value of 2+2 has been assigned to y (read as "y gets 2+2"). When y is invoked, its value (4) is returned.

Now copy and paste the following into your R script and run both lines:

x <- matrix(c(1,2,3,4,5,6,7,8), nrow = 4)   # values filled column by column
y = matrix(1:8, nrow = 4, byrow = TRUE)     # values filled row by row

For the purposes of this module, the equals sign (=) in R is the same as <-, a less-than sign (<) followed by a minus sign (-). Both are read as "is assigned to" or "gets": the value on the right-hand side is assigned to the name on the left.

You should see x appear alongside y in the Environment pane. Note that y has been overwritten with a new assignment. If you click on the icon next to them, you will get a ‘spreadsheet’ view of the data you have created.

Of course you can also enter the following in the console and see what is returned:

x
y
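The two matrices contain the same eight numbers, but arranged differently: by default matrix() fills values column by column, whereas byrow = TRUE fills them row by row. A quick way to see this is to compare the dimensions and the first row of each (square-bracket indexing is explained later in this section):

```r
x <- matrix(c(1,2,3,4,5,6,7,8), nrow = 4)   # filled column by column
y <- matrix(1:8, nrow = 4, byrow = TRUE)    # filled row by row
dim(x)    # 4 rows and 2 columns
x[1, ]    # first row of x: 1 5
y[1, ]    # first row of y: 1 2
```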

Note In the code snippets above you have used parentheses (round brackets). Different kinds of brackets are used in different ways in R. Parentheses are used with functions, and contain the arguments that are passed to the function, separated by commas (,). In this case the functions are c() and matrix(). In the line of code x = matrix(c(1,2,3,4,5,6,7,8), nrow = 4), the arguments passed to the matrix() function are the vector c(1,2,3,4,5,6,7,8) and nrow = 4. Other kinds of brackets are used in different ways, as you will see later.
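Arguments can be passed to a function by name (as with nrow = 4) or by position, in the order in which the function expects them. The sketch below shows two equivalent calls to matrix(); naming arguments is usually clearer, especially when a function has many of them. The expected order of the arguments is given in the help, which you can open with ?matrix.

```r
# arguments passed by name
m1 <- matrix(data = 1:8, nrow = 4)
# the same arguments passed by position: data first, then nrow
m2 <- matrix(1:8, 4)
# TRUE: the two calls create identical matrices
identical(m1, m2)
```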

1.2.2.2 Data Tables

Here we will load a data table (spreadsheet) in R/RStudio. R has a number of built-in datasets that we can use, and the code below loads one of these:

data(mtcars)

Have a look at what is loaded by listing the objects in the current R session:

ls()

You should see mtcars listed, along with the x and y objects created earlier (your list may include other objects if you have run other code in this session). You can examine this data in a number of ways:

# the structure of mtcars
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
# the first six rows (or head) of mtcars
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The mtcars object is a data.frame, a kind of data table, and it has a number of attributes which are all numeric. The code below prints it all out to the console:

mtcars

Data frames are ‘flat’ in that they typically have a rectangular layout like a spreadsheet, with rows relating to observations (individuals, areas, people, houses, etc.) and columns to their properties or attributes (height, age, etc.). The observations are car models in this case. The columns in data frames can be of different types: vectors of numbers, factors (classes) or text strings. Data frames are central to the way that many of the more recent R routines process data. For now, think of data frames as spreadsheets, where the rows are observations and the columns are fields or variables.
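A few functions are useful for checking the shape and column types of a data frame. The sketch below uses dim(), nrow() and ncol() on mtcars, and then builds a small data frame (with made-up values, purely for illustration) whose columns are of different types. stringsAsFactors = FALSE is set explicitly so that the text column stays as text in all versions of R.

```r
# dimensions of mtcars: 32 rows (car models) and 11 columns (attributes)
dim(mtcars)
nrow(mtcars)
ncol(mtcars)
# the car model names are stored as row names
head(rownames(mtcars))

# a small data frame with columns of different types (illustrative values)
df <- data.frame(name  = c("a", "b", "c"),                   # text strings
                 size  = c(1.5, 2.0, 3.2),                   # numbers
                 class = factor(c("low", "high", "low")),    # factor (classes)
                 stringsAsFactors = FALSE)
str(df)
```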

1.2.2.3 Plotting the data: ‘Hello World!’

The code below creates a scatter plot of two variables in the data: mpg and disp.

plot(disp ~ mpg,  data = mtcars, pch=16)

The option pch=16 sets the plotting character to a solid dot (black by default). More plotting characters are available; examine the help for points() to see these:

?points

This plot can be improved greatly. We can specify more informative axis labels, change the size of the text and of the plotting symbol, and so on.

We can also specify the same plot by passing named variables to the plot() function directly, along with other parameters, as in Figure 1.1 (notice how the syntax for this is different):

plot(x = mtcars$mpg, y = mtcars$disp,   pch = 1, col = "dodgerblue", 
     cex = 1.5, xlab = "Miles per Gallon", ylab = "Displacement", main = "Hello World!")

Figure 1.1: A scatter plot.

Notice how, compared with the previous plot command, the dollar sign ($) is used to access variables in the mtcars data table.
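The $ is one of several equivalent ways of extracting a single column from a data frame as a vector. In the sketch below, double square brackets [[ ]] and the [rows, columns] form with a column name (described later in this section) return exactly the same values as mtcars$mpg:

```r
# three ways of extracting the mpg column as a vector
a <- mtcars$mpg
b <- mtcars[["mpg"]]
d <- mtcars[, "mpg"]
# TRUE: all three are identical
identical(a, b)
identical(a, d)
```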

1.2.2.4 Data summaries and indexing

We may, for example, require information on the variables in mtcars. The summary() function is very useful:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

This shows summaries (minimum, quartiles, median, mean and maximum) of each of the individual attributes in mtcars.
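Summaries can also be calculated for a single variable extracted with the $. The sketch below applies summary(), mean() and sd() to mpg, and uses table() to count the cars by number of cylinders (cyl):

```r
summary(mtcars$mpg)   # summary of one variable
mean(mtcars$mpg)      # mean miles per gallon (about 20.09)
sd(mtcars$mpg)        # standard deviation of mpg
table(mtcars$cyl)     # counts: 11 cars with 4, 7 with 6 and 14 with 8 cylinders
```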

The main R graphics function is plot(). In addition to plot() there are functions for adding points and lines to existing graphs, for placing text at specified positions, for specifying tick marks and tick labels, for labelling axes, and so on.
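A minimal sketch of building up a plot with some of these functions: plot() draws the scatter plot, abline() adds a horizontal reference line at the mean displacement, points() overlays the 4-cylinder cars (selected with subset()) in a different colour, and text() and legend() add annotation. The colours and positions are arbitrary choices for illustration.

```r
# base scatter plot with informative axis labels
plot(disp ~ mpg, data = mtcars, pch = 16, col = "grey60",
     xlab = "Miles per Gallon", ylab = "Displacement")
# add a dashed horizontal line at the mean displacement
abline(h = mean(mtcars$disp), lty = 2)
# overlay the 4-cylinder cars in a different colour
four_cyl <- subset(mtcars, cyl == 4)
points(four_cyl$mpg, four_cyl$disp, pch = 16, col = "dodgerblue")
# label the reference line (pos = 3 places the text above the point)
text(x = 30, y = mean(mtcars$disp), labels = "mean", pos = 3)
# add a legend in the top right corner
legend("topright", legend = c("4 cylinders", "6 or 8 cylinders"),
       pch = 16, col = c("dodgerblue", "grey60"))
```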

There are various other helpful forms of graphical summary. One for the mtcars data frame is the scatterplot matrix, shown in Figure 1.2.

names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
names(mtcars)[c(3:7)]
## [1] "disp" "hp"   "drat" "wt"   "qsec"
c(3:7)
## [1] 3 4 5 6 7
plot(mtcars[,c(3:7)], cex = 0.3, 
     col = grey(0.145,alpha=0.5), upper.panel=panel.smooth)

Figure 1.2: Multiple scatterplots.

The results show the pairwise relationships between the variables in the mtcars data frame, and a smoothed trend line is added to the upper panels by the upper.panel=panel.smooth parameter passed to plot().

There are a number of things to notice here (as well as the figure). In particular, note the use of the vector c(3:7) to subset the columns of mtcars:

  • In the second line it was used to subset the vector of column names created by names(mtcars).
  • In the third line it was printed out. Notice how 3:7 printed out all the integers from 3 to 7: very useful.
  • For the plot, the vector was passed as the second argument, after the comma, in the square brackets [,] to indicate which columns were to be plotted. The sequence 3:7 is contained in round brackets preceded by a c, which tells R to combine all of the arguments passed to it, c(...).
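A quick sketch of what : and c() do on their own. Note that c(3:7) and 3:7 give the same result, because c() with a single argument simply returns it; c() becomes essential when combining several pieces into one vector:

```r
3:7                      # the integers from 3 to 7: 3 4 5 6 7
identical(c(3:7), 3:7)   # TRUE: wrapping a single sequence in c() changes nothing
c(1, 3:5, 10)            # combine several pieces: 1 3 4 5 10
```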

Referencing in this way (known as indexing) is very important: the individual rows and columns of two-dimensional data structures such as data frames, matrices and tibbles can be accessed by passing references to them in square brackets.

# 1st row
mtcars[1,]
# 3rd column
mtcars[,3]
# a selection of rows
mtcars[c(3:5,8),]

The index vector could of course have been assigned to an R object and then used to do the subsetting:

x = c(3:7)
names(mtcars)[x]
plot(mtcars[,x], cex = 0.3, col = grey(0.145,alpha=0.5))

Thus indexing allows specific rows and columns to be extracted from the data as required.
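Rows and columns can be indexed at the same time, and columns can also be referenced by name. The sketch below also shows a negative index, which drops rows, and a logical condition, which keeps only the rows for which it is TRUE (here, cars that do more than 30 miles per gallon):

```r
# rows 1 to 3 of columns 1 and 3
mtcars[1:3, c(1, 3)]
# the same selection using column names
mtcars[1:3, c("mpg", "disp")]
# a negative index drops rows: all rows except the first
nrow(mtcars[-1, ])    # 31
# a logical condition selects rows: cars with mpg above 30
mtcars[mtcars$mpg > 30, c("mpg", "cyl")]
```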

Note You have encountered a second type of brackets, square brackets [ ]. These are used to reference or index positions in a vector or a data table. Consider the variable x above. It contains a vector of values, 3,4,5,6,7. Entering x[1] would extract the first element of x, in this case 3. Similarly x[4] would return the 4th element and x[c(1,4)] would return the 1st and 4th elements of x. However, in the examples above that index the 2-dimensional mtcars object, the square brackets are used to index row and column positions. The syntax for this is [rows, columns]. We will be using such indexing throughout this module.
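The vector indexing described in the note can be tried directly:

```r
x <- c(3:7)   # the vector 3, 4, 5, 6, 7
x[1]          # first element: 3
x[4]          # fourth element: 6
x[c(1, 4)]    # first and fourth elements: 3 6
```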

1.2.3 Some Tasks:

Recall that a data.frame is a rectangular array of columns of data. Here you will create a data frame of two columns containing numeric values. The following data gives the distance that an elastic band moves when released for each amount it is stretched over the end of a ruler:

elasticband <- data.frame(stretch=c(46,54,48,50,44,42,52),
                          distance=c(148,182,173,166,109,141,166))

The function data.frame() can be used to input these (or other) data directly into data.frame objects.
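Before plotting, it is worth checking that the data frame has been created as expected. A short sketch using functions met earlier in this section:

```r
elasticband <- data.frame(stretch = c(46,54,48,50,44,42,52),
                          distance = c(148,182,173,166,109,141,166))
str(elasticband)             # 7 observations of 2 numeric variables
summary(elasticband)         # summaries of each column
mean(elasticband$distance)   # 155
```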

Task 1 Plot distance against stretch from the elasticband data frame.

Task 2 Use the hist() command to plot a histogram of the mpg values in the mtcars data frame (hints: a) think about how the Hello World plot was parameterised and the fact that histograms are constructed from a single variable, and b) examine the help for hist by entering ?hist at the console)

Task 3 Repeat Task 2 after taking logarithms of mpg using the log() function.

1.2.4 Packages

The base installation of R includes many functions and commands. However, more often we are interested in using some particular functionality, encoded into packages contributed by the R developer community. Installing packages for the first time can be done at the command line in the R console using the install.packages() function, as in the example below to install the tmap package, or through the RStudio menu at Tools > Install Packages.

When you install these packages it is strongly suggested that you also install the dependencies. These are other packages that are required by the package that is being installed. This can be done by checking the box in the menu, or by including dep = TRUE (short for dependencies = TRUE) in the command, as below (don’t run this yet!):

install.packages("tmap", dep = TRUE)

You may have to set a mirror site from which the packages will be downloaded to your computer. Generally you should pick one that is nearby to you.

Further descriptions of packages, their installation and their data structures will be given as needed in the practicals. There are thousands of packages that have been contributed to the R project by various researchers and organisations. If you know the package you wish to use, it can be located by name at http://cran.r-project.org/web/packages/available_packages_by_name.html. It is also possible to search the CRAN website to find packages to perform particular tasks at http://www.r-project.org/search.html. Additionally, many packages include user guides and vignettes, as well as a PDF document describing the package, and these are listed at the top of the index page of the package's help files.

As well as tmap, you should install the following packages and their dependencies: sf and tidyverse.

You could do this in one go:

install.packages(c("sf", "tidyverse", "tmap"), dep = TRUE)

Remember: you will only have to install a package once! So when the above code has run in your script, you should comment it out. For example, you might want to include something like the below in your R script.

# packages only need to be installed once
# install.packages(c("sf", "tidyverse", "tmap"), dep = TRUE)
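If you would prefer not to comment installation lines in and out, one common pattern is to check which packages are missing before installing them. The helper function below, install_if_missing(), is a hypothetical name used for illustration (it is not part of any package): it compares the requested package names with those returned by installed.packages() and only calls install.packages() for the ones that are missing.

```r
# hypothetical helper: install only the packages that are not yet installed
install_if_missing <- function(packages) {
  # names of the requested packages that are not in the installed library
  not_installed <- packages[!packages %in% rownames(installed.packages())]
  if (length(not_installed) > 0) {
    install.packages(not_installed, dependencies = TRUE)
  }
  # return the names of the packages that needed installing (invisibly)
  invisible(not_installed)
}

# usage (commented out so it does not run each time the script is run):
# install_if_missing(c("sf", "tidyverse", "tmap"))
```

Because it does nothing when every package is already present, a line like this can safely stay in a script, although library() is still needed in each session to load the packages.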

Once a package has been installed on your computer, it can be loaded into each of your R sessions using the library() function, as below.

library(sf)

1.2.5 Answers to Tasks

Task 1 Plot distance against stretch from the elasticband data frame.

plot(distance ~ stretch, data = elasticband)
# or
plot(elasticband$stretch, elasticband$distance)

Task 2 Use the hist() command to plot a histogram of the mpg values in the mtcars data frame (hints: a) think about how the Hello World plot was parameterised and the fact that histograms are constructed from a single variable, and b) examine the help for hist by entering ?hist at the console)

hist(mtcars$mpg)

Of course, some refinement is possible.

hist(mtcars$mpg, xlab='Miles per Gallon', 
     main='Histogram of MPG', 
     breaks = 15,
     col = 'DarkRed')

The code below plots a probability density histogram of the same data. Essentially, this normalizes the histogram so that its total area is 1:

hist(mtcars$mpg, prob = T, 
     xlab='Miles per Gallon', 
     main='Histogram of MPG', 
     breaks = 15,
     col = 'DarkRed',
     border = "#FFFFBF")
# add the probability density trend
lines(density(mtcars$mpg, na.rm=T),col='salmon',lwd=2)
# mark each individual observation along the x-axis - like a rug!
rug(mtcars$mpg)
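To see what "total area of 1" means, hist() can be called with plot = FALSE, which returns the values the histogram is built from (including its breaks and densities) rather than drawing it. Multiplying each bar's density by its width and summing gives the total area:

```r
# calculate (but do not draw) the histogram used above
h <- hist(mtcars$mpg, breaks = 15, plot = FALSE)
# the width of each bar
widths <- diff(h$breaks)
# total area of the density histogram: 1
sum(h$density * widths)
```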

Task 3 Repeat Task 2 after taking logarithms of mpg using the log() function:

hist(log(mtcars$mpg))

1.3 Summary

The aim of this section has been to familiarise you with the R environment if you have not used R before. If you have used R before but not for a while, then hopefully this has acted as a refresher.

Other good on-line guides to getting started in R include:

And of course there are my own offerings:

Comber and Brunsdon (2021) provide a thorough grounding in the topics covered in this module. Chapter 1 will support this introduction, and the code for each chapter is here: https://study.sagepub.com/comber/student-resources/code-library

Brunsdon and Comber (2018) provides a comprehensive introduction to R and spatial data: see https://uk.sagepub.com/en-gb/eur/an-introduction-to-r-for-spatial-analysis-and-mapping/book258267.

References

Brunsdon, Chris, and Lex Comber. 2018. An Introduction to R for Spatial Analysis and Mapping (2e). Sage.