Chapter 1 Introduction

This is an on-line book written to support the practicals for the GEOG3915 GeoComputation and Spatial Analysis module, delivered by Lex Comber of the School of Geography at the University of Leeds. It draws from An Introduction to R for Spatial Analysis and Mapping by Brunsdon and Comber (2018), which provides a foundation for spatial data handling, GIS-related operations and spatial analysis in R.

There is a chapter for each practical and each has the same structure, containing:

  • an Introduction or Overview of the practical, its aims and objectives.
  • an R file for you to download and open in RStudio (with a file suffix of .R) containing code for you to run.
  • some Tasks for you to undertake to test your learning.
  • a Summary of the week with relevant references.

Each chapter is self-contained, with instructions for loading data and packages as needed. Some have an additional optional exercise that extends the technique being illustrated or describes important related approaches. The sequence of practicals and topics will build as we progress through the module.

In some weeks, however, there may be more text and detailed information in the online practical than in others. For example, this may include equations (sorry!), discussions about the techniques being explored, links to references of related work, etc. These provide you with the context and wider understanding around the techniques that are being applied in the practical.

For these reasons it is strongly recommended that you read through the practical before attending the practical session.

1.1 Overview

The aim of this preliminary practical is to make sure that you are able to access RStudio, have a secure understanding of some core concepts related to data structures and data operations, and are generally ready to go for the rest of the module. The text below introduces the RStudio interface, some ways of working and some strong suggestions about how you should organise your working folder. Finally, you should download the week0.R file, open it in RStudio and work through the R code.

The RStudio Interface

It is expected that you will use the RStudio interface to R, as RStudio provides an IDE (an integrated development environment) via its 4 panes: the Console where code is entered (bottom left), a Source pane with R code (top left), the variables in the working Environment (top right), and Files, Plots, Help etc. (bottom right); see the RStudio interface in the figure below. Users can set up their personal preferences for how they like their RStudio interface by playing with the pane options at:

Tools > Global Options > Pane Layout

Figure 1.1: The RStudio interface.

In the figure above of the RStudio interface, an R file has been opened (but not saved!), and a line of code has been run. The code appears in the Console pane (bottom left). The command line prompt in the Console window, the >, is an invitation to start entering your commands. The object x appears in the Environment pane. The current working folder is shown in the bottom right pane. There is a comment in the R script: note that anything that follows a # is a comment and is ignored by R.

Ways of working

It is important you develop rigorous ways of working within the RStudio environment.

  • R has a learning curve if you have never done anything like this before. It can be scary. It can be intimidating. But once you have a bit of familiarity with how things work, it is incredibly powerful.

  • You will be working from practical worksheets which will have all the code you need. Your job is to try to understand what the code is doing and not to remember the code. Comments in your code really help.

  • To help you do this, you should load the R files into your RStudio session, and add your own comments to help you understand what is going on. This will really help when you return to them at a later date. Comments are prefaced by a hash (#) that is ignored by R. Then you can save your code (with comments), run it and return to it later and modify at your leisure.

The module places a strong emphasis on learning by doing, which means that you are encouraged to unpick the code that you are given, adapt it and play with it. It is not about remembering or being able to recall each function used but about understanding what is being done. If you can remember what you did previously (i.e. the operations you undertook) and understand what you did, you will be able to return to your code the next time you want to do something similar. To help you with this you should:

  1. Always run your code from an R script… always! These are provided for each practical;
  2. Annotate your scripts with comments. These are prefixed by a hash (#) in the code;
  3. Save your R script to your folder;
  4. In your RStudio session, you should set the working directory to the folder location.
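
The last step in the list above can be sketched in code. This is a minimal illustration and not part of the week0.R file: the folder path is hypothetical (a temporary folder is used so that the example runs anywhere), and in practice you would point setwd() at your own practical folder. If you open an RStudio project, the working directory should be set to the project folder for you.

```r
# a hypothetical practical folder, created in a temporary location
# purely for illustration; replace with the path to your own folder
my_folder <- file.path(tempdir(), "week0")
dir.create(my_folder, showWarnings = FALSE)

# set the working directory, keeping a note of the previous one
old_wd <- setwd(my_folder)

# check where R is now working and what files it can see
getwd()
list.files()

# return to the previous working directory
setwd(old_wd)
```

Running getwd() at the start of a session is a quick way to confirm that R is looking in the folder you expect.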

Projects, Files and Folder Management

You should create a separate directory for each week’s practical. Then you should copy the R file to that folder, open the RStudio App (Do not just double click on the .R file), create a new project, navigating to the folder you have just created:

File > New Project > Existing Directory

Projects provide a container for the work you have done. They have a .Rproj file extension and keep everything you need for that piece of work in one place, separate from other projects that you are working on (see https://r4ds.had.co.nz/workflow-projects.html).

After you have done the practical, run and created code, put comments in etc, you can save the R file. When you close RStudio it will give you the option to Save Current Workspace. If you do this, then the next time you open that project the working environment will have all the data and variables you loaded or created in your session.
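
The idea behind saving a workspace can be illustrated by hand. The sketch below is an assumption-laden illustration rather than what RStudio does internally: the objects x and y and the file name week0_workspace.RData are hypothetical, and save() and load() are used to write two objects to a file and read them back.

```r
# two example objects standing in for whatever a session creates
x <- c(3, 4, 5, 6, 7)
y <- "hello"

# hypothetical file location, used only for illustration
ws_file <- file.path(tempdir(), "week0_workspace.RData")

# write the named objects to the file
save(x, y, file = ws_file)

# restore them into a fresh environment to show what comes back
restored <- new.env()
load(ws_file, envir = restored)
ls(restored)
```

The restored environment contains the same objects that were saved, which is why a saved workspace lets you pick up where you left off.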

1.2 Practical

The practical is contained in an R file. This has code for you to run (this is not a typing class), modify and play around with, as well as comments with explanations of what is being done. You should:

  • Create a folder for this practical. It should be in a suitable place (e.g. in your geog3915 folder) and should have a suitable name (e.g. week0). Note that generally you should avoid spaces, hyphens and non-text characters in file and folder names.
  • Go to the VLE and download the week0.R file, and move it to your folder.
  • Open RStudio and create a new project, navigating to your folder and give the project a sensible name (e.g. week0 - the .Rproj bit is automatically added).
  • Highlight blocks of code in your script that you want to run. If the code is highlighted or the cursor is on the line of code, it can be run in a number of ways:
    • clicking on the Run icon at the top of the script pane.
    • pressing the Ctrl + Enter keys on a PC or Cmd + Enter on a Mac.
  • Generally avoid entering code directly into the console: the point is to create and modify the code in the script and run it from there.
  • Undertake the tasks in the worksheet.

1.3 Answers to Tasks

Task 1 Plot efficiency against size from the cars data frame.

plot(efficiency ~ size, data = cars)
# or, equivalently, passing x (size) first and then y (efficiency)
plot(cars$size, cars$efficiency)

Task 2 Use the hist() command to plot a histogram of the efficiency values in the cars data frame (hints: a) think about how the Hello World plot was parameterised and the fact that histograms are constructed from a single variable, and b) examine the help for hist by entering ?hist at the console)

hist(cars$efficiency)

Of course, some refinement is possible.

hist(cars$efficiency, xlab='Miles per Gallon', 
     main='Histogram of MPG', 
     breaks = 15,
     col = 'DarkRed')

The code below plots a probability density of the same data. Essentially what this does is rescale the histogram so that the areas of the bars sum to 1. The area of each bar then gives the proportion of values falling within that histogram bin, which can be read as the probability of a value falling in it.

hist(cars$efficiency, prob = TRUE,
     xlab = 'Miles per Gallon',
     main = 'Histogram of MPG',
     breaks = 15,
     col = 'DarkRed',
     border = "#FFFFBF")
# add the probability density trend
lines(density(cars$efficiency, na.rm = TRUE), col = 'salmon', lwd = 2)
# show the individual values along the bottom - like a rug!
rug(cars$efficiency)
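
The rescaling to a density can be checked numerically. The sketch below makes an assumption: the cars data frame is created in the week0.R file and is not reproduced here, so the mpg column of the built-in mtcars data is used as a stand-in for the efficiency values. It computes the histogram without drawing it and confirms that the bar areas (density multiplied by bin width) sum to 1.

```r
# assumption: mtcars$mpg stands in for cars$efficiency
efficiency <- mtcars$mpg

# compute the histogram bins without plotting
h <- hist(efficiency, breaks = 15, plot = FALSE)

# width of each bin, from the break points
bin_widths <- diff(h$breaks)

# area of each bar = density x width; the areas sum to 1
bar_areas <- h$density * bin_widths
sum(bar_areas)

# each area equals the proportion of values in that bin
all.equal(bar_areas, h$counts / length(efficiency))
```

Note that it is the areas, not the bar heights, that sum to 1, so a bar can be taller than 1 when the bins are narrow.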

Task 3 Repeat Task 2 after taking logarithms of the efficiency values using the log() function:

hist(log(cars$efficiency))

The R script highlighted the importance of Indexing and different types of Brackets. These are explained below.

Note 1: Brackets In the code snippets above you have used parentheses - round brackets. Different kinds of brackets are used in different ways in R. Parentheses are used with functions, and contain the arguments that are passed to the function, separated by commas (,). In this case the functions are c() and matrix(). In the line of code x = matrix(c(1,2,3,4,5,6,7,8), nrow = 4), the arguments passed to the matrix() function are the vector c(1,2,3,4,5,6,7,8) and nrow = 4. Other kinds of brackets are used in different ways as you will see later.
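
The line of code discussed in this note can be unpicked into its two function calls. This is only a sketch: the intermediate name v is introduced here for illustration and is not in the original script.

```r
# c() combines its arguments (separated by commas) into a vector
v <- c(1, 2, 3, 4, 5, 6, 7, 8)

# matrix() is passed two arguments: the vector and nrow = 4
x <- matrix(v, nrow = 4)

# have a look at the result and its dimensions (rows, columns)
x
dim(x)
```

By default matrix() fills the values column by column, so the first column holds 1 to 4 and the second holds 5 to 8.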


Note 2: Indexing You have encountered a second type of brackets, square brackets [ ]. These are used to reference or index positions in a vector or a data table. Consider the variable x above. It contains a vector of values, 3,4,5,6,7. Entering x[1] would extract the first element of x, in this case 3. Similarly x[4] would return the 4th element and x[c(1,4)] would return the 1st and 4th elements of x. However, in the examples above that index the 2-dimensional mtcars object, the square brackets are used to index row and column positions. The syntax for this is [rows, columns]. We will be using such indexing throughout this module.
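
The indexing described in this note can be tried directly. The sketch below assumes x holds the values 3, 4, 5, 6, 7 as stated in the text, and uses the built-in mtcars data to show the [rows, columns] syntax; the particular rows and columns picked out are chosen only for illustration.

```r
# a vector of values, as described in the note
x <- c(3, 4, 5, 6, 7)

x[1]        # the 1st element: 3
x[4]        # the 4th element: 6
x[c(1, 4)]  # the 1st and 4th elements: 3 and 6

# 2-dimensional indexing uses [rows, columns]
mtcars[1:3, ]                  # first 3 rows, all columns
mtcars[1:3, c("mpg", "disp")]  # first 3 rows, 2 named columns
mtcars[2, 1]                   # the value in row 2, column 1
```

Leaving the space before or after the comma empty means "all rows" or "all columns" respectively.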

1.4 Summary

1.4.1 Further Resources

The aim of this week’s practical has been to familiarise you with the R environment if you have not used R before, and to make sure you are up and running. If you have used R, but not for a while, then hopefully this has acted as a refresher. If this is new to you then you should consider exploring R in a bit more detail before we get going in anger next week. Other good on-line getting-started-in-R guides include:

And of course there are my own offerings, particularly Chapter 1 in Comber and Brunsdon (2021), and the early chapters of Brunsdon and Comber (2018).

1.4.2 Packages

Next week we will start to work with some proper geographical data and to make some maps. You should familiarise yourself with packages.

The base installation of R includes many functions and commands. However, more often we are interested in using some particular functionality, encoded into packages contributed by the R developer community. Installing packages for the first time can be done at the command line in the R console using the install.packages() function, as in the example below, or through the RStudio menu via Tools > Install Packages.

When you install these packages it is strongly suggested you also install the dependencies. These are other packages that are required by the package that is being installed. This can be done by checking the box in the menu or including dep = TRUE in the command line as below (don’t run this yet!):

install.packages("tidyverse", dep = TRUE)

You may have to set a mirror site from which the packages will be downloaded to your computer. Generally you should pick one that is nearby to you.

Further descriptions of packages, their installation and their data structures will be given as needed in the practicals. There are literally thousands of packages that have been contributed to the R project by various researchers and organisations. These can be located by name at http://cran.r-project.org/web/packages/available_packages_by_name.html if you know the package you wish to use. It is also possible to search the CRAN website to find packages to perform particular tasks at http://www.r-project.org/search.html. Additionally, many packages include user guides and vignettes, as well as a PDF document describing the package, listed at the top of the index page of the package's help files.

As well as tidyverse you should install sf for spatial data and spatial objects and tmap for mapping. This can be done together as below:

install.packages(c("sf", "tmap"), dep = TRUE)

Remember: you will only have to install a package once!! So when the above code has run in your script you should comment it out. For example you might want to include something like the below in your R script.

# packages only need to be installed once
# install.packages(c("sf", "tmap"), dep = TRUE)
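
An alternative to commenting the line out is to check which packages are already installed and only install the missing ones. This is a sketch of one possible approach and not something the module requires: the object names pkgs, installed and missing_pkgs are introduced here for illustration, and the install line is left commented out so that nothing is downloaded when the example is run.

```r
# the packages used in this module
pkgs <- c("tidyverse", "sf", "tmap")

# TRUE / FALSE for each package: is it already installed?
installed <- pkgs %in% rownames(installed.packages())

# the names of any packages that still need installing
missing_pkgs <- pkgs[!installed]
missing_pkgs

# uncomment to install only what is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dep = TRUE)
```

If missing_pkgs is empty then everything is already installed and there is nothing to do.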

Once a package has been installed on your computer, it can be loaded into each of your R sessions using the library() function as below.

library(tidyverse)
library(sf)
library(tmap)

We will use tools (functions) in these packages next week in the first formal practical to load data, undertake some GIS operations and export graphics.

References

Brunsdon, Chris, and Lex Comber. 2018. An Introduction to R for Spatial Analysis and Mapping (2e). Sage.
Comber, Lex, and Chris Brunsdon. 2021. Geographical Data Science and Spatial Data Analysis: An Introduction in R. Sage.