Chapter 1 Prerequisites

This tutorial assumes some previous experience with R. If you have not used R before, I would start with Chapter 1 of the free and excellent textbook R for Data Science.

The GIS operations provided by the sf package are designed to integrate well with the tidyverse suite of R packages. We will use some basic functionality from the dplyr package and will use pipes (%>%) to chain multiple operations in sequence. If you are unfamiliar with dplyr and pipes, I would go through the dplyr base vignette before starting Chapter 3 (Wickham et al. 2021).
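As a quick check of whether pipes already feel familiar, here is a minimal sketch of a dplyr pipeline. It uses the mtcars data set that ships with R; the data set and the columns chosen are my own illustration and are not part of the tutorial data.

```r
library(dplyr)

# mtcars is a built-in data set, so nothing needs to be downloaded.
# Each %>% passes the result on its left to the function on its right.
mtcars_summary <- mtcars %>%
  filter(cyl > 4) %>%               # keep cars with more than 4 cylinders
  group_by(cyl) %>%                 # one group per cylinder count
  summarise(mean_mpg = mean(mpg))   # average miles per gallon in each group

mtcars_summary
```

If you can read this as "take mtcars, then filter, then group, then summarise", you know enough about pipes to follow the later chapters.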

1.1 Download Data

Download data here

Unzip the data and save to a local folder on your computer.

1.2 What is GIS?

GIS is an acronym that typically refers to a geographic information system (but sometimes to geographic information science). For our purposes I will define a GIS narrowly as any software that can read, store, analyze, combine and otherwise manipulate spatially referenced data (data that is tied to one or more locations on the earth’s surface). The main conceptual feature of a geographic information system is that it can layer data based on their co-location, so that seemingly unrelated data can be combined because they are located in the same area (Figure 1.1). The software then allows us to organize the spatial information of an area and to analyze and visualize patterns to gain insight into relationships between and within geographic features.

Figure 1.1: Conceptual diagram of a geographic information system. [Source](https://upload.wikimedia.org/wikipedia/commons/9/9e/Figure_1-_Visual_Representation_of_Data_Themes_in_a_Geographic_Information_System_%2816793207609%29.jpg)

Note that Figure 1.1 is somewhat misleading: the way the layers of data are represented in the actual software is much simpler (we will tackle this in Chapter 3).
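To make the idea of combining data by co-location a little more concrete, here is a minimal sketch using the sf package. Both layers are made up for illustration (one square "zone" and two points in arbitrary planar coordinates with no coordinate reference system), and the names zone, pts, zone_name and site are hypothetical rather than anything from the tutorial data.

```r
library(sf)

# Layer 1 (hypothetical): a single square polygon with one attribute
zone <- st_sf(
  zone_name = "A",
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
  )
)

# Layer 2 (hypothetical): two points, one inside the square and one outside
pts <- st_sf(
  site = c("inside", "outside"),
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(5, 5)))
)

# Spatial join: each point picks up the attributes of the polygon it
# intersects; points that fall in no polygon get NA
joined <- st_join(pts, zone)

joined
```

The two layers share no common column; the only thing linking them is location, which is exactly the kind of combination described above.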

1.3 Why use R?

Traditional GIS software such as ArcGIS and QGIS are fantastic programs that provide access to their functionality through a graphical user interface. This means that when we work with data in these programs we have to run tools by pointing and clicking, and we have to be diligent in recording the steps we took in order to reproduce the analysis if need be. This is the main disadvantage of using these programs, while the main benefit is being able to explore and map data interactively with ease (e.g. zoom in and out, click on features, etc.).

In R we have to use a command-line interface to access the functionality. Working with data in this way has a very steep learning curve but is well worth it in applied research, as it provides a clear and transparent record of the analysis conducted and makes the results reproducible. I have also found it to be a much more effective way of conducting research, as it makes suggested changes from committee members, supervisors or coauthors so much easier to implement! Hopefully this tutorial will make the learning curve for using R as a GIS a little less steep.

References

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.