GEOG5917 Big Data & Consumer Analytics - RStudio Practicals
This is an online book written to support the practicals for the GEOG5917 Big Data and Consumer Analytics module, delivered by Lex Comber of the School of Geography at the University of Leeds. A book was written based on the materials developed for this module: Geographical Data Science and Spatial Data Analysis: An Introduction in R (Comber and Brunsdon 2021 - link here). The module also draws from An Introduction to Spatial Analysis and Mapping in R (Brunsdon and Comber 2018 - link here), which provides a foundation for spatial data handling, GIS-related operations and spatial analysis in R.
The chapters in this online practical book each contain an individual practical with a discrete set of activities, and links to any required data and R packages. Each chapter is self-contained and includes extensive descriptions of the techniques being illustrated, and instructions for running code and for loading data and packages as needed. An R script is provided for each session. An R script is a text file containing code, with the file extension .R.
This chapter contains information about how to access RStudio on a university PC via AppsAnywhere and some core considerations about working with R.
RStudio provides a convenient graphical interface to R. It can be accessed through AppsAnywhere. You need to log on to a University PC and in AppsAnywhere start R first before you launch RStudio.
The steps are:
- Search for “Cran” and R is listed under Cran R 4.2.0 x64.
- Click on Launch and then minimise the RGui window after it opens (NB this should be minimised and not closed).
- Search for RStudio which is listed under Rstudio 2022.
- Again click Launch and ignore any package or software updates.
You should have a new RStudio session.
This process is summarised in Figure 1.1.
File management is really important. In Windows Explorer you should create a folder for your module practical work on your M-drive if you have not done so already. I suggest that you create a folder called GEOG5917. In this you should create sub-folders for each practical session.
For this session, create a sub-folder called Week14 in your GEOG5917 folder on your M-drive. This folder will store the data for this practical and the R script. It is imperative that you practise good file management!
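The same folder structure can also be created from within R rather than Windows Explorer. The following is a minimal sketch only: the object names base_dir and week_dir are hypothetical, and tempdir() is used as a stand-in for the M-drive so that the code runs on any machine. On a University PC you would replace it with the path to your own M-drive.

```r
# Hypothetical base location: tempdir() is a stand-in for your M-drive
# (an assumption made so that this sketch runs anywhere)
base_dir <- tempdir()

# Build the path GEOG5917/Week14 under the base location
week_dir <- file.path(base_dir, "GEOG5917", "Week14")

# recursive = TRUE also creates the parent GEOG5917 folder if needed;
# showWarnings = FALSE suppresses the warning if the folder already exists
dir.create(week_dir, recursive = TRUE, showWarnings = FALSE)

# Check that the folder now exists
dir.exists(week_dir)
```

Creating folders in code is optional; doing it in Windows Explorer as described above gives the same result.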
RStudio provides an interface to the different things that R can do via the 4 panes: the Console where code is entered (bottom left), a Source pane with R scripts (top left), the variables in the working Environment (top right), Files, Plots, Help etc (bottom right) - see the RStudio environment in Figure 1.2 below.
In the figure above of the RStudio interface, a new script has been opened, and a line of code has been written and then run in the console. The code assigns a value of 100 to x. The script file has been saved into the current working directory. You are expected to define a similar set up for each practical as you work through the code. Note that in the script, anything that follows a # is a comment and is ignored by R.
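A line of code like the one described above might look as follows. This is an illustrative sketch rather than a copy of the script in the figure: the assignment operator shown (<-) is the conventional one in R, and the object y is a hypothetical addition used only to show a comment at the end of a line.

```r
# Assign the value 100 to an object called x
x <- 100

# Typing the name of an object prints its value in the console
x

# A comment can also follow code on the same line
y <- x * 2  # y is a hypothetical second object, added for illustration
y
```

After running these lines, x and y should both be listed in the Environment pane.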
Users can set up their personal preferences for how they like their RStudio interface. As with plain R, very little is done through pull-down menus in RStudio. Instead you will type lines of code into your script and run these in what is termed a command line interface (the console). Like all command line interfaces, the learning curve is steep, but the interaction with the software is more detailed, which allows greater flexibility and precision in the specification of commands.
Beyond this there are further choices to be made. Commands can be entered in two forms: directly into the R console window, or as a series of commands in a script window. We strongly advise that all code should be written in a script (a .R file) and then run from the script. To run code in a script, place the cursor on the line of code and then run it by clicking the ‘Run’ icon at the top right of the script pane, or by pressing Ctrl Enter (PC) or Cmd Enter (Mac).
The first set of considerations relates to how you should work in R/RStudio. The key things to remember are:
R has a steep learning curve if you have never done anything like this before. It can be scary. It can be intimidating. But once you have a bit of familiarity with how things work, it is incredibly powerful.
You will be working from practical worksheets which will have all the code you need. Your job is to try to understand what the code is doing and not to remember the code. Comments in your code really help.
To help you do this, the very strong suggestion is that you use the R scripts that are provided, and that you add your own comments to help you understand what is going on when you return to them. Comments are prefaced by a hash (#) and are ignored by R. You can then save your code (with comments), run it, return to it later and modify it at your leisure.
The module places a strong emphasis on learning by doing, which means that you are encouraged to unpick the code that you are given, adapt it and play with it. It is not about remembering or being able to recall each function used, but about understanding what is being done. If you can remember what you did previously (i.e. the operations you undertook) and understand what you did, you will be able to return to your code the next time you want to do something similar. To help you with this you should:
- Always run your code from an R script… always! These are provided for each practical;
- Annotate your scripts with comments. These are prefixed by a hash (#) in the code;
- Save your R script to your folder;
- You should always use a script (a text file containing code) for your code which can be saved and then re-run at a later date.
- You can write your own code into a script, copy and paste code into it or use an existing script (for example as provided for each of the R/RStudio practicals in this module).
- To open a new R script go to File > New File > R Script, and save the new file with a sensible name.
- To load an existing script file go to File > Open File and then navigate to your file. Or, if you have recently opened the file, go to File > Recent Files >.
- It is good practice to set the working directory at the beginning of your R session. This can be done via the menu in RStudio Session > Set Working Directory > …. This points the R session to the folder you choose and will ensure that any files you wish to read, write or save are placed in this directory.
- To run code in a script, place the cursor on the line of code and then run it by clicking the ‘Run’ icon at the top right of the script pane, or by pressing Ctrl Enter (PC) or Cmd Enter (Mac).
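The working directory step in the list above can also be done in code, using the base R functions getwd() and setwd(). The sketch below is illustrative only: practical_dir and the file name example_script.R are hypothetical, and tempdir() is a stand-in for your practical folder so that the code runs on any machine.

```r
# Record the current working directory so that it can be restored later
old_wd <- getwd()

# Hypothetical practical folder: tempdir() is a stand-in for your own
# folder (an assumption made so that this sketch runs anywhere)
practical_dir <- tempdir()

# Point the R session at the chosen folder
setwd(practical_dir)

# Check where the R session is now pointing
getwd()

# A file written with a relative name is placed in the working directory
writeLines("x <- 100  # assign 100 to x", "example_script.R")
file.exists("example_script.R")

# Restore the original working directory
setwd(old_wd)
```

Setting the working directory in the script itself means the same line can be re-run at the start of each session, which has the same effect as using the Session menu.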