Running the examples and practice exercises in this book requires installation of the R software environment if it is not already on your computer. RStudio, an integrated development environment (IDE) that makes R easier to use, is also highly recommended. R and RStudio are both free and open source, which means that the software is available to all users at no cost and can be redistributed and modified.
First, download and install R from CRAN, the Comprehensive R Archive Network.
To install R on Windows, go to https://cloud.r-project.org/ and click Download R for Windows, then click base under “Subdirectories” on the next page. Finally, click Download R “X.X.X” for Windows, where “X.X.X” is the current R version.
To install R on a Mac, choose Download R for (Mac) OS X and click R-X.X.X.pkg on the next page to download the installer.
After downloading the installer, click it to begin the installation process. If you are prompted with the message “The publisher could not be verified. Are you sure you want to run this software?”, then click Run. Next, select English as the language you want R to use. As you proceed through the installation screens, it is recommended to leave all options on their default settings.
RStudio is an integrated development environment (IDE) for R. After you have installed R, go to http://www.rstudio.com/download and click the link to download the free version of RStudio Desktop. On the next page, click the button to download RStudio for Windows. This page also contains links to download RStudio installers for Mac and Linux systems.
After downloading the installer, click it to begin the installation process. If you are prompted with the message “Do you want to run this file?”, then click Run. As you proceed through the installation screens, it is recommended to leave all options on their default settings.
If you already have RStudio installed on your computer, you can update it to the newest version by following the directions above or by running RStudio and going to Help > Check for Updates.
When you start RStudio, R will also start automatically and run within the RStudio interface.
Throughout the class, you will need to install and use a variety of R packages. An R package is a collection of functions, data, and documentation that extends the capabilities of base R. In this class, we will use a number of packages that provide functions for reading and processing geospatial datasets, implementing various spatial analysis techniques, and visualizing the results of these analyses.
You can install a package with a single line of code:
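The basic form is install.packages("ggplot2"), where ggplot2 is assumed here only because it is the package loaded later in this section. As a minimal sketch, the hypothetical helper below (install_if_missing is not part of R) wraps that call so that packages already on your computer are skipped. The repos URL is the CRAN address given elsewhere in this section.

```r
# Hypothetical helper (not part of base R): install only the packages that
# are not already present in the current R library.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org/") {
  # installed.packages() returns a matrix with one row per installed package
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    # Requires an internet connection
    install.packages(missing_pkgs, repos = repos)
  }
  # Return the names of the packages that were not yet installed
  invisible(missing_pkgs)
}

# Packages that ship with R are always present, so nothing is installed here
install_if_missing(c("stats", "utils"))
```

Calling install_if_missing("ggplot2") would download ggplot2 from CRAN only if it is not yet installed.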
R will download the packages from CRAN and install them onto your computer. If you have problems installing, make sure that you are connected to the internet and that https://cloud.r-project.org/ isn’t blocked by your firewall or proxy. Note that in RStudio, you can also search for and install packages by selecting Tools > Install Packages… from the menu.
You will not be able to use the functions, objects, and help files in a package until you load it. Once you have installed a package, you can load it with the library() function.
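As a minimal sketch of loading a package, assuming ggplot2 (the package discussed in this section) may or may not be installed on your computer, the call to library() is guarded here so that it runs only when the package is present:

```r
# Assumption: ggplot2 may or may not be installed, so check before loading
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Attach the package so its functions can be called by name
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```

In everyday use the single line library(ggplot2) is enough; the check is included only so that the sketch runs without an error on a computer where the package is missing.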
The messages tell you that R is loading the ggplot2 package, which we will use in one of the first labs.
The main R installation and most R packages are updated frequently, with updates occurring several times a year. It is important to keep your software up to date so that you benefit from bug fixes and are working with the most recent versions of critical packages.
The simplest way to update R is to go to https://cloud.r-project.org/ and download and install the latest version of R if it is newer than the version currently on your computer. It is usually important to update when the first or second number of the version changes (e.g., a change from R version 4.0.3 to version 4.1.1). However, it may be less critical to update when only the third number changes (e.g., from R version 4.0.3 to 4.0.4).
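To decide whether an update is needed, it helps to know which version of R is currently running. The following sketch uses getRversion() from base R; the threshold "4.1.0" is a hypothetical value chosen only for illustration.

```r
# Version of R that is currently running
current_version <- getRversion()
print(current_version)

# Hypothetical threshold, used only for illustration
minimum_version <- "4.1.0"

# numeric_version objects can be compared directly with version strings
if (current_version < minimum_version) {
  message("R ", as.character(current_version), " is older than ",
          minimum_version, "; consider updating.")
} else {
  message("R ", as.character(current_version), " is at least ",
          minimum_version, ".")
}
```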
You can also use the updateR() function in the installr (Galili 2021) package to update R automatically. To update using this function, you should run it from the R GUI, not in RStudio. The function offers some handy options, including an option to copy the R packages from the library of your existing R installation to the new one. However, this option does not always work correctly. It is often more straightforward just to reinstall any packages that you need after updating your R installation.
In some cases, you may need to update one or more of your packages to a later version without installing a new version of R. You can accomplish this task with the update.packages() function. Called without arguments, update.packages() will display each outdated package on the screen and prompt the user to select yes (y), no (N), or cancel (c).
Adding the ask = FALSE argument will automatically update all packages without prompting the user.
update.packages(ask = FALSE)
Note that in RStudio, you can also update packages by selecting Tools > Check for Package Updates… from the menu. This approach is particularly handy if you just need to update one or a few packages.
Unfortunately, there is no straightforward way to transfer all of your packages from an old version of R to a new version of R. As mentioned earlier, the updateR() function in the installr package can sometimes copy and update your packages automatically when you install a new R version, but it doesn’t always work depending on where your packages are stored and how the permissions on your computer are set up. Another option is to simply reinstall all your packages in the new R installation. However, this can be time-consuming to do manually. The following code uses the installed.packages() function to extract package information and automatically install the packages in a new version of R.
Run the following code in your old version of R.
# Store information about installed packages in a data frame
mypackages <- as.data.frame(installed.packages())
# Explore the data frame if you wish
View(mypackages)
# Save the data to a comma delimited text file
write.csv(mypackages, 'old_packages.csv')
Then close the old version of R, open the new version, copy old_packages.csv into the working directory, and run the following code.
# Read in the saved list of old packages
oldpackages <- read.csv('old_packages.csv')
# Read in the list of base R packages in the new version
curpackages <- as.data.frame(installed.packages())
# Generate a vector of add-on packages to be installed
newpackages <- setdiff(oldpackages$Package, curpackages$Package)
# Install the packages
install.packages(newpackages)
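The key step in this workflow is the call to setdiff(), which keeps only the packages that were in the old installation but are absent from the new one. A small sketch with hypothetical package lists (these vectors stand in for the Package columns of the two data frames) shows the behavior:

```r
# Hypothetical package lists standing in for the Package columns
old_pkgs <- c("base", "stats", "ggplot2", "terra")  # old R installation
cur_pkgs <- c("base", "stats")                      # fresh R installation

# setdiff() keeps the elements of the first vector that are not in the second
to_install <- setdiff(old_pkgs, cur_pkgs)
print(to_install)
## [1] "ggplot2" "terra"
```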
When using R and RStudio, you will end up working with a variety of different computer files. RStudio allows users to create projects to help manage all the files associated with a particular workflow in a single folder. These files include:
RStudio project file (.Rproj)
Workspace file (.RData)
History file (.Rhistory)
Script files (.R and .Rmd files)
Input data files (possibly including .csv and .xlsx files for tabular data, .tif files for gridded data, and ESRI shapefiles for vector data)
Output files (possibly including all input data file formats, .jpg, .png, or .tif files for graphics, and .pdf or .html files for formatted reports)
The recommended approach for setting up an RStudio project involves creating a folder for the project and then saving all project files in that folder. This is a relatively simple approach that has the advantage of being totally self-contained. To move or copy a project, all you need to do is move or copy the folder and everything will still work. You do not need to specify directory paths in your code - by default, R input and output functions will work with files in the main project directory. Eventually, you may need to develop more complex scripts that specify explicit paths to other directories in your file system. However, this simpler method is highly recommended for those learning R and RStudio.
The recommended steps are as follows:
Start by creating a new folder for the project.
In the RStudio menu bar, go to File > New Project and select Existing Directory in the Create Project box (this box sometimes takes a while to appear).
Navigate to the folder that you just created and select Create Project.
The folder in which the RStudio project was created should contain the following items:
A hidden .Rproj.user folder that you don’t need to worry about.
An .RData file that contains the saved R workspace.
An .Rhistory file that contains the history of all the code that has been executed in the project.
The R Project file - DemoProject.Rproj in this example.
To open an RStudio project, you can do one of the following:
Select File > Open Project from the RStudio menu bar, navigate to the project directory, and select the .Rproj file.
Navigate to the project directory in RStudio and double-click on the .Rproj file.
The folder in which the RStudio project was created also serves as the R working directory. The .RData and .Rhistory files will be saved here by default. When data are imported, R will automatically look for the input data in the working directory unless a different path is specified. When data are exported, R will automatically put the output in the working directory unless a different path is specified.
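The following sketch illustrates this default behavior using only base R functions. The file name demo_output.csv and the small data frame are hypothetical and are removed at the end; the point is that neither the write nor the read specifies a directory path.

```r
# Show the current working directory (the project folder inside a project)
print(getwd())

# Write a small table using a file name only, with no directory path
demo <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
write.csv(demo, "demo_output.csv", row.names = FALSE)

# Confirm that the file was created in the working directory
file_path <- file.path(getwd(), "demo_output.csv")
print(file.exists(file_path))

# Read the file back, again without a directory path
demo_in <- read.csv("demo_output.csv")
print(demo_in)

# Remove the demonstration file
unlink("demo_output.csv")
```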
The most critical components of your projects are your R script files and your input data. If you have these files, then you can always run your code again to generate your outputs. You should save your R script files frequently while you are working, and it is also advisable to save backup copies before making major changes. When you quit RStudio, you will typically see a prompt that shows you any unsaved files and asks if you want to save them.
It is usually a good idea to save any unsaved script files so that you don’t lose your most recent work. However, in most cases, it is better to not save the workspace image (.RData) file. Instead, you can just re-run your script and regenerate the workspace the next time you open the project. Using this approach, you can keep the focus on maintaining your code instead of trying to keep track of all the R objects that are generated when the code runs.
One of the trickiest challenges in working with R is dealing with conflicts between packages that have the same function names. This issue can result in strange errors that are very difficult to diagnose. Consider the following example. We start by loading tidyr with library(tidyr).
The tidyr package has a handy function called extract() that splits a data frame column into multiple columns based on a regular expression. This example splits a column of strings into the values before and after the dash.
df <- data.frame(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df
##      x
## 1 <NA>
## 2  a-b
## 3  a-d
## 4  b-c
## 5  d-e
extract(df, x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
##      A    B
## 1 <NA> <NA>
## 2    a    b
## 3    a    d
## 4    b    c
## 5    d    e
But perhaps we also need to load the terra package to analyze some raster data.
library(terra)
## terra 1.5.34
##
## Attaching package: 'terra'
## The following object is masked from 'package:tidyr':
##
##     extract
Now the same call to the extract() function returns an error.
extract(df, x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
What is happening here? After the packages have been loaded, they are visible in the list of attached packages and objects, which can be viewed with the search() function.
search()
##  [1] ".GlobalEnv"        "package:terra"
##  [3] "package:tidyr"     "package:ggplot2"
##  [5] "package:stats"     "package:graphics"
##  [7] "package:grDevices" "package:utils"
##  [9] "package:datasets"  "package:methods"
## [11] "Autoloads"         "package:base"
When a function is called, R goes through the attached packages in the order shown in the search list to find one that contains the function. If there are functions with the same name in more than one package, then R will run the function from the first package found in the search list. The other functions are “masked,” meaning they are not called by default. The tidyr package has an extract() function, but so does terra. If terra comes before tidyr in the search list, then the terra extract() function will be run, and the tidyr extract() function will be masked.
If you want to choose a function from a particular package, you can call it explicitly using the double-colon :: operator, e.g., terra::extract(). Note that the order of packages in the search list is the opposite of the order that they are loaded - the most recently loaded packages mask previously loaded packages.
tidyr::extract(df, x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")
##      A    B
## 1 <NA> <NA>
## 2    a    b
## 3    a    d
## 4    b    c
## 5    d    e
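The same masking behavior can be reproduced without installing tidyr or terra. The sketch below is an illustration under stated assumptions: it defines a hypothetical function named sd() in the global environment, which sits ahead of the stats package in the search list, and then uses the :: operator and the base R find() function to reach and locate the original.

```r
# Define a function in the global environment with the same name as stats::sd().
# This definition is hypothetical and exists only to create a name conflict.
sd <- function(x) "global sd() was called"

# The global environment comes first in the search list, so its version runs
masked_result <- sd(c(1, 2, 3))

# The :: operator bypasses the search list and calls the stats version
explicit_result <- stats::sd(c(1, 2, 3))

# find() lists the attached locations that define the name, in search order
locations <- find("sd")

print(masked_result)
print(explicit_result)
print(locations)

# Remove the global definition so that sd() refers to stats::sd() again
rm(sd)
```

The base R function conflicts(detail = TRUE) gives a broader view by listing every name that is defined in more than one place on the search list.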
These function conflicts are a common source of errors in R programming. One way to minimize them is to load your most important packages last instead of first. Also, if you are using a function with a generic name like extract() that is found in multiple packages, it is good practice to call it explicitly with the :: operator. To see if a particular function is present in multiple packages, you can use the help() function with the function name as an argument. If that function is present in two or more loaded packages, RStudio will list them in the Help window. Try this out with help(select). You can also look for messages about ‘masked’ objects that are returned after loading packages with the