Data Analysis with R
Getting Set Up
Accessing RStudio
R is an open-source programming language that is popular among statisticians and data scientists. We’ll be using the software RStudio to write and run R code. There are two ways to access RStudio for free. You can choose either of the following options.
Download R and RStudio to your own computer. Visit https://posit.co/download/rstudio-desktop/ and click the buttons to start the two required installations.
Access Posit Cloud (formerly RStudio Cloud) online. Visit https://posit.cloud/ and click “Get Started,” then choose the free plan on the next page. You’ll be asked to create an account, and once you do, you’ll be able to access RStudio over the internet; no downloads are required.
Either of these options is fine. Posit Cloud gives you more flexibility and avoids having to download anything, but it starts up slowly and sometimes crashes. You also have a limited amount of usage time each month. The RStudio desktop IDE is often faster and more reliable and allows unlimited usage, but you can only access it on your own computer.
Basic Scripting
Once you have RStudio set up, you can get started right away. From the “File” menu, choose “New File,” then “R Script.” A window will open in the upper left quadrant of the screen where you can start typing R code. Test it by typing the following:
2+3
To execute this code, place the cursor on that line, hold down “Ctrl” and hit “Enter” (“Cmd” and “Return” on a Mac). You should see the following appear in the lower left quadrant window (the console):
## [1] 5
You can also assign a value to a variable and then use the variable in place of the specific value. Suppose we want to assign the number 7 to a variable named x. We can do so as follows:
x <- 7
The <- can either be typed as the sequence < followed by -, or you can hold down “Alt” and hit - (“Option” and - on a Mac). Either way, you can now use x in place of the number 7:
x <- 7
x^2
## [1] 49
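As a small additional sketch, variables can also be combined in expressions and reassigned. The names width, height, and area are made up for illustration and are not used elsewhere in this chapter.

```r
# Hypothetical variable names, chosen only for illustration
width <- 4
height <- 2.5

# Variables can be combined in arithmetic expressions
area <- width * height
area
## [1] 10

# Reassigning width does not update area automatically
width <- 10
area
## [1] 10

# area changes only when the assignment is run again
area <- width * height
area
## [1] 25
```

The point to notice is that a variable stores a value, not a formula: area keeps its old value until you recompute it.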
It’s good practice to include comments with your code, which are statements that the code user can see but which are not executed with the code itself. The way to enter a comment is to precede it with a pound sign #. For example:
# This is a short script that applies the Quadratic Formula.
# We first specify the values of a, b, and c in the equation ax^2 + bx + c = 0:
a <- 2
b <- 3
c <- -4
# Then we calculate the two roots of the equation:
(-b + sqrt(b^2 - 4*a*c))/(2*a)
## [1] 0.8507811
(-b - sqrt(b^2 - 4*a*c))/(2*a)
## [1] -2.350781
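If you expect to solve more than one equation, the same calculation can be wrapped in a function. The following is only a sketch: the function name quadratic_roots is hypothetical, and the choice to return an empty vector when there are no real roots is an assumption made for this example.

```r
# Hypothetical helper: real roots of a*x^2 + b*x + c = 0
quadratic_roots <- function(a, b, c) {
  if (a == 0) {
    stop("a must be nonzero for a quadratic equation")
  }
  discriminant <- b^2 - 4 * a * c
  if (discriminant < 0) {
    # sqrt() of a negative number gives NaN, so return no roots instead
    return(numeric(0))
  }
  root_plus <- (-b + sqrt(discriminant)) / (2 * a)
  root_minus <- (-b - sqrt(discriminant)) / (2 * a)
  # Calling c(...) still finds the built-in function even though an
  # argument is also named c, because R looks for a function in a call
  c(root_plus, root_minus)
}

# Same coefficients as in the script above
roots <- quadratic_roots(2, 3, -4)
roots

# An equation with no real roots, x^2 + 1 = 0, gives an empty vector
no_roots <- quadratic_roots(1, 0, 1)
length(no_roots)
```

With a = 2, b = 3, and c = -4, the two values returned match the roots computed in the script above.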
Installing Packages
Right now, the functionality available to you is that of “base R.” This means that you have not yet installed and loaded any extra software packages written by independent developers. We’ll often need these extra packages, though. The most important for us is tidyverse, which bundles several of the most commonly used R packages together. Almost everything we do in this course will require tidyverse. Here’s how to install it:
install.packages("tidyverse")
If you’re using the desktop version of RStudio, you’ll only have to do this once. If you’re using the online version, you might have to re-install it periodically.
Once you’ve installed the package, you have to load the library of functions it offers:
library(tidyverse)
You’ll have to load the libraries you’ll be using at the beginning of every R session. This should be your first step any time you start up R.
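If you are not sure whether a package is already installed (for example, on Posit Cloud), one possible approach is to check first and install only when needed. This is a sketch, not a required step: install_if_missing is a hypothetical helper name, while requireNamespace() and install.packages() are standard R functions. The demonstration call uses stats, a package that ships with R, so that running the example downloads nothing.

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Return the package name invisibly so the call prints nothing
  invisible(pkg)
}

# "stats" ships with R, so this call finds it and installs nothing
install_if_missing("stats")
library(stats)
```

For this course, you would call install_if_missing("tidyverse") and then library(tidyverse) in the same way.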
Workflow
Setting the Working Directory
If you’re using Posit Cloud, then when you save your work, a .R file is saved to your cloud account. You can see your cloud files by clicking the “Files” tab in the lower right quadrant window of RStudio.
If you’re using the desktop version, then before saving anything, you should set your working directory. This should be a folder on your hard drive that you’ll be able to find easily. Once you have a folder set up to store your files, from the “Tools” menu, choose “Global Options.” Then in the field labeled “Default working directory (when not in a project),” browse to your desired folder. Your files will now automatically be stored there, and this directory will show up when you click “Files” in the lower right quadrant window of RStudio.
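You can also check or change the working directory from code, using the base R functions getwd() and setwd(). The sketch below uses tempdir() as a stand-in folder so that it runs on any computer; in practice you would substitute the path to your own folder.

```r
# Show the current working directory
getwd()

# Stand-in folder used only for illustration; tempdir() always exists
demo_dir <- tempdir()

# setwd() returns the previous working directory, so we can restore it later
old_dir <- setwd(demo_dir)
getwd()

# List the files in the (new) working directory
list.files()

# Go back to where we started
setwd(old_dir)
```

Unlike the Global Options setting, a directory chosen with setwd() lasts only for the current R session.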
If you’ve already started an R script file, you should try saving it now. Then close it and re-open it to make sure you can find it and that it saved correctly.
Soft-Wrapping
Often when you’re writing code, the line will extend beyond the right edge of the window and you’ll have to scroll to the right to see it all. The default in RStudio is to not wrap the code to the next line to stay within the window. You can override this by going to the “Code” menu and clicking “Soft Wrap Long Lines.” You’ll have to choose this option every time you begin a new .R file.
R Notebooks
Script files (the ones with the .R extension) are useful for writing and testing code, but they are not set up for interweaving code with written narrative. When preparing a report or homework assignment, you should always use an R Notebook rather than a script file. These are created by opening the “File” menu, then choosing “New File” and then “R Notebook.”
R Notebooks use a markup language for text formatting called R Markdown. R Markdown handles all of the usual formatting devices (boldface, italics, enumerated lists, tables, etc.) and also allows for chunks of executable R code. Create a new R Notebook file and examine the sample content included. You will see how to enter a title, specify the output format (which, for us, will always be an HTML file), enter code chunks, and see a preview of your compiled HTML document.
When saving an R Notebook, use the .Rmd extension. The file will be saved in your working directory.
Previewing and Knitting
The preview feature in an R Notebook does not run any of the code; it only shows what your document will look like, using whatever output the code chunks most recently produced in the editor. To create an HTML file that formats your text, runs every code chunk, and displays the output, choose the knit option instead. This is found in the drop-down menu under “Preview”; choose “Knit to HTML.” The resulting HTML file will be saved to your working directory. (You’ll have to save your Notebook before previewing or knitting.)
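Knitting can also be started from code with the render() function in the rmarkdown package, which is what the Knit button relies on. The sketch below writes a tiny R Markdown file to a temporary folder and knits it to HTML. The file name demo.Rmd and its contents are made up for illustration, and the knitting step is skipped if rmarkdown or Pandoc is not available on your system.

```r
# Three backticks open and close a code chunk in R Markdown
fence <- strrep("`", 3)

# Hypothetical document content, used only for illustration
rmd_lines <- c(
  "---",
  "title: \"Demo\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "2 + 3",
  fence
)

# Save the file in a temporary folder
rmd_path <- file.path(tempdir(), "demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit only if the rmarkdown package and Pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() runs every chunk and returns the path of the HTML file
  html_path <- rmarkdown::render(rmd_path,
                                 output_format = "html_document",
                                 quiet = TRUE)
  html_path
}
```

In everyday work the Knit button is more convenient; a call like this is mainly useful when you want to knit a document from a script.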
Visual Editor
Another way to preview your document as you create it is to use the Visual Editor. This shows you the formatted version of your R Notebook as you edit, without having to click the Preview button. You can turn on the Visual Editor by clicking the drop-down menu with the gear icon (just to the right of the Preview/Knit button). Choose the “Use Visual Editor” option.
Self-containment
R Notebook files should be entirely self-contained, meaning that any libraries or variables referred to in the document should be included in the R Notebook. This means, for example, your R Notebook should contain a code chunk in which the tidyverse library is loaded (if you intend to use any of its functionality). Also, if you refer to a variable, for instance called x, in your R Notebook, then x should be defined somewhere in the Notebook. An R Notebook is not able to access variables created outside of itself, such as in a scratch work R script file. However, note that you do not have to install any packages within your R Notebook. Once a package has been installed using the method described above, it’s saved on your hard drive or cloud account and doesn’t have to be re-installed every time you want to use it.
We’ll learn many coding conventions, shortcuts, and other ins and outs of RStudio as we proceed, but the above is enough to get started on some data analysis.