Data Analysis with R
R is an open-source programming language that is popular among statisticians and data scientists. We’ll be using the software RStudio to write and run R code. There are two ways to access RStudio for free. You can choose either of the following options.
Download R and RStudio to your own computer. If you choose this option, you’ll have to download the following:
- R version 3.3 or higher. Visit https://cran.rstudio.com/ and choose the download option for your operating system. Run the downloaded installer (an .exe file on Windows) and follow the prompts.
- The RStudio IDE (integrated development environment). Visit https://www.rstudio.com/products/rstudio/download/ and download the free desktop version. Run the downloaded installer (an .exe file on Windows) and follow the prompts.
Access RStudio Cloud online. Visit https://rstudio.cloud/ and click “Get started for free.” You’ll be asked to create an account, and once you do, you’ll be able to access RStudio over the internet; no downloads are required.
Either of these options is fine. RStudio Cloud gives you more flexibility and avoids having to download anything, but it starts up slowly and sometimes crashes. The RStudio desktop IDE is faster and more reliable, but you can only access it on your own computer.
Once you have RStudio set up, you can get started right away. From the “File” menu, choose “New File,” then “R Script.” A window will open in the upper left quadrant of the screen where you can start typing R code. Test it by typing the following:
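Any short arithmetic expression works as a first test. The sketch below assumes a simple sum that evaluates to 5, matching the console output shown afterward; the exact expression is an assumption.

```r
# A first test line: R evaluates the sum and prints the result in the console.
2 + 3
```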
To execute this code, place the cursor on the line you typed, hold down “Ctrl” (“Cmd” on a Mac), and hit “Enter.” You should see the following appear in the lower left quadrant window (the console):
## [1] 5
You can also assign values to variables and then use the variables in place of a specific value. Suppose we want to assign the number 7 to a variable named x. We can do so with the statement x <- 7.
The assignment operator <- can be typed as the sequence < followed by -, or you can hold down “Alt” (“Option” on a Mac) and hit -. Either way, you can now use x in place of the number 7:
x <- 7
x^2
## [1] 49
It’s good coding practice to include comments with your code. Comments are notes that a reader of the code can see but that R does not execute. To enter a comment, precede it with a pound sign (#). For example:
# This is a short script that applies the Quadratic Formula.
# We first specify the values of a, b, and c in the equation ax^2 + bx + c = 0:
a <- 2
b <- 3
c <- -4
# Then we calculate the two roots of the equation:
(-b + sqrt(b^2 - 4*a*c))/(2*a)
## [1] 0.8507811
(-b - sqrt(b^2 - 4*a*c))/(2*a)
## [1] -2.350781
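One way to check roots like these is to substitute them back into the equation. The sketch below reuses the same a, b, and c. The names discriminant, root1, and root2 are introduced here only for illustration and are not part of the script above.

```r
# Coefficients of 2x^2 + 3x - 4 = 0, as in the script above.
a <- 2
b <- 3
c <- -4

# Store the discriminant once so both roots can reuse it.
discriminant <- b^2 - 4*a*c

# The two roots from the Quadratic Formula.
root1 <- (-b + sqrt(discriminant))/(2*a)
root2 <- (-b - sqrt(discriminant))/(2*a)

# Substituting each root back in should give a value very close to 0
# (tiny nonzero values can appear because of floating-point rounding).
a*root1^2 + b*root1 + c
a*root2^2 + b*root2 + c
```

Storing intermediate results in variables this way also keeps each line short and makes the script easier to comment.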
Right now, the functionality available to you is that of “base R.” This means that you have not yet installed and loaded any extra software packages written by independent developers. We’ll often have the occasion to use these extra packages, though. Most important to us will be the tidyverse package, which actually bundles several of the most commonly used R packages together. Almost everything we do in this course will require tidyverse. Here’s how to install it:
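Packages hosted on CRAN are installed with base R's install.packages() function. A minimal sketch, assuming a working internet connection and the default CRAN mirror:

```r
# Download and install the tidyverse bundle from CRAN.
# This can take several minutes because it pulls in many packages.
install.packages("tidyverse")
```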
If you’re using the desktop version of RStudio, you’ll only have to do this once. If you’re using RStudio Cloud, you might have to re-install it periodically.
Once you’ve installed the package, you have to load the library of functions it offers:
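Loading is done with base R's library() function. A minimal sketch, assuming tidyverse has already been installed:

```r
# Attach tidyverse so its functions are available in this session.
library(tidyverse)
```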
You’ll have to load the libraries you’ll be using at the beginning of every R session. This should be your first step any time you start up R.
If you’re using RStudio Cloud, then when you save your work, a .R file is saved to your cloud account. You can see your cloud files by clicking the “Files” tab in the lower right quadrant window of RStudio.
If you’re using the desktop version, then before saving anything, you should set your working directory. This should be a folder on your hard drive that you’ll be able to find easily. Once you have a folder set up to store your files, from the “Tools” menu, choose “Global Options.” Then in the field labeled “Default working directory (when not in a project),” browse to your desired folder. Your files will now automatically be stored there, and this directory will show up when you click “Files” in the lower right quadrant window of RStudio.
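The working directory can also be inspected from the console. The sketch below uses base R's getwd() and setwd(). The folder path in the commented-out line is hypothetical and should be replaced with a real folder on your machine.

```r
# Show the folder R currently reads from and writes to.
getwd()

# Switch to a different folder for the current session only.
# "~/R-course" is a hypothetical path; uncomment and edit it to use it.
# setwd("~/R-course")
```

Unlike the “Global Options” setting, a change made with setwd() lasts only until you close the R session.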
One more tip as you get started: Often when you’re writing code, the line will extend beyond the right edge of the window and you’ll have to scroll to the right to see it all. The default in RStudio is to not wrap the code to the next line to stay within the window. You can override this by going to the “Code” menu and clicking “Soft Wrap Long Lines.” You’ll have to choose this option every time you begin a new .R file.
We’ll learn many other coding conventions, shortcuts, and other ins and outs of RStudio as we proceed, but the above is enough to get started.