Chapter 2 R Basics

Hello!

This tutorial is for those who are completely new to R and programming.

In R, we have “objects” (which are like nouns) and “functions” (which are like verbs). Objects can include numbers, character strings (as in, a string of characters), booleans (TRUE or FALSE), datasets, and more. We create objects using the <- arrow (you can also use =, but it is not common practice).

a <- 1
b = 1

When you run a <- 1, you should see an object named “a” that contains the number 1 appear under Values in your Environment tab. You can print the number using the name, like below.

a
## [1] 1

Here’s a cool thing I recently learned: you can also use the arrow the other way around.

1 -> c
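
As a quick check that the three assignment forms above do the same thing, the sketch below stores the number 1 three ways and compares the results. The object names x1, x2, and x3 are made up for this illustration so that a, b, and c stay untouched.

```r
x1 <- 1   # the usual arrow
x2 = 1    # equals sign: works, but less common
1 -> x3   # right-pointing arrow

# identical() returns TRUE when two objects hold exactly the same thing
identical(x1, x2)
identical(x1, x3)
```

Both comparisons should come back TRUE, which is why the choice between the three is mostly a matter of style.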

You can distinguish functions from objects because functions are followed by parentheses, e.g. function(). One function you’ll use often is str(), which is short for “structure.” This function allows you to learn more about your object. Functions usually require the user to provide one or more “arguments”, which can include objects or function settings.

Let’s try to use the str() function now. There is only one required argument in str(): you must designate the object that you want to see the structure of.

str(a)
##  num 1

You can also learn more about str() by typing ?str in the console. This will automatically bring up your Help tab, which will show you the documentation for this function. The documentation for a function typically includes a description, a usage section showing the possible arguments (both required and optional), and a list explaining each argument. Often, the documentation will also include examples at the bottom. It can take a bit of time to understand the help documentation, so please be patient with yourself!
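
If you would like to try this, here is a small sketch of a few ways to reach the documentation. ?str and help("str") open the same page; args() is a lighter option that prints only a function's argument list.

```r
?str          # opens the help page for str()
help("str")   # same page, written as a regular function call
args(str)     # prints only the arguments that str() accepts
```

In the output of args(str), the first argument is called object, which is the one required argument described above.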

Based on the results of str(a), you can tell that the object a is a number (more specifically, the number 1). But what if we want a to be the written word “one”?

a <- "one" 
a
## [1] "one"

There are several important things to note. First, to tell R that I was using characters, I used double quotes (single quotes also technically work). Secondly, you’ll notice that by writing a <- "one", the original information stored in the object a is replaced. In other words, when you use the name of an already existing object, you replace the information in that object.

Let’s see how R reads the new object.

str(a)
##  chr "one"

Aha! Now it says chr (for character) instead of num (for numeric).
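
The same kind of check works for the other types of objects mentioned at the start of this chapter. Here is a minimal sketch using class(), a base R function that reports just the type. The object names are invented for this example.

```r
my_number  <- 2.5
my_text    <- "two and a half"   # double quotes
my_text2   <- 'two and a half'   # single quotes give the same result
my_boolean <- TRUE

class(my_number)    # "numeric"
class(my_text)      # "character"
class(my_boolean)   # "logical"

identical(my_text, my_text2)   # TRUE
```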

To learn more about objects and functions, see R4DS Chapter 4.

2.1 Using Packages

You can learn more about packages in R4DS Chapter 1.

A lot of the cool functions you will use in R will probably be in a package (a collection of functions that do something). There are many, many packages in R and, in fact, the R programming community is particularly good at maintaining a repository of good packages (called CRAN, the Comprehensive R Archive Network). Packages that are accepted to CRAN can be installed easily into your R environment (but you can also install R packages that you find on any ol’ website, like GitHub).

To use a package, you only need to install it once. However, you need to load (or “call”) it every time you open R. Below, we’ll install and use the package praise. You can read more about praise here: https://cran.r-project.org/web/packages/praise/praise.pdf

To install this package, you will use the install.packages() function, which takes the name of the package as an argument. Note that you will write the package name in quotes, as a string.

install.packages("praise")

As I mentioned above, you only need to install the package once. But every time you close and re-open R, you’ll need to use the library() function to load the package into R. Notice that in this function, you no longer need to use quotes around the word “praise”.

library(praise)

Feel free to check the praise package through the help screen, or by typing ?praise in the console. Simply put, praise is a package with the function praise(), which will compliment you. praise() doesn’t require an argument.

praise()
## [1] "You are groundbreaking!"
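
One way to see the difference between installing and loading is to ask R about each one directly. This is only a sketch: is_installed and is_loaded are names made up for this example, while installed.packages() and search() are base R functions.

```r
# Installed on this computer? TRUE once install.packages("praise") has been run
is_installed <- "praise" %in% rownames(installed.packages())

# Loaded in this session? TRUE once library(praise) has been run
is_loaded <- "package:praise" %in% search()

is_installed
is_loaded
```

After you restart R, is_loaded should go back to FALSE until you run library(praise) again, while is_installed should stay TRUE.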

Try to install another package. For example, the R4DS textbook relies heavily on the tidyverse package, which is like a package of packages developed and maintained by RStudio (now called Posit). tidyverse contains many packages for data management, including cleaning data, creating figures, and managing different types of objects.

install.packages("tidyverse")
library(tidyverse)
praise()
## [1] "You are brilliant!"

2.2 Commenting

In code, you can comment lines using a hashtag/pound sign (#), either in front of a line (to stop that line from running) or after a piece of code (to take notes about what it does). Commenting your code is always a great idea because it allows you to add details, notes, or information as you write. Below, I write comments for each line of code.

abc <- "six" #in this line, I create an object called "abc" with the string "six"
cde <- 3 + 3 #in this line, I create an object called "cde" that stores the result of 3 + 3, which is 6

abc #in this line, I print the object abc
## [1] "six"
cde #in this line, I print the object cde
## [1] 6
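
The examples above put comments after the code. As a small sketch of the other use, here a # in front of a line stops that line from running. The object name total is made up for this illustration.

```r
total <- 1 + 1
# total <- total + 100   # this line is commented out, so R skips it
total
## [1] 2
```

If you delete the # at the start of the second line and run all three lines again, total becomes 102 instead.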

2.3 Data

Last thing! Let’s talk really quickly about loading data. Some packages have toy datasets that you can use to practice your coding. Once you’ve loaded such a package, its datasets will be readily available to you.

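#install.packages("datasets") #not needed: the datasets package comes with R, so there is nothing to install
library(datasets) #R loads this package automatically at startup, so this line is optional too
#library(help = "datasets") #you can run this line of code to see the datasets available from this library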

You now have these datasets available in your R environment. For example, check out the dataset mtcars.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

You can use the head() function to see only the first 6 (or first n) lines. The opposite of this (seeing the last 6 or last n lines) is tail().

head(mtcars) #the default number of lines displayed is 6
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
head(mtcars, n = 10) #notice that I included an optional argument here for the number of lines I wanted to see
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
tail(mtcars)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
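
tail() accepts the same optional n argument as head(). Here is a short sketch, using nrow() (a base R function that counts rows) to confirm how many lines each call returns.

```r
tail(mtcars, n = 3)          # the last 3 rows

nrow(head(mtcars))           # 6, the default
nrow(head(mtcars, n = 10))   # 10
nrow(tail(mtcars, n = 3))    # 3
```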

To manipulate the dataset, I recommend saving it as an object.

new_object <- mtcars

Unlike the other objects we have created thus far, the new_object object is considered “Data”, not “Values”. In the Environment tab, you can click on Data objects and RStudio will take you to a visual display of your data frame.
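
Here is a minimal sketch of why working on a copy is useful. The name cars_copy and the new column wt_lbs are made up for this example, and it relies on the mtcars documentation listing wt in units of 1000 lbs. The $ sign picks out (or creates) a single column of a data frame.

```r
cars_copy <- mtcars                       # a separate copy to experiment with
cars_copy$wt_lbs <- cars_copy$wt * 1000   # add a column with weight in pounds

ncol(cars_copy)   # 12: the copy gained a column
ncol(mtcars)      # 11: the original is unchanged
```

Because the change was made to the copy, you can always go back to mtcars if an experiment goes wrong.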

What happens if you use str() on a dataset?

str(new_object)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Wow! There’s a lot more information here. One thing you’ll immediately notice is the structure of the object: it is a data.frame. In the data frame, there are 11 variables (columns) and 32 observations (rows), one for each car. Each variable has a “type”; in this dataset, all variables are numeric (num).
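
If you only want one of these facts at a time, base R has a function for each. A small sketch (the first line recreates new_object so that the block runs on its own):

```r
new_object <- mtcars

class(new_object)   # "data.frame"
nrow(new_object)    # 32 observations (rows)
ncol(new_object)    # 11 variables (columns)
names(new_object)   # the variable names
```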

Another function you’ll get to know is the summary() function, which displays the minimum, maximum, quartiles, median, and mean for each variable in a data frame. Let’s use it now on the new_object object we created.

summary(new_object)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
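
The same numbers can be computed one variable at a time. As a sketch, the $ sign below picks a single column out of the data frame, and the results should match the mpg column of the summary above.

```r
new_object <- mtcars

mean(new_object$mpg)     # about 20.09
median(new_object$mpg)   # 19.2
range(new_object$mpg)    # 10.4 33.9 (the minimum and the maximum)
```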