Chapter 2 R Basics
Hello!
This tutorial is for those who are completely new to R and programming.
In R, we have “objects” (which are like nouns) and “functions” (which are like verbs). Objects can include numbers, characters, strings (as in, a string of characters), booleans, datasets, and more. In R, we create objects using the <- arrow (you can also use =, but it is not common practice).
When you run a <- 1, you should be able to see a value object named “a” that contains the number 1. You can print the number using the name, like below.
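A minimal sketch of the two steps just described (a reconstruction, since the chunk itself is not shown): create the object, then print it by typing its name.

```r
a <- 1  # create an object named a that stores the number 1
a       # typing the name on its own prints the stored value
```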
## [1] 1
Here’s a cool thing I recently learned: you can also use the arrow the other way around.
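As a small sketch of what that could look like: the right-pointing arrow -> puts the value on the left and the name on the right. The name b is only an illustrative choice.

```r
2 -> b  # same effect as writing b <- 2
b       # prints [1] 2
```

Most R code you will read uses the left-pointing arrow, so treat this as a curiosity rather than a habit to pick up.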
You can distinguish functions from objects because functions are followed by parentheses, e.g. function(). One function you’ll use often is str(), which is short for “structure.” This function allows you to learn more about your object. Functions usually require the user to provide an “argument”, which can include objects or function settings.
Let’s try to use the str() function now. There is only one required argument in str(): you must designate the object that you want to see the structure of.
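A minimal sketch of that call, assuming a still holds the number 1 from earlier (it is re-created here so the chunk runs on its own):

```r
a <- 1
str(a)  # the object a is the one required argument
```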
## num 1
You can also learn more about str() by typing ?str in the console. This will automatically bring up your Help tab, which shows the documentation for this function. The documentation for a function typically includes a description, a usage section showing the possible arguments (both required and optional), and a list explaining each argument. Often, it will also include examples at the bottom. It can take a bit of time to understand the help documentation, so please be patient with yourself!
Based on the results of str(a), you can tell that the object a is a number (more specifically, the number 1). But what if we want a to be the written word “one”?
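One way to do this, sketched here because the chunk itself is not shown, is to assign the word in quotes to the same name and then print it:

```r
a <- "one"  # assign the word, wrapped in quotes, to the name a
a           # print the object
```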
## [1] "one"
There are several important things to note. First, to tell R that I was using characters, I used double quotes (single quotes also technically work). Second, you’ll notice that by writing a <- "one", the original information stored in the object a is replaced. In other words, when you use the name of an already existing object, you replace the information in that object.
Let’s see how R reads the new object.
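A short sketch of that check; a is re-created first so the chunk runs on its own:

```r
a <- "one"
str(a)  # ask R how it now describes the object
```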
## chr "one"
Aha! Now it says chr (for character) instead of num (for numeric).
To learn more about objects and functions, see R4DS Chapter 4.
2.1 Using Packages
You can learn more about packages in R4DS Chapter 1.
A lot of the cool functions you will use in R will probably be in a package (a collection of functions that do something). There are many, many packages in R and, in fact, the R programming community is uniquely good at maintaining a repository of good packages (called CRAN). Packages that are accepted to CRAN can be installed easily into the R environment (but you can also install R packages that you find on any ol’ website, like GitHub).
To use a package, you only need to install it once. However, you need to “call” it every time you open R. Below, we’ll install and use the package praise. You can read more about praise here: https://cran.r-project.org/web/packages/praise/praise.pdf
To install this package, you will use the install.packages() function, which takes the name of the package as an argument. Note that you will write the package name in quotes, as a string.
As I mentioned above, you only need to install the package once. But every time you close and re-open R, you’ll need to use the library() function to load the package into R. Notice that in this function call, you no longer need to use quotes around the word “praise”.
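The install-once, load-every-session pattern can be sketched as follows. The install line is commented out so that re-running the chunk does not download the package again, and the requireNamespace() check is my addition: it only keeps the sketch from stopping with an error if praise has not been installed yet.

```r
# Run once per computer; this downloads the package from CRAN.
# install.packages("praise")

# Run in every new R session.
if (requireNamespace("praise", quietly = TRUE)) {
  library(praise)  # no quotes needed around the package name
}
```

In everyday use you would normally just write library(praise) on its own line at the top of your script.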
Feel free to check the praise package through the help screen, or by typing ?praise in the console. Simply put, praise is a package with the function praise(), which will compliment you. praise() doesn’t require an argument.
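A minimal sketch of calling it, assuming the package is installed (the check around the call is my addition so the chunk does not fail otherwise, and compliment is an illustrative name):

```r
if (requireNamespace("praise", quietly = TRUE)) {
  library(praise)
  compliment <- praise()  # called with empty parentheses: no argument
  print(compliment)       # the wording is random, so yours will differ
}
```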
## [1] "You are groundbreaking!"
Try to install another package. For example, the R4DS textbook relies heavily on the tidyverse package, which is like a package of packages developed and maintained by RStudio. tidyverse contains many packages for data management, including cleaning data, creating figures, and managing different types of objects.
## [1] "You are brilliant!"
2.2 Commenting
In code, you can comment lines using a hash/pound sign (#), either in front of the line (to stop that line from running) or after a piece of code (if you want to take notes about what a function does). Commenting on your code is always a great idea because it allows you to add additional details, notes, or information as you are writing your code. Below, I write comments for each line of code.
abc <- "six" #in this line, I create an object called "abc" with the string "six"
cde <- 3 + 3 #in this line, I create an object called "cde" that stores the result of 3 + 3, which equals 6
abc #in this line, I print the object abc
cde #in this line, I print the object cde
## [1] "six"
## [1] 6
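The other use of #, putting it in front of a line so that the line does not run, can be sketched like this (x is just an illustrative name):

```r
# x <- 10   # this whole line is a comment, so R never runs it
x <- 5      # this line runs; everything after the # is ignored
x           # prints [1] 5
```

Commenting a line out like this is a handy way to switch code off temporarily without deleting it.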
2.3 Data
Last thing! Let’s talk really quickly about loading data. Some packages have toy datasets that you can use to practice your coding. Once you’ve loaded such a package, its datasets will be readily available to you.
#install.packages("datasets") #not needed: the "datasets" package ships with R, so there is nothing to install
library(datasets)
#library(help = "datasets") #you can run this line of code to see the datasets available from this library
You now have these datasets loaded into your R environment. For example, check out the dataset mtcars.
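As a small sketch, typing the name of the dataset prints the whole thing, just like with the objects created earlier:

```r
library(datasets)  # ships with R and is normally loaded already
mtcars             # typing the name prints every row
```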
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
You can use the head() function to see only the first 6 (or first n) lines. The opposite of this (seeing the last 6 or last n lines) is tail().
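A minimal sketch of the default call, with no optional arguments:

```r
head(mtcars)  # with no n given, head() returns the first 6 rows
```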
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(mtcars, n = 10) #notice that I included an optional argument here for the number of lines I wanted to see
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
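tail() works the same way from the other end of the dataset. A short sketch:

```r
tail(mtcars)  # with no n given, tail() returns the last 6 rows
```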
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
To manipulate the dataset, I recommend saving the dataset as an object.
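A sketch of that step, using the same arrow as before; the object name new_object is the one used in the rest of this section.

```r
new_object <- mtcars  # store a copy of the dataset under a new name
```

Changes you later make to new_object will not alter the original mtcars.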
Unlike the other objects we have created thus far, the new_object object is considered “Data”, not “Values”. In the Environment pane, you can click a Data object and RStudio will take you to a visual display of your data frame.
What happens if you use str() on a dataset?
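A minimal sketch of that call; new_object is re-created first so the chunk runs on its own:

```r
new_object <- mtcars
str(new_object)  # same function as before, now given a whole dataset
```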
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Wow! There’s a lot more information here. One thing you’ll immediately notice is the structure of the object: it is a data.frame type. In the data frame, there are 11 variables (columns) and 32 car observations (rows). Each variable has a “type”; in this dataset, all variables are numbers.
Another function you’ll get to know is the summary() function, which displays the median and mean (amongst other things) for each variable in a dataframe. Let’s use it now on the new_object object we created.
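A short sketch of the call; as before, new_object is re-created so the chunk is self-contained:

```r
new_object <- mtcars
summary(new_object)  # one block of summary statistics per column
```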
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000