2.2 R basics
So let’s get started with R programming!
2.2.1 R as a calculator
In its most basic usage, R can be used as a calculator. Basic algebraic operations can be carried out as you would expect. The symbol + is for addition, - for subtraction, * for multiplication and / for division. Here are some examples:
4 + 2
## [1] 6
4 - 2
## [1] 2
4 * 2
## [1] 8
5 / 2
## [1] 2.5
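Operations can also be combined in a single expression. The short sketch below, which is an illustration rather than part of the original examples, shows that R follows the usual arithmetic order of operations and that parentheses can be used to change it.

```r
# Multiplication is carried out before addition
4 + 2 * 3
## [1] 10

# Parentheses force the addition to be carried out first
(4 + 2) * 3
## [1] 18
```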
2.2.2 Variable assignment
In R the symbol <- is used to assign a quantity to a variable. For instance, a <- 4 assigns the number 4 to the variable a and b <- 3 assigns the number 3 to b. In programming it is much more common to work with variables than with raw values. Basic operations can then be performed on variables.
a <- 4
b <- 3
a + b
## [1] 7
a - b
## [1] 1
Notice for example that the code a <- 4 does not show us the value of the variable a. It only creates the assignment. If we want to print the value of a variable, we have to explicitly type the name of the variable.
a
## [1] 4
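A variable can also be overwritten using its own current value. The sketch below illustrates this with the variables a and b from above; the name total is a hypothetical one introduced only for this example. It also shows that wrapping an assignment in parentheses assigns and prints in one step.

```r
a <- 4
b <- 3

# Overwrite a with the sum of its current value and b
a <- a + b
a
## [1] 7

# Parentheses around an assignment also print the assigned value
(total <- a * 2)
## [1] 14
```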
2.2.3 Data types
In the previous examples we worked with numbers, but variables can hold other types of information. There are four basic types:
Logicals or Booleans: corresponding to TRUE and FALSE, also abbreviated as T and F respectively;
Doubles: real numbers;
Characters: strings of text surrounded by " (for example "hi") or by ' (for example 'by');
Integers: integer numbers. If you type a whole number in R, such as 3 or 4 above, it is stored as a double unless you explicitly declare it to be an integer.
Examples:
a <- TRUE
a
## [1] TRUE
b <- "hello"
b
## [1] "hello"
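To check how R stores a value, the base function typeof can be used. The following sketch is an illustration using the hypothetical variable names x and y. It shows that a plain whole number is stored as a double, while the suffix L is one way to explicitly request an integer.

```r
x <- 3    # a plain number is stored as a double
y <- 3L   # the suffix L explicitly creates an integer

typeof(x)
## [1] "double"
typeof(y)
## [1] "integer"
typeof(TRUE)
## [1] "logical"
typeof("hi")
## [1] "character"
```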
2.2.4 Vectors
In all previous examples the variables included one element only. More generally we can define sequences of elements, so-called vectors. They can be defined with the command c, which stands for combine.
vec <- c(1,3,5,7)
vec
## [1] 1 3 5 7
So vec includes the sequence of numbers 1, 3, 5, 7. Notice that a vector can only include one data type. Consider the following:
vec <- c(1, "hello", TRUE)
vec
## [1] "1" "hello" "TRUE"
We created a variable vec where the first entry is a number, the second a character string and the third a Boolean. When we print vec, we see that its elements are "1", "hello" and "TRUE": R has converted the number 1 into the string "1" and the Boolean TRUE into "TRUE".
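The conversion described above can be checked with typeof. The sketch below is an illustration with hypothetical variable names. It also shows a second case not covered in the text: when only numbers and Booleans are mixed, R converts the Booleans to numbers, with TRUE becoming 1 and FALSE becoming 0.

```r
mixed_chr <- c(1, "hello", TRUE)
typeof(mixed_chr)
## [1] "character"

# Mixing numbers and Booleans gives a numeric vector
mixed_num <- c(1, TRUE, FALSE)
mixed_num
## [1] 1 1 0
typeof(mixed_num)
## [1] "double"
```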
2.2.5 Matrices
Matrices are tables of elements organized in rows and columns. You can think of them as an arrangement of vectors into a table. As with vectors, all entries of a matrix must have the same data type. Matrices can be constructed in multiple ways. One way is to stack vectors row by row with the command rbind. Consider the following example.
row1 <- c(1,2,3)
row2 <- c(4,5,6)
row3 <- c(7,8,9)
mat <- rbind(row1,row2,row3)
mat
##      [,1] [,2] [,3]
## row1    1    2    3
## row2    4    5    6
## row3    7    8    9
So first we created the vectors row1 = (1,2,3), row2 = (4,5,6) and row3 = (7,8,9) and then organized them together into the matrix mat.
The following code follows the same procedure but organizes the vectors by columns instead, using the command cbind.
col1 <- c(1,2,3)
col2 <- c(4,5,6)
col3 <- c(7,8,9)
mat <- cbind(col1,col2,col3)
mat
##      col1 col2 col3
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
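The requirement that all entries of a matrix share one data type can be checked in the same way as for vectors. In the sketch below, which uses the hypothetical names nums, labels and mixed, a numeric vector and a character vector are bound together by columns and every entry of the result ends up stored as a character string.

```r
nums <- c(1, 2, 3)
labels <- c("a", "b", "c")

# Binding a numeric and a character vector converts all entries to strings
mixed <- cbind(nums, labels)
typeof(mixed)
## [1] "character"
dim(mixed)
## [1] 3 2
```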
Last, there is also a command called matrix to create a matrix. It takes a vector, defined using the command c, and stores its entries in a matrix with nrow rows and ncol columns. Consider the following example.
vec <- c(1,2,3,4,5,6,7,8,9)
mat <- matrix(vec, nrow = 3, ncol = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
So first we created a vector vec with the numbers from 1 to 9 and then stored them in a matrix with 3 rows and 3 columns. Numbers are stored by column: the first element of vec is in entry (1,1), the second element of vec is in entry (2,1), and so on.
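If the numbers should fill the matrix row by row instead, the matrix command accepts the argument byrow = TRUE. The sketch below is an illustration; the name mat_by_row is hypothetical. It also uses dim to report the number of rows and columns, and square brackets to read a single entry by its row and column.

```r
vec <- c(1,2,3,4,5,6,7,8,9)

# byrow = TRUE fills the matrix one row at a time
mat_by_row <- matrix(vec, nrow = 3, ncol = 3, byrow = TRUE)
mat_by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

# Number of rows and columns
dim(mat_by_row)
## [1] 3 3

# Entry in row 2, column 1
mat_by_row[2, 1]
## [1] 4
```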
2.2.6 Dataframes
Dataframes are very similar to matrices: they are tables organized in rows and columns. However, unlike matrices, they can have columns with different data types. They can be created with the command data.frame.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data
##   X1    X2     X3
## 1  1  TRUE   male
## 2  2 FALSE   male
## 3  3 FALSE female
The dataframe data includes three columns: the first column X1 of numbers, the second column X2 of Booleans and the third column X3 of characters. Dataframes are the objects most commonly used in real-world data analysis.
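The claim that each column keeps its own data type can be verified by extracting single columns with the $ operator and inspecting them with class. The sketch below is an illustration. It sets stringsAsFactors = FALSE explicitly, which is an addition to the original call, because R versions before 4.0 would otherwise store the text column as a factor.

```r
# stringsAsFactors = FALSE keeps X3 as character in every R version
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"),
                   stringsAsFactors = FALSE)

# Extract one column as a vector
data$X1
## [1] 1 2 3

# Each column has its own type
class(data$X1)
## [1] "numeric"
class(data$X2)
## [1] "logical"
class(data$X3)
## [1] "character"
```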
2.2.7 NULL and NA
The expression NA is used in R to denote a missing value. Consider the following example.
vec <- c(3, NA, 5)
vec
## [1] 3 NA 5
Although the second element of vec is the expression NA, R recognizes that it stands for a missing value, and therefore the elements 3 and 5 are still treated as numbers: indeed, they are not printed as "3" and "5".
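The sketch below illustrates how missing values behave in practice. It uses the base functions typeof, is.na and mean, none of which appear in the text above. A computation that involves NA returns NA unless the missing values are explicitly removed, here with the argument na.rm = TRUE.

```r
vec <- c(3, NA, 5)

# The vector is still numeric
typeof(vec)
## [1] "double"

# Locate the missing entries
is.na(vec)
## [1] FALSE  TRUE FALSE

# A missing value propagates through computations
mean(vec)
## [1] NA

# na.rm = TRUE drops missing values before computing
mean(vec, na.rm = TRUE)
## [1] 4
```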
NULL is an additional data type with various uses. For instance, it is what R returns for a vector with no entries.
c()
## NULL
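The difference between NULL and NA can be seen from their lengths. In the sketch below, an illustration using the hypothetical name empty, NULL has length 0 because it contains nothing, whereas NA has length 1 because it is a single element whose value is missing. NULL also disappears when combined with other elements.

```r
empty <- c()

is.null(empty)
## [1] TRUE

# NULL contains no elements, NA is one (missing) element
length(empty)
## [1] 0
length(NA)
## [1] 1

# NULL is dropped when combined into a vector
c(1, NULL, 2)
## [1] 1 2
```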