2.2 R basics
So let’s get started with R programming!
2.2.1 R as a calculator
In its most basic usage, we can use R as a calculator. Basic algebraic operations can be carried out as you would expect. The symbol + is for addition, - for subtraction, * for multiplication and / for division. Here are some examples:
4 + 2
## [1] 6
4 - 2
## [1] 2
4 * 2
## [1] 8
5 / 2
## [1] 2.5
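When several operations are combined in one expression, R follows the usual order of operations. The short sketch below illustrates this; the variable names are only illustrative, and the exponent operator ^ is not used elsewhere in this section.

```r
# Multiplication is evaluated before addition
no_parens <- 2 + 3 * 4
no_parens
## [1] 14

# Parentheses change the order of evaluation
with_parens <- (2 + 3) * 4
with_parens
## [1] 20

# ^ raises a number to a power
power <- 2^3
power
## [1] 8
```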
2.2.2 Variable assignment
In R the symbol <- is used to assign a quantity to a variable. For instance, a <- 4 assigns the number 4 to the variable a, and b <- 3 assigns the number 3 to the variable b. In programming it is much more common to work with variables than with raw numbers. Basic operations can then be performed on variables.
a <- 4
b <- 3
a + b
## [1] 7
a - b
## [1] 1
Notice for example that the code a <- 4 does not show us the value of the variable a. It only creates this assignment. If we want to print the value of a variable, we have to explicitly type the name of the variable.
a
## [1] 4
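As a small sketch of how assignment behaves in practice, the result of an operation can itself be stored in a variable, and assigning to an existing name overwrites its previous value. The name total is only illustrative, and print() is a standard alternative to typing the variable name.

```r
a <- 4
b <- 3

# Store the result of an operation in a new variable
total <- a + b
total
## [1] 7

# print() displays the value explicitly
print(total)
## [1] 7

# Assigning again overwrites the previous value of a
a <- 10
a + b
## [1] 13
```

Note that total is not updated when a changes: it keeps the value computed at the time of the assignment.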
2.2.3 Data types
In the previous examples we worked with numbers, but variables can also be assigned other types of information. There are four basic types:
Logicals or Booleans: corresponding to TRUE and FALSE, also abbreviated as T and F;
Doubles: real numbers;
Characters: strings of text surrounded by " (for example "hi") or by ' (for example 'by');
Integers: integer numbers. If you type an integer in R, such as 3 or 4 above, it will usually be stored as a double unless explicitly defined as an integer (for example by writing 3L).
a <- TRUE
a
## [1] TRUE
b <- "hello"
b
## [1] "hello"
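One way to check which of the four basic types a value has is the base R function typeof(). The sketch below applies it to one value of each type; the variable names are only illustrative.

```r
x_lgl <- TRUE
x_dbl <- 3      # a number typed without a suffix is stored as a double
x_int <- 3L     # the suffix L explicitly requests an integer
x_chr <- "hi"

typeof(x_lgl)
## [1] "logical"
typeof(x_dbl)
## [1] "double"
typeof(x_int)
## [1] "integer"
typeof(x_chr)
## [1] "character"
```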
In all previous examples the variables included one element only. More generally we can define sequences of elements, or so-called vectors. They can be defined with the command c, which stands for combine.
vec <- c(1,3,5,7)
vec
## [1] 1 3 5 7
vec includes the sequence of numbers 1, 3, 5, 7. Notice that a vector can only include one data type. Consider the following:
vec <- c(1, "hello", TRUE)
vec
## [1] "1"     "hello" "TRUE"
We created a variable vec where the first entry is a number, then a character string, then a Boolean. When we print vec, we get that its elements are "1", "hello" and "TRUE": it has transformed the number 1 into the string "1" and the Boolean TRUE into the string "TRUE".
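This automatic conversion can be inspected with typeof(). The following sketch, with illustrative variable names, shows that mixing numbers and strings yields a character vector, while mixing numbers and Booleans yields a numeric vector in which TRUE becomes 1 and FALSE becomes 0. The function as.numeric() converts a string back into a number.

```r
# Numbers, strings and Booleans together: everything becomes a string
mixed <- c(1, "hello", TRUE)
typeof(mixed)
## [1] "character"

# Numbers and Booleans together: everything becomes a number
num_lgl <- c(1, TRUE, FALSE)
num_lgl
## [1] 1 1 0
typeof(num_lgl)
## [1] "double"

# A string holding a number can be converted back explicitly
as.numeric("1") + 1
## [1] 2
```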
Matrices are tables of elements that are organized in rows and columns. You can think of them as an arrangement of vectors into a table. As for vectors, all entries of a matrix must have the same data type. Matrices can be constructed in multiple ways. One way is by stacking vectors into a matrix row-by-row with the command rbind. Consider the following example.
row1 <- c(1,2,3)
row2 <- c(4,5,6)
row3 <- c(7,8,9)
mat <- rbind(row1,row2,row3)
mat
##      [,1] [,2] [,3]
## row1    1    2    3
## row2    4    5    6
## row3    7    8    9
So first we created the vectors row1 = (1,2,3), row2 = (4,5,6) and row3 = (7,8,9), and then organized them together into the matrix mat.
The following code follows the same procedure but now organizes vectors by columns instead, using the command cbind.
col1 <- c(1,2,3)
col2 <- c(4,5,6)
col3 <- c(7,8,9)
mat <- cbind(col1,col2,col3)
mat
##      col1 col2 col3
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
Last, there is also a command called matrix to create a matrix. It takes a vector, defined using the command c, and stores its entries into a matrix of nrow rows and ncol columns. Consider the following example.
vec <- c(1,2,3,4,5,6,7,8,9)
mat <- matrix(vec, nrow = 3, ncol = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
So first we created a vector vec with the numbers from 1 to 9 and then stored them in a matrix with 3 rows and 3 columns. Numbers are stored by column: the first element of vec is in entry (1,1), the second element of vec is in entry (2,1), and so on.
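If we would rather fill the matrix row by row, the matrix command accepts the argument byrow = TRUE. The sketch below uses it on a vector with the numbers from 1 to 9, and also shows how a single entry can be read with square brackets and how dim() reports the number of rows and columns. The name mat_by_row is only illustrative.

```r
vec <- c(1,2,3,4,5,6,7,8,9)

# byrow = TRUE fills the matrix one row at a time
mat_by_row <- matrix(vec, nrow = 3, ncol = 3, byrow = TRUE)
mat_by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

# Entry in row 2, column 1
mat_by_row[2, 1]
## [1] 4

# Number of rows and columns
dim(mat_by_row)
## [1] 3 3
```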
Dataframes are very similar to matrices: they are tables organized in rows and columns. However, unlike matrices, they can have columns with different data types. They can be created with the command data.frame.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data
##   X1    X2     X3
## 1  1  TRUE   male
## 2  2 FALSE   male
## 3  3 FALSE female
data includes three columns: the first column X1 of numbers, the second column X2 of Booleans and the third column X3 of characters. Dataframes are the objects that are most commonly used in real-world data analysis.
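To confirm that each column keeps its own data type, we can extract a column with the $ operator and pass it to class(). The sketch below rebuilds the same dataframe. It sets stringsAsFactors = FALSE explicitly as a precaution, since in R versions before 4.0.0 the default would store X3 as a factor rather than as characters.

```r
data <- data.frame(X1 = c(1, 2, 3),
                   X2 = c(TRUE, FALSE, FALSE),
                   X3 = c("male", "male", "female"),
                   stringsAsFactors = FALSE)

# Extract a single column by name
data$X1
## [1] 1 2 3

# Each column has its own data type
class(data$X1)
## [1] "numeric"
class(data$X2)
## [1] "logical"
class(data$X3)
## [1] "character"

# Number of rows and columns
nrow(data)
## [1] 3
ncol(data)
## [1] 3
```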
NA is used in R to denote a missing value. Consider the following example.
vec <- c(3, NA, 5)
vec
## [1]  3 NA  5
Although the second element of vec is the expression NA, R recognizes that it denotes a missing value, and therefore the elements 3 and 5 are still considered numbers: indeed they are not printed as "3" and "5".
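The sketch below shows a few consequences of this, using standard base R functions: is.na() locates the missing entries, arithmetic with NA gives NA, and functions such as mean() return NA unless we ask them to drop missing values with na.rm = TRUE.

```r
vec <- c(3, NA, 5)

# The vector is still numeric
typeof(vec)
## [1] "double"

# Which entries are missing?
is.na(vec)
## [1] FALSE  TRUE FALSE

# Operations involving a missing value are themselves missing
vec + 1
## [1]  4 NA  6
mean(vec)
## [1] NA

# na.rm = TRUE removes missing values before computing
mean(vec, na.rm = TRUE)
## [1] 4
```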
NULL is an additional data type with various uses. For instance, it is associated with a vector with no entries.
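As a minimal sketch of this, calling c with no arguments returns NULL, which has length zero. This differs from NA, which is a single (missing) element. The name empty is only illustrative.

```r
# A vector with no entries is NULL
empty <- c()
empty
## NULL
is.null(empty)
## [1] TRUE
length(empty)
## [1] 0

# In contrast, NA counts as one element
length(NA)
## [1] 1
```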