2.2 R basics

So let’s get started with R programming!

2.2.1 R as a calculator

In its most basic usage, R works as a calculator. Basic arithmetic operations behave as you would expect: the symbol + is for addition, - for subtraction, * for multiplication and / for division. Here are some examples:

4 + 2
## [1] 6
4 - 2
## [1] 2
4 * 2
## [1] 8
5 / 2
## [1] 2.5
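
Operations can also be combined in a single expression. The short sketch below illustrates that R follows the usual order of operations, with multiplication evaluated before addition unless parentheses say otherwise. The particular numbers are chosen only for illustration.

```r
# Multiplication is evaluated before addition
4 + 2 * 3
## [1] 10

# Parentheses change the order of evaluation
(4 + 2) * 3
## [1] 18
```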

2.2.2 Variable assignment

In R, the symbol <- assigns a value to a variable. For instance, a <- 4 assigns the number 4 to the variable a, and b <- 3 assigns the number 3 to b. In programming it is far more common to work with variables than with literal values. Basic operations can then be performed on variables.

a <- 4
b <- 3
a + b
## [1] 7
a - b
## [1] 1

Notice that the code a <- 4 does not display the value of the variable a; it only creates the assignment. To print the value of a variable, we have to explicitly type its name.

a
## [1] 4

2.2.3 Data types

In the previous examples we worked with numbers, but variables can hold other types of information. The four most commonly used basic types are:

  • Logicals or Booleans: corresponding to TRUE and FALSE, also abbreviated as T and F respectively;

  • Doubles: real numbers;

  • Characters: strings of text surrounded by " (for example "hi") or by ' (for example 'by');

  • Integers: whole numbers. If you type a whole number in R, such as 3 or 4 above, it is stored as a double unless you explicitly request an integer, for example with the suffix L, as in 3L.

Examples:

a <- TRUE
a
## [1] TRUE
b <- "hello"
b
## [1] "hello"
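
One way to check how R stores a value is the base function typeof. The sketch below applies it to one value of each of the four types; the variable names x_lgl, x_dbl, x_int and x_chr are made up for this illustration, and the suffix L is what asks R to store a whole number as an integer.

```r
# Hypothetical variable names, one per basic type
x_lgl <- TRUE
x_dbl <- 3        # a plain number is stored as a double
x_int <- 3L       # the suffix L requests an integer
x_chr <- "hello"

typeof(x_lgl)
## [1] "logical"
typeof(x_dbl)
## [1] "double"
typeof(x_int)
## [1] "integer"
typeof(x_chr)
## [1] "character"
```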

2.2.4 Vectors

In all previous examples the variables included one element only. More generally we can define sequences of elements or so-called vectors. They can be defined with the command c, which stands for combine.

vec <- c(1,3,5,7)
vec
## [1] 1 3 5 7

So vec includes the sequence of numbers 1, 3, 5, 7. Notice that a vector can only include one data type. Consider the following:

vec <- c(1, "hello", TRUE)
vec
## [1] "1"     "hello" "TRUE"

We created a variable vec whose first entry is a number, followed by a character string and then a Boolean. When we print vec, its elements are "1", "hello" and "TRUE": R has converted the number 1 into the string "1" and the Boolean TRUE into the string "TRUE", so that all entries share a single type.
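
This conversion is not specific to strings. As a small sketch, the code below mixes numbers with Booleans and checks the resulting type with typeof; the names mixed_num and mixed_chr are hypothetical and used only here.

```r
# Booleans mixed with numbers become numbers (TRUE -> 1, FALSE -> 0)
mixed_num <- c(1, TRUE, FALSE)
mixed_num
## [1] 1 1 0
typeof(mixed_num)
## [1] "double"

# As soon as a string is present, every entry becomes a string
mixed_chr <- c(1, "hello", TRUE)
typeof(mixed_chr)
## [1] "character"
```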

2.2.5 Matrices

Matrices are tables of elements organized in rows and columns. You can think of them as an arrangement of vectors into a table. As with vectors, all entries of a matrix must have the same data type. Matrices can be constructed in multiple ways. One way is to stack vectors row by row with the command rbind. Consider the following example.

row1 <- c(1,2,3)
row2 <- c(4,5,6)
row3 <- c(7,8,9)
mat <- rbind(row1,row2,row3)
mat
##      [,1] [,2] [,3]
## row1    1    2    3
## row2    4    5    6
## row3    7    8    9

So first we created the vectors row1 = (1,2,3), row2 = (4,5,6) and row3 = (7,8,9), and then combined them into the matrix mat.

The following code follows the same procedure but now organizes vectors by columns instead using the command cbind.

col1 <- c(1,2,3)
col2 <- c(4,5,6)
col3 <- c(7,8,9)
mat <- cbind(col1,col2,col3)
mat
##      col1 col2 col3
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Last, there is also a command called matrix to create a matrix. It takes a vector, defined using the command c, and stores its entries in a matrix with nrow rows and ncol columns. Consider the following example.

vec <- c(1,2,3,4,5,6,7,8,9)
mat <- matrix(vec, nrow = 3, ncol = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

So first we created a vector vec with the numbers from 1 to 9 and then stored them in a matrix with 3 rows and 3 columns. Numbers are stored by column: the first element of vec is in entry (1,1), the second element of vec is in entry (2,1), and so on.
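
The filling order can be checked by reading individual entries with square brackets, and it can be switched to row-by-row with the byrow argument of matrix. The following is a minimal sketch; the names vals, by_col and by_row are hypothetical and chosen only for this comparison.

```r
vals <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Default behaviour: entries are filled column by column
by_col <- matrix(vals, nrow = 3, ncol = 3)
by_col[2, 1]   # row 2, column 1
## [1] 2

# With byrow = TRUE entries are filled row by row instead
by_row <- matrix(vals, nrow = 3, ncol = 3, byrow = TRUE)
by_row[2, 1]   # row 2, column 1
## [1] 4
```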

2.2.6 Dataframes

Dataframes are very similar to matrices: they are tables organized in rows and columns. However, unlike matrices, they can have columns with different data types. They can be created with the command data.frame.

data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data
##   X1    X2     X3
## 1  1  TRUE   male
## 2  2 FALSE   male
## 3  3 FALSE female

The dataframe data includes three columns: the first column X1 of numbers, the second column X2 of Booleans and the third column X3 of characters. Dataframes are the objects most commonly used in real-world data analysis.
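
To confirm that each column keeps its own type, one option is to apply class to every column with sapply. The sketch below rebuilds the same table under the hypothetical name df. It passes stringsAsFactors = FALSE explicitly, since R versions before 4.0 would otherwise convert the character column into a factor.

```r
# Same table as above, stored under a hypothetical name
df <- data.frame(X1 = c(1, 2, 3), X2 = c(TRUE, FALSE, FALSE),
                 X3 = c("male", "male", "female"),
                 stringsAsFactors = FALSE)

# One class per column: "numeric", "logical" and "character"
col_classes <- sapply(df, class)
col_classes

# A single column can be extracted by name with $
df$X3
## [1] "male"   "male"   "female"
```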

2.2.7 NULL and NA

The expression NA is used in R to denote a missing value. Consider the following example.

vec <- c(3, NA, 5)
vec
## [1]  3 NA  5

Although the second element of vec is NA, R recognizes it as a missing value, so the elements 3 and 5 are still treated as numbers: indeed, they are not printed as "3" and "5".
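
The sketch below shows a few base functions that are commonly used with missing values: is.na locates them, and many summary functions such as sum accept an na.rm argument to ignore them. The name vals_na is hypothetical.

```r
vals_na <- c(3, NA, 5)

# The vector is still numeric
typeof(vals_na)
## [1] "double"

# Locate the missing entries
is.na(vals_na)
## [1] FALSE  TRUE FALSE

# A missing value makes the sum missing too, unless it is removed
sum(vals_na)
## [1] NA
sum(vals_na, na.rm = TRUE)
## [1] 8
```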

NULL is an additional data type with various uses. For instance, it represents a vector with no entries.

c()
## NULL
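
As a short sketch of how NULL differs from NA: NULL has length zero and disappears when combined into a vector, whereas NA is an element of length one that occupies a position. The name empty is hypothetical.

```r
empty <- c()

is.null(empty)
## [1] TRUE
length(empty)
## [1] 0

# NA counts as one (missing) element
length(NA)
## [1] 1

# NULL is dropped when combined with other values
length(c(1, NULL, 2))
## [1] 2
```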