2.3 Accessing and manipulating variables

Now that we have described the main objects we will work with in R, we can discuss how to access specific information.

2.3.1 Accessing a single element

Given a vector vec we can access its i-th entry with vec[i].

vec <- c(1,3,5)
vec[2]

## [1] 3

For a matrix or a dataframe we need to specify the associated row and column. If we have a matrix mat we can access the element in entry (i,j) with mat[i,j].

mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow =3)
mat[1,3]

## [1] 7
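The same [i,j] syntax applies to a dataframe. A minimal sketch, using a small dataframe df whose name and columns are hypothetical and chosen only for illustration:

```r
# Hypothetical dataframe used only for illustration
df <- data.frame(height = c(150, 160, 170), weight = c(55, 65, 75))

# Entry in the second row and first column
df[2, 1]
## [1] 160
```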

2.3.2 Accessing multiple entries

To access multiple entries we instead supply a vector containing the indices of the elements we want. Consider the following examples:

vec <- c(1,3,5)
vec[c(1,2)]

## [1] 1 3

The above code accesses the first two entries of the vector vec. To do this we had to define a vector using c(1,2) stating the entries we wanted to look at. For matrices consider:

mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow =3)
mat[c(1,2),c(2,3)]

##      [,1] [,2]
## [1,]    4    7
## [2,]    5    8

The syntax is very similar to before. We defined two index vectors, one for the rows and one for the columns. The two statements c(1,2) and c(2,3) are separated by a comma: the first selects the first and second rows, whilst the second selects the second and third columns.

If one wants to access full rows or full columns, the argument associated with rows or columns is left blank. Consider the following examples.

mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow =3)
mat[1,]

## [1] 1 4 7

mat[,c(1,2)]

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

The code mat[1,] selects the first full row of mat. The code mat[,c(1,2)] selects the first and second columns of mat. Notice that the comma must always be included!
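To see why the comma matters, the sketch below compares mat[1,] with mat[1]. Without the comma, R treats the matrix as one long vector filled column by column, so a single index picks out a single entry rather than a row.

```r
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)

# With the comma: the full first row
mat[1,]
## [1] 1 4 7

# Without the comma: only the first entry of the underlying vector
mat[1]
## [1] 1

# The fourth entry, counting down the columns, is the top of column 2
mat[4]
## [1] 4
```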

To access multiple entries it is often useful to define sequences of numbers quickly. The following command defines the sequence of integers from 1 to 9.

1:9

## [1] 1 2 3 4 5 6 7 8 9

More generally, one can define sequences of numbers using seq (see ?seq).
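As a brief illustration of seq, the sketch below uses its by and length.out arguments, and then uses a sequence as an index vector. The vector vec here is a made-up example.

```r
# From 1 to 9 in steps of 2
seq(1, 9, by = 2)
## [1] 1 3 5 7 9

# Five equally spaced numbers between 0 and 1
seq(0, 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# A sequence can be used directly as an index vector
vec <- c(10, 20, 30, 40, 50)
vec[2:4]
## [1] 20 30 40
```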

2.3.3 Accessing entries with logical operators

If we want to access elements of an object based on a condition it is often easier to use logical operators. This means comparing entries using the comparisons you would usually use in mathematical reasoning, for instance being equal to, or being larger than. The syntax is as follows:

== to check equality (notice the two equal signs)
!= to check non-equality
> greater than
>= greater than or equal to
< less than
<= less than or equal to

Let’s see some examples.

vec <- c(2,3,4,5,6)
vec > 4

## [1] FALSE FALSE FALSE  TRUE  TRUE

We constructed a vector vec and checked which entries were larger than 4. The output is a Boolean vector with the same number of entries as vec where only the last two entries are TRUE. Similarly,

vec <- c(2,3,4,5,6)
vec == 4

## [1] FALSE FALSE  TRUE FALSE FALSE

has a TRUE in the third entry only.

So if we are interested in returning the elements of vec that are larger than 4, we can use the code

vec <- c(2,3,4,5,6)
vec[vec > 4]

## [1] 5 6

So we have a vector with only elements 5 and 6.
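The same idea carries over to matrices, and the Boolean vector can also be turned into positions or counts. The sketch below is only illustrative; which and sum are standard R functions that the text above does not otherwise discuss.

```r
vec <- c(2,3,4,5,6)

# Positions (rather than values) of the entries larger than 4
which(vec > 4)
## [1] 4 5

# Number of entries larger than 4 (TRUE counts as 1, FALSE as 0)
sum(vec > 4)
## [1] 2

# Entries of a matrix larger than 5, returned as a vector
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)
mat[mat > 5]
## [1] 6 7 8 9
```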

2.3.4 Manipulating dataframes

We have seen in the previous section that dataframes are matrix-like objects whose columns can each hold a different data type. For this reason there are special ways to manipulate them and access their entries.

First, a specific column of a dataframe can be accessed using its name and the $ sign as follows.

data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data$X1

## [1] 1 2 3

data$X3

## [1] male   male   female
## Levels: female male

So using the name of the dataframe data followed by $ and then the name of the column, for instance X1, we access that specific column of the dataframe.
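For comparison, the square-bracket syntax from the earlier subsections also works on dataframe columns. A short sketch showing that the three forms below return the same column:

```r
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))

# By name with the $ sign
data$X1
## [1] 1 2 3

# By name inside square brackets (rows left blank)
data[, "X1"]
## [1] 1 2 3

# By column position
data[, 1]
## [1] 1 2 3
```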

Second, we can use the $ sign to add new columns to a dataframe. Consider the following code.

data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data$X4 <- c("yes","no","no")
data

##   X1    X2     X3  X4
## 1  1  TRUE   male yes
## 2  2 FALSE   male  no
## 3  3 FALSE female  no

data now includes a fourth column called X4 containing the vector c("yes","no","no").
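A new column can also be computed from existing ones. In the sketch below, the column names X1_doubled and X1_large are hypothetical and chosen only for illustration.

```r
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))

# Hypothetical new column: twice the value of X1
data$X1_doubled <- data$X1 * 2
data$X1_doubled
## [1] 2 4 6

# Hypothetical new column: TRUE where X1 is larger than 1
data$X1_large <- data$X1 > 1
data$X1_large
## [1] FALSE  TRUE  TRUE
```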

Third, we can select specific rows of a dataframe using the command subset. Consider the following example.

data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
subset(data, X1 <= 2)

##   X1    X2   X3
## 1  1  TRUE male
## 2  2 FALSE male

The above code returns the rows of data such that X1 is less than or equal to 2. More complex rules to subset a dataframe can be built by combining conditions with the and operator & and the or operator |. Let’s see an example.

data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
subset(data, X1 <= 2 & X2 == TRUE)

##   X1   X2   X3
## 1  1 TRUE male

So the above code selects the rows such that X1 is less than or equal to 2 and X2 is TRUE. This is the case only for the first row of data.
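The or operator | works in the same way. The sketch below also shows the same selection written with square brackets, where the columns have to be referred to as data$X1 and data$X2.

```r
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))

# Rows where X1 equals 3 or X2 is TRUE
subset(data, X1 == 3 | X2 == TRUE)
##   X1    X2     X3
## 1  1  TRUE   male
## 3  3 FALSE female

# The same rows selected with square brackets and a Boolean vector
data[data$X1 == 3 | data$X2 == TRUE, ]
```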

2.3.5 Information about objects

Here is a list of functions which are often useful to get information about objects in R.

length returns the number of entries in a vector.
dim returns the number of rows and columns of a matrix or a dataframe.
unique returns the unique elements of a vector or the unique rows of a matrix or a dataframe.
head returns the first entries of a vector or the first rows of a matrix or a dataframe.
order returns the indices that arrange a vector in ascending order; these indices can also be used to re-order the rows of a data.frame.

Let’s see some examples.

vec <- c(4,2,7,5,5)
length(vec)

## [1] 5

unique(vec)

## [1] 4 2 7 5

order(vec)

## [1] 2 1 4 5 3

length gives the number of elements of vec, unique returns the distinct values in vec (so 5 is not repeated), and order returns in entry i the position in vec of the i-th smallest element. So the first entry of order(vec) is 2 since the smallest element of vec, which is 2, sits in its second position.
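Because order returns positions, using its output as an index vector gives the sorted values. A minimal sketch, which also uses the standard function sort and the decreasing argument of order, neither of which is covered in the text above:

```r
vec <- c(4,2,7,5,5)

# Indexing vec by order(vec) returns its entries in ascending order
vec[order(vec)]
## [1] 2 4 5 5 7

# sort gives the same result directly
sort(vec)
## [1] 2 4 5 5 7

# Descending order
vec[order(vec, decreasing = TRUE)]
## [1] 7 5 5 4 2
```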

data <- data.frame(X1 = c(1,2,3,4), X2 = c(TRUE,FALSE,FALSE,FALSE),
                   X3 = c("male","male","female","female"))
dim(data)

## [1] 4 3

So dim tells us that data has four rows and three columns.
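The functions head and unique from the list above can be tried on the same dataframe. A short sketch; the second argument of head, which sets how many rows to show, is the only detail not mentioned in the text.

```r
data <- data.frame(X1 = c(1,2,3,4), X2 = c(TRUE,FALSE,FALSE,FALSE),
                   X3 = c("male","male","female","female"))

# First two rows only
head(data, 2)
##   X1    X2   X3
## 1  1  TRUE male
## 2  2 FALSE male

# Distinct values in the column X2
unique(data$X2)
## [1]  TRUE FALSE
```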