2.3 Accessing and manipulating variables
Now that we have described the main objects we will work with in R, we can discuss how to access specific information.
2.3.1 Accessing a single element
Given a vector vec we can access its i-th entry with vec[i].
vec <- c(1,3,5)
vec[2]
## [1] 3
For a matrix or a dataframe we need to specify the associated row and column. If we have a matrix mat we can access the element in entry (i,j) with mat[i,j].
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)
mat[1,3]
## [1] 7
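The value 7 may look surprising at first. As a brief aside, the following sketch illustrates that matrix fills its entries column by column by default, which is why position (1,3) holds 7. The object mat_byrow is introduced here only for comparison and is not used elsewhere.

```r
# matrix() fills the entries column by column by default
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol = 3, nrow = 3)
mat
mat[1, 3]        # first row, third column

# For comparison only: fill the same numbers row by row
mat_byrow <- matrix(c(1,2,3,4,5,6,7,8,9), ncol = 3, nrow = 3, byrow = TRUE)
mat_byrow[1, 3]  # the entry in the same position is now different
```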
2.3.2 Accessing multiple entries
To access multiple entries we instead define a vector of indexes of the elements we want to access. Consider the following examples:
vec <- c(1,3,5)
vec[c(1,2)]
## [1] 1 3
The above code accesses the first two entries of the vector vec. To do this we had to define a vector using c(1,2) stating the entries we wanted to look at. For matrices consider:
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)
mat[c(1,2),c(2,3)]
## [,1] [,2]
## [1,] 4 7
## [2,] 5 8
The syntax is very similar to before. We defined two index vectors, one for the rows and one for the columns. The two statements c(1,2) and c(2,3) are separated by a comma to denote that the first selects the first and second row, whilst the second selects the second and third column.
If one wants to access full rows or full columns, the argument associated with rows or columns is left blank. Consider the following examples.
mat <- matrix(c(1,2,3,4,5,6,7,8,9), ncol=3, nrow=3)
mat[1,]
## [1] 1 4 7
mat[,c(1,2)]
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
The code mat[1,] selects the first full row of mat. The code mat[,c(1,2)] selects the first and second column of mat. Notice that the comma must always be included!
To access multiple entries it is often useful to define sequences of numbers quickly. The following command defines the sequence of integers from 1 to 9.
1:9
## [1] 1 2 3 4 5 6 7 8 9
More generally, one can define sequences of numbers using seq (see ?seq).
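As a small illustration of seq, the sketch below builds two sequences and then uses a sequence as an index vector. The names odd and grid, and the example vector of tens, are made up for this illustration.

```r
# Sequence from 1 to 9 in steps of 2
odd <- seq(1, 9, by = 2)
odd

# Five equally spaced numbers between 0 and 1
grid <- seq(0, 1, length.out = 5)
grid

# A sequence can be used directly as a vector of indexes
vec <- c(10, 20, 30, 40, 50)
vec[2:4]
```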
2.3.3 Accessing entries with logical operators
If we want to access elements of an object based on a condition it is often easier to use logical operators. This means comparing entries using the comparisons you would usually use in mathematical reasoning, for instance being equal to, or being larger than. The syntax is as follows:
== to check equality (notice the two equal signs)
!= to check non-equality
> bigger than
>= bigger than or equal to
< less than
<= less than or equal to
Let’s see some examples.
vec <- c(2,3,4,5,6)
vec > 4
## [1] FALSE FALSE FALSE TRUE TRUE
We constructed a vector vec and checked which entries were larger than 4. The output is a Boolean vector with the same number of entries as vec where only the last two entries are TRUE. Similarly,
vec <- c(2,3,4,5,6)
vec == 4
## [1] FALSE FALSE TRUE FALSE FALSE
has a TRUE in the third entry only.
So if we are interested in returning the elements of vec that are larger than 4 we could use the code
vec <- c(2,3,4,5,6)
vec[vec > 4]
## [1] 5 6
So we have a vector with only elements 5 and 6.
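The same pattern works with the other comparison operators. The sketch below is only an illustration; it also uses which, a base R function that returns the positions of the TRUE entries rather than the values.

```r
vec <- c(2,3,4,5,6)

# Entries different from 4
not_four <- vec[vec != 4]
not_four

# Entries less than or equal to 3
small <- vec[vec <= 3]
small

# Positions (not values) of the entries larger than 4
positions <- which(vec > 4)
positions
```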
2.3.4 Manipulating dataframes
We have seen in the previous section that dataframes are like matrices whose columns can each hold a different data type. For this reason there are special ways to access and manipulate their entries.
First, specific columns of a dataframe can be accessed using its name and the $ sign as follows.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data$X1
## [1] 1 2 3
data$X3
## [1] male male female
## Levels: female male
So using the name of the dataframe data followed by $ and then the name of the column, for instance X1, we access that specific column of the dataframe.
Second, we can use the $ sign to add new columns to a dataframe. Consider the following code.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
data$X4 <- c("yes","no","no")
data
## X1 X2 X3 X4
## 1 1 TRUE male yes
## 2 2 FALSE male no
## 3 3 FALSE female no
data now includes a fourth column called X4 equal to the vector c("yes","no","no").
Third, we can select specific rows of a dataframe using the command subset. Consider the following example.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
subset(data, X1 <= 2)
## X1 X2 X3
## 1 1 TRUE male
## 2 2 FALSE male
The above code returns the rows of data such that X1 is less than or equal to 2. More complex rules to subset a dataframe can be built by combining conditions with the and operator & and the or operator |. Let’s see an example.
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))
subset(data, X1 <= 2 & X2 == TRUE)
## X1 X2 X3
## 1 1 TRUE male
So the above code selects the rows such that X1 is less than or equal to 2 and X2 is TRUE. This is the case only for the first row of data.
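The or operator | works in the same way. As a minimal sketch, the code below keeps the rows where at least one of two conditions holds; the name either is chosen only for this illustration.

```r
data <- data.frame(X1 = c(1,2,3), X2 = c(TRUE,FALSE,FALSE),
                   X3 = c("male","male","female"))

# Rows where X1 equals 1 or X3 is "female" (at least one condition holds)
either <- subset(data, X1 == 1 | X3 == "female")
either
```

Here the first row is kept because X1 equals 1, and the third row is kept because X3 is "female".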
2.3.5 Information about objects
Here is a list of functions which are often useful to get information about objects in R.
length returns the number of entries in a vector.
dim returns the number of rows and columns of a matrix or a dataframe.
unique returns the unique elements of a vector or the unique rows of a matrix or a dataframe.
head returns the first entries of a vector or the first rows of a matrix or a dataframe.
order returns the positions of the entries of a vector sorted in ascending order, which can be used to re-order a vector or a dataframe.
Let’s see some examples.
vec <- c(4,2,7,5,5)
length(vec)
## [1] 5
unique(vec)
## [1] 4 2 7 5
order(vec)
## [1] 2 1 4 5 3
length gives the number of elements of vec, unique returns the different values in vec (so 5 is not repeated), and order returns in entry i the position in vec of the i-th smallest entry. So the first entry of order(vec) is 2 since the smallest entry of vec, which is 2, is in the second position.
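A common use of order is to sort a vector: indexing vec with order(vec) returns its entries in ascending order. The sketch below illustrates this and compares the result with the base R function sort; the name idx is chosen only for this illustration.

```r
vec <- c(4,2,7,5,5)

# Positions of the entries of vec, from smallest to largest
idx <- order(vec)
idx

# Using these positions as indexes sorts the vector
sorted <- vec[idx]
sorted

# sort() gives the same result directly
sort(vec)
```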
data <- data.frame(X1 = c(1,2,3,4), X2 = c(TRUE,FALSE,FALSE,FALSE),
                   X3 = c("male","male","female","female"))
dim(data)
## [1] 4 3
So dim tells us that data has four rows and three columns.
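The functions head and unique from the list above can be applied to the same dataframe. The following is a minimal sketch; the names first_rows and values are chosen only for this illustration.

```r
data <- data.frame(X1 = c(1,2,3,4), X2 = c(TRUE,FALSE,FALSE,FALSE),
                   X3 = c("male","male","female","female"))

# First two rows of the dataframe
first_rows <- head(data, 2)
first_rows

# Different values appearing in column X3
values <- unique(data$X3)
values
```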