3.3 Linear data structures
Atomic vectors and lists are both linear data structures insofar as they have only one dimension (1D) and are characterized by their length. However, atomic vectors and lists differ in the type(s) of data that they can accommodate:
- in atomic vectors, all elements are of the same data type;
- in lists, elements can have different data types.
3.3.1 Atomic vectors
In Chapter 2, we discussed vectors as the primary data structure in R (see Section 2.3). To repeat, a vector is an ordered sequence of elements with three common properties:
- its type of elements (tested by typeof());
- its length (tested by length());
- optional attributes or meta-data (tested by attributes()).
As a straightforward way of representing some facts about five people, we could store their name, gender, and age as vectors:
name <- c("Adam", "Ben", "Cecily", "David", "Evelyn")
gender <- c("male", "male", "female", "male", "misc")
age <- c(21, 19, 20, 48, 45)
Here, name and gender are character vectors, whereas age is a numeric vector.
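To connect these vectors to the three properties listed above, we can inspect them with typeof(), length(), and attributes(). A minimal sketch (two of the vectors are re-created so that the snippet runs on its own):

```r
# re-create two of the vectors from above:
name <- c("Adam", "Ben", "Cecily", "David", "Evelyn")
age  <- c(21, 19, 20, 48, 45)

typeof(name)     # "character"
typeof(age)      # "double" (R stores numbers as doubles by default)
length(age)      # 5
attributes(age)  # NULL (no attributes or meta-data yet)
```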
The following expressions create a logical vector that indicates which values of age are at least 21 and use it to select the corresponding names:
age >= 21
#> [1] TRUE FALSE FALSE TRUE TRUE
name[age >= 21]
#> [1] "Adam" "David" "Evelyn"
If any of these terms are unclear, consider revisiting the sections on vectors (Section 2.3) and logical indexing (Section 2.3.2) in the introductory Chapter 2.
The vast majority of vectors we encounter are atomic vectors (i.e., all elements are of the same type), but lists are often used in R for storing a variety of data types in a common object (e.g., the results of statistical analyses). It is important to understand that the term “atomic” in “atomic vectors” refers to the type of the vector's elements, rather than to its shape or length: Atomic vectors can contain any number of elements (i.e., can have any length), but all of these elements must be of the same type.
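Because an atomic vector can only hold a single data type, combining elements of different types with c() does not produce a mixed vector: R silently coerces all elements to a common type. A short illustration (x and y are hypothetical example objects):

```r
# combining a number, a character string, and a logical value:
x <- c(1, "B", TRUE)
x          # "1" "B" "TRUE" (all elements coerced to character)
typeof(x)  # "character"

# a single number is an atomic vector of length 1:
y <- 42
is.atomic(y)  # TRUE
length(y)     # 1
```

Coercion moves from logical via numeric to character, so a single character element turns the entire vector into a character vector.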
In the context of data structures, we extend the basic data structure of atomic vectors in several ways:
- Lists are complex/hierarchical vectors that accept multiple data types.
- Matrices are atomic vectors with additional shape attributes.
- Rectangular tables (data frames and tibbles) are lists of atomic vectors.
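The first and last of these extensions can be checked directly: a matrix is still an atomic vector, but carries a dim attribute, whereas a data frame is internally a list whose elements are equal-length atomic vectors. A minimal sketch (m and df are hypothetical examples; tibbles are omitted because they require the tibble package):

```r
# a matrix: an atomic vector with a "dim" attribute
m <- matrix(1:6, nrow = 2)
is.atomic(m)    # TRUE
attributes(m)   # $dim: 2 3 (2 rows, 3 columns)

# a data frame: a list of equal-length atomic vectors (its columns)
df <- data.frame(x = 1:2, y = c("a", "b"))
is.list(df)     # TRUE
typeof(df)      # "list"
typeof(df$x)    # "integer" (each column is an atomic vector)
```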
3.3.2 Lists
Beyond atomic vectors, R provides lists as yet another data structure to store linear sequences of elements.
Internally, lists in R are actually vectors (see Wickham, 2014a, for details).
However, rather than being atomic vectors, lists are complex vectors (a.k.a. hierarchical or recursive vectors) that can contain elements of multiple data types/modes and various shapes.
Thus, the key feature of lists is that their elements can contain objects of different data types.
Defining lists
Lists are sequential data structures in which each element can have an internal structure.
Lists are similar to atomic vectors in having a linear shape that is characterized by their length().
Crucially, different elements of a list can be of different data types (or modes).
As every element of a list can also be a complex (rather than an elementary) object, lists are also described as “hierarchical” data structures.
We can create a list by applying the list() function to a sequence of elements:
# lists can contain a mix of data shapes:
l_1 <- list(1, 2, 3) # 3 elements (all numeric scalars)
l_1
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
l_2 <- list(1, c(2, 3)) # 2 elements (of different lengths)
l_2
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
The objects l_1 and l_2 are both lists and contain the same three numeric values, but differ in the shape of their representation:
- l_1 is a list with three elements, each of which is a scalar (i.e., a vector of length 1).
- l_2 is a list with two elements: the first is a scalar, but the second is a numeric vector of length 2.
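One way to confirm that l_1 and l_2 hold the same values in different shapes is to compare their lengths and then flatten both lists with unlist(), which returns an atomic vector of all their values. A minimal sketch:

```r
l_1 <- list(1, 2, 3)
l_2 <- list(1, c(2, 3))

length(l_1)   # 3 (three top-level elements)
length(l_2)   # 2 (two top-level elements)

# flattening removes the difference in shape:
unlist(l_1)                           # 1 2 3
identical(unlist(l_1), unlist(l_2))   # TRUE
```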
Technically, a list is implemented in R as a vector of the mode “list”:
vl <- vector(length = 2, mode = "list")
is.vector(vl)
#> [1] TRUE
is.list(vl)
#> [1] TRUE
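A list created with vector() is not empty: its elements are NULL placeholders that can be filled later. A minimal sketch that inspects vl with mode(), typeof(), and length():

```r
vl <- vector(length = 2, mode = "list")
mode(vl)                          # "list"
typeof(vl)                        # "list"
length(vl)                        # 2
identical(vl, list(NULL, NULL))   # TRUE (two NULL placeholders)
```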
The fact that lists are also implemented as vectors (albeit hierarchical or recursive vectors) justifies the statement that vectors are the fundamental data structure in R.
Due to their hierarchical nature, lists are more flexible, but also more complex than the other data structures we have encountered so far. Unlike atomic vectors (i.e., vectors that only contain one type of data), lists can contain a mix of data shapes and data types. A simple example that combines multiple data types (here: “numeric”/“double” and “character”) is the following:
# lists can contain multiple data types:
(l_3 <- list(1, "B", 3))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3
The ability to store a mix of data shapes and types in a list allows creating complex representations. The following list contains both a mix of data types and shapes:
# lists can contain a mix of data types and shapes:
(l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE)))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3 4
#>
#> [[4]]
#> [1] TRUE TRUE FALSE
As lists can contain other lists, they can be used to construct arbitrarily complex data structures (like tables or tree-like hierarchies):
# lists can contain other lists:
l_5 <- list(l_1, l_2, l_3, l_4)
Finally, the elements of lists can be named.
As with vectors, the names() function is used both to retrieve and to assign names:
# names of list elements:
names(l_5) # check names => no names yet:
#> NULL
names(l_5) <- c("uno", "two", "trois", "four") # assign names:
names(l_5) # check names:
#> [1] "uno" "two" "trois" "four"
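Names can also be assigned when a list is created, by supplying name = value pairs to list(). A minimal sketch (the list person and its element names are hypothetical):

```r
# a named list describing one (hypothetical) person:
person <- list(name = "Adam", age = 21, hobbies = c("chess", "jazz"))
names(person)    # "name" "age" "hobbies"
length(person)   # 3
```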
As with atomic vectors, two basic questions for working with lists are:
- How can we inspect a list?
- How can we access the elements of a list?
We will briefly introduce some functions for addressing these tasks.
Inspecting lists
The is.list() function allows checking whether some R object is a list:
is.list(l_3) # a list
#> [1] TRUE
is.list(1:3) # a vector
#> [1] FALSE
is.list("A") # a scalar
#> [1] FALSE
Whereas atomic vectors are not lists, lists are also vectors (as lists are hierarchical vectors):
is.vector(l_3)
#> [1] TRUE
As lists are linear data structures, we can always inspect their length:
length(l_3)
#> [1] 3
length(l_4)
#> [1] 4
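Note that length() only counts the top-level elements of a list. To get the length of each element, base R provides lengths(). A short sketch with l_4:

```r
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))
length(l_4)         # 4 (top-level elements)
lengths(l_4)        # 1 1 2 3 (length of each element)
sum(lengths(l_4))   # 7 (values stored in total)
```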
As the hierarchical nature of lists makes them objects with a potentially interesting structure, a useful function for inspecting lists is str():
str(l_3)
#> List of 3
#> $ : num 1
#> $ : chr "B"
#> $ : num 3
str(l_4)
#> List of 4
#> $ : num 1
#> $ : chr "B"
#> $ : num [1:2] 3 4
#> $ : logi [1:3] TRUE TRUE FALSE
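For nested lists such as l_5 (a list of lists), the output of str() quickly gets long. Its max.level argument limits how many levels of the hierarchy are displayed. A minimal sketch:

```r
l_1 <- list(1, 2, 3)
l_2 <- list(1, c(2, 3))
l_3 <- list(1, "B", 3)
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))
l_5 <- list(l_1, l_2, l_3, l_4)

str(l_5, max.level = 1)  # top level only: one summary line per sub-list
str(l_5)                 # all levels (much longer output)
```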
Lists are powerful structures for representing data of various types and shapes, but they can easily become complicated. In practice, we will rarely need lists, as vectors and tables are typically sufficient for our purposes. However, as we will occasionally encounter lists (e.g., as the output of statistical functions), it is good to be aware of them and to know how to access their elements.
Accessing list elements
As lists are implemented as vectors, accessing list elements is similar to indexing vectors, but needs to account for an additional layer of complexity. This is achieved by distinguishing between single square brackets (i.e., []) and double square brackets ([[]]):
- x[i] returns the i-th sub-list of a list x (as a list);
- x[[i]] removes a level of the hierarchy and returns the i-th element of a list x (as an object).
The distinction between single and double square brackets is important when working with lists: [] always returns a smaller (sub-)list, whereas [[]] removes a hierarchy level to return list elements. Thus, what [] achieves with atomic vectors (i.e., returning an element itself) is achieved by [[]] with lists. An example illustrates the difference:
# a list
l_4
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3 4
#>
#> [[4]]
#> [1] TRUE TRUE FALSE
l_4[3] # get 3rd sub-list (a list with 1 element)
#> [[1]]
#> [1] 3 4
l_4[[3]] # get 3rd list element (a vector)
#> [1] 3 4
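The difference matters as soon as we want to compute with an element: single brackets keep the list wrapper, whereas double brackets return the element itself. Single brackets can also select several elements at once, whereas double brackets select exactly one. A minimal sketch:

```r
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))

class(l_4[3])     # "list"    (a sub-list with 1 element)
class(l_4[[3]])   # "numeric" (the element itself)

sum(l_4[[3]])     # 7
# sum(l_4[3]) would fail, as sum() does not accept a list

l_4[c(1, 3)]      # a sub-list with 2 elements
```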
For named lists, there is another way of accessing list elements that is similar to accessing the named variables (columns) of a data frame:
- x$n selects the list element with the name n (like [[]]).
# l_5 # a list with named elements
l_5$two # get element with the name "two"
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
Importantly, [[]] and $n both return list elements, which can be of various data types and shapes. In the case of l_5, the 2nd element named “two” happens to be a list:
identical(l_5$two, l_5[[2]])
#> [1] TRUE
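Two practical differences between $ and [[]] are worth knowing: [[]] also accepts a variable that stores an element's name, whereas $ takes the name literally; and both selectors can be chained to reach deeper levels of a list. A minimal sketch (here, names are assigned directly in list(), which yields the same names as using names() above; nm is a hypothetical variable):

```r
l_1 <- list(1, 2, 3)
l_2 <- list(1, c(2, 3))
l_3 <- list(1, "B", 3)
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))
l_5 <- list(uno = l_1, two = l_2, trois = l_3, four = l_4)

nm <- "two"
identical(l_5[[nm]], l_5$two)  # TRUE: [[]] evaluates the variable nm
is.null(l_5$nm)                # TRUE: $ looks for an element named "nm"

# chaining selectors to reach nested elements:
l_5$two[[2]]                   # 2 3
l_5[["four"]][[4]]             # TRUE TRUE FALSE
```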
For additional details on lists, as well as helpful analogies and visualizations, see Section 20.5 “Recursive vectors (lists)” of r4ds (Wickham & Grolemund, 2017).
Using lists or vectors?
Due to their flexibility, any data structure used above can be re-represented as a list.
For instance, the name or age vectors could easily be transformed into lists with 5 elements:
as.list(name)
as.list(age)
However, this re-representation would only add complexity without a clear benefit. As long as we just want to store elements of a single data type, using an atomic vector is not only sufficient, but simpler and usually better than using a list.
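The added complexity shows as soon as we want to compute with the values, because many functions, such as mean(), expect an atomic vector. A minimal sketch of converting age to a list and back with unlist() (age_list is a hypothetical name):

```r
age <- c(21, 19, 20, 48, 45)

age_list <- as.list(age)   # a list of 5 numeric scalars
length(age_list)           # 5
is.list(age_list)          # TRUE

# mean(age_list) would return NA (with a warning), so we flatten first:
mean(unlist(age_list))             # 30.6
identical(unlist(age_list), age)   # TRUE
```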
What could justify using a list? Like vectors, lists store linear sequences of elements. But lists are only needed when storing heterogeneous data, i.e., data of different types (e.g., numbers and text) or sequences of elements of different shapes (e.g., both scalars and vectors). For instance, statistical functions often use lists to return a lot of information about an analysis in one complex object.
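To see this in practice, we can check that the objects returned by two common statistical functions are lists. A minimal sketch, using R's built-in mtcars data set purely for illustration (fit and tt are hypothetical names):

```r
# fit a linear model to R's built-in mtcars data:
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)        # TRUE
names(fit)          # names of its components (e.g., "coefficients")
fit$coefficients    # intercept and slope

# the result of a t-test is a list, too:
tt <- t.test(mtcars$mpg, mu = 20)
is.list(tt)         # TRUE
tt$p.value          # a single number
```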
From linear data structures (like atomic vectors and lists), it is only a small step to study rectangular data structures known as matrices and tables.