3.3 Linear data structures

Atomic vectors and lists are both linear data structures insofar as they only have one dimension (1D) and are characterized by their length. However, atomic vectors and lists differ in the type(s) of data that they can accommodate:

  • in atomic vectors, all elements are of the same data type

  • in lists, elements can have different data types

3.3.1 Atomic vectors

In Chapter 2, we discussed vectors as the primary data structure in R (see Section 2.3). To repeat, a vector is an ordered sequence of elements with three common properties:

  1. its type of elements (tested by typeof());
  2. its length (tested by length());
  3. optional attributes or meta-data (tested by attributes()).
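As a minimal sketch of these three properties, the following lines inspect a small named vector (the object v and its element names are made up for illustration):

```r
# hypothetical example vector (not part of the data used below):
v <- c(a = 1.5, b = 2, c = 3.5)

typeof(v)      # 1. type of its elements
#> [1] "double"
length(v)      # 2. number of elements
#> [1] 3
attributes(v)  # 3. meta-data (here: the element names)
#> $names
#> [1] "a" "b" "c"
```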

As a straightforward way of representing some facts about five people, we could store their name, gender, and age as vectors:

name   <- c("Adam", "Ben", "Cecily", "David", "Evelyn")
gender <- c("male", "male", "female", "male", "misc")
age    <- c(21, 19, 20, 48, 45)

Here, name and gender are character vectors, whereas age is a numeric vector. The following expressions create a logical vector that indicates which values of age are at least 21 and then use it to select the corresponding names:

age >= 21
#> [1]  TRUE FALSE FALSE  TRUE  TRUE
name[age >= 21] 
#> [1] "Adam"   "David"  "Evelyn"

If any of these terms are unclear, consider revisiting the introductory Chapter 2 on vectors (Section 2.3) and logical indexing (Section 2.3.2).

The vast majority of vectors we encounter are atomic vectors (i.e., all elements are of the same type), but lists are often used in R for storing a variety of data types in a common object (e.g., in statistical analyses). It is important to understand that the term “atomic” in “atomic vectors” refers to the type of the vector, rather than its shape or length: An atomic vector can contain any number of elements (i.e., can have any length), but all of its elements must be of the same type.
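A short sketch may illustrate what “not multiple types” means in practice. Combining elements of different types with c() does not fail, but coerces all elements to a single common type (the objects x and y are hypothetical examples):

```r
# mixing numbers, text, and logical values yields a character vector:
x <- c(1, "B", TRUE)
x
#> [1] "1"    "B"    "TRUE"
typeof(x)
#> [1] "character"

# mixing numbers and logical values yields a numeric vector:
y <- c(1, TRUE, FALSE)
y
#> [1] 1 1 0
typeof(y)
#> [1] "double"
```

If the original types need to be preserved, a list is the appropriate data structure.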

In the context of data structures, we extend the basic data structure of atomic vectors in several ways:

  1. Lists are complex/hierarchical vectors that accept multiple data types

  2. Matrices are atomic vectors with additional shape attributes

  3. Rectangular tables (data frames and tibbles) are lists of atomic vectors
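The second and third extension can be previewed with a brief sketch in base R (the objects m and df are made-up examples, and tibbles are omitted as they require an additional package):

```r
# a matrix is an atomic vector with a dim attribute:
m <- 1:6
dim(m) <- c(2, 3)  # 2 rows, 3 columns
is.matrix(m)
#> [1] TRUE
attributes(m)
#> $dim
#> [1] 2 3

# a data frame is a list of equal-length vectors (its columns):
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
is.list(df)
#> [1] TRUE
typeof(df)
#> [1] "list"
length(df)         # number of columns
#> [1] 2
```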

3.3.2 Lists

Beyond atomic vectors, R provides lists as another data structure for storing linear sequences of elements. Internally, lists in R actually are vectors (see Wickham, 2014a, for details). However, unlike atomic vectors, lists are complex vectors (a.k.a. hierarchical or recursive vectors) that can contain elements of multiple data types/modes and various shapes. Thus, the key feature of lists is that their elements can be of different data types.

Defining lists

Lists are sequential data structures in which each element can have an internal structure. Thus, lists are similar to atomic vectors (e.g., in having a linear shape that is characterized by its length()). Crucially, different elements of a list can be of different data types (or modes). As every element of a list can also be a complex (rather than an elementary) object, lists are also described as “hierarchical” data structures.

We can create a list by applying the list() function to a sequence of elements:

# lists can contain a mix of data shapes:
l_1 <- list(1, 2, 3)     # 3 elements (all numeric scalars)
l_1
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
l_2 <- list(1, c(2, 3))  # 2 elements (of different lengths)
l_2
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2 3

The objects l_1 and l_2 are both lists and contain the same three numbers, but differ in the shape of their representation:

  • l_1 is a list with three elements, each of which is a scalar (i.e., vector of length 1).
  • l_2 is a list with two elements. The first is a scalar, but the second is a (numeric) vector (of length 2).

Technically, a list is implemented in R as a vector of the mode “list”:

vl <- vector(length = 2, mode = "list")

is.vector(vl)
#> [1] TRUE
is.list(vl)
#> [1] TRUE

The fact that lists are also implemented as vectors (albeit hierarchical or recursive vectors) justifies the statement that vectors are the fundamental data structure in R.

Due to their hierarchical nature, lists are more flexible, but also more complex, than the other data structures we have encountered so far. Unlike atomic vectors (i.e., vectors that only contain one type of data), lists can contain a mix of data shapes and data types. A simple example that combines multiple data types (here: “numeric”/“double” and “character”) is the following:

# lists can contain multiple data types:
(l_3 <- list(1, "B", 3))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "B"
#> 
#> [[3]]
#> [1] 3

The ability to store a mix of data shapes and types in a list allows creating complex representations. The following list contains a mix of both data types and shapes:

# lists can contain a mix of data types and shapes:
(l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE)))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "B"
#> 
#> [[3]]
#> [1] 3 4
#> 
#> [[4]]
#> [1]  TRUE  TRUE FALSE

As lists can contain other lists, they can be used to construct arbitrarily complex data structures (like tables or tree-like hierarchies):

# lists can contain other lists: 
l_5 <- list(l_1, l_2, l_3, l_4)
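As a minimal sketch of such nesting, the following (hypothetical) list contains a list that itself contains another list. Note that length() only counts the elements on the top level:

```r
# a made-up list with three levels of nesting:
nested <- list(1, list(2, list(3, "x")))

length(nested)           # number of top-level elements
#> [1] 2
sapply(nested, is.list)  # which top-level elements are lists?
#> [1] FALSE  TRUE
```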

Finally, the elements of lists can be named. As with vectors, the names() function is used to both retrieve and assign names:

# names of list elements:
names(l_5)  # check names => no names yet: 
#> NULL
names(l_5) <- c("uno", "two", "trois", "four")  # assign names:

names(l_5)  # check names:
#> [1] "uno"   "two"   "trois" "four"
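Alternatively, names can be assigned when creating a list. The following sketch assumes a small example list l_n (with made-up element names) and also changes a single name afterwards:

```r
# assign names when creating a list:
l_n <- list(num = 1, chr = "B", vec = c(3, 4))
names(l_n)
#> [1] "num" "chr" "vec"

# change the name of the 2nd element:
names(l_n)[2] <- "txt"
names(l_n)
#> [1] "num" "txt" "vec"
```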

As with atomic vectors, two basic questions for working with lists are:

  • How can we inspect a list?
  • How can we access the elements of a list?

We will briefly introduce some functions for addressing these tasks.

Inspecting lists

The is.list() function allows checking whether some R object is a list:

is.list(l_3)  # a list
#> [1] TRUE
is.list(1:3)  # a vector
#> [1] FALSE
is.list("A")  # a scalar
#> [1] FALSE

Whereas atomic vectors are not lists, lists are also vectors (as lists are hierarchical vectors):

is.vector(l_3)
#> [1] TRUE
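To distinguish between both kinds of vectors, base R also provides the functions is.atomic() and is.recursive(). A brief sketch (with the example objects v and l defined here for illustration):

```r
v <- c(1, 2, 3)       # an atomic vector
l <- list(1, "B", 3)  # a list

is.atomic(v)
#> [1] TRUE
is.atomic(l)
#> [1] FALSE
is.recursive(v)
#> [1] FALSE
is.recursive(l)
#> [1] TRUE
typeof(l)
#> [1] "list"
```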

As lists are linear data structures, we can always inspect their length:

length(l_3)
#> [1] 3
length(l_4)
#> [1] 4

As the hierarchical nature of lists makes them objects with a potentially interesting structure, a useful function for inspecting lists is str():

str(l_3)
#> List of 3
#>  $ : num 1
#>  $ : chr "B"
#>  $ : num 3
str(l_4)
#> List of 4
#>  $ : num 1
#>  $ : chr "B"
#>  $ : num [1:2] 3 4
#>  $ : logi [1:3] TRUE TRUE FALSE
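When only the shape or the type of each element is of interest, two further base R functions may help: lengths() returns the length of each list element, and sapply() can apply a function like typeof() to each element. A sketch, re-creating l_4 from above so that the lines run on their own:

```r
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))

lengths(l_4)         # length of each element
#> [1] 1 1 2 3
sapply(l_4, typeof)  # type of each element
#> [1] "double"    "character" "double"    "logical"
```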

Lists are powerful structures for representing data of various types and shapes, but can easily get complicated. In practice, we will rarely need lists, as vectors and tables are typically sufficient for our purposes. However, as we will occasionally encounter lists (e.g., as the output of statistical functions), it is good to be aware of them and know how to access their elements.

Accessing list elements

As lists are implemented as vectors, accessing list elements is similar to indexing vectors, but needs to account for an additional layer of complexity. This is achieved by distinguishing between single square brackets (i.e., []) and double square brackets ([[]]):

  • x[i] returns the i-th sub-list of a list x (as a list);

  • x[[i]] removes a level of the hierarchy and returns the i-th element of a list x (as an object).

The distinction between single and double square brackets is important when working with lists:

  • [] always returns a smaller (sub-)list, whereas
  • [[]] removes a hierarchy level to return list elements.

Thus, what is achieved by [] with vectors is achieved by [[]] with lists. An example illustrates the difference:

l_4       # a list
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "B"
#> 
#> [[3]]
#> [1] 3 4
#> 
#> [[4]]
#> [1]  TRUE  TRUE FALSE
l_4[3]    # get 3rd sub-list (a list with 1 element)
#> [[1]]
#> [1] 3 4
l_4[[3]]  # get 3rd list element (a vector)
#> [1] 3 4
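The difference can also be verified by checking the results, and both types of brackets can be combined. The following sketch re-creates l_4 so that it runs on its own:

```r
l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE))

is.list(l_4[3])       # [] returns a list
#> [1] TRUE
is.list(l_4[[3]])     # [[]] returns the element (here: a numeric vector)
#> [1] FALSE

l_4[[3]][2]           # 2nd element of the 3rd list element
#> [1] 4
length(l_4[c(1, 3)])  # [] can select multiple elements (as a sub-list)
#> [1] 2
```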

For named lists, there is another way of accessing list elements that is similar to accessing the named variables (columns) of a data frame:

  • x$n selects a list element (like [[]]) with the name n.

# l_5      # a list with named elements
l_5$two    # get element with the name "two"
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2 3

Importantly, both [[]] and $ return list elements, which can be of various data types and shapes. In the case of l_5, the 2nd element (named “two”) happens to be a list:

identical(l_5$two, l_5[[2]])
#> [1] TRUE
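Names can also be used within [[]], and the different ways of accessing elements can be chained to reach deeper levels of a nested list. A minimal sketch, using a smaller (hypothetical) list l_x rather than l_5:

```r
# a made-up named list whose elements are lists:
l_x <- list(uno = list(1, 2, 3),
            two = list(1, c(2, 3)))

identical(l_x[["two"]], l_x$two)  # name in [[]] vs. $
#> [1] TRUE
l_x$two[[2]]     # 2nd element of the element "two"
#> [1] 2 3
l_x$two[[2]][1]  # 1st value of that vector
#> [1] 2
l_x$three        # a name that does not exist returns NULL
#> NULL
```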

For additional details on lists, as well as helpful analogies and visualizations, see Section 20.5 Recursive vectors (lists) of r4ds (Wickham & Grolemund, 2017).

Using lists or vectors?

Due to the flexibility of lists, any data structure used above can be re-represented as a list. For instance, the name or age vectors could easily be transformed into lists with 5 elements:

as.list(name)
as.list(age)
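The following sketch illustrates this transformation and its reversal by unlist() (re-creating the age vector from above, with age_l as a made-up name for its list version):

```r
age <- c(21, 19, 20, 48, 45)

age_l <- as.list(age)  # vector => list
is.list(age_l)
#> [1] TRUE
length(age_l)
#> [1] 5

mean(age)              # numeric functions work directly on the vector
#> [1] 30.6
mean(unlist(age_l))    # the list must first be flattened into a vector
#> [1] 30.6
identical(unlist(age_l), age)
#> [1] TRUE
```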

However, this re-representation would only add complexity without a clear benefit. As long as we just want to store elements of a single data type, using an atomic vector is not only sufficient, but simpler and usually better than using a list.

What could justify using a list? Like vectors, lists store linear sequences of elements. But lists are only needed when storing heterogeneous data, i.e., data of different types (e.g., numbers and text) or sequences of elements of different shapes (e.g., both scalars and vectors). For instance, statistical functions often use lists to return a lot of information about an analysis in one complex object.
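As one possible illustration of such an object, the following sketch fits a linear regression with lm() to the cars data that comes with R (the choice of model and the name fit are arbitrary assumptions) and inspects the result with the list functions introduced above:

```r
# fit a simple linear model to a built-in data set:
fit <- lm(dist ~ speed, data = cars)

is.list(fit)  # the result is a list (with a class attribute)
#> [1] TRUE
class(fit)
#> [1] "lm"
"coefficients" %in% names(fit)  # one of its named elements
#> [1] TRUE
fit$coefficients  # access this element by its name
```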

From linear data structures (like atomic vectors and lists), it is only a small step to study rectangular data structures known as matrices and tables.

References

Wickham, H. (2014a). Advanced R (1st ed.). Chapman & Hall/CRC. http://adv-r.had.co.nz/
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc. http://r4ds.had.co.nz