## 3.3 Linear data structures

Atomic vectors and lists are both linear data structures insofar as they only have one dimension (1D) and are characterized by their *length*.
However, atomic vectors and lists differ in the type(s) of data that they can accommodate:

- in *atomic vectors*, all elements are of the same data type;
- in *lists*, elements can have different data types.

### 3.3.1 Atomic vectors

In Chapter 2, we discussed vectors as the primary data structure in R (see Section 2.3). To repeat, a vector is an ordered sequence of elements with three common properties:

- its *type* of elements (tested by `typeof()`);
- its *length* (tested by `length()`);
- optional *attributes* or meta-data (tested by `attributes()`).
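As a brief illustration of these three properties, the following sketch applies the three functions to a small named vector. The object `v` and its values are made up for this example:

```r
# A small numeric vector whose elements are named
# (names are stored as an attribute of the vector):
v <- c(a = 1.5, b = 2.5, c = 3.5)

typeof(v)      # type of the elements
#> [1] "double"
length(v)      # number of elements
#> [1] 3
attributes(v)  # optional meta-data (here: the names)
#> $names
#> [1] "a" "b" "c"
```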

As a straightforward way of representing some facts about five people, we could store their `name`, `gender`, and `age` as vectors:

```
name <- c("Adam", "Ben", "Cecily", "David", "Evelyn")
gender <- c("male", "male", "female", "male", "misc")
age <- c(21, 19, 20, 48, 45)
```

Here, `name` and `gender` are character vectors, whereas `age` is a numeric vector.
The following expressions create a logical vector that indicates which values of `age` are at least 21 and then use it to select the corresponding names:

```
age >= 21
#> [1] TRUE FALSE FALSE TRUE TRUE
name[age >= 21]
#> [1] "Adam" "David" "Evelyn"
```

If any of these terms are unclear, consider revisiting the introductory Chapter 2 on vectors (Section 2.3) and logical indexing (Section 2.3.2).

The vast majority of vectors we encounter are *atomic vectors* (i.e., all elements of the same type), but *lists* are often used in R for storing a variety of data types in a common object (e.g., in statistical analyses).
It is important to understand that the term “atomic” in “atomic vectors” refers to the *type* of the vector, rather than its *shape* or *length*: An atomic vector can contain any number of elements of a single type (i.e., can have any length), but cannot mix multiple types.
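One way to see this constraint in action is to combine values of different types with `c()`. The following sketch (with a made-up object `mixed`) relies on R's standard behavior of coercing all elements to one common type:

```r
# Combining values of different types yields ONE common type
# (here, everything is coerced to character):
mixed <- c(1, "B", TRUE)
mixed
#> [1] "1"    "B"    "TRUE"
typeof(mixed)
#> [1] "character"

# Length is unrelated to being atomic: both vectors are atomic.
length(c(TRUE, FALSE))
#> [1] 2
length(1:10)
#> [1] 10
```

If the original types need to be preserved, a list is the appropriate data structure.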

In the context of data structures, we extend the basic data structure of atomic vectors in several ways:

- *Lists* are complex/hierarchical vectors that accept multiple data types
- *Matrices* are atomic vectors with additional shape attributes
- *Rectangular tables* (data frames and tibbles) are lists of atomic vectors
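These relationships can be checked directly in base R. The following sketch uses made-up objects `m` and `df` and omits tibbles, as they require an additional package:

```r
# A matrix is an atomic vector with a "dim" attribute:
m <- 1:6
dim(m) <- c(2, 3)  # add shape: 2 rows, 3 columns
is.matrix(m)
#> [1] TRUE
typeof(m)          # the elements are still of a single type
#> [1] "integer"
attributes(m)
#> $dim
#> [1] 2 3

# A data frame is a list of equal-length atomic vectors (its columns):
df <- data.frame(x = 1:2, y = c("a", "b"))
is.list(df)
#> [1] TRUE
typeof(df)
#> [1] "list"
length(df)         # number of columns
#> [1] 2
```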

### 3.3.2 Lists

Beyond atomic vectors, R provides *lists* as yet another data structure to store linear sequences of elements.
Internally, lists in R actually *are* vectors (see Wickham, 2014a for details).

However, rather than *atomic vectors*, lists are *complex vectors* (aka. *hierarchical* or *recursive vectors*) that can contain elements of multiple data types/modes and various shapes.
Thus, the key feature of lists is that their elements can contain objects of different data types.

#### Defining lists

*Lists* are sequential data structures in which each element can have an internal structure.
Thus, lists are similar to atomic vectors (e.g., in having a linear shape that is characterized by its `length()`).
Crucially, different elements of a list can be of different data types (or modes).
As every element of a list can also be a complex (rather than an elementary) object, lists are also described as “hierarchical” data structures.

We can create a list by applying the `list()` function to a sequence of elements:

```
# lists can contain a mix of data shapes:
l_1 <- list(1, 2, 3) # 3 elements (all numeric scalars)
l_1
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
l_2 <- list(1, c(2, 3)) # 2 elements (of different lengths)
l_2
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
```

The objects `l_1` and `l_2` are both lists and contain the same three numbers, but differ in the shape of their representation:

- `l_1` is a list with three elements, each of which is a scalar (i.e., a vector of length 1).
- `l_2` is a list with two elements. The first is a scalar, but the second is a (numeric) vector of length 2.

Technically, a list is implemented in R as a *vector* of the mode “list”:

```
vl <- vector(length = 2, mode = "list")
is.vector(vl)
#> [1] TRUE
is.list(vl)
#> [1] TRUE
```

The fact that lists are also implemented as vectors (albeit hierarchical or recursive vectors) justifies the statement that vectors are the fundamental data structure in R.

Due to their hierarchical nature, lists are more flexible, but also more complex than the other data shapes we have encountered so far.
Unlike atomic vectors (i.e., vectors that only contain one type of data), lists can contain a *mix* of data shapes and data types. A simple example that combines multiple data types (here: “numeric”/“double” and “character”) is the following:

```
# lists can contain multiple data types:
(l_3 <- list(1, "B", 3))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3
```

The ability to store a mix of data shapes and types in a list allows creating complex representations. The following list mixes both data types and shapes:

```
# lists can contain a mix of data types and shapes:
(l_4 <- list(1, "B", c(3, 4), c(TRUE, TRUE, FALSE)))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3 4
#>
#> [[4]]
#> [1] TRUE TRUE FALSE
```

As lists can contain other lists, they can be used to construct arbitrarily complex data structures (like tables or tree-like hierarchies):

```
# lists can contain other lists:
l_5 <- list(l_1, l_2, l_3, l_4)
```

Finally, the elements of lists can be named.
As with vectors, the `names()` function is used to both retrieve and assign names:

```
# names of list elements:
names(l_5) # check names => no names yet:
#> NULL
names(l_5) <- c("uno", "two", "trois", "four") # assign names:
names(l_5) # check names:
#> [1] "uno" "two" "trois" "four"
```
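Names can also be supplied when a list is created, by passing `name = value` pairs to `list()`. The following sketch uses a made-up list `person` to illustrate this:

```r
# Naming list elements at creation time:
person <- list(name = "Adam", age = 21, scores = c(3, 4))

names(person)
#> [1] "name"   "age"    "scores"
length(person)
#> [1] 3
```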

As with atomic vectors, two basic questions for working with lists are:

- How can we inspect a list?
- How can we access the elements of a list?

We will briefly introduce some functions for addressing these tasks.

#### Inspecting lists

The `is.list()` function allows checking whether some R object is a list:

```
is.list(l_3) # a list
#> [1] TRUE
is.list(1:3) # a vector
#> [1] FALSE
is.list("A") # a scalar
#> [1] FALSE
```

Whereas atomic vectors are not lists, lists *are* also vectors (as lists are *hierarchical* vectors):

```
is.vector(l_3)
#> [1] TRUE
```

As lists are linear data structures, we can always inspect their length:

```
length(l_3)
#> [1] 3
length(l_4)
#> [1] 4
```

As the hierarchical nature of lists makes them objects with a potentially interesting *structure*, a useful function for inspecting lists is `str()`:

```
str(l_3)
#> List of 3
#> $ : num 1
#> $ : chr "B"
#> $ : num 3
str(l_4)
#> List of 4
#> $ : num 1
#> $ : chr "B"
#> $ : num [1:2] 3 4
#> $ : logi [1:3] TRUE TRUE FALSE
```

Lists are powerful structures for representing data of various types and shapes, but can easily get complicated. In practice, we will rarely need lists, as vectors and tables are typically sufficient for our purposes. However, as we will occasionally encounter lists (e.g., as the output of statistical functions), it is good to be aware of them and know how to access their elements.

#### Accessing list elements

As lists are implemented as vectors, accessing list elements is similar to indexing vectors, but needs to account for an additional layer of complexity. This is achieved by distinguishing between single square brackets (i.e., `[]`) and double square brackets (`[[]]`):

- `x[i]` returns the i-th sub-list of a list `x` (as a list);
- `x[[i]]` removes a level of the hierarchy and returns the i-th element of a list `x` (as an object).

The distinction between single and double square brackets is important when working with lists:

- `[]` always returns a list (typically a smaller sub-list), whereas
- `[[]]` removes a hierarchy level to return list elements.

Thus, extracting an individual element, which is achieved by `[]` with atomic vectors, is achieved by `[[]]` with lists. An example illustrates the difference:

```
l_4 # a list
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] 3 4
#>
#> [[4]]
#> [1] TRUE TRUE FALSE
l_4[3] # get 3rd sub-list (a list with 1 element)
#> [[1]]
#> [1] 3 4
l_4[[3]] # get 3rd list element (a vector)
#> [1] 3 4
```
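The difference between the two kinds of brackets becomes tangible when we check what each one returns. The following sketch uses a small made-up list `x` and also shows how indices can be chained to reach inside an extracted element:

```r
x <- list(1, "B", c(3, 4))

is.list(x[3])       # single brackets keep the list wrapper
#> [1] TRUE
is.list(x[[3]])     # double brackets return the element itself
#> [1] FALSE
x[[3]][2]           # extract the 3rd element, then index into that vector
#> [1] 4
length(x[c(1, 3)])  # single brackets can select several elements at once
#> [1] 2
```

Note that `[[]]` with a single index extracts exactly one element, whereas `[]` accepts a vector of indices, as when indexing atomic vectors.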

For *named* lists, there is another way of accessing list elements that is similar to accessing the named variables (columns) of a data frame:

- `x$n` selects a list element (like `[[]]`) with the name `n`.

```
# l_5 # a list with named elements
l_5$two # get element with the name "two"
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
```

Importantly, both `[[]]` and `$n` return list *elements* that can be of various data types and shapes. In the case of `l_5`, the 2nd element (named “two”) happens to be a list:

```
identical(l_5$two, l_5[[2]])
#> [1] TRUE
```
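Double square brackets also accept a name as a character string, which is useful when the name is stored in a variable. The following sketch uses a made-up list `nested` and a hypothetical variable `key` to illustrate this:

```r
nested <- list(uno = list(1, 2, 3), two = list(1, c(2, 3)))

# Three equivalent ways of getting the element named "two":
identical(nested$two, nested[["two"]])
#> [1] TRUE
identical(nested$two, nested[[2]])
#> [1] TRUE

# Indices can be chained to reach into nested elements:
nested$two[[2]]
#> [1] 2 3

# With a name stored in a variable, use [[ ]] rather than $:
key <- "two"
identical(nested[[key]], nested$two)
#> [1] TRUE
is.null(nested$key)  # $ looks for an element literally named "key"
#> [1] TRUE
```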

For additional details on lists, as well as helpful analogies and visualizations, see 20.5 Recursive vectors (lists) of r4ds (Wickham & Grolemund, 2017).

#### Using lists or vectors?

Due to the flexibility of lists, any data structure used above can be re-represented as a list.
For instance, the `name` or `age` vectors could easily be transformed into lists with 5 elements:

```
as.list(name)
as.list(age)
```

However, this re-representation would only add complexity without a clear benefit. As long as we just want to store elements of a single data type, using an atomic vector is not only sufficient, but simpler and usually better than using a list.
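The added complexity can be made concrete with a short sketch. It redefines `age` as above and uses a made-up object `age_list`. Here, `unlist()` flattens the list back into an atomic vector:

```r
age <- c(21, 19, 20, 48, 45)
age_list <- as.list(age)  # a list of 5 numeric scalars

length(age_list)
#> [1] 5
is.list(age_list)
#> [1] TRUE

# Arithmetic functions expect atomic vectors:
mean(age)
#> [1] 30.6
suppressWarnings(mean(age_list))  # NA (with a warning) for a list
#> [1] NA

# unlist() recovers the original atomic vector:
identical(unlist(age_list), age)
#> [1] TRUE
```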

What could justify using a list? Like vectors, lists store linear sequences of elements. But lists are only needed when storing heterogeneous data, i.e., data of different types (e.g., numbers and text) or sequences of elements of different shapes (e.g., both scalars and vectors). For instance, statistical functions often use lists to return a lot of information about an analysis in one complex object.
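As an illustration of a statistical function returning a list, the following sketch fits a simple linear model with `lm()` on the `cars` data that ships with R. The choice of model and data is arbitrary and only serves as an example:

```r
# Fit a linear regression on a built-in data set:
fit <- lm(dist ~ speed, data = cars)

is.list(fit)   # the result is a list (with an additional class attribute)
#> [1] TRUE
class(fit)
#> [1] "lm"

names(fit)        # names of the list elements
fit$coefficients  # access one element by name
```

Since the result is a list, `str()`, `[[]]`, and `$` can all be used to inspect it and to extract individual elements.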

From linear data structures (like atomic vectors and lists), it is only a small step to study rectangular data structures known as matrices and tables.

### References

Wickham, H. (2014a). *Advanced R* (1st ed.). Chapman & Hall/CRC. http://adv-r.had.co.nz/

Wickham, H., & Grolemund, G. (2017). *R for data science: Import, tidy, transform, visualize, and model data*. O’Reilly Media, Inc. http://r4ds.had.co.nz