3.2 Overview

In Chapter 2, we learned that data objects are characterized by their shape and their type. To create and use more flexible and more powerful data objects, we need one additional concept: data structures. Data structures are constructs that store particular combinations of data types and shapes. A data structure determines the range of shapes that a data object can take, but it is a more general construct that constrains both the types and the shapes of data objects.

In R, data structures are distinguished by whether they contain only a single data type or multiple data types. Accordingly, the columns of Table 3.1 distinguish between data structures for “homogeneous” vs. “heterogeneous” data types.

Table 3.1: Overview of R data structures (i.e., combinations of data shapes and data types).
Dimensions    Homogeneous data types    Heterogeneous data types
1D            atomic vector             list
2D            matrix                    table (data frame/tibble)
nD            array
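
As a minimal sketch of Table 3.1, the following base R code creates one object for each of the five data structures. All object names and values (`v`, `l`, `m`, `df`, `a`) are made up for illustration. The table's entry for data frames and tibbles is represented by a base R data frame only, as tibbles require an additional package.

```r
# 1D, homogeneous: an atomic vector (mixed inputs are coerced to one type)
v <- c(1, "a", TRUE)
typeof(v)            # "character"

# 1D, heterogeneous: a list keeps the type of each element
l <- list(1, "a", TRUE)
sapply(l, typeof)    # "double" "character" "logical"

# 2D, homogeneous: a matrix
m <- matrix(1:6, nrow = 2)
dim(m)               # 2 3

# 2D, heterogeneous: a table (here: a data frame with 2 columns of different types)
df <- data.frame(name = c("Ann", "Bob"),
                 age  = c(30, 25),
                 stringsAsFactors = FALSE)
sapply(df, typeof)   # name: "character", age: "double"

# nD, homogeneous: an array (here with 3 dimensions)
a <- array(1:24, dim = c(2, 3, 4))
dim(a)               # 2 3 4
```

Note that combining different types in `c()` does not fail, but silently converts all elements to a common type, whereas `list()` leaves each element unchanged.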

Although Table 3.1 contains five different data structures, two of them are by far the most important for our purposes:

  • vectors are linear (1-dimensional) data structures. So-called atomic vectors contain only a single data type and typically have a length of 1 or more elements (although empty vectors of length 0 also exist).

  • tables are rectangular (2-dimensional) data structures that can contain data of different types (in different columns). The terms data frame and tibble denote two slightly different kinds of tables.
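
The two structures described above can be sketched in base R as follows. The names and values (`ages`, `people`, and its columns) are hypothetical and serve only as an illustration. The example uses a data frame, as tibbles are not part of base R.

```r
# An atomic vector: linear, with a single data type
ages <- c(21, 35, 48)
length(ages)       # 3
is.atomic(ages)    # TRUE

# A table (data frame): rectangular, with one data type per column
people <- data.frame(name    = c("Ann", "Bob", "Cem"),
                     age     = ages,
                     student = c(TRUE, FALSE, FALSE),
                     stringsAsFactors = FALSE)
dim(people)        # 3 3 (rows, columns)
str(people)        # shows the type of each column

# Each column of the table is itself an atomic vector
is.atomic(people$age)   # TRUE
```

Viewing a table as a collection of equally long vectors (its columns) explains why different columns can have different types, while all elements within one column share the same type.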

A good question is: Where are scalar objects in Table 3.1? The answer is that R is a vector-based language: even scalar objects are represented as atomic (i.e., homogeneous) vectors of length \(1\).
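
This can be checked directly in R. In the following brief sketch, the object name `x` and its value are arbitrary:

```r
x <- 42                # looks like a scalar

is.vector(x)           # TRUE: x is a vector
length(x)              # 1
x[1]                   # 42 (the first and only element)
identical(x, c(42))    # TRUE: the same as a vector created by c()
```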