12 Using tibbles

Having learned how to import data into R (in Chapter 11), we now take a closer look at the rectangular data structure known as a tibble, which is a simplified version of an R data frame.

The key topic of this chapter is:

Tibbles are the main rectangular data structure used in the tidyverse (Wickham et al., 2019). When using only base R, we could get by just fine by only using data frames, rather than tibbles. But tibbles have some advantages that make them more convenient and sometimes safer to use.

While these packages are more consistent and convenient than the corresponding base R functions, the utils package also contains generic functions for reading and writing files.

An important precondition for working productively with R (or any other programming language) is that we have some basic understanding of file systems and storage locations. Thus, this chapter needs to briefly explain the notion of (absolute or relative) paths and how to organize R projects.

Preflections

Before reading, please take some time to reflect upon the following questions:


  • How can we turn an R data frame into a tibble?

  • How can we turn non-rectangular data (e.g., R vectors) into a tibble?

  • How could we enter data from scratch?

As tibbles make our lives easier, these questions correspond to three simple tibble functions.

12.1 Introduction

Data is rarely entered directly into R. When we analyze data, getting data into R can mean either

  1. importing data from some file or server (see Chapter 11), or

  2. creating data from scratch (see below).

In both cases, we aim to end up with a rectangular data structure known as a “tibble”, which is a simplified type of data frame used in the tidyverse (Wickham et al., 2019).

Key concepts

Key concepts of this chapter on creating tibbles include:

  • rectangular data structures (with rows/cases and columns/variables)
  • data frames vs. tibbles
  • data types: logical, character, or numeric variables vs. factors

Resources

Resources for this chapter on creating tibbles include:

12.2 Creating tibbles

In Chapter 3 on Data structures, we combined vectors of the same length into a special type of rectangular data structure called a data frame. Internally, such tables are represented as a list of vectors, and their elements (i.e., cells, columns, or rows) can be accessed by logical or numeric subsetting. Whenever we load (i.e., parse or read) rectangular data into R, the resulting object should ideally be a data frame or a tibble.
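As a brief reminder, the following base R sketch builds a small data frame from three vectors and accesses its elements by numeric and logical subsetting. The vectors name, age, and member are made-up example data, not part of any dataset used in this book.

```r
# Three vectors of the same length (hypothetical example data):
name   <- c("Ann", "Bob", "Cat")
age    <- c(28, 35, 42)
member <- c(TRUE, FALSE, TRUE)

# Combine the vectors into a data frame:
df <- data.frame(name, age, member, stringsAsFactors = FALSE)

is.list(df)  # a data frame is a list (of column vectors)
length(df)   # number of columns
dim(df)      # number of rows and columns

# Numeric subsetting (by row and column position):
df[2, 1]       # cell in row 2, column 1
df[ , 2]       # column 2 (returned as a vector)
df[c(1, 3), ]  # rows 1 and 3 (returned as a data frame)

# Logical subsetting (by a condition on a column):
df[df$age > 30, ]  # all rows with an age above 30
```

The same subsetting operations also work on tibbles, but some of them return different types of objects.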

Tibbles are simple data tables and the primary way of representing data (in rows of cases and columns of variables) in the tidyverse (Wickham, 2023b). Internally, tibbles are a special, simplified form of a data frame.
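The following sketch illustrates what it means for a tibble to be a special form of a data frame, and shows two situations in which tibbles behave more strictly than data frames. It assumes that the tibble package is installed; the objects df and tb contain made-up example data.

```r
library(tibble)

# A small data frame (hypothetical example data) and its tibble counterpart:
df <- data.frame(name = c("Ann", "Bob"), age = c(28, 35),
                 stringsAsFactors = FALSE)
tb <- as_tibble(df)

class(df)          # "data.frame"
class(tb)          # "tbl_df" "tbl" "data.frame"
is.data.frame(tb)  # TRUE: a tibble is also a data frame

# Selecting a single column with [ , ]:
class(df[ , "age"])  # a data frame returns a vector
class(tb[ , "age"])  # a tibble returns a tibble

# Partial matching of column names with $:
df$na  # a data frame silently matches the column "name"
tb$na  # a tibble returns NULL and warns about the unknown column
```

In both cases, the tibble avoids a silent change of the result's type or content, which is one reason why tibbles can be considered safer to use.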

How can we create tibbles from other data structures (e.g., vectors or data frames)? And what can we do if we do not yet have a rectangular table, but want to create a tibble from scratch?

Key tibble functions include:

  • as_tibble() converts (or coerces) an existing rectangle of data (e.g., a data frame or matrix) into a tibble.

  • tibble() converts several vectors into (the columns of) a tibble.

  • tribble() converts a table (entered row-by-row) into a tibble.

Thus, the three functions differ by the types of inputs they expect, but have in common that they create a tibble as their output.
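A minimal sketch of the three functions is shown here. It assumes that the tibble package is installed; the column names id and grp and their values are made up for illustration.

```r
library(tibble)

# 1. as_tibble(): Convert an existing data frame into a tibble
df <- data.frame(id = 1:3, grp = c("a", "b", "a"),
                 stringsAsFactors = FALSE)
tb_1 <- as_tibble(df)

# 2. tibble(): Combine vectors (as columns) into a tibble
id  <- 1:3
grp <- c("a", "b", "a")
tb_2 <- tibble(id = id, grp = grp)

# 3. tribble(): Enter a table row-by-row
#    (column names are marked by a leading ~)
tb_3 <- tribble(
  ~id, ~grp,
   1L,  "a",
   2L,  "b",
   3L,  "a"
)

# All three objects are tibbles with 3 rows and 2 columns:
is_tibble(tb_1)
is_tibble(tb_2)
is_tibble(tb_3)
dim(tb_3)
```

The column-wise tibble() is convenient when the data already exists as vectors, whereas the row-wise tribble() is easier to read when typing in a small table by hand.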

While some R veterans may prefer data frames, tibbles are becoming increasingly popular. But if we ever need to transform a tibble tb into a data frame, we can always use as.data.frame(tb).
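As a small sketch of this round trip (assuming the tibble package is installed, and using a made-up tibble tb):

```r
library(tibble)

# A small tibble (hypothetical example data):
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# Transform the tibble into a plain data frame:
df <- as.data.frame(tb)

class(df)      # "data.frame"
is_tibble(df)  # FALSE

# And back into a tibble:
tb_2 <- as_tibble(df)
is_tibble(tb_2)  # TRUE
```

Converting back and forth in this way can be useful when a function from another package expects a plain data frame.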

These commands and corresponding examples are illustrated in Section 5.2 Essential tibble commands of the ds4psy book (Neth, 2023a).

12.3 Conclusion

12.3.1 Summary

An R process always runs in some working directory. Getting data into R (or any other computing system) assumes a basic understanding of files, folders, and paths (i.e., descriptions of their locations, typically in the form of character strings).

The R packages readr and tibble provide functions for creating data structures known as “tibbles” in R. For our purposes, tibbles are simpler and better-behaved data frames. More specifically,

  • readr provides functions for reading data (stored as vectors or rectangular tables) into tibbles;

  • tibble provides functions for getting tibbles from other data structures (data frames, vectors, or row-wise tables).

12.3.2 Resources


Figure 12.1 shows the back of the cheatsheet on the tidyr package:


Figure 12.1: Essential tibble commands on the RStudio cheatsheet on the tidyr package.

The fact that both tibble and tidyr are covered by the same RStudio cheatsheet on Data tidying shows the close correspondence between the two topics.

12.3.3 Preview

The next two chapters will introduce the dplyr package and the tidyr package for data transformation. Just like readr and tibble, those packages are key components of the tidyverse (Wickham, 2023b).

12.4 Exercises


The following exercises link to the corresponding exercises of Chapter 5: Tibbles.

12.4.1 Flower power

Parts 1, 3–5 (i.e., all except for Part 2) of Exercise 1

12.4.2 Rental accounting

Part 1 of Exercise 2

12.4.3 False positive psychology

Exercise 4

This concludes our exercises on creating tibbles.