Chapter 10 Tibbles

Author: Chunji Wang
Reviewer:

10.1 Introduction

10.1.1 Prerequisites

library(tidyverse)

10.2 Creating tibbles

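Useful functions: as_tibble() coerces an existing data frame, such as iris, into a tibble. tibble() and tribble() create a tibble from scratch.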

as_tibble(iris)
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
##  1          5.1         3.5          1.4         0.2  setosa
##  2          4.9         3.0          1.4         0.2  setosa
##  3          4.7         3.2          1.3         0.2  setosa
##  4          4.6         3.1          1.5         0.2  setosa
##  5          5.0         3.6          1.4         0.2  setosa
##  6          5.4         3.9          1.7         0.4  setosa
##  7          4.6         3.4          1.4         0.3  setosa
##  8          5.0         3.4          1.5         0.2  setosa
##  9          4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
## # ... with 140 more rows
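The following minimal sketch shows the other two constructors. The column names x, y and z are illustrative only. tibble() evaluates its arguments in order, so y can be computed from x, and it never turns strings into factors. tribble() lets you type a small table row by row.

```r
library(tibble)

# tibble() builds columns from vectors; later columns can use earlier ones
tb <- tibble(
  x = 1:3,
  y = x * 2,
  z = c("a", "b", "c")
)

# tribble() ("transposed tibble") is laid out row by row;
# the header row uses formulas that start with ~
tr <- tribble(
  ~x, ~y, ~z,
  1L, 2,  "a",
  2L, 4,  "b",
  3L, 6,  "c"
)

tb
tr
```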

10.3 Tibbles vs. data.frame

10.3.1 Printing

nycflights13::flights %>% 
  print(n = 3, width = Inf)
## # A tibble: 336,776 x 19
##    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest air_time distance  hour minute           time_hour
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>    <dbl>    <dbl> <dbl>  <dbl>              <dttm>
## 1  2013     1     1      517            515         2      830            819        11      UA   1545  N14228    EWR   IAH      227     1400     5     15 2013-01-01 05:00:00
## 2  2013     1     1      533            529         4      850            830        20      UA   1714  N24211    LGA   IAH      227     1416     5     29 2013-01-01 05:00:00
## 3  2013     1     1      542            540         2      923            850        33      AA   1141  N619AA    JFK   MIA      160     1089     5     40 2013-01-01 05:00:00
## # ... with 3.368e+05 more rows

Here n = 3 prints only the first three rows, and width = Inf prints every column, however wide the result. For a complete list of printing options, see the package help with package?tibble.
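Besides the n and width arguments of print(), you can change the defaults for a whole session with options(). The sketch below assumes a hypothetical 30-row tibble called big. The option names tibble.print_max and tibble.print_min are the ones documented for the tibble package.

```r
library(tibble)

# Hypothetical example data: 30 rows, more than the default print limit
big <- tibble(id = 1:30, value = (1:30)^2)

# Arguments to print() affect only this call
print(big, n = 5)

# options() change the session defaults: when a tibble has more than
# tibble.print_max rows, only tibble.print_min rows are shown
old <- options(tibble.print_max = 10, tibble.print_min = 3)
print(big)
options(old)  # restore the previous settings
```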

10.3.2 Subsetting

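With the magrittr pipe, use the . placeholder: df %>% .$x is equivalent to df$x, and df %>% .[["x"]] is equivalent to df[["x"]].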
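The next sketch compares the equivalent ways to pull one column out as a vector. The tibble df is made-up example data.

```r
library(tibble)
library(magrittr)  # provides %>% and the . placeholder

# Made-up example data
df <- tibble(x = runif(5), y = rnorm(5))

# Extract by name
by_dollar <- df$x
by_name   <- df[["x"]]

# Extract by position
by_pos <- df[[1]]

# The same extraction inside a pipe, with . standing in for df
piped <- df %>% .$x
```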

10.4 Interacting with older code
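Some older functions do not work with tibbles. as.data.frame() turns a tibble back into a plain data.frame. Here is a minimal sketch with made-up data:

```r
library(tibble)

# Made-up example data
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# Convert back to a plain data.frame for code that expects one
df <- as.data.frame(tb)

class(tb)  # "tbl_df" "tbl" "data.frame"
class(df)  # "data.frame"
```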

10.5 Exercises

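  1. How can you tell if an object is a tibble? (Hint: try printing mtcars, which is a regular data frame.)
A printed tibble has three giveaways. It starts with a header such as # A tibble: 32 x 11, it shows each column's type under its name, and it prints only the first 10 rows of a long table. A regular data frame prints every row and gives no type information.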
print(mtcars)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
print(as_tibble(mtcars))
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
##  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
##  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
##  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
##  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
## 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
## # ... with 22 more rows
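Besides reading the printed header, you can test the class programmatically. The following minimal sketch uses tibble::is_tibble():

```r
library(tibble)

# is_tibble() checks for the tibble class directly
is_tibble(mtcars)             # FALSE
is_tibble(as_tibble(mtcars))  # TRUE

# A tibble still inherits from data.frame, so it works where a data.frame is expected
class(as_tibble(mtcars))
```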
  2. Compare and contrast the following operations on a data.frame and an equivalent tibble. What is different? Why might the default data.frame behaviours cause you frustration?
df <- data.frame(abc = 1, xyz = "a")
tb <- tibble(abc = 1, xyz = "a")

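Two data.frame defaults are frustrating. First, $ silently partial-matches column names, so df$x returns the xyz column instead of failing. Second, before R 4.0.0, data.frame() converted strings to factors by default (stringsAsFactors = TRUE), which is why the output below shows Levels: a. A tibble does neither: tb$x warns that the column does not exist and returns NULL, and xyz stays a character column.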

df$x
## [1] a
## Levels: a
tb$x
## Warning: Unknown or uninitialised column: 'x'.
## NULL

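The same [ operation on a data.frame returns different types depending on how many columns you select. A single column is simplified to a vector (here a factor), while two or more columns stay a data.frame.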

df[, "xyz"]
## [1] a
## Levels: a
df[, c("abc", "xyz")]
##   abc xyz
## 1   1   a

A tibble's [ always returns a tibble, however many columns you select.

tb[, "xyz"]
## # A tibble: 1 x 1
##     xyz
##   <chr>
## 1     a
tb[, c("abc", "xyz")]
## # A tibble: 1 x 2
##     abc   xyz
##   <dbl> <chr>
## 1     1     a
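If you have to stay with a data.frame, base R can soften both pitfalls. In the sketch below, drop = FALSE stops [ from simplifying, and the warnPartialMatchDollar option makes $ warn when it partial-matches. The code sets stringsAsFactors = FALSE explicitly so the example behaves the same before and after R 4.0.0.

```r
# Base R only: a data.frame like the one in the exercise
df <- data.frame(abc = 1, xyz = "a", stringsAsFactors = FALSE)

# drop = FALSE keeps a single selected column as a data.frame
one_col <- df[, "xyz", drop = FALSE]
class(one_col)  # "data.frame"

# Ask $ to warn on partial matches, and capture the warning text
old <- options(warnPartialMatchDollar = TRUE)
msg <- tryCatch(df$x, warning = conditionMessage)
options(old)
msg
```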
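  3. If you have the name of a variable stored in an object, e.g. var <- "mpg", how can you extract the referenced variable from a tibble?
Use [[, which evaluates its argument: tb[[var]], or tb %>% .[[var]] inside a pipe. By contrast, tb$var would look for a column literally named var.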
var <- "mpg"
tb <- tibble(mpg = 5, wheel = 4)
tb %>% .[[var]]
## [1] 5
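For comparison, this sketch shows what other extraction routes do with the same var and tb. The select() call with all_of() assumes dplyr 1.0 or later, which re-exports all_of() from tidyselect.

```r
library(tibble)
library(dplyr)

var <- "mpg"
tb <- tibble(mpg = 5, wheel = 4)

# [[ evaluates var, so it extracts the column named "mpg" as a vector
tb[[var]]

# $ does not evaluate: it looks for a column literally called "var",
# so a tibble warns and returns NULL (the warning is suppressed here)
missing_col <- suppressWarnings(tb$var)

# To keep the result as a tibble, select() with all_of() also accepts a string
sel <- select(tb, all_of(var))
sel
```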
  4. Practice referring to non-syntactic names in the following data frame by:
    1. Extracting the variable called 1.
    2. Plotting a scatterplot of 1 vs 2.
    3. Creating a new column called 3 which is 2 divided by 1.
    4. Renaming the columns to one, two and three.
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)
annoying %>% .$`1`
##  [1]  1  2  3  4  5  6  7  8  9 10
ggplot(annoying, aes(x = `1`, y = `2`)) + geom_point()

annoying <- annoying %>% mutate(`3` = `2` / `1`)
print(annoying)
## # A tibble: 10 x 3
##      `1`       `2`      `3`
##    <int>     <dbl>    <dbl>
##  1     1  2.360411 2.360411
##  2     2  4.283710 2.141855
##  3     3  5.476050 1.825350
##  4     4  7.736711 1.934178
##  5     5  9.705020 1.941004
##  6     6 12.213779 2.035630
##  7     7 14.092068 2.013153
##  8     8 16.820143 2.102518
##  9     9 17.981380 1.997931
## 10    10 20.137297 2.013730
annoying <- annoying %>% rename(one = `1`, two = `2`, three = `3`)
print(annoying)
## # A tibble: 10 x 3
##      one       two    three
##    <int>     <dbl>    <dbl>
##  1     1  2.360411 2.360411
##  2     2  4.283710 2.141855
##  3     3  5.476050 1.825350
##  4     4  7.736711 1.934178
##  5     5  9.705020 1.941004
##  6     6 12.213779 2.035630
##  7     7 14.092068 2.013153
##  8     8 16.820143 2.102518
##  9     9 17.981380 1.997931
## 10    10 20.137297 2.013730
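The sketch below contrasts how tibble() and data.frame() treat non-syntactic names. It also shows two base-R alternatives to the backtick and rename() approach above. The small tibble here is made-up data.

```r
library(tibble)

# tibble() keeps non-syntactic names as given
tb <- tibble(`1` = 1:3, `2` = c(2, 4, 6))
names(tb)   # "1" "2"

# data.frame() repairs them into syntactic names by default
df <- data.frame(`1` = 1:3, `2` = c(2, 4, 6))
names(df)   # "X1" "X2"

# [[ takes the name as a plain string, so no backticks are needed
col_one <- tb[["1"]]

# Base-R alternative to rename(): overwrite all names at once
names(tb) <- c("one", "two")
tb
```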
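  5. What does tibble::enframe() do? When might you use it?
enframe() converts an atomic vector or a list into a two-column tibble, with the names in a name column and the values in a value column. An unnamed vector gets its positions as names, as below. It is handy when you have a named vector, such as a set of summary statistics, and want to manipulate or plot it with dplyr or ggplot2.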
enframe(1:3)
## # A tibble: 3 x 2
##    name value
##   <int> <int>
## 1     1     1
## 2     2     2
## 3     3     3
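Here is a sketch of the typical use case, assuming a hypothetical named vector of summary statistics. deframe(), also from tibble, reverses the conversion.

```r
library(tibble)

# Hypothetical named vector, e.g. summary statistics computed elsewhere
stats <- c(mean = 2.5, sd = 1.2, n = 10)

# enframe() puts the names in one column and the values in another
tb <- enframe(stats)
tb

# The column names can be chosen
tb2 <- enframe(stats, name = "statistic", value = "estimate")

# deframe() turns a two-column tibble back into a named vector
back <- deframe(tb)
back
```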
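  6. What option controls how many additional column names are printed at the footer of a tibble?
    tibble.max_extra_cols. When some columns do not fit the print width, the footer lists the names of at most this many of the hidden columns. Set it with options(tibble.max_extra_cols = n).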
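This sketch shows the option in action. It uses a hypothetical 30-column tibble printed at a narrow width, so most columns end up in the footer. The exact footer wording differs between tibble versions.

```r
library(tibble)

# Hypothetical wide tibble: 2 rows, 30 integer columns named V1..V30
wide <- as_tibble(matrix(1:60, nrow = 2,
                         dimnames = list(NULL, paste0("V", 1:30))))

# Print a narrow slice and list at most 3 of the hidden column names
old <- options(tibble.max_extra_cols = 3)
print(wide, width = 30)
options(old)  # restore the previous setting
```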