21 Communication—table formatting
21.2 Introduction
The look of the default table in R markdown is not all that attractive. Here’s a summary table of the Oakland Athletics’ seasons from 1999-2008.
oakland_99 <- mlb_pay_wl |>
filter(tm == "OAK" & year_num < 2009)
oakland_99
## # A tibble: 10 × 8
## year_num tm attend_g est_payroll pay_index w l w_l_percent
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1999 OAK 17711 24831833 49.5 87 75 0.537
## 2 2000 OAK 19799 33172333 59.0 91 70 0.565
## 3 2001 OAK 26337 33810750 51.5 102 60 0.63
## 4 2002 OAK 26788 40004167 59.2 103 59 0.636
## 5 2003 OAK 27365 50260834 70.8 96 66 0.593
## 6 2004 OAK 27179 59425667 85.8 91 71 0.562
## 7 2005 OAK 26038 55425762 76.0 88 74 0.543
## 8 2006 OAK 24403 64843079 83.2 93 69 0.574
## 9 2007 OAK 23726 79366940 95.3 76 86 0.469
## 10 2008 OAK 20559 47967126 53.4 75 86 0.466
There are a few R packages that allow us to format a table like this, ready for publication. Here we will take a quick look at two of them. Both have many, many more options for formatting; see the reference materials at the top of this chapter for those details.
21.3 The kable package
First, {kableExtra} is an extension to the {kable} tool that is part of “knitting” our R Markdown files.
To create a kable table, use the kable()
function:
# print table with {kableExtra} formatting
oakland_99 |>
# create table
kable()
year_num | tm | attend_g | est_payroll | pay_index | w | l | w_l_percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17711 | 24831833 | 49.54511 | 87 | 75 | 0.537 |
2000 | OAK | 19799 | 33172333 | 59.03364 | 91 | 70 | 0.565 |
2001 | OAK | 26337 | 33810750 | 51.51234 | 102 | 60 | 0.630 |
2002 | OAK | 26788 | 40004167 | 59.15217 | 103 | 59 | 0.636 |
2003 | OAK | 27365 | 50260834 | 70.82774 | 96 | 66 | 0.593 |
2004 | OAK | 27179 | 59425667 | 85.76543 | 91 | 71 | 0.562 |
2005 | OAK | 26038 | 55425762 | 75.95992 | 88 | 74 | 0.543 |
2006 | OAK | 24403 | 64843079 | 83.20773 | 93 | 69 | 0.574 |
2007 | OAK | 23726 | 79366940 | 95.27085 | 76 | 86 | 0.469 |
2008 | OAK | 20559 | 47967126 | 53.41372 | 75 | 86 | 0.466 |
The package allow for additional formatting to be applied. In the code below, the bootstrap theme “striped” is applied, along with specific row formatting. You’ll notice that the header row is number “0”, and that the rows with data are numbered starting at “1”.
# print table with {kableExtra} formatting
oakland_99 |>
# create table, add title
kable(caption = "Oakland Athletics, 1999-2008") |>
kable_styling(bootstrap_options = "striped", font_size = 10) |>
# make variable names bold and large
row_spec(0, bold = T, font_size = 14) |>
# make 2002 season (row 4) bold
row_spec(4, bold = T)
year_num | tm | attend_g | est_payroll | pay_index | w | l | w_l_percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17711 | 24831833 | 49.54511 | 87 | 75 | 0.537 |
2000 | OAK | 19799 | 33172333 | 59.03364 | 91 | 70 | 0.565 |
2001 | OAK | 26337 | 33810750 | 51.51234 | 102 | 60 | 0.630 |
2002 | OAK | 26788 | 40004167 | 59.15217 | 103 | 59 | 0.636 |
2003 | OAK | 27365 | 50260834 | 70.82774 | 96 | 66 | 0.593 |
2004 | OAK | 27179 | 59425667 | 85.76543 | 91 | 71 | 0.562 |
2005 | OAK | 26038 | 55425762 | 75.95992 | 88 | 74 | 0.543 |
2006 | OAK | 24403 | 64843079 | 83.20773 | 93 | 69 | 0.574 |
2007 | OAK | 23726 | 79366940 | 95.27085 | 76 | 86 | 0.469 |
2008 | OAK | 20559 | 47967126 | 53.41372 | 75 | 86 | 0.466 |
Other {kableExtra} formatting resources are here:
Yihui Xie, Christophe Dervieux, Emily Riederer, “The kableExtra package”, R Markdown Cookbook
Hao Zhu, Create Awesome HTML Table with knitr::kable and kableExtra
Hao Zhu, Using kableExtra in Bookdown
21.4 The {gt} package
A recent package that provides a significant amount of formatting possibilities is {gt}.
We create a basic table object with the gt()
function:
oakland_99 |>
gt()
year_num | tm | attend_g | est_payroll | pay_index | w | l | w_l_percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17711 | 24831833 | 49.54511 | 87 | 75 | 0.537 |
2000 | OAK | 19799 | 33172333 | 59.03364 | 91 | 70 | 0.565 |
2001 | OAK | 26337 | 33810750 | 51.51234 | 102 | 60 | 0.630 |
2002 | OAK | 26788 | 40004167 | 59.15217 | 103 | 59 | 0.636 |
2003 | OAK | 27365 | 50260834 | 70.82774 | 96 | 66 | 0.593 |
2004 | OAK | 27179 | 59425667 | 85.76543 | 91 | 71 | 0.562 |
2005 | OAK | 26038 | 55425762 | 75.95992 | 88 | 74 | 0.543 |
2006 | OAK | 24403 | 64843079 | 83.20773 | 93 | 69 | 0.574 |
2007 | OAK | 23726 | 79366940 | 95.27085 | 76 | 86 | 0.469 |
2008 | OAK | 20559 | 47967126 | 53.41372 | 75 | 86 | 0.466 |
Additional formatting can then be applied to the parts of the table.
tab_header()
is how we apply the title and subtitle. Notice that the first and last years in the table are assigned as objects, which can then be added to the text using theglue::glue()
function.-
There are a variety of number formats available. In this case, the columns are specified but it is also possible to apply a global formatting to a variable (for example, two or more columns of dates can be formatted consistently without having to name them).
the
use_seps = TRUE
will place separators for thousands, millions, etcthe
suffixing = TRUE
automatically determines the length of the number and assigns a letter suffix. Here, the salary figures are in millions, but a similar thing would happen with thousands and billions as well.decimals =
allows us to control the number of decimal places shown.Formatting column data references (with links to each of the different number formats): https://gt.rstudio.com/reference/index.html#formatting-column-data
# define year range to add as variable to title
year_min <- min(oakland_99$year_num)
year_max <- max(oakland_99$year_num)
oakland_table <- oakland_99 |>
gt() |>
tab_header(
title = "Oakland Athletics",
subtitle = glue::glue("{year_min} to {year_max}")
) |>
# apply number formatting
fmt_integer(columns = attend_g, use_seps = TRUE) |>
fmt_number(columns = est_payroll, suffixing = TRUE) |>
fmt_number(columns = pay_index, decimals = 2)
oakland_table
Oakland Athletics | |||||||
1999 to 2008 | |||||||
year_num | tm | attend_g | est_payroll | pay_index | w | l | w_l_percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17,711 | 24.83M | 49.55 | 87 | 75 | 0.537 |
2000 | OAK | 19,799 | 33.17M | 59.03 | 91 | 70 | 0.565 |
2001 | OAK | 26,337 | 33.81M | 51.51 | 102 | 60 | 0.630 |
2002 | OAK | 26,788 | 40.00M | 59.15 | 103 | 59 | 0.636 |
2003 | OAK | 27,365 | 50.26M | 70.83 | 96 | 66 | 0.593 |
2004 | OAK | 27,179 | 59.43M | 85.77 | 91 | 71 | 0.562 |
2005 | OAK | 26,038 | 55.43M | 75.96 | 88 | 74 | 0.543 |
2006 | OAK | 24,403 | 64.84M | 83.21 | 93 | 69 | 0.574 |
2007 | OAK | 23,726 | 79.37M | 95.27 | 76 | 86 | 0.469 |
2008 | OAK | 20,559 | 47.97M | 53.41 | 75 | 86 | 0.466 |
In this next code chunk, we change the column headings to better describe the variables. cols_label()
gives us control over how the column labels are displayed, rather than changing the variable names.
-
cols_label()
reference page: https://gt.rstudio.com/reference/cols_label.html
oakland_table |>
# add column heading labels
cols_label(
year_num = "Year",
tm = "Team",
attend_g = "Average home game attendance",
est_payroll = "Estimated payroll",
pay_index = "Pay index (league average for season = 100)",
w = "Wins",
l = "Loses",
w_l_percent = "Win-Loss Percent"
)
Oakland Athletics | |||||||
1999 to 2008 | |||||||
Year | Team | Average home game attendance | Estimated payroll | Pay index (league average for season = 100) | Wins | Loses | Win-Loss Percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17,711 | 24.83M | 49.55 | 87 | 75 | 0.537 |
2000 | OAK | 19,799 | 33.17M | 59.03 | 91 | 70 | 0.565 |
2001 | OAK | 26,337 | 33.81M | 51.51 | 102 | 60 | 0.630 |
2002 | OAK | 26,788 | 40.00M | 59.15 | 103 | 59 | 0.636 |
2003 | OAK | 27,365 | 50.26M | 70.83 | 96 | 66 | 0.593 |
2004 | OAK | 27,179 | 59.43M | 85.77 | 91 | 71 | 0.562 |
2005 | OAK | 26,038 | 55.43M | 75.96 | 88 | 74 | 0.543 |
2006 | OAK | 24,403 | 64.84M | 83.21 | 93 | 69 | 0.574 |
2007 | OAK | 23,726 | 79.37M | 95.27 | 76 | 86 | 0.469 |
2008 | OAK | 20,559 | 47.97M | 53.41 | 75 | 86 | 0.466 |
To have more control over where the column label line breaks occur, we use the HTML tag <br>
, and add .fn = md
(which calls the md()
for markdown function.)
oakland_table <- oakland_table |>
# add column heading labels
cols_label(
year_num = "Year",
tm = "Team",
attend_g = "Average home<br>game attendance",
est_payroll = "Estimated payroll",
pay_index = "Pay index<br>(league average for<br>season = 100)",
w = "Wins",
l = "Loses",
w_l_percent = "Win-Loss Percent",
.fn = md
)
oakland_table
Oakland Athletics | |||||||
1999 to 2008 | |||||||
Year | Team | Average home game attendance |
Estimated payroll | Pay index (league average for season = 100) |
Wins | Loses | Win-Loss Percent |
---|---|---|---|---|---|---|---|
1999 | OAK | 17,711 | 24.83M | 49.55 | 87 | 75 | 0.537 |
2000 | OAK | 19,799 | 33.17M | 59.03 | 91 | 70 | 0.565 |
2001 | OAK | 26,337 | 33.81M | 51.51 | 102 | 60 | 0.630 |
2002 | OAK | 26,788 | 40.00M | 59.15 | 103 | 59 | 0.636 |
2003 | OAK | 27,365 | 50.26M | 70.83 | 96 | 66 | 0.593 |
2004 | OAK | 27,179 | 59.43M | 85.77 | 91 | 71 | 0.562 |
2005 | OAK | 26,038 | 55.43M | 75.96 | 88 | 74 | 0.543 |
2006 | OAK | 24,403 | 64.84M | 83.21 | 93 | 69 | 0.574 |
2007 | OAK | 23,726 | 79.37M | 95.27 | 76 | 86 | 0.469 |
2008 | OAK | 20,559 | 47.97M | 53.41 | 75 | 86 | 0.466 |
It is also possible to add summary rows that have calculations. In this instance, we will add a row with the average attendance.
- “Add grand summary rows using aggregation functions” reference page: https://gt.rstudio.com/reference/grand_summary_rows.html
oakland_table |>
grand_summary_rows(
columns = attend_g,
fns = list(fn ="mean", id = "attend_g", label = "average"),
fmt = ~ fmt_integer(.)
)
Oakland Athletics | ||||||||
1999 to 2008 | ||||||||
Year | Team | Average home game attendance |
Estimated payroll | Pay index (league average for season = 100) |
Wins | Loses | Win-Loss Percent | |
---|---|---|---|---|---|---|---|---|
1999 | OAK | 17,711 | 24.83M | 49.55 | 87 | 75 | 0.537 | |
2000 | OAK | 19,799 | 33.17M | 59.03 | 91 | 70 | 0.565 | |
2001 | OAK | 26,337 | 33.81M | 51.51 | 102 | 60 | 0.630 | |
2002 | OAK | 26,788 | 40.00M | 59.15 | 103 | 59 | 0.636 | |
2003 | OAK | 27,365 | 50.26M | 70.83 | 96 | 66 | 0.593 | |
2004 | OAK | 27,179 | 59.43M | 85.77 | 91 | 71 | 0.562 | |
2005 | OAK | 26,038 | 55.43M | 75.96 | 88 | 74 | 0.543 | |
2006 | OAK | 24,403 | 64.84M | 83.21 | 93 | 69 | 0.574 | |
2007 | OAK | 23,726 | 79.37M | 95.27 | 76 | 86 | 0.469 | |
2008 | OAK | 20,559 | 47.97M | 53.41 | 75 | 86 | 0.466 | |
average | — | — | 23,991 | — | — | — | — | — |
More resources are here:
Introduction to Creating gt Tables (from the package site)
21.5 The flextable package
Another table formatting package is {flextable}.
In this first code chunk, we create use flextable()
to turn our dataframe into a flextable object, and add a “Source” annotation as a footer to our table.
library(flextable)
oakland_99 |>
flextable() |>
add_footer_row(values = "Source: baseball-reference.com", colwidths = 8)
year_num |
tm |
attend_g |
est_payroll |
pay_index |
w |
l |
w_l_percent |
---|---|---|---|---|---|---|---|
1,999 |
OAK |
17,711 |
24,831,833 |
49.54511 |
87 |
75 |
0.537 |
2,000 |
OAK |
19,799 |
33,172,333 |
59.03364 |
91 |
70 |
0.565 |
2,001 |
OAK |
26,337 |
33,810,750 |
51.51234 |
102 |
60 |
0.630 |
2,002 |
OAK |
26,788 |
40,004,167 |
59.15217 |
103 |
59 |
0.636 |
2,003 |
OAK |
27,365 |
50,260,834 |
70.82774 |
96 |
66 |
0.593 |
2,004 |
OAK |
27,179 |
59,425,667 |
85.76543 |
91 |
71 |
0.562 |
2,005 |
OAK |
26,038 |
55,425,762 |
75.95992 |
88 |
74 |
0.543 |
2,006 |
OAK |
24,403 |
64,843,079 |
83.20773 |
93 |
69 |
0.574 |
2,007 |
OAK |
23,726 |
79,366,940 |
95.27085 |
76 |
86 |
0.469 |
2,008 |
OAK |
20,559 |
47,967,126 |
53.41372 |
75 |
86 |
0.466 |
Source: baseball-reference.com |
You’ll notice that the year has a comma separating the thousands. This is because the “year” variable is numeric. In the chunk below, this is dealt with by using the as.character()
, before the flextable()
function is applied.
The code then adds a variety of formatting.
Note that the functions are format-first. For example, font size is the function fontsize()
and which row is an argument. This is opposite to what we see with {kableExtra}, where the row or column is specified with row_spec()
, and then the formatting applied.
It’s also possible to make formatting conditional on values in the table. In this code, the background colour is set using the bg()
function, but only for those cases where wins (the “w” variable) is 100 or higher.
## print table with {flextable} formatting
oakland_99 |>
mutate(year_num = as.character(year_num)) |>
## create table
flextable() |>
## add row striping
theme_zebra() |>
## set font size to 10
fontsize(size = 10, part = "all") |>
## but set variable names larger
fontsize(size = 14, part = "header") |>
## make variable names bold
bold(part = "header") |>
## make 2002 season (row 4) bold
# bold(i = 4) |>
## or with conditional specification: where year_num is 2002
bold(~ `year_num` == 2002) |>
## conditional background color as gold where wins are greater than 100
bg(~ w >= 100, bg = "gold") |>
## set nicer column labels
set_header_labels(year_num = "Year", tm = "Team", attend_g = "Attendance",
est_payroll = "Payroll ($)", pay_index = "Pay Index",
w = "Wins", l = "Losses", w_l_percent = "Win-Loss Percentage") |>
## add footer
add_footer_row(values = "Source: baseball-reference.com", colwidths = 8)
Year |
Team |
Attendance |
Payroll ($) |
Pay Index |
Wins |
Losses |
Win-Loss Percentage |
---|---|---|---|---|---|---|---|
1999 |
OAK |
17,711 |
24,831,833 |
49.54511 |
87 |
75 |
0.537 |
2000 |
OAK |
19,799 |
33,172,333 |
59.03364 |
91 |
70 |
0.565 |
2001 |
OAK |
26,337 |
33,810,750 |
51.51234 |
102 |
60 |
0.630 |
2002 |
OAK |
26,788 |
40,004,167 |
59.15217 |
103 |
59 |
0.636 |
2003 |
OAK |
27,365 |
50,260,834 |
70.82774 |
96 |
66 |
0.593 |
2004 |
OAK |
27,179 |
59,425,667 |
85.76543 |
91 |
71 |
0.562 |
2005 |
OAK |
26,038 |
55,425,762 |
75.95992 |
88 |
74 |
0.543 |
2006 |
OAK |
24,403 |
64,843,079 |
83.20773 |
93 |
69 |
0.574 |
2007 |
OAK |
23,726 |
79,366,940 |
95.27085 |
76 |
86 |
0.469 |
2008 |
OAK |
20,559 |
47,967,126 |
53.41372 |
75 |
86 |
0.466 |
Source: baseball-reference.com |
Additional {flextable} resources can be found here:
21.6 Another example
Earlier in this course, in the workflow example of the Statistics Canada New Housing Price Index (NHPI), we saw some other examples of data tables that were made publication-ready through {kableExtra} and {flextable} formatting. You are encouraged to review that code to see some of the other options that these packages make available.
-30-