Chapter 20 Building Table One for a Clinical Study

In most clinical research, whether an epidemiologic study or a clinical trial, it is important to present a summary of the baseline characteristics of the study sample. This is often referred to as a “Table 1” or “Table One”. You generally want to provide enough information about your study sample that a reader can judge whether it resembles the population, or the specific patient, they are interested in. In other words, the reader should be able to determine whether the study results generalize to their population or patient of interest.

There are a number of R packages that can help you build a Table 1. These include the appropriately named {tableone} package, the {arsenal} package, and the {compareGroups} package. But since 2020, the {gtsummary} package has become the most popular package for this purpose. This is in part because it is focused on clinical applications, and in part because it builds on the robust table infrastructure provided by the {gt} package, which is a modern approach to building tables in R.

In this chapter, we will focus on building a Table 1 using data from the {medicaldata} package, the {gtsummary} package, and the {arsenal} package. The {gtsummary} package is also generally useful for producing other tables and tables of regression results.

20.1 Packages Needed for this Chapter:

{gt}
{gtsummary}
{arsenal}
{medicaldata}
{tidyverse}
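A minimal setup sketch for the packages listed above, assuming all five are already installed from CRAN. The explicit data() call is included as a precaution so that mockstudy is available in your session.

```r
# Load the packages used in this chapter
library(tidyverse)    # provides select(), mutate(), glimpse(), etc.
library(gt)
library(gtsummary)
library(arsenal)
library(medicaldata)

# Make the mockstudy data frame available (it ships with {arsenal})
data(mockstudy, package = "arsenal")
```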

20.2 Pathway for this Chapter

Let’s start by looking at the mockstudy data from the {arsenal} package. This is a mock study of treatment of metastatic colorectal cancer with 3 different chemotherapy regimens (IFL, FOLFOX, and IROX). We will check the dimensions, then glimpse the data.

dim(mockstudy)

## [1] 1499   14

glimpse(mockstudy)

## Rows: 1,499
## Columns: 14
## $ case        <int> 110754, 99706, 105271, 105001, 112263, 86205, 99508, 90158…
## $ age         <int> 67, 74, 50, 71, 69, 56, 50, 57, 51, 63, 61, 59, 61, 59, 60…
## $ arm         <chr> "F: FOLFOX", "A: IFL", "A: IFL", "G: IROX", "F: FOLFOX", "…
## $ sex         <fct> Male, Female, Female, Female, Female, Male, Male, Male, Fe…
## $ race        <chr> "Caucasian", "Caucasian", "Caucasian", "Caucasian", NA, "C…
## $ fu.time     <int> 922, 270, 175, 128, 233, 120, 369, 421, 387, 363, 168, 106…
## $ fu.stat     <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2…
## $ ps          <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0…
## $ hgb         <dbl> 11.5, 10.7, 11.1, 12.6, 13.0, 10.2, 13.3, 12.1, 13.8, 12.1…
## $ bmi         <dbl> 25.09861, 19.49786, NA, 29.42922, 26.35352, 19.03673, 24.5…
## $ alk.phos    <int> 160, 290, 700, 771, 350, 569, 162, 152, 231, 492, 740, 239…
## $ ast         <int> 35, 52, 100, 68, 35, 27, 16, 12, 25, 18, 45, 16, 50, 80, 1…
## $ mdquality.s <int> NA, 1, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 0, 1, 1, 0, NA,…
## $ age.ord     <ord> 60-69, 70-79, 40-49, 70-79, 60-69, 50-59, 40-49, 50-59, 50…

20.3 Baseline Characteristics

For a Table One, you do not want outcomes like fu.time (follow-up time), but you do want participant characteristics that were present at the start of the study. We also don’t need trivial information like case numbers or patient IDs. Let’s drop the columns we do not need with dplyr::select().

mockstudy_baseline <- mockstudy |> 
  select(-fu.time, -fu.stat, -ps, -case, -mdquality.s, -age.ord)
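The select() call above works by dropping columns. An equivalent sketch, shown here only as an alternative style, names the columns to keep. The object names keep_cols and mockstudy_baseline_keep are made up for this illustration.

```r
library(dplyr)
data(mockstudy, package = "arsenal")

# Columns to keep, in their original order
keep_cols <- c("age", "arm", "sex", "race", "hgb", "bmi", "alk.phos", "ast")

mockstudy_baseline_keep <- mockstudy |>
  select(all_of(keep_cols))   # all_of() errors if a name is misspelled

names(mockstudy_baseline_keep)
```

Naming the columns you keep makes the contents of the table explicit, which can be easier to review than a list of exclusions.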

20.4 Building Your Table 1

The main function for building a Table 1 with gtsummary is tbl_summary(). We can pipe the selected data into this function to get a first (ungrouped) version of the table.

mockstudy_baseline |> 
  tbl_summary()

Characteristic	N = 1,499¹
Age in Years	61 (52, 68)
Treatment Arm
A: IFL	428 (29%)
F: FOLFOX	691 (46%)
G: IROX	380 (25%)
sex
Male	916 (61%)
Female	583 (39%)
Race
African-Am	115 (7.7%)
Asian	18 (1.2%)
Caucasian	1,288 (86%)
Hawaii/Pacific	5 (0.3%)
Hispanic	54 (3.6%)
Native-Am/Alaska	5 (0.3%)
Other	7 (0.5%)
Unknown	7
hgb	12.30 (11.10, 13.50)
Unknown	266
Body Mass Index (kg/m^2)	26.3 (23.5, 30.2)
Unknown	33
alk.phos	123 (86, 207)
Unknown	266
ast	27 (20, 41)
Unknown	266
¹ Median (Q1, Q3); n (%)

20.4.1 Updating Variable Labels

This very quickly gives you a clean first draft of a Table 1. However, the short variable names that were convenient for analysis are often not what you want in print, where longer or clearer labels read better. You can relabel variables with the label argument of tbl_summary(). Let’s relabel the variables sex and race in the table to make them more publication-ready.

mockstudy_baseline |> 
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"))

Characteristic	N = 1,499¹
Age in Years	61 (52, 68)
Treatment Arm
A: IFL	428 (29%)
F: FOLFOX	691 (46%)
G: IROX	380 (25%)
Sex at Birth
Male	916 (61%)
Female	583 (39%)
Race Identified by Participant
African-Am	115 (7.7%)
Asian	18 (1.2%)
Caucasian	1,288 (86%)
Hawaii/Pacific	5 (0.3%)
Hispanic	54 (3.6%)
Native-Am/Alaska	5 (0.3%)
Other	7 (0.5%)
Unknown	7
hgb	12.30 (11.10, 13.50)
Unknown	266
Body Mass Index (kg/m^2)	26.3 (23.5, 30.2)
Unknown	33
alk.phos	123 (86, 207)
Unknown	266
ast	27 (20, 41)
Unknown	266
¹ Median (Q1, Q3); n (%)
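Some rows, such as Age in Years, already had readable labels before we supplied any. These appear to come from "label" attributes stored on the columns of mockstudy, which tbl_summary() uses when present. The sketch below inspects those attributes with base R; the helper logic and the name col_labels are illustrative only.

```r
data(mockstudy, package = "arsenal")

# Pull the "label" attribute from each column; NA means no label is stored
col_labels <- vapply(
  mockstudy[c("age", "arm", "sex", "race", "hgb", "bmi")],
  function(x) {
    lab <- attr(x, "label")
    if (is.null(lab)) NA_character_ else as.character(lab)
  },
  character(1)
)

col_labels
```

Columns that return NA here are the ones shown under their raw names in the table, so they are the ones worth relabeling with the label argument.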

20.4.2 Updating Variable Values

The value for Race of “African-Am” is shorthand, but not great for publication. Let’s fix this by using mutate() with case_when() to change the value of race to “African-American” when it is “African-Am”, leaving all other values unchanged. We will do this before we pipe the data into tbl_summary().

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"))

Characteristic	N = 1,499¹
Age in Years	61 (52, 68)
Treatment Arm
A: IFL	428 (29%)
F: FOLFOX	691 (46%)
G: IROX	380 (25%)
Sex at Birth
Male	916 (61%)
Female	583 (39%)
Race Identified by Participant
African-American	115 (7.7%)
Asian	18 (1.2%)
Caucasian	1,288 (86%)
Hawaii/Pacific	5 (0.3%)
Hispanic	54 (3.6%)
Native-Am/Alaska	5 (0.3%)
Other	7 (0.5%)
Unknown	7
hgb	12.30 (11.10, 13.50)
Unknown	266
Body Mass Index (kg/m^2)	26.3 (23.5, 30.2)
Unknown	33
alk.phos	123 (86, 207)
Unknown	266
ast	27 (20, 41)
Unknown	266
¹ Median (Q1, Q3); n (%)
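Before trusting a recode in a published table, it can help to check it directly. A small sketch, assuming the same mutate() step as above, counts the values of race after recoding; race_counts is a name chosen for this example.

```r
library(dplyr)
data(mockstudy, package = "arsenal")

race_counts <- mockstudy |>
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  count(race)   # one row per value, including NA

race_counts
```

The old value should no longer appear, and the count for the new value should match the corresponding row of the table.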

20.4.3 Table 1 separated by Treatment Arm

Let’s make a Table 1 that is now separated into distinct columns by Treatment Arm, allowing readers to compare the three groups and get a sense of whether the randomization was balanced. Because we would be making many comparisons, one for each participant characteristic, it is NOT appropriate to list p values for each comparison. We will use the by = arm argument of tbl_summary() to separate the table by treatment arm. We will also use add_overall() and add_n(), as these are commonly used.

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm) |> 
  add_overall() |> 
  add_n()

Characteristic	N	Overall N = 1,499¹	A: IFL N = 428¹	F: FOLFOX N = 691¹	G: IROX N = 380¹
Age in Years	1,499	61 (52, 68)	61 (53, 68)	61 (52, 69)	61 (52, 68)
Sex at Birth	1,499
Male		916 (61%)	277 (65%)	411 (59%)	228 (60%)
Female		583 (39%)	151 (35%)	280 (41%)	152 (40%)
Race Identified by Participant	1,492
African-American		115 (7.7%)	39 (9.1%)	49 (7.2%)	27 (7.1%)
Asian		18 (1.2%)	1 (0.2%)	14 (2.0%)	3 (0.8%)
Caucasian		1,288 (86%)	371 (87%)	586 (86%)	331 (87%)
Hawaii/Pacific		5 (0.3%)	1 (0.2%)	3 (0.4%)	1 (0.3%)
Hispanic		54 (3.6%)	12 (2.8%)	28 (4.1%)	14 (3.7%)
Native-Am/Alaska		5 (0.3%)	2 (0.5%)	1 (0.1%)	2 (0.5%)
Other		7 (0.5%)	2 (0.5%)	4 (0.6%)	1 (0.3%)
Unknown		7	0	6	1
hgb	1,233	12.30 (11.10, 13.50)	12.10 (11.00, 13.50)	12.20 (11.10, 13.60)	12.40 (11.15, 13.65)
Unknown		266	69	141	56
Body Mass Index (kg/m^2)	1,466	26.3 (23.5, 30.2)	26.2 (23.6, 30.6)	26.5 (23.7, 30.1)	26.0 (23.2, 29.6)
Unknown		33	9	20	4
alk.phos	1,233	123 (86, 207)	133 (89, 219)	116 (85, 195)	122 (88, 212)
Unknown		266	69	141	56
ast	1,233	27 (20, 41)	29 (21, 42)	26 (19, 40)	27 (20, 41)
Unknown		266	69	141	56
¹ Median (Q1, Q3); n (%)

20.4.4 Styling our Table 1

Let’s redefine the Unknown values as Missing (within the tbl_summary function), and bold the labels.

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |> 
  add_overall() |> 
  add_n() |> 
  bold_labels()

Characteristic	N	Overall N = 1,499¹	A: IFL N = 428¹	F: FOLFOX N = 691¹	G: IROX N = 380¹
Age in Years	1,499	61 (52, 68)	61 (53, 68)	61 (52, 69)	61 (52, 68)
Sex at Birth	1,499
Male		916 (61%)	277 (65%)	411 (59%)	228 (60%)
Female		583 (39%)	151 (35%)	280 (41%)	152 (40%)
Race Identified by Participant	1,492
African-American		115 (7.7%)	39 (9.1%)	49 (7.2%)	27 (7.1%)
Asian		18 (1.2%)	1 (0.2%)	14 (2.0%)	3 (0.8%)
Caucasian		1,288 (86%)	371 (87%)	586 (86%)	331 (87%)
Hawaii/Pacific		5 (0.3%)	1 (0.2%)	3 (0.4%)	1 (0.3%)
Hispanic		54 (3.6%)	12 (2.8%)	28 (4.1%)	14 (3.7%)
Native-Am/Alaska		5 (0.3%)	2 (0.5%)	1 (0.1%)	2 (0.5%)
Other		7 (0.5%)	2 (0.5%)	4 (0.6%)	1 (0.3%)
Missing		7	0	6	1
hgb	1,233	12.30 (11.10, 13.50)	12.10 (11.00, 13.50)	12.20 (11.10, 13.60)	12.40 (11.15, 13.65)
Missing		266	69	141	56
Body Mass Index (kg/m^2)	1,466	26.3 (23.5, 30.2)	26.2 (23.6, 30.6)	26.5 (23.7, 30.1)	26.0 (23.2, 29.6)
Missing		33	9	20	4
alk.phos	1,233	123 (86, 207)	133 (89, 219)	116 (85, 195)	122 (88, 212)
Missing		266	69	141	56
ast	1,233	27 (20, 41)	29 (21, 42)	26 (19, 40)	27 (20, 41)
Missing		266	69	141	56
¹ Median (Q1, Q3); n (%)

20.4.5 Adding A Column Spanner

You can add column spanners to your table to group related columns together under a shared heading. In this case, it makes sense to group the three treatment arm columns under one column spanner.

tbl <- mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |> 
  add_overall() |> 
  add_n() |> 
  bold_labels() |>
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Treatment Received**") 

tbl

Characteristic	N	Overall N = 1,499¹	Treatment Received
Characteristic	N	Overall N = 1,499¹	A: IFL N = 428¹	F: FOLFOX N = 691¹	G: IROX N = 380¹
Age in Years	1,499	61 (52, 68)	61 (53, 68)	61 (52, 69)	61 (52, 68)
Sex at Birth	1,499
Male		916 (61%)	277 (65%)	411 (59%)	228 (60%)
Female		583 (39%)	151 (35%)	280 (41%)	152 (40%)
Race Identified by Participant	1,492
African-American		115 (7.7%)	39 (9.1%)	49 (7.2%)	27 (7.1%)
Asian		18 (1.2%)	1 (0.2%)	14 (2.0%)	3 (0.8%)
Caucasian		1,288 (86%)	371 (87%)	586 (86%)	331 (87%)
Hawaii/Pacific		5 (0.3%)	1 (0.2%)	3 (0.4%)	1 (0.3%)
Hispanic		54 (3.6%)	12 (2.8%)	28 (4.1%)	14 (3.7%)
Native-Am/Alaska		5 (0.3%)	2 (0.5%)	1 (0.1%)	2 (0.5%)
Other		7 (0.5%)	2 (0.5%)	4 (0.6%)	1 (0.3%)
Missing		7	0	6	1
hgb	1,233	12.30 (11.10, 13.50)	12.10 (11.00, 13.50)	12.20 (11.10, 13.60)	12.40 (11.15, 13.65)
Missing		266	69	141	56
Body Mass Index (kg/m^2)	1,466	26.3 (23.5, 30.2)	26.2 (23.6, 30.6)	26.5 (23.7, 30.1)	26.0 (23.2, 29.6)
Missing		33	9	20	4
alk.phos	1,233	123 (86, 207)	133 (89, 219)	116 (85, 195)	122 (88, 212)
Missing		266	69	141	56
ast	1,233	27 (20, 41)	29 (21, 42)	26 (19, 40)	27 (20, 41)
Missing		266	69	141	56
¹ Median (Q1, Q3); n (%)

Note that the standard column names for columns in tbl_summary() are

label
n
stat_0
stat_1
stat_2
stat_3, etc.

Note that you can confirm these with show_header_names(), and you can change them with modify_header(), which lets you take control of the line breaks with <br> and bolding with **. Let’s take a look:

show_header_names(tbl)

## Column Name   Header                       level*            N*            n*            p*             
## label         "**Characteristic**"                           1,499 <int>                                
## n             "**N**"                                                                                   
## stat_0        "**Overall**  \nN = 1,499"     Overall <chr>   1,499 <int>   1,499 <int>    1.00 <dbl>    
## stat_1        "**A: IFL**  \nN = 428"         A: IFL <chr>   1,499 <int>     428 <int>   0.286 <dbl>    
## stat_2        "**F: FOLFOX**  \nN = 691"   F: FOLFOX <chr>   1,499 <int>     691 <int>   0.461 <dbl>    
## stat_3        "**G: IROX**  \nN = 380"       G: IROX <chr>   1,499 <int>     380 <int>   0.254 <dbl>

## * These values may be dynamically placed into headers (and other locations).
## ℹ Review the `modify_header()` (`?gtsummary::modify_header()`) help for
##   examples.

tbl |> 
  modify_header(
    stat_0 ~ "**Overall**<br>N = 1,499",
    stat_1 ~ "**A: IFL**<br>N = 428",
    stat_2 ~ "**F: FOLFOX**<br>N = 691",
    stat_3 ~ "**G: IROX**<br>N = 380"
  )

Characteristic	N	Overall N = 1,499¹	Treatment Received
Characteristic	N	Overall N = 1,499¹	A: IFL N = 428¹	F: FOLFOX N = 691¹	G: IROX N = 380¹
Age in Years	1,499	61 (52, 68)	61 (53, 68)	61 (52, 69)	61 (52, 68)
Sex at Birth	1,499
Male		916 (61%)	277 (65%)	411 (59%)	228 (60%)
Female		583 (39%)	151 (35%)	280 (41%)	152 (40%)
Race Identified by Participant	1,492
African-American		115 (7.7%)	39 (9.1%)	49 (7.2%)	27 (7.1%)
Asian		18 (1.2%)	1 (0.2%)	14 (2.0%)	3 (0.8%)
Caucasian		1,288 (86%)	371 (87%)	586 (86%)	331 (87%)
Hawaii/Pacific		5 (0.3%)	1 (0.2%)	3 (0.4%)	1 (0.3%)
Hispanic		54 (3.6%)	12 (2.8%)	28 (4.1%)	14 (3.7%)
Native-Am/Alaska		5 (0.3%)	2 (0.5%)	1 (0.1%)	2 (0.5%)
Other		7 (0.5%)	2 (0.5%)	4 (0.6%)	1 (0.3%)
Missing		7	0	6	1
hgb	1,233	12.30 (11.10, 13.50)	12.10 (11.00, 13.50)	12.20 (11.10, 13.60)	12.40 (11.15, 13.65)
Missing		266	69	141	56
Body Mass Index (kg/m^2)	1,466	26.3 (23.5, 30.2)	26.2 (23.6, 30.6)	26.5 (23.7, 30.1)	26.0 (23.2, 29.6)
Missing		33	9	20	4
alk.phos	1,233	123 (86, 207)	133 (89, 219)	116 (85, 195)	122 (88, 212)
Missing		266	69	141	56
ast	1,233	27 (20, 41)	29 (21, 42)	26 (19, 40)	27 (20, 41)
Missing		266	69	141	56
¹ Median (Q1, Q3); n (%)
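The headers above hard-code each N, which can go stale if the data change. The show_header_names() output notes that values such as level and n may be placed into headers dynamically. The sketch below tries this on a reduced version of the table; tbl_dynamic is an illustrative name, and the column subset is chosen only to keep the example short.

```r
library(dplyr)
library(gtsummary)
data(mockstudy, package = "arsenal")

tbl_dynamic <- mockstudy |>
  select(age, sex, arm) |>
  tbl_summary(by = arm) |>
  add_overall() |>
  # {level} and {n} are filled in separately for each statistic column
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}")

show_header_names(tbl_dynamic)
```

One formula covers every statistic column, so the same line should keep working if an arm is added or the sample size changes.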

20.4.6 Further Styling our Table 1

You can use {gtsummary} functions to modify header names, footnotes, captions, the number of digits in numbers, and the stats used. See the tutorial article here for more information. The pdf cheatsheet is also downloadable at the same website under Articles/cheat sheet, and is very helpful.

You can convert this table to a gt table with as_gt() and further style it, using all of the table styling functions available in the {gt} package. Lots of details on how to style {gt} tables can be found in the {gt} package documentation.
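As a rough sketch of that hand-off, the code below converts a small summary table with as_gt() and then applies a few {gt} styling functions. The title text, fill color, and font size are arbitrary choices for illustration.

```r
library(dplyr)
library(gtsummary)
library(gt)
data(mockstudy, package = "arsenal")

gt_tbl <- mockstudy |>
  select(age, sex, arm) |>
  tbl_summary(by = arm, missing_text = "Missing") |>
  bold_labels() |>
  as_gt() |>                                    # from here on, use {gt} functions
  tab_header(title = "Baseline Characteristics") |>
  tab_style(
    style = cell_fill(color = "lightcyan"),     # shade the column header row
    locations = cells_column_labels()
  ) |>
  tab_options(table.font.size = px(14))

gt_tbl
```

Once a table has been converted, {gtsummary} functions no longer apply to it, so it is usually easiest to finish all {gtsummary} modifications before calling as_gt().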

20.4.7 Your Turn

In the last code block, we saved the table as the object tbl. Take this tbl object and

fix up the labels for hgb, alk.phos, and ast to be more publication-ready.
add a caption to the table that says “Baseline Characteristics of Participants in the Mock Study by Treatment Arm”
add a footnote to the table that says “Note: Missing values are indicated as ‘Missing’ in the table.”
improve the value labels for ‘Hawaii/Pacific’, and ‘Native-Am/Alaska’ to be “Native Hawaiian or other Pacific Islander” and “American Indian or Alaska Native” respectively.
change the number of digits in the table to 2.
change the stats to show the mean and standard deviation for continuous variables, and the count and percent for categorical variables.
convert the table to a gt table and style it with the gt package. Add some color.

The code block below starts with tbl, unmodified. Add a pipe and start fixing it up, with the goals above. Refer to the {gtsummary} and {gt} documentation for help. Click on the links below as needed.

gtsummary: tbl_summary()
gt: gt()

tbl

Characteristic	N	Overall N = 1,499¹	Treatment Received
Characteristic	N	Overall N = 1,499¹	A: IFL N = 428¹	F: FOLFOX N = 691¹	G: IROX N = 380¹
Age in Years	1,499	61 (52, 68)	61 (53, 68)	61 (52, 69)	61 (52, 68)
Sex at Birth	1,499
Male		916 (61%)	277 (65%)	411 (59%)	228 (60%)
Female		583 (39%)	151 (35%)	280 (41%)	152 (40%)
Race Identified by Participant	1,492
African-American		115 (7.7%)	39 (9.1%)	49 (7.2%)	27 (7.1%)
Asian		18 (1.2%)	1 (0.2%)	14 (2.0%)	3 (0.8%)
Caucasian		1,288 (86%)	371 (87%)	586 (86%)	331 (87%)
Hawaii/Pacific		5 (0.3%)	1 (0.2%)	3 (0.4%)	1 (0.3%)
Hispanic		54 (3.6%)	12 (2.8%)	28 (4.1%)	14 (3.7%)
Native-Am/Alaska		5 (0.3%)	2 (0.5%)	1 (0.1%)	2 (0.5%)
Other		7 (0.5%)	2 (0.5%)	4 (0.6%)	1 (0.3%)
Missing		7	0	6	1
hgb	1,233	12.30 (11.10, 13.50)	12.10 (11.00, 13.50)	12.20 (11.10, 13.60)	12.40 (11.15, 13.65)
Missing		266	69	141	56
Body Mass Index (kg/m^2)	1,466	26.3 (23.5, 30.2)	26.2 (23.6, 30.6)	26.5 (23.7, 30.1)	26.0 (23.2, 29.6)
Missing		33	9	20	4
alk.phos	1,233	123 (86, 207)	133 (89, 219)	116 (85, 195)	122 (88, 212)
Missing		266	69	141	56
ast	1,233	27 (20, 41)	29 (21, 42)	26 (19, 40)	27 (20, 41)
Missing		266	69	141	56
¹ Median (Q1, Q3); n (%)
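As a starting hint for two of the goals above, here is a sketch that sets a label inside tbl_summary() and adds a caption with modify_caption(). It rebuilds a reduced table rather than modifying tbl, the label text "Hemoglobin" is only a suggestion, and tbl_hint is a made-up name.

```r
library(dplyr)
library(gtsummary)
data(mockstudy, package = "arsenal")

tbl_hint <- mockstudy |>
  select(age, hgb, arm) |>
  tbl_summary(
    by = arm,
    missing_text = "Missing",
    label = hgb ~ "Hemoglobin"      # labels are set when the table is built
  ) |>
  modify_caption(
    "**Baseline Characteristics of Participants in the Mock Study by Treatment Arm**"
  )

tbl_hint
```

Goals that depend on the data or on the statistics shown, such as value labels, digits, and mean (SD), are also set before or inside tbl_summary(), so expect to rebuild the table for those rather than piping from tbl.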

20.5 Try this with a new dataset

Now we will use the trial dataset from the {gtsummary} package to create a Table 1. The trial dataset is a simulated dataset of a clinical trial with 200 observations and 8 variables. Run the code block below. Which are baseline variables for Table 1, and which are outcome variables? Use your R skills to:

select the baseline variables
create a Table 1, divided by treatment (trt), and
set the missing_text to “Missing”.

trial

## # A tibble: 200 × 8
##    trt      age marker stage grade response death ttdeath
##    <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
##  1 Drug A    23  0.16  T1    II           0     0    24  
##  2 Drug B     9  1.11  T2    I            1     0    24  
##  3 Drug A    31  0.277 T1    II           0     0    24  
##  4 Drug A    NA  2.07  T3    III          1     1    17.6
##  5 Drug A    51  2.77  T4    III          1     1    16.4
##  6 Drug B    39  0.613 T4    I            0     1    15.6
##  7 Drug A    37  0.354 T1    II           0     0    24  
##  8 Drug A    32  1.74  T1    I            0     1    18.4
##  9 Drug A    31  0.144 T1    II           0     0    24  
## 10 Drug B    34  0.205 T3    I            0     1    10.5
## # ℹ 190 more rows

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing")

Characteristic	Drug A N = 98¹	Drug B N = 102¹
Age	46 (37, 60)	48 (39, 56)
Missing	7	4
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)
Missing	6	4
T Stage
T1	28 (29%)	25 (25%)
T2	25 (26%)	29 (28%)
T3	22 (22%)	21 (21%)
T4	23 (23%)	27 (26%)
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
¹ Median (Q1, Q3); n (%)

20.6 Making Modifications to the trial table

Let’s do some basics, like adding a column for N, adding the overall column, and bolding the labels.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |> 
  add_n() |>
  add_overall() |>
  bold_labels()

Characteristic	N	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹
Age	189	47 (38, 57)	46 (37, 60)	48 (39, 56)
Missing		11	7	4
Marker Level (ng/mL)	190	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)
Missing		10	6	4
T Stage	200
T1		53 (27%)	28 (29%)	25 (25%)
T2		54 (27%)	25 (26%)	29 (28%)
T3		43 (22%)	22 (22%)	21 (21%)
T4		50 (25%)	23 (23%)	27 (26%)
Grade	200
I		68 (34%)	35 (36%)	33 (32%)
II		68 (34%)	32 (33%)	36 (35%)
III		64 (32%)	31 (32%)	33 (32%)
¹ Median (Q1, Q3); n (%)

20.7 More Modifications to the trial table

Let’s change the headers so that they have line breaks (and no commas), change the label Characteristic to ‘Participant Characteristic’, and add a column spanner for the treatment arms. Use flanking double asterisks (**) to bold the headers as needed.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")

Participant Characteristic	N	Overall N = 200¹	Treatment Arm
Participant Characteristic	N	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹
Age	189	47 (38, 57)	46 (37, 60)	48 (39, 56)
Missing		11	7	4
Marker Level (ng/mL)	190	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)
Missing		10	6	4
T Stage	200
T1		53 (27%)	28 (29%)	25 (25%)
T2		54 (27%)	25 (26%)	29 (28%)
T3		43 (22%)	22 (22%)	21 (21%)
T4		50 (25%)	23 (23%)	27 (26%)
Grade	200
I		68 (34%)	35 (36%)	33 (32%)
II		68 (34%)	32 (33%)	36 (35%)
III		64 (32%)	31 (32%)	33 (32%)
¹ Median (Q1, Q3); n (%)

20.8 Taking Control of the Stats

You can take control of what statistics are shown. Let’s see an example where you require mean and SD for all continuous variables in the table, set digits to 2, and show fractions and percentages for all categorical values. These changes are made within the tbl_summary() function. Note how the footnote changes to keep up.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(
    by = trt,
    missing_text = "Missing",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    label = grade ~ "Tumor Grade"
  ) |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")

Participant Characteristic	N	Overall N = 200¹	Treatment Arm
Participant Characteristic	N	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹
Age	189	47.24 (14.31)	47.01 (14.71)	47.45 (14.01)
Missing		11	7	4
Marker Level (ng/mL)	190	0.92 (0.86)	1.02 (0.89)	0.82 (0.83)
Missing		10	6	4
T Stage	200
T1		53 / 200 (27%)	28 / 98 (29%)	25 / 102 (25%)
T2		54 / 200 (27%)	25 / 98 (26%)	29 / 102 (28%)
T3		43 / 200 (22%)	22 / 98 (22%)	21 / 102 (21%)
T4		50 / 200 (25%)	23 / 98 (23%)	27 / 102 (26%)
Tumor Grade	200
I		68 / 200 (34%)	35 / 98 (36%)	33 / 102 (32%)
II		68 / 200 (34%)	32 / 98 (33%)	36 / 102 (35%)
III		64 / 200 (32%)	31 / 98 (32%)	33 / 102 (32%)
¹ Mean (SD); n / N (%)
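The selectors all_continuous() and all_categorical() apply one format to a whole class of variables. The statistic and digits arguments also accept formulas that name individual variables, which is useful when one variable is skewed and another is not. A minimal sketch, with tbl_mixed as an illustrative name and the choice of mean for age and median for marker made only for demonstration:

```r
library(dplyr)
library(gtsummary)

tbl_mixed <- trial |>
  select(age, marker, trt) |>
  tbl_summary(
    by = trt,
    missing_text = "Missing",
    statistic = list(
      age ~ "{mean} ({sd})",                  # mean (SD) for age
      marker ~ "{median} ({p25}, {p75})"      # median (Q1, Q3) for marker
    ),
    digits = list(age ~ 2)                    # two decimals for age only
  )

tbl_mixed
```

When different variables use different statistics, check the footnote of the rendered table to confirm it describes each one.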

20.8.1 Your Turn

The original streptomycin for tuberculosis trial never had a Table 1, because it was published in 1948, when these did not exist. Let’s make one now. Gender and the baseline_x variables in the strep_tb dataset will be the baseline variables. The arm variable will be the treatment variable. Use the strep_tb dataset from the {medicaldata} package to create a Table 1. You will need to run data(strep_tb) to load the dataset. Then pipe the strep_tb dataset into select(), then into tbl_summary(). Improve the table from the generic baseline version. Add details and styling that you think would be helpful to the reader, and make it publication-ready. Consider cleaning up some of the value labels with mutate() statements.

strep_tb |> 
  select(gender, starts_with("baseline"), arm) |>
  tbl_summary(by = arm, missing_text = "Missing",
              label = list(
                gender ~ "Gender",
                baseline_condition ~ "Baseline Condition",
                baseline_temp ~ "Baseline Temperature",
                baseline_esr ~ "Baseline ESR",
                baseline_cavitation ~ "Baseline Cavitation"
              )) |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 107",
    stat_1 ~ "**Control**<br>N = 52",
    stat_2 ~ "**Streptomycin**<br>N = 55"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")

Participant Characteristic	N	Overall N = 107¹	Treatment Arm
Participant Characteristic	N	Overall N = 107¹	Control N = 52¹	Streptomycin N = 55¹
Gender	107
F		59 (55%)	28 (54%)	31 (56%)
M		48 (45%)	24 (46%)	24 (44%)
Baseline Condition	107
1_Good		16 (15%)	8 (15%)	8 (15%)
2_Fair		37 (35%)	20 (38%)	17 (31%)
3_Poor		54 (50%)	24 (46%)	30 (55%)
Baseline Temperature	107
1_<=98.9F/37.2C		7 (6.5%)	4 (7.7%)	3 (5.5%)
2_99-99.9F/37.3-37.7C		24 (22%)	12 (23%)	12 (22%)
2_99-99.9F/37.3-37.7C/37.3-37.7C		1 (0.9%)	0 (0%)	1 (1.8%)
3_100-100.9F/37.8-38.2C		31 (29%)	17 (33%)	14 (25%)
3_100-100.9F/37.8-38.2C/37.8-38.2C		1 (0.9%)	0 (0%)	1 (1.8%)
4_>=101F/38.3C		43 (40%)	19 (37%)	24 (44%)
Baseline ESR	106
2_11-20		5 (4.7%)	2 (3.9%)	3 (5.5%)
3_21-50		36 (34%)	20 (39%)	16 (29%)
4_51+		65 (61%)	29 (57%)	36 (65%)
Missing		1	1	0
Baseline Cavitation	107	62 (58%)	30 (58%)	32 (58%)
¹ n (%)
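The Baseline Temperature rows above include values with a duplicated Celsius range (for example, 2_99-99.9F/37.3-37.7C/37.3-37.7C) and numeric sorting prefixes. One possible cleanup, sketched below, collapses the duplicated chunk and rebuilds the variable as a factor that keeps the original order but hides the prefix. The regular expressions and the names temp_raw and strep_clean are assumptions made for this example, written against the values shown in the table.

```r
library(dplyr)
library(stringr)
data(strep_tb, package = "medicaldata")

strep_clean <- strep_tb |>
  mutate(
    # Collapse a repeated trailing "/..." chunk into a single copy
    temp_raw = str_replace(as.character(baseline_temp), "(/[^/]+)\\1$", "\\1"),
    # Order levels by the numeric prefix, but display them without it
    baseline_temp = factor(
      temp_raw,
      levels = sort(unique(temp_raw)),
      labels = str_remove(sort(unique(temp_raw)), "^\\d_")
    )
  ) |>
  select(-temp_raw)

levels(strep_clean$baseline_temp)
```

Piping strep_clean into the tbl_summary() code above should then give one row per temperature category. The same prefix-removal idea could be applied to baseline_condition and baseline_esr.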