Chapter 18 Building Table One for a Clinical Study

In most clinical research, whether an epidemiologic study or a clinical trial, it is important to present a summary of the baseline characteristics of the study sample. This is often referred to as a “Table 1” or “Table One”. It should give readers enough information to judge whether your study sample resembles the population, or the specific patient, they are interested in, and therefore whether your results generalize to them.

There are a number of R packages that can help you build a Table 1. These include the appropriately named {tableone} package, the {arsenal} package, and the {compareGroups} package. But since 2020, the {gtsummary} package has become the most popular package for this purpose, in part because it is focused on clinical applications, and in part because it builds on the robust table infrastructure of the {gt} package, a modern approach to building tables in R.

In this chapter, we will build Table 1s with the {gtsummary} package, using example datasets from the {arsenal}, {gtsummary}, and {medicaldata} packages. The {gtsummary} package is also useful for producing other summary tables and tables of regression results.

18.1 Packages Needed for this Chapter:

  • {gt}
  • {gtsummary}
  • {arsenal}
  • {medicaldata}
  • {tidyverse}
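If any of these packages are not yet installed, one way to install only the missing ones and then attach all five is sketched below. It assumes installation from CRAN through the cloud.r-project.org mirror; any CRAN mirror would work.

```r
# Packages used in this chapter
pkgs <- c("gt", "gtsummary", "arsenal", "medicaldata", "tidyverse")

# Find packages that cannot be loaded yet and install only those from CRAN
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach every package for this session
invisible(lapply(pkgs, library, character.only = TRUE))
```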

18.2 Pathway for this Chapter

Let’s start by looking at the mockstudy data from the {arsenal} package. This is a mock study of treatment of cholangiocarcinoma with 3 different chemotherapy regimens. We will check its dimensions, then glimpse() the data.

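library(arsenal)    # provides the mockstudy data
library(tidyverse)  # provides glimpse() and the dplyr verbs
dim(mockstudy)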
## [1] 1499   14
glimpse(mockstudy)
## Rows: 1,499
## Columns: 14
## $ case        <int> 110754, 99706, 105271, 105001, 112263,…
## $ age         <int> 67, 74, 50, 71, 69, 56, 50, 57, 51, 63…
## $ arm         <chr> "F: FOLFOX", "A: IFL", "A: IFL", "G: I…
## $ sex         <fct> Male, Female, Female, Female, Female, …
## $ race        <chr> "Caucasian", "Caucasian", "Caucasian",…
## $ fu.time     <int> 922, 270, 175, 128, 233, 120, 369, 421…
## $ fu.stat     <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ ps          <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1,…
## $ hgb         <dbl> 11.5, 10.7, 11.1, 12.6, 13.0, 10.2, 13…
## $ bmi         <dbl> 25.09861, 19.49786, NA, 29.42922, 26.3…
## $ alk.phos    <int> 160, 290, 700, 771, 350, 569, 162, 152…
## $ ast         <int> 35, 52, 100, 68, 35, 27, 16, 12, 25, 1…
## $ mdquality.s <int> NA, 1, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA…
## $ age.ord     <ord> 60-69, 70-79, 40-49, 70-79, 60-69, 50-…
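Before building a table, it can help to see how many values are missing in each column, since tbl_summary() will report these as Unknown rows. A minimal base-R sketch, assuming {arsenal} is installed:

```r
library(arsenal)  # provides the mockstudy data

# Count the missing values in every column
na_counts <- colSums(is.na(mockstudy))

# Keep only the columns that have at least one missing value
na_counts[na_counts > 0]
```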

18.3 Baseline Characteristics

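For a Table 1, you do not want outcomes like fu.time (follow-up time) or fu.stat (follow-up status), but you do want participant characteristics that were present at the start of the study. We also don’t need administrative details like case numbers or patient IDs. Let’s drop these columns with dplyr::select(), along with ps, mdquality.s, and age.ord (age grouped into decades, which duplicates age), which we will not use here.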

mockstudy_baseline <- mockstudy |> 
  select(-fu.time, -fu.stat, -ps, -case, -mdquality.s, -age.ord)
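Dropping columns is one approach; listing the baseline columns you want to keep is an equivalent, and arguably more explicit, one. The sketch below (mockstudy_drop and mockstudy_keep are illustrative names) checks that both approaches give the same columns:

```r
library(arsenal)
library(dplyr)

# Approach 1: drop outcomes, IDs, and unused columns
mockstudy_drop <- mockstudy |>
  select(-fu.time, -fu.stat, -ps, -case, -mdquality.s, -age.ord)

# Approach 2: name the baseline columns to keep
mockstudy_keep <- mockstudy |>
  select(age, arm, sex, race, hgb, bmi, alk.phos, ast)

names(mockstudy_keep)
```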

18.4 Building Your Table 1

The main function for building a Table 1 with gtsummary is tbl_summary(). We can pipe the selected data into this function to get a first (ungrouped) version of the table.

library(gtsummary)

mockstudy_baseline |> 
  tbl_summary()
| Characteristic | N = 1,499¹ |
|---|---|
| Age in Years | 61 (52, 68) |
| Treatment Arm | |
|     A: IFL | 428 (29%) |
|     F: FOLFOX | 691 (46%) |
|     G: IROX | 380 (25%) |
| sex | |
|     Male | 916 (61%) |
|     Female | 583 (39%) |
| Race | |
|     African-Am | 115 (7.7%) |
|     Asian | 18 (1.2%) |
|     Caucasian | 1,288 (86%) |
|     Hawaii/Pacific | 5 (0.3%) |
|     Hispanic | 54 (3.6%) |
|     Native-Am/Alaska | 5 (0.3%) |
|     Other | 7 (0.5%) |
|     Unknown | 7 |
| hgb | 12.30 (11.10, 13.50) |
|     Unknown | 266 |
| Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
|     Unknown | 33 |
| alk.phos | 123 (86, 207) |
|     Unknown | 266 |
| ast | 27 (20, 41) |
|     Unknown | 266 |
¹ Median (IQR); n (%)
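To see where these numbers come from, the sketch below reproduces the age and treatment-arm rows by hand. It assumes that {gtsummary} computes percentiles with quantile(type = 2), as we understand its documentation to describe; with this many tied integer ages, other quantile types would likely round to the same values.

```r
library(arsenal)  # provides the mockstudy data

age <- mockstudy$age

# Median and 25th/75th percentiles (type = 2 assumed to match gtsummary)
age_median <- median(age)
age_iqr <- quantile(age, probs = c(0.25, 0.75), type = 2)
c(median = age_median, age_iqr)

# Counts and whole-number percentages for treatment arm
arm_counts <- table(mockstudy$arm)
arm_pct <- round(100 * prop.table(arm_counts))
arm_counts
arm_pct
```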

18.4.1 Updating Variable Labels

This quickly gives you a clean Table 1 that is close to publication-ready. Some variables already display descriptive labels (such as Age in Years and Treatment Arm) because mockstudy stores a label attribute for them, which tbl_summary() uses automatically. Others, such as sex, hgb, and ast, show their raw column names, which are convenient for analysis but not ideal for publication. You can supply longer, clearer labels with the label argument of tbl_summary(). Let’s relabel sex and race to make them more publication-ready.

mockstudy_baseline |> 
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"))
| Characteristic | N = 1,499¹ |
|---|---|
| Age in Years | 61 (52, 68) |
| Treatment Arm | |
|     A: IFL | 428 (29%) |
|     F: FOLFOX | 691 (46%) |
|     G: IROX | 380 (25%) |
| Sex at Birth | |
|     Male | 916 (61%) |
|     Female | 583 (39%) |
| Race Identified by Participant | |
|     African-Am | 115 (7.7%) |
|     Asian | 18 (1.2%) |
|     Caucasian | 1,288 (86%) |
|     Hawaii/Pacific | 5 (0.3%) |
|     Hispanic | 54 (3.6%) |
|     Native-Am/Alaska | 5 (0.3%) |
|     Other | 7 (0.5%) |
|     Unknown | 7 |
| hgb | 12.30 (11.10, 13.50) |
|     Unknown | 266 |
| Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
|     Unknown | 33 |
| alk.phos | 123 (86, 207) |
|     Unknown | 266 |
| ast | 27 (20, 41) |
|     Unknown | 266 |
¹ Median (IQR); n (%)
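The descriptive labels such as Age in Years come from a label attribute stored on each column, which tbl_summary() picks up. The sketch below shows that setting this attribute yourself is an alternative to the label argument (df and tbl_lab are illustrative names):

```r
library(arsenal)
library(gtsummary)

# The existing label stored on the age column
attr(mockstudy$age, "label")

# Set a label attribute directly instead of using tbl_summary(label = ...)
df <- mockstudy[, c("age", "sex")]
attr(df$sex, "label") <- "Sex at Birth"

tbl_lab <- tbl_summary(df)
tbl_lab$table_body$label
```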

18.4.2 Updating Variable Values

The Race value “African-Am” is shorthand that is not ideal for publication. Let’s change it to “African-American” with mutate() and case_when(), leaving every other value of race unchanged with .default = race. We do this before piping the data into tbl_summary().

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"))
| Characteristic | N = 1,499¹ |
|---|---|
| Age in Years | 61 (52, 68) |
| Treatment Arm | |
|     A: IFL | 428 (29%) |
|     F: FOLFOX | 691 (46%) |
|     G: IROX | 380 (25%) |
| Sex at Birth | |
|     Male | 916 (61%) |
|     Female | 583 (39%) |
| Race Identified by Participant | |
|     African-American | 115 (7.7%) |
|     Asian | 18 (1.2%) |
|     Caucasian | 1,288 (86%) |
|     Hawaii/Pacific | 5 (0.3%) |
|     Hispanic | 54 (3.6%) |
|     Native-Am/Alaska | 5 (0.3%) |
|     Other | 7 (0.5%) |
|     Unknown | 7 |
| hgb | 12.30 (11.10, 13.50) |
|     Unknown | 266 |
| Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
|     Unknown | 33 |
| alk.phos | 123 (86, 207) |
|     Unknown | 266 |
| ast | 27 (20, 41) |
|     Unknown | 266 |
¹ Median (IQR); n (%)
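Two details are worth checking after a recode like this: the relabeled value should keep its count, and missing values should stay missing. The sketch below uses dplyr::case_match(), available in dplyr 1.1.0 and later, as a compact alternative to case_when() when matching exact values:

```r
library(arsenal)
library(dplyr)

race_fixed <- mockstudy |>
  mutate(race = case_match(race,
                           "African-Am" ~ "African-American",
                           .default = race)) |>  # unmatched values, including NA, pass through
  pull(race)

# The relabeled value keeps its count, and NAs remain NA
sum(race_fixed == "African-American", na.rm = TRUE)
sum(is.na(race_fixed))
```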

18.4.3 Table 1 separated by Treatment Arm

Let’s make a Table 1 that is now separated into distinct columns by treatment arm, allowing readers to compare the three arms and get a sense of whether the randomization was balanced. Because we are making many comparisons across the participant characteristics, and because in a randomized trial any baseline differences arose by chance, it is NOT appropriate to list p values for each comparison. We will use the by = arm argument of tbl_summary() to separate the table by treatment arm. We will also add an overall column with add_overall() and a column of non-missing counts with add_n(), as these are commonly used.

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm) |> 
  add_overall() |> 
  add_n()
| Characteristic | N | Overall, N = 1,499¹ | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
|---|---|---|---|---|---|
| Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
| Sex at Birth | 1,499 | | | | |
|     Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
|     Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
| Race Identified by Participant | 1,492 | | | | |
|     African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
|     Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
|     Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
|     Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
|     Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
|     Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
|     Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
|     Unknown | | 7 | 0 | 6 | 1 |
| hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
|     Unknown | | 266 | 69 | 141 | 56 |
| Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
|     Unknown | | 33 | 9 | 20 | 4 |
| alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
|     Unknown | | 266 | 69 | 141 | 56 |
| ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
|     Unknown | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
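The percentages in each arm column are column percentages, computed within that arm rather than over the whole sample. A base-R sketch that reproduces the Sex at Birth rows:

```r
library(arsenal)  # provides the mockstudy data

# Cross-tabulate sex (rows) by treatment arm (columns)
sex_by_arm <- table(mockstudy$sex, mockstudy$arm)

# margin = 2 divides each count by its column (arm) total
col_pct <- round(100 * prop.table(sex_by_arm, margin = 2))
sex_by_arm
col_pct
```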

18.4.4 Styling our Table 1

Let’s relabel the Unknown rows as Missing with the missing_text argument of tbl_summary(), and bold the variable labels with bold_labels().

mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |> 
  add_overall() |> 
  add_n() |> 
  bold_labels() 
| Characteristic | N | Overall, N = 1,499¹ | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
|---|---|---|---|---|---|
| Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
| Sex at Birth | 1,499 | | | | |
|     Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
|     Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
| Race Identified by Participant | 1,492 | | | | |
|     African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
|     Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
|     Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
|     Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
|     Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
|     Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
|     Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
|     Missing | | 7 | 0 | 6 | 1 |
| hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
|     Missing | | 266 | 69 | 141 | 56 |
| Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
|     Missing | | 33 | 9 | 20 | 4 |
| alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
|     Missing | | 266 | 69 | 141 | 56 |
| ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
|     Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
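The missing argument of tbl_summary() controls whether Missing rows appear at all: "ifany" (the default) shows them only for variables with missing values, "always" shows them for every variable, and "no" hides them. The sketch below compares "ifany" and "no" by counting the missing rows in each table's table_body; it is shown only to illustrate the option (small, tbl_ifany, and tbl_no are illustrative names).

```r
library(arsenal)
library(dplyr)
library(gtsummary)

small <- mockstudy |> select(race, hgb, arm)

# Default "ifany": a Missing row for each variable that has NAs (race and hgb)
tbl_ifany <- tbl_summary(small, by = arm, missing_text = "Missing")

# "no": Missing rows are dropped from the table
tbl_no <- tbl_summary(small, by = arm, missing = "no")

table(tbl_ifany$table_body$row_type)
table(tbl_no$table_body$row_type)
```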

18.4.5 Adding A Column Spanner

You can add column spanners to group related columns under a shared header. Here, it makes sense to group the three treatment-arm columns under a single spanner.

tbl <- mockstudy_baseline |> 
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list( sex ~ "Sex at Birth",
                            race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |> 
  add_overall() |> 
  add_n() |> 
  bold_labels() |>
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Treatment Received**") 

tbl
| | | | Treatment Received | | |
| Characteristic | N | Overall, N = 1,499¹ | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
|---|---|---|---|---|---|
| Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
| Sex at Birth | 1,499 | | | | |
|     Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
|     Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
| Race Identified by Participant | 1,492 | | | | |
|     African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
|     Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
|     Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
|     Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
|     Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
|     Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
|     Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
|     Missing | | 7 | 0 | 6 | 1 |
| hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
|     Missing | | 266 | 69 | 141 | 56 |
| Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
|     Missing | | 33 | 9 | 20 | 4 |
| alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
|     Missing | | 266 | 69 | 141 | 56 |
| ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
|     Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
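Rather than typing out stat_1, stat_2, and stat_3, you can select the group columns with the {gtsummary} helper all_stat_cols(), which by default also includes the overall stat_0 column. A sketch on a smaller table for brevity (tbl_span is an illustrative name):

```r
library(arsenal)
library(dplyr)
library(gtsummary)

tbl_span <- mockstudy |>
  select(age, sex, arm) |>
  tbl_summary(by = arm) |>
  add_overall() |>
  # all_stat_cols(stat_0 = FALSE) selects stat_1, stat_2, ... but not stat_0
  modify_spanning_header(all_stat_cols(stat_0 = FALSE) ~ "**Treatment Received**")

tbl_span
```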

Note that the standard internal column names in a tbl_summary() table are:

  • label
  • n (added by add_n())
  • stat_0 (the overall column)
  • stat_1, stat_2, stat_3, etc. (one per group)

You can confirm these with show_header_names(), and you can change the headers with modify_header(), using <br> for line breaks and flanking ** for bolding. Let’s take a look:

show_header_names(tbl)
## ℹ As a usage guide, the code below re-creates the current column headers.
## modify_header(
##   label = '**Characteristic**',
##   n = '**N**',
##   stat_0 = '**Overall**, N = 1,499',
##   stat_1 = '**A: IFL**, N = 428',
##   stat_2 = '**F: FOLFOX**, N = 691',
##   stat_3 = '**G: IROX**, N = 380'
## )
## 
## 
## Column Name   Column Header          
## ------------  -----------------------
## label         **Characteristic**     
## n             **N**                  
## stat_0        **Overall**, N = 1,499 
## stat_1        **A: IFL**, N = 428    
## stat_2        **F: FOLFOX**, N = 691 
## stat_3        **G: IROX**, N = 380
tbl |> 
  modify_header(update = list(
    stat_0 ~ "**Overall**<br>N = 1,499",
    stat_1 ~ "**A: IFL**<br>N = 428",
    stat_2 ~ "**F: FOLFOX**<br>N = 691",
    stat_3 ~ "**G: IROX**<br>N = 380"
  ))
| | | | Treatment Received | | |
| Characteristic | N | Overall | A: IFL | F: FOLFOX | G: IROX |
| | | N = 1,499¹ | N = 428¹ | N = 691¹ | N = 380¹ |
|---|---|---|---|---|---|
| Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
| Sex at Birth | 1,499 | | | | |
|     Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
|     Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
| Race Identified by Participant | 1,492 | | | | |
|     African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
|     Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
|     Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
|     Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
|     Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
|     Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
|     Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
|     Missing | | 7 | 0 | 6 | 1 |
| hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
|     Missing | | 266 | 69 | 141 | 56 |
| Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
|     Missing | | 33 | 9 | 20 | 4 |
| alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
|     Missing | | 266 | 69 | 141 | 56 |
| ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
|     Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
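Hard-coding the counts in the headers, as above, means they must be updated by hand if the data change. As a sketch, modify_header() also accepts glue-style fields such as {level} (the group name), {n} (the group size), and {N} (the total sample size), as described in the {gtsummary} documentation (tbl_glue is an illustrative name):

```r
library(arsenal)
library(dplyr)
library(gtsummary)

tbl_glue <- mockstudy |>
  select(age, sex, arm) |>
  tbl_summary(by = arm) |>
  add_overall() |>
  modify_header(update = list(
    # {level} = group name, {n} = group size, {N} = full sample size
    all_stat_cols(stat_0 = FALSE) ~ "**{level}**<br>N = {n}",
    stat_0 ~ "**Overall**<br>N = {N}"
  ))

tbl_glue
```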

18.4.6 Further Styling our Table 1

You can use {gtsummary} functions to modify header names, footnotes, captions, the number of digits shown, and the statistics reported. See the tbl_summary() tutorial on the {gtsummary} package website for more information. The PDF cheat sheet, also downloadable from that website under Articles, is very helpful.

You can convert this table to a gt table with as_gt() and further style it, using all of the table styling functions available in the {gt} package. Lots of details on how to style {gt} tables can be found in the {gt} package documentation.
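A minimal sketch of that hand-off, using the trial data from {gtsummary} to keep it short; the title text and the colors are arbitrary choices for illustration:

```r
library(dplyr)
library(gtsummary)
library(gt)

gt_tbl <- trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  bold_labels() |>
  as_gt() |>  # from here on, {gt} functions apply
  tab_header(title = "Baseline Characteristics by Treatment") |>
  tab_options(column_labels.background.color = "lightblue") |>
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(columns = label)  # shade the Characteristic column
  )

gt_tbl
```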

18.4.7 Your Turn

In the column spanner section above, we saved the table as the object tbl. Take this tbl object and

  • fix up the labels for hgb, alk.phos, and ast to be more publication-ready.
  • add a caption to the table that says “Baseline Characteristics of Participants in the Mock Study by Treatment Arm”
  • add a footnote to the table that says “Note: Missing values are indicated as ‘Missing’ in the table.”
  • improve the value labels ‘Hawaii/Pacific’ and ‘Native-Am/Alaska’ to read “Native Hawaiian or other Pacific Islander” and “American Indian or Alaska Native”, respectively.
  • change the number of digits in the table to 2.
  • change the stats to show the mean and standard deviation for continuous variables, and the count and percent for categorical variables.
  • convert the table to a gt table and style it with the gt package. Add some color.

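The code block below starts with tbl, unmodified. Add a pipe and start fixing it up, with the goals above. Note that some goals (the value labels, the digits, and the statistics) have to be changed upstream, in the mutate() step or inside tbl_summary(), rather than by piping tbl onward. Refer to the {gtsummary} and {gt} documentation for help.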

tbl

18.5 Try this with a new dataset

Now we will use the trial dataset from the {gtsummary} package to create a Table 1. The trial dataset is a simulated clinical trial dataset with 200 observations and 8 variables. Run the code block below. Which are baseline variables for Table 1, and which are outcome variables? Use your R skills to:

  • select the baseline variables
  • create a Table 1, divided by treatment (trt), and
  • set the missing_text to “Missing”.
trial
## # A tibble: 200 × 8
##    trt      age marker stage grade response death ttdeath
##    <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
##  1 Drug A    23  0.16  T1    II           0     0    24  
##  2 Drug B     9  1.11  T2    I            1     0    24  
##  3 Drug A    31  0.277 T1    II           0     0    24  
##  4 Drug A    NA  2.07  T3    III          1     1    17.6
##  5 Drug A    51  2.77  T4    III          1     1    16.4
##  6 Drug B    39  0.613 T4    I            0     1    15.6
##  7 Drug A    37  0.354 T1    II           0     0    24  
##  8 Drug A    32  1.74  T1    I            0     1    18.4
##  9 Drug A    31  0.144 T1    II           0     0    24  
## 10 Drug B    34  0.205 T3    I            0     1    10.5
## # ℹ 190 more rows
trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing")
| Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
|---|---|---|
| Age | 46 (37, 59) | 48 (39, 56) |
|     Missing | 7 | 4 |
| Marker Level (ng/mL) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
|     Missing | 6 | 4 |
| T Stage | | |
|     T1 | 28 (29%) | 25 (25%) |
|     T2 | 25 (26%) | 29 (28%) |
|     T3 | 22 (22%) | 21 (21%) |
|     T4 | 23 (23%) | 27 (26%) |
| Grade | | |
|     I | 35 (36%) | 33 (32%) |
|     II | 32 (33%) | 36 (35%) |
|     III | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)
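One way to answer the question above in code is to name the outcome columns (response, death, and ttdeath, which are recorded after treatment) and drop them, keeping everything else. A sketch (outcome_vars and trial_baseline are illustrative names):

```r
library(dplyr)
library(gtsummary)  # provides the trial data

# Outcome columns, measured after treatment starts
outcome_vars <- c("response", "death", "ttdeath")

# Everything else is a baseline variable (plus the treatment column)
trial_baseline <- trial |> select(-all_of(outcome_vars))
names(trial_baseline)
```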

18.6 Making Modifications to the trial table

Let’s do some basics, like adding a column for N, adding the overall column, and bolding the labels.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |> 
  add_n() |>
  add_overall() |>
  bold_labels()
| Characteristic | N | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
|---|---|---|---|---|
| Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
|     Missing | | 11 | 7 | 4 |
| Marker Level (ng/mL) | 190 | 0.64 (0.22, 1.39) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
|     Missing | | 10 | 6 | 4 |
| T Stage | 200 | | | |
|     T1 | | 53 (27%) | 28 (29%) | 25 (25%) |
|     T2 | | 54 (27%) | 25 (26%) | 29 (28%) |
|     T3 | | 43 (22%) | 22 (22%) | 21 (21%) |
|     T4 | | 50 (25%) | 23 (23%) | 27 (26%) |
| Grade | 200 | | | |
|     I | | 68 (34%) | 35 (36%) | 33 (32%) |
|     II | | 68 (34%) | 32 (33%) | 36 (35%) |
|     III | | 64 (32%) | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)
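The N column from add_n() is the number of participants with a non-missing value for each variable. A quick base-R check that reproduces it:

```r
library(gtsummary)  # provides the trial data

# Count non-missing values for each baseline variable
n_nonmissing <- colSums(!is.na(trial[, c("age", "marker", "stage", "grade")]))
n_nonmissing
```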

18.7 More Modifications to the trial table

Let’s change the headers to use line breaks (instead of commas), change the label “Characteristic” to “Participant Characteristic”, and add a column spanner for the treatment arms. Use flanking double asterisks (**) to bold the headers as needed.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(update = list(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  )) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**") 
| | | | Treatment Arm | |
| Participant | N | Overall | Drug A | Drug B |
| Characteristic | | N = 200¹ | N = 98¹ | N = 102¹ |
|---|---|---|---|---|
| Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
|     Missing | | 11 | 7 | 4 |
| Marker Level (ng/mL) | 190 | 0.64 (0.22, 1.39) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
|     Missing | | 10 | 6 | 4 |
| T Stage | 200 | | | |
|     T1 | | 53 (27%) | 28 (29%) | 25 (25%) |
|     T2 | | 54 (27%) | 25 (26%) | 29 (28%) |
|     T3 | | 43 (22%) | 22 (22%) | 21 (21%) |
|     T4 | | 50 (25%) | 23 (23%) | 27 (26%) |
| Grade | 200 | | | |
|     I | | 68 (34%) | 35 (36%) | 33 (32%) |
|     II | | 68 (34%) | 32 (33%) | 36 (35%) |
|     III | | 64 (32%) | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)

18.8 Taking Control of the Stats

You can take control of which statistics are shown. In this example, we request the mean and SD for all continuous variables, round continuous variables to 2 decimal places, and show n / N and percentages for all categorical variables. We also relabel grade as “Tumor Grade”. All of these changes are made inside tbl_summary(). Note how the footnote updates to match.

trial |> 
  select(age, marker, stage, grade, trt) |>
  tbl_summary(
    by = trt,
    missing_text = "Missing",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    label = grade ~ "Tumor Grade"
  ) |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(update = list(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  )) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**") 
| | | | Treatment Arm | |
| Participant | N | Overall | Drug A | Drug B |
| Characteristic | | N = 200¹ | N = 98¹ | N = 102¹ |
|---|---|---|---|---|
| Age | 189 | 47.24 (14.31) | 47.01 (14.71) | 47.45 (14.01) |
|     Missing | | 11 | 7 | 4 |
| Marker Level (ng/mL) | 190 | 0.92 (0.86) | 1.02 (0.89) | 0.82 (0.83) |
|     Missing | | 10 | 6 | 4 |
| T Stage | 200 | | | |
|     T1 | | 53 / 200 (27%) | 28 / 98 (29%) | 25 / 102 (25%) |
|     T2 | | 54 / 200 (27%) | 25 / 98 (26%) | 29 / 102 (28%) |
|     T3 | | 43 / 200 (22%) | 22 / 98 (22%) | 21 / 102 (21%) |
|     T4 | | 50 / 200 (25%) | 23 / 98 (23%) | 27 / 102 (26%) |
| Tumor Grade | 200 | | | |
|     I | | 68 / 200 (34%) | 35 / 98 (36%) | 33 / 102 (32%) |
|     II | | 68 / 200 (34%) | 32 / 98 (33%) | 36 / 102 (35%) |
|     III | | 64 / 200 (32%) | 31 / 98 (32%) | 33 / 102 (32%) |
¹ Mean (SD); n / N (%)
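As a check, the Overall age cell can be reproduced with base R. The sketch also shows the "continuous2" type, which places several statistics for one continuous variable on separate rows; the particular pair of statistics here is only an example (age_mean, age_sd, and tbl_c2 are illustrative names).

```r
library(dplyr)
library(gtsummary)  # provides the trial data

# Mean and SD of the non-missing ages, rounded to 2 decimals as in the table
age_mean <- round(mean(trial$age, na.rm = TRUE), 2)
age_sd <- round(sd(trial$age, na.rm = TRUE), 2)
c(mean = age_mean, sd = age_sd)

# "continuous2" puts each requested statistic on its own row
tbl_c2 <- trial |>
  select(age, trt) |>
  tbl_summary(
    by = trt,
    type = age ~ "continuous2",
    statistic = age ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
    missing = "no"
  )
tbl_c2
```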

18.8.1 Your Turn

The original trial of streptomycin for tuberculosis never had a Table 1, because it was published in 1948, when these did not exist. Let’s make one now. The gender and baseline_ variables in the strep_tb dataset will be the baseline variables, and the arm variable will be the treatment variable. Use the strep_tb dataset from the {medicaldata} package to create a Table 1: run data(strep_tb) to load the dataset, then pipe it into select(), then into tbl_summary(). Improve the table beyond the generic default version, adding the details and styling that you think would help the reader, to make it publication-ready. Consider cleaning up some of the value labels with mutate() statements.

library(medicaldata)  # provides the strep_tb data

strep_tb |> 
  select(gender, starts_with("baseline"), arm) |>
  tbl_summary(by = arm, missing_text = "Missing",
              label = list(
                gender ~ "Gender",
                baseline_condition ~ "Baseline Condition",
                baseline_temp ~ "Baseline Temperature",
                baseline_esr ~ "Baseline ESR",
                baseline_cavitation ~ "Baseline Cavitation"
              )) |> 
  add_n() |>
  add_overall() |>
  bold_labels() |> 
  modify_header(update = list(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 107",
    stat_1 ~ "**Control**<br>N = 52",
    stat_2 ~ "**Streptomycin**<br>N = 55"
  )) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**") 
| | | | Treatment Arm | |
| Participant | N | Overall | Control | Streptomycin |
| Characteristic | | N = 107¹ | N = 52¹ | N = 55¹ |
|---|---|---|---|---|
| Gender | 107 | | | |
|     F | | 59 (55%) | 28 (54%) | 31 (56%) |
|     M | | 48 (45%) | 24 (46%) | 24 (44%) |
| Baseline Condition | 107 | | | |
|     1_Good | | 16 (15%) | 8 (15%) | 8 (15%) |
|     2_Fair | | 37 (35%) | 20 (38%) | 17 (31%) |
|     3_Poor | | 54 (50%) | 24 (46%) | 30 (55%) |
| Baseline Temperature | 107 | | | |
|     1_<=98.9F/37.2C | | 7 (6.5%) | 4 (7.7%) | 3 (5.5%) |
|     2_99-99.9F/37.3-37.7C | | 24 (22%) | 12 (23%) | 12 (22%) |
|     2_99-99.9F/37.3-37.7C/37.3-37.7C | | 1 (0.9%) | 0 (0%) | 1 (1.8%) |
|     3_100-100.9F/37.8-38.2C | | 31 (29%) | 17 (33%) | 14 (25%) |
|     3_100-100.9F/37.8-38.2C/37.8-38.2C | | 1 (0.9%) | 0 (0%) | 1 (1.8%) |
|     4_>=101F/38.3C | | 43 (40%) | 19 (37%) | 24 (44%) |
| Baseline ESR | 106 | | | |
|     2_11-20 | | 5 (4.7%) | 2 (3.9%) | 3 (5.5%) |
|     3_21-50 | | 36 (34%) | 20 (39%) | 16 (29%) |
|     4_51+ | | 65 (61%) | 29 (57%) | 36 (65%) |
|     Missing | | 1 | 1 | 0 |
| Baseline Cavitation | 107 | 62 (58%) | 30 (58%) | 32 (58%) |
¹ n (%)
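One way to clean the strep_tb value labels is sketched below, as a starting point for the exercise rather than the only answer. It assumes {forcats} (installed with the tidyverse) and uses a hypothetical helper, clean_level(), which strips the numeric sort prefix and collapses the duplicated Celsius ranges seen under Baseline Temperature. The forcats function fct_relabel() keeps the original level order and merges levels that end up with identical labels.

```r
library(dplyr)
library(forcats)
library(medicaldata)  # provides the strep_tb data

# Hypothetical helper: "1_Good" -> "Good", and
# "99-99.9F/37.3-37.7C/37.3-37.7C" -> "99-99.9F/37.3-37.7C"
clean_level <- function(x) {
  x <- sub("^[0-9]+_", "", x)                  # drop the numeric prefix
  sub("(/[^/]+)\\1$", "\\1", x, perl = TRUE)   # drop a repeated final "/..." part
}

strep_clean <- strep_tb |>
  mutate(
    # Relabel factor levels; levels that become identical are merged
    across(c(baseline_condition, baseline_temp, baseline_esr),
           ~ fct_relabel(.x, clean_level)),
    gender = fct_recode(gender, Female = "F", Male = "M")
  )

levels(strep_clean$baseline_condition)
levels(strep_clean$baseline_temp)
```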