Chapter 18 Building Table One for a Clinical Study
In most clinical research, whether in an epidemiologic study or in a clinical trial, it is important to present a summary of the baseline characteristics of the study sample. This is often referred to as a “Table 1” or “Table One”. You generally want to provide enough information about your study sample that a reader can judge whether it resembles the population or the specific patient they are interested in. From that, the reader can decide whether the study results are likely to generalize to their population or patient of interest.
There are a number of R packages that can help you build a Table 1. These include the appropriately named {tableone} package, the {arsenal} package, and the {compareGroups} package. Since 2020, however, the {gtsummary} package has become the most popular package for this purpose. This is partly because it is focused on clinical applications. It also builds on the robust table infrastructure of the {gt} package, which is a modern approach to building tables in R.
In this chapter, we will focus on building a Table 1 with the {gtsummary} package, using example data from the {arsenal}, {gtsummary}, and {medicaldata} packages. The {gtsummary} package is also useful for producing other summary tables, including tables of regression results.
18.2 Pathway for this Chapter
Let’s start by looking at the mockstudy data from the {arsenal} package. This is a mock study of treatment of cholangiocarcinoma with 3 different chemotherapy regimens. We will print the dimensions with dim(), then glimpse() the data.
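The code that produced the output below is not shown. Here is a minimal sketch of it. It assumes the {arsenal} package is installed, since mockstudy ships with it, and that glimpse() comes from {dplyr}:

```r
library(dplyr)  # provides glimpse()

# Load the mockstudy data frame bundled with {arsenal}
data(mockstudy, package = "arsenal")

# Number of rows and columns
dim(mockstudy)

# One line per column: name, type, and the first few values
glimpse(mockstudy)
```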
## [1] 1499 14
## Rows: 1,499
## Columns: 14
## $ case <int> 110754, 99706, 105271, 105001, 112263,…
## $ age <int> 67, 74, 50, 71, 69, 56, 50, 57, 51, 63…
## $ arm <chr> "F: FOLFOX", "A: IFL", "A: IFL", "G: I…
## $ sex <fct> Male, Female, Female, Female, Female, …
## $ race <chr> "Caucasian", "Caucasian", "Caucasian",…
## $ fu.time <int> 922, 270, 175, 128, 233, 120, 369, 421…
## $ fu.stat <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ ps <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1,…
## $ hgb <dbl> 11.5, 10.7, 11.1, 12.6, 13.0, 10.2, 13…
## $ bmi <dbl> 25.09861, 19.49786, NA, 29.42922, 26.3…
## $ alk.phos <int> 160, 290, 700, 771, 350, 569, 162, 152…
## $ ast <int> 35, 52, 100, 68, 35, 27, 16, 12, 25, 1…
## $ mdquality.s <int> NA, 1, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA…
## $ age.ord <ord> 60-69, 70-79, 40-49, 70-79, 60-69, 50-…
18.3 Baseline Characteristics
For a Table One, you do not want outcomes like fu.time (follow-up time). You do want participant characteristics that were present at the start of the study. We also don’t need trivial information like case numbers or patient IDs. Let’s select the baseline variables with dplyr::select().
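The selection code itself is not shown. The sketch below builds the mockstudy_baseline data frame used in the code that follows. The variable list is an assumption inferred from the rows of the tables in this chapter; note that ps, mdquality.s, and age.ord do not appear there either. It may not be the exact original code.

```r
library(dplyr)

data(mockstudy, package = "arsenal")

# Keep baseline characteristics only. Dropped: case (an ID), fu.time and
# fu.stat (follow-up), plus ps, mdquality.s, and age.ord, which do not
# appear in the tables in this chapter (an inferred choice).
mockstudy_baseline <- mockstudy |>
  select(age, arm, sex, race, hgb, bmi, alk.phos, ast)

dim(mockstudy_baseline)
```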
18.4 Building Your Table 1
The main function for building a Table 1 with gtsummary is tbl_summary(). We can pipe the selected data into this function to get a first (ungrouped) version of the table.
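Once mockstudy_baseline holds the selected variables, the call is a one-liner. This sketch rebuilds the data first so that it runs on its own. The variable list is the same inferred selection as above.

```r
library(dplyr)
library(gtsummary)

data(mockstudy, package = "arsenal")
mockstudy_baseline <- mockstudy |>
  select(age, arm, sex, race, hgb, bmi, alk.phos, ast)

# With no other arguments, tbl_summary() picks a summary for each column:
# median (IQR) for continuous variables and n (%) for categorical ones
table_one <- mockstudy_baseline |>
  tbl_summary()

table_one
```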
Characteristic | N = 1,499¹ |
---|---|
Age in Years | 61 (52, 68) |
Treatment Arm | |
A: IFL | 428 (29%) |
F: FOLFOX | 691 (46%) |
G: IROX | 380 (25%) |
sex | |
Male | 916 (61%) |
Female | 583 (39%) |
Race | |
African-Am | 115 (7.7%) |
Asian | 18 (1.2%) |
Caucasian | 1,288 (86%) |
Hawaii/Pacific | 5 (0.3%) |
Hispanic | 54 (3.6%) |
Native-Am/Alaska | 5 (0.3%) |
Other | 7 (0.5%) |
Unknown | 7 |
hgb | 12.30 (11.10, 13.50) |
Unknown | 266 |
Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
Unknown | 33 |
alk.phos | 123 (86, 207) |
Unknown | 266 |
ast | 27 (20, 41) |
Unknown | 266 |
¹ Median (IQR); n (%)
18.4.1 Updating Variable Labels
This very quickly gives you a clean version of a Table 1 that is close to publication-ready. However, you might not like the variable names that were convenient during analysis, and you might want longer or clearer labels for publication. The label argument of tbl_summary() lets you relabel variables in the table. Let’s relabel sex and race to make them more publication-ready.
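library(dplyr)      # mutate(), case_when(), select()
library(gtsummary)  # tbl_summary() and related functions
mockstudy_baseline |>
  tbl_summary(label = list(sex ~ "Sex at Birth",
                           race ~ "Race Identified by Participant"))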
Characteristic | N = 1,499¹ |
---|---|
Age in Years | 61 (52, 68) |
Treatment Arm | |
A: IFL | 428 (29%) |
F: FOLFOX | 691 (46%) |
G: IROX | 380 (25%) |
Sex at Birth | |
Male | 916 (61%) |
Female | 583 (39%) |
Race Identified by Participant | |
African-Am | 115 (7.7%) |
Asian | 18 (1.2%) |
Caucasian | 1,288 (86%) |
Hawaii/Pacific | 5 (0.3%) |
Hispanic | 54 (3.6%) |
Native-Am/Alaska | 5 (0.3%) |
Other | 7 (0.5%) |
Unknown | 7 |
hgb | 12.30 (11.10, 13.50) |
Unknown | 266 |
Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
Unknown | 33 |
alk.phos | 123 (86, 207) |
Unknown | 266 |
ast | 27 (20, 41) |
Unknown | 266 |
¹ Median (IQR); n (%)
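Some rows, such as Age in Years and Body Mass Index (kg/m^2), already had readable labels, while sex, hgb, alk.phos, and ast did not. This is because tbl_summary() uses a column’s "label" attribute when one is present and falls back to the column name otherwise. The sketch below is an illustration, not code from the chapter. It inspects those attributes and sets one directly, which is an alternative to the label argument.

```r
library(dplyr)
library(gtsummary)

data(mockstudy, package = "arsenal")
baseline <- mockstudy |>
  select(age, sex)

# age carries a label attribute; sex does not, so the second call is NULL
attr(baseline$age, "label")
attr(baseline$sex, "label")

# Storing a label on the column gives the same displayed label as
# tbl_summary(label = sex ~ "Sex at Birth")
attr(baseline$sex, "label") <- "Sex at Birth"

baseline |>
  tbl_summary()
```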
18.4.2 Updating Variable Values
The Race value “African-Am” is shorthand and not ideal for publication. Let’s change it to “African-American” with mutate() and case_when(). This replaces the value of race only when it is “African-Am” and leaves all other values unchanged. We do this before piping the data into tbl_summary().
mockstudy_baseline |>
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list(sex ~ "Sex at Birth",
                           race ~ "Race Identified by Participant"))
Characteristic | N = 1,499¹ |
---|---|
Age in Years | 61 (52, 68) |
Treatment Arm | |
A: IFL | 428 (29%) |
F: FOLFOX | 691 (46%) |
G: IROX | 380 (25%) |
Sex at Birth | |
Male | 916 (61%) |
Female | 583 (39%) |
Race Identified by Participant | |
African-American | 115 (7.7%) |
Asian | 18 (1.2%) |
Caucasian | 1,288 (86%) |
Hawaii/Pacific | 5 (0.3%) |
Hispanic | 54 (3.6%) |
Native-Am/Alaska | 5 (0.3%) |
Other | 7 (0.5%) |
Unknown | 7 |
hgb | 12.30 (11.10, 13.50) |
Unknown | 266 |
Body Mass Index (kg/m^2) | 26.3 (23.5, 30.1) |
Unknown | 33 |
alk.phos | 123 (86, 207) |
Unknown | 266 |
ast | 27 (20, 41) |
Unknown | 266 |
¹ Median (IQR); n (%)
18.4.3 Table 1 separated by Treatment Arm
Let’s make a Table 1 that is separated into distinct columns by treatment arm. This lets readers compare the three groups and get a sense of whether the randomization was balanced. Testing each participant characteristic would mean making many comparisons, so it is NOT appropriate to list p values for each comparison. We will use the by = arm argument of tbl_summary() to separate the table by treatment arm. We will also add an overall column with add_overall() and a column of non-missing counts for each variable with add_n(), as both are commonly included.
mockstudy_baseline |>
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list(sex ~ "Sex at Birth",
                           race ~ "Race Identified by Participant"),
              by = arm) |>
  add_overall() |>
  add_n()
Characteristic | N | Overall, N = 1,499¹ | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
---|---|---|---|---|---|
Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
Sex at Birth | 1,499 | | | | |
Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
Race Identified by Participant | 1,492 | | | | |
African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
Unknown | | 7 | 0 | 6 | 1 |
hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
Unknown | | 266 | 69 | 141 | 56 |
Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
Unknown | | 33 | 9 | 20 | 4 |
alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
Unknown | | 266 | 69 | 141 | 56 |
ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
Unknown | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
18.4.4 Styling our Table 1
Let’s relabel the Unknown rows as Missing with the missing_text argument of tbl_summary(), and bold the variable labels with bold_labels().
mockstudy_baseline |>
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list(sex ~ "Sex at Birth",
                           race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |>
  add_overall() |>
  add_n() |>
  bold_labels()
Characteristic | N | Overall, N = 1,499¹ | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
---|---|---|---|---|---|
Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
Sex at Birth | 1,499 | | | | |
Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
Race Identified by Participant | 1,492 | | | | |
African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
Missing | | 7 | 0 | 6 | 1 |
hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
Missing | | 266 | 69 | 141 | 56 |
Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
Missing | | 33 | 9 | 20 | 4 |
alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
Missing | | 266 | 69 | 141 | 56 |
ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
18.4.5 Adding A Column Spanner
You can add column spanners to your table to group related columns under a shared header. In this case, it makes sense to group the three treatment arms under one spanner with modify_spanning_header().
tbl <- mockstudy_baseline |>
  mutate(race = case_when(race == "African-Am" ~ "African-American",
                          .default = race)) |>
  tbl_summary(label = list(sex ~ "Sex at Birth",
                           race ~ "Race Identified by Participant"),
              by = arm,
              missing_text = "Missing") |>
  add_overall() |>
  add_n() |>
  bold_labels() |>
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Treatment Received**")
tbl
Characteristic | N | Overall, N = 1,499¹ | Treatment Received | | |
---|---|---|---|---|---|
| | | | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
Sex at Birth | 1,499 | | | | |
Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
Race Identified by Participant | 1,492 | | | | |
African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
Missing | | 7 | 0 | 6 | 1 |
hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
Missing | | 266 | 69 | 141 | 56 |
Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
Missing | | 33 | 9 | 20 | 4 |
alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
Missing | | 266 | 69 | 141 | 56 |
ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
The standard column names in a tbl_summary() table are:
- label (the row labels)
- n (added by add_n())
- stat_0 (the overall column, added by add_overall())
- stat_1, stat_2, stat_3, etc. (one per level of the by variable)
You can confirm these with show_header_names(). You can change the headers with modify_header(), which lets you control line breaks with <br> and bolding with **. Let’s take a look:
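The listing below comes from show_header_names(), which the chapter applies to tbl. The call is shown here on a cut-down table that uses the same grouping and added columns. The table is built only so that the block runs on its own, and its name tbl_small is illustrative.

```r
library(dplyr)
library(gtsummary)

data(mockstudy, package = "arsenal")

# Same by = arm grouping and added columns as tbl, but fewer variables
tbl_small <- mockstudy |>
  select(age, arm) |>
  tbl_summary(by = arm) |>
  add_overall() |>
  add_n()

# Prints each internal column name next to its current header
show_header_names(tbl_small)
```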
## ℹ As a usage guide, the code below re-creates the current column headers.
## modify_header(
## label = '**Characteristic**',
## n = '**N**',
## stat_0 = '**Overall**, N = 1,499',
## stat_1 = '**A: IFL**, N = 428',
## stat_2 = '**F: FOLFOX**, N = 691',
## stat_3 = '**G: IROX**, N = 380'
## )
##
##
## Column Name Column Header
## ------------ -----------------------
## label **Characteristic**
## n **N**
## stat_0 **Overall**, N = 1,499
## stat_1 **A: IFL**, N = 428
## stat_2 **F: FOLFOX**, N = 691
## stat_3 **G: IROX**, N = 380
tbl |>
  modify_header(
    stat_0 ~ "**Overall**<br>N = 1,499",
    stat_1 ~ "**A: IFL**<br>N = 428",
    stat_2 ~ "**F: FOLFOX**<br>N = 691",
    stat_3 ~ "**G: IROX**<br>N = 380"
  )
Characteristic | N | Overall N = 1,499¹ | Treatment Received | | |
---|---|---|---|---|---|
| | | | A: IFL N = 428¹ | F: FOLFOX N = 691¹ | G: IROX N = 380¹ |
Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
Sex at Birth | 1,499 | | | | |
Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
Race Identified by Participant | 1,492 | | | | |
African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
Missing | | 7 | 0 | 6 | 1 |
hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
Missing | | 266 | 69 | 141 | 56 |
Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
Missing | | 33 | 9 | 20 | 4 |
alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
Missing | | 266 | 69 | 141 | 56 |
ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
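Typing N = 428 and similar counts by hand works, but those headers go stale if the data change. As an alternative, gtsummary can fill glue-style fields into stat-column headers: {level} and {n} per group, and {N} for the total. The all_stat_cols() selector also saves listing stat_1, stat_2, and stat_3 by name in the spanner. This is a sketch under those assumptions, and tbl_auto is an illustrative name.

```r
library(dplyr)
library(gtsummary)

data(mockstudy, package = "arsenal")

tbl_auto <- mockstudy |>
  select(age, sex, arm) |>
  tbl_summary(by = arm) |>
  add_overall() |>
  add_n() |>
  modify_header(
    # {level} is the arm name and {n} its size, filled in per column
    all_stat_cols(stat_0 = FALSE) ~ "**{level}**<br>N = {n}",
    # {N} is the total number of participants
    stat_0 ~ "**Overall**<br>N = {N}"
  ) |>
  # Selects every by-group column, however many arms there are
  modify_spanning_header(all_stat_cols(stat_0 = FALSE) ~ "**Treatment Received**")

tbl_auto
```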
18.4.6 Further Styling our Table 1
You can use {gtsummary} functions to modify header names, footnotes, captions, the number of digits shown, and the statistics used. See the tutorial article on the {gtsummary} website for more information. The PDF cheat sheet can also be downloaded from the same website (under Articles/cheat sheet) and is very helpful.
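For instance, modify_caption() adds a caption to a {gtsummary} table. This is a minimal sketch using the trial data that ships with {gtsummary}. The caption text and the object name tbl_cap are illustrative.

```r
library(gtsummary)

tbl_cap <- trial |>
  # Summarise only age and grade, split by treatment
  tbl_summary(by = trt, include = c(age, grade)) |>
  # Markdown such as ** is interpreted in the caption by default
  modify_caption("**Baseline Characteristics by Treatment Arm**")

tbl_cap
```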
You can convert this table to a gt table with as_gt() and style it further using all of the table styling functions available in the {gt} package. Many details on how to style {gt} tables can be found in the {gt} package documentation.
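Here is a sketch of that hand-off, assuming the {gt} package is installed. After as_gt() the object is a gt_tbl, so only {gt} functions apply from that point on. The fill colour, title, and source note are arbitrary illustrations.

```r
library(gtsummary)
library(gt)

gt_table <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  bold_labels() |>
  as_gt() |>
  # Everything below is {gt}, not {gtsummary}
  tab_header(title = "Baseline Characteristics") |>
  tab_style(
    style = cell_fill(color = "#DCE9F5"),  # light blue column labels
    locations = cells_column_labels()
  ) |>
  tab_source_note(source_note = "Data: trial dataset from the gtsummary package")

gt_table
```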
18.4.7 Your Turn
Earlier, in the column spanner example, we saved the table as the object tbl. Take this tbl object and
- fix up the labels for hgb, alk.phos, and ast to be more publication-ready.
- add a caption to the table that says “Baseline Characteristics of Participants in the Mock Study by Treatment Arm”
- add a footnote to the table that says “Note: Missing values are indicated as ‘Missing’ in the table.”
- improve the value labels for ‘Hawaii/Pacific’ and ‘Native-Am/Alaska’ to be “Native Hawaiian or other Pacific Islander” and “American Indian or Alaska Native”, respectively.
- change the number of digits in the table to 2.
- change the stats to show the mean and standard deviation for continuous variables, and the count and percent for categorical variables.
- convert the table to a gt table and style it with the {gt} package. Add some color.
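Start from tbl, unmodified (printed below). Add a pipe and start fixing it up, with the goals above. The value labels, digits, and statistics are set before or inside tbl_summary(), so for those goals it is easiest to go back to the code that built tbl. Refer to the {gtsummary} and {gt} documentation for help: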
- gtsummary: tbl_summary()
- gt: gt()
Characteristic | N | Overall, N = 1,499¹ | Treatment Received | | |
---|---|---|---|---|---|
| | | | A: IFL, N = 428¹ | F: FOLFOX, N = 691¹ | G: IROX, N = 380¹ |
Age in Years | 1,499 | 61 (52, 68) | 61 (53, 68) | 61 (52, 69) | 61 (52, 68) |
Sex at Birth | 1,499 | | | | |
Male | | 916 (61%) | 277 (65%) | 411 (59%) | 228 (60%) |
Female | | 583 (39%) | 151 (35%) | 280 (41%) | 152 (40%) |
Race Identified by Participant | 1,492 | | | | |
African-American | | 115 (7.7%) | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) |
Asian | | 18 (1.2%) | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) |
Caucasian | | 1,288 (86%) | 371 (87%) | 586 (86%) | 331 (87%) |
Hawaii/Pacific | | 5 (0.3%) | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) |
Hispanic | | 54 (3.6%) | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) |
Native-Am/Alaska | | 5 (0.3%) | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) |
Other | | 7 (0.5%) | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) |
Missing | | 7 | 0 | 6 | 1 |
hgb | 1,233 | 12.30 (11.10, 13.50) | 12.10 (11.00, 13.45) | 12.20 (11.10, 13.60) | 12.40 (11.18, 13.63) |
Missing | | 266 | 69 | 141 | 56 |
Body Mass Index (kg/m^2) | 1,466 | 26.3 (23.5, 30.1) | 26.2 (23.6, 30.6) | 26.5 (23.7, 30.1) | 26.0 (23.2, 29.6) |
Missing | | 33 | 9 | 20 | 4 |
alk.phos | 1,233 | 123 (86, 207) | 133 (89, 217) | 116 (85, 195) | 122 (88, 210) |
Missing | | 266 | 69 | 141 | 56 |
ast | 1,233 | 27 (20, 41) | 29 (21, 42) | 26 (19, 40) | 27 (20, 41) |
Missing | | 266 | 69 | 141 | 56 |
¹ Median (IQR); n (%)
18.5 Try this with a new dataset
Now we will use the trial dataset from the {gtsummary} package to create a Table 1. The trial dataset is a simulated clinical trial dataset with 200 observations and 8 variables. Print the trial dataset and look at its columns. Which are baseline variables for Table 1, and which are outcome variables?
Use your R skills to:
- select the baseline variables
- create a Table 1, divided by treatment (trt), and
- set the missing_text to “Missing”.
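The data can be inspected by printing it. trial is lazy-loaded with {gtsummary}, so this sketch needs only that package:

```r
library(gtsummary)

# trial is a tibble bundled with {gtsummary}; printing shows the first 10 rows
trial
```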
## # A tibble: 200 × 8
## trt age marker stage grade response death ttdeath
## <chr> <dbl> <dbl> <fct> <fct> <int> <int> <dbl>
## 1 Drug A 23 0.16 T1 II 0 0 24
## 2 Drug B 9 1.11 T2 I 1 0 24
## 3 Drug A 31 0.277 T1 II 0 0 24
## 4 Drug A NA 2.07 T3 III 1 1 17.6
## 5 Drug A 51 2.77 T4 III 1 1 16.4
## 6 Drug B 39 0.613 T4 I 0 1 15.6
## 7 Drug A 37 0.354 T1 II 0 0 24
## 8 Drug A 32 1.74 T1 I 0 1 18.4
## 9 Drug A 31 0.144 T1 II 0 0 24
## 10 Drug B 34 0.205 T3 I 0 1 10.5
## # ℹ 190 more rows
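Here is one way to produce the table that follows. It assumes trt, age, marker, stage, and grade are the baseline variables, and that response, death, and ttdeath are outcomes. The object name trial_table_one is illustrative.

```r
library(dplyr)
library(gtsummary)

trial_table_one <- trial |>
  # Baseline variables plus the grouping variable; outcomes are left out
  select(trt, age, marker, stage, grade) |>
  tbl_summary(by = trt, missing_text = "Missing")

trial_table_one
```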
Characteristic | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|
Age | 46 (37, 59) | 48 (39, 56) |
Missing | 7 | 4 |
Marker Level (ng/mL) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
Missing | 6 | 4 |
T Stage | ||
T1 | 28 (29%) | 25 (25%) |
T2 | 25 (26%) | 29 (28%) |
T3 | 22 (22%) | 21 (21%) |
T4 | 23 (23%) | 27 (26%) |
Grade | ||
I | 35 (36%) | 33 (32%) |
II | 32 (33%) | 36 (35%) |
III | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)
18.6 Making Modifications to the trial table
Let’s do some basics, like adding a column for N, adding the overall column, and bolding the labels.
trial |>
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |>
  add_n() |>
  add_overall() |>
  bold_labels()
Characteristic | N | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|---|---|
Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Missing | | 11 | 7 | 4 |
Marker Level (ng/mL) | 190 | 0.64 (0.22, 1.39) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
Missing | | 10 | 6 | 4 |
T Stage | 200 | | | |
T1 | | 53 (27%) | 28 (29%) | 25 (25%) |
T2 | | 54 (27%) | 25 (26%) | 29 (28%) |
T3 | | 43 (22%) | 22 (22%) | 21 (21%) |
T4 | | 50 (25%) | 23 (23%) | 27 (26%) |
Grade | 200 | | | |
I | | 68 (34%) | 35 (36%) | 33 (32%) |
II | | 68 (34%) | 32 (33%) | 36 (35%) |
III | | 64 (32%) | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)
18.7 More Modifications to the trial table
Let’s change the headers to include line breaks (and no commas), change the label Characteristic to ‘Participant Characteristic’, and add a column spanner for the treatment arms. Use flanking double asterisks (**) to bold the headers as needed.
trial |>
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing") |>
  add_n() |>
  add_overall() |>
  bold_labels() |>
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")
Participant Characteristic | N | Overall N = 200¹ | Treatment Arm | |
---|---|---|---|---|
| | | | Drug A N = 98¹ | Drug B N = 102¹ |
Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Missing | | 11 | 7 | 4 |
Marker Level (ng/mL) | 190 | 0.64 (0.22, 1.39) | 0.84 (0.24, 1.57) | 0.52 (0.19, 1.20) |
Missing | | 10 | 6 | 4 |
T Stage | 200 | | | |
T1 | | 53 (27%) | 28 (29%) | 25 (25%) |
T2 | | 54 (27%) | 25 (26%) | 29 (28%) |
T3 | | 43 (22%) | 22 (22%) | 21 (21%) |
T4 | | 50 (25%) | 23 (23%) | 27 (26%) |
Grade | 200 | | | |
I | | 68 (34%) | 35 (36%) | 33 (32%) |
II | | 68 (34%) | 32 (33%) | 36 (35%) |
III | | 64 (32%) | 31 (32%) | 33 (32%) |
¹ Median (IQR); n (%)
18.8 Taking Control of the Stats
You can take control of which statistics are shown. In the next example, we request the mean and SD for all continuous variables and round continuous variables to 2 decimal places. We also show n / N (%) for all categorical variables and relabel grade as Tumor Grade. These changes are made within the tbl_summary() function. Note how the footnote changes to keep up.
trial |>
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing",
              statistic = list(
                all_continuous() ~ "{mean} ({sd})",
                all_categorical() ~ "{n} / {N} ({p}%)"
              ),
              digits = all_continuous() ~ 2,
              label = grade ~ "Tumor Grade") |>
  add_n() |>
  add_overall() |>
  bold_labels() |>
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 200",
    stat_1 ~ "**Drug A**<br>N = 98",
    stat_2 ~ "**Drug B**<br>N = 102"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")
Participant Characteristic | N | Overall N = 200¹ | Treatment Arm | |
---|---|---|---|---|
| | | | Drug A N = 98¹ | Drug B N = 102¹ |
Age | 189 | 47.24 (14.31) | 47.01 (14.71) | 47.45 (14.01) |
Missing | | 11 | 7 | 4 |
Marker Level (ng/mL) | 190 | 0.92 (0.86) | 1.02 (0.89) | 0.82 (0.83) |
Missing | | 10 | 6 | 4 |
T Stage | 200 | | | |
T1 | | 53 / 200 (27%) | 28 / 98 (29%) | 25 / 102 (25%) |
T2 | | 54 / 200 (27%) | 25 / 98 (26%) | 29 / 102 (28%) |
T3 | | 43 / 200 (22%) | 22 / 98 (22%) | 21 / 102 (21%) |
T4 | | 50 / 200 (25%) | 23 / 98 (23%) | 27 / 102 (26%) |
Tumor Grade | 200 | | | |
I | | 68 / 200 (34%) | 35 / 98 (36%) | 33 / 102 (32%) |
II | | 68 / 200 (34%) | 32 / 98 (33%) | 36 / 102 (35%) |
III | | 64 / 200 (32%) | 31 / 98 (32%) | 33 / 102 (32%) |
¹ Mean (SD); n / N (%)
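You may want both the median (IQR) and the mean (SD). One option is the "continuous2" summary type, which puts each statistic on its own row beneath the variable label. Here is a short sketch on two trial variables; the object name is illustrative.

```r
library(gtsummary)

tbl_two_stats <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, marker),
    # continuous2 allows several statistics, one row each
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})", "{mean} ({sd})"),
    missing_text = "Missing"
  )

tbl_two_stats
```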
18.8.1 Your Turn
The original streptomycin for tuberculosis trial never had a Table 1, because it was published in 1948, when these did not exist. Let’s make one now. Gender and the variables whose names start with baseline_ in the strep_tb dataset will be the baseline variables. The arm variable will be the treatment variable.
Use the strep_tb dataset from the {medicaldata} package to create a Table 1. You will need to run data(strep_tb) to load the dataset. Then pipe the strep_tb dataset into select(), and then into tbl_summary(). Improve the table from the generic default version: add details and styling that you think would be helpful to the reader, and make it publication-ready. Consider cleaning up some of the value labels with mutate() statements.
library(medicaldata)
data(strep_tb)
strep_tb |>
  select(gender, starts_with("baseline"), arm) |>
  tbl_summary(by = arm, missing_text = "Missing",
              label = list(
                gender ~ "Gender",
                baseline_condition ~ "Baseline Condition",
                baseline_temp ~ "Baseline Temperature",
                baseline_esr ~ "Baseline ESR",
                baseline_cavitation ~ "Baseline Cavitation"
              )) |>
  add_n() |>
  add_overall() |>
  bold_labels() |>
  modify_header(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 107",
    stat_1 ~ "**Control**<br>N = 52",
    stat_2 ~ "**Streptomycin**<br>N = 55"
  ) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")
Participant Characteristic | N | Overall N = 107¹ | Treatment Arm | |
---|---|---|---|---|
| | | | Control N = 52¹ | Streptomycin N = 55¹ |
Gender | 107 | | | |
F | | 59 (55%) | 28 (54%) | 31 (56%) |
M | | 48 (45%) | 24 (46%) | 24 (44%) |
Baseline Condition | 107 | | | |
1_Good | | 16 (15%) | 8 (15%) | 8 (15%) |
2_Fair | | 37 (35%) | 20 (38%) | 17 (31%) |
3_Poor | | 54 (50%) | 24 (46%) | 30 (55%) |
Baseline Temperature | 107 | | | |
1_<=98.9F/37.2C | | 7 (6.5%) | 4 (7.7%) | 3 (5.5%) |
2_99-99.9F/37.3-37.7C | | 24 (22%) | 12 (23%) | 12 (22%) |
2_99-99.9F/37.3-37.7C/37.3-37.7C | | 1 (0.9%) | 0 (0%) | 1 (1.8%) |
3_100-100.9F/37.8-38.2C | | 31 (29%) | 17 (33%) | 14 (25%) |
3_100-100.9F/37.8-38.2C/37.8-38.2C | | 1 (0.9%) | 0 (0%) | 1 (1.8%) |
4_>=101F/38.3C | | 43 (40%) | 19 (37%) | 24 (44%) |
Baseline ESR | 106 | | | |
2_11-20 | | 5 (4.7%) | 2 (3.9%) | 3 (5.5%) |
3_21-50 | | 36 (34%) | 20 (39%) | 16 (29%) |
4_51+ | | 65 (61%) | 29 (57%) | 36 (65%) |
Missing | | 1 | 1 | 0 |
Baseline Cavitation | 107 | 62 (58%) | 30 (58%) | 32 (58%) |
¹ n (%)
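The answer above still shows coded values such as F/M and 1_Good. Below is a sketch of the mutate() clean-up the exercise suggests, assuming the {forcats} package is installed. The recodes and the object name strep_clean are illustrative choices. The temperature categories that appear twice with a repeated Celsius range would need a separate fix of their own.

```r
library(dplyr)
library(forcats)
library(gtsummary)
library(medicaldata)

data(strep_tb)

strep_clean <- strep_tb |>
  mutate(
    # Spell out the one-letter codes (new name = old value)
    gender = fct_recode(gender, Female = "F", Male = "M"),
    # Strip the ordering prefix, e.g. "1_Good" -> "Good"; level order is kept
    baseline_condition = fct_relabel(baseline_condition, ~ sub("^[0-9]+_", "", .x))
  )

strep_clean |>
  select(gender, starts_with("baseline"), arm) |>
  tbl_summary(by = arm, missing_text = "Missing")
```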