5.5 Table 1
In almost every published article that includes quantitative data, there will be a “Table 1” displaying the descriptive statistics for the study sample. There are many ways to organize this information, but the following are commonly followed principles:
- Different variables are in different rows.
- Different statistics are in different columns.
- Categorical variables are typically summarized by displaying the number (N) and proportion (%) of cases at each level. Sometimes the number of missing values is indicated, as well.
- Continuous variables are typically summarized by displaying the mean and SD (or median and IQR). Sometimes the number of missing values is indicated, as well.
- If the descriptive statistics are to be presented separately by levels of some other variable (e.g., by gender), each level of that variable should be in its own column.
- The units for each variable should be included next to the variable name (e.g., Cholesterol (mg/dL)).
- The reader should be able to understand all the contents of the table, within reason, without reading the text. Clarifying information should be included in the title, headings, and footnotes.
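To make these conventions concrete, the following is a minimal base R sketch of the two kinds of summaries, using a small made-up data frame (`toy`, which is not the dataset used in this chapter). All object names here are illustrative assumptions; the code later in this section uses a package to assemble the same pieces into a formatted table.

```r
# Hypothetical toy data (NOT the chapter's dataset), with one missing value
toy <- data.frame(
  gender = factor(c("Male", "Female", "Female", "Male", "Female")),
  choles = c(180, 200, NA, 170, 210)
)

# Categorical variable: number and percent of cases at each level
n_gender <- table(toy$gender)
p_gender <- 100 * prop.table(n_gender)
gender_summary <- sprintf("%d (%.1f%%)",
                          as.integer(n_gender),
                          as.numeric(p_gender))
names(gender_summary) <- names(n_gender)

# Continuous variable: mean (SD), median (IQR), and number of missing values
choles_mean_sd <- sprintf("%.2f (%.2f)",
                          mean(toy$choles, na.rm = TRUE),
                          sd(toy$choles, na.rm = TRUE))
choles_median_iqr <- sprintf("%.2f (%.2f)",
                             median(toy$choles, na.rm = TRUE),
                             IQR(toy$choles, na.rm = TRUE))
choles_missing <- sum(is.na(toy$choles))

gender_summary
choles_mean_sd
choles_median_iqr
choles_missing
```

Whether to report the mean and SD or the median and IQR for a given variable is a judgment call; the median and IQR are often preferred for skewed variables.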
For details on creating a “Table 1”, see Section 3.3 in Introduction to Regression Methods for Public Health. Here, we just present the relevant code for the dataset used in this chapter, using the gtsummary package (Sjoberg et al. 2023, 2021).
library(gtsummary)
# Overall
mydat %>%
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age ~ "{mean} ({sd})",
                     bmi ~ "{mean} ({sd})",
                     waist ~ "{mean} ({sd})",
                     choles ~ "{mean} ({sd})",
                     trigly ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous() ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(gender ~ "categorical",
                race ~ "categorical",
                income ~ "categorical",
                age ~ "continuous",
                bmi ~ "continuous",
                waist ~ "continuous",
                choles ~ "continuous",
                trigly ~ "continuous",
                glucose ~ "continuous"),
    label = list(gender ~ "Gender",
                 race ~ "Race/Ethnicity",
                 income ~ "Annual Household Income",
                 age ~ "Age (years)",
                 bmi ~ "Body Mass Index (kg/m2)",
                 waist ~ "Waist Circumference (cm)",
                 choles ~ "Total Cholesterol (mg/dL)",
                 trigly ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)")
  ) %>%
  modify_header(label = "**Variable**") %>%
  modify_caption("Participant characteristics") %>%
  bold_labels()
Variable | N = 250¹ |
---|---|
Gender | |
Male | 116 (46.4%) |
Female | 134 (53.6%) |
Race/Ethnicity | |
Mexican American | 33 (13.2%) |
Other Hispanic | 21 (8.4%) |
Non-Hispanic White | 111 (44.4%) |
Non-Hispanic Black | 49 (19.6%) |
Other | 36 (14.4%) |
Annual Household Income | |
< $25,000 | 76 (33.2%) |
$25,000 to < $55,000 | 86 (37.6%) |
$55,000+ | 67 (29.3%) |
Unknown | 21 |
Age (years) | 48.29 (19.82) |
Body Mass Index (kg/m2) | 28.45 (6.72) |
Unknown | 1 |
Waist Circumference (cm) | 97.22 (16.23) |
Unknown | 15 |
Total Cholesterol (mg/dL) | 185.07 (44.22) |
Unknown | 20 |
Triglyceride (mg/dL) | 94.50 (79.50) |
Unknown | 20 |
Fasting Glucose (mg/dL) | 98.00 (12.25) |
Unknown | 18 |
¹ n (%); Mean (SD); Median (IQR) |
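The income percentages in this table are consistent with a denominator of 229 (the 250 participants minus the 21 with unknown income), which suggests that missing values are excluded before percentages are computed. The following sketch checks that arithmetic using only the counts printed in the table; the object names are ours and are not part of the chapter's code.

```r
# Counts copied from the "Annual Household Income" rows of the table
n_total   <- 250
n_unknown <- 21
income_n  <- c("< $25,000"            = 76,
               "$25,000 to < $55,000" = 86,
               "$55,000+"             = 67)

# Denominator among participants with a known income
n_known <- n_total - n_unknown

# Percentages with missing values excluded (these match the table)
pct_known <- round(100 * income_n / n_known, 1)

# Percentages out of all participants, for comparison (these do not match)
pct_all <- round(100 * income_n / n_total, 1)

pct_known
pct_all
```

When reading a published "Table 1", it is worth checking in this way which denominator was used, since the two choices can differ noticeably when many values are missing.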
# By gender
mydat %>%
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    by = gender,
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age ~ "{mean} ({sd})",
                     bmi ~ "{mean} ({sd})",
                     waist ~ "{mean} ({sd})",
                     choles ~ "{mean} ({sd})",
                     trigly ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous() ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(race ~ "categorical",
                income ~ "categorical",
                age ~ "continuous",
                bmi ~ "continuous",
                waist ~ "continuous",
                choles ~ "continuous",
                trigly ~ "continuous",
                glucose ~ "continuous"),
    label = list(race ~ "Race/Ethnicity",
                 income ~ "Annual Household Income",
                 age ~ "Age (years)",
                 bmi ~ "Body Mass Index (kg/m2)",
                 waist ~ "Waist Circumference (cm)",
                 choles ~ "Total Cholesterol (mg/dL)",
                 trigly ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)")
  ) %>%
  modify_header(
    label = "**Variable**",
    # The following adds the % to the column total label
    # <br> is the location of a line break
    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"
  ) %>%
  modify_caption("Participant characteristics, by gender") %>%
  bold_labels() %>%
  # Include an "overall" column
  add_overall(
    last = FALSE,
    # The ** make it bold
    col_label = "**All participants**<br>N = {N}"
  )
Variable | All participants N = 250¹ | Male N = 116 (46.4%)¹ | Female N = 134 (53.6%)¹ |
---|---|---|---|
Race/Ethnicity | | | |
Mexican American | 33 (13.2%) | 11 (9.5%) | 22 (16.4%) |
Other Hispanic | 21 (8.4%) | 9 (7.8%) | 12 (9.0%) |
Non-Hispanic White | 111 (44.4%) | 48 (41.4%) | 63 (47.0%) |
Non-Hispanic Black | 49 (19.6%) | 25 (21.6%) | 24 (17.9%) |
Other | 36 (14.4%) | 23 (19.8%) | 13 (9.7%) |
Annual Household Income | | | |
< $25,000 | 76 (33.2%) | 31 (30.7%) | 45 (35.2%) |
$25,000 to < $55,000 | 86 (37.6%) | 41 (40.6%) | 45 (35.2%) |
$55,000+ | 67 (29.3%) | 29 (28.7%) | 38 (29.7%) |
Unknown | 21 | 15 | 6 |
Age (years) | 48.29 (19.82) | 48.89 (19.84) | 47.77 (19.87) |
Body Mass Index (kg/m2) | 28.45 (6.72) | 28.26 (5.90) | 28.61 (7.38) |
Unknown | 1 | 1 | 0 |
Waist Circumference (cm) | 97.22 (16.23) | 100.47 (16.22) | 94.50 (15.78) |
Unknown | 15 | 9 | 6 |
Total Cholesterol (mg/dL) | 185.07 (44.22) | 177.25 (43.51) | 191.76 (43.90) |
Unknown | 20 | 10 | 10 |
Triglyceride (mg/dL) | 94.50 (79.50) | 100.50 (100.50) | 93.50 (76.25) |
Unknown | 20 | 10 | 10 |
Fasting Glucose (mg/dL) | 98.00 (12.25) | 99.00 (20.25) | 97.00 (12.75) |
Unknown | 18 | 10 | 8 |
¹ n (%); Mean (SD); Median (IQR) |
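In the by-gender table, the percentages in the Male and Female columns are computed within each gender (column percentages), while the percentages in the column headers are each gender's share of all 250 participants. As a rough check, the sketch below reproduces both from the Race/Ethnicity counts printed in the table; the matrix and object names are ours, not part of the chapter's code.

```r
# Race/Ethnicity counts copied from the table (matrix() fills column by column,
# so the first five values are the Male column)
race_by_gender <- matrix(
  c(11,  9, 48, 25, 23,
    22, 12, 63, 24, 13),
  ncol = 2,
  dimnames = list(
    race   = c("Mexican American", "Other Hispanic", "Non-Hispanic White",
               "Non-Hispanic Black", "Other"),
    gender = c("Male", "Female")
  )
)

# Column percentages: each race count divided by the total for that gender
col_pct <- round(100 * prop.table(race_by_gender, margin = 2), 1)

# Header percentages: each gender's share of all participants
gender_n   <- colSums(race_by_gender)
header_pct <- round(100 * gender_n / sum(gender_n), 1)

col_pct
header_pct
```

Column percentages answer "what is the racial/ethnic composition of each gender?"; using `margin = 1` instead would give row percentages, which answer a different question.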
References
Sjoberg, Daniel D., Joseph Larmarange, Michael Curry, Jessica Lavery, Karissa Whiting, and Emily C. Zabor. 2023. gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://github.com/ddsjoberg/gtsummary.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.