5.5 Table 1

In almost every published article that includes quantitative data, there will be a “Table 1” displaying the descriptive statistics for the study sample. There are many ways to organize this information, but the following are commonly followed principles:

Different variables are in different rows.
Different statistics are in different columns.
Categorical variables are typically summarized by displaying the number (N) and percentage (%) of cases at each level. Sometimes the number of missing values is indicated as well.
Continuous variables are typically summarized by displaying the mean and standard deviation (SD), or the median and interquartile range (IQR). Sometimes the number of missing values is indicated as well.
If the descriptive statistics are to be presented by some other variable, levels of that variable should be in separate columns.
The units for each variable should be included next to the variable name (e.g., Cholesterol (mg/dL)).
The reader should be able to understand all the contents of the table, within reason, without reading the text. Clarifying information should be included in the title, headings, and footnotes.
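
To make the two continuous-variable formats above concrete, the following base R sketch builds a "mean (SD)" string and a "median (IQR)" string, and counts missing values separately. The helper name summarize_continuous and the toy vector are hypothetical, introduced only for illustration; they are not part of this chapter's dataset or of any package.

```r
# Hypothetical helper: format a continuous variable the way a "Table 1" row
# usually reports it. Missing values are dropped before computing the
# statistics and are counted separately.
summarize_continuous <- function(x, digits = 2) {
  fmt <- paste0("%.", digits, "f (%.", digits, "f)")
  list(
    mean_sd    = sprintf(fmt, mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)),
    median_iqr = sprintf(fmt, median(x, na.rm = TRUE), IQR(x, na.rm = TRUE)),
    n_missing  = sum(is.na(x))
  )
}

# Toy values (made up for illustration), including one missing value
toy <- c(1, 2, 3, 4, NA)
res <- summarize_continuous(toy)
res$mean_sd     # "2.50 (1.29)"
res$median_iqr  # "2.50 (1.50)"
res$n_missing   # 1
```

Mean (SD) is usually chosen for roughly symmetric variables and median (IQR) for skewed ones. Whichever is used, the table footnote should say which statistic each row reports.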

For details on creating a “Table 1”, see Section 3.3 in Introduction to Regression Methods for Public Health. Here, we present only the relevant code for the dataset used in this chapter, which relies on the gtsummary package (Sjoberg et al. 2025, 2021).

library(gtsummary)

# Overall
mydat %>% 
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age     ~ "{mean} ({sd})",
                     bmi     ~ "{mean} ({sd})",
                     waist   ~ "{mean} ({sd})",
                     choles  ~ "{mean} ({sd})",
                     trigly  ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous()  ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(gender   ~ "categorical",
                race     ~ "categorical",
                income   ~ "categorical",
                age      ~ "continuous",
                bmi      ~ "continuous",
                waist    ~ "continuous",
                choles   ~ "continuous",
                trigly   ~ "continuous",
                glucose  ~ "continuous"),
    label = list(gender  ~ "Gender",
                 race    ~ "Race/Ethnicity",
                 income  ~ "Annual Household Income",
                 age     ~ "Age (years)",
                 bmi     ~ "Body Mass Index (kg/m2)",
                 waist   ~ "Waist Circumference (cm)",
                 choles  ~ "Total Cholesterol (mg/dL)",
                 trigly  ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)") 
  ) %>%
  modify_header(label = "**Variable**") %>%
  modify_caption("Participant characteristics") %>%
  bold_labels()

Table 5.1: Participant characteristics
Variable	N = 250¹
Gender
    Male	116 (46.4%)
    Female	134 (53.6%)
Race/Ethnicity
    Mexican American	33 (13.2%)
    Other Hispanic	21 (8.4%)
    Non-Hispanic White	111 (44.4%)
    Non-Hispanic Black	49 (19.6%)
    Other	36 (14.4%)
Annual Household Income
    < $25,000	76 (33.2%)
    $25,000 to < $55,000	86 (37.6%)
    $55,000+	67 (29.3%)
    Unknown	21
Age (years)	48.29 (19.82)
Body Mass Index (kg/m2)	28.45 (6.72)
    Unknown	1
Waist Circumference (cm)	97.22 (16.23)
    Unknown	15
Total Cholesterol (mg/dL)	185.07 (44.22)
    Unknown	20
Triglyceride (mg/dL)	94.50 (79.50)
    Unknown	20
Fasting Glucose (mg/dL)	98.00 (12.25)
    Unknown	18
¹ n (%); Mean (SD); Median (IQR)
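
The percentages for Annual Household Income in Table 5.1 are computed among participants with a known value, which is why 76 participants appear as 33.2% (76 of 229) rather than 30.4% (76 of 250). The base R sketch below rebuilds the variable from the counts printed in the table, not from mydat itself, to show the arithmetic.

```r
# Rebuild the income variable from the counts shown in Table 5.1
# (an illustration only; the actual values are in mydat)
income_levels <- c("< $25,000", "$25,000 to < $55,000", "$55,000+")
income <- factor(
  c(rep(income_levels, times = c(76, 86, 67)), rep(NA, 21)),
  levels = income_levels
)

n_known   <- table(income)       # table() excludes NA by default
n_missing <- sum(is.na(income))  # reported as "Unknown" in the table
pct_known <- round(100 * prop.table(n_known), 1)

sum(n_known)  # 229 participants with a known income
n_missing     # 21
pct_known     # 33.2, 37.6, 29.3
```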

# By gender
mydat %>% 
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    by = gender,
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age     ~ "{mean} ({sd})",
                     bmi     ~ "{mean} ({sd})",
                     waist   ~ "{mean} ({sd})",
                     choles  ~ "{mean} ({sd})",
                     trigly  ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous()  ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(race     ~ "categorical",
                income   ~ "categorical",
                age      ~ "continuous",
                bmi      ~ "continuous",
                waist    ~ "continuous",
                choles   ~ "continuous",
                trigly   ~ "continuous",
                glucose  ~ "continuous"),
    label = list(race    ~ "Race/Ethnicity",
                 income  ~ "Annual Household Income",
                 age     ~ "Age (years)",
                 bmi     ~ "Body Mass Index (kg/m2)",
                 waist   ~ "Waist Circumference (cm)",
                 choles  ~ "Total Cholesterol (mg/dL)",
                 trigly  ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)") 
  ) %>%
  modify_header(
    label = "**Variable**",
    # The following adds the % to the column total label
    # <br> is the location of a line break
    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"
  ) %>%
  modify_caption("Participant characteristics, by gender") %>%
  bold_labels() %>%
  # Include an "overall" column
  add_overall(
    last = FALSE,
    # The ** make it bold
    col_label = "**All participants**<br>N = {N}"
  )

Table 5.2: Participant characteristics, by gender
Variable	All participants N = 250¹	Male N = 116 (46.4%)¹	Female N = 134 (53.6%)¹
Race/Ethnicity
    Mexican American	33 (13.2%)	11 (9.5%)	22 (16.4%)
    Other Hispanic	21 (8.4%)	9 (7.8%)	12 (9.0%)
    Non-Hispanic White	111 (44.4%)	48 (41.4%)	63 (47.0%)
    Non-Hispanic Black	49 (19.6%)	25 (21.6%)	24 (17.9%)
    Other	36 (14.4%)	23 (19.8%)	13 (9.7%)
Annual Household Income
    < $25,000	76 (33.2%)	31 (30.7%)	45 (35.2%)
    $25,000 to < $55,000	86 (37.6%)	41 (40.6%)	45 (35.2%)
    $55,000+	67 (29.3%)	29 (28.7%)	38 (29.7%)
    Unknown	21	15	6
Age (years)	48.29 (19.82)	48.89 (19.84)	47.77 (19.87)
Body Mass Index (kg/m2)	28.45 (6.72)	28.26 (5.90)	28.61 (7.38)
    Unknown	1	1	0
Waist Circumference (cm)	97.22 (16.23)	100.47 (16.22)	94.50 (15.78)
    Unknown	15	9	6
Total Cholesterol (mg/dL)	185.07 (44.22)	177.25 (43.51)	191.76 (43.90)
    Unknown	20	10	10
Triglyceride (mg/dL)	94.50 (79.50)	100.50 (100.50)	93.50 (76.25)
    Unknown	20	10	10
Fasting Glucose (mg/dL)	98.00 (12.25)	99.00 (20.25)	97.00 (12.75)
    Unknown	18	10	8
¹ n (%); Mean (SD); Median (IQR)
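
In Table 5.2, the percentages within each gender column are column percentages: each one uses the number of non-missing participants in that gender group as its denominator. The base R sketch below reproduces the Race/Ethnicity percentages from the counts printed in the table, not from mydat itself.

```r
# Race/Ethnicity counts by gender, copied from Table 5.2
# (an illustration only; the actual values are in mydat)
counts <- matrix(
  c(11,  9, 48, 25, 23,   # Male
    22, 12, 63, 24, 13),  # Female
  ncol = 2,
  dimnames = list(
    race   = c("Mexican American", "Other Hispanic", "Non-Hispanic White",
               "Non-Hispanic Black", "Other"),
    gender = c("Male", "Female")
  )
)

# margin = 2 divides each cell by its column (gender) total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

colSums(counts)  # Male 116, Female 134
col_pct          # Male: 9.5, 7.8, 41.4, 21.6, 19.8
                 # Female: 16.4, 9.0, 47.0, 17.9, 9.7
```

Because each column sums to 100%, these values compare the race/ethnicity distribution between genders; they do not describe the gender split within each race/ethnicity group.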

References

Sjoberg, Daniel D., Joseph Larmarange, Michael Curry, Emily de la Rua, Jessica Lavery, Karissa Whiting, and Emily C. Zabor. 2025. Gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://github.com/ddsjoberg/gtsummary.

Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.