5.5 Table 1

In almost every published article that includes quantitative data, there will be a “Table 1” displaying the descriptive statistics for the study sample. There are many ways to organize this information, but the following are commonly followed principles:

  • Different variables are in different rows.
  • Different statistics are in different columns.
  • Categorical variables are typically summarized by displaying the number (N) and proportion (%) of cases at each level. Sometimes the number of missing values is indicated, as well.
  • Continuous variables are typically summarized by displaying the mean and SD (or median and IQR). Sometimes the number of missing values is indicated, as well.
  • If the descriptive statistics are to be presented by some other variable, levels of that variable should be in separate columns.
  • The units for each variable should be included next to the variable name (e.g., Cholesterol (mg/dL)).
  • The reader should be able to understand all the contents of the table, within reason, without reading the text. Clarifying information should be included in the title, headings, and footnotes.
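
As a minimal sketch of the summary conventions above, the following base R code computes n (%) for a categorical variable, and mean (SD) and median (IQR) for a continuous variable. The data frame `toy`, its variable names, and its values are made up for illustration and are not the dataset used in this chapter.

```r
# Hypothetical toy data (NOT the chapter's dataset)
toy <- data.frame(
  smoker = factor(c("No", "Yes", "No", "No", NA, "Yes")),
  choles = c(180, 210, NA, 165, 190, 240)
)

# Categorical: count and percent at each level, among non-missing values
# (table() drops NA by default)
n   <- table(toy$smoker)
pct <- 100 * as.numeric(prop.table(n))
cat_summary <- sprintf("%d (%.1f%%)", as.integer(n), pct)
names(cat_summary) <- names(n)

# Continuous: mean (SD) and median (IQR), ignoring missing values
mean_sd <- sprintf("%.1f (%.1f)",
                   mean(toy$choles, na.rm = TRUE),
                   sd(toy$choles,   na.rm = TRUE))
med_iqr <- sprintf("%.1f (%.1f)",
                   median(toy$choles, na.rm = TRUE),
                   IQR(toy$choles,    na.rm = TRUE))

# Number of missing values for each variable
n_missing <- colSums(is.na(toy))

print(cat_summary)
print(c(mean_sd = mean_sd, med_iqr = med_iqr))
print(n_missing)
```

Here `IQR()` returns a single number, the distance between the first and third quartiles; `quantile()` could be used instead if the quartiles themselves are to be reported.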

For details on creating a “Table 1”, see Section 3.3 in Introduction to Regression Methods for Public Health. Here, we present only the relevant code, which uses the gtsummary package (Sjoberg et al. 2023, 2021), applied to the dataset used in this chapter.

library(dplyr)      # provides %>% and select(), in case not already attached
library(gtsummary)

# Overall
mydat %>% 
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age     ~ "{mean} ({sd})",
                     bmi     ~ "{mean} ({sd})",
                     waist   ~ "{mean} ({sd})",
                     choles  ~ "{mean} ({sd})",
                     trigly  ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous()  ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(gender   ~ "categorical",
                race     ~ "categorical",
                income   ~ "categorical",
                age      ~ "continuous",
                bmi      ~ "continuous",
                waist    ~ "continuous",
                choles   ~ "continuous",
                trigly   ~ "continuous",
                glucose  ~ "continuous"),
    label = list(gender  ~ "Gender",
                 race    ~ "Race/Ethnicity",
                 income  ~ "Annual Household Income",
                 age     ~ "Age (years)",
                 bmi     ~ "Body Mass Index (kg/m2)",
                 waist   ~ "Waist Circumference (cm)",
                 choles  ~ "Total Cholesterol (mg/dL)",
                 trigly  ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)") 
  ) %>%
  modify_header(label = "**Variable**") %>%
  modify_caption("Participant characteristics") %>%
  bold_labels()

Table 5.1: Participant characteristics

Variable                      N = 250¹
Gender
    Male                      116 (46.4%)
    Female                    134 (53.6%)
Race/Ethnicity
    Mexican American          33 (13.2%)
    Other Hispanic            21 (8.4%)
    Non-Hispanic White        111 (44.4%)
    Non-Hispanic Black        49 (19.6%)
    Other                     36 (14.4%)
Annual Household Income
    < $25,000                 76 (33.2%)
    $25,000 to < $55,000      86 (37.6%)
    $55,000+                  67 (29.3%)
    Unknown                   21
Age (years)                   48.29 (19.82)
Body Mass Index (kg/m2)       28.45 (6.72)
    Unknown                   1
Waist Circumference (cm)      97.22 (16.23)
    Unknown                   15
Total Cholesterol (mg/dL)     185.07 (44.22)
    Unknown                   20
Triglyceride (mg/dL)          94.50 (79.50)
    Unknown                   20
Fasting Glucose (mg/dL)       98.00 (12.25)
    Unknown                   18

¹ n (%); Mean (SD); Median (IQR)

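
The income percentages in Table 5.1 appear to be computed among participants with known income, not among all 250. The following check uses only counts copied from the table and is consistent with that reading; the object names are chosen here for illustration.

```r
# Counts copied from Table 5.1 (Annual Household Income)
income_n  <- c("< $25,000" = 76, "$25,000 to < $55,000" = 86, "$55,000+" = 67)
n_unknown <- 21
n_total   <- sum(income_n) + n_unknown

# Percentages among participants with known income (denominator 229)
pct_known <- round(100 * income_n / sum(income_n), 1)

# Percentages among all participants (denominator 250), for comparison
pct_all <- round(100 * income_n / n_total, 1)

print(rbind(pct_known, pct_all))
```

The `pct_known` row reproduces the 33.2%, 37.6%, and 29.3% shown in the table, so a reader should take the "Unknown" row into account when interpreting those percentages.
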
# By gender
mydat %>% 
  select(gender, race, income, age,
         bmi, waist, choles, trigly, glucose) %>%
  tbl_summary(
    by = gender,
    statistic = list(all_categorical() ~ "{n} ({p}%)",
                     age     ~ "{mean} ({sd})",
                     bmi     ~ "{mean} ({sd})",
                     waist   ~ "{mean} ({sd})",
                     choles  ~ "{mean} ({sd})",
                     trigly  ~ "{median} ({IQR})",
                     glucose ~ "{median} ({IQR})"),
    digits = list(all_continuous()  ~ c(2, 2),
                  all_categorical() ~ c(0, 1)),
    type = list(race     ~ "categorical",
                income   ~ "categorical",
                age      ~ "continuous",
                bmi      ~ "continuous",
                waist    ~ "continuous",
                choles   ~ "continuous",
                trigly   ~ "continuous",
                glucose  ~ "continuous"),
    label = list(race    ~ "Race/Ethnicity",
                 income  ~ "Annual Household Income",
                 age     ~ "Age (years)",
                 bmi     ~ "Body Mass Index (kg/m2)",
                 waist   ~ "Waist Circumference (cm)",
                 choles  ~ "Total Cholesterol (mg/dL)",
                 trigly  ~ "Triglyceride (mg/dL)",
                 glucose ~ "Fasting Glucose (mg/dL)") 
  ) %>%
  modify_header(
    label = "**Variable**",
    # The following adds the % to the column total label
    # <br> is the location of a line break
    all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)"
  ) %>%
  modify_caption("Participant characteristics, by gender") %>%
  bold_labels()  %>%
  # Include an "overall" column
  add_overall(
    last = FALSE,
    # The ** make it bold
    col_label = "**All participants**<br>N = {N}"
  )

Table 5.2: Participant characteristics, by gender

Variable                      All participants  Male              Female
                              N = 250¹          N = 116 (46.4%)¹  N = 134 (53.6%)¹
Race/Ethnicity
    Mexican American          33 (13.2%)        11 (9.5%)         22 (16.4%)
    Other Hispanic            21 (8.4%)         9 (7.8%)          12 (9.0%)
    Non-Hispanic White        111 (44.4%)       48 (41.4%)        63 (47.0%)
    Non-Hispanic Black        49 (19.6%)        25 (21.6%)        24 (17.9%)
    Other                     36 (14.4%)        23 (19.8%)        13 (9.7%)
Annual Household Income
    < $25,000                 76 (33.2%)        31 (30.7%)        45 (35.2%)
    $25,000 to < $55,000      86 (37.6%)        41 (40.6%)        45 (35.2%)
    $55,000+                  67 (29.3%)        29 (28.7%)        38 (29.7%)
    Unknown                   21                15                6
Age (years)                   48.29 (19.82)     48.89 (19.84)     47.77 (19.87)
Body Mass Index (kg/m2)       28.45 (6.72)      28.26 (5.90)      28.61 (7.38)
    Unknown                   1                 1                 0
Waist Circumference (cm)      97.22 (16.23)     100.47 (16.22)    94.50 (15.78)
    Unknown                   15                9                 6
Total Cholesterol (mg/dL)     185.07 (44.22)    177.25 (43.51)    191.76 (43.90)
    Unknown                   20                10                10
Triglyceride (mg/dL)          94.50 (79.50)     100.50 (100.50)   93.50 (76.25)
    Unknown                   20                10                10
Fasting Glucose (mg/dL)       98.00 (12.25)     99.00 (20.25)     97.00 (12.75)
    Unknown                   18                10                8

¹ n (%); Mean (SD); Median (IQR)
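
In Table 5.2, the percentages within the Male and Female columns are column percentages (each gender's levels sum to 100%), while the percentages in the column headers are each gender's share of the full sample. The sketch below checks both using only the Race/Ethnicity counts copied from the table; the object names are chosen here for illustration.

```r
# Race/ethnicity counts by gender, copied from Table 5.2
# (matrix() fills column by column: first Male, then Female)
race_by_gender <- matrix(
  c(11,  9, 48, 25, 23,    # Male
    22, 12, 63, 24, 13),   # Female
  ncol = 2,
  dimnames = list(
    race   = c("Mexican American", "Other Hispanic", "Non-Hispanic White",
               "Non-Hispanic Black", "Other"),
    gender = c("Male", "Female")
  )
)

# Column percentages: percent at each race/ethnicity level within each gender
col_pct <- round(100 * prop.table(race_by_gender, margin = 2), 1)

# Column totals and each gender's share of the full sample (the header row)
col_n     <- colSums(race_by_gender)
col_share <- round(100 * col_n / sum(col_n), 1)

print(col_pct)
print(rbind(n = col_n, percent = col_share))
```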

References

Sjoberg, Daniel D., Joseph Larmarange, Michael Curry, Jessica Lavery, Karissa Whiting, and Emily C. Zabor. 2023. Gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://github.com/ddsjoberg/gtsummary.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.