Chapter 4 Tables

The focus of this chapter is the production of tables suitable for inclusion in scientific manuscripts. R has several functions that allow you to obtain descriptive information from your data, such as summary() from base R and various implementations of describe() (e.g., from the Hmisc or psych packages). Relevant base functions include mean(), sd(), median(), quantile(), and prop.table(table()).

The dataset used for demonstrations is the “starwars” dataset¹, included in the dplyr package. It can be loaded into your environment by issuing the following command in your console:

data(starwars, package="dplyr")

This creates the object “starwars” in your global environment. With the tidyverse package loaded, issuing the command starwars produces the following output in your console:

## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke…    172    77 blond      fair       blue            19   male  
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
##  3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
##  4 Dart…    202   136 none       white      yellow          41.9 male  
##  5 Leia…    150    49 brown      light      brown           19   female
##  6 Owen…    178   120 brown, gr… light      blue            52   male  
##  7 Beru…    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>  
##  9 Bigg…    183    84 black      light      brown           24   male  
## 10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>

To obtain simple summary statistics for the variables height and mass in the starwars dataset, one would issue commands such as mean(starwars$height, na.rm = TRUE) or sd(starwars$mass, na.rm = TRUE)².
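The chunk that originally followed this sentence is not reproduced here. A minimal sketch of such commands, using only the base functions named at the start of the chapter, could look like this:

```r
# Load the data as shown earlier (requires the dplyr package to be installed)
data(starwars, package = "dplyr")

# Both variables contain missing values, so without na.rm = TRUE the result is NA
mean(starwars$height)

# Location and spread, ignoring missing values
mean(starwars$height, na.rm = TRUE)
sd(starwars$height, na.rm = TRUE)
median(starwars$mass, na.rm = TRUE)
quantile(starwars$mass, probs = c(0.25, 0.75), na.rm = TRUE)

# Relative frequencies of a categorical variable
prop.table(table(starwars$eye_color))
```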

4.1 Tables of descriptives

Many packages are designed to produce table output. Extensive overviews with feature comparisons are available from the 2017 rOpenSci Unconference, and David Hugh-Jones, the author of the huxtable package, also maintains a comparison table.

Some guiding principles in selecting packages could be:

  1. Structure of the data for the table (e.g. numeric only, or mixed)
  2. Output format requirements (e.g. html, latex, or word)
  3. Statistical comparison requirements (e.g. t-tests or \(\chi^2\) tests)
  4. Need for customization and tailoring (e.g. specific journal requirements)

For simple tables, there is often no need for custom table-making packages, and one may simply create the table manually by piecing together the required information. While this is the simplest way to make a table, it is also perhaps the most customizable, as the pieces that go into the table are completely under your control. Some of the table-making packages are very opinionated with regard to which descriptives they provide, which statistical tests they use, and how they format the output. As the complexity of the table increases, however, there is time to be saved by using packages that have been designed for these purposes.

4.1.1 Manual approach
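
As an illustration of piecing a table together by hand, the following is a sketch only. It assumes that a grouped mean (SD) table of height and mass is what is wanted. The `sex`/`gender` switch is an assumption added here because newer dplyr releases store "male"/"female" in a `sex` column, whereas the version printed above has them in `gender`.

```r
library(dplyr)

data(starwars, package = "dplyr")

# Assumption: pick whichever column holds the values "male"/"female"
grp <- if ("sex" %in% names(starwars)) "sex" else "gender"
d <- starwars
d$group <- d[[grp]]

manual_tab <- d %>%
  filter(group %in% c("male", "female")) %>%
  group_by(group) %>%
  summarise(
    n           = n(),
    height_mean = mean(height, na.rm = TRUE),
    height_sd   = sd(height, na.rm = TRUE),
    mass_mean   = mean(mass, na.rm = TRUE),
    mass_sd     = sd(mass, na.rm = TRUE)
  ) %>%
  # Combine mean and SD into one formatted cell per variable
  mutate(
    height = sprintf("%.1f (%.1f)", height_mean, height_sd),
    mass   = sprintf("%.1f (%.1f)", mass_mean, mass_sd)
  ) %>%
  select(group, n, height, mass)

manual_tab
```

Because every cell is computed explicitly, the choice of statistics and the number formatting are entirely up to you.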

4.1.2 Package approach

The presentation below demonstrates some of the packages that I prefer to use when making tables. Most of them give output that is fairly good and close to what I want, but none of them produce tables that are ready for publication, so their output must be tweaked. I show how to tweak them into a table that could be included in a submission.

4.1.2.1 finalfit::summary_factorlist()

Making a table of descriptives with the finalfit package is based on tidy principles, and leverages functions from dplyr and the magrittr pipe %>%. The finalfit package is well-documented with many examples available.

It is also easy to pass the resulting table to kable, which will make your “table-making-life” a lot easier, as demonstrated later.
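
The code behind the tables in this section is not reproduced here. A sketch of how they might be produced follows. The construction of `starwars_fix` is an assumption (rows restricted to male and female, with gender stored as a factor); the original definition of that object is not shown.

```r
library(finalfit)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
starwars_fix <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
starwars_fix <- starwars_fix[starwars_fix$gender %in% c("male", "female"), ]
starwars_fix$gender <- factor(starwars_fix$gender)

# Bivariate table: mean (SD) of height and mass by gender
tab <- summary_factorlist(starwars_fix, dependent = "gender",
                          explanatory = c("height", "mass"))

# The same table with a column of p-values added
tab_p <- summary_factorlist(starwars_fix, dependent = "gender",
                            explanatory = c("height", "mass"), p = TRUE)

# The result is an ordinary data frame, so it can go straight into kable
knitr::kable(tab_p, row.names = FALSE)
```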

Bivariate table

                     Male           Female
height   Mean (SD)   179.2 (35.4)   165.5 (23)
mass     Mean (SD)   81 (28.2)      54 (8.4)

Bivariate table with statistical test

                     Males          Females      p-value
height   Mean (SD)   179.2 (35.4)   165.5 (23)   0.001
mass     Mean (SD)   81 (28.2)      54 (8.4)     <0.001

4.1.2.2 tableone::CreateTableOne()

This is another recommended package. It has extensive documentation and is highly configurable. The data is fed into tableone in a slightly different way than for finalfit, and the package is not designed to work well with tidy principles. It also does not facilitate printing to markdown (it is rather geared towards using Excel as an intermediate step), but with a slight hack it may be coerced into something that can be fed to kable, which will make your life easier.
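
A sketch of the “slight hack” follows, under the assumption that it refers to capturing the matrix returned by the print method rather than letting it print to the console. The `starwars_fix` data frame is constructed here for illustration and is not the author's original object.

```r
library(tableone)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
starwars_fix <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
starwars_fix <- starwars_fix[starwars_fix$gender %in% c("male", "female"), ]
starwars_fix$gender <- factor(starwars_fix$gender)

# Bivariate table with a statistical test, stratified by gender
tab1 <- CreateTableOne(vars = c("height", "mass"), strata = "gender",
                       data = starwars_fix, test = TRUE)

# printToggle = FALSE suppresses console printing and returns the
# formatted table as a character matrix
tab1_mat <- print(tab1, printToggle = FALSE)

# A matrix is something kable understands
knitr::kable(tab1_mat)
```

Dropping `strata` gives the univariate version, and `test = FALSE` removes the p-value column.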

Univariate

Table 4.5: tableone Univariate table

                     Overall
n                    87
height (mean (sd))   174.36 (34.77)
mass (mean (sd))     97.31 (169.46)

Bivariate

Table 4.6: tableone Bivariate table

                     female           male
n                    19               62
height (mean (sd))   165.47 (23.03)   179.24 (35.39)
mass (mean (sd))     54.02 (8.37)     81.00 (28.22)

Bivariate with test

Table 4.7: tableone Bivariate table with statistical test

                     female           male             p
n                    19               62
height (mean (sd))   165.47 (23.03)   179.24 (35.39)   0.135
mass (mean (sd))     54.02 (8.37)     81.00 (28.22)    0.004

4.1.2.3 Hmisc::summaryM

I really enjoy the packages written by Frank Harrell: Hmisc for various tasks such as bivariate tables, and rms for different regression techniques. One of the functions from Hmisc, summaryM, produces great bivariate summaries of mixed data. This function, like most others, is opinionated, especially with regard to the summary statistics it provides, and its output needs some “postprocessing” to coerce it into a kable-friendly format. It has tons of options, so you had better read the help file by typing ?summaryM in the console.
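
The call that produced the output in this section is not reproduced here. Something along these lines should give a comparable summary. The data preparation is an assumption, and the `prmsd` and `long` print options are simply one plausible choice matching the layout of the console output.

```r
library(Hmisc)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
d <- data.frame(
  height    = starwars$height,
  mass      = starwars$mass,
  eye_color = factor(starwars$eye_color),
  gender    = starwars[[sex_col]]
)
d <- d[d$gender %in% c("male", "female"), ]
d$gender    <- factor(d$gender)
d$eye_color <- droplevels(d$eye_color)

# Variables to summarise on the left, grouping variable on the right
s <- summaryM(height + mass + eye_color ~ gender, data = d, test = TRUE)

# prmsd = TRUE adds mean and SD next to the quartiles;
# long = TRUE puts each category on its own line
print(s, prmsd = TRUE, long = TRUE)
```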

The default print is useful in the console, see the example below, but is not appropriate for inclusion in a report as it is.

## 
## 
## Descriptive Statistics  (N=81)
## 
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |                 |N |female                               |male                                 |  Test                             |
## |                 |  |(N=19)                               |(N=62)                               |Statistic                          |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |height           |76|163.00/166.00/178.00  165.47+/- 23.03|174.00/183.00/193.00  179.24+/- 35.39|       F=12.4 d.f.=1,74 P<0.001    |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |mass             |54|   49.25/52.50/55.90  54.02+/- 8.37  |   76.50/80.00/87.25  81.00+/-28.22  |      F=15.76 d.f.=1,52 P<0.001    |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |eye_color        |81|                                     |                                     |   Chi-square=15.82 d.f.=14 P=0.325|
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    black        |  |              11%  ( 2)              |              11%  ( 7)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    blue         |  |              32%  ( 6)              |              21%  (13)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    blue-gray    |  |               0%  ( 0)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    brown        |  |              26%  ( 5)              |              26%  (16)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    dark         |  |               0%  ( 0)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    gold         |  |               0%  ( 0)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    green, yellow|  |               0%  ( 0)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    hazel        |  |              11%  ( 2)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    orange       |  |               0%  ( 0)              |              11%  ( 7)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    pink         |  |               0%  ( 0)              |               2%  ( 1)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    red          |  |               0%  ( 0)              |               3%  ( 2)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    red, blue    |  |               5%  ( 1)              |               0%  ( 0)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    unknown      |  |               5%  ( 1)              |               3%  ( 2)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    white        |  |               5%  ( 1)              |               0%  ( 0)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+
## |    yellow       |  |               5%  ( 1)              |              15%  ( 9)              |                                   |
## +-----------------+--+-------------------------------------+-------------------------------------+-----------------------------------+

summaryM also provides a default print type for html (see below):

Hmisc summaryM table.

                    N    female (N=19)                           male (N=62)                             Test Statistic
height              76   163.00 166.00 178.00 (165.47 ± 23.03)   174.00 183.00 193.00 (179.24 ± 35.39)   F(1,74)=12.4, P<0.001¹
mass                54   49.25 52.50 55.90 (54.02 ± 8.37)        76.50 80.00 87.25 (81.00 ± 28.22)       F(1,52)=15.76, P<0.001¹
eye_color           81                                                                                   χ²(14)=15.82, P=0.325²
    black                11% ( 2)                                11% ( 7)
    blue                 32% ( 6)                                21% (13)
    blue-gray            0% ( 0)                                 2% ( 1)
    brown                26% ( 5)                                26% (16)
    dark                 0% ( 0)                                 2% ( 1)
    gold                 0% ( 0)                                 2% ( 1)
    green, yellow        0% ( 0)                                 2% ( 1)
    hazel                11% ( 2)                                2% ( 1)
    orange               0% ( 0)                                 11% ( 7)
    pink                 0% ( 0)                                 2% ( 1)
    red                  0% ( 0)                                 3% ( 2)
    red, blue            5% ( 1)                                 0% ( 0)
    unknown              5% ( 1)                                 3% ( 2)
    white                5% ( 1)                                 0% ( 0)
    yellow               5% ( 1)                                 15% ( 9)

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents mean ± 1 SD. N is the number of non-missing values. Numbers after proportions are frequencies.
Tests used: ¹ Wilcoxon test; ² Pearson test.

Although the default latex and html output is pretty polished, it does not return a data frame that you can manipulate, should you not be satisfied with the default choices for design, included statistics or formatting. I therefore prefer to convert the output into a data frame. When printed directly, it looks much less polished, but now we have a data frame which we can feed into kable, and we can manipulate it further, and style it with kableExtra for latex and html output.

Table 4.8: Table from Hmisc::summaryM

                    N    female (N=19)                          male (N=62)                            Test Statistic
height              76   163.00/166.00/178.00 165.47+/- 23.03   174.00/183.00/193.00 179.24+/- 35.39   F=12.4 d.f.=1,74 P<0.001
mass                54   49.25/52.50/55.90 54.02+/- 8.37        76.50/80.00/87.25 81.00+/-28.22        F=15.76 d.f.=1,52 P<0.001
eye_color           81                                                                                 Chi-square=15.82 d.f.=14 P=0.325
    black                10.5% ( 2)                             11.3% ( 7)
    blue                 31.6% ( 6)                             21.0% (13)
    blue-gray            0.0% ( 0)                              1.6% ( 1)
    brown                26.3% ( 5)                             25.8% (16)
    dark                 0.0% ( 0)                              1.6% ( 1)
    gold                 0.0% ( 0)                              1.6% ( 1)
    green, yellow        0.0% ( 0)                              1.6% ( 1)
    hazel                10.5% ( 2)                             1.6% ( 1)
    orange               0.0% ( 0)                              11.3% ( 7)
    pink                 0.0% ( 0)                              1.6% ( 1)
    red                  0.0% ( 0)                              3.2% ( 2)
    red, blue            5.3% ( 1)                              0.0% ( 0)
    unknown              5.3% ( 1)                              3.2% ( 2)
    white                5.3% ( 1)                              0.0% ( 0)
    yellow               5.3% ( 1)                              14.5% ( 9)

4.1.2.4 Styling with kableExtra

The kableExtra package is not a table making package, rather it works together with kable and adds many useful extensions and options. It is well documented and maintained, but only works with latex or html output.
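
To illustrate the kind of extensions kableExtra offers, here is a sketch that styles a small table with a spanning header, grouped rows and footnotes. The values are typed in by hand from Table 4.7 so that the example is self-contained; the labels are demo text only.

```r
library(kableExtra)  # also makes the %>% pipe available

# Values copied by hand from Table 4.7
tab <- data.frame(
  female = c("19", "165.47 (23.03)", "54.02 (8.37)"),
  male   = c("62", "179.24 (35.39)", "81.00 (28.22)"),
  p      = c("", "0.135", "0.004"),
  row.names = c("n", "height (mean (sd))", "mass (mean (sd))"),
  stringsAsFactors = FALSE
)

styled <- knitr::kable(tab, format = "html",
                       caption = "tableone Bivariate table with statistical test") %>%
  kable_styling(full_width = FALSE) %>%
  # Spanning header: one blank cell for the row names, then two groups
  add_header_above(c(" " = 1, "Gender" = 2, "Test" = 1)) %>%
  # Group and indent rows 2 to 3 under a common label
  pack_rows("Demo grouping and indentation", 2, 3) %>%
  # The different footnote types each get their own marker style
  footnote(general  = "Here is a general comment on the table.",
           number   = "Demo",
           alphabet = "Some were large",
           symbol   = "t-test")

styled
```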

Table 4.9: tableone Bivariate table with statistical test

                                Gender                            Test
                                female           male¹            p*
n                               19               62
Demo grouping and indentation
    height (mean (sd))          165.47 (23.03)   179.24 (35.39)   0.135
    mass (mean (sd))ᵃ           54.02 (8.37)     81.00 (28.22)    0.004

Note: Here is a general comment on the table.
¹ Demo
ᵃ Some were large
* t-test

4.2 Tables from statistical models

4.2.1 Package approach

4.2.1.1 finalfit

There are many packages made for presenting output from statistical models such as regression analyses; some candidates are xtable, stargazer and finalfit. If you do a “plain” regression analysis, these may be good packages to explore. For most of these you will have to decide on the output format, as the packages produce either latex or html formatted regression output. Once again, finalfit is a good choice, as it allows for customization of the final table using kableExtra as illustrated above. Importantly, this package is also very opinionated about which results from a regression analysis are reported, and how these results are presented. If you disagree with the package author about this, you may very well be better off rolling your own!

Linear regression

A tutorial for making regression tables using finalfit is available here. Somewhat annoyingly, you have to specify the regression itself in finalfit code, which may or may not be a hassle.

Where you would specify a linear regression model in plain R as:

m1 <- lm(mass ~ height, data = starwars_fix)

the same model must be specified in finalfit, fitted and printed as a table with the following code:
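
The original chunk is not reproduced here. A sketch of what the finalfit specification might look like follows. The construction of `starwars_fix` is an assumption, since its definition is not shown.

```r
library(finalfit)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
starwars_fix <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
starwars_fix <- starwars_fix[starwars_fix$gender %in% c("male", "female"), ]
starwars_fix$gender <- factor(starwars_fix$gender)

# The model is described by variable names rather than a formula
dependent   <- "mass"
explanatory <- "height"

# A continuous dependent variable leads finalfit to fit a linear model
lm_tab <- finalfit(starwars_fix, dependent, explanatory)

knitr::kable(lm_tab, row.names = FALSE,
             caption = "Linear regression with finalfit")
```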

resulting in the following table:

Table 4.10: Linear regression with finalfit

Dependent: mass              Mean (sd)     Coefficient (univariable)      Coefficient (multivariable)
height            [66,234]   76.0 (27.8)   0.62 (0.46 to 0.77, p<0.001)   0.62 (0.46 to 0.77, p<0.001)

Multiple linear regression

The model above may be extended into a multiple linear regression using the following syntax, also from the finalfit package. The extra step consists of specifying a separate model with the same, or a reduced, set of variables:
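
Again as a sketch only, with `starwars_fix` constructed under the same assumptions as before, the separate model can be passed through the `explanatory_multi` argument:

```r
library(finalfit)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
starwars_fix <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
starwars_fix <- starwars_fix[starwars_fix$gender %in% c("male", "female"), ]
starwars_fix$gender <- factor(starwars_fix$gender)

dependent         <- "mass"
explanatory       <- c("height", "gender")  # univariable models
explanatory_multi <- c("height", "gender")  # multivariable model (may be a subset)

mlm_tab <- finalfit(starwars_fix, dependent, explanatory, explanatory_multi)

knitr::kable(mlm_tab, row.names = FALSE,
             caption = "Multiple linear regression with finalfit")
```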

Table 4.11: Multiple linear regression with finalfit

Dependent: mass              Mean (sd)     Coefficient (univariable)        Coefficient (multivariable)
height            [66,234]   76.0 (27.8)   0.62 (0.46 to 0.77, p<0.001)     0.59 (0.46 to 0.73, p<0.001)
gender            female     54.0 (8.4)    -                                -
                  male       81.0 (28.2)   26.98 (8.78 to 45.19, p=0.004)   22.40 (10.70 to 34.10, p<0.001)

Logistic regression

finalfit can also fit logistic regression models. The syntax contains no explicit specification of the logistic model; finalfit decides on the appropriate model from the type of data that is entered. This underscores the importance of coding the variables correctly in R, as numeric, factor and character variables will be treated differently.
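
A sketch of such a call follows, with the same assumed `starwars_fix` as in the previous examples. The only change from the linear case is that the dependent variable is a two-level factor.

```r
library(finalfit)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
starwars_fix <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
starwars_fix <- starwars_fix[starwars_fix$gender %in% c("male", "female"), ]

# Coding matters: a factor with two levels triggers logistic regression
starwars_fix$gender <- factor(starwars_fix$gender, levels = c("female", "male"))

dependent   <- "gender"
explanatory <- c("height", "mass")

glm_tab <- finalfit(starwars_fix, dependent, explanatory)

knitr::kable(glm_tab, row.names = FALSE,
             caption = "Logistic regression with finalfit")
```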

Table 4.12: Logistic regression with finalfit

Dependent: gender               female       male           OR (univariable)            OR (multivariable)
height              Mean (SD)   165.5 (23)   179.2 (35.4)   1.01 (1.00-1.03, p=0.142)   0.93 (0.88-0.97, p=0.005)
mass                Mean (SD)   54 (8.4)     81 (28.2)      1.05 (1.02-1.09, p=0.009)   1.16 (1.08-1.30, p=0.001)

4.2.1.2 texreg

The texreg package provides another option for producing regression tables.
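
A sketch of a typical texreg workflow: fit the models with lm() and pass them as a list. The three models are assumptions reconstructed from the coefficient names in the table, and the restriction to complete cases is assumed from the identical number of observations across models.

```r
library(texreg)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
d <- data.frame(
  height     = starwars$height,
  mass       = starwars$mass,
  birth_year = starwars$birth_year,
  gender     = starwars[[sex_col]]
)
d <- d[d$gender %in% c("male", "female"), ]
d$gender <- factor(d$gender)
d <- na.omit(d)  # same observations in all three models

m1 <- lm(height ~ gender, data = d)
m2 <- lm(height ~ gender + mass, data = d)
m3 <- lm(height ~ gender + mass + birth_year, data = d)

# Plain-text version for the console
screenreg(list(m1, m2, m3))

# HTML version for a report; texreg() gives the latex equivalent
html_tab <- htmlreg(list(m1, m2, m3), caption = "texreg table")
```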

texreg table

              Model 1     Model 2     Model 3
(Intercept)   165.67***   123.11***   121.21***
              (8.97)      (10.47)     (9.10)
gendermale    15.51       -6.66       -4.81
              (10.07)     (8.41)      (7.32)
mass                      0.77***     0.62***
                          (0.15)      (0.14)
birth_year                            0.24**
                                      (0.08)
R²            0.08        0.54        0.67
Adj. R²       0.05        0.51        0.63
Num. obs.     29          29          29
RMSE          21.97       15.75       13.66

*** p < 0.001, ** p < 0.01, * p < 0.05

4.2.1.3 stargazer

The stargazer package provides yet another option.
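
A sketch of the corresponding stargazer call follows. The models and the data preparation are assumptions reconstructed from the table, not the author's original code.

```r
library(stargazer)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
d <- data.frame(
  height     = starwars$height,
  mass       = starwars$mass,
  birth_year = starwars$birth_year,
  gender     = starwars[[sex_col]]
)
d <- d[d$gender %in% c("male", "female"), ]
d$gender <- factor(d$gender)
d <- na.omit(d)  # same observations in all three models

m1 <- lm(height ~ gender, data = d)
m2 <- lm(height ~ gender + mass, data = d)
m3 <- lm(height ~ gender + mass + birth_year, data = d)

# type = "text" prints to the console; "html" and "latex" are for reports
out <- stargazer(m1, m2, m3, type = "text", title = "stargazer table")
```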

stargazer table

                      Dependent variable: height
                      (1)                  (2)                      (3)
gendermale            15.507               -6.663                   -4.807
                      (10.073)             (8.407)                  (7.317)
mass                                       0.773***                 0.622***
                                           (0.150)                  (0.139)
birth_year                                                          0.238***
                                                                    (0.077)
Constant              165.667***           123.111***               121.213***
                      (8.971)              (10.470)                 (9.103)
Observations          29                   29                       29
R²                    0.081                0.545                    0.671
Adjusted R²           0.047                0.510                    0.631
Residual Std. Error   21.974 (df = 27)     15.754 (df = 26)         13.665 (df = 25)
F Statistic           2.370 (df = 1; 27)   15.570*** (df = 2; 26)   16.982*** (df = 3; 25)

Note: *p<0.1; **p<0.05; ***p<0.01

4.2.2 Rolling your own

As the many different opinions regarding table output are often not suitable, I most often end up rolling my own. Using a combination of base functions such as lm or glm with broom::tidy and broom::glance will give you most of the information you need to make a proper table. This often becomes necessary for custom models, or for models using imputed data etc.

rms is one of my favorite packages for conducting regression analyses; it produces nice tables in the console, easy summaries and contrasts, supports imputed data and makes it easy to make predictions for various conditions. It is a tool for conducting regression, not making regression tables, but will be used to illustrate how you can roll your own regression tables:
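
As a minimal sketch of that combination, assuming a plain lm() fit and an assumed data preparation, the coefficient rows come from broom::tidy, the fit statistics from broom::glance, and the formatting is done by hand:

```r
library(broom)

data(starwars, package = "dplyr")

# Assumed data preparation: newer dplyr releases keep "male"/"female" in `sex`
sex_col <- if ("sex" %in% names(starwars)) "sex" else "gender"
d <- data.frame(
  height = starwars$height,
  mass   = starwars$mass,
  gender = starwars[[sex_col]]
)
d <- d[d$gender %in% c("male", "female"), ]
d$gender <- factor(d$gender)

m <- lm(height ~ gender + mass, data = d)

coefs <- as.data.frame(tidy(m, conf.int = TRUE))  # one row per coefficient
fit   <- glance(m)                                # one row of fit statistics

# Format each column exactly the way the journal wants it
reg_tab <- data.frame(
  term = coefs$term,
  b    = sprintf("%.2f", coefs$estimate),
  ci   = sprintf("%.2f to %.2f", coefs$conf.low, coefs$conf.high),
  p    = ifelse(coefs$p.value < .001, "<.001", sprintf("%.3f", coefs$p.value)),
  stringsAsFactors = FALSE
)

# Append a fit statistic as an extra row
reg_tab <- rbind(
  reg_tab,
  data.frame(term = "R2", b = sprintf("%.2f", fit$r.squared), ci = "", p = "",
             stringsAsFactors = FALSE)
)

knitr::kable(reg_tab)
```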

library(rms)  # provides lrtest(), used further down

# m1.r, m2.r and m3.r are the three fitted models (fitting code not shown)

# Helper: tidy the coefficient table, format the p-values, round the
# estimates and keep the columns b (estimate), SE (std.error) and p (p.value)
coef_cols <- function(model) {
  coefs <- as.data.frame(broom::tidy(summary.lm(model)))
  p <- coefs$p.value
  coefs[, 2:4] <- round(coefs[, 2:4], 2)
  coefs$p.value <- ifelse(p < .001, "<.001", round(p, 3))
  coefs[, c(2:3, 5)]
}

m1.merge <- coef_cols(m1.r)
m2.merge <- coef_cols(m2.r)
m3.merge <- coef_cols(m3.r)

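# Put the three models side by side; cbind.fill pads the shorter
# coefficient tables with blanks so that all have the same number of rows
m123 <- rowr::cbind.fill(m1.merge, m2.merge, m3.merge, fill = " ")
colnames(m123) <- c("b", "SE", "p", "b", "SE", "p", "b", "SE", "p")
rownames(m123) <- c("Constant", "Gender: Male", "Mass (kg)", "Birth year")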

m1.fit <- round(broom::glance(m1.r),2)
m2.fit <- round(broom::glance(m2.r),2)
m3.fit <- round(broom::glance(m3.r),2)

# Likelihood ratio tests between successive models; $stats holds the
# chi-square statistic, the degrees of freedom and the p-value
m1v2 <- lrtest(m1.r, m2.r)$stats
m1v2[1] <- round(m1v2[1], 1)
m1v2[2] <- round(m1v2[2], 1)
m1v2[3] <- ifelse(as.numeric(m1v2[3]) < .001, "<.001", round(as.numeric(m1v2[3]), 3))

m2v3 <- lrtest(m2.r, m3.r)$stats
m2v3[1] <- format(m2v3[1], nsmall = 1, digits = 1)
m2v3[2] <- format(m2v3[2], nsmall = 1)
m2v3[3] <- ifelse(as.numeric(m2v3[3]) < .001, "<.001", round(as.numeric(m2v3[3]), 3))


m123.r2 <- cbind(b = m1.fit$r.squared, SE = "", p = "", 
                 b = m2.fit$r.squared, SE = "", p = "", 
                 b = m3.fit$r.squared, SE = "", p = "")

rownames(m123.r2) <- "R2"

m123.adjr2 <- cbind(b = m1.fit$adj.r.squared, SE = "", p = "", 
                    b = m2.fit$adj.r.squared, SE = "", p = "", 
                    b = m3.fit$adj.r.squared, SE = "", p = "")

rownames(m123.adjr2) <- "R2 adj."

m123.aic <- cbind(b = m1.fit$AIC, SE = "", p = "", 
                    b = m2.fit$AIC, SE = "", p = "", 
                    b = m3.fit$AIC, SE = "", p = "")

rownames(m123.aic) <- "AIC"

m123.lik <- cbind(b = m1.fit$logLik, SE = "", p = "", 
                    b = m2.fit$logLik, SE = "", p = "", 
                    b = m3.fit$logLik, SE = "", p = "")

rownames(m123.lik) <- "logLik"

Table 4.13: My own regression table

               Model 1                  Model 2                  Model 3
               b         SE     p       b         SE     p       b         SE      p
Constant       165.47    8.03   <.001   115.54    9.28   <.001   119.7     10.93   <.001
Gender: Male   13.77     9.11   0.135   -19.55    8.21   0.021   -9.16     8.65    0.299
Mass (kg)                               1.01      0.12   <.001   0.88      0.15    <.001
Birth year                                                       -0.05     0.02    0.018
R2             0.03                     0.6                      0.71
R2 adj.        0.02                     0.59                     0.68
AIC            751.64                   490.32                   275.93
logLik         -372.82                  -241.16                  -132.97

Note: Likelihood ratio χ² test: Model 1 vs Model 2 = 47.4 (df = 1), p <.001; Model 2 vs Model 3 = 9.7 (df = 1), p = 0.002.

  1. The “starwars” dataset is described in more detail here

  2. Note that, as the dataset contains missing values, one needs to add the argument na.rm = TRUE to explicitly instruct R to ignore missing values. Otherwise, the result of these commands is simply NA, indicating that missing values are present in the data.