Chapter 10 Attractive Output

Once you’ve gotten your data into shape and run your analysis, you’re going to want to output the results. This chapter won’t walk you through how to do the analysis; it’ll just focus on the presentation of those results. You can use R’s default formatting, but that can be a little plain and hard on the eyes. For instance, let’s take a look at summary statistics and a regression table for data on California schools.

CASchools <- read.csv("https://raw.githubusercontent.com/ejvanholm/DataProjects/master/CASchools.csv")
CASchools2 <- subset(CASchools, select=c("math", "income", "students", "english"))
summary(CASchools2)

##       math           income          students          english      
##  Min.   :605.4   Min.   : 5.335   Min.   :   81.0   Min.   : 0.000  
##  1st Qu.:639.4   1st Qu.:10.639   1st Qu.:  379.0   1st Qu.: 1.941  
##  Median :652.5   Median :13.728   Median :  950.5   Median : 8.778  
##  Mean   :653.3   Mean   :15.317   Mean   : 2628.8   Mean   :15.768  
##  3rd Qu.:665.9   3rd Qu.:17.629   3rd Qu.: 3008.0   3rd Qu.:22.970  
##  Max.   :709.5   Max.   :55.328   Max.   :27176.0   Max.   :85.540

summary(lm(math~income+students+english, data=CASchools))

## 
## Call:
## lm(formula = math ~ income + students + english, data = CASchools)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.804  -7.577   0.421   7.453  30.415 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.366e+02  1.590e+00 400.498   <2e-16 ***
## income       1.498e+00  8.262e-02  18.137   <2e-16 ***
## students     6.333e-05  1.553e-04   0.408    0.684    
## english     -4.060e-01  3.491e-02 -11.632   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.49 on 416 degrees of freedom
## Multiple R-squared:  0.6274, Adjusted R-squared:  0.6248 
## F-statistic: 233.5 on 3 and 416 DF,  p-value: < 2.2e-16

Not the most beautiful thing I’ve ever seen, but it’s okay and you’re probably getting used to it. It’s a lot of numbers and a lot of information, some of which you will probably use and some of which you will generally ignore. This chapter will walk you through producing that same general output using two R packages that are designed to make results a bit more presentable: pander and stargazer.

10.1 Pander

You’ll need to install the package pander and load it to use it below.

library(pander)
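If pander isn’t installed yet, a one-time install might look like the sketch below. The requireNamespace() check is an addition here rather than part of the original workflow; it just skips the install when the package is already available, and it assumes a CRAN mirror is reachable. The same pattern would work for stargazer later in the chapter.

```r
# Install pander only if it is missing, then load it.
# Assumption: a CRAN mirror is reachable from this machine.
if (!requireNamespace("pander", quietly = TRUE)) {
  install.packages("pander")
}
library(pander)
```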

Pander automatically reformats your output when you wrap it around another command. It’s always being used in addition to whatever command you were already running.

Here is how it looks for summary statistics.

pander(summary(CASchools2))

math	income	students	english
Min. :605.4	Min. : 5.335	Min. : 81.0	Min. : 0.000
1st Qu.:639.4	1st Qu.:10.639	1st Qu.: 379.0	1st Qu.: 1.941
Median :652.5	Median :13.728	Median : 950.5	Median : 8.778
Mean :653.3	Mean :15.317	Mean : 2628.8	Mean :15.768
3rd Qu.:665.9	3rd Qu.:17.629	3rd Qu.: 3008.0	3rd Qu.:22.970
Max. :709.5	Max. :55.328	Max. :27176.0	Max. :85.540

That adds space to make the numbers a bit clearer. We can also save whatever we’re doing as an object before using pander.

sum.stats <- summary(CASchools2)
pander(sum.stats)

math	income	students	english
Min. :605.4	Min. : 5.335	Min. : 81.0	Min. : 0.000
1st Qu.:639.4	1st Qu.:10.639	1st Qu.: 379.0	1st Qu.: 1.941
Median :652.5	Median :13.728	Median : 950.5	Median : 8.778
Mean :653.3	Mean :15.317	Mean : 2628.8	Mean :15.768
3rd Qu.:665.9	3rd Qu.:17.629	3rd Qu.: 3008.0	3rd Qu.:22.970
Max. :709.5	Max. :55.328	Max. :27176.0	Max. :85.540

Here it is for a regression.

pander(summary(lm(math~income+students+english, data=CASchools)))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	636.6	1.59	400.5	0
income	1.498	0.08262	18.14	1.374e-54
students	6.333e-05	0.0001553	0.4079	0.6836
english	-0.406	0.03491	-11.63	2.849e-27

Fitting linear model: math ~ income + students + english
Observations	Residual Std. Error	$R^2$	Adjusted $R^2$
420	11.49	0.6274	0.6248

It removes some of the metrics that were in the default output, but includes most of the key numbers we’d use in interpreting our model.

One benefit of pander is its simplicity. You just need to load it and give it your command or saved output to produce more attractive tables and figures. That’s particularly advantageous when you’re just including a quick look at your table, such as if you’re inserting a table to show what values are in your data.

table(CASchools$grades)

## 
## KK-06 KK-08 
##    61   359

pander(table(CASchools$grades))

KK-06	KK-08
61	359

But that simplicity means there is a lack of options to customize the output.

10.2 Stargazer

On the other hand, stargazer is loaded with options for customization. You’ll need to install and load it as a package as well.

library(stargazer)

You can find the reference manual for the stargazer package on CRAN, and we’ll walk through some of the more common customization options below.

The first thing you’ll want to do is set type="text". That’s not the default (the default produces LaTeX code), so you’ll want to set it each time. There are other options for type, such as "html", which you might use if you’re creating output to go on a website, but for copying something into a paper or a Markdown document, text is the simplest option.

If you’re producing summary statistics, you can just give stargazer a data frame with all the variables you want included, like the CASchools2 subset we created earlier.

stargazer(CASchools2, type="text")

## 
## ================================================================
## Statistic  N    Mean    St. Dev.   Min  Pctl(25) Pctl(75)  Max  
## ----------------------------------------------------------------
## math      420  653.343   18.754    605   639.4    665.8    710  
## income    420  15.317     7.226   5.335  10.639   17.629  55.328
## students  420 2,628.793 3,913.105  81     379     3,008   27,176
## english   420  15.768    18.286     0     1.9      23.0     86  
## ----------------------------------------------------------------
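For a web page, the same kind of call can produce HTML instead of text. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in for CASchools2, and the file location (a temporary file) is a placeholder. The out= option writes the table to a file in addition to printing it.

```r
library(stargazer)

# mtcars is a built-in data set, used here as a stand-in for CASchools2.
html_file <- tempfile(fileext = ".html")  # placeholder output location

# type = "html" produces an HTML table; out = also writes it to a file.
# stargazer returns the printed lines invisibly as a character vector.
html_table <- stargazer(mtcars[, c("mpg", "wt", "hp")],
                        type = "html", out = html_file)
```

Opening the saved file in a browser shows the formatted table, which can then be copied into a web page or a word processor.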

But we can also use stargazer to select the subset of variables we want included using the keep= option. We could also remove one or more variables with omit=, if that’s a quicker option.

stargazer(CASchools, type="text",
          keep=c("math", "income", "students", "english"))

## 
## ================================================================
## Statistic  N    Mean    St. Dev.   Min  Pctl(25) Pctl(75)  Max  
## ----------------------------------------------------------------
## students  420 2,628.793 3,913.105  81     379     3,008   27,176
## income    420  15.317     7.226   5.335  10.639   17.629  55.328
## english   420  15.768    18.286     0     1.9      23.0     86  
## math      420  653.343   18.754    605   639.4    665.8    710  
## ----------------------------------------------------------------
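The omit= version works the same way in reverse. Here is a minimal sketch, again using the built-in mtcars data as a stand-in so it runs without downloading anything. Stargazer treats the entries in keep= and omit= as regular expressions, so the ^ and $ anchors below are one way to match a column name exactly.

```r
library(stargazer)

# Drop the vs and am columns and summarize everything else.
# The ^...$ anchors keep the pattern from matching other column names.
tab_omit <- stargazer(mtcars, type = "text",
                      omit = c("^vs$", "^am$"))
```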

Stargazer adds summary statistics we don’t typically get, and we can select which ones to include as well. The summary.stat option lists the statistics to keep, and omit.summary.stat lists the ones to drop, depending on how we want to structure our list. I’ll remove the Min and Max below.

stargazer(CASchools, type="text",
          keep=c("math", "income", "students", "english"),
          omit.summary.stat = c("min", "max"))

## 
## ===================================================
## Statistic  N    Mean    St. Dev.  Pctl(25) Pctl(75)
## ---------------------------------------------------
## students  420 2,628.793 3,913.105   379     3,008  
## income    420  15.317     7.226    10.639   17.629 
## english   420  15.768    18.286     1.9      23.0  
## math      420  653.343   18.754    639.4    665.8  
## ---------------------------------------------------
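Going the other direction, summary.stat= names only the statistics to show. This is a sketch on the built-in mtcars data (a stand-in for the schools data), keeping just the count, mean, and standard deviation.

```r
library(stargazer)

# Show only N, the mean, and the standard deviation for three columns.
tab_stats <- stargazer(mtcars[, c("mpg", "wt", "hp")], type = "text",
                       summary.stat = c("n", "mean", "sd"))
```

Listing what to keep tends to be shorter when only a few statistics are wanted, while omit.summary.stat is shorter when only one or two need to go.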

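Of course, you’ll need to know what each stat is called, since stargazer will only recognize certain codes. The ones behind the default table above are "n", "mean", "sd", "min", "p25", "p75", and "max", and "median" is also available; the help guide (?stargazer) points to the full list.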

We can also change the names of the variables. Variable names are used by default, but they are often written to be short and identify only the basics of the data, and sometimes we want to offer a clearer explanation for our readers. We can change the labels with covariate.labels and a vector of new names, given in the same order the variables appear in the table.

stargazer(CASchools, type="text",
          keep=c("math", "income", "students", "english"),
          omit.summary.stat = c("min", "max"),
          covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers", "Math Test Scores"))

## 
## ==========================================================================
## Statistic                         N    Mean    St. Dev.  Pctl(25) Pctl(75)
## --------------------------------------------------------------------------
## Number of Students at School     420 2,628.793 3,913.105   379     3,008  
## Median Income of Parents         420  15.317     7.226    10.639   17.629 
## % of Non-Native English Speakers 420  15.768    18.286     1.9      23.0  
## Math Test Scores                 420  653.343   18.754    639.4    665.8  
## --------------------------------------------------------------------------

We can do a lot of the same things in building regression results using stargazer. However, we’ll need to give stargazer our saved model object to generate a table. It’s important that you don’t wrap the regression command in summary() before saving the object; in this case you just want to leave the lm() (or glm()) command alone.

We’ll start with the basic output before adding some customizations.

school.reg <- lm(math~income+students+english, data=CASchools)
stargazer(school.reg, type="text")

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                math            
## -----------------------------------------------
## income                       1.498***          
##                               (0.083)          
##                                                
## students                      0.0001           
##                              (0.0002)          
##                                                
## english                      -0.406***         
##                               (0.035)          
##                                                
## Constant                    636.628***         
##                               (1.590)          
##                                                
## -----------------------------------------------
## Observations                    420            
## R2                             0.627           
## Adjusted R2                    0.625           
## Residual Std. Error      11.488 (df = 416)     
## F Statistic          233.540*** (df = 3; 416)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Looks good. A bit more space, with the coefficients and model metrics a little more organized. Let’s change the names of the variables first. We can do that with the same option we used above. The labels are applied in the order the variables appear in the model (income, students, english), so they need to be listed in that order.

stargazer(school.reg, type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers") )

## 
## ============================================================
##                                      Dependent variable:    
##                                  ---------------------------
##                                             math            
## ------------------------------------------------------------
## Median Income of Parents                  1.498***          
##                                            (0.083)          
##                                                             
## Number of Students at School               0.0001           
##                                           (0.0002)          
##                                                             
## % of Non-Native English Speakers          -0.406***         
##                                            (0.035)          
##                                                             
## Constant                                 636.628***         
##                                            (1.590)          
##                                                             
## ------------------------------------------------------------
## Observations                                 420            
## R2                                          0.627           
## Adjusted R2                                 0.625           
## Residual Std. Error                   11.488 (df = 416)     
## F Statistic                       233.540*** (df = 3; 416)  
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

And let’s change the metrics we show at the bottom of the table with keep.stat (notice that this is a different option from the ones used for summary statistics above), which lines up well with the discussion of the different metrics we can include.

stargazer(school.reg, type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers"),
          keep.stat=c("n", "adj.rsq", "aic") )

## 
## ============================================================
##                                      Dependent variable:    
##                                  ---------------------------
##                                             math            
## ------------------------------------------------------------
## Median Income of Parents                  1.498***          
##                                            (0.083)          
##                                                             
## Number of Students at School               0.0001           
##                                           (0.0002)          
##                                                             
## % of Non-Native English Speakers          -0.406***         
##                                            (0.035)          
##                                                             
## Constant                                 636.628***         
##                                            (1.590)          
##                                                             
## ------------------------------------------------------------
## Observations                                 420            
## Adjusted R2                                 0.625           
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

I asked for the AIC to be included, but it wasn’t. That’s because stargazer doesn’t report it for this type of model (for some models it does, for others it doesn’t). In that case I can add it manually with add.lines=. It’s a bit more complicated, because add.lines can hold several rows, each with entries across the columns, so I need to tell R I’m making a list, and then set up each row as a separate group with c().

It’s also made tougher because I need to tell R what to calculate, so I’m first adding a label for the row, "AIC", then telling it to calculate the AIC to insert. I also have to round that number using round() so that it doesn’t show a long string of decimals. The 2 at the end tells R to keep at most two decimal places.
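To see what add.lines actually receives, it can help to build the row on its own first. This sketch uses only base R and a small model on the built-in mtcars data (a stand-in, not the schools model); the names fit, aic_row, and extra_rows are made up for the illustration.

```r
# A small stand-in model on built-in data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row for the table: a label first, then one value per model column.
# Combining text and a number with c() turns everything into text.
aic_row <- c("AIC", round(AIC(fit), 2))

# add.lines expects a list of such rows, even if there is only one.
extra_rows <- list(aic_row)
str(extra_rows)
```

Each additional row would be another c() inside the list, and each additional model would be another value inside the c().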

stargazer(school.reg, type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers"),
          keep.stat=c("n", "adj.rsq"),
          add.lines = list(c("AIC", round(AIC(school.reg),2))) )

## 
## ============================================================
##                                      Dependent variable:    
##                                  ---------------------------
##                                             math            
## ------------------------------------------------------------
## Median Income of Parents                  1.498***          
##                                            (0.083)          
##                                                             
## Number of Students at School               0.0001           
##                                           (0.0002)          
##                                                             
## % of Non-Native English Speakers          -0.406***         
##                                            (0.035)          
##                                                             
## Constant                                 636.628***         
##                                            (1.590)          
##                                                             
## ------------------------------------------------------------
## AIC                                        3248.6           
## Observations                                 420            
## Adjusted R2                                 0.625           
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

I’ll add a second line to try to make the structure of add.lines a little clearer.

stargazer(school.reg, type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers"),
          keep.stat=c("n", "adj.rsq"),
          add.lines = list(c("AIC", round(AIC(school.reg),2)),
                           c("Regression", "Yes")) )

## 
## ============================================================
##                                      Dependent variable:    
##                                  ---------------------------
##                                             math            
## ------------------------------------------------------------
## Median Income of Parents                  1.498***          
##                                            (0.083)          
##                                                             
## Number of Students at School               0.0001           
##                                           (0.0002)          
##                                                             
## % of Non-Native English Speakers          -0.406***         
##                                            (0.035)          
##                                                             
## Constant                                 636.628***         
##                                            (1.590)          
##                                                             
## ------------------------------------------------------------
## AIC                                        3248.6           
## Regression                                   Yes            
## Observations                                 420            
## Adjusted R2                                 0.625           
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

You can see all the model statistics that are available in the stargazer help guide (?stargazer).

One of the big advantages of using stargazer is the ability to include multiple regressions in a single table. Say, for instance, you’re looking at the effect of including different variables in your model. We can provide stargazer multiple regression objects.

school.reg1 <- lm(math~income+students+english, data=CASchools)
school.reg2 <- lm(math~income+students+english+expenditure, data=CASchools)
school.reg3 <- lm(math~income+students+english+expenditure+calworks, data=CASchools)

stargazer(school.reg1, school.reg2, school.reg3, type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers", "Total School Spending", "Percent Qualifying for CalWorks"),
          keep.stat=c("n", "adj.rsq"),
          add.lines = list(c("AIC", round(AIC(school.reg1),2), round(AIC(school.reg2),2), round(AIC(school.reg3),2))))

## 
## =================================================================
##                                        Dependent variable:       
##                                  --------------------------------
##                                                math              
##                                     (1)        (2)        (3)    
## -----------------------------------------------------------------
## Median Income of Parents          1.498***   1.553***   1.146*** 
##                                   (0.083)    (0.087)    (0.095)  
##                                                                  
## Number of Students at School       0.0001    0.00002     0.0001  
##                                   (0.0002)   (0.0002)   (0.0001) 
##                                                                  
## % of Non-Native English Speakers -0.406***  -0.400***  -0.361*** 
##                                   (0.035)    (0.035)    (0.033)  
##                                                                  
## Total School Spending                        -0.002*     0.0004  
##                                              (0.001)    (0.001)  
##                                                                  
## Percent Qualifying for CalWorks                        -0.462*** 
##                                                         (0.056)  
##                                                                  
## Constant                         636.628*** 645.349*** 645.144***
##                                   (1.590)    (4.842)    (4.497)  
##                                                                  
## -----------------------------------------------------------------
## AIC                                3248.6    3246.94    3185.84  
## Observations                        420        420        420    
## Adjusted R2                        0.625      0.627      0.678   
## =================================================================
## Note:                                 *p<0.1; **p<0.05; ***p<0.01
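When several models share a table, the column.labels option can name each column so readers don’t have to decode (1), (2), and (3). The sketch below uses two small models on the built-in mtcars data as stand-ins; the model names and the labels "Baseline" and "Adds horsepower" are made up for the illustration.

```r
library(stargazer)

# Two nested stand-in models on built-in data.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# column.labels adds a name above each model's column.
tab_models <- stargazer(m1, m2, type = "text",
                        column.labels = c("Baseline", "Adds horsepower"),
                        keep.stat = c("n", "adj.rsq"))
```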

You can also do that if you have multiple regressions with different dependent variables. You can change the labels at the top of the column to identify what the dependent variable is in each regression.

school.reg1 <- lm(math~income+students+english, data=CASchools)
school.reg2 <- lm(read~income+students+english, data=CASchools)

stargazer(school.reg1, school.reg2,  type="text",
          covariate.labels = c("Median Income of Parents", "Number of Students at School", "% of Non-Native English Speakers"),
          dep.var.labels = c("Math", "Reading"),
          keep.stat=c("n", "adj.rsq"),
          add.lines = list(c("AIC", round(AIC(school.reg1),2), round(AIC(school.reg2),2) )) )

## 
## =============================================================
##                                      Dependent variable:     
##                                  ----------------------------
##                                       Math         Reading   
##                                       (1)            (2)     
## -------------------------------------------------------------
## Median Income of Parents            1.498***      1.501***   
##                                     (0.083)        (0.074)   
##                                                              
## Number of Students at School         0.0001        -0.0001   
##                                     (0.0002)      (0.0001)   
##                                                              
## % of Non-Native English Speakers   -0.406***      -0.569***  
##                                     (0.035)        (0.031)   
##                                                              
## Constant                           636.628***    641.224***  
##                                     (1.590)        (1.431)   
##                                                              
## -------------------------------------------------------------
## AIC                                  3248.6        3160.46   
## Observations                          420            420     
## Adjusted R2                          0.625          0.735    
## =============================================================
## Note:                             *p<0.1; **p<0.05; ***p<0.01