Chapter 10 Attractive Output
Once you’ve gotten your data into the shape you want to be ready for analysis, you’re going to want to output the results. This chapter won’t walk you through how to do the analysis, it’ll just focus on the presentation of those results. You can use the default formatting that R uses, but that can be a little plain and difficult on the eyes. For instance, let’s take a look at summary statistics and a regression table for data on California schools.
CASchools <- read.csv("https://raw.githubusercontent.com/ejvanholm/DataProjects/master/CASchools.csv")
CASchools2 <- subset(CASchools, select=c("math", "income", "students", "english"))
summary(CASchools2)
## math income students english
## Min. :605.4 Min. : 5.335 Min. : 81.0 Min. : 0.000
## 1st Qu.:639.4 1st Qu.:10.639 1st Qu.: 379.0 1st Qu.: 1.941
## Median :652.5 Median :13.728 Median : 950.5 Median : 8.778
## Mean :653.3 Mean :15.317 Mean : 2628.8 Mean :15.768
## 3rd Qu.:665.9 3rd Qu.:17.629 3rd Qu.: 3008.0 3rd Qu.:22.970
## Max. :709.5 Max. :55.328 Max. :27176.0 Max. :85.540
##
## Call:
## lm(formula = math ~ income + students + english, data = CASchools)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.804 -7.577 0.421 7.453 30.415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.366e+02 1.590e+00 400.498 <2e-16 ***
## income 1.498e+00 8.262e-02 18.137 <2e-16 ***
## students 6.333e-05 1.553e-04 0.408 0.684
## english -4.060e-01 3.491e-02 -11.632 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.49 on 416 degrees of freedom
## Multiple R-squared: 0.6274, Adjusted R-squared: 0.6248
## F-statistic: 233.5 on 3 and 416 DF, p-value: < 2.2e-16
Not the most beautiful thing I’ve ever seen, but it’s okay and you’re probably getting used to it. It’s a lot of numbers, and a lot of information. Some of which you will probably use, other parts of which will be ignored in general. This chapter will walk you through producing that same general output using two packages in R that are designed to make results and output be a bit more presentable: pander and stargazer.
10.1 Pander
You’ll need to install the package pander and load it to use it below.
Pander automatically reformats your output when you wrap it around another command. It’s always being used in addition to whatever command you were already running.
Here is how it looks for summary statistics.
math | income | students | english |
---|---|---|---|
Min. :605.4 | Min. : 5.335 | Min. : 81.0 | Min. : 0.000 |
1st Qu.:639.4 | 1st Qu.:10.639 | 1st Qu.: 379.0 | 1st Qu.: 1.941 |
Median :652.5 | Median :13.728 | Median : 950.5 | Median : 8.778 |
Mean :653.3 | Mean :15.317 | Mean : 2628.8 | Mean :15.768 |
3rd Qu.:665.9 | 3rd Qu.:17.629 | 3rd Qu.: 3008.0 | 3rd Qu.:22.970 |
Max. :709.5 | Max. :55.328 | Max. :27176.0 | Max. :85.540 |
That adds space to make the numbers a bit clearer. We can also save whatever we’re doing as an object before using pander.
math | income | students | english |
---|---|---|---|
Min. :605.4 | Min. : 5.335 | Min. : 81.0 | Min. : 0.000 |
1st Qu.:639.4 | 1st Qu.:10.639 | 1st Qu.: 379.0 | 1st Qu.: 1.941 |
Median :652.5 | Median :13.728 | Median : 950.5 | Median : 8.778 |
Mean :653.3 | Mean :15.317 | Mean : 2628.8 | Mean :15.768 |
3rd Qu.:665.9 | 3rd Qu.:17.629 | 3rd Qu.: 3008.0 | 3rd Qu.:22.970 |
Max. :709.5 | Max. :55.328 | Max. :27176.0 | Max. :85.540 |
Here it is for a regression.
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
(Intercept) | 636.6 | 1.59 | 400.5 | 0 |
income | 1.498 | 0.08262 | 18.14 | 1.374e-54 |
students | 6.333e-05 | 0.0001553 | 0.4079 | 0.6836 |
english | -0.406 | 0.03491 | -11.63 | 2.849e-27 |
Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
---|---|---|---|
420 | 11.49 | 0.6274 | 0.6248 |
It removes some of the metrics that were in the default output, but includes most of the key numbers we’d use in interpreting our model.
One benefit of pander is its simplicity. You just need to load it and give it your command or saved output to produce more attractive tables and figures. That’s particularly advantageous when you’re just including a quick look at your table, such as if you’re inserting a table to show what values are in your data.
##
## KK-06 KK-08
## 61 359
KK-06 | KK-08 |
---|---|
61 | 359 |
But that simplicity means there is a lack of options to customize the output.
10.2 Stargazer
On the other hand, stargazer is loaded with options for customization. You’ll need to install and load it as a package as well.
You can find the reference manual for the stargazer package here, and we’ll walk through some of the more common customization options below.
The first thing you’ll want to do is set type=“text”. That’s not the default, so you’ll want to do it each time. There are other options for type which you might use if you’re creating the output to go on a website, but for using stargazer to copy something into a paper or a markdown text is the simplest option.
If you’re producing summary statistics, you can just give stargazer a data frame with all the variables you want included, like we did with CASchools.
##
## ================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## ----------------------------------------------------------------
## math 420 653.343 18.754 605 639.4 665.8 710
## income 420 15.317 7.226 5.335 10.639 17.629 55.328
## students 420 2,628.793 3,913.105 81 379 3,008 27,176
## english 420 15.768 18.286 0 1.9 23.0 86
## ----------------------------------------------------------------
But we can also use stargazer to select the subset of variables we want included using the keep= option. We could also remove one or more variables with omit=, if that’s a quicker option.
##
## ================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## ----------------------------------------------------------------
## students 420 2,628.793 3,913.105 81 379 3,008 27,176
## income 420 15.317 7.226 5.335 10.639 17.629 55.328
## english 420 15.768 18.286 0 1.9 23.0 86
## math 420 653.343 18.754 605 639.4 665.8 710
## ----------------------------------------------------------------
Stargazer adds summary statistics we don’t typically get, and we can select which ones to include as well. There are options for keep. summary .stat or omit.summary.stat depending on how we want to structure our list. I’ll remove the Min and Max below.
stargazer(CASchools, type="text",
keep=c("math", "income", "students", "english"),
omit.summary.stat = c("min", "max"))
##
## ===================================================
## Statistic N Mean St. Dev. Pctl(25) Pctl(75)
## ---------------------------------------------------
## students 420 2,628.793 3,913.105 379 3,008
## income 420 15.317 7.226 10.639 17.629
## english 420 15.768 18.286 1.9 23.0
## math 420 653.343 18.754 639.4 665.8
## ---------------------------------------------------
Of course, you’ll need to know what each stat is called since stargazer will only recognize certain items. The list is copied from the help guide below.
We can also change the names of the variables. Variable names are the default, but the are often written to be short and identify the basics of the data, but sometimes we want to offer a clearer explanation for our readers. We can change the labels with covariate.labels and a list.
stargazer(CASchools, type="text",
keep=c("math", "income", "students", "english"),
omit.summary.stat = c("min", "max"),
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers", "Math Test Scores"))
##
## ==========================================================================
## Statistic N Mean St. Dev. Pctl(25) Pctl(75)
## --------------------------------------------------------------------------
## Number of Students at School 420 2,628.793 3,913.105 379 3,008
## Median Income of Parents 420 15.317 7.226 10.639 17.629
## % of Non-Native English Speakers 420 15.768 18.286 1.9 23.0
## Math Test Scores 420 653.343 18.754 639.4 665.8
## --------------------------------------------------------------------------
We can do a lot of the same things in building regression results using stargazer. However, we’ll need to give stargazer our saved output to generate a table. It’s important that you don’t wrap the regression command in summary before saving the object, in this case you just want to leave the lm() (or glm()) command alone.
We’ll start with the basic output before adding some customizations.
##
## ===============================================
## Dependent variable:
## ---------------------------
## math
## -----------------------------------------------
## income 1.498***
## (0.083)
##
## students 0.0001
## (0.0002)
##
## english -0.406***
## (0.035)
##
## Constant 636.628***
## (1.590)
##
## -----------------------------------------------
## Observations 420
## R2 0.627
## Adjusted R2 0.625
## Residual Std. Error 11.488 (df = 416)
## F Statistic 233.540*** (df = 3; 416)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Looks good. A bit more space, with the coefficients and model metrics a little more organized. Let’s change the name of the variables first. We can do that with the same option as we did above.
stargazer(school.reg, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers") )
##
## ============================================================
## Dependent variable:
## ---------------------------
## math
## ------------------------------------------------------------
## Number of Students at School 1.498***
## (0.083)
##
## Median Income of Parents 0.0001
## (0.0002)
##
## % of Non-Native English Speakers -0.406***
## (0.035)
##
## Constant 636.628***
## (1.590)
##
## ------------------------------------------------------------
## Observations 420
## R2 0.627
## Adjusted R2 0.625
## Residual Std. Error 11.488 (df = 416)
## F Statistic 233.540*** (df = 3; 416)
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
And let’s change the metrics we show at the bottom of the table with keep.stat (notice that it isn’t keep.summary.stat here), which would line up well with the discussion hereabout different metrics we can include.
stargazer(school.reg, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers"),
keep.stat=c("n", "adj.rsq", "aic") )
##
## ============================================================
## Dependent variable:
## ---------------------------
## math
## ------------------------------------------------------------
## Number of Students at School 1.498***
## (0.083)
##
## Median Income of Parents 0.0001
## (0.0002)
##
## % of Non-Native English Speakers -0.406***
## (0.035)
##
## Constant 636.628***
## (1.590)
##
## ------------------------------------------------------------
## Observations 420
## Adjusted R2 0.625
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
I asked for the AIC to be included, but it wasn’t. That’s because it wasn’t automatically calculated for R for this model (some models it is, others not.) In that case I can add it manually by adding a line with add.lines=. It’s a bit more complicated, because with add.lines I want to add multiple entries across the columns, so I need to tell R I’m making a list, and then set up each set of entries as a separate group with c().
It’s also made tougher because I need to tell R what to calculate to include, so I’m first adding a label for the row “AIC”, then telling it to calculate the AIC to insert. I also have to tell it to round that number using round() so that it doesn’t show a 9-digit decimal. The 2 at the end tells R to put two decimal points at most.
stargazer(school.reg, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers"),
keep.stat=c("n", "adj.rsq"),
add.lines = list(c("AIC", round(AIC(school.reg),2))) )
##
## ============================================================
## Dependent variable:
## ---------------------------
## math
## ------------------------------------------------------------
## Number of Students at School 1.498***
## (0.083)
##
## Median Income of Parents 0.0001
## (0.0002)
##
## % of Non-Native English Speakers -0.406***
## (0.035)
##
## Constant 636.628***
## (1.590)
##
## ------------------------------------------------------------
## AIC 3248.6
## Observations 420
## Adjusted R2 0.625
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
I’ll add a second note to try and make that a little clearer.
stargazer(school.reg, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers"),
keep.stat=c("n", "adj.rsq"),
add.lines = list(c("AIC", round(AIC(school.reg),2)),
c("Regression", "Yes"))
)
##
## ============================================================
## Dependent variable:
## ---------------------------
## math
## ------------------------------------------------------------
## Number of Students at School 1.498***
## (0.083)
##
## Median Income of Parents 0.0001
## (0.0002)
##
## % of Non-Native English Speakers -0.406***
## (0.035)
##
## Constant 636.628***
## (1.590)
##
## ------------------------------------------------------------
## AIC 3248.6
## Regression Yes
## Observations 420
## Adjusted R2 0.625
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
You can see all the statistics related to models that are available below.
One of the big advantageous of using stargazer is the ability to include multiple regressions in a single table. Say, for instance, you’re looking at the effect of including difference variables in your model. We can provide stargazer multiple regression objects.
school.reg1 <- lm(math~income+students+english, data=CASchools)
school.reg2 <- lm(math~income+students+english+expenditure, data=CASchools)
school.reg3 <- lm(math~income+students+english+expenditure+calworks, data=CASchools)
stargazer(school.reg1, school.reg2, school.reg3, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers", "Total School Spending", "Percent Qualifying for CalWorks "),
keep.stat=c("n", "adj.rsq"),
add.lines = list(c("AIC", round(AIC(school.reg1),2), round(AIC(school.reg2),2), round(AIC(school.reg3),2))))
##
## =================================================================
## Dependent variable:
## --------------------------------
## math
## (1) (2) (3)
## -----------------------------------------------------------------
## Number of Students at School 1.498*** 1.553*** 1.146***
## (0.083) (0.087) (0.095)
##
## Median Income of Parents 0.0001 0.00002 0.0001
## (0.0002) (0.0002) (0.0001)
##
## % of Non-Native English Speakers -0.406*** -0.400*** -0.361***
## (0.035) (0.035) (0.033)
##
## Total School Spending -0.002* 0.0004
## (0.001) (0.001)
##
## Percent Qualifying for CalWorks -0.462***
## (0.056)
##
## Constant 636.628*** 645.349*** 645.144***
## (1.590) (4.842) (4.497)
##
## -----------------------------------------------------------------
## AIC 3248.6 3246.94 3185.84
## Observations 420 420 420
## Adjusted R2 0.625 0.627 0.678
## =================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
You can also do that if you have multiple regressions with different dependent variables. You can change the labels at the top of the column to identify what the dependent variable is in each regression.
school.reg1 <- lm(math~income+students+english, data=CASchools)
school.reg2 <- lm(read~income+students+english, data=CASchools)
stargazer(school.reg1, school.reg2, type="text",
covariate.labels = c("Number of Students at School", "Median Income of Parents", "% of Non-Native English Speakers", "Total School Spending", "Percent Qualifying for CalWorks "),
dep.var.labels = c("Math", "Reading"),
keep.stat=c("n", "adj.rsq"),
add.lines = list(c("AIC", round(AIC(school.reg1),2), round(AIC(school.reg2),2) )) )
##
## =============================================================
## Dependent variable:
## ----------------------------
## Math Reading
## (1) (2)
## -------------------------------------------------------------
## Number of Students at School 1.498*** 1.501***
## (0.083) (0.074)
##
## Median Income of Parents 0.0001 -0.0001
## (0.0002) (0.0001)
##
## % of Non-Native English Speakers -0.406*** -0.569***
## (0.035) (0.031)
##
## Total School Spending 636.628*** 641.224***
## (1.590) (1.431)
##
## -------------------------------------------------------------
## AIC 3248.6 3160.46
## Observations 420 420
## Adjusted R2 0.625 0.735
## =============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01