Chapter 12 Reports
If you have ever written a report, you are probably familiar with the process of preparing your figures in some software, say R, and then copy-pasting them into your text editor, say MS Word. While very popular, this process is tedious, and plain painful if your data has changed and you need to update the report. Wouldn’t it be nice if you could produce figures and numbers from within the text of the report, and have everything else automated? It turns out it is possible. There are actually several systems in R that allow this. We start with a brief review.
Sweave: LaTeX is a markup language that compiles to TeX programs that compile, in turn, to documents (typically PS or PDFs). If you have never heard of it, it may be because you were born in the MS Windows+MS Word era. You should know, however, that LaTeX was there much earlier, when computers were mainframes with text-only graphic devices. You should also know that LaTeX is still very popular (in some communities) due to its very rich markup syntax, and beautiful output. Sweave (Leisch 2002) is a compiler for LaTeX that allows you to insert R commands in the LaTeX source file, and get the result as part of the output PDF. Its name suggests just that: it allows you to weave S output into the document, thus, Sweave.
knitr: Markdown is a text editing syntax that, unlike LaTeX, aims to be human-readable, but also compilable by a machine. If you have ever tried to read HTML or LaTeX source files, you may understand why human-readability is a desirable property. There are many markdown compilers. One of the most popular is Pandoc, written by the Berkeley philosopher(!) John MacFarlane. The availability of Pandoc gave Yihui Xie, a name to remember, the idea that it is time for Sweave to evolve. Yihui thus wrote knitr (Xie 2015), which allows you to write human-readable text in Rmarkdown, a superset of markdown, compile it with R, and then compile it with Pandoc. Because Pandoc can compile to PDF, but also to HTML, and DOCX, among others, this means that you can write in Rmarkdown, and get output in almost all text formats out there.
bookdown: Bookdown (Xie 2016) is an evolution of knitr, also written by Yihui Xie, now working for RStudio. The text you are now reading was actually written in bookdown. It deals with the particular needs of writing large documents, and cross referencing in particular (which is very challenging if you want the text to be human readable).
Shiny: Shiny is essentially a framework for quick web-development. It includes (i) an abstraction layer that specifies the layout of a web-site which is our report, (ii) the command to start a web server to deliver the site. For more on Shiny see Chang et al. (2017).
12.1 knitr
12.1.1 Installation
To run knitr you will need to install the package.
It is also recommended that you use it within RStudio (version>0.96), where you can easily create a new .Rmd file.
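The installation command itself is not shown above. A minimal sketch, assuming a CRAN mirror is reachable (the mirror URL is an assumption, any mirror will do) and that you also want the rmarkdown package, which RStudio uses for the final conversion:

```r
# Install knitr and rmarkdown only if they are not already available.
# The CRAN mirror URL is an assumption; replace it with your preferred mirror.
pkgs <- c("knitr", "rmarkdown")
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) {
  install.packages(pkgs[!is_installed], repos = "https://cloud.r-project.org")
}
```

Checking with requireNamespace first avoids re-downloading packages that are already present.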
12.1.2 Pandoc Markdown
Because knitr builds upon Pandoc markdown, here is a simple example of markdown text, to be used in a .Rmd file, which can be created using the File-> New File -> R Markdown menu of RStudio.
Underscores or asterisks for `_italics1_` and `*italics2*` return *italics1* and *italics2*.
Double underscores or asterisks for `__bold1__` and `**bold2**` return **bold1** and **bold2**.
Subscripts are enclosed in tildes, `like~this~` (rendered with "this" as a subscript), and superscripts are enclosed in carets, `like^this^` (rendered with "this" as a superscript).
For links use `[text](link)`, like `[my site](www.john-ros.com)`.
An image is the same as a link, starting with an exclamation mark, like this: `![image caption](image path)`.
An itemized list simply starts with hyphens preceded by a blank line (don’t forget that!). Second level items are indented:

- bullet
- bullet
    - second level bullet
    - second level bullet

Compiles into:
- bullet
- bullet
    - second level bullet
    - second level bullet
An enumerated list starts with an arbitrary number:

1. number
1. number
    1. second level number
    1. second level number

Compiles into:
1. number
2. number
    1. second level number
    2. second level number
For more on markdown see https://bookdown.org/yihui/bookdown/markdown-syntax.html.
12.1.3 Rmarkdown
Rmarkdown is an extension of markdown, due to RStudio, that allows you to incorporate R expressions in the text. These are evaluated at the time of compilation, and their output is automatically inserted in the output document.
The output can be a .PDF, .DOCX, .HTML or others, thanks to the power of Pandoc.
A code chunk starts with three backticks, followed by the chunk header in curly braces, and ends with three backticks. Here is an example.
```{r eval=FALSE}
rnorm(10)
```
This chunk will compile to the following output (after setting `eval=FALSE` to `eval=TRUE`):
## [1] -1.191659233 -0.008735042 -0.251447395 0.294509886 1.545818175
## [6] 0.076847920 -0.279008173 1.056446506 0.103559259 1.823610224
Things to note:
- The evaluated expression is added in a chunk of highlighted text, before the R output.
- The output is prefixed with `##`.
- The `eval=` argument is not required, since it is set to `eval=TRUE` by default. It does demonstrate how to set the options of the code chunk.
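To see these points without creating a file, a chunk can be compiled from a character vector. This is only a sketch: it assumes the knitr package is installed, relies on the `text` argument of `knitr::knit`, and uses `1 + 1` instead of `rnorm(10)` so that the result is reproducible.

```r
library(knitr)

# A one-chunk Rmarkdown document, one element per line of the document.
rmd <- c("```{r}", "1 + 1", "```")

# When `text` is supplied, knit() returns the compiled markdown as a string.
md <- knit(text = rmd, quiet = TRUE)

# The echoed code appears first, followed by the output prefixed with ##.
cat(md)
```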
In the same way, we may add a plot:
```{r eval=FALSE}
plot(rnorm(10))
```
which compiles into
Some useful code chunk options include:
- `eval=FALSE`: to return code only, without output.
- `echo=FALSE`: to return output, without code.
- `cache=`: to save results so that future compilations are faster.
- `results='hide'`: to plot figures, without text output.
- `collapse=TRUE`: to merge the code and its text output into a single block, instead of separate blocks.
- `warning=FALSE`: to suppress warnings. The same for `message=FALSE`, which suppresses messages. Note that `error=FALSE` does not hide errors: it stops the compilation at the first error.
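Options that should apply to every chunk need not be repeated in each chunk header. A minimal sketch, assuming knitr is installed, that uses `knitr::opts_chunk` to set document-wide defaults (typically placed in a first "setup" chunk; the particular options chosen here are illustrative):

```r
library(knitr)

# Set defaults for all subsequent chunks.
# An individual chunk header can still override any of them.
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

# Inspect the current default of a single option.
opts_chunk$get("echo")
```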
You can also call R expressions inline. This is done with a single backtick and the `r` argument. For instance:
`r rnorm(1)` is a random Gaussian
will output
-1.6490174 is a random Gaussian.
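The same mechanism can be tried from the console. The sketch below assumes knitr is installed; the variable `x` and the sentence are made up for illustration, and a deterministic expression replaces `rnorm(1)` so that the result does not change between runs.

```r
library(knitr)

# A hidden chunk defines x; the sentence then refers to it inline.
rmd <- c(
  "```{r, echo=FALSE}",
  "x <- c(3, 1, 4)",
  "```",
  "",
  "The sample has `r length(x)` observations."
)

# The inline expression is replaced by its value in the compiled markdown.
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```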
12.1.4 BibTeX
BibTeX is both a file format and a compiler. The BibTeX compiler links documents to a reference database stored in the .bib file format.
BibTeX is typically associated with TeX and LaTeX typesetting, but the .bib format also works within the markdown pipeline.
Just store your references in a .bib file, add a `bibliography: yourFile.bib` line in the YAML preamble of your Rmarkdown file, and call your references from the Rmarkdown text using `@referencekey`.
Rmarkdown will take care of creating the bibliography, and linking to it from the text.
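To get a feel for the .bib format, knitr can generate entries for R packages. This is a sketch under assumptions: knitr is installed, and the file name `refs.bib` and the choice of the base package are purely illustrative.

```r
library(knitr)

# Write a BibTeX entry for the base package to a temporary .bib file.
bib_file <- file.path(tempdir(), "refs.bib")
write_bib("base", file = bib_file)

# Show the generated entry. Its key, R-base, would be cited as @R-base.
bib <- readLines(bib_file)
cat(bib, sep = "\n")

# The matching line in the YAML preamble of the .Rmd file would read:
#   bibliography: refs.bib
```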
12.1.5 Compiling
Once you have your .Rmd file written in RMarkdown, knitr will take care of the compilation for you.
You can call the `knitr::knit` function directly from some .R file (or `rmarkdown::render`, which also runs Pandoc), or more conveniently, use the RStudio (0.96) Knit button above the text editing window.
The location of the output file will be presented in the console.
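A minimal sketch of compiling from a script, assuming knitr is installed. The file names are hypothetical, and the files are written to a temporary directory so that nothing in your project is touched.

```r
library(knitr)

# Write a tiny .Rmd file to a temporary directory (names are illustrative).
rmd_file <- file.path(tempdir(), "report.Rmd")
md_file  <- file.path(tempdir(), "report.md")
writeLines(c("# My report", "", "```{r}", "summary(c(1, 2, 3))", "```"), rmd_file)

# knit() runs the R chunks, writes plain markdown, and returns the output path.
out <- knit(rmd_file, output = md_file, quiet = TRUE)
print(out)

# To go all the way to HTML, PDF or DOCX, rmarkdown::render(rmd_file) would
# additionally call Pandoc (this needs the rmarkdown package and Pandoc).
```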
12.2 bookdown
As previously stated, bookdown is an extension of knitr intended for documents more complicated than simple reports, such as books. Just like knitr, the writing is done in RMarkdown. Being an extension of knitr, bookdown allows some markdown constructs that are not supported by other compilers. In particular, it has a more powerful cross referencing system.
12.3 Shiny
Shiny (Chang et al. 2017) is different from the previous systems, because it sets up an interactive web-site, and not a static file. The power of Shiny is that the layout of the web-site, and the settings of the web-server, are made with several simple R commands, with no need for web-programming. Once you have your app up and running, you can set up your own Shiny server on the web, or publish it via Shinyapps.io. The freemium versions of the service can deal with a small amount of traffic. If you expect a lot of traffic, you will probably need the paid versions.
12.3.1 Installation
To set up your first Shiny app, you will need the shiny package. You will probably want RStudio, which facilitates the process.
Once installed, you can run an example app to get the feel of it.
Remember to press the Stop button in RStudio to stop the web-server, and get back to RStudio.
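The command for running an example app is not shown above. A sketch, assuming the shiny package is installed: called without arguments, `runExample` only reports the names of the bundled examples, while naming one of them starts the web server.

```r
library(shiny)

# Without arguments, runExample() does not start a server;
# it only reports the names of the examples shipped with shiny.
runExample()

# Launching an example blocks the R session until the app is stopped,
# so it is only done in an interactive session.
if (interactive()) {
  runExample("01_hello")
}
```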
12.3.2 The Basics of Shiny
Every Shiny app has two main building blocks.
- A user interface, specified via the `ui.R` file in the app’s directory.
- A server side, specified via the `server.R` file, in the app’s directory.

You can run the app via the RunApp button in the RStudio interface, or by calling `runApp` on the app’s directory. Alternatively, `shinyApp` builds an app directly from a UI object and a server function, which is convenient for apps kept in a single file.
The site’s layout is specified in the `ui.R` file using one of the layout functions.
For instance, the function `sidebarLayout`, as the name suggests, will create a sidebar.
More layouts are detailed in the layout guide.
The active elements in the UI, that control your report, are known as widgets.
Each widget will have a unique `inputId` so that its values can be sent from the UI to the server.
More about widgets, in the widget gallery.
The `inputId`s on the UI are mapped to `input` arguments on the server side.
The value of the `mytext` `inputId` can be queried by the server using `input$mytext`.
These are called reactive values.
The way the server “listens” to the UI is governed by a set of functions that must wrap the `input` object.
These are the `observe`, `reactive`, and `reactive*` class of functions.
With `observe` the server will get triggered when any of the reactive values it reads change.
With `observeEvent` the server will only be triggered by specified reactive values.
Using `observe` is easier, and `observeEvent` is more prudent programming.
A `reactive` function is a function that gets re-evaluated when a reactive element it depends on changes.
It is defined on the server side, and is called from within an observer or a render function.
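The three kinds of listeners can be sketched in a toy app. This is an illustration only, assuming the shiny package is installed; the input names `mytext` and `go`, the output name `shout`, and the messages are hypothetical.

```r
library(shiny)

# Hypothetical UI: a text box, a button, and a text output.
ui <- fluidPage(
  textInput(inputId = "mytext", label = "Type something:"),
  actionButton(inputId = "go", label = "Go"),
  textOutput(outputId = "shout")
)

server <- function(input, output) {
  # reactive(): re-evaluated lazily after input$mytext changes.
  shout <- reactive(toupper(input$mytext))

  # observe(): re-runs whenever any reactive value it reads changes.
  observe(message("mytext is now: ", input$mytext))

  # observeEvent(): re-runs only when input$go changes (the button is pressed).
  observeEvent(input$go, message("Button pressed; text is: ", shout()))

  # Render functions are also reactive consumers.
  output$shout <- renderText(shout())
}

app <- shinyApp(ui = ui, server = server)

# Running the app starts a web server, so only do it interactively.
if (interactive()) runApp(app)
```

Typing in the text box triggers the `observe` message on every change, whereas the `observeEvent` message appears only when the button is pressed.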
We now analyze the `01_hello` app using these ideas.
Here is the `ui.R` file.
    library(shiny)

    shinyUI(fluidPage(
      titlePanel("Hello Shiny!"),
      sidebarLayout(
        sidebarPanel(
          sliderInput(inputId = "bins",
                      label = "Number of bins:",
                      min = 1,
                      max = 50,
                      value = 30)
        ),
        mainPanel(
          plotOutput(outputId = "distPlot")
        )
      )
    ))
Here is the `server.R` file:

    library(shiny)

    shinyServer(function(input, output) {
      output$distPlot <- renderPlot({
        x <- faithful[, 2] # Old Faithful Geyser data
        bins <- seq(min(x), max(x), length.out = input$bins + 1)
        hist(x, breaks = bins, col = 'darkgray', border = 'white')
      })
    })
Things to note:
- `shinyUI` is a (deprecated) wrapper for the UI.
- `fluidPage` ensures that the proportions of the elements adapt to the window size, thus, are fluid.
- The building blocks of the layout are a title, and the body. The title is governed by `titlePanel`, and the body is governed by `sidebarLayout`. The `sidebarLayout` includes the `sidebarPanel` to control the sidebar, and the `mainPanel` for the main panel.
- `sliderInput` calls a widget with a slider. Its `inputId` is `bins`, which is later used by the server within the `renderPlot` reactive function.
- `plotOutput` specifies that the content of the `mainPanel` is a plot (`textOutput` for text). This expectation is satisfied on the server side with the `renderPlot` function (`renderText`).
- `shinyServer` is a (deprecated) wrapper function for the server.
- The server runs a function with an `input` and an `output`. The elements of `input` are the `inputId`s from the UI. The elements of the `output` will be called by the UI using their `outputId`.
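Since `shinyUI` and `shinyServer` are deprecated, the same app can be written without them. The following is a sketch of one possible single-file layout, assuming the shiny package is installed; the object names `ui`, `server` and `app` are a common convention, not a requirement.

```r
library(shiny)

# The UI is the fluidPage object itself; no shinyUI() wrapper is needed.
ui <- fluidPage(
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    sidebarPanel(
      sliderInput(inputId = "bins", label = "Number of bins:",
                  min = 1, max = 50, value = 30)
    ),
    mainPanel(
      plotOutput(outputId = "distPlot")
    )
  )
)

# The server is a plain function; no shinyServer() wrapper is needed.
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful[, 2]  # Old Faithful Geyser data
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = bins, col = "darkgray", border = "white")
  })
}

# shinyApp() ties the two together into an app object.
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```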
This is the output.
Here is another example, taken from the RStudio Shiny examples.
`ui.R`:

    library(shiny)

    fluidPage(
      titlePanel("Tabsets"),
      sidebarLayout(
        sidebarPanel(
          radioButtons(inputId = "dist",
                       label = "Distribution type:",
                       c("Normal" = "norm",
                         "Uniform" = "unif",
                         "Log-normal" = "lnorm",
                         "Exponential" = "exp")),
          br(), # add a break in the HTML page.
          sliderInput(inputId = "n",
                      label = "Number of observations:",
                      value = 500,
                      min = 1,
                      max = 1000)
        ),
        mainPanel(
          tabsetPanel(type = "tabs",
            tabPanel(title = "Plot", plotOutput(outputId = "plot")),
            tabPanel(title = "Summary", verbatimTextOutput(outputId = "summary")),
            tabPanel(title = "Table", tableOutput(outputId = "table"))
          )
        )
      )
    )
`server.R`:

    library(shiny)

    # Define server logic for random distribution application
    function(input, output) {
      data <- reactive({
        dist <- switch(input$dist,
                       norm = rnorm,
                       unif = runif,
                       lnorm = rlnorm,
                       exp = rexp,
                       rnorm)
        dist(input$n)
      })

      output$plot <- renderPlot({
        dist <- input$dist
        n <- input$n
        hist(data(), main=paste('r', dist, '(', n, ')', sep=''))
      })

      output$summary <- renderPrint({
        summary(data())
      })

      output$table <- renderTable({
        data.frame(x=data())
      })
    }
Things to note:
- We reused the `sidebarLayout`.
- As the name suggests, `radioButtons` is a widget that produces radio buttons, above the `sliderInput` widget. Note the different `inputId`s.
- Different widgets are separated in `sidebarPanel` by commas.
- `br()` produces extra vertical spacing (break).
- `tabsetPanel` produces tabs in the main output panel. `tabPanel` governs the content of each panel. Notice the use of various output functions (`plotOutput`, `verbatimTextOutput`, `tableOutput`) with corresponding `outputId`s.
- In `server.R` we see the usual `function(input, output)`.
- The `reactive` function tells the server to trigger the function whenever `input` changes.
- The `output` object is constructed outside the `reactive` function. See how the elements of `output` correspond to the `outputId`s in the UI.
This is the output:
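The `switch` call inside the `reactive` block of `server.R` can be studied outside Shiny. The sketch below is plain R; the function name `draw` is hypothetical, and the last, unnamed argument of `switch` plays the role of the fallback, just as `rnorm` does in the app.

```r
# Pick a random number generator by name. The last, unnamed argument of
# switch() is the fallback used when no name matches.
draw <- function(dist, n) {
  sampler <- switch(dist,
                    norm = rnorm,
                    unif = runif,
                    lnorm = rlnorm,
                    exp = rexp,
                    rnorm)
  sampler(n)
}

set.seed(1)
x <- draw("unif", 5)   # five draws from the Uniform(0, 1) distribution
y <- draw("other", 3)  # unknown name: falls back to rnorm
```

In the app, wrapping this logic in `reactive` means the draw is shared by the plot, the summary and the table, instead of being re-sampled separately for each tab.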
12.3.3 Beyond the Basics
Now that we have seen the basics, we may consider extensions to the basic report.
12.3.3.1 Widgets
- `actionButton`: Action Button.
- `checkboxGroupInput`: A group of check boxes.
- `checkboxInput`: A single check box.
- `dateInput`: A calendar to aid date selection.
- `dateRangeInput`: A pair of calendars for selecting a date range.
- `fileInput`: A file upload control wizard.
- `helpText`: Help text that can be added to an input form.
- `numericInput`: A field to enter numbers.
- `radioButtons`: A set of radio buttons.
- `selectInput`: A box with choices to select from.
- `sliderInput`: A slider bar.
- `submitButton`: A submit button.
- `textInput`: A field to enter text.
See examples here.
12.3.3.2 Output Elements
The `ui.R` output types.
- `htmlOutput`: raw HTML.
- `imageOutput`: image.
- `plotOutput`: plot.
- `tableOutput`: table.
- `textOutput`: text.
- `uiOutput`: raw HTML.
- `verbatimTextOutput`: text.

The corresponding `server.R` renderers.
- `renderImage`: images (saved as a link to a source file).
- `renderPlot`: plots.
- `renderPrint`: any printed output.
- `renderTable`: data frame, matrix, other table like structures.
- `renderText`: character strings.
- `renderUI`: a Shiny tag object or HTML.
Your Shiny app can use any R object. The things to remember:
- The working directory of the app is the location of `server.R`.
- The code before `shinyServer` is run only once.
- The code inside `shinyServer` is run whenever a reactive is triggered, and may thus slow things.

To keep learning, see the RStudio’s tutorial, and the Bibliographic notes herein.
12.3.4 shinydashboard
A template for Shiny to give it a modern look.
12.4 flexdashboard
If you want to quickly write an interactive dashboard, which is simple enough to be a static HTML file and does not need an HTML server, then Shiny may be an overkill. With flexdashboard you can write your dashboard in a single .Rmd file, which will generate an interactive dashboard as a static HTML file. See http://rmarkdown.rstudio.com/flexdashboard/ for more info.
12.5 Bibliographic Notes
For RMarkdown see here. For everything on knitr see Yihui’s blog, or the book Xie (2015). For a bookdown manual, see Xie (2016). For a Shiny manual, see Chang et al. (2017), the RStudio tutorial, or Hadley’s Book. For compounding Plotly’s interactive graphics, with Shiny sites, see here. Video tutorials are available here.
12.6 Practice Yourself
- Generate a report using knitr with your name as title, and a scatter plot of two random variables in the body. Save it as PDF, DOCX, and HTML.
- Recall that this book is written in bookdown, which is a superset of knitr. Go to the source .Rmd file of the first chapter, and parse it in your head: (https://raw.githubusercontent.com/johnros/Rcourse/master/02-r-basics.Rmd)
Note on Sweave: recall, S was the original software from which R evolved.
References
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2017. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Leisch, Friedrich. 2002. “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In Compstat, 575–80. Springer.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. Vol. 29. CRC Press.
Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. CRC Press.