Chapter 2 Visualizing data
Apart from doing statistics and running simulations, the power to create clear and beautiful visualizations is one of the main reasons for using R. From the very start, R was conceived not just as a powerful tool for statistics, but as a language that allows for all kinds of graphical expression. As a side effect of this dual identity, its flexibility, and its long history, R now offers a variety of packages and functions for creating all kinds of graphs.
This chapter revolves around the ggplot2 package (Wickham, Chang, et al., 2020), which is widely used for creating scientific visualizations.[21]
Introducing visualizations with ggplot() at such an early point of your R career is unusual. We do this here in the interest of rapid prototyping and hope to excite you about the potential of R by gaining quick visual insights into data.
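As a small taste of such quick insights, the following is a minimal sketch rather than a recommended recipe. It assumes that the ggplot2 package is installed and uses mpg, an example dataset that ships with ggplot2. The choice of variables (displ and hwy) and the object name p_quick are only illustrative.

```r
# Assumption: ggplot2 is already installed, e.g. via install.packages("ggplot2")
library(ggplot2)

# mpg is an example dataset included in ggplot2 (fuel economy of cars)
# Map engine displacement to the x-axis and highway mileage to the y-axis,
# then draw one point per car:
p_quick <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

p_quick  # printing the plot object draws the scatterplot
```

The call has three parts: a dataset, a mapping of variables to visual properties, and a geometric object that determines how the data is drawn.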
Such a swift and short foray into graphics is possible, as ggplot provides a concise language for graphical expression, as long as it is supplied with the right kinds of inputs. Nevertheless, some caveats are in order:
Introducing visualizations with ggplot() does not imply that other ways of creating graphs in R are deficient or inferior. R has powerful functions for creating graphs, and many people create beautiful graphs in R without ever using ggplot(). But as ggplot2 allows creating a large variety of graphs before digging deeper into the mysteries of R and other tidyverse packages, we can start using it without knowing much about the rest of R.
The functionality of the ggplot2 package extends far beyond this modest introduction. Today, ggplot() is an important pillar of the tidyverse (Wickham, 2019c), but the package was developed prior to it. Hadley Wickham created the original ggplot package in 2005 (Wickham, 2016) to provide an R implementation of The Grammar of Graphics (Wilkinson, 2005), which develops an entire language and philosophy of data visualisation. As learning to use ggplot(), like learning R, is a journey rather than a destination, we should not be surprised if some concepts remain somewhat obscure for a while. Fortunately, there is no need to understand everything about ggplot() to create awesome graphs with it.
ggplot() assumes that the data to be plotted is in tabular form (a data frame or tibble) and formatted in specific ways (using factors and in so-called “long format”). At this point, we do not need to worry about this and can just work with existing datasets that happen to be in the right shape.
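To make the "right shape" a little more concrete, here is a hedged sketch of what a small long-format data frame might look like. The data and all names (scores, group, time, score, p_long) are invented for illustration and do not come from any real dataset.

```r
# Assumption: ggplot2 is already installed
library(ggplot2)

# Hypothetical toy data in long format: one row per observation,
# with the grouping variable stored as a factor
scores <- data.frame(
  group = factor(c("control", "control", "treatment", "treatment")),
  time  = c(1, 2, 1, 2),
  score = c(10, 12, 11, 17)
)

str(scores)  # inspect column types: group is a factor, the rest are numeric

# Each column can be mapped directly to a visual property of the plot:
p_long <- ggplot(scores, aes(x = time, y = score, color = group)) +
  geom_line() +
  geom_point()

p_long
```

A "wide" version of the same data would instead use one column per time point, which is harder to map to the axes and colors of a plot.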
Overall, this means that using ggplot2 is only one of many alternative ways of creating visualizations in R, can be complicated, and is not free of preconditions. To fully realize the potential of any graphical framework, we need to learn more about transforming data in subsequent chapters.[22]
Keeping these caveats in mind, it is amazing and motivating to see what we can do with some basic knowledge of ggplot().
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Retrieved from https://ggplot2.tidyverse.org
Wickham, H. (2019c). tidyverse: Easily install and load the ’tidyverse’. Retrieved from https://CRAN.R-project.org/package=tidyverse
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., … Dunnington, D. (2020). ggplot2: Create elegant data visualisations using the grammar of graphics. Retrieved from https://CRAN.R-project.org/package=ggplot2
Wilkinson, L. (2005). The grammar of graphics (2nd edition). Springer.
[21] The current version of the ggplot package is ggplot2 (Wickham, Chang, et al., 2020), but the corresponding plotting function is still called ggplot(). We will try to write ggplot or ggplot2 when referring to the package and ggplot() to refer to the function, but trust that it will be clear what is meant from the context.