Chapter 2 Visualizing data

ds4psy: (2) Visualizing data

Apart from doing statistics and running simulations, the power to create clear and beautiful visualizations is one of the main reasons to use R. From the very start, R was conceived not just as a powerful calculator for statistics, but as a language that allows for all kinds of graphical expression. As a side effect of this dual identity, its flexibility, and its long history, R now offers a variety of packages and functions for drawing all kinds of graphs.

Figure 2.1: We start exploring data by visualizing it, using the ggplot2 package.

This chapter revolves around the ggplot2 package, which is widely used for creating scientific visualizations.[13] Introducing visualizations with ggplot() at such an early point of your R career is unusual. We do this here in the interest of rapid prototyping and hope to excite you about the potential of R by gaining quick visual insights into data. Such a swift and short foray into graphics is possible because ggplot provides a concise language for graphical expression, as long as it is supplied with the right kinds of inputs. Nevertheless, some caveats are in order:

  • Introducing visualizations with ggplot() does not imply that other ways of creating graphs in R are deficient or inferior. R has powerful functions for creating graphs, and many people create beautiful graphs in R without ever using ggplot(). But because ggplot2 lets us create a large variety of graphs before digging deeper into the mysteries of R and other tidyverse packages, we can start using it without knowing much about the rest of R.

  • The functionality of the ggplot2 package extends far beyond this modest introduction. Today, ggplot2 is an important pillar of the tidyverse (Wickham, 2017), but the package was developed prior to it. Hadley Wickham created the original ggplot package in 2005 (Wickham, 2016) to provide an R implementation of The Grammar of Graphics (Wilkinson, 2005), which develops an entire language and philosophy of data visualization. As learning to use ggplot(), like learning R, is a journey rather than a destination, we should not be surprised if some concepts remain somewhat obscure for a while. Fortunately, there is no need to understand everything about ggplot() to create awesome graphs with it.

  • Using ggplot() assumes that the data to be plotted is in tabular form (a data frame or tibble) and formatted in specific ways (with categorical variables stored as factors and the data arranged in so-called “long format”). At this point, we do not need to worry about this and can simply work with existing datasets that happen to be in the right shape.
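To make the notion of “long format” in the last caveat more concrete, here is a minimal sketch in base R. The data (reaction times of three participants in two conditions) and all variable names are invented for illustration and do not come from any dataset used in this book:

```r
# Hypothetical data: reaction times (in ms) of three participants in two conditions.
# Wide format: one row per participant, one column per condition.
wide <- data.frame(
  id   = c("p1", "p2", "p3"),
  easy = c(420, 390, 450),
  hard = c(610, 580, 640)
)

# Long format: one row per observation, with the condition stored as a factor.
long <- data.frame(
  id        = rep(wide$id, times = 2),
  condition = factor(rep(c("easy", "hard"), each = nrow(wide))),
  rt        = c(wide$easy, wide$hard)
)

long  # print the long version: 6 rows, 3 columns
```

In the long version, every row describes a single measurement, so a plot could map condition to one axis and rt to the other. Tools for converting between both shapes are a topic for later chapters.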

Overall, this means that using ggplot2 is only one of many ways of creating visualizations in R, that it can be complicated, and that it is not free of preconditions. To fully realize the potential of any graphical framework, we need to learn more about transforming data in subsequent chapters.[14] Keeping these caveats in mind, it is amazing and motivating to see what we can do with some basic knowledge of ggplot().
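As a first taste of what such basic knowledge looks like, the following sketch draws a scatterplot with a few lines of code. It assumes that the ggplot2 package is installed and uses mpg, an example data frame that ships with ggplot2. The choice of variables and axis labels is ours and serves only as an illustration:

```r
library(ggplot2)

# mpg is an example data frame included in ggplot2 (fuel economy data of cars).
# Map engine displacement to x, highway mileage to y, and vehicle class to color:
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(x = "Engine displacement (liters)",
       y = "Highway miles per gallon",
       color = "Vehicle class")

p  # printing the plot object draws the graph
```

The call names a dataset, maps variables to visual properties with aes(), and adds a layer of points with geom_point(). The details of each part are the subject of this chapter.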

References

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Retrieved from https://ggplot2.tidyverse.org

Wickham, H. (2017). tidyverse: Easily install and load the ’tidyverse’. Retrieved from https://CRAN.R-project.org/package=tidyverse

Wilkinson, L. (2005). The grammar of graphics (2nd edition). Springer.


  13. The current version of the ggplot package is ggplot2 (Wickham et al., 2019a), but the corresponding plotting function is still called ggplot(). We will try to write ggplot or ggplot2 when referring to the package and ggplot() to refer to the function, but trust that it will be clear what is meant from the context.

  14. For instance, we will learn how to slice and dice data into shapes suitable for plotting in Chapters 3 and 7.