Visualizations can be good, bad, or anything in between. The success of any particular visualization depends on its ecological rationality: On the one hand, the type of graph chosen and its aesthetic features need to fit the data being shown. On the other hand, the message to be conveyed and the audience that is to view and interpret the graph need to be considered.
Plotting in R
Overall, the base R plotting system is very flexible and powerful, and offers a high degree of control over plotting. But as the graphical functions of R have been developed over a long period of time, they are quite heterogeneous. The main reason for this heterogeneity is that the base R plotting system simultaneously pursues two distinct strategies:
On the one hand, there are many pre-packaged graphical commands (like boxplot()) that combine several aspects and provide options for quickly generating some particular type of visualization.
On the other hand, there are many low-level plotting functions for designing new visualizations from scratch, or for modifying existing plots.
As the latter functions often need to be combined with the former, the combination of both strategies increases complexity and frequently confuses R novices.
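As a minimal sketch of how the two strategies combine, the following example first calls the high-level boxplot() command and then adds elements with the low-level functions abline() and points(). The data and all variable names are made up for illustration and are not taken from this chapter.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  value = c(rnorm(20, mean = 10), rnorm(20, mean = 12), rnorm(20, mean = 11))
)

# High-level command: creates a complete plot in a single call
bp <- boxplot(value ~ group, data = scores,
              main = "Values by group", xlab = "Group", ylab = "Value")

# Low-level functions: add elements to the existing plot
abline(h = mean(scores$value), lty = 2, col = "gray40")  # overall mean as dashed line
group_means <- tapply(scores$value, scores$group, mean)  # mean of each group
points(x = 1:3, y = group_means, pch = 18, col = "red3", cex = 1.5)  # mark group means
```

The low-level calls only work once a plot exists, which is one reason why the order of commands matters in base R graphics.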
An alternative to a range of different functions for creating different visualizations would be a unified system that generates many different types of visualizations from a common set of principles. Remember the Swiss knife analogy invoked in Chapter 2 on Basic R concepts and commands: Rather than using a range of specialized tools, someone could design a toolbox that provides many different functions in a systematic fashion (e.g., by sharing the same arguments and command syntax for different visualizations). Such a toolbox is provided by the ggplot2 package, which is discussed in Chapter 2 on Visualizing data of the ds4psy textbook (Neth, 2022a).
There are many different types of graphs and corresponding commands in R. In this chapter, we have learned to use base R functions for creating a few of them:
histograms show a variable’s distribution of values;
scatterplots (and some variants) show the relation between two variables;
bar plots show the values of one or more categorical variables;
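The following sketch shows one possible call for each of these plot types, using the base R functions hist(), plot(), and barplot(). The simulated variables x, y, and grp are assumptions made for this example only.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(2)
x   <- rnorm(100, mean = 50, sd = 10)   # a numeric variable
y   <- 2 * x + rnorm(100, sd = 15)      # a second numeric variable related to x
grp <- sample(c("low", "mid", "high"), size = 100, replace = TRUE)  # a categorical variable

h <- hist(x)           # histogram: distribution of the values of x
plot(x, y)             # scatterplot: relation between x and y
counts <- table(grp)   # frequency of each category
b <- barplot(counts)   # bar plot: one bar per category
```

Both hist() and barplot() also return values (the bin information and the bar midpoints, respectively), which can be useful when adding further elements to the plot.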
Key aesthetic elements (and corresponding arguments of base R functions) include:
color of various elements;
line width (lwd) and line type;
point shape (see ?points for possible values);
size of symbols or text.
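A short sketch of setting these aesthetic elements in a single plot() call follows. Of the arguments used here, only lwd is named in the text above. The arguments col, lty, pch, and cex are standard base R graphical parameters for color, line type, point shape, and size, and their pairing with the list items is an assumption of this example.

```r
# Simple example data (hypothetical)
x <- 1:10
y <- x^2

plot(x, y, type = "b",    # "b" draws both points and lines
     col = "steelblue",   # color of points and lines
     lwd = 2,             # line width
     lty = 2,             # line type (2 = dashed)
     pch = 16,            # point shape (16 = filled circle)
     cex = 1.5)           # size of symbols, relative to the default of 1
```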
Key arguments for setting properties of plots include:
main for providing a plot title (as character);
ylab for providing axis labels (as character);
ylim for providing the limits of axis ranges (as a numeric vector of start and end values);
asp for setting the aspect ratio (as a number y/x);
las for setting the orientation of axis labels (as a number 0–3).
For additional parameters, see the documentation of the corresponding functions.
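The arguments listed above can be combined in one call, as in the following sketch. The example also uses xlab and xlim, the x-axis counterparts of ylab and ylim. The data and labels are made up for illustration.

```r
# Simple example data (hypothetical)
x <- seq(0, 10, by = 0.5)
y <- sqrt(x)

plot(x, y,
     main = "Square root function",  # plot title (as character)
     xlab = "x",                     # label of the x-axis
     ylab = "sqrt(x)",               # label of the y-axis
     xlim = c(0, 10),                # limits of the x-axis (start, end)
     ylim = c(0, 4),                 # limits of the y-axis (start, end)
     asp  = 1,                       # aspect ratio y/x
     las  = 1)                       # orientation of axis labels (1 = always horizontal)
```

When asp is set, R may extend the displayed axis ranges beyond the requested limits to preserve the aspect ratio.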
Scientific visualizations should typically contain the following elements:
A descriptive title or caption that states what the graph is showing;
axes with descriptive labels and sensible value ranges;
one or more geometric objects (e.g., points, bars, lines) that depict the data in a clear fashion;
informative labels or a legend that explains the mapping of geometric objects and aesthetic features to data elements (e.g., which color, line, or shape, is showing which variable for which group).
When creating a new graph, planning these four elements is a good heuristic for success. Given the abundance of options, we should always aim to create the basic plot before fiddling with labels and aesthetic parameters (like colors, themes, etc.).
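The four elements can be sketched in base R as follows. The data, condition names, and labels are hypothetical, and this is only one of many possible ways to build such a plot.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(3)
time <- 1:12
ctrl <- 20 + 0.5 * time + rnorm(12)   # scores of a control condition
trt  <- 20 + 1.5 * time + rnorm(12)   # scores of a treatment condition

# Elements 1 and 2: title and labeled axes with sensible ranges
# (type = "n" sets up the plot without drawing any data yet)
plot(time, ctrl, type = "n",
     main = "Scores over time by condition",
     xlab = "Time (months)", ylab = "Score",
     ylim = range(c(ctrl, trt)))

# Element 3: geometric objects that depict the data
lines(time, ctrl, type = "b", col = "black",     pch = 1,  lty = 1)
lines(time, trt,  type = "b", col = "firebrick", pch = 16, lty = 2)

# Element 4: legend mapping colors, shapes, and line types to conditions
legend("topleft", legend = c("Control", "Treatment"),
       col = c("black", "firebrick"), pch = c(1, 16), lty = c(1, 2),
       bty = "n")  # no box around the legend
```

Setting ylim from the range of both variables ensures that neither line is cut off when the second one is added.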
Creating good visualizations is both an art and a craft. R provides abundant tools, but using them in a successful fashion is mostly a matter of experience.
The insight that any representation can be good or bad at serving particular purposes is an important point to keep in mind beyond visualizations.
Here are some links to general resources on visualization, not just in R.
Background information and inspiration
Books or scripts on data visualization include the landmark publications by Jacques Bertin (e.g., Bertin, 2011) and Edward R. Tufte (Tufte et al., 1990; Tufte, 2001, 2006), which combine sound advice with many inspiring examples. Friendly (2008) provides a historical perspective with many beautiful examples.
More recent publications that are geared to the needs of aspiring data scientists include:
More specific resources on the principles of data visualization (with many beautiful or bizarre examples) include:
Data visualization principles (by Rafael A. Irizarry)
Data visualization: Basic principles (by Peter Aldhous)
Inspiration and tools for additional types of visualizations can be found at (from specific to general):
Plotting in base R
Here are some links to helpful resources on the base R plotting system:
Given that we now have some basic knowledge of the base R graphics system, a good next step would be to check out the ggplot2 package (Wickham et al., 2020). For instance, here are two introductory chapters: