## 3.3 Base R graphics

This chapter covers **base** R graphics, rather than **ggplot2**, which is covered in Chapter 2: Visualizing data of the ds4psy textbook.

Our main goals are:

- Hands-on instructions on visualizing data in **base** R.
- Distinguishing between different types of visualizations.
- Adding aesthetics: color, shape, size, etc.

Two important caveats:

- Graphs often require transforming data into a specific format or shape. We ignore this here, but will return to this topic in our part on wrangling data.
- We use colors, but do not cover how to specify and select colors in R. In the following, we occasionally select colors and color functions of the **unikn** package (see Appendix D: Using colors for details).

### 3.3.1 Basic plots

Due to the inclusion of the core packages **graphics** and **grDevices**, a standard installation of R comes with pretty powerful tools for creating a variety of visualizations.

In this chapter, we can only introduce some very basic commands for creating typical (named) graphs. We can distinguish between basic and complex plots:

*Basic plots* create an entire plot in one function call:

- `hist()` creates histograms;
- `plot()` creates point and line plots (as well as more complex ones);
- `barplot()` creates bar charts;
- `boxplot()` creates box plots; and
- `curve()` allows drawing arbitrary lines and curves.
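Of these, `boxplot()` is not discussed further below. Here is a minimal sketch of a box plot of fuel consumption by number of cylinders. It uses R's built-in `mtcars` data, which we chose for illustration and is not part of the original examples:

```r
# Box plot of miles per gallon (mpg) for each number of cylinders (cyl):
b <- boxplot(mpg ~ cyl, data = mtcars,
             main = "Miles per gallon by number of cylinders",
             xlab = "Number of cylinders", ylab = "Miles per gallon",
             col = "lightgrey")

# boxplot() also returns its computed statistics (invisibly):
b$n      # number of observations per group
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```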

*Complex plots* (discussed below) are created by multiple function calls.
They are typically started with a generic call to `plot()`, and then followed by more specific plot functions, like `grid()`, `abline()`, `points()`, `text()`, `title()`, etc.

#### Histograms

Histograms are one of the simplest types of plots: They show the distribution of values of a single variable.

For demonstration purposes, we create a vector `v` that contains 500 values randomly drawn from a normal distribution (with a mean of 100 and a standard deviation of 10):

`v <- rnorm(n = 500, mean = 100, sd = 10)`

In R, the `hist()` function allows specifying the data to plot as an argument `x`.
Providing our vector `v` to the function yields the following:

`hist(x = v)`

This looks fairly straightforward, but due to the random nature of `v`, the distribution of its values will vary every time we re-create the vector `v`.
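To make a random vector (and thus the resulting histogram) reproducible, we can fix the state of R's random number generator with `set.seed()` before drawing the values. The seed value `123` is an arbitrary choice:

```r
set.seed(123)  # fix the random number generator (any integer works)
v1 <- rnorm(n = 500, mean = 100, sd = 10)

set.seed(123)  # the same seed again ...
v2 <- rnorm(n = 500, mean = 100, sd = 10)

identical(v1, v2)  # ... yields exactly the same values
```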

Note that the `hist()` command added some automatic elements to our plot:

- a descriptive title (above the plot);
- an x-axis and a y-axis, with appropriate value ranges, axis ticks, and labels;
- some geometric objects (here: vertical bars or rectangles) that represent the data values.

Under the hood, the function even processed the input vector and computed something:
It categorized the values of `v` into bins of a specific width and counted the number of values falling into each bin (to show their frequency on the y-axis).
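To inspect these computations without drawing anything, we can call `hist()` with `plot = FALSE`. This returns an object of class `"histogram"`. Its `breaks` element holds the bin boundaries and its `counts` element holds the number of values per bin. The seed below is arbitrary:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
v <- rnorm(n = 500, mean = 100, sd = 10)

h <- hist(v, plot = FALSE)  # compute the histogram, but do not draw it
h$breaks  # boundaries of the bins
h$counts  # number of values per bin
h$mids    # midpoints of the bins

sum(h$counts)  # every value falls into exactly one bin
```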

Whenever we are unhappy with the automatic defaults, we can adjust some parameters.
In the case of histograms, an important parameter is the number of bins into which the data values are categorized. This can be adjusted using the `breaks` argument. Note that a single number is only treated as a suggestion, as R chooses "pretty" break points close to it:

```
# specifying breaks:
hist(v, breaks = 5)
hist(v, breaks = 25)
```
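For full control over the bins, `breaks` also accepts a vector of break points, which must cover the entire range of the data. Here is a minimal sketch that assumes bins of width 5 are what we want:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
v <- rnorm(n = 500, mean = 100, sd = 10)

# Break points in steps of 5 that cover the full range of v:
my_breaks <- seq(from = 5 * floor(min(v) / 5),
                 to   = 5 * ceiling(max(v) / 5),
                 by   = 5)

h <- hist(v, breaks = my_breaks, col = "lightgrey",
          main = "Histogram with bins of width 5", xlab = "Values of v")
```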

Once we have settled on the basic parameters, we can adjust the labels and aesthetic aspects of a plot.
A good plot should always contain informative titles, value ranges, and labels.
In the following expression, we adjust the main title (using the `main` argument), the label of the x-axis (using the `xlab` argument), and the main color of the bars and their border (using the `col` and `border` arguments):

```
# with aesthetics:
hist(v, breaks = 20,
main = "A basic histogram (showing the distribution of x)",
xlab = "Values of x",
col = "gold", border = "blue")
```

Note that we did not adjust the range of values on the x- and y-axes.
If we wanted to do this, we could have done so by providing the desired ranges (each as a vector with two numeric values) to the `xlim` and `ylim` arguments:

```
# with aesthetics and axis limits:
hist(v, breaks = 20,
     main = "A basic histogram (showing the distribution of x)",
     xlab = "Values of x",
     col = "gold", border = "maroon",
     xlim = c(50, 150), ylim = c(0, 120))
```

#### Scatterplots

The `plot()` function shows relationships between two variables `x` and `y`.
Actually, `plot()` is a flexible plotting function in R:
On the one hand, it allows defining new plots (e.g., creating a new plotting canvas).
On the other hand, calling the function with some data arguments directly creates a plot of them.

In this section, we will call it with two vectors `x` and `y` to create a *scatterplot* (i.e., a plot of points).
However, we will also see that `plot()` allows creating different plots of the same data, specifically:

- a line plot;
- a step function.

We first create some data to plot.
Here are two simple numeric vectors `x` and `y` (where `y` is defined as a function of `x`):

```
# Data to plot:
x <- -10:10
y <- x^2
```

When providing these vectors `x` and `y` to the `x` and `y` arguments of the `plot()` function, we obtain:

```
# Minimal scatterplot:
plot(x = x, y = y)
```
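Because `y` is defined as a function of `x` here, we could also draw the underlying function directly with `curve()` (one of the basic plots listed above). In this sketch, `curve()` evaluates the expression `x^2` at 101 points (its default `n`) between `from` and `to`. Setting `add = TRUE` draws the curve on top of the existing plot:

```r
x <- -10:10
y <- x^2

plot(x, y)  # points, as above

# Inside curve(), 'x' refers to the evaluation grid, not to our vector x:
cv <- curve(x^2, from = -10, to = 10, add = TRUE, col = "steelblue", lwd = 2)
```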

Thus, the default plot chosen by `plot()` was a scatterplot (i.e., a plot of points).
We can change the plot type by providing different values to the `type` argument:

```
# Distinguish types:
plot(x, y, type = "p")  # points (default)
plot(x, y, type = "l")  # lines
plot(x, y, type = "b")  # both points and lines
plot(x, y, type = "o")  # both, overplotted
plot(x, y, type = "h")  # histogram-like (or high-density) vertical lines
plot(x, y, type = "s")  # steps
```

See the documentation `?plot` for details on these types and additional arguments.
For most datasets, only some of these types make sense.
Actually, one of the most common uses of `plot()` sets `type = "n"` (for “no plotting”):

`plot(x, y, type = "n") # no plotting / nothing`

As we see, this does not plot any data, but creates an empty plot (with appropriate axis ranges). When creating more complex plots (below), we start like this and then add various custom objects to our plot.

Once we have selected our plot, we can fiddle with its aesthetic properties and labels to make it both prettier and more informative:

```
# Set aesthetic parameters and other options:
plot(x, y, type = "b",
     lty = 2,      # line type (2 = dashed)
     pch = 16,     # point symbol (16 = filled circle)
     lwd = 2,      # line width
     col = "red",
     cex = 1.5,    # symbol size (character expansion)
     main = "A basic plot (showing the relation between x and y)",
     xlab = "X label", ylab = "Y label",
     asp = 1/10)   # aspect ratio (y/x)
```
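The numeric codes for `pch` are easy to forget. A quick way to look them up is to plot all of them. This sketch shows the 26 standard point symbols (`pch = 0` to `25`); symbols 21 to 25 also accept a fill color via `bg`:

```r
pch_values <- 0:25

plot(x = pch_values, y = rep(1, length(pch_values)),
     pch = pch_values, cex = 2, bg = "gold",  # bg fills symbols 21 to 25
     ylim = c(0, 2), axes = FALSE, xlab = "", ylab = "",
     main = "Point symbols (pch = 0 to 25)")

# Label each symbol with its pch code:
text(x = pch_values, y = 0.6, labels = pch_values, cex = 0.8)
```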

#### Overplotting

A common problem with scatterplots is *overplotting* (i.e., too many points at the same locations).
For instance, suppose we wanted to plot the following data points:

```
# Data to plot:
x <- runif(250, min = 0, max = 10)
y <- runif(250, min = 0, max = 10)
```

Here is what a basic scatterplot (with filled and fairly large points) would look like:

```
# Basic scatterplot:
plot(x, y, type = "p",
pch = 20, cex = 4, # filled circles with increased size
main = "An example of overplotting")
```

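**Note:** In case you’re wondering what `pch` and `cex` mean: `pch` sets the plotting character (i.e., the point symbol) and `cex` sets the character expansion (i.e., a size multiplier for symbols and text). More generally:

- Calling `?par` provides detailed information on a large variety of graphical parameters.
- Calling `par()` shows the current settings of these parameters.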
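Because `par()` changes settings globally, a common idiom is to store the old values that `par()` returns and restore them afterwards. A minimal sketch that places two plots side by side:

```r
# Set new values; par() returns the previous ones:
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot(1:10, pch = 1, main = "pch = 1, cex = 1")
plot(1:10, pch = 16, cex = 2, main = "pch = 16, cex = 2")

par(old_par)  # restore the previous settings
```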

One of several solutions to the overplotting problem lies in using transparent colors, which allow viewing overlaps of graphical objects. There are several ways of obtaining transparent colors.
For instance, the following solution uses the **unikn** package to select a dark color (called `Petrol`) and then uses the `usecol()` function to set it to 1/3 of its original opacity:

```
library(unikn)
my_col <- usecol(Petrol, alpha = 1/3)
```

Providing `my_col` to the `col` argument of `plot()` yields:

```
# Set aesthetic parameters and other options:
plot(x, y, type = "p",
pch = 20, cex = 4,
col = my_col,
main = "Addressing overplotting (by color transparency)")
```
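If the **unikn** package is not available, base R's `adjustcolor()` (from **grDevices**) offers a similar way to make any color transparent. Its `alpha.f` argument sets the opacity. Here, `"darkblue"` merely stands in for some dark color:

```r
set.seed(2)  # arbitrary seed, only for reproducibility
x <- runif(250, min = 0, max = 10)
y <- runif(250, min = 0, max = 10)

my_col_base <- adjustcolor("darkblue", alpha.f = 1/3)  # 1/3 of full opacity
my_col_base  # a hex code that includes an alpha channel

plot(x, y, type = "p",
     pch = 20, cex = 4,
     col = my_col_base,
     main = "Addressing overplotting (with adjustcolor())")
```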

Note that the following `type` variants of `plot()` may look pretty, but unless we are trying to make an artistic point, they make very limited sense for this data:

```
# Select colors:
my_cols <- usecol(c(Karpfenblau, Pinky, Petrol, Bordeaux), alpha = 2/3)

# Plot 4 types of same data:
plot(x, y, type = "l", col = my_cols[1], main = "A: Line plot")
plot(x, y, type = "b", col = my_cols[2], main = "B: Both points and lines")
plot(x, y, type = "h", col = my_cols[3], main = "C: High-density plot")
plot(x, y, type = "s", col = my_cols[4], main = "D: Step plot")
```

Thus, which `type` of `plot()` makes sense is primarily a function of the data that is to be shown.^{9}

#### Bar plots

One of the most common, but also most complicated, types of plots is the bar plot (aka. bar chart). The two main reasons why bar plots are complicated are:

- the bars often represent processed data (e.g., counts, or the means, sums, or proportions of values);
- the bars can be arranged in multiple ways (e.g., stacked vs. beside each other, grouped, etc.).
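The first point means that we often summarize data before plotting it. As a minimal sketch (using the built-in `mtcars` data, which we chose for illustration), we can compute the mean fuel efficiency for each number of cylinders with `tapply()` and pass the resulting named vector to `barplot()`:

```r
# Mean miles per gallon (mpg) for each number of cylinders (cyl):
mpg_means <- tapply(mtcars$mpg, mtcars$cyl, FUN = mean)
round(mpg_means, 2)

barplot(height = mpg_means,
        main = "Mean miles per gallon by number of cylinders",
        xlab = "Number of cylinders", ylab = "Mean miles per gallon",
        col = "lightgrey")
```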

When we have a named vector of data values that we want to plot, the `barplot()` command is pretty straightforward:

```
# A vector as data:
v <- c(1, 3, 2, 4, 2)     # some values
names(v) <- LETTERS[1:5]  # add names
barplot(height = v, col = Seeblau)
```

In most cases, we have more complicated data (e.g., a data frame or multiple vectors). To create a bar graph from such data, we first create a table that contains the values we want to show.

A simple example could use the `mpg` data from **ggplot2**:

```
# From data:
mpg <- ggplot2::mpg
```

The following `table()` function creates a simple table of data by counting the number of observations (here: cars) for each level of the `class` variable:

```
# Create a table:
tb <- table(mpg$class)
# names(tb)
tb
#>
#> 2seater compact midsize minivan pickup subcompact suv
#> 5 47 41 11 33 35 62
```

Providing this table as the `height` argument of `barplot()` creates a basic bar plot:

`barplot(height = tb) # basic version`

Adding aesthetics and labels renders the plot more colorful and informative:

```
barplot(height = tb,
main = "Counts of cars by class",
xlab = "Class of car",
las = 2, # vertical labels
col = usecol(pal_unikn_light))
```
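Bar plots of counts are often easier to read when the bars are ordered by height. Because a table is a named vector of counts, `sort()` can reorder it before plotting. To keep this sketch independent of **ggplot2**, it uses the built-in `mtcars` data (counting cars by number of carburetors, `carb`) instead of `mpg`:

```r
tb_carb <- table(mtcars$carb)                  # counts per number of carburetors
tb_sorted <- sort(tb_carb, decreasing = TRUE)  # most frequent category first

barplot(height = tb_sorted,
        main = "Counts of cars by number of carburetors (sorted)",
        xlab = "Number of carburetors",
        col = "lightgrey")
```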

An alternative way of creating a `barplot()` would use the `data` and `formula` arguments.
For instance, using the `UCBAdmissions` data, we first convert the table into a data frame, select two departments, and choose two colors:

```
df <- as.data.frame(UCBAdmissions)

df_A <- df[df$Dept == "A", ]
df_E <- df[df$Dept == "E", ]

# Select 2 colors:
my_cols <- c(Seeblau, Bordeaux)  # two colors
```

Create two bar plots:

```
# A:
barplot(data = df_A, Freq ~ Gender + Admit, beside = TRUE,
main = "Department A", col = my_cols, legend = TRUE)
```

```
# E:
barplot(data = df_E, Freq ~ Gender + Admit, beside = TRUE,
main = "Department E", col = my_cols, legend = TRUE)
```
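`UCBAdmissions` is already a three-dimensional table (with the dimensions `Admit`, `Gender`, and `Dept`), so we could also skip the data frame and pass a matrix to `barplot()`. With `beside = TRUE`, each column of the matrix forms a group of bars, and each row forms one bar within each group. A sketch for Department A (the colors are placeholders):

```r
tab_A <- UCBAdmissions[, , "A"]  # 2 x 2 table: Admit (rows) by Gender (columns)

barplot(height = t(tab_A),       # transpose: rows = Gender, columns = Admit
        beside = TRUE,
        main = "Department A (from a table)",
        col = c("steelblue", "darkred"),
        legend.text = TRUE)      # legend labels from the row names (Gender)
```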

Problem: In the plot for Department E, the legend overlaps with the bars.

Two possible solutions:

```
# Moving legend position:
# Solution 1: specify args.legend (as a list)
barplot(data = df_E, Freq ~ Gender + Admit, beside = TRUE,
main = "Department E", col = my_cols,
legend = TRUE, args.legend = list(bty = "n", x = "topleft"))
```

```
# Solution 2: Extend the range of the x-axis (to make room for the legend):
barplot(data = df_E, Freq ~ Gender + Admit, beside = TRUE,
        main = "Department E", col = my_cols,
        legend = TRUE, xlim = c(1, 8))
```
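A third option is to omit the built-in legend and add one afterwards with the `legend()` function, which gives full control over its position and appearance. In this sketch, the colors are placeholders, `ylim` is chosen to leave some space above the bars, and the legend labels are taken from the levels of `Gender`:

```r
df <- as.data.frame(UCBAdmissions)
df_E <- df[df$Dept == "E", ]
cols <- c("steelblue", "darkred")  # placeholder colors

barplot(data = df_E, Freq ~ Gender + Admit, beside = TRUE,
        main = "Department E", col = cols, ylim = c(0, 350))

# Add a legend without a box in the top-left corner:
legend("topleft", legend = levels(df_E$Gender), fill = cols, bty = "n")
```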


### 3.3.2 Complex plots

The commands covered so far provide shortcuts that do multiple things at once (e.g., select default layouts, ranges and labels of axes, and draw particular objects). If we want more control over our plots or want to modify some of the automatic choices, we can specify all parts of a plot in detail.

*Complex plots* are created by multiple function calls. They typically start by calling `plot()` with `type = "n"` to create a basic plot or canvas, and then add further objects to it.

More specific plot functions include `grid()`, `abline()`, `points()`, `text()`, `title()`, etc.
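Here is a minimal sketch of this layered approach. It reuses the `x` and `y` from the scatterplot section; the annotation position and colors are arbitrary choices:

```r
x <- -10:10
y <- x^2

plot(x, y, type = "n",                     # 1. empty canvas, axes scaled to the data
     xlab = "x", ylab = "y")
grid()                                     # 2. reference grid
abline(h = 0, v = 0, col = "grey40")       # 3. horizontal and vertical lines through 0
lines(x, y, col = "steelblue", lwd = 2)    # 4. connect the data points
points(x, y, pch = 16, col = "steelblue")  # 5. draw the data points
text(x = 0, y = 80, labels = "y = x^2")    # 6. annotation at a chosen position
title(main = "A complex plot, built in layers")  # 7. main title
```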

Some further details:

Figure 3.3 shows that setting the plot `type = 'b'` only makes sense when the data points are ordered:

```
# par(mar = c(4, 4, 2, 1)) # reduce margin on top and right
# Showing trends:
plot(pressure, pch = 19, type = 'b', main = "Pressure by temperature")
# Nonsense:
plot(x = anscombe$x1, y = anscombe$y1,
pch = 19, type = 'b', main = "Anscombe's 1st set")
```
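When the points are not ordered, one remedy is to sort them by their x-values before connecting them, so that the lines run from left to right. Whether connecting the points makes sense at all still depends on the data; this sketch only shows the mechanics with `order()`:

```r
# Sort Anscombe's 1st set by x before connecting the points:
ord <- order(anscombe$x1)
x_sorted <- anscombe$x1[ord]
y_sorted <- anscombe$y1[ord]

plot(x = x_sorted, y = y_sorted,
     pch = 19, type = 'b', main = "Anscombe's 1st set (sorted by x)")
```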

^{9} However, remembering our notion of *ecological rationality* yields at least two other factors that matter for designing a good visualization: What is the message to be conveyed by the plot, and who will be viewing it?