4 Plotting in base R

Here, we discuss creating plots with the built-in base R plotting functions. There is also a powerful graphing package called ggplot2, which we will use in a later chapter.

4.1 Scatter plots / line plots

At its most basic, all you need is some data and the plot command.

x <- seq(-2,2,0.1)
y <- x^2
plot(x,y)

This isn’t very interesting or satisfying, so let’s generate some data to play with. First, remember to check or set your working directory. And remember to try this yourself in your console window (top left).

getwd()
#> [1] "C:/Users/Karin/Desktop/Sabbatical2023/333book/complete"
setwd("C:/Users/Karin/Desktop/Sabbatical2023/333book/complete")
# You use your own working directory of course
x <- c(1:100)
y1 <- x+rnorm(100,0,1)*4
y2 <- x+runif(100,-1,1)*4
y3 <- x+rexp(100, 1)*4

In the code below, you see the (optional) arguments type, lty, col, pch, cex, and lwd. They set the plot type, the line type, the color of your lines and markers, the marker type, the size of your markers, and the width of your lines. Play around some: change the type to p, l, b, c, o, s, h, or n, and try different numeric values for lty, pch, cex, and lwd.

plot(x, y1, type="l", lty=2,col="green", pch=1, cex=1, lwd=1)
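To compare the plot types side by side, something like the following sketch may help. It regenerates data of the same kind as above and draws one small panel per type value; the names types and old_par, and the 2 x 4 layout set through par, are choices made for this illustration only.

```r
# Same kind of data as above: x plus normally distributed noise.
x <- 1:100
y1 <- x + rnorm(100, 0, 1) * 4

# The eight type values mentioned in the text.
types <- c("p", "l", "b", "c", "o", "s", "h", "n")

# Draw a 2 x 4 grid of panels with slightly smaller margins,
# and remember the previous settings so they can be restored.
old_par <- par(mfrow = c(2, 4), mar = c(4, 4, 2, 1))
for (t in types) {
  plot(x, y1, type = t, main = paste0("type = ", t))
}
par(old_par)
```

Note that type n draws the axes but no data, which is useful when you want to set up an empty plot and add everything with lines or points afterwards.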

You can add additional plot lines with the lines function and a legend with the legend function:

plot(x, y1, type="l", lty=2,col="green", pch=1, cex=1, lwd=1)
lines(x,y2, type="o", col="red",pch=1, cex=1)
lines(x,y3, type="p", col="blue")
legend("topright",legend =c("normal error","uniform error",
        "exponential error"), fill=c("green","red","blue"))

Finally, let’s add a title and axis labels. xlim and ylim allow us to set the axis limits.

plot(x, y1, type="l", lty=2,col="green", pch=1, cex=1, lwd=1, 
     main ="A first sample plot", 
     sub="three functions",xlab="x value",ylab="y value",
     xlim=c(20,50),ylim=c(0,150))
lines(x,y2, type="o", col="red",pch=1, cex=1)
lines(x,y3, type="p", col="blue")
legend("topright",legend = c("normal error","uniform error", 
      "exponential error"),fill=c("green","red","blue"))
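The legends above use fill, which draws colored boxes. If you would rather have the legend show the actual line and marker styles, one possible variant (a sketch, not the only way to do it) passes col, lty, and pch to legend instead. The "topleft" position is an assumption made here because the data rise toward the top right.

```r
x <- 1:100
y1 <- x + rnorm(100, 0, 1) * 4
y2 <- x + runif(100, -1, 1) * 4
y3 <- x + rexp(100, 1) * 4

plot(x, y1, type = "l", lty = 2, col = "green",
     main = "A first sample plot", xlab = "x value", ylab = "y value")
lines(x, y2, type = "o", col = "red", pch = 1)
lines(x, y3, type = "p", col = "blue")

# One entry per series; NA suppresses the line or the marker for that entry.
legend_lty <- c(2, 1, NA)   # dashed line, solid line, no line
legend_pch <- c(NA, 1, 1)   # no marker, open circle, open circle
legend("topleft",
       legend = c("normal error", "uniform error", "exponential error"),
       col = c("green", "red", "blue"),
       lty = legend_lty, pch = legend_pch)
```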

4.2 Bar graphs

First, we again need some data to plot.

c1 <- rep(c(1,2,3,4,5), c(10,11,12,10,7))
c2 <- rep(c(0,1,2,4,6), c(5,5,20,8,40))
c3 <- c(12,5,9,7,11,10)
c4 <- c(13,14,8,4,4,13)
c1
#>  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
#> [29] 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5

The barplot command generates a bar plot where the height of each bar corresponds to the value in your input sequence. For example, here c3[2] = 5, so the second bar is 5 units tall.

We can create labels for the different bars, add a title and axis labels, set the vertical scale, and otherwise customize our bar graphs. You should create your own, and play around to practice.

labels <- c("A",":)","C","4th group","another","last")
barplot(c3,main = "Some title", xlab = "data point number", 
        ylab = "data point value", ylim = c(0, 15), 
        names.arg = labels, border="pink",col="blue", 
        width=0.1,legend.text ="some made-up data from c3")

Here are some more graphical parameters.

barplot(c3,main = "Some title", xlab = "data point number", 
        ylab = "data point value", ylim = c(0, 15), 
        names.arg = labels, border="pink", density=10, 
        angle=-45,col="blue",legend.text ="some made-up data")
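barplot also returns the horizontal positions of the bar midpoints, which you can use to place text on the plot. The following is a minimal sketch that writes each value above its bar; the variable name bar_mid is made up for this example.

```r
c3 <- c(12, 5, 9, 7, 11, 10)

# barplot returns the x coordinates of the bar midpoints.
bar_mid <- barplot(c3, ylim = c(0, 15), col = "blue",
                   main = "Bars with value labels")

# pos = 3 places each label just above the given (x, y) position.
text(x = bar_mid, y = c3, labels = c3, pos = 3)
```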

You can change the orientation of the bars to horizontal:

barplot(c3, col="green", horiz=TRUE, xlim=c(-10,50))

Here, two series are plotted on top of each other, with c3 first, in red, and c4 in grey on top of it.

#plot 5
barplot(c3, col="red")
barplot(c4, add=TRUE)

This one plots c4 first, in red, and c3 in grey on top of it.

barplot(c4, col="red")
barplot(c3, add=TRUE)

You might want to plot the two sequences with the bars stacked, so that their heights add up. In this case, bind the two sequences together with rbind. Note that this only works if they have the same number of entries.

barplot(rbind(c3, c4))

The beside parameter allows the bars to be plotted next to each other.

barplot(rbind(c3, c4), beside=TRUE)
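It can help to look at what rbind produces before plotting it. In the sketch below (the name m is chosen for illustration), each row is one series and each column is one bar position, so the column sums are the total heights of the stacked bars.

```r
c3 <- c(12, 5, 9, 7, 11, 10)
c4 <- c(13, 14, 8, 4, 4, 13)

m <- rbind(c3, c4)
m            # a matrix with one row per series
dim(m)       # 2 rows, 6 columns
colSums(m)   # total height of each stacked bar
```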

This is a pretty good looking graph combining some of what you learned before.

barplot(rbind(c3, c4),  main = "Some title", xlab = "some data", 
        ylab = "some values", ylim = c(0, 20), 
        names.arg = labels, col=c("blue","red"),
        legend.text =c("this is c3","this is c4"),beside =TRUE)

4.3 Histograms

In a histogram, the area of each bar is proportional to the count of values in the underlying interval. Compare a barplot and a histogram of the same data:

barplot(c1,main="Barplot of c1")
hist(c1,main="Histogram of c1")
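The barplot of c1 draws one bar per entry, 50 bars in total, because barplot treats every value as a bar height. If you want a bar chart of how often each value occurs, one option is to count the values with table first. This is a small sketch; the name counts is chosen here for illustration.

```r
c1 <- rep(c(1, 2, 3, 4, 5), c(10, 11, 12, 10, 7))

# table counts how often each distinct value occurs.
counts <- table(c1)
counts

# One bar per distinct value, with the count as its height.
barplot(counts, main = "Barplot of the counts in c1",
        xlab = "value", ylab = "count")
```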

You can set the break points yourself, and also decide whether the vertical axis shows the count in each group (frequency) or the density, which is scaled so that the areas of all bars add up to 1.

hist(c2, breaks=c(-0.5,0.5,1.5,2.5,3.5, 4.5, 5.5, 6.5, 7.5))
hist(c2, breaks=c(-0.5,0.5,1.5,2.5,3.5, 4.5, 5.5, 6.5, 7.5), freq=FALSE)
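To see the numbers behind the two versions, you can ask hist to return its results without drawing anything. The sketch below uses plot=FALSE and the same break points as above, written with seq; the names b and h are chosen for this example.

```r
c2 <- rep(c(0, 1, 2, 4, 6), c(5, 5, 20, 8, 40))
b <- seq(-0.5, 7.5, by = 1)   # same break points as above

h <- hist(c2, breaks = b, plot = FALSE)
h$counts     # what is drawn with freq=TRUE
h$density    # what is drawn with freq=FALSE

# Density times bin width, summed over all bins, gives 1.
total_area <- sum(h$density * diff(h$breaks))
total_area
```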

You can choose your colors and also plot several data sets at once. However, you will see that information gets lost.

hist(c1, breaks=c(-0.5,0.5,1.5,2.5,3.5, 4.5, 5.5, 6.5), 
     ylim=c(0,50), col="green")
hist(c2, breaks=c(-1,1,3,5,7), add=TRUE, col="red")

We would like to see both series. To do that, we use the rgb function for colors. rgb takes the red, green, and blue proportions of a color and, as an optional fourth argument, its opacity (alpha). rgb(0,0,1,0.1), for example, gives no red, no green, 100% blue, and 10% opacity. We’d expect to see a very light, mostly transparent blue. Note that rgb colors do not need quotation marks, because rgb is a function call and not a color name.

hist(c1, breaks=c(-0.5,0.5,1.5,2.5,3.5, 4.5, 5.5, 6.5), ylim=c(0,50), xlim=c(-2, 10), 
     col=rgb(0,1,0,0.2))
hist(c2, breaks=c(-1,1,3,5,7), add=TRUE, col=rgb(1,0,0,0.2))
legend('topright',legend=c("c1","c2"), fill=c(rgb(0,1,0,0.2),rgb(1,0,0,0.2)))
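If you are curious what rgb actually returns, the short sketch below prints it. The result is an ordinary color string in hexadecimal notation, where the last two digits encode the fourth argument (alpha). The variable name transparent_green is made up for this example.

```r
transparent_green <- rgb(0, 1, 0, 0.2)
transparent_green                          # "#00FF0033"

# col2rgb converts the string back to red, green, blue, and alpha (0 to 255).
col2rgb(transparent_green, alpha = TRUE)
```

Because the result is just a string, you can store it in a variable and reuse it, for example in both the hist call and the legend.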

As before, we can add legends and labels. Instead of defining break points, we can specify the number of breaks we want. Play around with the number of breaks. You will find that the number is treated as a suggestion only: R will pick what it thinks is “pretty”.

hist(c1, breaks=9, ylim=c(0,50), xlim=c(0,6), 
     col=rgb(0,1,0,0.2), border="green",labels=c("G1","G2","G3","G4","G5","G6","G7","G8"),
     xlab="Groups",ylab="count in group", main="Another histogram")
hist(c2, breaks=6, add=TRUE, col=rgb(1,0,0,0.2), border="red",labels = c("A","B","C","D","E","F"))
hist(c2, breaks=3, add=TRUE, col=rgb(0,0,1,0.2), border="blue",labels = c("1","2","3"))
legend('top',legend=c("c1","c2 - 6 breaks","c2 - 3 breaks"), fill=c(rgb(0,1,0,0.2),rgb(1,0,0,0.2),rgb(0,0,1,0.2)))
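One way to check how R interprets the requested number of breaks is to compute the histogram without plotting it and look at the break points it chose. This is only a sketch; the loop variable n and the requested values 3, 6, and 9 are picked for illustration.

```r
c1 <- rep(c(1, 2, 3, 4, 5), c(10, 11, 12, 10, 7))

for (n in c(3, 6, 9)) {
  h <- hist(c1, breaks = n, plot = FALSE)
  # The number of bins is one less than the number of break points.
  cat("requested:", n,
      " bins used:", length(h$counts),
      " break points:", h$breaks, "\n")
}
```

Knowing the number of bins that R actually uses matters when you pass your own labels, since you need one label per bin.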

You can also add a line to a histogram. Here, we graph the function \(y=\frac{10}{1+x}\) on top of the histogram.

x<-c(0,1,2,3,4,5,6)
y<- 10/(x+1)
hist(c1, breaks=9, ylim=c(0,50), xlim=c(0,6), 
     col=rgb(0,1,0,0.2), border="green",labels=c("G1","G2","G3","G4","G5","G6","G7","G8"),
     xlab="Groups",ylab="count in group", main="A histogram with a line added")
lines(x=x, y=y, col="red")

CAUTION CAUTION CAUTION CAUTION CAUTION

In a histogram, it is the area of a bar, not its height, that corresponds to the frequency. You can see that in this example:

x<- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
hist(x, breaks=c(-1,0,1,2,3),freq=FALSE, main="Equal frequency in each group, same bin widths")
hist(x, breaks=c(-1,1,2,3),freq=FALSE,main="Equal frequency in each group, different bin widths")
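The numbers behind the second plot make the point explicit. In this sketch (the names h, widths, and areas are chosen for illustration), both groups have the same count, the wider bin gets half the height, and the areas come out equal.

```r
x <- c(rep(1, 10), rep(2, 10))

h <- hist(x, breaks = c(-1, 1, 2, 3), plot = FALSE)
widths <- diff(h$breaks)      # 2, 1, 1
areas <- h$density * widths   # height times width of each bar

h$counts    # 10, 10, 0
h$density   # 0.25, 0.50, 0.00
areas       # 0.5, 0.5, 0.0
```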

4.4 Pie chart

Let’s say you have the numbers 10, 20, 30, and 50. R will interpret this data as having a total of 10+20+30+50 = 110, and plot the pie with slices of sizes 10/110, 20/110, 30/110, and 50/110. The default labels are 1, 2, 3, 4, meaning groups 1 through 4. The pie chart will look like this:

data <- c(10,20,30,50)
pie(data)

You should only use pie charts if you want to visualize how a total is split up into groups. Here is a better example showing results of a German election:

parties <- c("SPD","CDU","Green","FDP","AFD","Left","Other")
votes <- c(17,27,15,7,21,4,9)
mylabels <- c("SPD,17%","CDU,27%","Green,15%","FDP,7%","AFD,21%","Left,4%","Other,9%")
pie(votes,labels=mylabels,col=c("red","black","green","yellow","skyblue","purple","grey"),main="Election results 2023")
pie(votes,labels=mylabels,col=c("red","black","green","yellow","skyblue","purple","grey"),density=20, angle=c(30, 60, 90, 120, 150, 180),main="Election results 2023")
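Typing the labels by hand is error-prone, since they can get out of sync with the data. As a sketch of an alternative, the labels can be built from parties and votes with paste0. This assumes, as in the data above, that the votes are already percentages that add up to 100.

```r
parties <- c("SPD", "CDU", "Green", "FDP", "AFD", "Left", "Other")
votes <- c(17, 27, 15, 7, 21, 4, 9)

# Build labels such as "SPD,17%" directly from the data.
mylabels <- paste0(parties, ",", votes, "%")
mylabels

pie(votes, labels = mylabels,
    col = c("red", "black", "green", "yellow", "skyblue", "purple", "grey"),
    main = "Election results 2023")
```

If your data were raw counts instead of percentages, you could compute the shares first, for example with round(100 * votes / sum(votes)).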

However, a bar chart would be better.

parties <- c("SPD","CDU","Green","FDP","AFD","Left","Other")
votes <- c(17,27,15,7,21,4,9)
mylabels <- c("SPD 17%","CDU 27%","Green 15%","FDP 7%","AFD 21%","Left 4%","Other 9%")
barplot(votes,names.arg=mylabels,col=c("red","black","green","yellow","skyblue","purple","grey"),main="Election results 2023",ylim=c(0,30), ylab="% of total votes",las=2 )

4.5 Assignment

  1. Below are two data sets. Decide which plot(s) would be appropriate, and create the perfect graph or graphs.
y1 <- rnorm(1000, 3, 2)
y2 <- rnorm(1200, -1, 1)
  2. Reproduce this graph. The data:
x <- seq(-10,10,0.1)
y <- sin(abs(x))
z <- cos(abs(x))
  3. Reproduce this graph. Same data as in 2).