Practice 4 Bar Plots with R

4.1 Directions


In this practice exercise, you will load data into R, and create a bar plot. Watch the VoiceThread and then work through the practice.

4.2 A closer look at the code


In this practice, we will be looking into a few slightly more advanced commands. We will use the mtcars data set to calculate average miles per gallon by the number of cylinders. Then we will make a bar plot of the averages.

4.2.1 Let’s make a simple bar plot

We are going to work with the mtcars dataset to create a nice-looking bar plot. The code below is all that is needed to make a simple bar plot; run it to see the plot.

To make this plot, we need to first create a table,

table1 <- table(mtcars$cyl)

then we use the barplot() command to draw the bar plot,

barplot(table1)

Run the code below to see the R output.

# Load the data
data("mtcars")

# Bar plot cyl count in base R
table1 <- table(mtcars$cyl)
barplot(table1)
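To see what barplot() is actually drawing, it can help to print the table first. The short sketch below assumes only the built-in mtcars data; the name cyl.counts is chosen here for illustration.

```r
# Count how many cars have each number of cylinders
cyl.counts <- table(mtcars$cyl)

# Printing the table shows one count per cylinder value
print(cyl.counts)

# The names of the table become the bar labels,
# and the counts become the bar heights
names(cyl.counts)
as.vector(cyl.counts)
```

In mtcars there are 11 four-cylinder, 7 six-cylinder, and 14 eight-cylinder cars, so the tallest bar is the one for 8 cylinders.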

4.2.2 Let’s add a title and label the x-axis

The plot we just made is OK, but unless you know that this is the number of cars in the data set with 4, 6, and 8 cylinders, respectively, it does not tell you very much. So let’s add a title and label the x-axis.

To add a title, we add the argument main to the barplot() command. main takes a string that will be used as the title of the plot (Note: a string is text enclosed in quotation marks).

barplot(table1,main="Car Frequency by Number of Cylinders")

Next, let's add a label to the x-axis. This is done with the xlab argument.

barplot(table1,main="Car Frequency by Number of Cylinders",xlab="Number of Cylinders")

Run the code below to see the R output.

# Load the data
data("mtcars")

# Bar plot cyl count in base R
table1 <- table(mtcars$cyl)
barplot(table1,main="Car Frequency by Number of Cylinders",xlab="Number of Cylinders")
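The y-axis can be labeled the same way with the ylab argument, which appears again later in this practice. A minimal sketch, where the label text "Number of Cars" is only a suggestion:

```r
# Count cars by number of cylinders
table1 <- table(mtcars$cyl)

# Title plus labels for both axes
barplot(table1,
        main = "Car Frequency by Number of Cylinders",
        xlab = "Number of Cylinders",
        ylab = "Number of Cars")
```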

4.2.3 Let’s calculate average miles per gallon by the number of cylinders

A frequency plot of cars by the number of cylinders is somewhat interesting. Still, it doesn’t tell us much about the relationship between variables in the data set. Let’s see how the number of cylinders impacts miles per gallon by calculating average miles per gallon for cars with 4, 6 and 8 cylinders.

To do this, we need the aggregate() command. The aggregate() command combines observations by group using some function, such as mean. So we can find the average mpg for cars with 4, 6 and 8 cylinders by aggregating by mtcars$cyl.

aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)

The aggregate() command has three arguments we need to be concerned about:

  1. x which is the data to aggregate.
  2. by which is the variable indicating the groups. We use the list command because the by argument requires a list.
  3. FUN which is the aggregation function, in this case mean.

Run the code below to see the R output. Notice that the output does not look like the output of the table() command: aggregate() returns a data frame, which we store in mpg.avg.

# Load the data
data("mtcars")

# Calculate avg mpg by number of cylinders
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)

# Display average by number of cylinders
mpg.avg
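A quick way to confirm what aggregate() returned is to inspect the result. The sketch below rebuilds mpg.avg and checks its class and column names; because the by list was not given names, R falls back to default column names.

```r
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# aggregate() returns a data frame, not a table
class(mpg.avg)

# Default column names: Group.1 holds the groups, x holds the averages
names(mpg.avg)

# One row per group: 4, 6, and 8 cylinders
nrow(mpg.avg)
```

With default names like these, selecting columns by position is a convenient choice: mpg.avg[,1] is the number of cylinders and mpg.avg[,2] is the average mpg.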

4.2.4 Let’s make a bar plot of average miles per gallon by the number of cylinders

Now let’s use what we have learned to make a super cool bar plot that shows how mpg relates to the number of cylinders in an engine.

mpg.avg is not a table, so we need to tell R which variable should be used to set the height of each bar and which variable should be used to label the bars. We do this by passing barplot() the arguments height and names.arg. The height of the bars should be the average mpg,

height = mpg.avg[,2]

and the names should be the number of cylinders,

names.arg = mpg.avg[,1]

Put it all together and we have,

barplot(height = mpg.avg[,2], names.arg = mpg.avg[,1])

# Load the data
data("mtcars")

# Calculate avg mpg by number of cylinders
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)

# Plot average mpg by number of cylinders
barplot(height = mpg.avg[,2], names.arg = mpg.avg[,1])
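Selecting columns by position works, but names can be easier to read. As an optional variation, the sketch below renames the two columns on a copy and then refers to them with $. The copy mpg.named and the column names cyl and mpg are chosen here for illustration, so mpg.avg itself is left unchanged.

```r
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# Work on a copy so that mpg.avg keeps its default column names
mpg.named <- mpg.avg
names(mpg.named) <- c("cyl", "mpg")

# Same plot as before, with columns selected by name
barplot(height = mpg.named$mpg, names.arg = mpg.named$cyl)
```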

4.2.5 Let’s make the bar plot better

Let’s fix the y-axis. We can do this by setting the ylim argument. ylim is a vector (an ordered set of numbers) with the first number being the lower limit for the y-axis and the second number being the upper limit. The combine function, c(), is used to build it. Let’s set the lower and upper limits of the y-axis to 0 and 30, respectively.

ylim = c(0,30)
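Hard-coding 30 works for this data, but the upper limit can also be derived from the bar heights. A minimal sketch, assuming we want roughly 20% of headroom above the tallest bar (the 1.2 factor and the name upper.limit are arbitrary choices for illustration):

```r
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# Tallest bar plus some headroom for data labels
upper.limit <- max(mpg.avg[,2]) * 1.2

barplot(height = mpg.avg[,2],
        names.arg = mpg.avg[,1],
        ylim = c(0, upper.limit))
```

Computing the limit this way keeps the plot readable if the same code is reused for a variable on a different scale.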

The final feature we can add to this plot is data labels. We do this with the text() command. The first step is to store the value returned by barplot(), which holds the x-coordinate of each bar's midpoint, in an object,

mpg.avg.barplot <- barplot(height = mpg.avg[,2], ...)

Next we need to pass the plot, the data labels, and where we want the labels to appear to the text() command. The three arguments are:

  1. x in which we pass the object returned by barplot(), i.e. mpg.avg.barplot, so each label is centered on its bar.
  2. y in which we pass how high on the plot we want the labels. We can use the height of each bar, i.e. mpg.avg[,2], plus a little to put the labels above the bars.
  3. labels in which we pass the label for each bar.

The labels should be the average miles per gallon by the number of cylinders, but there are two problems: these are numbers, not text, and they have a lot of digits past the decimal point. To solve this we use round(mpg.avg[,2],2) to round to the hundredths place and as.character() to make R treat these numbers as text. Because text() draws directly on the current plot, we call it without storing its result. So put it all together and we have

text(x=mpg.avg.barplot, y=mpg.avg[,2]+2, labels=as.character(round(mpg.avg[,2],2)))

WARNING! Keeping the parentheses paired correctly in a compound statement like this can be difficult.
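One way to sidestep the parenthesis problem is to build the pieces in separate steps. The sketch below produces the same labels as the compound statement; the intermediate names bar.heights, bar.mids, rounded, and bar.labels are introduced here for illustration.

```r
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)
bar.heights <- mpg.avg[,2]

# barplot() returns the x-coordinate of each bar's midpoint
bar.mids <- barplot(height = bar.heights,
                    names.arg = mpg.avg[,1],
                    ylim = c(0, 30))

# Step 1: round the averages to two decimal places
rounded <- round(bar.heights, 2)

# Step 2: convert the rounded numbers to text
bar.labels <- as.character(rounded)

# Step 3: draw the labels 2 units above the top of each bar
text(x = bar.mids, y = bar.heights + 2, labels = bar.labels)
```

Each line now has at most one pair of nested parentheses, which makes a mismatched pair much easier to spot.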

Now add in all the arguments we have discussed previously, and you get a pretty cool bar chart. Run the code below to see the R output.

# Load the data
data("mtcars")

# Calculate avg mpg by number of cylinders
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)

# Plot the data
mpg.avg.barplot <- barplot(height = mpg.avg[,2], 
                      names.arg = mpg.avg[,1],
                      main="Average MPG by Number of Cylinders",
                      xlab="Number of Cylinders",
                      ylab="Average MPG",
                      ylim = c(0,30))

# Add data labels above the bars (text() draws on the current plot)
text(x=mpg.avg.barplot, 
     y=mpg.avg[,2]+2, 
     labels=as.character(round(mpg.avg[,2],2)))

4.3 R code used in the VoiceThread


# Load the data
data("mtcars")

data <- mtcars
data$cyl <- as.factor(data$cyl)

# Bar plot in base R
barplot( table(data$cyl) )

# Calculate avg mpg by number of cylinders
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)

# Plot the data
mpg.avg.barplot <- barplot(height = mpg.avg[,2], 
                      names.arg = mpg.avg[,1],
                      main="Average MPG by Number of Cylinders",
                      xlab="Number of Cylinders",
                      ylab="Average MPG",
                      ylim = c(0,30))

# Add data labels above the bars (text() draws on the current plot)
text(x=mpg.avg.barplot, 
     y=mpg.avg[,2]+2, 
     labels=as.character(round(mpg.avg[,2],2)))

4.4 Now you try


Use R to complete the following activities (this is just for practice; you do not need to turn anything in).

  1. Use the mtcars dataset to calculate the average horsepower, hp, by number of cylinders.
  2. Make a bar plot of the average horsepower by the number of cylinders.
  3. Add data labels to each bar in the plot.

R Code Window

# The data has already been loaded

# Calculate avg hp by number of cylinders


# Plot the data


# Add data labels to plot


Solution

# Load data
data("mtcars")

# Calculate avg hp by number of cylinders
hp.avg <- aggregate(x=mtcars$hp, by=list(mtcars$cyl), FUN=mean)

# Plot the data
hp.plot <- barplot(height = hp.avg[,2], 
                   names.arg = hp.avg[,1],
                   ylim = c(0,300))

# Add data labels to plot (text() draws on the current plot)
text(x=hp.plot, 
     y=hp.avg[,2]+10, 
     labels=as.character(round(hp.avg[,2],2)))