Practice 4 Bar Plots with R
4.1 Directions
In this practice exercise, you will load data into R, and create a bar plot. Watch the VoiceThread and then work through the practice.
4.2 A closer look at the code
In this practice, we will be looking into a few slightly more advanced commands. We will use the mtcars data set to calculate average miles per gallon by the number of cylinders. Then we will make a bar plot of the averages.
4.2.1 Let’s make a simple bar plot
We are going to be working with the mtcars dataset to create a nice looking bar plot. In the code window below, I have the code necessary to make a simple bar plot. Press the Run button to see the plot.
To make this plot, we need to first create a table,
table1 <- table(mtcars$cyl)
then we use the barplot() command to draw the bar plot,
barplot(table1)
Click the Run button below the code window to see the R output.
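If you want to see what barplot() is actually drawing, it can help to print the table first. The short sketch below only uses the built-in mtcars data and the same table1 object as above; the comments describe what I would expect to see.

```r
# Count how many cars have each number of cylinders
table1 <- table(mtcars$cyl)

# Print the counts: 11 cars with 4 cylinders, 7 with 6, and 14 with 8
print(table1)

# The names of the table ("4", "6", "8") become the bar labels
names(table1)

# barplot() draws one bar per entry, with bar height equal to the count
barplot(table1)
```

Each bar's height is simply the count stored in the table, which is why a table can be passed to barplot() directly.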
4.2.2 Let’s add a title and label the x-axis
The plot we just made is OK, but unless you know that this is the number of cars in the data set with 4, 6, and 8 cylinders, respectively, it does not tell you very much. So let’s add a title and label the x-axis.
To add a title, we add the argument main to the barplot() command. main holds a string that will be used as the title of the plot (Note: a string is a piece of text enclosed in quotation marks).
barplot(table1,main="Car Frequency by Number of Cylinders")
Next, let’s add a label to the x-axis. This is done with the xlab argument.
barplot(table1,main="Car Frequency by Number of Cylinders",xlab="Number of Cylinders")
Click the Run button below the code window to see the R output.
4.2.3 Let’s calculate average miles per gallon by the number of cylinders
A frequency plot of cars by the number of cylinders is somewhat interesting. Still, it doesn’t tell us much about the relationship between variables in the data set. Let’s see how the number of cylinders impacts miles per gallon by calculating average miles per gallon for cars with 4, 6 and 8 cylinders.
To do this, we need the aggregate() command. The aggregate() command combines observations by group using some function, such as mean. So we can find the average mpg for cars with 4, 6 and 8 cylinders by aggregating by mtcars$cyl.
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)
The aggregate() command has three arguments we need to be concerned about:
- x, which is the data to aggregate.
- by, which is the variable indicating the groups. We use the list() command because the by argument requires a list.
- FUN, which is the aggregation function, in this case mean.
Click the Run button below the code window to see the R output. Notice that the output does not look like the output of the table() command. For now, think of mpg.avg as a new data frame.
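To make the "new data frame" idea concrete, here is a small sketch that inspects the result of aggregate(). It assumes the result is stored in mpg.avg, as in the rest of this practice. When x is a plain vector and the by list is unnamed, R picks the default column names shown in the comments.

```r
# Average mpg for each number of cylinders
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# The result is a data frame, not a table
class(mpg.avg)    # "data.frame"

# Default column names: the grouping variable, then the aggregated values
names(mpg.avg)    # "Group.1" "x"

# Column 1 holds the groups (4, 6, 8); column 2 holds the average mpg
mpg.avg[, 1]
mpg.avg[, 2]
```

This is why the later code refers to the columns by position, mpg.avg[,1] and mpg.avg[,2], rather than by name.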
4.2.4 Let’s make a bar plot of average miles per gallon by the number of cylinders
Now let’s use what we have learned to make a super cool bar plot that shows how mpg relates to the number of cylinders in an engine.
mpg.avg is not a table, so we need to tell R which variable should be used to set the height of each bar and which variable should be used to label the bars. We do this by passing barplot() the arguments height and names.arg. The height of the bars should be the average mpg,
height = mpg.avg[,2]
and the names should be the number of cylinders,
names.arg = mpg.avg[,1]
Put it all together and we have,
barplot(height = mpg.avg[,2], names.arg = mpg.avg[,1])
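As an optional aside, selecting columns by position can be hard to read later. One possible alternative, sketched below, is to give the columns descriptive names first. The names cyl and mpg are my own hypothetical choices and are not used anywhere else in this practice.

```r
# Average mpg for each number of cylinders
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# Rename the default columns (hypothetical names chosen for readability)
names(mpg.avg) <- c("cyl", "mpg")

# Same plot as before, but the columns are now selected by name
barplot(height = mpg.avg$mpg, names.arg = mpg.avg$cyl)
```

Both versions draw the same plot; the positional form mpg.avg[,2] is what the rest of this practice uses.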
4.2.5 Let’s make the bar plot better
Let’s fix the y-axis. We can do this by setting the ylim argument. ylim takes a vector (a sequence of numbers), with the first number being the lower limit for the y-axis and the second number being the upper limit. The combine function, c(), is used to build this vector. Let’s set the lower and upper limits of the y-axis to 0 and 30, respectively.
ylim = c(0,30)
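Why 30? A quick check, sketched below, is to compare the tallest bar with the upper limit. The object name y.limits is a hypothetical name used only for this illustration.

```r
# Average mpg for each number of cylinders
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# Lower and upper limits for the y-axis (hypothetical object name)
y.limits <- c(0, 30)

# The tallest bar is a little under 27, so an upper limit of 30
# leaves some empty space above every bar
max(mpg.avg[, 2])

# Draw the plot with the y-axis running from 0 to 30
barplot(height = mpg.avg[, 2], names.arg = mpg.avg[, 1], ylim = y.limits)
```

Leaving space above the tallest bar matters once data labels are placed above the bars.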
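The final feature we can add to this plot is data labels. We do this with the text() command. The first step is to store the output of barplot() in an object. Besides drawing the plot, barplot() returns the x-coordinate of the middle of each bar, which is where the labels will go,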
mpg.avg.barplot <- barplot(height = mpg.avg[,2], ...)
Next we need to pass the plot, the data labels, and where we want the labels to appear to the text() command.
The three arguments are:
- x, in which we pass the plot object, i.e. mpg.avg.barplot.
- y, in which we pass how high on the plot we want the labels. We can use the height of each bar, i.e. mpg.avg[,2], plus a little to put the labels above the bars.
- labels, in which we pass the label for each bar.
The labels should be the average miles per gallon by the number of cylinders, but there are two problems: these are numbers, not text, and they have a lot of digits past the decimal point. To solve this we use round(mpg.avg[,2],2) to round to the hundredths place and as.character() to make R treat these numbers as text. So put it all together and we have
text(x=mpg.avg.barplot, y=mpg.avg[,2]+2, labels=as.character(round(mpg.avg[,2],2)))
WARNING! Keeping the parentheses paired correctly in a compound statement like this can be difficult.
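If the nested parentheses are giving you trouble, one option is to build each piece in its own step. The sketch below should produce the same labels as the compound statement; the intermediate names bar.mids, bar.labels, and label.y are hypothetical and used only here.

```r
# Average mpg for each number of cylinders
mpg.avg <- aggregate(x = mtcars$mpg, by = list(mtcars$cyl), FUN = mean)

# Draw the plot and keep the x-coordinate of the middle of each bar
bar.mids <- barplot(height = mpg.avg[, 2], names.arg = mpg.avg[, 1],
                    ylim = c(0, 30))

# Step 1: round the averages to two decimal places
rounded <- round(mpg.avg[, 2], 2)

# Step 2: convert the rounded numbers to text
bar.labels <- as.character(rounded)

# Step 3: place each label 2 units above the top of its bar
label.y <- mpg.avg[, 2] + 2

# Step 4: draw the labels on the existing plot
text(x = bar.mids, y = label.y, labels = bar.labels)
```

Each line now has at most one pair of nested parentheses, which makes a missing or extra parenthesis much easier to spot.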
Now add in all the arguments we have discussed previously, and you get a pretty cool bar chart. Click the Run button below the code window to see the R output.
4.3 R code used in the VoiceThread
# Load the data
data("mtcars")
mtcars
# Copy the data and treat the number of cylinders as a category
data <- mtcars
data$cyl <- as.factor(data$cyl)
data
# Bar plot in base R
barplot( table(data$cyl) )
# Calculate avg mpg by number of cylinders
mpg.avg <- aggregate(x=mtcars$mpg, by=list(mtcars$cyl), FUN=mean)
# Plot the data and store the bar midpoints
mpg.avg.barplot <- barplot(height = mpg.avg[,2],
  names.arg = mpg.avg[,1],
  main="Average MPG by Number of Cylinders",
  xlab="Number of Cylinders",
  ylab="Average MPG",
  ylim = c(0,30))
# Add data labels above the bars
text(x=mpg.avg.barplot,
  y=mpg.avg[,2]+2,
  labels=as.character(round(mpg.avg[,2],2)))
4.4 Now you try
Use R to complete the following activities (this is just for practice; you do not need to turn anything in).
- Use the mtcars dataset to calculate the average horsepower, hp, by number of cylinders.
- Make a bar plot of the average horsepower by the number of cylinders.
- Add data labels to each bar in the plot.