11.1 Bar Graph

Let’s continue to use our graph from diamonds but replace geom_point() with geom_bar(). Here, we are graphing the average (mean) price of the diamonds for each clarity category (x-axis), with the bars color-coded by cut.

diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price)) %>% 
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar(stat = "identity") 

In geom_bar(), the default dependent measure is the count (i.e., stat = "count" by default). In the above example, we’ve overridden the default count value by specifying stat = "identity". This indicates that R should use the y-value given in the ggplot() function. Notice that bar graphs use the fill argument instead of the color argument to color-code each cut category.
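As an aside, ggplot2 also provides geom_col(), which behaves like geom_bar(stat = "identity"). The sketch below builds the same graph both ways. It assumes dplyr 1.0 or later (for the .groups argument), and the object names mean_prices, p_bar, and p_col are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# mean price for each clarity/cut combination
mean_prices <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop")

# version 1: geom_bar() told to use the y-values as given
p_bar <- ggplot(mean_prices, aes(x = clarity, y = m, fill = cut)) +
  geom_bar(stat = "identity")

# version 2: geom_col() uses the y-values as given by default
p_col <- ggplot(mean_prices, aes(x = clarity, y = m, fill = cut)) +
  geom_col()

p_col
```

Either version should draw the same stacked bars, so which one to use is mostly a matter of preference.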

If we execute this same code without stat = "identity", this will result in an error:

# produces error due to unnecessary y variable
diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price)) %>% 
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar() # removed stat = "identity" from geom_bar()
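Note that the error only appears when the graph is actually drawn, not when the code that defines it is run. A minimal sketch of this, assuming dplyr 1.0 or later; the names p and result are made up for illustration, and the exact wording of the message depends on the installed ggplot2 version:

```r
library(dplyr)
library(ggplot2)

# defining the plot does not raise an error yet
p <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop") %>%
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar() # stat = "count" combined with a y variable

# ggplot_build() runs the computations that printing the plot would trigger;
# tryCatch() captures the error instead of stopping the script
result <- tryCatch(ggplot_build(p), error = function(e) e)

inherits(result, "error") # TRUE if building the plot failed
conditionMessage(result)  # the error message text
```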

By default, R assumes that the stat argument of geom_bar() is set to "count". With stat = "count", R counts how many observations (read: rows of data) there are for each clarity (x-variable) in the diamonds dataset. Since R is doing the counting for us, we do not need to include a y-variable (dependent variable) in ggplot():

# graphing the frequency/count of the clarity categories
diamonds %>% 
  group_by(clarity, cut) %>% 
  ggplot(aes(x = clarity, group = cut, fill = cut)) + # removed y argument and value
  geom_bar()
# is the same as this:
diamonds %>% 
  group_by(clarity, cut) %>% 
  ggplot(aes(x = clarity, group = cut, fill = cut)) + 
  geom_bar(stat = "count") # stat = "count" is implied in the first example

In summary: we need to be mindful of the value we assign to the stat argument within the geom_bar() function. If it is stat = "identity", we are asking R to use the y-value we provide for the dependent variable. If we specify stat = "count" or leave geom_bar() blank, R will count the number of observations based on the x-variable groupings.
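One way to see how the two settings relate is to do the counting ourselves and then hand the result to geom_bar() with stat = "identity". This is only a sketch: count() is the dplyr function that tallies rows per group, and the object names counts, p_identity, and p_count are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# count the rows for each clarity/cut combination by hand
counts <- diamonds %>%
  count(clarity, cut) # stores the tallies in a column named n

# stat = "identity": use the pre-computed n as the y-value
p_identity <- ggplot(counts, aes(x = clarity, y = n, group = cut, fill = cut)) +
  geom_bar(stat = "identity")

# stat = "count" (the default): let geom_bar() do the counting
p_count <- ggplot(diamonds, aes(x = clarity, group = cut, fill = cut)) +
  geom_bar()

p_identity
```

The two plots should show the same bar heights, with one difference: the y-axis is labelled n in the first and count in the second.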

11.1.1 Exercises

  1. In the above code, replace fill = cut with color = cut. What happened?
diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price)) %>% 
  ggplot(aes(x = clarity, y = m, group = cut, color = cut)) + # changed fill = cut to color = cut
  geom_bar(stat = "identity") 
  2. Set geom_bar()’s position argument equal to "dodge" using the code below. What do you see?
diamonds %>% 
  group_by(clarity, cut) %>% 
  ggplot(aes(x = clarity, group = cut, fill = cut)) +
  geom_bar(position = "dodge")
  • You can also choose to specify how far the bars dodge each other with position = position_dodge():
# Try editing the number within the position_dodge() function:
# notice how the bars overlap at a 0.5 value
diamonds %>% 
  group_by(clarity, cut) %>% 
  ggplot(aes(x = clarity, group = cut, fill = cut)) +
  geom_bar(position = position_dodge(0.5)) # manually determining the dodge distance
  3. Plot clarity (x-axis) vs. price (y-axis)
# this calculates the total cost of all diamonds within each clarity category
# for example: "the cumulative cost of all diamonds with an I1 clarity is $2,907,809"
diamonds %>% 
  ggplot(aes(x = clarity, y = price)) + 
  geom_bar(stat = "identity") 
  
# to calculate the average price of diamonds per each clarity:
diamonds %>% 
  group_by(clarity) %>% 
  summarize(m = mean(price)) %>% # defined m as the mean price within each clarity category
  ggplot(aes(x = clarity, y = m)) +
  geom_bar(stat = "identity") +
  labs(y = "Mean Price of Diamonds",
       x = "Clarity Category")
  4. Adding error bars and facet wraps

    • Notice how the error bars (standard deviation) overlay the bars themselves. Recall from the line graph chapter that the order of graphing elements matters!
    • Try repositioning geom_errorbar() above geom_bar() to see what happens!
diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price),
            s = sd(price)) %>% # standard deviation
  ggplot(aes(x = clarity, y = m, group = cut, color = cut, fill = cut)) + # what happens when you add both color and fill arguments? (hint: remove one argument at a time to see the difference)
  geom_bar(stat = "identity")+
  geom_errorbar(aes(ymin = m-s, ymax = m+s)) +
  facet_wrap(~cut)
  5. Using the code from the previous example, rename the x- and y-axes using labs()

  6. Execute ?geom_bar() to check out some of the arguments that can be used for this geom element.

diamonds %>% 
  group_by(clarity) %>% 
  summarize(m = mean(price)) %>% # defined m as the mean price within each clarity category
  ggplot(aes(x = clarity, y = m)) +
  geom_bar(stat = "identity",
           show.legend = FALSE, # altering the show.legend argument
           color = "red",
           size = 1, # when we change the size, what are we really changing? (size of what?) Note: ggplot2 3.4.0 and later prefer the name linewidth for this argument
           alpha = .5) # what does changing the value of alpha do? 
  7. Using techniques we’ve learned from the line graph chapter, recreate the graph below:

    • Use group_by() for clarity and cut
    • Re-order the x-axis values of clarity using mutate() and factor()
    • Calculate the mean and standard error (using the standard error function) for price within summarize
    • Set the independent variable to clarity and dependent variable to mean price
    • Set the group, color, and fill to equal cut
    • Change the size of the error bars to 0.2 and the width to 0.5 inside geom_errorbar()
    • Change the y-axis tick intervals using the limits and breaks arguments within scale_y_continuous() to match the graph below (hint: the values for those arguments should include c(-numbers in here-))
    • Change the x, y, and title labels using labs()
    • Set the theme to theme_classic()
    • Remove the legend title from the graph using theme() and element_blank() (note that this line must go after theme_classic())
    • Center the plot title using theme(), element_text(), and hjust
    • Set the position_dodge() to 0.9 for both geom_errorbar() and geom_bar()
Figure 11.1: Product of the changes listed above.
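
The last exercise calls for a standard error function, which base R does not supply. If one is not already defined in your session, the sketch below shows one possibility. The helper name se() and the object name price_summary are hypothetical, and the definition (standard deviation divided by the square root of the sample size) is an assumption; the function used in the line graph chapter may differ.

```r
library(dplyr)
library(ggplot2)

# hypothetical helper: standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# mean and standard error of price for each clarity/cut combination
price_summary <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price),
            s = se(price),
            .groups = "drop")

head(price_summary)
```

The m and s columns can then be used for y, ymin, and ymax in the same way as the standard deviation was in the error bar exercise above.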