11.1 Bar Graph
Let’s continue to use our graph from diamonds but replace geom_point() with geom_bar(). Here, we are graphing the average (mean) price of the diamonds for each clarity category, with the bars filled by cut.
diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price)) %>%
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar(stat = "identity")
In geom_bar(), the default dependent measure is the count (i.e., stat = "count" by default). In the above example, we’ve overridden the default by specifying stat = "identity". This tells R to use the y-value supplied in the ggplot() function. Notice that bar graphs use the fill argument instead of the color argument to color-code each cut category.
If we execute this same code without stat = "identity", it will result in an error:
# produces an error because stat = "count" does not accept a y variable
diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price)) %>%
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar() # removed stat = "identity" from geom_bar()
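The error is raised when the plot is built for drawing, not when the plot object is created. The following is a minimal sketch, assuming the dplyr (1.0.0 or later, for the .groups argument) and ggplot2 packages are installed, that captures the message with tryCatch() instead of halting the script. The names p and msg are illustrative only, and the exact wording of the message depends on the installed ggplot2 version.

```r
library(dplyr)
library(ggplot2)

# Create the plot object; nothing is drawn yet, so no error occurs here
p <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop") %>%
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar() # stat = "count" by default, yet a y aesthetic is supplied

# ggplot_build() runs the stat computation, which is where the error is raised
msg <- tryCatch({
  ggplot_build(p)
  "no error"
}, error = function(e) conditionMessage(e))

msg # the captured error message
```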
By default, R will assume that the stat argument of geom_bar() is set to "count". When stat is "count", R counts how many observations (read: rows of data) there are for each clarity (x-variable) in the diamonds dataset. Since R is counting how many diamonds there are for each clarity with stat = "count", we do not need to include a y-variable (dependent variable) in ggplot():
# graphing the frequency/count of the clarity categories
diamonds %>%
  group_by(clarity, cut) %>%
  ggplot(aes(x = clarity, group = cut, fill = cut)) + # removed y argument and value
  geom_bar()
# is the same as this:
diamonds %>%
  group_by(clarity, cut) %>%
  ggplot(aes(x = clarity, group = cut, fill = cut)) +
  geom_bar(stat = "count") # stat = "count" is implied in the first example
In summary: we need to be mindful of the value we assign to the stat argument within the geom_bar() function. If it is stat = "identity", we are asking R to use the y-value we provide for the dependent variable. If we specify stat = "count" or leave geom_bar() blank, R will count the number of observations based on the x-variable groupings.
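One way to see how the two settings relate is to count the rows ourselves and then plot those counts with stat = "identity". The sketch below assumes dplyr and ggplot2 are loaded; the object names p_count, clarity_counts, and p_identity are illustrative. It uses layer_data() to pull out the bar heights that each plot would draw.

```r
library(dplyr)
library(ggplot2)

# Option 1: let geom_bar() count the rows (stat = "count" is the default)
p_count <- ggplot(diamonds, aes(x = clarity)) +
  geom_bar()

# Option 2: count the rows ourselves, then plot the result as-is
clarity_counts <- diamonds %>%
  count(clarity) # one row per clarity level; the count is stored in column n

p_identity <- ggplot(clarity_counts, aes(x = clarity, y = n)) +
  geom_bar(stat = "identity")

# Bar heights computed for each plot; the two vectors should match
layer_data(p_count)$y
layer_data(p_identity)$y
```

ggplot2 also provides geom_col(), which behaves like geom_bar(stat = "identity") and can be used when the y-values have already been computed.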
11.1.1 Exercises
- In the above code, replace fill = cut with color = cut. What happened?
diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price)) %>%
  ggplot(aes(x = clarity, y = m, group = cut, color = cut)) + # changed fill = cut to color = cut
  geom_bar(stat = "identity")
- Set geom_bar()’s position argument equal to "dodge" using the code below. What do you see?
diamonds %>%
  group_by(clarity, cut) %>%
  ggplot(aes(x = clarity, group = cut, fill = cut)) +
  geom_bar(position = "dodge")
- You can also choose to specify how far the bars dodge each other with position = position_dodge():
# Try editing the number within the position_dodge() function:
# notice how the bars overlap at a 0.5 value
diamonds %>%
  group_by(clarity, cut) %>%
  ggplot(aes(x = clarity, group = cut, fill = cut)) +
  geom_bar(position = position_dodge(0.5)) # manually determining the dodge distance
- Plot clarity (x-axis) vs. price (y-axis)
# this calculates the total cost of all diamonds within each clarity category
# for example: "the cumulative cost of all diamonds with an I1 clarity is $2,907,809"
diamonds %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_bar(stat = "identity")
# to calculate the average price of diamonds for each clarity:
diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price)) %>% # m is the mean price within each clarity category
  ggplot(aes(x = clarity, y = m)) +
  geom_bar(stat = "identity") +
  labs(y = "Mean Price of Diamonds",
       x = "Clarity Category")
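To check what each of the two plots above is drawing, the totals and means can be computed directly. This is a small sketch, assuming dplyr and ggplot2 are loaded (ggplot2 supplies the diamonds dataset); the name price_summary and its column names are illustrative.

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# Total and mean price per clarity category, side by side
price_summary <- diamonds %>%
  group_by(clarity) %>%
  summarize(
    n     = n(),         # number of diamonds in the category
    total = sum(price),  # overall bar height when every row is stacked (first plot)
    m     = mean(price)  # bar height after summarizing (second plot)
  )

price_summary
```

Because the first plot stacks one segment per diamond, its bars grow with the number of diamonds in each category as well as with their prices, whereas the second plot shows the mean only.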
Adding Error bars and facet wraps
- Notice how the error bars (standard deviation) overlay the bars themselves. Recall from the line graph chapter that the order of graphing elements matters!
- Try repositioning geom_errorbar() above geom_bar() to see what happens!
diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price),
            s = sd(price)) %>% # standard deviation
  # what happens when you add both color and fill arguments?
  # (hint: remove one argument at a time to see the difference)
  ggplot(aes(x = clarity, y = m, group = cut, color = cut, fill = cut)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = m - s, ymax = m + s)) +
  facet_wrap(~cut)
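The error bars above show one standard deviation on either side of the mean. A variant that shows the standard error of the mean instead can be sketched as follows. The helper se() is a hypothetical function written for this illustration (standard deviation divided by the square root of the sample size), not a function from ggplot2 or dplyr, and the names e and p_se are likewise illustrative.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

p_se <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price),
            e = se(price), # standard error instead of standard deviation
            .groups = "drop") %>%
  ggplot(aes(x = clarity, y = m, group = cut, fill = cut)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = m - e, ymax = m + e), width = 0.5) +
  facet_wrap(~cut)

p_se
```

Since the standard error shrinks as the number of observations grows, these error bars will generally be much shorter than the standard deviation bars in the previous plot.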
Using the code from the previous example, rename the x and y-axes using labs().
Execute ?geom_bar() to check out some of the arguments that can be used for this geom element.
diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price)) %>% # m is the mean price within each clarity category
  ggplot(aes(x = clarity, y = m)) +
  geom_bar(stat = "identity",
           show.legend = FALSE, # altering the show.legend argument
           color = "red",
           size = 1, # when we change the size, what are we really changing? (size of what?)
           alpha = .5) # what does changing the value of alpha do?
Using techniques we’ve learned from the line graph chapter, recreate the graph below:
- Use group_by() for clarity and cut
- Re-order the x-axis values of clarity using mutate() and factor()
- Calculate the mean and standard error (using the standard error function) for price within summarize()
- Set the independent variable to clarity and the dependent variable to mean price
- Set group, color, and fill equal to cut
- Change the size of the error bars to 0.2 and the width to 0.5 inside geom_errorbar()
- Change the y-axis tick intervals using the limits and breaks arguments within scale_y_continuous() to match the graph below (hint: the values for those arguments should include c(-numbers in here-))
- Change the x, y, and title labels using labs()
- Set the theme to theme_classic()
- Remove the legend title from the graph using theme() and element_blank() (note that this line must go after theme_classic())
- Center the plot title using theme(), element_text(), and hjust
- Set position to position_dodge(0.9) for both geom_errorbar() and geom_bar()
- Use