10.6 Manual Changes

Don’t forget to make sure the sem function is defined in your environment before following along with the examples in this section!

sem <- function(x, na.rm = FALSE) {
  # when na.rm = TRUE, drop missing values first so n counts only observed data
  if (na.rm) x <- x[!is.na(x)]
  out <- sd(x) / sqrt(length(x))
  return(out)
}
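As a quick sanity check, the sketch below compares sem() against the same quantity computed by hand. The vector x holds made-up values used only for illustration, and the guard at the top defines a fallback copy of sem() only if one is not already in your environment.

```r
# Define sem() only if it is not already in the environment (fallback copy)
if (!exists("sem")) {
  sem <- function(x, na.rm = FALSE) {
    if (na.rm) x <- x[!is.na(x)]
    sd(x) / sqrt(length(x))
  }
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up values for illustration

sem(x)                    # standard error from the function
sd(x) / sqrt(length(x))   # the same quantity computed by hand
```

Both lines should print the same number. If they do not, a different sem() is probably already defined in your environment.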

10.6.1 Coloring Individual Values

So far, we’ve seen how to change the aesthetics of a graph in terms of color, shape, and linetype. We’ve also seen that you can color each geom element (i.e., points, lines, and error bars) individually. However, we have not yet specified the exact color for each value; R has applied its default color scheme instead. That is, R has picked the color purple for “Fair” diamonds, dark blue for “Good” diamonds, light blue for “Very Good”, etc. What if we wanted to specify each cut’s color on our own?

To do this, we first create a new object that holds the designated color for each cut category. In the example, the object is named pointcolor. You can name it whatever you’d like, as long as the name is descriptive and concise.

pointcolor <- c("Fair" = "yellow",
                "Good" = "red",
                "Very Good" = "pink",
                "Premium" = "blue",
                "Ideal" = "black")

Then, we must execute the graphing code:

diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price),
            se = sem(price)) %>% 
  ggplot(aes(x = clarity, 
             y = m, 
             group = cut,
             color = cut)) +
  geom_point() +
  geom_errorbar(aes(ymin = m - se, 
                    ymax = m + se)) +
  geom_line() +
  scale_color_manual(values = pointcolor) # manual color change

Having trouble running the code? Refer back to the troubleshooting section (3.6)!

Play around with moving the aesthetics. For example, see what happens when you move color = cut out of ggplot() and into geom_point().
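One way to try this is sketched below. It repeats the graphing code with color = cut moved into geom_point(). The object name p_points_only is just a label chosen for this illustration, and .groups = "drop" assumes a reasonably recent version of dplyr (1.0 or later).

```r
library(dplyr)
library(ggplot2)

# Define sem() only if it is not already in the environment (fallback copy)
if (!exists("sem")) {
  sem <- function(x, na.rm = FALSE) {
    if (na.rm) x <- x[!is.na(x)]
    sd(x) / sqrt(length(x))
  }
}

pointcolor <- c("Fair" = "yellow",
                "Good" = "red",
                "Very Good" = "pink",
                "Premium" = "blue",
                "Ideal" = "black")

p_points_only <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price),
            se = sem(price),
            .groups = "drop") %>%
  ggplot(aes(x = clarity,
             y = m,
             group = cut)) +            # color is no longer mapped here
  geom_point(aes(color = cut)) +        # only the points are colored by cut
  geom_errorbar(aes(ymin = m - se,
                    ymax = m + se)) +
  geom_line() +
  scale_color_manual(values = pointcolor)

p_points_only  # print the graph
```

Because the color mapping now lives only inside geom_point(), the lines and error bars no longer inherit it and are drawn in the default black.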

You could also have chosen to exclude the names for each cut category as follows:

pointcolor2 <- c("yellow",
                 "red",
                 "pink",
                 "blue",
                 "black")

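However, without names, the colors are assigned in the order in which you list them: the first color goes to the first level of cut (“Fair”), the second to the next level (“Good”), and so on. For example, the following will not produce the same colored graph as pointcolor2 despite containing the same colors: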

pointcolor3 <- c("black",
                 "red",        
                 "yellow",
                 "blue",     
                 "pink")   

Instead, pointcolor3 is equivalent to the following named vector:

pointcolor <- c("Fair" = "black",
                "Good" = "red",
                "Very Good" = "yellow",
                "Premium" = "blue",
                "Ideal" = "pink")
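To see where this pairing comes from, you can line up the unnamed colors against the levels of cut yourself. The sketch below is only an illustration. The object name pointcolor3_named is made up for the example.

```r
library(ggplot2)  # provides the diamonds dataset

pointcolor3 <- c("black",
                 "red",
                 "yellow",
                 "blue",
                 "pink")

# The order of the levels decides which color each category receives
levels(diamonds$cut)

# Attach the level names to the colors, position by position
pointcolor3_named <- setNames(pointcolor3, levels(diamonds$cut))
pointcolor3_named
```

Printing pointcolor3_named shows each cut category next to the color it would receive. A named vector avoids this bookkeeping because its colors are matched by name rather than by position.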

Remember, if you use pointcolor3 to color your graph, you must update the object name in your graphing code (again, this code relies on the sem function being available in the global environment beforehand):

diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price),
            se = sem(price)) %>% 
  ggplot(aes(x = clarity, 
             y = m, 
             group = cut)) +
  geom_point(aes(color = cut)) +
  geom_errorbar(aes(ymin = m - se, 
                    ymax = m + se)) +
  geom_line() +
  scale_color_manual(values = pointcolor3)

10.6.2 Order of the X-axis

You can also change the order in which the categorical values are arranged on the x-axis. There are two main ways of doing this:

  1. Change the individual graph only

  2. Change the dataset

Changing the x-axis Order for the Individual Graph

Let’s say that I want to rearrange the x-axis so that the clarity categories appear in a custom order. Changing only the graph is the simplest and most localized approach. Simply redefine the variable’s factor levels via mutate() inside the pipeline, so that the change applies to this graph alone and the diamonds dataset itself is left untouched:

diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price)) %>% 
  ungroup() %>% 
  mutate(clarity = factor(clarity, levels = c("VVS1", "IF", "VVS2", 
                                              "I1", "VS2", "SI1", "SI2", "VS1"))) %>%
  ggplot(aes(x = clarity, y = m, group = cut, color = cut)) +
  geom_point()
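As an alternative that is not covered in the text above, ggplot2 can also reorder a discrete axis through the limits argument of scale_x_discrete(), which leaves the data alone entirely. The sketch below is offered under that assumption. The names clarity_order and p_reordered are made up for the illustration, and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)

# Desired left-to-right order of the clarity categories
clarity_order <- c("VVS1", "IF", "VVS2",
                   "I1", "VS2", "SI1", "SI2", "VS1")

p_reordered <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop") %>%
  ggplot(aes(x = clarity, y = m, group = cut, color = cut)) +
  geom_point() +
  scale_x_discrete(limits = clarity_order)  # reorder the axis only

p_reordered  # print the graph
```

Be sure to list every category in limits, since any category left out is dropped from the graph.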

Changing the x-axis Order for the Entire Dataset

This is very similar to the above method. The difference is that you save the changes from mutate() to a data object. Here, diamonds_edit1 is the name of a new object that contains the changes we made to clarity.

diamonds_edit1 <- 
  diamonds %>% 
  mutate(clarity = factor(clarity, 
                          levels = c("VVS1", "IF", "VVS2", 
                                     "I1", "VS2", "SI1", "SI2", "VS1")))
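Before graphing, it can be worth confirming that the new object holds the new order and that the original dataset is unchanged. The sketch below is one way to check. The helper name new_order is made up for the illustration.

```r
library(dplyr)
library(ggplot2)

new_order <- c("VVS1", "IF", "VVS2",
               "I1", "VS2", "SI1", "SI2", "VS1")

diamonds_edit1 <-
  diamonds %>%
  mutate(clarity = factor(clarity, levels = new_order))

levels(diamonds$clarity)         # original order, untouched
levels(diamonds_edit1$clarity)   # new order, stored in the new object

# A misspelled level would turn the affected rows into NA, so check for that
sum(is.na(diamonds_edit1$clarity))
```

If the last line prints anything other than 0, one of the level names in the vector probably does not match the data exactly.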

Then, use the new object in the graphing code:

diamonds_edit1 %>% # take notice of the new object here
  group_by(clarity, cut) %>% 
  summarize(m = mean(price)) %>% 
  ungroup() %>%
  ggplot(aes(x = clarity, y = m, group = cut, color = cut)) +
  geom_point()

Remember to be wary of saving over objects. For beginners in R, I recommend creating a new object (as in the example above) when making permanent changes to a dataset. This avoids the confusion and error messages that can arise when you assign a result to an existing name and overwrite the original object.