11.5 Graphing with Different Datasets

One final note: each geom element (geom_point(), geom_line(), etc.) can take its own dataset, so a single graph can combine data from two (or more) different datasets. Let’s see an example:

## creating dataset #1
data1 <- 
  diamonds %>% 
  group_by(clarity) %>% 
  summarize(m = mean(price))  
  
## creating dataset #2
data2 <- 
  diamonds %>% 
  group_by(clarity, cut) %>% 
  summarize(m = mean(price))

## graphing data points from 2 different datasets on one graph
ggplot() +
  geom_point(data = data1, aes(x = clarity, y = m), color = "blue") + # must include argument label "data"
  geom_point(data = data2, aes(x = clarity, y = m))

In the above example, the points from the dataset called data1 are colored blue to distinguish them. Each value in data1 is the mean (average) price of diamonds for one clarity level (simply execute data1 or View(data1) to view the data). The points from the dataset called data2 are drawn in the default black. Each value in data2 is the mean (average) price of diamonds for one combination of clarity and cut. Because both layers are drawn on the same axes, the variables mapped to x and y must be comparable across the layers; here both layers map clarity to x and m to y.
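As an alternative to setting a fixed color on one layer, the layers can be labeled in a legend. The following is a minimal sketch, not the only way to do this: it assumes the ggplot2 and dplyr packages are installed, and the legend labels, the legend title, and the object name p_legend are made up for illustration. Placing a constant string inside aes() gives each layer a legend entry, and scale_color_manual() assigns the actual colors.

```r
library(ggplot2)
library(dplyr)

## same two summaries as above
data1 <- diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price))

data2 <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop") # .groups assumes dplyr 1.0.0 or later

## a constant string inside aes() becomes a legend entry for that layer
## (the labels below are illustrative, not part of the original example)
p_legend <- ggplot() +
  geom_point(data = data2,
             aes(x = clarity, y = m, color = "clarity and cut")) +
  geom_point(data = data1,
             aes(x = clarity, y = m, color = "clarity only"), size = 3) +
  scale_color_manual(
    name = "Mean price by",
    values = c("clarity only" = "blue", "clarity and cut" = "black")
  )

p_legend
```

The data1 layer is added last and drawn slightly larger so its points stay visible on top of the data2 points.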

Within each geom element, you specify the name of the dataset with the argument label data =. The label is needed because the first argument of geom functions such as geom_point() is the aesthetic mapping, not the dataset, so an unlabeled dataset in the first position would be read as the mapping. Note that you can plot with multiple datasets for any other geom element too. You could have a geom_bar() for data1 and a geom_point() for data2 if you wanted to! If for some reason you wanted to plot error bars from data1 and data points from data2, you could do that also. This would likely be a terrible graph, but you could.
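To illustrate mixing geom types across datasets, here is a minimal sketch that draws bars from data1 and points from data2. It assumes the ggplot2 and dplyr packages are installed; the fill color and the object name p_mixed are arbitrary choices for illustration. One caveat: geom_bar() counts rows by default, so for means that are already computed the sketch uses geom_col() instead (geom_bar(stat = "identity") would be equivalent).

```r
library(ggplot2)
library(dplyr)

## same two summaries as above
data1 <- diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price))

data2 <- diamonds %>%
  group_by(clarity, cut) %>%
  summarize(m = mean(price), .groups = "drop") # .groups assumes dplyr 1.0.0 or later

## bars from data1, points from data2, each layer naming its own dataset
p_mixed <- ggplot() +
  geom_col(data = data1, aes(x = clarity, y = m), fill = "lightblue") +
  geom_point(data = data2, aes(x = clarity, y = m))

p_mixed
```

The bars show the overall mean price for each clarity, and the points show how the means for individual cuts scatter around it.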