Chapter 6 Data Visualization

One of the strengths of R is the relative ease with which one can produce a nice-looking graph. We will again use the formula template for making a graph:

plotname(~variable, data = dataName)

As you can see, there are three pieces of information we must provide to get the graph we want:

  • The kind of plot (histogram(), bargraph(), bwplot(), etc.)
  • The name of the variable(s)
  • The name of the dataframe

For example, let’s look at a basic histogram:

library(mosaic)

histogram(~width, data=KidsFeet)

histogram(~width, data=KidsFeet, main = "main title",
          xlab = "x-axis title", ylab = "y-axis title", 
          col = "dark red")                             #use fcol if data is in groups

We can control the appearance of the graph using the following options:

  • main = title at top of graph
  • xlab = x-axis label
  • ylab = y-axis label
  • col or fcol = color of filled portion of graph

6.1 bargraphs - for one or two categorical variables

A bargraph displays COUNTS: the number of observations in a dataset that fall within each category. For example, let’s say we want to view the number of children in each of the boy/girl categories, or the right/left hand categories.

library(mosaic)

bargraph(~sex, data=KidsFeet, main = "sex")
bargraph(~domhand, data=KidsFeet, main = "dominant hand")

bargraph(~sex |domhand, data=KidsFeet, main = "sex by dominant hand")
bargraph(~sex,  groups=domhand, data=KidsFeet, main = "sex grouped by dominant hand")

6.2 gf_bar - for categorical data

Note that the gf_bar() function uses title= instead of main= for the title of the graph.

library(ggformula)
gf_bar(~sex, fill = ~ domhand,   data=KidsFeet, 
       title = "main title",
       xlab = "Gender of student (Boy or Girl)",
       position = position_dodge())

gf_props(~sex, fill = ~ domhand,   data=KidsFeet, 
       title = "main title",
       xlab = "Gender of student (Boy or Girl)",
       position = position_dodge())

gf_bar() puts COUNTS on the y-axis, whereas gf_props() puts PROPORTIONS on the y-axis.
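The difference between the two scales can be checked numerically. The following is a minimal base R sketch, not part of the ggformula workflow itself: it re-enters the KidsFeet counts of sex by dominant hand by hand (5 and 15 for boys, 3 and 16 for girls), and it assumes the proportions are taken out of all 39 children, which is an illustrative choice of denominator. The object names counts and props are hypothetical.

```r
# Counts of sex (rows) by dominant hand (columns), entered by hand
# for illustration rather than computed from KidsFeet
counts <- matrix(c(5, 3, 15, 16), nrow = 2,
                 dimnames = list(sex = c("B", "G"), domhand = c("L", "R")))

# Assumption: proportions are relative to the total number of children
props <- counts / sum(counts)

counts            # the heights a count-based bar graph would show
round(props, 3)   # the same values rescaled so that they sum to 1
```

Under this assumption only the y-axis scale changes; the relative heights of the bars are the same on both scales.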

6.3 dotplot - for one numerical variable

library(mosaic)

# Notice that the "P" is capitalized in dotPlot!
# (width in KidsFeet is recorded in centimeters)
dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")

dotPlot(~width, groups= sex, data=KidsFeet, pch=16,
        xlab = "width of foot (cm)")

# Save two plots as objects so they can be shown side by side
p1=dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")

p2=dotPlot(~width, groups= sex, data=KidsFeet, 
        pch=16,                                  # type ?pch in console to see character types
        xlim = c(6,11), 
        xlab = "width of foot (cm)")


# position = c(xmin, ymin, xmax, ymax), as fractions of the page;
# more = TRUE keeps the page open for the next plot
print(p1, position = c(0, 0, 0.5, 1), more = TRUE)
print(p2, position = c(0.5,0,1,1))

6.4 gf_dotplot - for one numerical variable

library(ggformula)
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1, stackgroups = FALSE, 
        xlab = "width of foot (cm)")
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1, stackgroups = TRUE, binpositions="all", 
        xlab = "width of foot (cm)")

6.5 bwplot - for one numerical variable

library(mosaic)

bwplot(~width, data=KidsFeet, xlab = "width of foot (cm)")

bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)")

bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)", horizontal=TRUE, pch="|")

# With width on the left of the formula, width is on the y-axis, so label it with ylab
bwplot(width~sex , data=KidsFeet, ylab = "width of foot (cm)")

## gf_boxplot - for one numerical variable

Notice that the ggformula function for a box-and-whisker plot is gf_boxplot (not gf_bwplot).

library(ggformula)

gf_boxplot(width~sex , data=KidsFeet)

6.6 histogram - for one numerical variable

library(mosaic)

histogram(~width, data=KidsFeet, xlab = "width of foot (cm)")

histogram(~width | sex, data=KidsFeet, xlab = "width of foot (cm)", 
          layout= c(1,2))                       # layout using 1 column and 2 rows

6.7 gf_histogram - for one numerical variable

library(ggformula)

gf_histogram(~width | sex, 
          binwidth = 0.25,
          xlim = c(7,11),
          ylab = "Number of Students",               # y-axis label (gf_histogram plots counts by default)
          data=KidsFeet, xlab = "width of foot (cm)")

6.8 xyplot - for two numerical variables

An xyplot is also called a scatterplot.

library(mosaic)

xyplot(length~width, data=KidsFeet, xlab = "width of foot (cm)")

xyplot(length~width, groups=sex,  pch=16, cex = 1.4, 
       auto.key=list(space="right", title = "Gender", cex=.75), 
       data=KidsFeet, xlab = "width of foot (cm)")

6.9 gf_point - for two numerical variables

library(ggformula)

gf_point(length~width, data=KidsFeet,
         size = 3, color = ~ sex)

6.10 barplot - for comparing two proportions

library(mosaic)   # tally() comes from mosaic; prop.table() and barplot() are base R

tally(sex ~ domhand, data=KidsFeet)
##    domhand
## sex  L  R
##   B  5 15
##   G  3 16
barplot(prop.table(tally(sex ~ domhand, data=KidsFeet), margin = 2), 
        las = 1, 
        main = "Proportion of each gender by dominant hand", 
        col = c("green", "purple"),
        legend.text = c("boys", "girls"))
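
To see what prop.table() with margin = 2 contributes to the plot above, here is a small base R sketch. It rebuilds the table by hand from the tally() output printed above instead of calling tally(); the object names counts and col_props are only illustrative. With margin = 2, each column (dominant hand) is divided by its own total, so the stacked segments of every bar add up to 1.

```r
# Counts copied from the tally() output: rows are sex, columns are domhand
counts <- matrix(c(5, 3, 15, 16), nrow = 2,
                 dimnames = list(sex = c("B", "G"), domhand = c("L", "R")))

# margin = 2 divides each column by its column total
col_props <- prop.table(counts, margin = 2)

col_props            # e.g. the B/L entry is 5 / (5 + 3) = 0.625
colSums(col_props)   # each column sums to 1
```

Because each bar is rescaled to the same total height, the plot compares the share of boys and girls among left-handed children with the share among right-handed children, regardless of how many children are in each group.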