# Chapter 6 Data Visualization

One of R's strengths is how easily you can produce a good-looking graph. We will again use the formula template for making a graph:

plotname(~variable, data = dataName)

As you can see, there are three pieces of information we must provide to get the graph we want:

• The kind of plot (histogram(), bargraph(), bwplot(), etc.)
• The name of the variable(s)
• The name of the dataframe

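For example, here is a basic histogram, followed by the same call with the optional arguments that control its appearance (min and max are placeholders for numbers):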

library(mosaic)

histogram(~width, data=KidsFeet)

histogram(~width, data=KidsFeet,
main = "main title",
xlab = "x-axis title",
xlim = c(min, max),      # placeholder: replace min and max with numbers
ylab = "y-axis title",
ylim = c(min, max),      # placeholder: replace min and max with numbers
sub = "subtitle",
col = "darkred")

We can control the appearance of the graph using the following options:

• main = title at top of graph
• xlab = x-axis label
• xlim = minimum/maximum of limits on x-axis
• ylab = y-axis label
• ylim = minimum/maximum of limits on y-axis
• sub = subtitle
• col or fcol = color of filled portion of graph (use fcol if data is in groups)
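Here is a runnable version of the template with concrete values in place of the placeholders. The title, subtitle, and x-limits of 7 to 11 are illustrative choices, not values from the original text; the limits were picked to cover the foot widths in KidsFeet, which are recorded in centimeters. ylim is left out because suitable y-limits depend on the y-axis scale the histogram uses.

```r
library(mosaic)   # lattice-based plotting functions plus the KidsFeet data

# Illustrative values: swap in your own title, labels, limits, and color
histogram(~width, data = KidsFeet,
          main = "Foot widths of children in KidsFeet",
          xlab = "width of foot (cm)",
          xlim = c(7, 11),          # lower and upper limits of the x-axis
          sub = "Data: KidsFeet",
          col = "darkred")
```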

## 6.1 bargraph - for one or two categorical variables

A bargraph displays the COUNT of observations that fall into each category of a categorical variable. For example, suppose we want to see how many children in KidsFeet are boys and how many are girls, or how many are right-handed and how many are left-handed.

library(mosaic)

bargraph(~sex, data=KidsFeet, main = "sex")
bargraph(~domhand, data=KidsFeet, main = "dominant hand")

bargraph(~sex | domhand, data=KidsFeet, main = "sex by dominant hand")            # a separate panel for each dominant hand
bargraph(~sex, groups=domhand, data=KidsFeet, main = "sex grouped by dominant hand")  # L and R bars side by side in one panel
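bargraph() counts the observations in each category before drawing the bars. A quick way to check the bar heights is mosaic's tally(); this sketch prints the counts behind the plots above. The expected counts in the comments follow from the two-way table in Section 6.12.

```r
library(mosaic)   # provides tally() and the KidsFeet data

# One-way counts: the bar heights in the first two bargraphs
tally(~sex, data = KidsFeet)       # B: 20, G: 19
tally(~domhand, data = KidsFeet)   # L: 8, R: 31

# Two-way counts: sex within each dominant hand
tally(sex ~ domhand, data = KidsFeet)
```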

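The base-R function barplot() can also draw a bargraph from counts that are already summarized in a table. Each column of the matrix becomes one bar, stacked by row ("yes" and "no"):

df2 <- data.frame(twoparent = c(407, 45),
always = c(61, 16),
divorce = c(231, 29),
nocohab = c(124, 11),
withcohab = c(193, 51),
row.names = c("yes", "no"))

df2
##     twoparent always divorce nocohab withcohab
## yes       407     61     231     124       193
## no         45     16      29      11        51
barplot(as.matrix(df2), legend.text = TRUE, col = c("purple", "magenta"))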
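By default barplot() stacks the rows. Setting beside = TRUE, a standard barplot() argument, places the "yes" and "no" bars next to each other instead, which makes the two counts within a column easier to compare. This sketch rebuilds the same counts as a matrix so that it runs on its own; the name counts is an illustrative choice.

```r
# Same counts as df2, stored as a matrix (rows: yes/no, columns: the five groups)
counts <- matrix(c(407, 45, 61, 16, 231, 29, 124, 11, 193, 51),
                 nrow = 2,
                 dimnames = list(c("yes", "no"),
                                 c("twoparent", "always", "divorce", "nocohab", "withcohab")))

# Column totals: the height of each stacked bar in the plot above
colSums(counts)

# Side-by-side bars instead of stacked bars; the legend uses the row names
barplot(counts, beside = TRUE, legend.text = TRUE, col = c("purple", "magenta"))
```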

## 6.2 gf_bar and gf_props - for categorical data

Note that gf_bar() and gf_props() use title= instead of main= for the title of the graph.

library(ggformula)
gf_bar(~sex, fill = ~ domhand,   data=KidsFeet,
title = "main title",
xlab = "Gender of student (Boy or Girl)",
position = position_dodge())

gf_props(~sex, fill = ~ domhand,   data=KidsFeet,
title = "main title",
xlab = "Gender of student (Boy or Girl)",
position = position_dodge())

gf_bar() plots COUNTS on the y-axis, whereas gf_props() plots PROPORTIONS.
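To see how the two plots are related, divide each count by the total number of children. This sketch assumes that gf_props() uses its default denominator, all the bars in a panel together, so that the four bar heights add up to 1.

```r
library(mosaic)   # provides tally() and the KidsFeet data

counts <- tally(sex ~ domhand, data = KidsFeet)   # the counts that gf_bar() plots
props  <- prop.table(counts)                        # each count divided by all 39 children
round(props, 3)
```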

## 6.3 pie - for categorical data

observed <-c(278, 523, 98, 101)

pie(observed, labels =c("Elementary", "Secondary", "College Credits", "College Degree"))
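Pie slices are hard to compare by eye, so it can help to print each category's percentage in its label. A minimal sketch using base R's round() and paste0(); the names categories and pct are illustrative.

```r
observed <- c(278, 523, 98, 101)
categories <- c("Elementary", "Secondary", "College Credits", "College Degree")

# Percentage of the total in each category (the four counts add up to 1000)
pct <- round(100 * observed / sum(observed), 1)
pct   # 27.8, 52.3, 9.8 and 10.1

# Labels such as "Secondary (52.3%)"
pie(observed, labels = paste0(categories, " (", pct, "%)"))
```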

## 6.4 dotplot - for one numerical variable

library(mosaic)

# Notice that the "P" is capitalized in dotPlot!
dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")

dotPlot(~width, groups= sex, data=KidsFeet, pch=16,
xlab = "width of foot (cm)")

## 6.5 gf_dotplot - for one numerical variable

library(ggformula)
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1,
stackgroups = FALSE,                     # each sex is stacked separately, so dots can overlap
xlab = "width of foot (cm)")
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.2,
stackgroups = TRUE, binpositions="all",  # boys and girls share bins and stack on top of each other
xlab = "width of foot (cm)")

## 6.6 bwplot - for one numerical variable or one numerical/one categorical variable

library(mosaic)

bwplot(~width, data=KidsFeet, xlab = "width of foot (cm)")

bwplot(sex~width, data=KidsFeet, xlab = "width of foot (cm)")

# pch="|" draws the median as a line across the box instead of a dot
bwplot(sex~width, data=KidsFeet, xlab = "width of foot (cm)", horizontal=TRUE, pch="|")

# width is now on the vertical axis, so its label goes in ylab
bwplot(width~sex, data=KidsFeet, ylab = "width of foot (cm)")
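Each box summarizes a distribution with five numbers. mosaic's favstats() prints the minimum, quartiles, median, maximum, mean, standard deviation, and sample size for each group, so the boxes can be read as numbers. One caution: bwplot() draws the box edges at Tukey's hinges (the values fivenum() returns), which can differ slightly from the quartiles that favstats() reports.

```r
library(mosaic)   # provides favstats() and the KidsFeet data

# Summary statistics of foot width for boys (B) and girls (G)
favstats(width ~ sex, data = KidsFeet)

# Tukey's five-number summary for each sex: min, lower hinge, median, upper hinge, max
tapply(KidsFeet$width, KidsFeet$sex, fivenum)
```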

## 6.7 gf_histogram - for one numerical variable

library(ggformula)

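gf_histogram(~width | sex, data=KidsFeet,   # one panel for each sex, side by side
binwidth = 0.25,
xlab = "width of foot (cm)",
ylab = "Number of students") %>%        # gf_histogram() counts the students in each bin
gf_lims(x = c(7, 11))                   # set the x-axis limits with gf_lims()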

## 6.8 histogram - for one numerical variable

library(mosaic)

histogram(~width, data=KidsFeet, xlab = "width of foot (cm)")

histogram(~width | sex, data=KidsFeet, xlab = "width of foot (cm)",
layout= c(1,2))                       # layout using 1 column and 2 rows
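By default the bins are chosen for you. To set them yourself, you can pass explicit break points with breaks=, an argument of lattice's histogram() that the mosaic version is assumed here to pass through. The break points below, every 0.25 cm from 7.5 to 10, are an illustrative choice that covers all the widths in KidsFeet.

```r
library(mosaic)   # histogram() and the KidsFeet data

# Illustrative break points every 0.25 cm; all widths fall between 7.5 and 10
widthBreaks <- seq(7.5, 10, by = 0.25)
histogram(~width | sex, data = KidsFeet, breaks = widthBreaks,
          xlab = "width of foot (cm)", layout = c(1, 2))
```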

## 6.9 gf_boxplot - for one numerical variable or one numerical/one categorical variable

Notice that the ggformula function for a box-and-whisker plot is gf_boxplot (not gf_bwplot).

library(ggformula)

gf_boxplot(width~sex , data=KidsFeet)

## 6.10 xyplot - for two numerical variables

An xyplot is also called a scatterplot.

library(mosaic)

xyplot(length~width, data=KidsFeet, xlab = "width of foot (cm)", ylab = "length of foot (cm)")

xyplot(length~width, groups=sex, pch=16, cex = 1.4,
auto.key=list(space="right", title = "Gender", cex=.75),
data=KidsFeet, xlab = "width of foot (cm)")
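lattice's xyplot() can add a least-squares line to each group with type = c("p", "r"), where "p" draws the points and "r" draws the regression line. Pairing the plot with the correlation coefficient puts a number on the strength of the linear relationship. A sketch:

```r
library(mosaic)   # xyplot() via lattice, plus the KidsFeet data

# Points plus a separate least-squares line for boys and for girls
xyplot(length ~ width, groups = sex, data = KidsFeet,
       type = c("p", "r"), auto.key = list(space = "right", title = "Gender"),
       xlab = "width of foot (cm)", ylab = "length of foot (cm)")

# Correlation between foot width and foot length for all 39 children
cor(KidsFeet$width, KidsFeet$length)
```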

## 6.11 gf_point - for two numerical variables

library(ggformula)

gf_point(length~width, data=KidsFeet,
size = 3, color = ~ sex)

## 6.12 barplot - for comparing two proportions

tally(sex ~ domhand, data=KidsFeet)
##    domhand
## sex  L  R
##   B  5 15
##   G  3 16
# margin = 2 divides each count by its column total, so each bar (L or R) sums to 1
barplot(prop.table(tally(sex ~ domhand, data=KidsFeet), margin = 2),
las = 1,                                  # axis tick labels always horizontal
main = "Proportion of boys and girls by dominant hand",
args.legend = list(x = 2, y = 0.5),
# or args.legend = list(x = "bottomright")
col = c("plum", "palegreen"),
legend.text = c("boys", "girls"))
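The bar heights come from prop.table() with margin = 2, which divides each count by its column total. Printing the table shows the numbers behind the plot: with the counts above, 5 of the 8 left-handed children are boys (0.625) and 15 of the 31 right-handed children are boys (about 0.484).

```r
library(mosaic)   # provides tally() and the KidsFeet data

counts <- tally(sex ~ domhand, data = KidsFeet)
props  <- prop.table(counts, margin = 2)   # proportions within each column (dominant hand)
round(props, 3)
colSums(props)                             # each column sums to 1
```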