# Chapter 6 Data Visualization

One of the strengths of R is the relative ease with which one can produce a nice-looking graph. We will again use the formula template for making a graph:

plotname(~variable, data = dataName)

As you can see, there are three pieces of information we must provide to get the graph we want:

• The kind of plot (histogram(), bargraph(), bwplot(), etc.)
• The name of the variable(s)
• The name of the dataframe

For example, let’s look at a basic histogram:

library(mosaic)

histogram(~width, data=KidsFeet)

histogram(~width, data=KidsFeet, main = "main title",
          xlab = "x-axis title", ylab = "y-axis title",
          col = "dark red")                             # use fcol if the data are in groups

We can control the appearance of the graph using the following options:

• main = title at top of graph
• xlab = x-axis label
• ylab = y-axis label
• col or fcol = color of filled portion of graph

## 6.1 bargraphs - for one or two categorical variables

A bargraph displays COUNTS: the number of observations in a dataframe that fall within each category of a variable. For example, let’s say we want to view the number of children in each of the boy/girl categories, or in each of the right/left hand categories.
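To see the numbers that a bargraph will draw, the counts can be tabulated first. The following is a minimal sketch, assuming the mosaic package is installed (it provides tally() and makes the KidsFeet data available); the object names sex_counts and hand_counts are made up for illustration.

```r
library(mosaic)

# Count the children in each category; these counts are the bar heights
sex_counts  <- tally(~sex, data = KidsFeet)
hand_counts <- tally(~domhand, data = KidsFeet)

sex_counts
hand_counts

# Each table accounts for every child in the data exactly once
sum(sex_counts) == nrow(KidsFeet)
```

Comparing these tables with the bars is a quick check that the graph shows what you expect.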

library(mosaic)

bargraph(~sex, data=KidsFeet, main = "sex")
bargraph(~domhand, data=KidsFeet, main = "dominant hand")

bargraph(~sex |domhand, data=KidsFeet, main = "sex by dominant hand")
bargraph(~sex, groups=domhand, data=KidsFeet, main = "sex grouped by dominant hand")

## 6.2 gf_bar - for categorical data

Note that the gf_bar() and gf_props() functions use title= instead of main= for the title of the graph.

library(ggformula)
gf_bar(~sex, fill = ~ domhand,   data=KidsFeet,
       title = "main title",
       xlab = "Gender of student (Boy or Girl)",
       position = position_dodge())

gf_props(~sex, fill = ~ domhand,   data=KidsFeet,
         title = "main title",
         xlab = "Gender of student (Boy or Girl)",
         position = position_dodge())

gf_bar() puts COUNTS on the y-axis, whereas gf_props() puts PROPORTIONS on the y-axis.

## 6.3 dotplot - for one numerical variable

library(mosaic)

# Notice that the "P" is capitalized in dotPlot!
dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")

dotPlot(~width, groups= sex, data=KidsFeet, pch=16,
        xlab = "width of foot (cm)")

# Save two plots as objects so they can be printed side by side
p1=dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")

p2=dotPlot(~width, groups= sex, data=KidsFeet,
           pch=16,                                  # type ?pch in console to see character types
           xlim = c(6,11),
           xlab = "width of foot (cm)")

print(p1, position = c(0, 0, 0.5, 1), more = TRUE)
print(p2, position = c(0.5,0,1,1))
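The position argument used above gives the corners of the region in which a plot is drawn, as c(xmin, ymin, xmax, ymax) on a 0-to-1 scale, and more = TRUE tells R that another plot will follow on the same page. As a sketch under the same assumptions (mosaic installed; the names top_plot and bottom_plot are made up here), the two plots could instead be stacked vertically:

```r
library(mosaic)

top_plot    <- dotPlot(~width, data = KidsFeet, xlab = "width of foot")
bottom_plot <- dotPlot(~width, groups = sex, data = KidsFeet, pch = 16,
                       xlab = "width of foot")

# position = c(xmin, ymin, xmax, ymax), each value between 0 and 1
print(top_plot,    position = c(0, 0.5, 1, 1), more = TRUE)  # top half of the page
print(bottom_plot, position = c(0, 0,   1, 0.5))             # bottom half of the page
```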

## 6.4 gf_dotplot - for one numerical variable

library(ggformula)
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1, stackgroups = FALSE,
           xlab = "width of foot (cm)")
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1, stackgroups = TRUE, binpositions="all",
           xlab = "width of foot (cm)")

## 6.5 bwplot - for one numerical variable

library(mosaic)

bwplot(~width, data=KidsFeet, xlab = "width of foot (cm)")

bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)")

bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)", horizontal=TRUE, pch="|")

bwplot(width~sex , data=KidsFeet, ylab = "width of foot (cm)")   # width is on the y-axis here

## gf_boxplot - for one numerical variable

Notice that the ggformula function for a box-and-whisker plot is gf_boxplot (not gf_bwplot).

library(ggformula)

gf_boxplot(width~sex , data=KidsFeet)

## 6.6 histogram - for one numerical variable

library(mosaic)

histogram(~width, data=KidsFeet, xlab = "width of foot (cm)")

histogram(~width | sex, data=KidsFeet, xlab = "width of foot (cm)",
          layout= c(1,2))                       # layout using 1 column and 2 rows

## 6.7 gf_histogram - for one numerical variable

library(ggformula)

gf_histogram(~width | sex,
             binwidth = 0.25,
             xlim = c(7,11),
             ylab = "Number of Students",              # y-axis label; gf_histogram() plots counts
             data=KidsFeet, xlab = "width of foot (cm)")

## 6.8 xyplot - for two numerical variables

An xyplot is also called a scatterplot.

library(mosaic)

xyplot(length~width, data=KidsFeet, xlab = "width of foot (cm)")

xyplot(length~width, groups=sex,  pch=16, cex = 1.4,
       auto.key=list(space="right", title = "Gender", cex=.75),
       data=KidsFeet, xlab = "width of foot (cm)")

## 6.9 gf_point - for two numerical variables

library(ggformula)

gf_point(length~width, data=KidsFeet,
         size = 3, color = ~ sex)

## 6.10 barplot - for comparing two proportions

library(mosaic)   # tally() comes from mosaic; barplot() and prop.table() are base R

tally(sex ~ domhand, data=KidsFeet)
##    domhand
## sex  L  R
##   B  5 15
##   G  3 16
barplot(prop.table(tally(sex ~ domhand, data=KidsFeet), margin = 2),
las = 1,
main = "Proportion of boys and girls by dominant hand",
col = c("green", "purple"),
legend.text = c("boys", "girls")) 
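To make explicit what prop.table(..., margin = 2) hands to barplot(), the sketch below rebuilds the table of counts printed above as a plain base R matrix (the names counts and props are made up for illustration) and converts it to column proportions. With margin = 2 each dominant-hand column sums to 1, so each bar shows the split between boys and girls within one dominant hand.

```r
# Counts copied from the tally() output above
counts <- matrix(c(5, 3, 15, 16), nrow = 2,
                 dimnames = list(sex = c("B", "G"), domhand = c("L", "R")))

# margin = 2 divides each column by its column total
props <- prop.table(counts, margin = 2)
round(props, 3)

colSums(props)   # each column sums to 1
```

Using margin = 1 instead would divide each row by its row total, giving the proportion of left- and right-handed children within each sex.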