Chapter 6 Data Visualization
One of the strengths of R is the relative ease with which one can produce a nice-looking graph. We will again use the formula template for making a graph:
plotname(~variable, data = dataName)
As you can see, there are three pieces of information we must provide to get the graph we want:
- The kind of plot (histogram(), bargraph(), bwplot(), etc.)
- The name of the variable(s)
- The name of the dataframe
For example, let’s look at a basic histogram:
library(mosaic)
histogram(~width, data=KidsFeet)
histogram(~width, data=KidsFeet,
main = "main title",
xlab = "x-axis title",
xlim = c(min, max),
ylab = "y-axis title",
ylim = c(min,max),
sub = "subtitle",
col = "dark red")
We can control the appearance of the graph using the following options:
- main = title at top of graph
- xlab = x-axis label
- xlim = minimum/maximum of limits on x-axis
- ylab = y-axis label
- ylim = minimum/maximum of limits on y-axis
- sub = subtitle
- col or fcol = color of filled portion of graph (use fcol if data is in groups)
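As a concrete instance of the template above, the following sketch fills in the placeholders with actual values. The axis limits of 7 to 11 and the wording of the titles are illustrative choices, not values taken from the text; widths in KidsFeet are recorded in centimeters.

```r
library(mosaic)   # also makes the KidsFeet data available

# Template filled in with concrete values; the limits are illustrative choices
p <- histogram(~width, data = KidsFeet,
               main = "Foot width of children",
               xlab = "width of foot (cm)",
               xlim = c(7, 11),
               ylab = "percent of children",
               col  = "darkred")
p   # printing the object draws the graph
```

Storing the plot in an object and then printing it is optional; calling histogram() directly at the console draws the graph as well.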
6.1 bargraphs - for one or two categorical variables
A bargraph displays COUNTS of the observations in a data set that fall within each category. For example, let’s say we want to view the number of children in each of the boy/girl categories, or the right/left hand categories.
library(mosaic)
bargraph(~sex, data=KidsFeet, main = "sex")
bargraph(~domhand, data=KidsFeet, main = "dominant hand")
bargraph(~sex |domhand, data=KidsFeet, main = "sex by dominant hand")
bargraph(~sex, groups=domhand, data=KidsFeet, main = "sex grouped by dominant hand")
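The bar heights in these graphs are simply category counts. One way to check what a bargraph should show is to tabulate the same variables with tally() from the mosaic package, using the same formula notation. The object names below are illustrative.

```r
library(mosaic)

# Counts behind bargraph(~sex, ...): one count per category
sex_counts <- tally(~sex, data = KidsFeet)
sex_counts

# Counts behind the two-variable graphs: sex within each dominant hand
sex_by_hand <- tally(sex ~ domhand, data = KidsFeet)
sex_by_hand
```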
What about with summarized data?
df2 <- data.frame(twoparent =c(407,45),
always =c(61,16),
divorce =c(231,29),
nocohab =c(124,11),
withcohab =c(193,51),
row.names =c("yes", "no"))
df2
## twoparent always divorce nocohab withcohab
## yes 407 61 231 124 193
## no 45 16 29 11 51
barplot(as.matrix(df2), legend.text = TRUE, col = c("purple","magenta"))
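Because the groups in df2 differ a lot in size, raw counts can be hard to compare across bars. One option, sketched here with base R only, is to convert each column to proportions with prop.table() before plotting. The object name props is an illustrative choice, and the data frame is rebuilt so the block runs on its own.

```r
df2 <- data.frame(twoparent = c(407, 45),
                  always    = c(61, 16),
                  divorce   = c(231, 29),
                  nocohab   = c(124, 11),
                  withcohab = c(193, 51),
                  row.names = c("yes", "no"))

# margin = 2 divides each cell by its column total
props <- prop.table(as.matrix(df2), margin = 2)
round(props, 3)

# Every bar now has total height 1
barplot(props, legend.text = TRUE, col = c("purple", "magenta"),
        ylab = "proportion")
```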
6.2 gf_bar - for categorical data
Note that the gf_bar() function uses title= instead of main= for the title of the graph.
library(ggformula)
gf_bar(~sex, fill = ~ domhand, data=KidsFeet,
title = "main title",
xlab = "Gender of student (Boy or Girl)",
position = position_dodge())
gf_props(~sex, fill = ~ domhand, data=KidsFeet,
title = "main title",
xlab = "Gender of student (Boy or Girl)",
position = position_dodge())
gf_bar() puts COUNTS on the y-axis, whereas gf_props() puts PROPORTIONS on the y-axis.
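The difference between counts and proportions can also be checked numerically. The sketch below uses tally() from mosaic with its format argument. It assumes proportions are taken out of all children, which may or may not match the denominator gf_props() uses in your version of ggformula, so treat it as a sanity check rather than an exact reproduction of the plot.

```r
library(mosaic)

# Counts for each sex / dominant hand combination
counts <- tally(~sex + domhand, data = KidsFeet)

# The same table expressed as shares of all children (assumed denominator)
props <- tally(~sex + domhand, data = KidsFeet, format = "proportion")

counts
props
```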
6.3 pie - for categorical data
observed <- c(278, 523, 98, 101)
pie(observed, labels =c("Elementary", "Secondary", "College Credits", "College Degree"))
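Pie slices are easier to read when the labels carry the percentages. A minimal base R sketch follows; the names categories, pct and slice_labels are illustrative.

```r
observed   <- c(278, 523, 98, 101)
categories <- c("Elementary", "Secondary", "College Credits", "College Degree")

# Percentage of the total in each category
pct <- round(100 * observed / sum(observed), 1)

# Labels such as "Elementary (27.8%)"
slice_labels <- paste0(categories, " (", pct, "%)")
pie(observed, labels = slice_labels)
```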
6.4 dotplot - for one numerical variable
library(mosaic)
# Notice that the "P" is capitalized in dotPlot!
dotPlot(~width, data=KidsFeet, xlab = "width of foot (cm)")
dotPlot(~width, groups= sex, data=KidsFeet, pch=16,
xlab = "width of foot (cm)")
6.5 gf_dotplot - for one numerical variable
library(ggformula)
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.1,
stackgroups = FALSE,
xlab = "width of foot (cm)")
gf_dotplot(~width, fill = ~sex, data=KidsFeet, binwidth = 0.2,
stackgroups = TRUE, binpositions="all",
xlab = "width of foot (cm)")
6.6 bwplot - for one numerical variable or one numerical/one categorical variable
library(mosaic)
bwplot(~width, data=KidsFeet, xlab = "width of foot (cm)")
bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)")
bwplot(sex~width , data=KidsFeet, xlab = "width of foot (cm)", horizontal=TRUE, pch="|")
bwplot(width~sex , data=KidsFeet, ylab = "width of foot (cm)")
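A box-and-whisker plot is a picture of the five-number summary. To see the numbers behind the boxes, favstats() from mosaic accepts the same formula as bwplot(). Quartile conventions can differ slightly between functions, so the box edges may not match Q1 and Q3 to the last decimal. The object name width_stats is illustrative.

```r
library(mosaic)

# min, Q1, median, Q3, max (plus mean, sd, n, missing) for each sex
width_stats <- favstats(width ~ sex, data = KidsFeet)
width_stats
```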
6.7 gf_histogram - for one numerical variable
library(ggformula)
gf_histogram(~width | sex,
binwidth = 0.25,
xlim = c(7,11),
ylab = "Percent of Students", # y-axis label
data=KidsFeet, xlab = "width of foot (cm)",
layout= c(2,1))
6.8 histogram - for one numerical variable
library(mosaic)
histogram(~width, data=KidsFeet, xlab = "width of foot (cm)")
histogram(~width | sex, data=KidsFeet, xlab = "width of foot (cm)",
layout= c(1,2)) # layout using 1 column and 2 rows
6.9 gf_boxplot - for one numerical variable
Notice that the ggformula function for a box-and-whisker plot is gf_boxplot (not gf_bwplot).
library(ggformula)
gf_boxplot(width~sex , data=KidsFeet)
6.10 xyplot - for two numerical variables
An xyplot is also called a scatterplot.
library(mosaic)
xyplot(length~width, data=KidsFeet, xlab = "width of foot (cm)")
xyplot(length~width, groups=sex, pch=16, cex = 1.4,
auto.key=list(space="right", title = "Gender", cex=.75),
data=KidsFeet, xlab = "width of foot (cm)")
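To judge how strongly the two variables are related, a fitted line and the correlation coefficient can help. This sketch relies on the type argument of lattice plots, where "p" draws the points and "r" adds a least-squares line. The object names p and r are illustrative.

```r
library(mosaic)

# Scatterplot with a least-squares line added
p <- xyplot(length ~ width, data = KidsFeet,
            type = c("p", "r"),
            xlab = "width of foot (cm)",
            ylab = "length of foot (cm)")
p

# Correlation between foot length and foot width
r <- with(KidsFeet, cor(length, width))
r
```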
6.11 gf_point - for two numerical variables
library(ggformula)
gf_point(length~width, data=KidsFeet,
size = 3, color = ~ sex)
6.12 barplot - for comparing two proportions
tally(sex ~ domhand, data=KidsFeet)
## domhand
## sex L R
## B 5 15
## G 3 16
barplot(prop.table(tally(sex ~ domhand, data=KidsFeet), margin = 2),
las = 1,
main = "Proportion of dominant hand by gender",
args.legend = list(x = 2, y = 0.5),
# or args.legend = list(x = "bottomright")
col = c("plum", "palegreen"),
legend.text = c("boys", "girls"))
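The margin = 2 argument is what makes each bar sum to 1: every count is divided by its column total, so the bars show the share of boys and girls within each dominant hand. The sketch below rebuilds the table from the counts printed above so that it runs in base R alone; the object names are illustrative.

```r
# Counts copied from the tally() output: rows are sex, columns are domhand
tab <- matrix(c(5, 3, 15, 16), nrow = 2,
              dimnames = list(sex = c("B", "G"), domhand = c("L", "R")))

by_hand <- prop.table(tab, margin = 2)   # each column sums to 1
by_sex  <- prop.table(tab, margin = 1)   # each row sums to 1

round(by_hand, 3)
round(by_sex, 3)
```

Using margin = 1 instead answers a different question: the share of left-handed and right-handed children within each sex.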