17.5 Graphs for metric variables

  • hist(data$variable): Histogram
  • Boxplot
    • boxplot(x): One variable
    • boxplot(data$variable1 , data$variable2, data$variable3): Several variables
    • boxplot(data$variable1 ~ data$variable2): Grouped (variable1 split by the groups in variable2)
  • Scatterplot with regression line
    • plot(independentVariable, outcomeVariable)
    • abline(h=mean(outcomeVariable, na.rm=TRUE)): Add a horizontal line at the mean
    • abline(lm(outcomeVariable ~ independentVariable)): Add regression line
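
The calls listed above can be tried out on the built-in swiss data set. The following is only a minimal sketch: the choice of variables and the grouping variable catholic.majority (cut at 50%) are assumptions made for illustration and are not part of the original material.

```r
# Metric-variable graphs with the built-in swiss data set
data(swiss)

# Histogram of one metric variable
hist(swiss$Fertility)

# Boxplot: one variable, then several variables side by side
boxplot(swiss$Fertility)
boxplot(swiss$Fertility, swiss$Agriculture, swiss$Education,
        names = c("Fertility", "Agriculture", "Education"))

# Grouped boxplot: metric variable ~ grouping variable
# (catholic.majority is a hypothetical grouping, for illustration only)
catholic.majority <- swiss$Catholic >= 50
boxplot(swiss$Fertility ~ catholic.majority)

# Scatterplot with a dashed line at the mean and a regression line
plot(swiss$Education, swiss$Fertility)
abline(h = mean(swiss$Fertility, na.rm = TRUE), lty = 2)
fit <- lm(Fertility ~ Education, data = swiss)
abline(fit, col = "red")
```

Storing the model in fit before passing it to abline() makes it possible to inspect the coefficients with coef(fit) as well.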


17.5.1 Example: Graphs for metric variables

# Graphs for categorical variables
# Generate a categorical variable in the dataset swiss
# 1 = at least 70% Catholic, 0 = less than 70% (the two conditions must not overlap)
swiss$dummy.catholic[swiss$Catholic>=70] <- 1
swiss$dummy.catholic[swiss$Catholic<70] <- 0

swiss$cat.fertility[swiss$Fertility<=64] <- 1
swiss$cat.fertility[swiss$Fertility>64 & swiss$Fertility<=70] <- 2
swiss$cat.fertility[swiss$Fertility>70 & swiss$Fertility<=78] <- 3
swiss$cat.fertility[swiss$Fertility>78] <- 4
swiss <- swiss[order(swiss$Fertility),]
swiss

# Barplot with absolute frequencies
barplot(table(swiss$dummy.catholic))

# Barplot with relative frequencies
barplot(prop.table(table(swiss$dummy.catholic)))

# Horizontal barplot
barplot(table(swiss$dummy.catholic), horiz=TRUE, las=1)

# Stacked barplot
table(swiss$dummy.catholic, swiss$cat.fertility)
dev.new()  # opens a new graphics window on any platform (windows() exists only on Windows)
barplot(table(swiss$dummy.catholic, swiss$cat.fertility), 
        names.arg = c("x <= 64","64 < x <= 70","70 < x <= 78","78 < x"))

# Grouped barplot
barplot(table(swiss$dummy.catholic, swiss$cat.fertility), beside=TRUE)

# Four histograms
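dev.new()  # platform-independent replacement for windows()
par(mfrow=c(2,2))  # 2 x 2 grid of plots
for (i in c(.8, .2, 2, 1)){
  hist(rnorm(n=101, sd=i))  # 101 normal draws with standard deviation i
}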


# Plot several functions and add a legend
dev.new()  # platform-independent replacement for windows()
curve(x^2, col="blue", xlim=c(0,10), ylim=c(0,10))
curve(x^3, add=TRUE, col="red")
curve(x^5, add=TRUE, col="green")
legends <- c("x^2", "x^3", "x^5")
legend(8, 10, legends, pch = 20,
       col=c("blue","red","green"))
abline(h=5)  # horizontal reference line at y = 5



17.5.2 Exercise: Graphs 1

  1. The data set swiss contains data for 47 French-speaking provinces in Switzerland in the year 1888 (see ?swiss). Inspect the data set with the usual functions (How many variables are there? What class do these variables have? etc.).
  2. Create a scatterplot for the variables Education (x-axis) and Fertility (y-axis).
    • The title of the graph is “The relationship between education and fertility”.
    • The label for the x-axis is “Education”.
    • The label for the y-axis is “Fertility”.
    • Both the x-axis and the y-axis have a value range from 10 to 90.
  3. Create the following plot (see the histogram below) for the variable Infant.Mortality.
    • Tip: Check out Quick-R for the normal curve.


17.5.3 Solution: Graphs 1
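
One possible way to approach the three tasks is sketched below. The target figure for task 3 is not reproduced here, so the styling of the histogram (density scale, title, curve colour) is an assumption. The helper name im is hypothetical.

```r
# 1. Inspect the data set
data(swiss)
str(swiss)      # 47 observations of 6 variables
summary(swiss)
sapply(swiss, class)

# 2. Scatterplot of Education (x) against Fertility (y)
plot(swiss$Education, swiss$Fertility,
     main = "The relationship between education and fertility",
     xlab = "Education", ylab = "Fertility",
     xlim = c(10, 90), ylim = c(10, 90))

# 3. Histogram of infant mortality with a normal curve on top.
#    freq = FALSE puts the histogram on the density scale so that
#    the curve and the bars are comparable.
im <- swiss$Infant.Mortality
hist(im, freq = FALSE, main = "Infant mortality", xlab = "Infant.Mortality")
curve(dnorm(x, mean = mean(im), sd = sd(im)), add = TRUE, col = "blue")
```

Inside curve() the symbol x is the plotting variable, which is why the data are stored under a different name.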

17.5.4 Exercise: Graphs 2

  1. Open the data set ESS4e04_de.dta and inspect it using different functions. Generate a barplot for the absolute frequencies of the variable wahlentscheidung.str.
  2. Generate a grouped barplot for the variables geschlecht (geschlecht.str) and wahlentscheidung (wahlentscheidung.str). Color the individual bars according to the parties. Add a legend that indicates which bar belongs to which party (Tip: ?barplot to check for col).
  3. Create a histogram for the variable Alter (age). Change the number of bars so that about 15 bars are displayed. In addition, the x-axis should cover the range from 0 to 100.
  4. Plot the following three graphs on one single page
    • Grouped boxplots for the variables age (x-axis) and geschlecht.str (y-axis).
    • Scatterplot for the variables age (x-axis) and hheinkommen (y-axis).
    • Barplot that shows the means of the variable polinteresse.str (y-axis) separately for women and men (use geschlecht.str, x-axis).
    • Save this plot under the name “3in1.pdf” in your working directory.


17.5.5 Solution: Graphs 2
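
The file ESS4e04_de.dta is not available here, so the following sketch only illustrates the mechanics of task 4 (three plots on one page, saved as a PDF) with the built-in swiss data as a stand-in. The variables, the grouping variable group and the output path out.file are assumptions. With the ESS data, the corresponding variables from the exercise would take their place.

```r
data(swiss)

# Stand-in grouping variable (assumption): Catholic majority, no/yes
group <- factor(swiss$Catholic >= 50, labels = c("no", "yes"))

# Written to a temporary folder here; use "3in1.pdf" to save
# into the working directory instead
out.file <- file.path(tempdir(), "3in1.pdf")

pdf(out.file, width = 9, height = 3.5)  # open the PDF device
par(mfrow = c(1, 3))                    # one row, three plots

# Grouped boxplots with the metric variable on the x-axis
boxplot(swiss$Fertility ~ group, horizontal = TRUE)

# Scatterplot of two metric variables
plot(swiss$Education, swiss$Fertility)

# Barplot of group means
barplot(tapply(swiss$Examination, group, mean))

dev.off()                               # close the device and write the file
```

The file is only complete once dev.off() has been called.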

