7 Plots in 2.5 and 2.6 with base R

This section shows one way to generate the plots in Slides 2.5 and 2.6 with base R.

7.1 Set up

We will again use the StudentSurvey dataset from the Lock5withR package, together with the built-in iris dataset and the Default dataset from the ISLR package. The CricketChirps dataset from Lock5withR is used in the Regression Line part.

library(Lock5withR)
library(ISLR)

7.2 Scatterplot

Scatterplots are used to visualise paired data, where each data point has a value for each of two quantitative variables.

7.2.1 Two example scatter plots

Preparation for plotting.

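# Keep only the two variables of interest and drop rows with missing values
dataStu <- na.omit(StudentSurvey[c('TV', 'Exercise')])
# Correlation coefficients, rounded to two decimal places
corStu <- round(cor(dataStu$TV, dataStu$Exercise), digits=2)
corIris <- round(cor(iris$Petal.Length, iris$Petal.Width), digits=2)

# Label text and positions for the two correlation annotations
annotation <- data.frame(x = c(30,5), y = c(30,0.5),
  label = c(paste('r=',corStu,sep=''), 
            paste('r=',corIris,sep='')))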
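
To make clear what cor() returns in the preparation step above, here is a minimal sketch that computes the Pearson correlation coefficient by hand and compares it with the built-in function. It uses only the built-in iris data; the names petal_len, petal_wid, r_manual and r_builtin are chosen here for illustration.

```r
# Petal measurements from the built-in iris data
petal_len <- iris$Petal.Length
petal_wid <- iris$Petal.Width

# Pearson r: sum of cross-products of deviations from the means,
# divided by the square root of the product of the sums of squared deviations
r_manual <- sum((petal_len - mean(petal_len)) * (petal_wid - mean(petal_wid))) /
  sqrt(sum((petal_len - mean(petal_len))^2) * sum((petal_wid - mean(petal_wid))^2))

# The built-in function uses Pearson's method by default
r_builtin <- cor(petal_len, petal_wid)

# TRUE when the two values agree up to floating-point error
all.equal(r_manual, r_builtin)
```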

7.2.1.1 Exercise hours vs TV hours

The variables are TV (quantitative variable for the number of hours spent watching TV per week) and Exercise (quantitative variable for the number of hours spent exercising per week). These data come from the StudentSurvey dataset, where the participants are students [23].

plot(x = dataStu$TV, y = dataStu$Exercise, 
     pch=20, col='salmon',
     main='Numbers of hours a week of exercise vs TV watching',
     xlab='TV hours', ylab='Exercise hours')
corStu <- round(cor(dataStu$TV, dataStu$Exercise), digits=2)
legend(x = 25,y=30, paste('r=',corStu,sep=''), bg='yellow')

7.2.1.2 Petal Width vs Petal Length

This is the famous Iris (flower) dataset, used by R. A. Fisher, a genius who laid the foundations of modern statistics. The variables are Petal Length and Petal Width, both quantitative [24].

plot(x = iris$Petal.Length, y = iris$Petal.Width, 
     pch=20, col='skyblue',
     main='Petal width vs length of iris',
     xlab='Petal Length (cm)', ylab='Petal Width (cm)')

corIris <- round(cor(iris$Petal.Length, iris$Petal.Width), digits=2)

legend(x = 5,y=1, paste('r=',corIris,sep=''),bg='yellow')

7.2.2 Scatterplots, including a third variable

7.2.2.1 Student and Balance

This plot shows the average balance that customers carried on their credit cards against their income. Different colors or symbols can be used to distinguish the groups of a third variable. In this case, red indicates students and black indicates non-students [25].

par(mar=c(5, 4, 4, 8), xpd=TRUE)
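plot(x = Default$balance, y = Default$income, 
     pch=20, 
     # color each point by student status, matching the legend below
     col=ifelse(Default$student == 'Yes', 'red', 'black'),
     main='Credit Card Data',
     xlab='Balance($)', ylab='Income($)')
legend(x="right", legend = c('Student','Non-student'), 
       # a negative inset pushes the legend into the right margin
       inset=c(-0.3,0),col=c('red',"black"), pch=20)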
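
The same idea extends to a third variable with more than two groups. The sketch below is an illustration only: it uses the built-in iris data with Species as the grouping variable, and the names group_cols and point_cols, as well as the colors themselves, are arbitrary choices made here.

```r
# One explicit color per level of the grouping variable
group_cols <- c(setosa = 'salmon', versicolor = 'skyblue', virginica = 'black')

# Look up the color for each observation by its species name
point_cols <- unname(group_cols[as.character(iris$Species)])

plot(x = iris$Petal.Length, y = iris$Petal.Width,
     pch = 20, col = point_cols,
     main = 'Petal width vs length by species',
     xlab = 'Petal Length (cm)', ylab = 'Petal Width (cm)')

# The legend reuses the same named vector, so labels and colors cannot drift apart
legend(x = 'topleft', legend = names(group_cols),
       col = group_cols, pch = 20)
```

Building the color vector from a named lookup keeps the mapping explicit, rather than relying on the order of the factor levels and the current palette.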

7.3 Regression Line

Is the chirp rate of crickets associated with temperature?

The data were collected in 1898 by two researchers who measured Chirps (quantitative variable, chirps per minute) and Temperature (quantitative variable, recorded in Celsius). The data are available in StatKey as CricketChirps, although that dataset records temperature in Fahrenheit rather than Celsius [26].
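
Because the packaged data are in Fahrenheit, the temperatures have to be converted before plotting. A minimal sketch of that conversion follows; the helper name f_to_c is hypothetical and is not part of any package.

```r
# Convert degrees Fahrenheit to degrees Celsius (hypothetical helper)
f_to_c <- function(temp_f) (temp_f - 32) / 1.8

# Sanity check with the freezing and boiling points of water
check <- f_to_c(c(32, 212))
all.equal(check, c(0, 100))
```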

7.3.1 Scatter plot

x <- CricketChirps$Chirps
# convert temperature from Fahrenheit to Celsius
y <- round((CricketChirps$Temperature-32)/1.8, 1)
# basic plot, with the default axes suppressed
plot(x = x, y = y, 
     pch=20, 
     main='Temperature vs Chirps',
     xlab='Chirps', ylab='Temperature(C)', 
     xaxt = 'n', yaxt = 'n')
# x axis
axis(side=1, at = seq(80,200,20))

# y axis
ytick<-seq(12.5,27.5, 2.5)
axis(side=2, at=ytick, labels = FALSE)

# draw horizontal tick labels by hand, with one decimal place
text(par("usr")[1], ytick,  
     labels = format(ytick, nsmall = 1), srt = 0, pos = 2, xpd=TRUE)

abline(h=seq(12.5,27.5, 2.5), v=seq(80,200,20), col="gray", lty=3)

7.3.2 Regression model

Fit a simple linear regression model with the lm() function and extract the intercept and slope from the model.

model <- lm(formula =y~x)
slope <- round(model$coefficients[2],2)
intercept <- round(model$coefficients[1],2)
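
For a simple linear regression, the least-squares slope and intercept can also be obtained from summary statistics: the slope is r * sd(y) / sd(x), and the line passes through the point of means. The sketch below checks this against lm(). It is an illustration under stated assumptions: it uses the built-in cars data as a stand-in so that it runs without extra packages, and the names speed, stop_dist, slope_manual, intercept_manual and fit are chosen here.

```r
# Built-in stand-in data: stopping distance as a function of speed
speed     <- cars$speed
stop_dist <- cars$dist

# Least-squares slope from the correlation and the two standard deviations
slope_manual <- cor(speed, stop_dist) * sd(stop_dist) / sd(speed)

# The fitted line passes through (mean(x), mean(y))
intercept_manual <- mean(stop_dist) - slope_manual * mean(speed)

# Same model fitted with lm(); coef() returns intercept first, then slope
fit <- lm(stop_dist ~ speed)
all.equal(unname(coef(fit)), c(intercept_manual, slope_manual))
```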

Add the regression line to the plot and display the equation.

plot(x = x, y = y, 
     pch=20, 
     main='Temperature vs Chirps',
     xlab='Chirps', ylab='Temperature(C)', 
     xaxt = 'n', yaxt = 'n')
# x axis
axis(side=1, at = seq(80,200,20))
# axis(side=2, at = seq(12.5,27.5, 2.5), labels = FALSE)

# y axis
ytick<-seq(12.5,27.5, 2.5)
axis(side=2, at=ytick, labels = FALSE)
text(par("usr")[1], ytick,  
     labels = format(ytick, nsmall = 1), srt = 0, pos = 2, xpd=TRUE)

#grid
abline(h=seq(12.5,27.5, 2.5), v=seq(80,200,20), col="gray", lty=3)

#regression line
lines(x, fitted(model), col="red")

# regression equation, with spaces around the operators
equation <- paste(
  'Temp =', format(intercept, nsmall = 2), '+', format(slope, nsmall = 2), 
  '* Chirps')

text(x = 160, y = 17.5, equation)
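
The equation on the plot can be evaluated for new values of the explanatory variable. As a sketch of how that might look, the example below again uses the built-in cars data as a stand-in (an assumption made so the code runs on its own); predict() and coef() are standard functions, while fit_cars, new_speeds, pred_auto and pred_manual are illustrative names.

```r
# Fit on a data frame so that predict() can match the variable name
fit_cars <- lm(dist ~ speed, data = cars)

# New values must be supplied in a data frame with the same column name
new_speeds <- data.frame(speed = c(10, 20))
pred_auto <- predict(fit_cars, newdata = new_speeds)

# The same predictions computed directly from intercept + slope * x
pred_manual <- coef(fit_cars)[1] + coef(fit_cars)[2] * new_speeds$speed

all.equal(unname(pred_auto), unname(pred_manual))
```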


  23. Slide 2.5, page 3

  24. Slide 2.5, page 3

  25. Slide 2.5, page 16

  26. Slide 2.6, page 3