7 Plots in 2.5 and 2.6 with base R
This section shows one way to generate the plots in Slides 2.5 and 2.6 with base R.
7.1 Set up
We will again use the StudentSurvey dataset from the Lock5withR package, along with R's built-in iris dataset and the Default dataset from the ISLR package. The CricketChirps dataset from Lock5withR is used in the Regression Line part.
library(Lock5withR)
library(ISLR)
7.2 Scatterplot
Scatterplots are used to visualise paired data, where each data point has a value for each of two quantitative variables.
7.2.1 Two example scatter plots
Preparation for plotting.
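# keep only the rows with both TV and Exercise recorded
dataStu <- na.omit(StudentSurvey[c('TV', 'Exercise')])
# correlation coefficients, rounded to 2 decimal places
corStu <- round(cor(dataStu$TV, dataStu$Exercise), digits=2)
corIris <- round(cor(iris$Petal.Length, iris$Petal.Width), digits=2)
# positions and labels for the correlation annotations
annotation <- data.frame(x = c(30, 5), y = c(30, 0.5),
                         label = c(paste('r=', corStu, sep=''),
                                   paste('r=', corIris, sep='')))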
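The na.omit() step matters because cor() returns NA by default when either variable contains a missing value. As a minimal sketch, using R's built-in airquality dataset as a stand-in (an assumption made for illustration; it is not one of the datasets behind the slides), the two approaches below give the same correlation:

```r
# airquality ships with base R; its Ozone column contains missing values
aq <- airquality[c('Ozone', 'Wind')]

# With missing values present, cor() returns NA by default
corRaw <- cor(aq$Ozone, aq$Wind)

# Option 1: drop incomplete rows first, then correlate
aqComplete <- na.omit(aq)
corOmit <- cor(aqComplete$Ozone, aqComplete$Wind)

# Option 2: let cor() skip incomplete pairs itself
corUse <- cor(aq$Ozone, aq$Wind, use = 'complete.obs')

print(c(raw = corRaw, omit = corOmit, use = corUse))
```

Dropping the rows up front has the advantage that the same cleaned data frame can be reused for both the plot and the correlation.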
7.2.1.1 Exercise hours vs TV hours
The variables are TV (quantitative variable for the number of hours spent watching TV per week) and Exercise (quantitative variable for the number of hours spent exercising per week). These data come from the StudentSurvey dataset, where the participants are students.23
plot(x = dataStu$TV, y = dataStu$Exercise,
pch=20, col='salmon',
main='Numbers of hours a week of exercise vs TV watching',
xlab='TV hours', ylab='Exercise hours')
corStu <- round(cor(dataStu$TV, dataStu$Exercise), digits=2)
legend(x = 25, y = 30, paste('r=', corStu, sep=''), bg='yellow')
7.2.1.2 Petal Width vs Petal Length
This is the famous Iris (flower) dataset (used by R. A. Fisher, a genius who created the foundation for modern Statistics). The variables are Petal Length (quantitative variable) and Petal Width (quantitative variable).24
plot(x = iris$Petal.Length, y = iris$Petal.Width,
pch=20, col='skyblue',
main='Petal width vs length of iris',
xlab='Petal Length (cm)', ylab='Petal Width (cm)')
corIris <- round(cor(iris$Petal.Length, iris$Petal.Width), digits=2)
legend(x = 5, y = 1, paste('r=', corIris, sep=''), bg='yellow')
7.2.2 Scatterplots, including a third variable
7.2.2.1 Student and Balance
This example uses the Default dataset from ISLR to plot the average balance that customers carry on their credit card against their income. Use different colors or symbols to display the different groups of a third variable. In this case, red indicates Student while black represents Non-student.25
par(mar=c(5, 4, 4, 8), xpd=TRUE)
plot(x = Default$balance, y = Default$income,
pch=20, col=Default$student,
main='Credit Card Data',
xlab='Balance($)', ylab='Income($)')
legend(x="right", legend = c('Student','Non-student'),
# a negative inset pushes the legend into the right margin (needs xpd=TRUE)
inset=c(-0.3,0), col=c('red',"black"), pch=20)
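In the plot above, col=Default$student relies on R converting the factor to its integer codes and looking them up in the current palette. A sketch of a more explicit alternative, using the built-in iris data so that it runs without extra packages; the name speciesCols and the colour choices are purely illustrative:

```r
# Explicit mapping from factor level to colour (colour choices are illustrative)
speciesCols <- c(setosa = 'black', versicolor = 'red', virginica = 'blue')

# Index the named vector by level name to get one colour per observation
pointCols <- speciesCols[as.character(iris$Species)]

plot(x = iris$Petal.Length, y = iris$Petal.Width,
     pch = 20, col = pointCols,
     main = 'Petal width vs length of iris, by species',
     xlab = 'Petal Length', ylab = 'Petal Width')

# Reusing the same named vector keeps the legend and the points in sync
legend(x = 'topleft', legend = names(speciesCols),
       col = speciesCols, pch = 20)
```

Building the legend from the same named vector means the legend cannot drift out of step with the plotted colours if the order of the levels changes.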
7.3 Regression Line
Is the chirp rate of crickets associated with temperature?
The data were collected (in 1898) by two researchers who measured Chirps (quantitative variable, chirps per minute) and temperature (quantitative variable, recorded in Celsius). The data are available in StatKey, called CricketChirps (although their dataset uses Fahrenheit instead of Celsius).26
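Because the available dataset records temperature in Fahrenheit, the values need to be converted with C = (F - 32) / 1.8 before plotting. A minimal sketch of that conversion, where f_to_c is a hypothetical helper name that is not used elsewhere in this section:

```r
# Hypothetical helper: convert Fahrenheit to Celsius, rounded to 1 decimal place
f_to_c <- function(f) round((f - 32) / 1.8, 1)

# Sanity check with the freezing and boiling points of water
f_to_c(c(32, 212))
```

The two check values correspond to 0 and 100 degrees Celsius.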
7.3.1 Scatter plot
x = CricketChirps$Chirps
# convert Fahrenheit to Celsius
y = round((CricketChirps$Temperature-32)/1.8, 1)
# basic plot, with the default axes suppressed
plot(x = x, y = y,
pch=20,
main='Temperature vs Chirps',
xlab='Chirps', ylab='Temperature(C)',
xaxt = 'n', yaxt = 'n')
# x axis
axis(side=1, at = seq(80,200,20))
# y axis
ytick <- seq(12.5, 27.5, 2.5)
# draw the tick marks only, then add horizontal labels with text()
axis(side=2, at=ytick, labels = FALSE)
text(par("usr")[1], ytick,
     labels = format(ytick, nsmall = 1), srt = 0, pos = 2, xpd=TRUE)
abline(h=seq(12.5,27.5, 2.5), v=seq(80,200,20), col="gray", lty=3)
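The pattern used above is to suppress the default axes with xaxt = 'n' and yaxt = 'n', then draw them by hand with axis(). A sketch of the same idea on the built-in cars dataset (a stand-in chosen only so the example runs without extra packages), where las = 1 is an alternative way to get horizontal y-axis labels without a separate text() call:

```r
# Tick positions chosen to cover the range of the cars data
xtick <- seq(0, 25, 5)
ytick <- seq(0, 120, 20)

plot(x = cars$speed, y = cars$dist, pch = 20,
     main = 'Stopping distance vs speed',
     xlab = 'Speed (mph)', ylab = 'Distance (ft)',
     xlim = range(xtick), ylim = range(ytick),
     xaxt = 'n', yaxt = 'n')

# Draw both axes by hand; las = 1 keeps the y-axis labels horizontal
axis(side = 1, at = xtick)
axis(side = 2, at = ytick, las = 1)

# Dotted grid aligned with the ticks
abline(h = ytick, v = xtick, col = 'gray', lty = 3)
```

Storing the tick positions in variables lets the axes and the grid share one definition.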
7.3.2 Regression model
Fit a simple linear regression model with the lm() function and extract the intercept and slope from the model.
model <- lm(formula = y ~ x)
slope <- round(model$coefficients[2], 2)
intercept <- round(model$coefficients[1], 2)
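For a simple linear regression, the least-squares slope equals r * sd(y) / sd(x) and the intercept equals mean(y) - slope * mean(x). A sketch that checks this against lm(), using the built-in cars dataset as a stand-in for CricketChirps (an assumption made only so the example runs without extra packages); all variable names here are illustrative:

```r
# cars ships with base R: speed (mph) and stopping distance (ft)
cx <- cars$speed
cy <- cars$dist

fit <- lm(cy ~ cx)

# coef() is the accessor equivalent of fit$coefficients
b0 <- coef(fit)[1]   # intercept
b1 <- coef(fit)[2]   # slope

# The same quantities computed from summary statistics
slopeByHand <- cor(cx, cy) * sd(cy) / sd(cx)
interceptByHand <- mean(cy) - slopeByHand * mean(cx)

print(c(lm = unname(b1), byHand = slopeByHand))
print(c(lm = unname(b0), byHand = interceptByHand))
```

The two printed pairs should agree up to floating-point error, which ties the regression slope back to the correlation shown on the scatterplots.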
Add the regression line to the plot and display the equation.
plot(x = x, y = y,
pch=20,
main='Temperature vs Chirps',
xlab='Chirps', ylab='Temperature(C)',
xaxt = 'n', yaxt = 'n')
# x axis
axis(side=1, at = seq(80,200,20))
# axis(side=2, at = seq(12.5,27.5, 2.5), labels = FALSE)
# y axis
ytick <- seq(12.5, 27.5, 2.5)
axis(side=2, at=ytick, labels = FALSE)
text(par("usr")[1], ytick,
     labels = format(ytick, nsmall = 1), srt = 0, pos = 2, xpd=TRUE)
#grid
abline(h=seq(12.5,27.5, 2.5), v=seq(80,200,20), col="gray", lty=3)
#regression line
lines(x, fitted(model), col="red")
# regression model
# sep = ' ' puts a space between each piece of the equation
equation <- paste(
  'Temp =', format(intercept, nsmall = 2), '+', round(slope, 2), '* Chirps',
  sep = ' ')
text(x = 160, y = 17.5, equation)
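As an alternative sketch, abline() accepts a fitted lm object directly and draws the line across the whole plot region, and predict() evaluates the fitted equation at new x values. The example uses the built-in cars dataset as a stand-in (an assumption), and the speed of 15 is an arbitrary illustration:

```r
fit <- lm(dist ~ speed, data = cars)

plot(x = cars$speed, y = cars$dist, pch = 20,
     main = 'Stopping distance vs speed',
     xlab = 'Speed (mph)', ylab = 'Distance (ft)')

# abline() reads the intercept and slope from the model object
abline(fit, col = 'red')

# Predicted distance at speed = 15, computed two ways
predByFormula <- unname(coef(fit)[1] + coef(fit)[2] * 15)
predByPredict <- unname(predict(fit, newdata = data.frame(speed = 15)))
print(c(formula = predByFormula, predict = predByPredict))
```

Fitting with the data argument and column names, rather than free-standing x and y vectors, is what allows predict() to match the new data frame to the model.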