6.6 Continuous vs. Continuous: Scatterplot with optional regression line

Visualizing the relationship between two continuous variables is typically done with a scatterplot.

6.6.1 Base R

The scatterplot is one of the simplest plots to create in base R: call plot() with two continuous variables. Adding a regression line is just as easy. The lm() function fits a linear model (with a single continuous predictor, a simple linear regression), and abline(), when given the fitted lm object, draws the regression line on the current plot.

# With plot, the order is x, y
plot(mydat$stheight, mydat$stweight,
     xlab = "Height (m)", ylab = "Weight (kg)",
     main = "Regression Line")

# Fit regression
reg.fit <- lm(stweight ~ stheight, data = mydat)

# Add regression line
abline(reg.fit)
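To see what abline() draws, here is a minimal sketch. It uses simulated data in place of mydat, and the data frame `sim`, its coefficients, and the noise level are all hypothetical. For a model with one predictor, abline() reads the intercept and slope from coef(), so passing them explicitly as `a` and `b` draws the same line.

```r
# Hypothetical simulated data standing in for mydat
set.seed(1)
sim <- data.frame(stheight = runif(100, 1.4, 1.9))
sim$stweight <- -60 + 70 * sim$stheight + rnorm(100, sd = 5)

# Fit the simple linear regression of weight on height
fit <- lm(stweight ~ stheight, data = sim)
cf <- coef(fit)   # named vector: (Intercept), stheight

plot(sim$stheight, sim$stweight,
     xlab = "Height (m)", ylab = "Weight (kg)")
# Both calls draw the same line: intercept cf[1], slope cf[2]
abline(fit)
abline(a = cf[1], b = cf[2], lty = 2, col = "grey40")
```

The fitted values from lm() lie exactly on this line, which is why abline(fit) is a convenient shortcut.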

To add a smoother that is not constrained to be linear, use lines() together with smooth.spline(), which fits a cubic smoothing spline.

# Add a smoother instead of a line
plot(mydat$stheight, mydat$stweight,
     xlab = "Height (m)", ylab = "Weight (kg)",
     main = "Smoother")

# smooth.spline() does not accept missing values, so keep only complete pairs
SUB <- !is.na(mydat$stheight) & !is.na(mydat$stweight)
lines(smooth.spline(mydat$stheight[SUB], mydat$stweight[SUB]),
      col = "red", lwd = 2)
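The missing-value filter and the amount of smoothing can both be handled a little differently. A minimal sketch on hypothetical simulated vectors `x` and `y` (not mydat): complete.cases() builds the same keep-mask as the two !is.na() tests, and the `df` argument of smooth.spline() fixes the equivalent degrees of freedom instead of letting generalized cross-validation choose the smoothness.

```r
# Hypothetical data with a few missing values, standing in for mydat
set.seed(2)
x <- runif(80, 1.4, 1.9)
y <- -60 + 70 * x + rnorm(80, sd = 5)
x[c(3, 10)] <- NA
y[25] <- NA

# TRUE only where both x and y are observed
keep <- complete.cases(x, y)

plot(x, y, xlab = "Height (m)", ylab = "Weight (kg)")
# Default: smoothness chosen by generalized cross-validation
lines(smooth.spline(x[keep], y[keep]), col = "red", lwd = 2)
# Fixing df at 3 forces a stiffer, less wiggly curve
fit3 <- smooth.spline(x[keep], y[keep], df = 3)
lines(fit3, col = "blue", lwd = 2, lty = 2)
```

Smaller `df` values give smoother curves; larger values let the spline follow the points more closely.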

# Add both a line and a smoother
plot(mydat$stheight, mydat$stweight,
     xlab = "Height (m)", ylab = "Weight (kg)",
     main = "Smoother and Regression Line")

# Fit regression
reg.fit <- lm(stweight ~ stheight, data = mydat)

# Add regression line
abline(reg.fit, col = "blue", lwd = 2)

# Add smoother
SUB <- !is.na(mydat$stheight) & !is.na(mydat$stweight)
lines(smooth.spline(mydat$stheight[SUB], mydat$stweight[SUB]),
      col = "red", lwd = 2)

# Reset the plotting layout to a single panel
par(mfrow = c(1, 1))
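The par(mfrow = c(1, 1)) call only matters if the layout was changed earlier, for example to draw two plots side by side. A minimal sketch of that pattern, using hypothetical simulated vectors `h` and `w` in place of mydat: saving the value returned by par() lets you restore the previous layout afterwards.

```r
# Hypothetical simulated data standing in for mydat
set.seed(3)
h <- runif(100, 1.4, 1.9)
w <- -60 + 70 * h + rnorm(100, sd = 5)

old <- par(mfrow = c(1, 2))   # one row, two panels; keep the old setting
plot(h, w, xlab = "Height (m)", ylab = "Weight (kg)", main = "Regression Line")
abline(lm(w ~ h), col = "blue", lwd = 2)
plot(h, w, xlab = "Height (m)", ylab = "Weight (kg)", main = "Smoother")
lines(smooth.spline(h, w), col = "red", lwd = 2)
par(old)                      # restore the previous layout
```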

6.6.2 ggplot

In ggplot(), use geom_point() to plot the points and geom_smooth() to add the regression line. geom_smooth() draws a confidence band around the line by default (se = TRUE); set se = FALSE to hide it.

mydat %>% 
  ggplot(aes(x = stheight, y = stweight)) +
  geom_point() +
  # method = "lm" assumes the formula is y ~ x
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Height (m)", y = "Weight (kg)")
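geom_smooth(method = "lm") fits the same model as lm(), and the shaded band is the pointwise confidence interval for the mean. A minimal sketch that checks this, assuming ggplot2 is installed and using a hypothetical simulated data frame `sim` in place of mydat: layer_data() returns the values ggplot computed for a layer, which can be compared with predict() on an lm() fit.

```r
library(ggplot2)

# Hypothetical simulated data standing in for mydat
set.seed(4)
sim <- data.frame(stheight = runif(100, 1.4, 1.9))
sim$stweight <- -60 + 70 * sim$stheight + rnorm(100, sd = 5)

p <- ggplot(sim, aes(x = stheight, y = stweight)) +
  geom_point() +
  # level sets the confidence level of the band (0.95 is the default)
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95) +
  labs(x = "Height (m)", y = "Weight (kg)")

# Layer 2 holds the computed line: columns x, y, ymin, ymax, se, ...
band <- layer_data(p, 2)

# The same line and 95% confidence band computed directly from lm()
fit <- lm(stweight ~ stheight, data = sim)
pred <- predict(fit, newdata = data.frame(stheight = band$x),
                interval = "confidence", level = 0.95)
```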

Alternatively, add a smoother that is not constrained to be linear by leaving out method = "lm".

# Add a smoother instead of a line
mydat %>% 
  ggplot(aes(x = stheight, y = stweight)) +
  geom_point() +
  # With no method given, geom_smooth() uses method = "auto":
  # loess for fewer than 1,000 observations, otherwise a GAM
  geom_smooth(se = TRUE) +
  labs(x = "Height (m)", y = "Weight (kg)")
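Because the automatic choice of smoother depends on the number of observations, it can help to name the method explicitly. A minimal sketch, assuming ggplot2 is installed and using a hypothetical simulated data frame `sim` in place of mydat: method = "loess" fixes the smoother, and `span` (passed on to loess()) controls how closely the curve follows the data.

```r
library(ggplot2)

# Hypothetical simulated data with a curved relationship
set.seed(5)
sim <- data.frame(stheight = runif(150, 1.4, 1.9))
sim$stweight <- 20 + 25 * sim$stheight^2 + rnorm(150, sd = 4)

p <- ggplot(sim, aes(x = stheight, y = stweight)) +
  geom_point() +
  # Explicit loess smoother; smaller span values give a wigglier curve
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5, se = TRUE) +
  labs(x = "Height (m)", y = "Weight (kg)")

# Computed curve and band (80 evaluation points by default)
sm <- layer_data(p, 2)
```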

Or add both the linear regression line and a smoother. Only the new part of the code is shown: these two lines replace the single geom_smooth() call in the code above.

  geom_smooth(method = "lm", se = TRUE) +
  geom_smooth(se = FALSE, colour = "red") +
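For reference, a complete, self-contained version of the combined plot. This is a sketch assuming ggplot2 is installed, with a hypothetical simulated data frame `sim` in place of mydat; the smoother is set to loess explicitly here rather than left to the automatic choice.

```r
library(ggplot2)

# Hypothetical simulated data standing in for mydat
set.seed(6)
sim <- data.frame(stheight = runif(150, 1.4, 1.9))
sim$stweight <- -60 + 70 * sim$stheight + rnorm(150, sd = 5)

p <- ggplot(sim, aes(x = stheight, y = stweight)) +
  geom_point() +
  # Linear fit with its confidence band (drawn in blue by default)
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  # Flexible smoother in red, without a band so the plot stays readable
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red") +
  labs(x = "Height (m)", y = "Weight (kg)")

print(p)   # draw the plot (needed when running as a script)
```

Turning off the smoother's band keeps the two confidence regions from overlapping; if the red curve stays inside the blue band, a straight line is probably an adequate summary.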