Chapter 5 LSLR (Least Squares Linear Regression)

  • When do we use this? When the response variable \(Y\) is numeric (quantitative).

5.1 Fit an LSLR Line

m1 <- lm(y ~ x, data = dataset)

  • m1 = replace with any name you want for your model. Names cannot contain spaces or special characters (*, &, etc.), and they cannot start with a number.
  • y = replace with your response variable
  • x = replace with your explanatory variable
  • dataset = replace with the name of your data set

NOTE: This code does NOT plot the line. Its job is to do all the math we need for LSLR: it computes the estimated slope and intercept, \(R^2\), the residuals, the fitted values, and much more, and stores all of it in m1.
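A minimal runnable sketch of the step above, assuming R's built-in cars data set (stopping distance dist versus speed) stands in for your own data. Here dist plays the role of y, speed plays the role of x, and cars plays the role of dataset.

```r
# Built-in data set: 50 cars, speed (mph) and stopping distance (ft)
data(cars)

# Fit the LSLR line: response to the left of ~, explanatory variable to the right
m1 <- lm(dist ~ speed, data = cars)

# Printing the model shows the call and the estimated intercept and slope
print(m1)

# Everything lm() computed is stored inside m1
names(m1)
```

Nothing is drawn here; m1 only holds the results of the calculations.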

5.2 Draw the LSLR Line on Top of a Scatter Plot

  • library(ggplot2)
  • ggplot(data, aes(x = , y = )) + geom_point() + geom_smooth(formula = y ~ x, method = "lm", se = FALSE)
    • data = replace with the name of your data set
    • x = your x variable (no quotes)
    • y = your y variable (no quotes)
    • DO NOT change anything in the geom_smooth part of the code. Leave it as y ~ x: inside geom_smooth, y and x refer to the plot's y and x axes, not to your variable names.
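A short sketch of the template above filled in, assuming the built-in cars data set with speed on the x-axis and dist on the y-axis (these names are illustrative choices, not part of the template).

```r
library(ggplot2)

# Scatter plot of stopping distance against speed, with the LSLR line on top.
# Inside geom_smooth(), y ~ x refers to the plot's y and x aesthetics,
# so it stays exactly as written.
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)

print(p)
```

Only the data set and the two names inside aes() changed; the geom_smooth() call is copied unchanged.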

5.3 Obtain the Summary Table

summary(m1)

  • m1 = replace with the name of your LSLR model
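The summary is also an object you can pull pieces out of. A sketch, assuming the cars data set: its coefficients component is a matrix with one row per coefficient and columns for the estimate, standard error, t statistic, and p-value.

```r
m1 <- lm(dist ~ speed, data = cars)
s1 <- summary(m1)
print(s1)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- s1$coefficients
coef_table

# p-value for the test that the slope is zero
coef_table["speed", "Pr(>|t|)"]
```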

5.4 Obtain the Estimated Slope and Intercept

m1$coefficients

  • m1 = replace with the name of your LSLR model
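A sketch, assuming the cars data set, of reading the two estimates out of the named vector. The extractor function coef() is a standard alternative that returns the same vector.

```r
m1 <- lm(dist ~ speed, data = cars)

# Named vector: first element is the intercept, second is the slope
b <- m1$coefficients
b

# coef() returns the same named vector
coef(m1)

# Pull out each estimate by name
intercept <- b[["(Intercept)"]]
slope <- b[["speed"]]
```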

5.5 Obtain the R-squared

summary(m1)$r.squared

  • m1 = replace with the name of your LSLR model
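With a single explanatory variable, \(R^2\) equals the square of the correlation between x and y. A quick check of that fact, assuming the cars data set:

```r
m1 <- lm(dist ~ speed, data = cars)

r2 <- summary(m1)$r.squared
r2

# With one explanatory variable, R-squared is the squared correlation
r <- cor(cars$speed, cars$dist)
all.equal(r2, r^2)
```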

5.6 Obtain the Residuals

m1$residuals

  • m1 = replace with the name of your LSLR model
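Each residual is the observed response minus the fitted value. A sketch, assuming the cars data set, that checks this directly:

```r
m1 <- lm(dist ~ speed, data = cars)

res <- m1$residuals

# Residual = observed y - fitted y, observation by observation
manual <- cars$dist - m1$fitted.values
all.equal(unname(res), unname(manual))

# When the model has an intercept, the residuals sum to (numerically) zero
sum(res)
```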

5.7 Creating a Residual Plot

To create a residual plot, make a scatter plot with the residuals (from 5.6) on the y-axis and the fitted values (or the explanatory variable) on the x-axis.
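A sketch of a residual plot, assuming the cars data set and the ggplot2 approach from 5.2. The data frame name resid_df and its column names are illustrative choices; the horizontal line at 0 is an optional addition that makes patterns easier to see.

```r
library(ggplot2)

m1 <- lm(dist ~ speed, data = cars)

# Put the fitted values and residuals side by side in a data frame
resid_df <- data.frame(fitted = m1$fitted.values, resid = m1$residuals)

# Fitted values on the x-axis, residuals on the y-axis, reference line at 0
p <- ggplot(resid_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0)

print(p)
```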

5.8 Obtain the Studentized Residuals

library(MASS)

studres(m1)

  • m1 = replace with the name of your LSLR model
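studres() returns externally studentized residuals: each residual is divided by an estimate of its standard deviation computed with that observation left out. Base R's rstudent() computes the same quantity for lm models. A sketch, assuming the cars data set; the cutoff of 3 below is only an illustrative choice for flagging unusual points.

```r
library(MASS)

m1 <- lm(dist ~ speed, data = cars)

sr <- studres(m1)

# Base R's rstudent() gives the same externally studentized residuals
all.equal(sr, rstudent(m1))

# Observations whose studentized residual is large in absolute value
which(abs(sr) > 3)
```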

5.9 Obtain the Predicted (Fitted) Values

m1$fitted.values

  • m1 = replace with the name of your LSLR model
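Each fitted value is the intercept plus the slope times that observation's x. A sketch, assuming the cars data set, that checks this and then uses predict() to get predictions at new x values (the speeds 10 and 21 are arbitrary illustrative values).

```r
m1 <- lm(dist ~ speed, data = cars)

fits <- m1$fitted.values

# Fitted value = intercept + slope * x for each observation
b <- coef(m1)
manual <- b[["(Intercept)"]] + b[["speed"]] * cars$speed
all.equal(unname(fits), manual)

# predict() gives predicted y for new x values; the column name must match x
predict(m1, newdata = data.frame(speed = c(10, 21)))
```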

5.10 Find the Correlation between Two Variables

cor(dataset$var1, dataset$var2)

  • dataset = replace with the name of your data set
  • var1 = replace with the name of your first variable
  • var2 = replace with the name of your second variable
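A sketch, assuming the cars data set. Both variables must be numeric; if either contains missing values, cor() returns NA unless you add use = "complete.obs".

```r
# Correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
r

# Correlation is symmetric: swapping the two variables gives the same value
cor(cars$dist, cars$speed)
```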

5.11 Creating a QQ Plot

  • qqnorm(m1$residuals, main = "The Title You Want")
  • qqline(m1$residuals)
    • m1 = replace with the name of your LSLR model
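A complete sketch, assuming the cars data set. Points that stay close to the reference line suggest the residuals are roughly normal. qqnorm() also returns the plotted coordinates invisibly, which is why its result is saved here.

```r
m1 <- lm(dist ~ speed, data = cars)

# Normal QQ plot of the residuals, then the reference line
qq <- qqnorm(m1$residuals, main = "Normal QQ Plot of Residuals")
qqline(m1$residuals)
```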

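5.12 Analysis of Variance with One Model

  • anova(model)
    • model = replace with the name of your LSLR model

This gives the ANOVA table for one model: the degrees of freedom, sum of squares, mean square, and F-test for each explanatory variable, plus a row for the residuals.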
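A sketch, assuming the cars data set, showing that the sums of squares in the table (model plus residual) add up to the total sum of squares of the response.

```r
m1 <- lm(dist ~ speed, data = cars)

a <- anova(m1)
a

# Total sum of squares of y around its mean
total_ss <- sum((cars$dist - mean(cars$dist))^2)

# Model SS + residual SS = total SS
all.equal(sum(a[["Sum Sq"]]), total_ss)
```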

5.13 Nested F-test

anova(model1, model2)

  • model1 = the smaller model
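  • model2 = the larger model; model1 must be nested in model2 (every explanatory variable in model1 also appears in model2), and both models must be fit to the same data.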
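A sketch of a nested F-test, assuming the built-in mtcars data set: the smaller model predicts mpg from wt, and the larger model adds hp, so the smaller model is nested in the larger one.

```r
# Smaller model: fuel economy explained by weight only
model1 <- lm(mpg ~ wt, data = mtcars)

# Larger model: adds horsepower, so model1 is nested in model2
model2 <- lm(mpg ~ wt + hp, data = mtcars)

# Nested F-test: does adding hp significantly reduce the residual sum of squares?
out <- anova(model1, model2)
out
```

A small p-value in the second row favors keeping the extra predictor.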

5.14 Best Subset Selection

library(leaps)

BSSOut <- regsubsets( y ~ x1 + x2 + x3, data = , nvmax = )

  • Replace y with your response variable
  • Replace x1, x2, x3 with your possible predictors (as many as you like)
  • data = your data set
  • nvmax = the largest number of predictor columns (not counting the intercept) a model may contain; R finds the best model of each size from 1 up to nvmax
  • NOTE: For categorical variables, make sure this number accounts for the levels: every level except the baseline becomes its own column, so one categorical predictor with 4 levels contributes 3 to the nvmax total.

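plot(BSSOut, scale = "adjr2")

  • This plots the best model of each size: each row is a model, shaded squares show which predictors it includes, and the rows are ordered by the chosen statistic, so the top row is the best model by that metric.
  • You can change "adjr2" (\(R^2_{adj}\)) to "bic" to use BIC as the metric, or to "Cp" to use Mallows' Cp.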
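A sketch of best subset selection, assuming the built-in mtcars data set with four candidate predictors of mpg (the choice of predictors is illustrative). In the leaps package, plot() on a regsubsets object picks the statistic through its scale argument, and summary() returns the same statistics as numbers.

```r
library(leaps)

# Four candidate predictors; nvmax = 4 allows models with up to 4 of them
BSSOut <- regsubsets(mpg ~ wt + hp + disp + qsec, data = mtcars, nvmax = 4)

# Each row is a model; shaded squares mark the predictors it includes
plot(BSSOut, scale = "adjr2")

# Adjusted R-squared of the best model of each size (1 to 4 predictors)
s <- summary(BSSOut)
s$adjr2

# Size of the model with the highest adjusted R-squared, and its coefficients
best_size <- which.max(s$adjr2)
coef(BSSOut, id = best_size)
```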