Chapter 5 LSLR Regression
- When do we use this? When the response variable \(Y\) is numeric.
5.1 Fit a LSLR Line
m1 <- lm( y ~ x , data = dataset)
- m1 = replace with any name you want for your model. However, names cannot contain spaces or special characters (*, &, etc.).
- y = replace with your response variable
- x = replace with your explanatory variable
- dataset = replace with the name of your data set
NOTE: This code does NOT plot the line. The job of this code is to do all the math we need for LSLR. It computes the estimated slope and intercept, the R squared, the residuals, the fitted values, and a lot more!
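As a quick illustration, here is the template filled in with R's built-in mtcars data set. The choice of mtcars, mpg, and wt is only an assumption for the example; swap in your own data set and variables.

```r
# Fit a least squares line predicting fuel economy (mpg) from weight (wt).
# mtcars ships with R, so no data needs to be loaded first.
m1 <- lm(mpg ~ wt, data = mtcars)

# Printing the model shows the call plus the estimated intercept and slope.
print(m1)

# The model object stores many pieces of the fit; names() lists them.
names(m1)
```

Nothing is drawn here. The model object simply holds the results that the later sections pull out.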
5.2 To Draw the LSLR line on top of a scatter plot
library(ggplot2)
ggplot( data , aes( x = , y = ) ) + geom_point( ) + geom_smooth(formula = y ~ x, method="lm", se=FALSE)
- data = replace with the name of your data set
- x = your x variable (no quotes)
- y = your y variable (no quotes)
- DO NOT change anything in the geom_smooth part of the code. Leave it as y ~ x.
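A filled-in version might look like the sketch below, again assuming the built-in mtcars data with wt on the x axis and mpg on the y axis. Notice that the formula inside geom_smooth stays as y ~ x: there, x and y refer to the plot's axes, not to the variable names in your data set.

```r
library(ggplot2)

# Scatter plot of mpg against wt with the least squares line drawn on top.
# mtcars, wt, and mpg are illustrative stand-ins for your own data.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)

print(p)
```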
5.3 Obtain the Summary Table
summary(m1)
- m1 = replace with the name of your LSLR model
5.4 Obtain the Estimated Slope and Intercept
m1$coefficients
- m1 = replace with the name of your LSLR model
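Once you have the coefficients, you can use them to compute a prediction by hand. The sketch below assumes a model of mpg on wt from the built-in mtcars data and a hypothetical new car with wt = 3; the comparison against predict() is just a check.

```r
m1 <- lm(mpg ~ wt, data = mtcars)

# Named vector holding the estimated intercept and slope.
b <- m1$coefficients
b

# Pull each estimate out by name.
intercept <- b["(Intercept)"]
slope <- b["wt"]

# Predicted mpg when wt = 3: intercept + slope * x.
y_hat <- unname(intercept + slope * 3)
y_hat

# Same prediction from R's built-in predict() function.
predict(m1, newdata = data.frame(wt = 3))
```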
5.5 Obtain the R-squared
summary(m1)$r.squared
- m1 = replace with the name of your LSLR model
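To see where this number comes from, the R-squared can be rebuilt from the residuals as 1 - SSE/SST. This is a small sketch using the built-in mtcars data as an assumed example.

```r
m1 <- lm(mpg ~ wt, data = mtcars)

# SSE: sum of squared residuals (variation the line does not explain).
sse <- sum(m1$residuals^2)

# SST: total variation of the response around its mean.
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared is the proportion of total variation explained by the line.
r2_by_hand <- 1 - sse / sst

# Should agree with the value R reports.
all.equal(r2_by_hand, summary(m1)$r.squared)
```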
5.6 Obtain the residuals
m1$residuals
- m1 = replace with the name of your LSLR model
5.7 Creating a residual plot
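To create a residual plot, create a scatter plot with the residuals (Section 5.6) on the Y axis and the fitted values (Section 5.9) on the X axis. With a single explanatory variable, you can also put that variable on the X axis instead.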
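One possible way to build this plot with ggplot2 is sketched below. The mtcars model and the names resid_df, fitted, and residual are assumptions made for the example.

```r
library(ggplot2)

m1 <- lm(mpg ~ wt, data = mtcars)

# Put the fitted values and residuals side by side in a small data frame.
resid_df <- data.frame(
  fitted = m1$fitted.values,
  residual = m1$residuals
)

# Residuals on the Y axis, fitted values on the X axis,
# with a dashed reference line at zero.
p <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")

print(p)
```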
5.8 Obtain the studentized residuals
library(MASS)
studres(m1)
- m1 = replace with the name of your LSLR model
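A common use of studentized residuals is to flag unusual observations. The sketch below uses the built-in mtcars data and a cutoff of 2 in absolute value; that cutoff is a rule of thumb assumed for illustration, not something fixed by the function.

```r
library(MASS)

m1 <- lm(mpg ~ wt, data = mtcars)

# One studentized residual per observation.
sr <- studres(m1)

# Show only the observations whose studentized residual is
# larger than 2 in absolute value (assumed cutoff).
flagged <- sr[abs(sr) > 2]
flagged
```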
5.9 Obtain the predicted (fitted) values
m1$fitted.values
- m1 = replace with the name of your LSLR model
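The fitted values and the residuals fit together: each observed value equals its fitted value plus its residual. Here is a short check of that, assuming the mtcars model used for illustration.

```r
m1 <- lm(mpg ~ wt, data = mtcars)

# Line up observed values, fitted values, and residuals.
check <- data.frame(
  observed = mtcars$mpg,
  fitted = m1$fitted.values,
  residual = m1$residuals
)
head(check)

# observed = fitted + residual for every row.
all.equal(check$observed, check$fitted + check$residual)
```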
5.10 Find the Correlation between Two Variables
cor(dataset$var1, dataset$var2)
- dataset = replace with the name of your data set
- var1 = replace with the name of your first variable
- var2 = replace with the name of your second variable
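For a regression with one explanatory variable, the correlation connects back to the model: its square matches the R-squared, and its sign matches the sign of the slope. The sketch below checks both, assuming the mtcars variables wt and mpg as the example.

```r
# Correlation between weight and fuel economy.
r <- cor(mtcars$wt, mtcars$mpg)
r

m1 <- lm(mpg ~ wt, data = mtcars)

# The squared correlation equals the R-squared of the one-predictor model.
all.equal(r^2, summary(m1)$r.squared)

# The correlation and the slope have the same sign.
same_sign <- unname(sign(r) == sign(m1$coefficients["wt"]))
same_sign
```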
5.11 Creating a QQPlot
qqnorm(m1$residuals, main ="The Title you Want")
qqline(m1$residuals)
- m1 needs to be replaced with the name of your model
5.12 Analysis of Variance With one model only
anova(model)
This gives you the breakdown of the sum of squares for one model.
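To see the breakdown in action, the sketch below adds up the sum of squares column and compares it to the total variation in the response. The mtcars model is an assumed example.

```r
m1 <- lm(mpg ~ wt, data = mtcars)

# ANOVA table: one row for the predictor, one for the residuals.
tab <- anova(m1)
tab

# The "Sum Sq" column adds up to the total sum of squares (SST).
ss_total <- sum(tab[["Sum Sq"]])
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
all.equal(ss_total, sst)
```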
5.13 Nested F-test
anova(model1, model2)
- model1 = the smaller model
- model2 = the larger model; model1 must be nested in model2
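As an example, the sketch below tests whether adding hp improves a model that already contains wt. The models, the names small and large, and the mtcars data are assumptions for illustration. The smaller model is nested because every term in it also appears in the larger one.

```r
# Smaller model: one predictor.
small <- lm(mpg ~ wt, data = mtcars)

# Larger model: same predictor plus one more.
large <- lm(mpg ~ wt + hp, data = mtcars)

# Nested F-test comparing the two fits.
ftest <- anova(small, large)
ftest

# The p-value sits in the second row of the "Pr(>F)" column.
p_value <- ftest[["Pr(>F)"]][2]
p_value
```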
5.14 Best Subset Selection
library(leaps)
BSSOut <- regsubsets( y ~ x1 + x2 + x3, data = , nvmax = )
- Replace y with your response variable
- Replace x1, x2, x3 with your possible predictors (as many as you like)
- data = your data set
- nvmax = the total number of coefficients you want R to consider
- NOTE: For categorical variables, you have to make sure this number incorporates the number of levels. One categorical predictor with 4 levels would contribute 3 to the nvmax total.
plot(BSSOut, scale = "adjr2")
- This creates a plot to allow you to see your outcome.
- You can change adjr2, which means \(R^2_{adj}\), to bic if you want to use the BIC as a metric or Cp if you want to use Mallows' Cp.
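Besides the plot, you can read the metrics directly from the summary of the output. The sketch below assumes the built-in mtcars data with four candidate predictors (wt, hp, disp, qsec) and picks the model size with the largest adjusted R-squared; bss_summary and best_size are names made up for the example.

```r
library(leaps)

# Consider all subsets of four numeric predictors, up to size 4.
BSSOut <- regsubsets(mpg ~ wt + hp + disp + qsec, data = mtcars, nvmax = 4)

# summary() stores one value of each metric per model size.
bss_summary <- summary(BSSOut)
bss_summary$adjr2

# Model size with the largest adjusted R-squared.
best_size <- which.max(bss_summary$adjr2)
best_size

# Coefficients of the best model of that size.
coef(BSSOut, best_size)
```

The same summary object also holds bic and cp, so which.min(bss_summary$bic) would pick the size with the smallest BIC.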