Chapter 6 Logistic Regression

When do we use this?: When $Y$ is a binary variable (a categorical variable with exactly 2 outcomes).

6.1 Fitting the Model

m1 <- glm( y ~ x, data = , family = binomial)

Replace y with your categorical response variable.
Replace x with your explanatory variable.
data = your data set

6.2 Creating Mosaic Plots

mosaicplot( x ~ y, data = , xlab = "X Axis Label", ylab = "Y Axis Label" , main = "Plot Title")

Replace y with your categorical response variable.
Replace x with your categorical explanatory variable.
data = your data set
You can change what is in the parentheses to reflect the labels and title you want.

6.3 Creating an Empirical Logit Plot

logOddsPlot <- function(x, y, xname, formulahere){
  sort = order(x)
  x = x[sort]
  y = y[sort]
  a = seq(1, length(x), by=.05*length(y))
  b = c(a[-1] - 1, length(x))
  
  prob = xmean = ns = rep(0, length(a)) # ns is for CIs
  for (i in 1:length(a)){
    range = (a[i]):(b[i])
    prob[i] = mean(y[range])
    xmean[i] = mean(x[range])
    ns[i] = b[i] - a[i] + 1 # for CI 
  }
  
  extreme = (prob == 1 | prob == 0)
  #prob[prob == 0] = min(prob[!extreme])
  #prob[prob == 1] = max(prob[!extreme])
  prob[prob == 0] = .01
  prob[prob == 1] = .99
  
  g = log(prob/(1-prob))
  
  dataHere <- data.frame("x" = b, "LogOdds" = g)
  
  suppressMessages(library(ggplot2))
  
  ggplot(dataHere, aes(x =xmean, y = LogOdds)) + geom_point() + geom_smooth(formula = formulahere, method = "lm", se = FALSE) + labs(x = xname, y = "Log Odds")
}

Citation: This code was adapted from (http://alexschell.github.io/emplogit.html) [Alex Schell’s website].

To use the function, you will need to specify the inputs, meaning the pieces of information the computer needs in order to run the code.

x: the column containing the x variable (Titanic$Fare)
y: the column containing the y variable (Titanic$Survived). Note that this code sometimes requires you to put as.numeric around your variable (as.numeric(Titanic$Survived)) as it is specifically looking for a number between 0 and 1.
xlab: This is the x axis label you want on your graph, like “Age”. Make sure you put quotes around your label.
formulahere: This is the shape of the relationship you want. If you want a line, use y~x. If you want a 2nd order polynomial, use y~poly(x,2)

6.4 Making Predictions

probabilities <- predict(Model, type = "response")

predicted.Y <- ifelse(probabilities > THRESHOLD , 1, 0)

The first line creates predicted probability that $Y_i = 1$ for each row $i$ .
The second row assign all rows with predicted probabilities greater than THRESHOLD as a 1, and all probabilities less than or equal to THRESHOLD as 0.
Replace “model” with the name of your model.
Replace DATA with the name of your data set.
Replace THRESHOLD with your chosen threshold value.

6.5 Confusion Matrix

knitr::kable( table("Prediction"= predicted.Y, "Actual" = DATA$Y) )

Replace DATA with the name of your data set.
Replace Y with your Y variable.

6.6 Hypothesis Testing

anova(Model1,Model2, test = "LRT")

This code is for a Nested Likelihood Ratio Test.
Replace Model1 with the name of your smaller model, and Model2 with the name of your larger model.

anova(Model1, test = "Chisq")

This code is for a Drop in Deviance Test.
Replace Model1 with the name of your model.