17 Lab 8 (R)

17.1 Lab Goals & Instructions

Goals

Review key concepts from lab
Run through an applied example similar to your final paper
Bonus! Learn about AIC/BIC

Instructions

In today’s lab, you will follow along live. If you’d like to try the examples on your own, I’ve included both the incomplete and the complete script files for today’s examples.

Lab Files

Download UM_DEIStudentSurvey.rda

Lab file without full code
Download 401-1-Review.R

Lab file with complete code
Download 401-1-Review_Complete.R

Reference code for formatting graphs in ggplot
Download formatggplot.R

17.2 BONUS: AIC/BIC

This is a topic we may not cover in class this quarter, but it is helpful to know when deciding what your final model should be. As you add or remove variables, you may wonder whether they improve your model. Perhaps you add a variable and it is not significant. Should you remove it just because it’s not significant, or should you keep it? Comparing the AIC and BIC of each model can help you decide. (Of course, if your hypothesis states that you are testing that variable, it should stay in the model regardless.)

What are AIC and BIC?
AIC stands for Akaike Information Criterion. BIC stands for Bayesian Information Criterion. Both are forms of probabilistic model selection. Probabilistic model selection, or “information criteria,” assigns each model a score that balances fit (how well the model explains the data) against parsimony (how few parameters it uses), so you can compare competing models.
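To see why a parsimony penalty is needed at all, here is a minimal R sketch. It uses R’s built-in mtcars data as a stand-in for the lab’s survey data (an assumption; none of the survey’s variables are used), adds a column of pure random noise as a predictor, and compares fit. For nested linear models, the log-likelihood (a pure measure of fit) cannot go down when you add a predictor, even a useless one, so fit alone would always favor the bigger model. The names noise, small, and big are illustrative.

```r
# Built-in mtcars data stands in for the lab's survey data (assumption).
data(mtcars)
set.seed(401)                        # arbitrary seed so the noise column is reproducible
mtcars$noise <- rnorm(nrow(mtcars))  # hypothetical predictor that is pure random noise

small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + noise, data = mtcars)

# Log-likelihood measures fit only; for nested linear models it never
# decreases when a predictor is added, even a meaningless one.
ll_small <- as.numeric(logLik(small))
ll_big   <- as.numeric(logLik(big))
c(small = ll_small, big = ll_big)

# AIC and BIC add a penalty for the extra coefficient, so they can
# disagree with the raw log-likelihood and prefer the smaller model.
c(small = AIC(small), big = AIC(big))
c(small = BIC(small), big = BIC(big))
```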

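To understand what these measures are doing, you need to understand Maximum Likelihood Estimation (MLE). MLE is the estimation method underlying most regression models: it chooses the coefficient estimates that maximize the likelihood of your observed data. (For ordinary linear regression with normally distributed errors, the least-squares coefficients are identical to the MLE coefficients.) Without going into a full mathematical explanation of MLE and the log-likelihood function it uses, I will simply note that both AIC and BIC start from your model’s maximized log-likelihood, which measures fit, and add a penalty for the number of parameters the model estimates. BIC’s penalty also grows with the sample size, so it usually favors smaller models than AIC does.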
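The formulas themselves are short: AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where L is the maximized likelihood, k is the number of estimated parameters, and n is the number of observations. The sketch below, again assuming the built-in mtcars data as a stand-in for the lab data, computes both by hand from logLik() and checks them against R’s AIC() and BIC(). Note that for an lm() model, R counts the residual variance as a parameter, so k is the number of coefficients plus one.

```r
# Built-in mtcars data stands in for the lab's survey data (assumption).
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(model)    # maximized log-likelihood, ln(L)
k  <- attr(ll, "df")   # estimated parameters: 3 coefficients + residual variance
n  <- nobs(model)      # number of observations used in the fit

# The two formulas, written out directly.
aic_by_hand <- 2 * k - 2 * as.numeric(ll)
bic_by_hand <- log(n) * k - 2 * as.numeric(ll)

# Compare with R's built-in functions.
c(by_hand = aic_by_hand, built_in = AIC(model))
c(by_hand = bic_by_hand, built_in = BIC(model))
```

The only difference between the two is the per-parameter penalty: AIC charges 2, while BIC charges ln(n), which is larger than 2 once n is 8 or more (here ln(32) is about 3.47).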

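AIC and BIC are extremely easy to compute after a regression, which is why I find them so helpful for comparing models. The lower the AIC or BIC, the better the model! Keep two caveats in mind: a single AIC or BIC value means nothing on its own, and scores are only comparable across models fit to the same observations with the same outcome variable.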
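In practice, you can pass several fitted models to AIC() or BIC() at once, and R returns a small comparison table with one row per model. This sketch assumes the built-in mtcars data and three illustrative nested models (m1, m2, m3); with the lab data you would substitute your own outcome and predictors.

```r
# Built-in mtcars data stands in for the lab's survey data (assumption).
data(mtcars)

# Candidate models, all fit to the same 32 rows and the same outcome (mpg).
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# With several models, AIC() and BIC() return a data frame: the number of
# estimated parameters (df) and the criterion value for each model.
aic_table <- AIC(m1, m2, m3)
bic_table <- BIC(m1, m2, m3)
aic_table
bic_table

# The model with the smallest value is the one preferred by that criterion.
rownames(aic_table)[which.min(aic_table$AIC)]
rownames(bic_table)[which.min(bic_table$BIC)]
```

One thing to watch with survey data: if a variable you add has missing values, lm() drops those rows, so the larger model is fit to fewer observations and the scores are no longer comparable. R warns when the models were not all fit to the same number of observations; one way to avoid this is to fit every candidate model on the same complete-case subset of the data.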

Here’s a resource to learn more about information criteria: https://machinelearningmastery.com/probabilistic-model-selection-measures/