An R Companion to Applied Regression

Author

Peter Baumgartner

Published

2024-06-23 11:09

Preface

This is work in progress

I have finished everything up to chapter 3. Currently I am working on section 4.2 “Multiple linear regression”.

WATCH OUT: This is my personal learning material and is therefore neither an accurate replication nor an authoritative textbook.

I am writing this book as a text for others to read because that forces me to become explicit and explain all my learning outcomes more carefully. Please keep in mind that this text is not written by an expert but by a learner.

I have skipped text passages whose content I am already familiar with. Sections of the original text where I needed more in-depth knowledge I have elaborated on, adding my own comments that resulted from my personal research.

Be warned! Although I replicate most of the content, this Quarto book may contain many mistakes. All misapprehensions and errors are, of course, my own responsibility.

Content and Goals of this Book

This Quarto book collects my personal notes, trials and exercises on An R Companion to Applied Regression by John Fox and Sanford Weisberg (J. Fox and Weisberg 2018).

The R companion refers to a text or course on modern applied regression, such as “Applied Regression Analysis and Generalized Linear Models” (J. Jr. Fox 2015) and “Applied Linear Regression” (Weisberg 2013).

The R companion is associated with three R packages:

  • {car}: Companion to Applied Regression. It includes R functions (programs) for performing many tasks related to applied regression analysis, including a variety of regression graphics. (See: Package Profile A.3)
  • {effects}: Effect Displays for Linear, Generalized Linear, and Other Models. It is useful for visualizing regression models of various sorts that have been fit to data. (See: Package Profile A.5)
  • {carData}: Companion to Applied Regression Data Sets. It provides convenient access to data sets used in the book. (See: Package Profile A.4)

Text passages

Quotes and personal comments

My text consists mostly of quotes from the third edition of the R companion. Often I made minor edits (e.g., shortening the text) or put the content into my own words. In those cases I couldn’t mark the text as a quote, because it no longer corresponds to a specific passage in the book. In any case, most of the text in this Quarto book is not mine but comes from different resources (the R companion book, R help files, websites). My own personal notes I have either put into a remark or note box, or I have made clear that they are my own thoughts.

Glossary

I am using the {glossary} package (Glossaries for Markdown and Quarto Documents) to create links to glossary entries. (See: Package Profile A.10)

R Code 1 : Load glossary

Listing / Output 1: Install and load the glossary package with the appropriate glossary.yml file
## Install the glossary package first, e.g. the development version with
## remotes::install_github("debruine/glossary"); see
## https://debruine.github.io/glossary/

library(glossary)

## If you want to use my glossary.yml file:

## 1. fork my repo
##    https://github.com/petzi53/glossary-pb

## 2. Download the `glossary.yml` file from
##    https://github.com/petzi53/glossary-pb/blob/master/glossary.yml

## 3. Store the file on your hard disk
##    and change the following path accordingly

glossary::glossary_path("../glossary-pb/glossary.yml")

If you hover with your mouse over the double-underlined links, a window opens with the appropriate glossary text. Try this example: Z-Score.

WATCH OUT! Use the glossary text at your own risk

I added many of the glossary entries while working through other books, either taking the text passage from the book I was reading or researching other resources on the internet. I have added the source of each glossary entry. Sometimes I have used abbreviations, but I still need to provide a key to what these short references mean.

One of the books I have read, Statistics With R: Solving Problems Using Real-World Data (abbreviated SwR) by Jenine Harris (Harris 2020), has its own glossary, which I have taken over with copy and paste. But I have also used definitions from other books such as Statistical rethinking: a Bayesian course with examples in R and Stan by Richard McElreath (McElreath 2020) or from websites such as Statistics How-To (Glen, n.d.) and Wikipedia (Wikipedia 2024).

If you fork the repository of this Quarto book, the glossary will not work out of the box. Download the glossary.yml file from my glossary-pb GitHub repo, store it on your hard disk and change the path in the code chunk Listing / Output 1.
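A small guard in the setup chunk can make a fork fail more gracefully when the file is missing. The following is only a sketch under assumptions: the relative path is the one from Listing / Output 1, the variable names glossary_file and glossary_ready are hypothetical, and the only {glossary} function used is glossary_path() from that listing.

```r
# Path from Listing / Output 1; adjust it to where you stored the file.
glossary_file <- "../glossary-pb/glossary.yml"

# TRUE only if the file is present AND the {glossary} package is installed.
glossary_ready <- base::file.exists(glossary_file) &&
  base::requireNamespace("glossary", quietly = TRUE)

if (glossary_ready) {
  glossary::glossary_path(glossary_file)
} else {
  base::message(
    "Glossary not set up: check the path '", glossary_file,
    "' and whether {glossary} is installed."
  )
}
```

With this guard the book still renders when the glossary is missing, and the message points to the two most likely causes.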

In any case, I alone am responsible for this text, especially if I have used code from the resources wrongly or misunderstood a quoted text passage.

R Code and Datasets

Packages

I have downloaded the three packages mentioned above ({car}, {effects}, and {carData}) from CRAN and installed them on my machine. There is an additional package, {carEx}, with “Supplemental and Experimental Functions”. It contains functions meant to supplement those in the {car} package as well as experimental functions that may eventually be included in the {car} package.

I have downloaded this package from R-Forge with the command install.packages("carEx", repos="http://R-Forge.R-project.org"). I got a warning that the repository was not accessible, but in the end my system installed the source package.

Important 1: Note for package binaries

R-Forge provides binaries only for the most recent version of R, but not for older versions. In order to successfully install the packages provided on R-Forge, you have to switch to the most recent version of R or, alternatively, install from the package sources (.tar.gz).
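The two installation routes described in this note could be wrapped as follows. This is a sketch, not necessarily the procedure used above: install_from_rforge() and its from_source argument are hypothetical names, and only utils::install.packages() with its repos and type arguments is standard R. Installing from source may require build tools on your system.

```r
# Hypothetical helper: install a package from R-Forge, either with the
# platform default (binary where available) or explicitly from source.
install_from_rforge <- function(pkg, from_source = FALSE) {
  utils::install.packages(
    pkg,
    repos = "http://R-Forge.R-project.org",
    type  = if (from_source) "source" else base::getOption("pkgType")
  )
}

# Usage (not run here, because it needs network access):
# install_from_rforge("carEx")                      # platform default
# install_from_rforge("carEx", from_source = TRUE)  # force the .tar.gz
```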

Datasets

R companion

There is a page where one can download the R scripts for every chapter. The files on this page can also be conveniently downloaded by the car::carWeb() function in the {car} package: see ?carWeb for details.

I have downloaded all the files with the command car::carWeb(setup=TRUE) and then stored them in my _archive folder, which is ignored by Git.

Referenced book

The datasets for the referenced book “Applied Regression Analysis and Generalized Linear Models”, as listed on the Data Sets page, are included in the {carData} package, so no download was necessary. But there are also datasets available for the chapter exercises, which I have downloaded as a .zip file and imported at the appropriate place in my notebook.
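To see which datasets a package ships, one can query it directly. The sketch below relies on utils::data(package = ...), which is standard R; the helper name list_package_datasets() is hypothetical, and the {carData} call is guarded because the package may not be installed.

```r
# Hypothetical helper: names of all datasets shipped with an installed package.
list_package_datasets <- function(pkg) {
  # data(package = ) returns an object whose $results matrix has an "Item" column
  base::unname(utils::data(package = pkg)$results[, "Item"])
}

# Works for any installed package, e.g. the base {datasets} package:
utils::head(list_package_datasets("datasets"))

# Only queried if {carData} is available on this machine:
if (base::requireNamespace("carData", quietly = TRUE)) {
  utils::head(list_package_datasets("carData"))
}
```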

Note 1: Using own data sets

There is an interesting note on the page for downloading the datasets:

As a general matter, you should feel free to substitute appropriate data sets of interest to you for those suggested in the various data-analysis exercises.

I am not sure whether this will work out on my learning path, but I will in any case try to follow this recommendation. It may be not only fun but also an occasion to apply the new knowledge to subjects that interest me.

Style guide

I am using the Tidyverse Style Guide. I am going to use underscores (_), i.e. snake case, to replace spaces, as studies have shown that it is easier to read (Sharif and Maletic 2010).

But I will adopt two additions from Google’s R Style Guide:

  • Start the names of private functions with a dot.
  • Qualify namespace.

Especially the second point (qualifying namespaces) is important for my learning. Besides preventing conflicts between functions of identical names from different packages, it helps me to learn (or remember) which function belongs to which package. I think this justifies the small overhead and helps to make R code chunks self-sufficient (no previous package loading or library calls in the setup chunk). To foster learning the relation between function and package, I enclose the package name in curly braces and format it in bold.
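A short illustration of both additions, using only packages that ship with R. The helper .slope_of() is a hypothetical private function, and the mtcars data serve purely as a stand-in example.

```r
# Private helper: the leading dot marks it as internal.
.slope_of <- function(model) {
  base::unname(stats::coef(model)[2])
}

# Every call is namespace-qualified, so the chunk runs without library().
fit <- stats::lm(mpg ~ wt, data = datasets::mtcars)
slope <- .slope_of(fit)
slope
```

Reading the chunk, it is immediately visible that lm() and coef() come from {stats} and that mtcars comes from {datasets}.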

I am using the package name also for the packages of the default base R installation. This wouldn’t be necessary, but it helps me to understand where the base R functions come from. What follows is a list of the base R packages of the system library that are included in every installation and attached (opened) by default:

  • {base}: The R Base Package
  • {datasets}: The R Datasets Package
  • {graphics}: The R Graphics Package
  • {grDevices}: The R Graphics Devices and Support for Colours and Fonts
  • {methods}: Formal Methods and Classes
  • {stats}: The R Stats Package
  • {utils}: The R Utils Package
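Which packages are attached in a given session can be checked from R itself. A minimal sketch with base functions only; the variable names are hypothetical, and the result depends on how the session was started and on which packages were loaded afterwards.

```r
# search() lists the search path; attached packages appear as "package:<name>".
search_path <- base::search()
attached_pkgs <- base::sub(
  "^package:", "",
  base::grep("^package:", search_path, value = TRUE)
)

# Packages R is configured to attach at startup ({base} is always attached).
default_pkgs <- base::getOption("defaultPackages")

base::print(attached_pkgs)
base::print(default_pkgs)
```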

I do not always use the exact code snippets in my replications, because I am replicating the code not only to see how it works but also to change parameter values and observe their influence.

Resources

Glossary

Term: Definition
CRAN: Comprehensive R Archive Network
R-Forge: R-Forge offers a central platform for the development of R packages, R-related software and further projects. It is based on FusionForge offering easy access to the best in SVN, daily built and checked packages, mailing lists, bug tracking, message boards/forums, site hosting, permanent file archival, full backups, and total web-based administration. (R-Forge Home, https://r-forge.r-project.org/)
SwR: SwR is my abbreviation of: Harris, J. K. (2020). Statistics With R: Solving Problems Using Real-World Data (Illustrated Edition). SAGE Publications, Inc.
Z-score: A z-score (also called a standard score) gives you an idea of how far from the mean a data point is. But more technically it’s a measure of how many standard deviations below or above the population mean a raw score is. (StatisticsHowTo, https://www.statisticshowto.com/probability-and-statistics/z-score/#Whatisazscore)
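
The z-score definition can be worked through in a few lines. The numbers below are made up for illustration and are treated as a complete population, so the population standard deviation (dividing by n) is used; stats::sd() would divide by n - 1 instead.

```r
raw_scores <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up population

pop_mean <- base::mean(raw_scores)                             # 5
pop_sd   <- base::sqrt(base::mean((raw_scores - pop_mean)^2))  # 2

# Number of standard deviations each raw score lies from the mean
z_scores <- (raw_scores - pop_mean) / pop_sd
z_scores
# -1.5 -0.5 -0.5 -0.5  0.0  0.0  1.0  2.0
```

A raw score of 9, for instance, lies two standard deviations above the mean.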

Session Info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.5
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zagreb
#>  date     2024-06-23
#>  pandoc   3.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
#>  colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.4.0)
#>  commonmark    1.9.1      2024-01-30 [1] CRAN (R 4.4.0)
#>  curl          5.2.1      2024-03-01 [1] CRAN (R 4.4.0)
#>  digest        0.6.35     2024-03-11 [1] CRAN (R 4.4.0)
#>  evaluate      0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
#>  fastmap       1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  glossary    * 1.0.0.9003 2024-04-25 [1] Github (debruine/glossary@05e4a61)
#>  glue          1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
#>  highr         0.11       2024-05-26 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
#>  htmlwidgets   1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
#>  jsonlite      1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
#>  kableExtra    1.4.0      2024-01-24 [1] CRAN (R 4.4.0)
#>  knitr         1.47       2024-05-29 [1] CRAN (R 4.4.0)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
#>  markdown      1.13       2024-06-04 [1] CRAN (R 4.4.0)
#>  munsell       0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.27       2024-05-17 [1] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
#>  rversions     2.1.2      2022-08-31 [1] CRAN (R 4.4.0)
#>  scales        1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
#>  stringi       1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr       1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
#>  svglite       2.1.3      2023-12-08 [1] CRAN (R 4.4.0)
#>  systemfonts   1.1.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite   0.4.2      2023-05-02 [1] CRAN (R 4.4.0)
#>  xfun          0.45       2024-06-16 [1] CRAN (R 4.4.0)
#>  xml2          1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
#>  yaml          2.3.8      2023-12-11 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────