Chapter 1 Introduction

This tutorial introduces key concepts in machine learning-based causal inference. It is an ongoing project, and new chapters will be uploaded as we finish them. Topics currently covered:

  • Introduction to Machine Learning
  • Average Treatment Effects
  • Heterogeneous Treatment Effects
  • Policy Evaluation
  • Policy Learning
  • Causal Panel Data
  • Matrix Completion Methods

Please note that this is a living document. If you find any issues, please feel free to contact Undral Byambadalai. The “changelog” below keeps track of major updates and additions.

1.1 Getting started

We’ll illustrate key concepts using snippets of R code. Each chapter in this tutorial is self-contained. You can download its RMarkdown source by clicking on the link at the beginning of each chapter. You should be able to rerun most of the code on a different dataset by editing some of the common variables at the top of the notebook (e.g., data, covariates, and outcome). Exceptions to this rule are marked as comments in the text, so please read them carefully.

We try to keep our package dependencies to a minimum, but at the moment you will need to install the latest versions of the following packages:

  • ggplot2
  • glmnet
  • grf
  • policytree
  • splines
  • lmtest
  • sandwich
  • causalTree(*)

(*) The causalTree package is not on CRAN, the most common R package repository. To install it, first install the devtools package and then run devtools::install_github('susanathey/causalTree').

1.2 Changelog

  • Apr 3, 2021. “Chapter 2: Introduction to Machine Learning” is up.
  • Apr 12, 2021.
    • Fixed broken link to RMD source. Chapters now have a link at the beginning.
    • Introduction to ML: Added code to produce colored tables.
  • Apr 13, 2021.
    • Additional links to RMD source.
    • Added chapter ATE-1.
  • Apr 16, 2021. Minor fixes to chapter ATE-1.
  • Apr 19, 2021. Removed explicit knitr and kableExtra dependencies from Chapter 2. Fixed other minor typos and inconsistencies in Chapter 2.
  • Apr 26, 2021. Uploaded Chapters 3 and 4 (beta versions).
  • May 4, 2021. Added Chapter 5 (beta version).
  • May 19, 2022. Added a section on assessing heterogeneity with RATE (Section 4.3) to Chapter 4.
  • Jul 20, 2022. Visualization revisions in Chapters 4 and 6.
  • Oct 7, 2022.
    • Chapter 2: added figure numbers/references and fixed typos.
    • Changed code to generate train/test split (added randomization) in Chapters 2 and 6.
  • Oct 19, 2022. Added figure numbers/references and fixed typos in Chapters 3 and 4.
  • Oct 20, 2022. Added figure numbers/references and fixed typos in Chapters 5 and 6.
  • Nov 20, 2022. Minor edits in Chapters 2 and 3.
  • Dec 16, 2022. Minor edits in Chapters 2-6.
  • Apr 6, 2023.
    • Added Chapters 7 and 8 (beta versions).
    • Added Chapter 9: Additional Resources.
  • Apr 17, 2023. Updated further readings in Chapter 7.
  • May 11, 2023. Minor edits in Chapters 7 and 8.

1.3 Acknowledgments

This tutorial started as an extension of documents written in part by research assistants, students, and postdocs at the Golub Capital Social Impact Lab. We thank the authors of those documents: Undral Byambadalai, Vitor Hadad, Kaleb K. Javier, Niall Keleher, Sylvia Klosin, Nicolaj Søndergaard Mühlbach, Janelle R. Nelson, Xinkun Nie, Matt Schaelling, and Erik Sverdrup. We also thank those who have been reading the current draft and providing “live” feedback: Cesar Augusto Lopez, Undral Byambadalai, Sylvia Klosin, Kristine Koutout, Sarah McDonald, Shanjukta Nath, Emil Palikot, Charles Pebereau, Erik Sverdrup, and Rachel Zhou.