1 Introduction

Welcome to the R guide! This is an evolving document, and I appreciate your feedback about how I can make it most helpful. Please email me (mbrown35@stanford.edu) if you have comments or suggestions.

1.1 Who this guide is for

I have written this guide for two audiences. The narrow audience is students in Prof. Hunt Allcott’s Empirical Environmental Economics (E3) course at Stanford. This website is a companion text for that course. E3 involves several data-based problem sets as well as a final project. We want students to be able to take E3 without any prior background in programming. One goal of this guide is to make that possible: it aims to cover all of the coding knowledge that you’ll need to complete the E3 assignments.

I emphasize that even with this guide, E3 will be challenging if it is your first exposure to programming. However, your effort will pay off – after completing the course, you will have confidence with the basic skills of data analysis in R, and you will have practiced these skills in concrete, important contexts. This benefit may be well worth your time.

The broader audience is any economics student who wants to get started with data analysis. The programming skills covered here are in no way specific to E3. Nearly every field of economics relies on empirical evidence. However, most undergraduate economics departments (including Stanford’s) do not systematically teach students the programming skills they need to conduct empirical investigations with real data. Some students pick up these skills on their own, for example by taking an upper-level elective that requires data work, providing research assistance to a professor, or writing an undergraduate thesis. This informal process can be frustrating. When I was learning to code as a sophomore in college, I found the existing online resources for coding help overwhelming. I couldn’t tell which tips were important and which were extraneous. I wished there were a guide specifically designed for social scientists who were just getting started. My goal is for this website to help fill that gap.

1.2 What this guide will teach you

The goal is to illustrate every step of the process: from downloading R to cleaning datasets to implementing analyses to presenting readable outputs. See the outline below for details about the topics that we’ll cover.

For E3 students in particular, this guide covers many of the R concepts that you’ll need to complete the problem sets. It will not cover everything, though: expect to do some googling and to consult function documentation along the way.

1.3 What this guide won’t teach you (and some external resources to help with that)

This guide will not teach you how to become a great programmer. I myself am not a great programmer. My code gets the job done. As a rule, the goal of this guide is to get you that far, and no farther.

In service of this goal, I avoid delving into technical details unless necessary. My terminology is loose and I sometimes call programming objects by incorrect names. In general, I try to convey the way I think when I’m writing code, rather than facts about how R works. This sets this website apart somewhat from existing guides. My hope is that this focus will be practically useful for beginners. Students who are interested in more detailed guides can reference R for Beginners, R for Data Science, ModernDive, and many other online guides. Some of my exposition is directly inspired by these resources.

I do not adhere to any particular style in my example code. I discuss a few best practices, but again – the goal is only to write code that is readable and does what we want it to do. As you code more, you will develop your own stylistic tastes. For a short discussion of principles of style, see the appendix of Gentzkow and Shapiro’s guide. I find the Gentzkow and Shapiro approach particularly helpful when organizing large projects, which E3 students will have to do for the final paper.

Discussion of the causal inference methods and econometrics at the heart of applied economics is well beyond the scope of this guide. My favorite resource for these is Scott Cunningham’s online textbook, which comes with convenient sample code snippets in R. Alternatively, Angrist and Pischke’s Mostly Harmless Econometrics is a publicly available classic, and Mastering ’Metrics by the same authors covers similar material at a more basic level.

1.4 Outline

Section 2 is a “setup” guide: downloading R and RStudio, and running our first lines of code.

Section 3 discusses the foundational concepts of programming in RStudio.

Section 4 discusses plotting with ggplot. I also introduce specialized tools for binned scatterplots and maps.

Section 5 discusses data cleaning using dplyr verbs.

Section 6 discusses regression. I demonstrate fixest, my preferred package for fixed effects and instrumental variables.

Section 7 discusses putting all of these pieces together: the anatomy of an R script.

1.5 Acknowledgements

I thank Hunt Allcott for his dedication to E3 and his support in creating this guide. I thank Tess Snyder and Lea Bottmer for comments. All errors are my own.