1 Introduction

Welcome to the R guide! This is a work in progress. I’ll continue to update it as the quarter goes along, and I appreciate your feedback about how I can make it most helpful. Please email me (mbrown35@stanford.edu) if you have comments or suggestions about this guide, or you can always mention it when you see me in class or in office hours.

1.1 Who this guide is for

I anticipate that the guide will be useful for two audiences. The narrow audience is students in Prof. Hunt Allcott’s Empirical Environmental Economics (E3) course at Stanford. This website is a companion text for that course. E3 involves several data-based problem sets as well as a final project. We want students to be able to take E3 without any prior background in programming. One goal of this guide is to ensure that this is possible: it aims to cover all of the coding knowledge that you’ll need to complete the E3 assignments.

I emphasize that even with this guide, E3 will be challenging if it is your first exposure to programming. However, your effort will pay off – after completing the course, you will have confidence with the basic skills of data analysis in R, and you will have practiced these skills in concrete, important contexts. This benefit may be well worth your time.

The broader audience is any economics student who wants to get started with data analysis. The programming skills covered here are in no way specific to E3. Nearly every field of economics, from economics of education to labor economics to macroeconomics to market design, relies on empirical evidence. However, most undergraduate economics departments (including Stanford’s) do not systematically teach students the programming skills they need to conduct actual empirical investigations. Some students glean these skills on their own, for example by taking an upper-level elective that requires data work, providing research assistance to a professor, or writing an undergraduate thesis. This informal process can be frustrating. When I was learning to code as a sophomore in college, I found the existing online resources for coding help to be overwhelming. I couldn’t tell which tips were important and which were extraneous. I wished there were a guide specifically designed for social scientists who were just getting started. My hope is that this website can fill that gap.

1.2 What this guide will teach you

The goal is for this guide to help you through every step of the process: from downloading R to cleaning datasets to implementing analyses to presenting readable outputs. See the outline below for details about the topics that we’ll cover.

For E3 students in particular, this guide should touch on every R concept that you’ll need to complete the problem sets. That doesn’t mean that you won’t need to do some googling or consult function documentation. And, since Fall 2023 is the first quarter that we are offering the guide, there will certainly be oversights. We plan to modify the guide in response to feedback from students, so that it can become more complete and helpful for future cohorts of E3 students.

1.3 What this guide won’t teach you (and some external resources to help with that)

This guide will not teach you how to become a great programmer. This is because I myself am not a great programmer. I have zero training in computer science and I have never worked on a project where my code had to be thoroughly optimized for speed and readability. My code gets the job done. As a rule, the goal of this guide is to get you that far, and no farther.

In service of this goal, I avoid delving into technical details unless necessary. My terminology is often loose. I sometimes call programming objects by incorrect names. In general, I try to convey the way I think when I’m writing code, rather than facts about how R works. This sets this website somewhat apart from existing guides. My hope is that this focus will be practically useful for beginners. Students who are interested in a more detailed guide to R can reference R for Beginners, R for Data Science, ModernDive, and many other online guides. Some of my exposition is directly inspired by these resources.

I do not adhere to any particular style in my example code. I discuss a few best practices, but again – the goal is only to write code that is readable and does what we want it to do. As you code more, you will develop your own stylistic tastes. For a short discussion of principles of style, see the appendix of Gentzkow and Shapiro’s guide.

Discussion of the causal inference methods and econometrics at the heart of applied economics is well beyond the scope of this guide. My favorite resource for these is Scott Cunningham’s online textbook, which comes with convenient sample code snippets in R. Alternatively, Angrist and Pischke’s Mostly Harmless Econometrics is a publicly available classic, and Mastering Metrics by the same authors covers similar material at a more basic level.

1.4 Outline

Section 2 is a “setup” guide: downloading R and RStudio, and running our first lines of code.

Section 3 discusses the foundational concepts of programming in RStudio.

Section 4 discusses plotting with ggplot. I also introduce specialized tools for binned scatterplots and maps.

Section 5 discusses data cleaning using dplyr verbs.

Section 6 discusses regression. I demonstrate fixest, my preferred package for fixed effects and instrumental variables.

Section 7 discusses putting all of these pieces together – the anatomy of an R script.

Section 8 (CURRENTLY UNDER CONSTRUCTION!) discusses best practices for organizing medium-sized and large projects that have more than one script. I recommend that E3 students who are working on their final project start here.

Section 9 (CURRENTLY UNDER CONSTRUCTION!) introduces more advanced topics. We learn to write our own functions, construct for-loops and while-loops, and simulate economic models.

Section 10 (CURRENTLY UNDER CONSTRUCTION!) is a guide to writing formatted documents in Markdown.

1.5 Acknowledgements

I thank Hunt Allcott for his dedication to E3 and his support in creating this guide. I thank Tess Snyder and Lea Bottmer for comments.