2 Introduction

The tutorials and labs contained in this book are not meant to constitute a complete R-based course in probability theory. They are also not meant to constitute a complete course in R programming.

Rather, the aim of these materials is to complement an undergraduate course in probability theory for Economics majors called Statistics for Economists (ECON 41) that is offered at the University of California, Los Angeles (UCLA). A very brief (and vague) description of that course is available at the first link below, but its contents are basically the same as those of a typical undergraduate course in probability theory at an American university. For instance, it covers the same topics as MATH 461 at the University of Illinois, Urbana-Champaign (second link below). At UCLA, ECON 41 is meant to be an introductory course in Statistics, but comparable courses at other universities are not necessarily treated as introductory.

https://www.registrar.ucla.edu/Academics/Course-Descriptions/Course-Details

https://netmath.illinois.edu/college/math-461

With that in mind, if you stumbled upon this book by accident, you may still find examples or code in it that are useful to you. But it does assume substantial knowledge of R, because I require my students to complete a number of R courses on DataCamp.com before beginning the tutorials that make up most of this book. The tutorials are meant to focus the knowledge students acquire through those courses so that they can complete the labs that have been written for this course. This book also assumes knowledge of basic statistics and probability, because all of my students take AP Statistics before taking this class. As a result, little to no attention is devoted to reviewing the basics of R, probability, or statistics.

2.0.1 Purpose

To my knowledge, ECON 41 at UCLA historically has not included a substantial statistical programming or data analysis component. The course is taught from a textbook, and students are graded on their performance on homework assignments and exams that are done with pen and paper. This is reflected in a syllabus from a summer 2017 version of the course that is linked below.

https://economics.ucla.edu/wp-content/uploads/2017/02/syllabus41_Gu.pdf

For people who are intrinsically interested in math, this abstract approach to probability theory may not be a problem, and some may even prefer it. But for others, the abstractness obscures the value of learning this material at all. And even when the instructor endeavors to include practical examples that bring the material to life, without a statistical programming component students cannot learn how to dig through real data and find insights of their own using what they learn in class. These tutorials and labs are meant to take this course in a new direction and enable students who take it to do just that.

2.0.2 Why R?

There are countless software packages and programming languages that could have been used for this course instead of R. So why did I pick this language instead of any of the others?

First, I believe from personal experience that R is a great first programming language. Its syntax and data structures are relatively simple, and the things that can be accomplished in it with just a few weeks of practice are substantial. R is also similar to other programming languages like Python, so getting a good grip on this language will set a student up to confidently approach and rapidly learn new programming languages later. And if a student enters this course with some previous programming experience, they will find R to be very easy, especially if they previously learned Java or C.

Second, R is free. Although lots of resources such as books and online courses about R programming cost money, the R language and the RStudio platform cost nothing. And the internet is full of free information about R programming that you can use to reach your programming goals. Cost is also why we are not using Stata despite its popularity in the Economics field. (Stata licenses cost a fortune. Just look up the prices.)

Third, R is flexible. Software packages like SPSS and Microsoft Excel are not. Practically any statistical programming task you can think of is possible in R, and execution is simply a matter of typing the right commands instead of trying to click through a bunch of windows and buttons that may lead nowhere in the end. As we will see, it’s also possible to make beautiful and highly customizable data visualizations in R that Excel users can only dream of. (To the extent that is possible in a course on probability theory, anyway.)

Finally, there is the RStudio platform. If you have no programming background, it will be hard to appreciate. But if you do, you will love writing and testing code on this platform. It’s even better than Spyder or Jupyter Notebook for Python, which is saying a lot.

2.0.3 Why not Python?

Python is probably the most popular programming language for data analysis today. But I chose not to base this part of ECON 41 in Python for a couple of reasons.

First, Python is significantly more complex than R. Learning enough Python to do the work in our labs would take too long and be too difficult, and as a result it would drain too much time and energy from the main part of the course.

Second, Python is a programming language that can be used for just about anything, not just data analysis. This makes using Python for data analysis harder than using R, which was developed specifically for statistics and data analysis. The extra difficulty would distract even more from the content of this course as it is currently structured.

To be clear, I am not saying that Python is not worth learning. It is, and if you’re thinking about a career in data science or something related to it, it is something you must learn. But I believe that R is a better gateway to statistical programming than Python.

2.0.4 Last words

This book is a work in progress, and for me it is the first work of its kind. As a result, it may contain typos and other errors, and its content may change substantially over time. If you notice problems or have suggestions, feel free to contact me. I am always on the lookout for new applications and new datasets to enrich my classes, and of course I am eager to fix mistakes in my tutorials and labs so that my students benefit as much as possible from these materials.