# Chapter 1 Introduction

Chapter goals

In this chapter we will:

• Review educational goals and establish course expectations
• Gather resources and tools, including all needed computer software

## 1.1 Course goals and context

ECON 233 is the first course in the two-course econometrics sequence that is required for all economics majors. If you’ve never seen the word before, “econometrics” just means statistics and data analysis for economics.

Course goals

By the end of this course:

You will develop computer skills:

1. Clean and analyze data in Excel.
2. Analyze and graph data in R.
3. Follow recommended practices for data management and reproducible analysis.

You will become familiar with basic statistical concepts:

1. Calculate and interpret probabilities and expected values.
2. Explain the relationship between population and sample.
3. Describe the properties of a statistic or estimator including its probability distribution, expected value, variance, bias, and mean squared error.
4. State and apply the law of large numbers and central limit theorem.

You will be able to apply these skills in combination to analyze real-world economic data:

1. Construct and interpret common charts including histograms, scatter plots, and time-series plots.
2. Construct and interpret frequency tables and cross-tabulations.
3. Construct and interpret common univariate and bivariate statistics, including mean, variance, standard deviation, covariance and correlation.
4. Construct and interpret hypothesis tests and confidence intervals.

We will be switching back and forth between theory, data analysis and applications. All three skill sets are valuable.

Hopefully you are in this course because you are fascinated by the course material, and would take it even if it wasn’t required for the economics major. But that isn’t the case for many of you, so I’d like to motivate everyone to take this course as an opportunity to learn some very useful skills.

Today’s world is awash in data:

• retailers maintain databases of transactions
• manufacturers track product quality and costs
• marketers collect data on customers and potential customers
• governments record everyone’s interactions with schools, tax authorities, social welfare, healthcare and criminal justice
• employers maintain detailed personnel records

These databases can be linked and analyzed in various ways, and many of the world’s most successful companies rely heavily on the innovative gathering and usage of data:

• Google’s core product (the search engine) is built on the innovative analysis of massive amounts of data.
• Both Google and the major social media companies are based on providing valuable “free” services in order to gather data on consumers that can then be sold (in some form) to other businesses.
• Amazon and other retailers use what is called A/B testing to fine-tune product descriptions and set prices so as to maximize profits.

Some of this data analysis is done by computer scientists, but much of it is done by economists: for example, Amazon is the second-largest employer of PhD economists in the US (after the Federal Reserve System).

I always tell students thinking about the future to remember supply and demand in the labour market. In the labour market your skills and effort are the product, and you are the seller. Like all sellers, you want to be expensive. This requires that you have skills that are:

• Useful (high demand)
• Uncommon (low supply)

The ability to analyze data in a sophisticated way, and to explain the results in writing or in an oral presentation, is an extremely useful and uncommon skill. Most of you do not have the technical skills of your colleagues in Computer Science, but if you can combine a reasonable level of computer skills with writing, knowledge of the underlying statistical principles, and the ability to recognize the economic considerations in a situation, you will do quite well.

All economics majors have the option of taking ECON 233 or BUS 232, so you may be wondering what the difference is. Either course is suitable preparation for ECON 333, but there are some key differences:

• Tools: ECON 233 uses both Excel and R, while BUS 232 uses Excel.
  • You are likely to use R in ECON 333 and other upper-division ECON courses, so it is nice to get used to it now.
• Applications: ECON 233 emphasizes economics applications, while BUS 232 emphasizes business applications.

ECON 233 is part of the Social Data Analytics (SDA) minor; if you are an economics student interested in that minor, I recommend taking ECON 233.

Related courses

ECON 333 is the second course in the two-course econometrics sequence required for all economics majors. In ECON 333, you will learn more advanced techniques including linear regression, you will use R more extensively, and you will go deeper into the theory.

If you find you enjoy and/or do well in this course, I would strongly encourage you to take further courses in econometrics:

I would also encourage you to take courses outside of the economics department, and to consider a Statistics minor or the new interdisciplinary Social Data Analytics (SDA) minor.

## 1.2 Organization and expectations

The course Canvas page is available at https://canvas.sfu.ca/courses/59191. It includes information on lectures, tutorials, quizzes, and assignments.

The course is constructed under the assumptions that:

• You have taken introductory microeconomics (ECON 103 at SFU) and introductory macroeconomics (ECON 105 at SFU). We will use ideas from those courses in applications and examples.
• You have seen some probability and statistics content in high school, though you may not remember much.
• You can do high-school level math, including algebra and basic set theory, and have taken or are currently taking an introductory calculus course.
• You have access to a desktop or laptop computer, and have basic computer skills.

This is not a class in introductory economics, high school math, or basic computer skills. If you are a little behind in those skills you will need to ask for help, but I am happy to help anyone who asks.

## 1.3 Computer resources

To do the computer work you will need a computer with internet access and the following software packages installed:

• Microsoft Excel
• R
• RStudio

The required software packages are available free of charge for SFU students. They are also installed on all campus lab computers.

I will teach the course using Windows, and can provide technical support for Windows. All of the necessary tools are also available for macOS, but my ability to provide technical support is more limited. I’ll do the best I can.

### 1.3.1 Installing Microsoft Excel

Microsoft Excel is a well-known spreadsheet program that is available for both Windows and macOS. Alternatives to Excel include Google Sheets and Apple Numbers.

SFU has a licensing agreement with Microsoft that allows its students free installation of the entire Microsoft Office suite, including Excel. Installation instructions are available at

Once you have installed Excel, you should confirm that it is working by starting the program. You should see something that looks like this:

### 1.3.2 Installing R and RStudio

Later in the semester, we will also be using a more specialized statistical program called R, and a related program called RStudio.

• R is a programming language used for statistical analysis.
• RStudio is an “Integrated Development Environment” for R; that is, it is an integrated set of tools for building and running R programs.

Both R and RStudio are open-source, and are available free of charge for both Windows and macOS. Installation instructions are available at:

NOTE: install R first, then RStudio.

After installing R and RStudio, you should confirm that they are working by opening RStudio. You should see something like this:

### 1.3.3 Installing the Tidyverse package for R

One of the most useful features of R is that it allows users to write and distribute packages that extend its capabilities.

One of the most popular and useful packages is called the Tidyverse. R is a very powerful program, but it is also a very old one: the underlying language (called “S”) was originally created in 1976. As a result, some of the original commands are outdated in design and are not well suited to modern capabilities or principles of software development. The Tidyverse addresses this problem by adding newer, more modern versions of these commands. You can learn more about the Tidyverse at https://www.tidyverse.org/.

To install the Tidyverse package:

1. Open RStudio if it isn’t already open.
2. Click in the Console window (you will see it towards the bottom of the screen).
3. Enter `install.packages("tidyverse")` (i.e., type it and press the Enter key).

Once the installation has finished and the `>` prompt reappears, you can test to make sure the installation worked.

1. Enter `library(tidyverse)` in the Console window.
   • If you don’t get an error message, the installation worked.
   • If you get an error message, drop by office hours to get help.
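If you would like a more explicit check than watching for an error message, the short sketch below may help. It uses base R's `requireNamespace()` to report whether the Tidyverse is installed, without installing anything. The variable name `tidyverse_installed` is only an illustrative choice, and this is an optional alternative to the steps above rather than a required part of the setup.

```r
# Optional check (a sketch, not a required step).
# requireNamespace() returns TRUE if the package is installed and can be
# loaded, and FALSE otherwise; quietly = TRUE suppresses extra messages.
# "tidyverse_installed" is just an illustrative variable name.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("The Tidyverse is installed; library(tidyverse) should work.")
} else {
  message("The Tidyverse is not installed yet; run install.packages(\"tidyverse\").")
}
```

You can paste these lines into the Console window one at a time, or all at once.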

If you run into trouble, don’t worry. We will not need the Tidyverse for a few weeks, so there is plenty of time to get help.

## 1.4 Conventions of this book

This book uses consistent visual conventions to convey information.

• Organization:
  • Each chapter corresponds to one week of the course.
• Typography:
  • Computer code or other inputs are shown `like this`.
  • When new terminology is introduced, it is shown like this.
• Boxes:
  • Pull-out information is shown in colored boxes.

Boxes like this are for examples.

Boxes that look like this are for showing course or chapter goals.

Boxes like this are for providing economic background.

Boxes like this are for providing optional information that might be of interest to some students.