Chapter 1 Introduction

Before we dive into the day-to-day course material, it is important to understand the big picture of what we are trying to do here, and to get set up for our work.

Chapter goals

In this chapter we will:

Review educational goals and establish course expectations.
Gather resources and tools, including all needed computer software.

1.1 Course goals and context

This is an introductory course in statistics for economics. It is similar to courses taught all over the world to first and second year university students in business, economics, and other social sciences.

Course goals

By the end of this course:

You will develop computer skills:

Clean, analyze, and graph data in Excel.
Clean, analyze, and graph data in R.
Follow recommended practices for data management and reproducible analysis.

You will become familiar with basic statistical concepts:

Calculate and interpret probabilities and expected values.
Explain the relationship between population and sample.
Describe the properties of a statistic or estimator including its probability distribution, expected value, variance, bias, and mean squared error.
State and apply the law of large numbers and central limit theorem.

You will be able to apply these skills in combination to analyze real-world economic data:

Construct and interpret common charts including histograms, scatter plots, and time-series plots.
Construct and interpret frequency tables and cross-tabulations.
Construct and interpret common univariate and bivariate statistics, including mean, variance, standard deviation, covariance and correlation.
Construct and interpret hypothesis tests and confidence intervals.

We will be switching back and forth between theory, data analysis and applications. All three skill sets are valuable.

Hopefully you are in this course because you are fascinated by statistics and can’t wait to learn more about it. But most of you are taking it because it’s a required course.

So I’d like to motivate everyone to take this course as an opportunity to learn some very useful skills. Today’s world is awash in data:

retailers maintain databases of transactions
manufacturers track product quality and costs
marketers collect data on customers and potential customers
governments record everyone’s interactions with schools, tax authorities, social welfare, health care, and criminal justice
employers maintain detailed personnel records

These databases can be linked and analyzed in various ways, and many of the world’s most successful companies rely heavily on the innovative gathering and usage of data:

Google’s core product (the search engine) is built on the innovative analysis of massive amounts of data.
Both Google and the major social media companies are based on providing valuable “free” services in order to gather data on consumers that can then be sold (in some form) to other businesses.
Amazon and other retailers use what is called A/B testing to fine-tune product descriptions and set prices so as to maximize profits.

Some of this data analysis is done by computer scientists, but much of it is done by economists: for example, Amazon is the second-largest employer of PhD economists in the US (after the Federal Reserve System).

This course will not qualify you for those jobs, but it is a first step in that direction.

Be the Mona Lisa

I always tell students thinking about the future to remember supply and demand in the labour market. In the labour market your skills and effort are the product, and you are the seller. Like all sellers, you want to be expensive. This requires that you have skills that are both:

Useful (high demand)
Uncommon (low supply)

In other words, you need to be like the Mona Lisa. If your skills are useful but common (like water), or rare but useless (like my one-of-a-kind drawing of the Mona Lisa) your labour will sell at a low price.

	High demand	Low demand
Low supply	Mona Lisa (high price)	My drawing of the Mona Lisa (low price)
High supply	Water (low price)	

The ability to analyze data in a sophisticated way, and to explain the results in writing or in an oral presentation, is an extremely useful and uncommon skill. Most of you do not have the technical skills of your colleagues in Computer Science, but if you can combine a reasonable level of computer skills with writing, knowledge of the underlying statistical principles, and the ability to recognize the economic considerations in a situation, you will do quite well.

1.2 Expectations

The course is constructed under the assumptions that:

You have taken introductory microeconomics and introductory macroeconomics.
- We will use ideas from those courses in applications and examples.
You have seen some probability and statistics content in high school.
- It’s OK if you do not remember much.
You can do high-school-level math, including algebra and basic set theory, and have taken or are currently taking an introductory calculus course.
- I will not ask you to take derivatives or solve integrals; instead I will refer to concepts like functions, sequences and limits.
- The math review appendix provides material and practice problems if you need to review these concepts and tools.
You have access to a desktop or laptop computer and have basic computer skills.

This is not a class in introductory economics, high school math, or using a computer. If you are a little behind in those skills you will need to ask for help, but I am happy to help anyone who asks.

1.3 SFU-specific information

ECON 233 is the first course in the two-course econometrics sequence that is required for all economics majors. If you’ve never seen the word before, “econometrics” just means statistics and data analysis for economics. The course Canvas page is available at https://canvas.sfu.ca/courses/62548. It includes information on lectures, tutorials, quizzes, and assignments.

ECON 233 or BUS 232?

All economics majors have the option of taking ECON 233 or BUS 232, so you may be wondering what the difference is. Either course is suitable preparation for ECON 333, but there are some key differences:

Tools: ECON 233 uses both Excel and R, while BUS 232 uses Excel.
- You are likely to use R in ECON 333 and other upper-division ECON courses, so it is nice to get used to it now.
Applications: ECON 233 emphasizes economics applications, while BUS 232 emphasizes business applications.

ECON 233 is part of the Social Data Analytics (SDA) minor; if you are an economics student and are interested in that minor, I recommend that you take ECON 233.

ECON 333 is the second course in the two-course econometrics sequence required for all economics majors. In ECON 333, you will learn more advanced techniques including linear regression, you will use R more extensively, and you will go deeper into the theory.

Related courses

If you find you enjoy and/or do well in this course, I would strongly encourage you to take further courses in econometrics:

ECON 334: Data Visualization and Economic Analysis is an elective focusing on exploratory data analysis and visualization.
ECON 335: Introduction to Causal Inference and Policy Evaluation is an elective focusing on the problem of inferring cause-and-effect from economic data, and using data to forecast the effects of economic policies.
ECON 433: Financial and Time Series Econometrics is an advanced elective focusing on techniques for analyzing the kind of time series data that is used in macroeconomics and financial markets.
ECON 435: Econometric Methods is an advanced course in statistics and econometrics that is part of our honours sequence. It gives you the opportunity and tools to write a serious empirical research paper. Non-honours students are eligible to take it if they have a 3.0 CGPA and the course prerequisites.

I would also encourage you to take courses outside of the economics department, and to consider a Statistics minor or the new interdisciplinary Social Data Analytics (SDA) minor.

1.4 Computer resources

To do the computer work you will need a computer with internet access and the following software packages installed:

Microsoft Excel
R
RStudio

Excel is a commercial application, while R and RStudio are both open-source (free). They are available for both Windows and macOS. The examples in the textbook use Windows.

The required software packages are available free of charge for SFU students, and are installed on all campus lab computers.

1.4.1 Installing Microsoft Excel

Microsoft Excel is a well-known spreadsheet program that is available for both Windows and macOS. Alternatives to Excel include Google Sheets and Apple Numbers.

Installing Excel at SFU

SFU has a licensing agreement with Microsoft that allows its students free installation of the entire Microsoft Office suite, including Excel. Installation instructions are available at

https://www.sfu.ca/itservices/technical/software/office365.html.

Once you have installed Excel, you should confirm that it is working by starting the program. You should see something that looks like this:
(Screenshot: Excel blank workbook)

1.4.2 Installing R and RStudio

Later in the semester, we will also be using a more specialized statistical program called R, and a related program called RStudio.

R is a programming language used for statistical analysis.
RStudio is an “Integrated Development Environment” (IDE) for R; that is, it is an integrated set of tools for building and running R programs.

Both R and RStudio are open-source, and are available free of charge for both Windows and macOS. Installation instructions are available at:

https://rstudio.com/products/rstudio/download/#download.

Be sure to install R first, then RStudio.

After installing R and RStudio, you should confirm that they are working by opening RStudio. You should see something like this:
(Screenshot: RStudio after opening)
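As an optional extra check, you could also type a few commands into the RStudio Console to confirm that R itself responds. The lines below are only a minimal sketch; the numbers in x are made up for illustration.

```r
# Print the version of R that RStudio is using
R.version.string

# Simple arithmetic: R should print [1] 2
1 + 1

# A made-up vector of three numbers and its mean: R should print [1] 4
x <- c(2, 4, 6)
mean(x)
```

If each line produces a result rather than an error, R is installed and RStudio has found it.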

1.4.3 Installing the Tidyverse

One of the most useful features of R is that it allows users to write and distribute packages that extend its capabilities.

One of the most popular and useful packages is called the Tidyverse (strictly speaking, it is a collection of related packages that are installed together). R is a very powerful program, but it is also a very old one: the underlying language (called “S”) was originally created in 1976. As a result, some of the original commands are outdated in design and are not well suited to modern computing capabilities or principles of software development. The Tidyverse addresses this problem by adding new, more modern versions of these commands. You can learn more about the Tidyverse at https://www.tidyverse.org/.

To install the Tidyverse package:

Open RStudio if it isn’t already open.
Click in the Console window (you will see it towards the bottom of the screen).
Enter install.packages("tidyverse") (i.e., type it and hit the <enter> key).

Once the installation finishes and the > prompt reappears, you can test whether the installation worked.

Enter library(tidyverse) in the console window.
- If you get an error message like Error in library(tidyverse) : there is no package called ‘tidyverse’, drop by office hours for help.
- If you don’t get an error message, the installation worked. (You will still see a message about “Conflicts”; that is normal.)

If you run into trouble here, don’t worry. We will not need the Tidyverse for a few weeks, so there is plenty of time to get help.
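If you would like a check that returns TRUE or FALSE instead of producing an error when a package is missing, one option is the base R function requireNamespace(). The sketch below is only an illustration: the helper name is_installed is hypothetical (it is not part of R or the Tidyverse), and the "stats" package is used only because it ships with every R installation.

```r
# Hypothetical helper: TRUE if the named package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" comes with R, so this should be TRUE on any working installation
is_installed("stats")

# TRUE if the Tidyverse installation worked, FALSE if it did not
is_installed("tidyverse")
```

A result of FALSE on the last line means the same thing as the error message described above, so the same advice applies.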

1.5 Conventions of this book

This book uses consistent visual conventions to convey information.

Organization:
- Each chapter corresponds to one full week of the course.
Typography:
- Computer code or other inputs are shown like this.
- Math is usually shown like $this$.
- When new terminology is introduced, it is shown like this.
Boxes:
- Pull-out information is shown in colored boxes.

Example 1.1 Boxes like this are for examples.

Boxes like this are for showing course or chapter goals.

Boxes like this are for providing economic background.

Boxes like this are for providing optional information that might be of interest to some students.