Tutorial 1 Introduction to R and R Studio

1.1 Administration

These points will be covered in more detail in the first lecture.

Tutorial participation counts for 5% of your final mark. Please ensure that you attend the tutorial in which you are officially enrolled, and note that you have to register to join each week.

You should also run the R code provided before each class and come armed with any questions you may have about the code or content of that tutorial.

There are three assignments to be completed this semester. Submissions can be made individually or in groups of up to three students. If you form a group, you will have to register your group for each assignment on the Subject Home Page in Canvas.

1.2 R and R Studio

R is a free, open-source statistical programming language that we will use for econometrics. R Studio is a companion program that makes working with R more user-friendly.

Throughout ECOM20001, we will undertake tutorials and assignments in R Studio.

Tutorial 1 contains information on how to install R and R Studio.

If you have any problems doing this, please let me know. Also note that you should install version 4.0.3 of R (not the latest version, 4.0.4, as there appear to be some compatibility issues between this version and R Studio).

1.2.1 Tutorial files

The R program files and CSV data files are stored under Modules->Tutorials on the Subject Home Page.

It is recommended that you set up a directory structure on your PC or laptop with a separate sub-folder for each tutorial.

This ensures that all the output generated using R during a tutorial will reside in the sub-folder for that tutorial.
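If you prefer to create the folders from R rather than by hand, the following is a minimal sketch. The root folder name ECOM20001 and the sub-folder names Tutorial1, Tutorial2, ... are assumptions for illustration only, and a temporary directory is used so that the code can be run safely; replace it with a location of your choice.

```r
# Hypothetical root folder. tempdir() is used here so the sketch is safe to run;
# replace it with e.g. your Documents folder.
root <- file.path(tempdir(), "ECOM20001")

# One sub-folder per tutorial (folder names are illustrative)
tutorial_dirs <- file.path(root, "Tutorials", paste0("Tutorial", 1:3))

for (d in tutorial_dirs) {
  # recursive = TRUE also creates the parent folders if they do not exist
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the folders that were created
list.dirs(root, recursive = TRUE, full.names = FALSE)
```

Each week's .R and .csv files would then be saved into the matching sub-folder.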

1.3 Practice using R and R Studio

Now that we have R and R Studio installed, and the files for Tutorial 1 (tute1_tutors.csv and tute1.R) saved in the Tutorial 1 sub-folder, we can start exploring how to use R.

1.3.1 Set the Working Directory

You will notice this line in the R code provided:

setwd("/Users/byrned/Dropbox/Teaching/20001/Tutorials/Tutorial1")

I’d suggest that you comment this line out (each week); you can do this by placing a hash (#) at the start of the line, e.g.

# setwd("/Users/byrned/Dropbox/Teaching/20001/Tutorials/Tutorial1") 

Notice how the text is now green, indicating that this line will not be run by R.

Then go to the top menu, select Session -> Set Working Directory -> Choose Directory, and pick your Tutorial 1 sub-folder.

There are a few other ways to do this. One option, in particular, is useful. Include the following line in your R script code:

setwd(dirname(rstudioapi::getSourceEditorContext()$path))

This automatically sets your working directory to the script's directory using rstudioapi. There is no need to run

# setwd("/Users/byrned/Dropbox/Teaching/20001/Tutorials/Tutorial1") 

or to use the menu bar option to set the working directory first. Just open the R script file from your local directory in R Studio and run this line; the R working directory will then be set to the local directory where the R script file resides.

Note that you need to install the package rstudioapi first, and that the script file must already be saved so that it has a path.
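A slightly more defensive version of the same idea is sketched below. It is only a suggestion, not part of the tute1.R script: it changes the working directory only when rstudioapi is installed, the code is running inside R Studio, and the script has been saved. The variable name script_path is hypothetical.

```r
# Install once if needed (left commented out so nothing is downloaded by accident):
# install.packages("rstudioapi")

# Only attempt this inside R Studio with rstudioapi installed
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  script_path <- rstudioapi::getSourceEditorContext()$path
  # The path is missing or empty if no saved script is open in the editor
  if (isTRUE(nzchar(script_path))) {
    setwd(dirname(script_path))
  }
}

# Show the working directory that is now in use
getwd()
```

Outside R Studio the block leaves the working directory unchanged instead of stopping with an error.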

To check that you have the correct directory, run

getwd()
## [1] "C:/Users/Richard/OneDrive - The University of Melbourne/My Documents/Econ 1 Sem 1 2021/Econ1_gitbook"
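Beyond reading the path, it can help to confirm that the tutorial files are actually visible from the working directory. The sketch below assumes the two Tutorial 1 files sit directly in the working directory; the variable names expected_files and found are illustrative.

```r
# Files the tutorial expects (names taken from the text above)
expected_files <- c("tute1.R", "tute1_tutors.csv")

# TRUE/FALSE for each file, relative to the current working directory
found <- file.exists(expected_files)
names(found) <- expected_files
print(found)

# Report anything that is missing, together with the directory that was searched
if (!all(found)) {
  message("Missing from ", getwd(), ": ",
          paste(expected_files[!found], collapse = ", "))
}
```

If a file is reported as missing, the working directory is probably not the Tutorial 1 sub-folder.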

1.3.2 Load and View the R file

Click on tute1.R in the Files Window and you should now see the R code in the R-Scripting Window.

1.3.3 Create a dataframe

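The next thing we want to do is import the CSV data into an R dataframe. To do this, run the line below. The path is relative to your working directory, so if the CSV file sits directly in your Tutorial 1 sub-folder, drop the data/ prefix.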

data=read.csv(file="data/tute1_tutors.csv")

You should now see the dataframe data (11 obs. of 4 variables) listed in the Environment window.

To view the data, either click on the “spreadsheet” button next to data in the Environment window or use

print(data)
##    instructor nationality fav_icrecream      fav_number
## 1   David B.       Canada    Strawberry            2904
## 2   David M.      England       Vanilla               7
## 3   Jia Sheen    Malaysia       Vanilla               3
## 4        Kael   Australia     Chocolate               7
## 5        Abby     Vietnam       Vanilla               8
## 6     Richard   Australia       Vanilla               5
## 7         Roy       China       Vanilla               6
## 8      Sahiba       India     Chocolate               5
## 9        Thai   Australia     Chocolate               7
## 10      Simon       Korea       Vanilla             109
## 11       Paul   Australia     Chocolate 3.14555323 (pi)

While this is OK for small datasets such as this one, we will be using datasets with over 15,000 observations in the coming weeks.
Another way to check whether all the data has been read into the dataframe correctly is to look at the first few rows and the last few rows.
This can be done by running

head(data,4)
##   instructor nationality fav_icrecream fav_number
## 1  David B.       Canada    Strawberry       2904
## 2  David M.      England       Vanilla          7
## 3  Jia Sheen    Malaysia       Vanilla          3
## 4       Kael   Australia     Chocolate          7
tail(data,3)
##    instructor nationality fav_icrecream      fav_number
## 9        Thai   Australia     Chocolate               7
## 10      Simon       Korea       Vanilla             109
## 11       Paul   Australia     Chocolate 3.14555323 (pi)

If you would like to see, for example, records for observations 4-6, you could use

data[4:6,]
##   instructor nationality fav_icrecream fav_number
## 4       Kael   Australia     Chocolate          7
## 5       Abby     Vietnam       Vanilla          8
## 6    Richard   Australia       Vanilla          5
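For larger datasets, a few other base R commands give a quick overview without printing every row. The sketch below is illustrative only: it rebuilds the first six rows shown above in a stand-in dataframe called tutors (a hypothetical name) so that it can be run on its own.

```r
# A small stand-in for the tutorial data: the first six rows printed above
tutors <- data.frame(
  instructor    = c("David B.", "David M.", "Jia Sheen", "Kael", "Abby", "Richard"),
  nationality   = c("Canada", "England", "Malaysia", "Australia", "Vietnam", "Australia"),
  fav_icrecream = c("Strawberry", "Vanilla", "Vanilla", "Chocolate", "Vanilla", "Vanilla"),
  stringsAsFactors = FALSE
)

nrow(tutors)   # number of observations
ncol(tutors)   # number of variables
str(tutors)    # compact summary: one line per variable

# Rows 4 to 6, but only two of the columns
tutors[4:6, c("instructor", "fav_icrecream")]

# Rows that satisfy a condition
tutors[tutors$nationality == "Australia", ]
```

The part before the comma selects rows and the part after it selects columns; leaving either side empty keeps everything.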

1.3.4 Running the R script

Please read the comments provided in the R script file and then run each code “chunk”, e.g.

## Print Hello world
print("Hello world!")
## [1] "Hello world!"
## Print your second R output!
print("R says: Hello! How are you?")
## [1] "R says: Hello! How are you?"
## Print can also print numbers without quotes
print(20001)
## [1] 20001

This way you get to know how the commands work. Remember that you will have to write your own code to complete the assignments.

1.3.5 Data Types and Structures

Everything in R is an object.

R has 6 basic data types. (In addition to the five listed below, there is also raw which we will not worry about.)

  • character
  • numeric (real or decimal)
  • integer
  • logical
  • complex

Elements of these data types may be combined to form data structures, such as atomic vectors. When we call a vector atomic, we mean that the vector only holds data of a single data type.
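As a small illustration of these types, the sketch below builds one atomic vector of each kind and asks R for its class. The variable names are made up for the example. The last two lines show what happens when types are mixed: R converts everything to a single common type.

```r
x_chr <- c("Vanilla", "Chocolate")   # character
x_num <- c(7, 3.5)                   # numeric
x_int <- c(7L, 3L)                   # integer (the L suffix requests integers)
x_lgl <- c(TRUE, FALSE)              # logical
x_cpx <- c(1+2i, 3-1i)               # complex

# Class of each vector, in the order above
sapply(list(x_chr, x_num, x_int, x_lgl, x_cpx), class)

# Mixing numbers and text in one atomic vector forces everything to character
mixed <- c(7, 109, "3.14555323 (pi)")
class(mixed)
```

A single text entry is therefore enough to turn an otherwise numeric column into character.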

To get a list of variable names in our dataframe use:

names(data)
## [1] "instructor"    "nationality"   "fav_icrecream" "fav_number"

To reference a variable, you need to include the dataframe name, then $ and the variable name; e.g. to print out the tutors' nationalities use

print(data$nationality)
##  [1] "Canada"    "England"   "Malaysia"  "Australia" "Vietnam"   "Australia"
##  [7] "China"     "India"     "Australia" "Korea"     "Australia"
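Once a variable has been extracted with $, it behaves like any other vector. The sketch below is illustrative: it re-enters the nationalities printed above into a stand-in dataframe called tutors (a hypothetical name), shows the equivalent [[ ]] notation, and counts how often each value occurs.

```r
# Nationalities as printed above
tutors <- data.frame(
  nationality = c("Canada", "England", "Malaysia", "Australia", "Vietnam",
                  "Australia", "China", "India", "Australia", "Korea", "Australia"),
  stringsAsFactors = FALSE
)

# Two equivalent ways to pull out one variable as a vector
by_dollar  <- tutors$nationality
by_bracket <- tutors[["nationality"]]
identical(by_dollar, by_bracket)

# Count how often each value occurs
table(tutors$nationality)

# Distinct values only
unique(tutors$nationality)
```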

To find out what types of variables we have, use

sapply(data,class)
##    instructor   nationality fav_icrecream    fav_number 
##   "character"   "character"   "character"   "character"

The variable fav_number should be numeric, but because the last entry contains text (the “(pi)” label), R treats the whole vector as character. To change this:

# get the dimensions of the dataframe
dim(data)
## [1] 11  4
# change this observation
data[11,4]=3.145553  

# another way (useful for finding strings in large datasets)
# without using packages: grep() shows that the string 3.14
# is in row 11 of the variable vector fav_number
# (fixed=TRUE matches "3.14" literally rather than as a regular expression)
grep("3.14",data$fav_number,fixed=TRUE)
## [1] 11
# coerce the variable object to numeric
data$fav_no <- as.numeric(data$fav_number)
    
# check it is now numeric 
sapply(data,class)
##    instructor   nationality fav_icrecream    fav_number        fav_no 
##   "character"   "character"   "character"   "character"     "numeric"

NOTE: you will not have to do this for any of the datasets provided for tutorials or assignments from now on. This was included to show an example of the data classes used in R (and how to use a few more basic commands), so if you find this confusing, don’t worry about it too much.
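For the curious, a more general way to locate entries that prevent a column from being numeric is sketched below. It is not part of tute1.R: it relies on the fact that as.numeric() returns NA for text it cannot convert. The fav_number values are re-entered from the output earlier in this section, and the names converted and bad_rows are illustrative.

```r
# fav_number column as printed earlier in this section
fav_number <- c("2904", "7", "3", "7", "8", "5", "6", "5", "7", "109",
                "3.14555323 (pi)")

# as.numeric() returns NA (with a warning) for text it cannot convert;
# the warning is suppressed here because the NA values are what we are looking for
converted <- suppressWarnings(as.numeric(fav_number))

# Positions of the entries that failed to convert
bad_rows <- which(is.na(converted))
bad_rows

# The offending text itself
fav_number[bad_rows]
```

Unlike grep(), this does not require knowing in advance which string to search for.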

1.3.6 Installing and Using Packages

R has around 15,000 packages.

Anyone can write an R package, and packages are added and removed almost every day!

We will try to keep the use of packages to a minimum; however, you do need to know how to install and call up the packages required to run the R script files.

There are two ways to do this.

  1. Use R Studio

Go to the Files Window, click on the Packages tab, then click the Install button. In the dialog box that appears, type in stargazer and then click Install.

I’d suggest you use this method first.

  2. Install directly using R code, e.g.
## Install "stargazer" package
install.packages("stargazer")

## Install "AER" package, where "AER" stands for "Applied Econometrics with R"
## Note how it automatically installs other packages that "AER" depends on
install.packages("AER")

Sometimes there is a problem with R and R Studio relating to “dependency” packages. If you try to install a package and run into this issue, try adding the dependencies argument:

## Install "stargazer" package together with its dependency packages
install.packages("stargazer",dependencies=TRUE)
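Installing a package only needs to be done once, whereas calling it up with library() is needed in every new R session. The sketch below is one possible way to check which packages still need installing before loading them. The variable names needed, installed and not_installed are illustrative, and the install line is left commented out so that nothing is downloaded unless you choose to.

```r
# Packages used in this tutorial
needed <- c("stargazer", "AER")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
  # Uncomment the next line to install them, including dependency packages
  # install.packages(not_installed, dependencies = TRUE)
}

# Call up a package for the current session (only if it is installed)
if (requireNamespace("stargazer", quietly = TRUE)) {
  library(stargazer)
}
```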

So, that’s about it for this tutorial!

As noted in the Administration section (above), you should download tute2.R and tute2_crime.csv to your Tutorial 2 sub-folder before the next class, then follow the instructions we went through today to create a dataframe.

Next, open tute2.pdf and have a look at the questions; if you are able to match the relevant R code in the tute2.R file to each question, that would be great.