Tutorial 1 Introduction to R and R Studio
1.1 Administration
These points will be covered in more detail in the first lecture.
Tutorial participation counts for 5% of your final mark. Please ensure that you attend the tutorial that you are officially enrolled in, and note that you have to register to join each week.
You should also run the R code provided before each class and come armed with any questions you may have about the code or content of that tutorial.
There are three assignments to be completed this semester. Submissions can be made individually or in groups of up to three students. If you form a group, you will have to register your group for each assignment on the Subject Home Page in Canvas.
1.2 R and R Studio
R is a free, open-source statistical programming language that we will use for econometrics. R Studio is a companion program that makes working with R more user-friendly.
Throughout ECOM20001, we will undertake tutorials and assignments in R Studio.
Tutorial 1 contains information on how to install R and R Studio.
If you have any problems doing this, please let me know. Also note that you should install version 4.0.3 of R (not the latest version, 4.0.4, as there appear to be some compatibility issues between this version and R Studio).
1.2.1 Tutorial files
The R program files and CSV data files are stored under Modules->Tutorials on the Subject Home Page.
It is recommended that you set up a directory structure on your PC or laptop with a separate sub-folder for each tutorial.
This ensures that all the output generated using R during a tutorial will reside in the sub-folder for that tutorial.
1.3 Practice using R and R Studio
Now that we have R and R Studio installed, and the files for Tutorial 1 (tute1_tutors.csv and tute1.R) saved in the Tutorial 1 sub-folder, we can start exploring how to use R.
1.3.1 Set the Working Directory
You will notice this line in the R code provided:
I’d suggest that you comment this line out (each week); you can do this by placing a hash (#) at the start of the line e.g.
Notice how the text is now green, indicating that this line will not be run by R.
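As a rough sketch of what commenting out looks like, assuming the line in question is a setwd() call (the path below is a made-up placeholder, not the one in tute1.R):

```r
# The hash at the start of the next line turns it into a comment, so R skips it.
# The path is a hypothetical placeholder, not the one used in tute1.R.
# setwd("C:/path/to/your/Tutorial1")

# Lines without a leading hash still run as normal.
x <- 1 + 1
print(x)
```

In R Studio you can also select one or more lines and press Ctrl+Shift+C (Cmd+Shift+C on a Mac) to comment or uncomment them.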
Then go to the top menu and select Session -> Set Working Directory -> Choose Directory
There are a few other ways to do this. One option, in particular, is useful. Include the following line in your R script code:
This automatically sets your working directory to the script directory using rstudioapi. There is no need to run a setwd() line, nor to use the menu bar option to set the working directory first: just double-click on the R script file in your local directory and the R working directory will be automatically set to the local directory where the R script file resides.
Note, you need to install the package rstudioapi first.
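The exact line is not reproduced in these notes, so the following is only a sketch of one common way to do this with rstudioapi. It assumes the script is open in R Studio and has been saved to disk; the variable name script_path is just for illustration.

```r
# Sketch only: assumes the script is open in R Studio and saved to disk.
# install.packages("rstudioapi")  # run once if the package is not installed

if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  # Path of the document currently active in the R Studio editor
  script_path <- rstudioapi::getActiveDocumentContext()$path
  if (nzchar(script_path)) {
    # Set the working directory to the folder that contains the script
    setwd(dirname(script_path))
  }
}
```

The checks around the call mean the sketch does nothing, rather than failing, when it is run outside R Studio or from an unsaved script.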
To make sure you have the correct directory, run getwd(), which prints the current working directory, for example:
## [1] "C:/Users/Richard/OneDrive - The University of Melbourne/My Documents/Econ 1 Sem 1 2021/Econ1_gitbook"
1.3.2 Load and View the R file
Click on tute1.R in the Files Window and you should now see the R code in the R-Scripting Window.
1.3.3 Create a dataframe
The next thing we want to do is import the CSV data into an R dataframe. To do this, run the import line in tute1.R.
You should then see the dataframe listed in the Environment window.
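A minimal sketch of the import step, assuming the dataframe is called data (the name used later in this tutorial) and that read.csv() is the import function; tute1.R may do this differently. So that the sketch runs anywhere, it first writes a three-row stand-in file to a temporary folder rather than reading tute1_tutors.csv.

```r
# Stand-in CSV with three rows copied from the tutors table in this tutorial.
# In the tutorial itself you would read "tute1_tutors.csv" from your
# working directory instead of a temporary file.
csv_path <- file.path(tempdir(), "tute1_tutors_demo.csv")  # hypothetical file name
writeLines(
  c("instructor,nationality,fav_icrecream,fav_number",
    "David B.,Canada,Strawberry,2904",
    "Kael,Australia,Chocolate,7",
    "Paul,Australia,Chocolate,3.14555323 (pi)"),
  csv_path
)

# Import the CSV into a dataframe, keeping text columns as character
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Number of rows and columns
dim(data)

# Print the dataframe to the console
print(data)
```

With the real file in your working directory, the import reduces to a single line such as data <- read.csv("tute1_tutors.csv").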
To view the data, either click on the “spreadsheet” button next to the dataframe in the Environment window or print the dataframe in the console:
##    instructor nationality fav_icrecream      fav_number
## 1    David B.      Canada    Strawberry            2904
## 2    David M.     England       Vanilla               7
## 3   Jia Sheen    Malaysia       Vanilla               3
## 4        Kael   Australia     Chocolate               7
## 5        Abby     Vietnam       Vanilla               8
## 6     Richard   Australia       Vanilla               5
## 7         Roy       China       Vanilla               6
## 8      Sahiba       India     Chocolate               5
## 9        Thai   Australia     Chocolate               7
## 10      Simon       Korea       Vanilla             109
## 11       Paul   Australia     Chocolate 3.14555323 (pi)
While this is fine for a small dataset such as this one, we will be using datasets with over 15,000 observations in the coming weeks.
Another way to check whether all the data has been created in the dataframe correctly is to look at the first few rows of data and the last few rows.
This can be done by running
##   instructor nationality fav_icrecream fav_number
## 1   David B.      Canada    Strawberry       2904
## 2   David M.     England       Vanilla          7
## 3  Jia Sheen    Malaysia       Vanilla          3
## 4       Kael   Australia     Chocolate          7

##    instructor nationality fav_icrecream      fav_number
## 9        Thai   Australia     Chocolate               7
## 10      Simon       Korea       Vanilla             109
## 11       Paul   Australia     Chocolate 3.14555323 (pi)
If you would like to see, for example, records for observations 4-6, you could use
##   instructor nationality fav_icrecream fav_number
## 4       Kael   Australia     Chocolate          7
## 5       Abby     Vietnam       Vanilla          8
## 6    Richard   Australia       Vanilla          5
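The commands behind these listings are not reproduced in the notes. A sketch using base R's head(), tail() and row indexing, which would give listings of this shape, is shown here; the dataframe is rebuilt by hand from the table above so that the example is self-contained.

```r
# Rebuild the tutors table shown above so the sketch is self-contained
data <- data.frame(
  instructor = c("David B.", "David M.", "Jia Sheen", "Kael", "Abby", "Richard",
                 "Roy", "Sahiba", "Thai", "Simon", "Paul"),
  nationality = c("Canada", "England", "Malaysia", "Australia", "Vietnam",
                  "Australia", "China", "India", "Australia", "Korea", "Australia"),
  fav_icrecream = c("Strawberry", "Vanilla", "Vanilla", "Chocolate", "Vanilla",
                    "Vanilla", "Vanilla", "Chocolate", "Chocolate", "Vanilla",
                    "Chocolate"),
  fav_number = c("2904", "7", "3", "7", "8", "5", "6", "5", "7", "109",
                 "3.14555323 (pi)"),
  stringsAsFactors = FALSE
)

head(data, 4)   # first four rows
tail(data, 3)   # last three rows
data[4:6, ]     # rows 4 to 6, all columns
```

The empty slot after the comma in data[4:6, ] means "all columns"; data[4:6, 1:2] would return only the first two.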
1.3.4 Running the R script
Please read the comments provided in the R script file and then run the code “chunk” e.g.
## [1] "Hello world!"
## [1] "R says: Hello! How are you?"
## [1] 20001
This way you get to know how the commands work. Remember that you will have to write your own code to complete the assignments.
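The chunk from tute1.R that produced the output above is not reproduced in these notes. As an illustration only, a chunk along the following lines would print the same three results; the variable names greeting, message_text and subject_code are made up for this sketch.

```r
# Print a fixed string
print("Hello world!")

# Build a longer string from two pieces; paste() joins them with a space
greeting <- "Hello! How are you?"
message_text <- paste("R says:", greeting)
print(message_text)

# Store a number in a variable and print it
subject_code <- 20001
print(subject_code)
```

To run a chunk in R Studio, highlight the lines and press Ctrl+Enter (Cmd+Enter on a Mac).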
1.3.5 Data Types and Structures
Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there is also raw which we will not worry about.)
- character
- numeric (real or decimal)
- integer
- logical
- complex
Elements of these data types may be combined to form data structures, such as atomic vectors. When we call a vector atomic, we mean that the vector only holds data of a single data type.
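A short sketch of the five data types and of what "atomic" means in practice. The values and the variable names v and mixed are arbitrary examples.

```r
# One example value for each of the five basic data types listed above
class("econometrics")  # "character"
class(3.14)            # "numeric"
class(7L)              # "integer" (the L suffix marks an integer)
class(TRUE)            # "logical"
class(1 + 2i)          # "complex"

# An atomic vector holds a single data type
v <- c(1.5, 2, 3)
class(v)               # "numeric"

# Mixing types forces R to coerce everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)           # "character"
mixed                  # "1" "a" "TRUE"
```

This coercion rule is why a single text entry in an otherwise numeric column turns the whole column into character.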
To get a list of variable names in our dataframe use:
## [1] "instructor" "nationality" "fav_icrecream" "fav_number"
To reference a variable you need to include the dataframe name, then $ and then the variable name; e.g. to print out the tutors' nationalities use
## [1] "Canada" "England" "Malaysia" "Australia" "Vietnam" "Australia"
## [7] "China" "India" "Australia" "Korea" "Australia"
To find out what type each variable is, use
##    instructor   nationality fav_icrecream    fav_number
##   "character"   "character"   "character"   "character"
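The three commands used in this subsection are not reproduced in the notes. A sketch with base R's names(), the $ operator and sapply() is shown here on a three-row stand-in for the tutors dataframe; tute1.R may use different commands.

```r
# Three-row stand-in for the tutors dataframe
data <- data.frame(
  instructor = c("David B.", "Kael", "Paul"),
  nationality = c("Canada", "Australia", "Australia"),
  fav_icrecream = c("Strawberry", "Chocolate", "Chocolate"),
  fav_number = c("2904", "7", "3.14555323 (pi)"),
  stringsAsFactors = FALSE
)

# List the variable (column) names
names(data)

# Pull out one variable with dataframe$variable
data$nationality

# Apply class() to every column to see each variable's type
sapply(data, class)
```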
The variable fav_number should be numeric, but because the last entry contains non-numeric characters (the text “(pi)”), R treats the whole vector as character. To change this:
## [1] 11 4
# change this observation (row 11, column 4)
data[11, 4] <- 3.145553
# another way (good for finding strings in large datasets)
# without using packages:
# this shows that the string "3.14" is in row 11 of the
# variable vector fav_number
# (fixed = TRUE matches "3.14" literally, not as a regular expression)
grep("3.14", data$fav_number, fixed = TRUE)
## [1] 11
# coerce the variable object to numeric
data$fav_no <- as.numeric(data$fav_number)
# check it is now numeric
sapply(data, class)
##    instructor   nationality fav_icrecream    fav_number        fav_no
##   "character"   "character"   "character"   "character"     "numeric"
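To see why the offending entry has to be changed before coercing, here is a small sketch on a stand-alone character vector. The vector holds three values copied from the fav_number column; the name converted is arbitrary.

```r
# Three values copied from the fav_number column
fav_number <- c("2904", "7", "3.14555323 (pi)")

# as.numeric() cannot parse the last entry, so it becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.numeric(fav_number))
converted          # 2904 7 NA
is.na(converted)   # FALSE FALSE TRUE

# After replacing the offending entry, every value converts cleanly
fav_number[3] <- "3.145553"
as.numeric(fav_number)
```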
NOTE: you will not have to do this for any of the datasets provided for tutorials or assignments from now on. This was included to show an example of data classes used in R (and how to use a few more basic commands), so if you find this confusing don’t worry about it too much.
1.3.6 Installing and Using Packages
R has around 15,000 packages.
Anyone can write an R package, and packages are added and removed almost every day!
We will try to keep the use of packages to a minimum; however, you do need to know how to install and load the packages required to run the R script files.
There are two ways to do this.
- Use R Studio
Go to the Files Window and click on Packages, then click the Install button.
In the Install Packages dialog that appears, type in stargazer and then click Install.
I’d suggest you use this method first.
- Install directly using R code, e.g.
## Install "stargazer" package
install.packages("stargazer")
## Install "AER" package, where "AER" stands for "Applied Econometrics with R"
# Note how it automatically installs other packages that "AER" depends on
install.packages("AER")
Sometimes there is a problem with R and R Studio relating to “dependency” packages. If you try to install a package and run into this issue, try adding the dependencies argument:
## Install "stargazer" package along with its dependency packages
install.packages("stargazer", dependencies = TRUE)
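Installing a package only has to be done once, but it must be loaded with library() in every new R session before its functions can be used. The following sketch wraps both steps in a helper. The name load_or_install is hypothetical (it is not part of R or of tute1.R) and this is only one possible way to organise the two steps.

```r
# Hypothetical helper (not part of R or of tute1.R): install a package only
# if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  library(pkg, character.only = TRUE)
}

# Demonstrated with "stats", which ships with R, so nothing is downloaded here.
# For this tutorial you would call load_or_install("stargazer") instead.
load_or_install("stats")
```

Without a helper, the equivalent for a package that is already installed is simply library(stargazer) at the top of your script.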
So, that’s about it for this tutorial!
As noted in the Administration section above, you should download tute2.R and tute2_crime.csv to your Tutorial 2 sub-folder, then follow the instructions we went through today to create a dataframe.
Next, open tute2.pdf and have a look at the questions; if you are able to match them to the relevant R code in the tute2.R file, that would be great.