2 Getting started with R
R is a programming language, mostly used by statisticians, to process and analyze data. Coding in R means using functions from different packages, so you (usually) don’t have to write your own code from scratch.
2.1 Installing R and RStudio
What you need to get started is a distribution of R, downloadable from here. This tutorial was created using R 4.0.5.
It is also strongly recommended to use a graphical interface to help you along, and the most commonly used interface is RStudio, found here.
Once you have downloaded and installed both programs you are ready to start up RStudio for the first time.
2.2 The interface
RStudio consists of four windows: the console, the editor, the environment and the explorer.
- The console window is mostly used for printing code that is run and producing outputs. Anything that is written here is not saved between sessions.
- The editor window is where most of your code is written. You can start a new code document (extension .R) by going to File -> New File -> R Script or pressing CTRL + SHIFT + N. Anything you write in this document can be saved between sessions. To run code from this document, place the cursor on the row you want to run and press CTRL + ENTER. To run multiple lines of code at the same time, highlight them and press CTRL + ENTER.
- The environment window is where R saves all information that you choose to save, for instance data that is loaded or the result of some analysis.
- The explorer window is where you can view documentation of functions and packages, plot output and your files.
This tutorial focuses on the first three windows.
2.3 Your first lines of code
The first thing to note is that R is case-sensitive, meaning that a and A are different things. R also uses <- to store information in a variable in the environment. For instance, we can store the value 3 in the variable a as follows:
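A minimal version of the assignment described above:

```r
# Store the value 3 in the variable a
a <- 3
```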
Copy the code above into your code document and press CTRL + ENTER with the cursor on that row. You should now see a variable called a, with the value 3, added to your environment.
You can also print the value to the console by writing a in the console (run it by pressing ENTER), or by writing it on a separate line in your code document and running that line in the same way as before.
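Printing a variable takes nothing more than its name. The assignment is repeated here only so that the example runs on its own:

```r
a <- 3

# Writing the name of a variable prints its value
a
```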
## [1] 3
You should now see output in the console similar to the above.
Functions in R all follow the same structure, function(arguments = values), where the arguments used by the function are written within the parentheses. For instance, we can use length() to see how many values are present in a variable. The argument for specifying the variable in the function is x, and here we use = to connect a to the argument.
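A minimal sketch of the call described above, assuming a still holds the value 3 from earlier:

```r
a <- 3

# Pass a to the argument x of length()
length(x = a)
```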
## [1] 1
Running this code will only produce an output, but we can save this information by storing the value in another variable with <-, like so:
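A minimal version of this step, again assuming that a holds the value 3:

```r
a <- 3

# Store the number of values in a in the new variable size
size <- length(x = a)
```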
Now this value is stored in the environment as size and can be used later on.
2.4 Installing and loading new packages
Simple calculations can be done with the packages that are already installed and loaded in the base R distribution. Packages that provide more advanced calculations and functions must first be installed on your computer and then loaded into the session before they can be used.
The functions used in this tutorial come from the packages tibble, tidyr, dplyr and googlesheets4. These packages can be considered tool-kits with different types of tools that we want to use for different problems. Similar to a real life tool-kit, you need to buy it from the store before you can use it, so the first step is to install the packages using install.packages(). In order to do this quickly we can create a vector of values containing all the package names, and then use the vector as an argument in the function.
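As a minimal sketch of this step, the four package names can be collected with c() and passed to install.packages(). The variable name packages is only illustrative, and running the code requires an internet connection:

```r
# A character vector with the names of all packages to install
packages <- c("tibble", "tidyr", "dplyr", "googlesheets4")

# Download and install every package named in the vector
install.packages(packages)
```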
Once we have brought the tool-kits home to our garage or shed, we must put them in our workspace to gain access to the tools within. We load the packages and their functions into the current session with require().
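Loading could look like the following, with one call per package. This assumes the four packages have already been installed. If one is missing, require() gives a warning and returns FALSE instead of stopping:

```r
# Load each package into the current session
require(tibble)
require(tidyr)
require(dplyr)
require(googlesheets4)
```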
Once you have installed a package, you do not need to install it again for your current version of R, but every new session needs to load the packages with require().
In order to use googlesheets4 you will need to link a Google account to R, which is done the first time you use a function from the package.
2.4.1 Functions for FHM
The functions I have created for FHM do not exist as a package but can be accessed through the SHL GitHub. In order to load these functions into your session in R, you need to use another function, source(). This function reads and runs entire R scripts in one go and will create functions that can later be used in your session. The following code should import all functions that you will need into your environment, read directly from the SHL GitHub. Remember that you will have to run this code (bring out the tool-kit) every time you start a new session, similar to loading packages.
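To illustrate how source() behaves, here is a small stand-in sketch that does not use the actual FHM scripts. A temporary file takes the place of the script on GitHub, and the function add_one() is made up for this example. With the real scripts, the file path would be replaced by the address of the script:

```r
# Write a tiny script to a temporary file; it stands in for the real
# script, whose address is not reproduced here
script <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script)

# source() runs the whole script, which creates add_one() in the session
source(script)

# The function can now be used like any other
add_one(x = 2)  # returns 3
```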
2.5 Importing data sets
Manually inputting large data sets into R is not feasible, but luckily raw text files can be imported instead. Franchise Hockey Manager 6 can export .csv-files from a saved game, and it is these files that R can parse and aggregate. In order to import the files, R needs to know where on your computer the FHM saved games are stored, so you need to provide a path to that folder.
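The general idea of reading a .csv-file from a folder can be sketched with base R. A temporary folder stands in for the saved games folder here, and the file name and columns are made up for the example rather than taken from an actual FHM export:

```r
# A temporary folder with one small file stands in for an FHM export
folder <- tempdir()
write.csv(data.frame(player = c("A", "B"), goals = c(10, 7)),
          file.path(folder, "example_export.csv"), row.names = FALSE)

# List every .csv-file in the folder
csv_files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read the example file into a data frame and print it
players <- read.csv(file.path(folder, "example_export.csv"))
players
```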
A path is considered a string, which means that you need to save the path within quotation marks, "path". The folder where FHM saves your games is usually found under C:\Users\USER\Documents\Out of the Park Developments\Franchise Hockey Manager 6\saved_games, but you need to find where your saved games are located. One thing to note is that R treats \ as an escape character in a string, so a path copied from Windows will not work as it is. Each \ needs to be replaced with / instead (or doubled as \\).
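As a sketch of the two ways of writing such a path, using the default location mentioned above. USER is a placeholder for your own user name, and the variable names are only illustrative:

```r
# Forward slashes work in R on every platform, including Windows
saved_games <- "C:/Users/USER/Documents/Out of the Park Developments/Franchise Hockey Manager 6/saved_games"

# The same path written with escaped (doubled) backslashes
saved_games_escaped <- "C:\\Users\\USER\\Documents\\Out of the Park Developments\\Franchise Hockey Manager 6\\saved_games"

# Converting the backslashes shows that both strings describe the same folder
identical(gsub("\\", "/", saved_games_escaped, fixed = TRUE), saved_games)
```

The last line returns TRUE, so either form can be used. The version with forward slashes is usually easier to read.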
My path for the saved games can be found here: