3 Reading the data and aggregating it
Now that we have loaded all the functions we need to read the exported csv-files from FHM6, and have set the path to the folder via path, we can start reading and aggregating the data for analysis.
If you only want to look at one test sim, you can use the fhm6Parser()
function. The arguments that you need to provide are the path to the save game folder, and the save game name.
The gameData variable is now a list of three objects:
- teams: Team standings and statistics,
- schedule: The schedule of individual played games,
- players: Player statistics.
To look at one of these objects in more detail as a data frame, you can use the $ operator. For instance, gameData$teams returns the standings.
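As a minimal sketch of how the $ operator works on such a list, the example below builds a toy list with the same three element names. The object toyGameData, its column names and its values are invented for illustration and are not the actual output of fhm6Parser().

```r
# Toy stand-in for a parsed save game; columns and values are made up
toyGameData <- list(
  teams    = data.frame(Team = c("Team A", "Team B"), Wins = c(40, 26)),
  schedule = data.frame(Home = "Team A", Away = "Team B"),
  players  = data.frame(Player = c("Player 1", "Player 2"), Goals = c(31, 12))
)

# The names of the list elements
names(toyGameData)

# Pull out one element as a data frame with the $ operator
standings <- toyGameData$teams
head(standings)
```

The same pattern applies to the schedule and players elements.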
If you want to combine and compare multiple test sims at the same time you can use the above code and save each new save game as a new variable, but it might just be easier to read everything at once. The function fhm6Aggregator() takes a baseName instead of the full save game name (the function adds on the “.lg” extension) and an additional argument nSims that sets the number of test sims that you have saved.
### Sets the base name for each test sim
baseName <- "SHL-S59-Casino-"
### Sets the number of save files present with the baseName start
nSims <- 10
### Reads all test sims into one list
testSims <-
  fhm6Aggregator(
    saveFolder = path,
    saveGame = baseName,
    nSims = nSims
  )
The variable testSims now contains a list of nSims lists, one for each test sim, similar to the list from before in gameData.
3.1 Filtering the data
The csv-files contain a lot of information, so the first step before looking into it further is to filter and select the relevant variables.
3.1.1 Team filters
The function teamParserRaw() selects the more relevant team information from the extracted FHM6 data and creates a data frame whose last column records which save file each row comes from.
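The idea of stacking the team tables from several test sims and tagging each row with its source can be sketched in base R as follows. This is not the implementation of teamParserRaw(); the object toySims, the column names and the Sim column are assumptions made for the example.

```r
# Toy stand-in for a list of test sims, one list element per sim
toySims <- list(
  list(teams = data.frame(Team = c("Team A", "Team B"), Wins = c(40, 26))),
  list(teams = data.frame(Team = c("Team A", "Team B"), Wins = c(36, 30)))
)

# Select the columns of interest from each sim and add the sim number last
teamList <- lapply(seq_along(toySims), function(i) {
  teams <- toySims[[i]]$teams[, c("Team", "Wins")]
  teams$Sim <- i
  teams
})

# Stack all sims into one data frame
toyTeamData <- do.call(rbind, teamList)
toyTeamData
```

Keeping the source as a column means all sims can live in one data frame and still be told apart later.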
3.1.2 Player filters
At the time of writing there are no functions for filtering or selecting specific statistics for players.
3.2 Summarizing the data
Raw data might be fun to look at but it’s usually difficult to see general trends. The purpose of doing multiple test sims, with the same tactics or line-ups, is to increase the sample size of results, but the data becomes much larger. It is then necessary to aggregate or summarize the data in some way. One easy way to interpret the results is to use the mean value of a statistic over all the test sims.
Using the function teamAggregator() you can summarize the mean of all statistics found in teamData. If you would like to use another function, the argument fun can be changed to any of mean, sum, sd (for standard deviation), or median.
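To illustrate what summarizing with a swappable function looks like, here is a small base R sketch on toy data. The helper summariseWins() and the data frame toyTeamData are hypothetical and only mimic the idea of the fun argument; they are not part of the functions loaded earlier.

```r
# Toy data: wins for two teams over three test sims (values are made up)
toyTeamData <- data.frame(
  Team = rep(c("Team A", "Team B"), times = 3),
  Wins = c(40, 26, 36, 30, 44, 25),
  Sim  = rep(1:3, each = 2)
)

# Hypothetical helper: summarize Wins per team with a chosen function
summariseWins <- function(data, fun = mean) {
  aggregate(Wins ~ Team, data = data, FUN = fun)
}

meanWins   <- summariseWins(toyTeamData)
medianWins <- summariseWins(toyTeamData, fun = median)
sdWins     <- summariseWins(toyTeamData, fun = sd)
meanWins
```

Passing the function itself as an argument keeps the grouping logic in one place while the statistic can vary.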
3.2.1 Casino predictions
For casino predictions it is of interest to compare the number of wins for each team in the test sims to a casino line. These are set at the start of every season so we can create this information in R in two different ways. The easiest way is to create your own data set with the information.
### ID variable for each team
TeamId <-
  c(18, 7, 0, 8, 1, 9, 2, 13, 4, 10, 20, 5, 14, 21, 12, 19, 6, 15, 3, 11)
### Name variable for each team
Team <-
  c(
    "Atlanta Inferno",
    "Baltimore Platoon",
    "Buffalo Stampede",
    "Calgary Dragons",
    "Chicago Syndicate",
    "Edmonton Blizzard",
    "Hamilton Steelhawks",
    "Los Angeles Panthers",
    "Manhattan Rage",
    "Minnesota Monarchs",
    "Montreal Patriotes",
    "New England Wolfpack",
    "New Orleans Specters",
    "Philadelphia Forge",
    "San Francisco Pride",
    "Seattle Argonauts",
    "Tampa Bay Barracuda",
    "Texas Renegades",
    "Toronto North Stars",
    "Winnipeg Aurora"
  )
### Casino lines, written in the same alphabetical team order as above
Casino <-
  c(
    37.5,
    43.5,
    47.5,
    34.5,
    46.5,
    29.5,
    46.5,
    39.5,
    10.5,
    31.5,
    21.5,
    33.5,
    16.5,
    29.5,
    31.5,
    29.5,
    38.5,
    47.5,
    43.5,
    11.5
  )
### Creates a data.frame with all information
teamCasino <-
  data.frame(
    TeamId,
    Team,
    Casino
  )
The variable teamCasino now contains all relevant information about the casino lines that the following functions will use.
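Because the three vectors are matched purely by position, it can be worth checking them before building the data frame. The sketch below uses only the first three entries of each vector to stay short; the checks themselves are a suggestion and not part of the original workflow.

```r
# First three entries of each vector, copied from the code above
TeamId <- c(18, 7, 0)
Team   <- c("Atlanta Inferno", "Baltimore Platoon", "Buffalo Stampede")
Casino <- c(37.5, 43.5, 47.5)

# All vectors must have the same length to line up row by row
sameLength <- length(unique(lengths(list(TeamId, Team, Casino)))) == 1

# Team names should be in alphabetical order, matching the casino lines
alphabetical <- !is.unsorted(Team)

# Each team ID should appear only once
uniqueIds <- anyDuplicated(TeamId) == 0

stopifnot(sameLength, alphabetical, uniqueIds)
```

If any check fails, stopifnot() stops with an error naming the failing condition.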
Another way to do it is to import the information from a Google Sheet, for example the sheet that Krazko and I use every season. This code assumes that you have connected your Google account to the googlesheets4 package as explained in 2.4. read_sheet() takes the arguments:
- ss: URL of the document,
- sheet: name of the specific sheet in the document,
- range: specific range in the sheet to read.
teamCasino <-
  read_sheet(
    ss = "https://docs.google.com/spreadsheets/d/1kisvxMASJvX26djRXzDVDxMz1ODqbaaMfuwUXtVwpWw/edit#gid=1074258598",
    sheet = "Teams",
    range = "A:D"
  )
With these lines we can now use another function, casinoAggregator(). This function summarizes all of the test files but focuses on each team’s number of wins. The resulting data frame contains:
- the mean number of wins,
- mean number of points,
- standard deviation of wins as an indication of the uncertainty/variation of the tests,
- median number of wins (the median is robust against outliers and might give a better representation of the true value),
- lower and upper quartile number of wins.
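The win statistics listed above can be sketched in base R on toy data as follows. This is not the implementation of casinoAggregator(); the objects toyWins and toyCasino, the column names and the Lean column are assumptions for illustration, and the mean number of points is left out for brevity.

```r
# Toy data: wins for two teams over five test sims, plus made-up casino lines
toyWins <- data.frame(
  Team = rep(c("Team A", "Team B"), each = 5),
  Wins = c(38, 40, 41, 43, 48, 24, 25, 27, 28, 36)
)
toyCasino <- data.frame(Team = c("Team A", "Team B"), Casino = c(39.5, 29.5))

# Summary statistics of wins for each team
stats <- do.call(rbind, lapply(unique(toyWins$Team), function(tm) {
  w <- toyWins$Wins[toyWins$Team == tm]
  data.frame(
    Team       = tm,
    MeanWins   = mean(w),
    SdWins     = sd(w),
    MedianWins = median(w),
    LowerQ     = unname(quantile(w, 0.25)),
    UpperQ     = unname(quantile(w, 0.75))
  )
}))

# Attach the casino line and note which side the median falls on
toyPredictions <- merge(toyCasino, stats, by = "Team")
toyPredictions$Lean <-
  ifelse(toyPredictions$MedianWins > toyPredictions$Casino, "over", "under")
toyPredictions
```

In this toy data Team B has one unusually good sim (36 wins), which pulls its mean up to 28 while the median stays at 27. This is the kind of outlier the median is robust against.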
3.2.2 Writing data to Google Sheets
As you might have seen in Krazko’s and my sheet, both teamData and casinoPredictions from our test sims are stored there in separate sheets. Once the data is in Google Sheets you can also add conditional formatting to visualize the data better.
This first block of code writes teamData to a specific sheet in the same document used above.
rawDataWriter(
  data = teamData,
  file = "https://docs.google.com/spreadsheets/d/1kisvxMASJvX26djRXzDVDxMz1ODqbaaMfuwUXtVwpWw/edit#gid=1074258598",
  sheet = "Canadice's Raw Data"
)
This second block of code writes casinoPredictions to another sheet in the same document.