3 Reading the data and aggregating it
Now that we have loaded all the functions needed to read the CSV files exported from FHM6, and stored the path to the folder in the variable path, we can start reading and aggregating the data for analysis.
If you only want to look at a single test sim, you can use the fhm6Parser() function. It takes two arguments: the path to the save game folder and the name of the save game.
The result, stored here in the variable gameData, is a list of three objects:
teams: Team standings and statistics,
schedule: The schedule of the individual games played,
players: Player statistics
To look at one of these objects in more detail as a data frame, use the $ operator. For instance, gameData$teams returns the team standings.
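The $ operator simply pulls one named element out of a list. The sketch below uses a small, made-up stand-in for gameData (the columns and values are hypothetical; the real object from fhm6Parser() has many more columns) to show that gameData$teams and gameData[["teams"]] return the same data frame.

```r
# Toy stand-in for gameData (hypothetical columns and values).
# The real object comes from fhm6Parser() and is much wider.
gameData <- list(
  teams    = data.frame(TeamId = c(0, 1), Wins = c(40, 30)),
  schedule = data.frame(HomeId = 0, AwayId = 1),
  players  = data.frame(Name = c("Player A", "Player B"), TeamId = c(0, 1))
)

# Extract the standings; equivalent to gameData[["teams"]].
standings <- gameData$teams

# Inspect the structure of the extracted data frame and list the elements.
str(standings)
names(gameData)
```

In RStudio you can also open the extracted data frame in the data viewer to browse it interactively.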
If you want to combine and compare multiple test sims, you could run the code above once per save game and store each result in its own variable, but it is easier to read everything at once. The function fhm6Aggregator() replaces the save game name argument with baseName, the common base name of the save games (the function adds the “.lg” extension itself), and adds the argument nSims, which sets the number of test sims you have saved.
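Conceptually, an aggregator like this loops over the sims, builds each save game name from baseName, and collects the parsed results in a list. The sketch below is not the actual fhm6Aggregator() code: parseOne() is a hypothetical stub standing in for fhm6Parser(), aggregateSims() is a hypothetical name, and the naming convention (baseName followed by the sim number, e.g. "testSim1.lg") is an assumption.

```r
# Hypothetical stub standing in for fhm6Parser(): it only returns a list with
# the same three elements, recording which save game it was asked to read.
parseOne <- function(path, saveName) {
  list(
    teams    = data.frame(SaveName = saveName),
    schedule = data.frame(),
    players  = data.frame()
  )
}

# Assumed naming convention: baseName + sim number + ".lg".
aggregateSims <- function(path, baseName, nSims) {
  lapply(seq_len(nSims), function(i) {
    parseOne(path, paste0(baseName, i, ".lg"))
  })
}

testSimsSketch <- aggregateSims("saves", "testSim", 3)
length(testSimsSketch)           # one list per sim
testSimsSketch[[2]]$teams        # the "teams" element of the second sim
```

Indexing with [[i]] selects the i-th sim, after which $ works exactly as it does on a single gameData list.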
testSims now contains a list of nSims lists, one for each test sim, each structured like the gameData list from before.
3.1 Filtering the data
The CSV files contain a lot of information, so the first step before looking into them further is to filter and select the relevant variables.
3.1.1 Team filters
teamParserRaw() selects the most relevant team information from the extracted FHM6 data and combines it into a single data frame whose last column records which save file each row comes from.
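The idea of stacking the team tables from every sim and tagging each row with its source can be sketched in base R. This is not the implementation of teamParserRaw(): the toy list, its column names (TeamId, Wins, Points, GoalsFor), the selected columns, and the Save column name are all hypothetical.

```r
# Toy stand-in for testSims: two sims, each with a teams data frame.
# Column names and numbers are made up for illustration.
testSimsToy <- list(
  list(teams = data.frame(TeamId = c(0, 1), Wins = c(40, 30),
                          Points = c(85, 66), GoalsFor = c(250, 210))),
  list(teams = data.frame(TeamId = c(0, 1), Wins = c(44, 28),
                          Points = c(92, 61), GoalsFor = c(262, 205)))
)

# Columns considered "relevant" in this sketch.
keepCols <- c("TeamId", "Wins", "Points")

# Select the columns from each sim, add the sim index as the last column,
# and stack everything into one data frame.
teamDataToy <- do.call(rbind, lapply(seq_along(testSimsToy), function(i) {
  teams <- testSimsToy[[i]]$teams[, keepCols]
  teams$Save <- i
  teams
}))

teamDataToy
```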
3.1.2 Player filters
At the time of writing, there are no functions for filtering or selecting specific player statistics.
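Until such functions exist, the players data frame can be filtered directly with base R. The sketch below uses a made-up players table; the column names (GP, G, A) and the 20-game cut-off are hypothetical and should be adapted to the columns in your own export.

```r
# Toy players table (hypothetical columns and values).
players <- data.frame(
  Name   = c("Player A", "Player B", "Player C"),
  TeamId = c(0, 0, 1),
  GP     = c(82, 12, 60),
  G      = c(30, 2, 18),
  A      = c(41, 3, 22)
)

# Keep players with at least 20 games played and only the columns of interest.
regulars <- subset(players, GP >= 20, select = c(Name, TeamId, G, A))

# Derived statistic: points = goals + assists.
regulars$Points <- regulars$G + regulars$A
regulars
```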
3.2 Summarizing the data
Raw data might be fun to look at, but it is usually hard to see general trends in it. Running multiple test sims with the same tactics or line-ups increases the sample size of the results, but it also makes the data much larger, so it has to be aggregated or summarized in some way. One easy way to interpret the results is to take the mean of each statistic over all the test sims.
Using the function teamAggregator(), you can compute the mean of every statistic found in teamData. If you would like to use another summary function, the argument fun can be changed, for example to sd (for standard deviation).
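Summarizing per team with a swappable function can be sketched with base R's aggregate(). This is an illustration of the idea, not teamAggregator() itself: the toy teamData, its columns, and the helper name summarizeTeams() are hypothetical.

```r
# Toy teamData: two teams over three sims (made-up numbers).
teamDataToy <- data.frame(
  TeamId = c(0, 1, 0, 1, 0, 1),
  Wins   = c(40, 30, 44, 28, 42, 31),
  Points = c(85, 66, 92, 61, 88, 67),
  Save   = c(1, 1, 2, 2, 3, 3)
)

# Apply `fun` to every statistic per team, ignoring the sim identifier.
summarizeTeams <- function(teamData, fun = mean) {
  stats <- teamData[, setdiff(names(teamData), "Save")]
  aggregate(. ~ TeamId, data = stats, FUN = fun)
}

summarizeTeams(teamDataToy)            # mean of each statistic per team
summarizeTeams(teamDataToy, fun = sd)  # standard deviation instead
```

Passing the function itself (mean, sd, median) rather than its name keeps the helper generic: any function that reduces a numeric vector to a single value works.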
3.2.1 Casino predictions
For casino predictions, we want to compare each team's number of wins in the test sims with its casino line. The lines are set at the start of every season, and there are two ways to get them into R. The easiest is to create your own data set with the information.
```r
### ID variable for each team
TeamId <- c(18, 7, 0, 8, 1, 9, 2, 13, 4, 10, 20, 5, 14, 21, 12, 19, 6, 15, 3, 11)

### Name variable for each team
Team <- c(
  "Atlanta Inferno", "Baltimore Platoon", "Buffalo Stampede", "Calgary Dragons",
  "Chicago Syndicate", "Edmonton Blizzard", "Hamilton Steelhawks",
  "Los Angeles Panthers", "Manhattan Rage", "Minnesota Monarchs",
  "Montreal Patriotes", "New England Wolfpack", "New Orleans Specters",
  "Philadelphia Forge", "San Francisco Pride", "Seattle Argonauts",
  "Tampa Bay Barracuda", "Texas Renegades", "Toronto North Stars",
  "Winnipeg Aurora"
)

### All lines are written in alphabetical order.
Casino <- c(
  37.5, 43.5, 47.5, 34.5, 46.5, 29.5, 46.5, 39.5, 10.5, 31.5,
  21.5, 33.5, 16.5, 29.5, 31.5, 29.5, 38.5, 47.5, 43.5, 11.5
)

### Creates a data.frame with all information
teamCasino <- data.frame(
  TeamId,
  Team,
  Casino
)
```
teamCasino now contains all relevant information about the casino lines that the following functions will use.
Another way is to import the information from a Google Sheet, for example the sheet that Krazko and I use every season. This approach assumes that you have connected your Google account to the googlesheets4 package, as explained in 2.4.
read_sheet() takes the arguments:
ss: the URL (or ID) of the document
sheet: name of the specific sheet in the document
range: specific range in the sheet to read
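A minimal sketch of such a call, wrapped in a small helper so it only runs when you call it. read_sheet() and its ss, sheet and range arguments are from googlesheets4; the URL is a placeholder, and the helper name readCasinoLines(), the sheet name "Casino" and the range "A1:C21" (a header row plus 20 teams in columns TeamId, Team, Casino) are hypothetical assumptions about how your sheet is laid out.

```r
# Placeholder URL: replace with your own document.
sheetUrl <- "https://docs.google.com/spreadsheets/d/<your-sheet-id>"

# Hypothetical helper: reads the casino lines from one sheet and range.
# Assumes the first row holds the column names TeamId, Team and Casino.
readCasinoLines <- function(url, sheetName = "Casino", cellRange = "A1:C21") {
  googlesheets4::read_sheet(ss = url, sheet = sheetName, range = cellRange)
}

# Requires an authenticated googlesheets4 session (see 2.4):
# teamCasino <- readCasinoLines(sheetUrl)
```

read_sheet() returns a tibble, which works like a data frame in the functions that follow as long as the column names match.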
With the casino lines in place, we can use another function, casinoAggregator(). It summarizes all of the test sims but focuses on each team's number of wins. The resulting data frame contains:
- the mean number of wins,
- the mean number of points,
- the standard deviation of wins, as an indication of the uncertainty/variation between the tests,
- the median number of wins (the median is robust to outliers and might better represent the true value),
- the lower and upper quartiles of the number of wins.
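The statistics listed above can be computed per team with base R and then joined to the casino lines. This sketch is not the code of casinoAggregator(): the per-sim results are made-up toy numbers, the column names and the helper summarizeWins() are hypothetical, and only the two casino lines are copied from the teamCasino table.

```r
# Toy per-sim results (made-up numbers): one row per team and sim.
simResults <- data.frame(
  TeamId = rep(c(0, 1), times = 4),
  Wins   = c(46, 44, 49, 47, 45, 43, 50, 48),
  Points = c(97, 92, 102, 98, 95, 90, 104, 100),
  Save   = rep(1:4, each = 2)
)

# Casino lines for two teams, copied from teamCasino.
casinoLines <- data.frame(
  TeamId = c(0, 1),
  Team   = c("Buffalo Stampede", "Chicago Syndicate"),
  Casino = c(47.5, 46.5)
)

# Summary statistics for one team's wins across all sims.
summarizeWins <- function(wins) {
  q <- quantile(wins, probs = c(0.25, 0.75), names = FALSE)
  c(meanWins = mean(wins), sdWins = sd(wins), medianWins = median(wins),
    lowerQuartile = q[1], upperQuartile = q[2])
}

# One summary row per team, then attach each team's casino line.
perTeam <- lapply(split(simResults, simResults$TeamId), function(d) {
  data.frame(TeamId = d$TeamId[1], meanPoints = mean(d$Points),
             t(summarizeWins(d$Wins)))
})
casinoSketch <- merge(casinoLines, do.call(rbind, perTeam), by = "TeamId")
casinoSketch
```

Comparing meanWins or medianWins with Casino then shows whether the sims lean towards the over or the under, and the quartiles indicate how confidently they do so.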
3.2.2 Writing data to Google Sheets
As you might have seen in the sheet Krazko and I share, the casinoPredictions from each of our test sims are stored in separate sheets. Once the data is in Google Sheets, you can also add conditional formatting to visualize it better.
The first piece of code writes teamData to a specific sheet in the same document used above.
The second writes casinoPredictions to another sheet in the same document.
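A minimal sketch of both writes using googlesheets4's write_sheet(), which takes the data, the document (ss) and the target sheet name. The URL is a placeholder, and the helper name writeResults() and the sheet names are hypothetical; the calls are left commented out because they need an authenticated session and existing teamData and casinoPredictions objects.

```r
# Placeholder URL: replace with your own document.
sheetUrl <- "https://docs.google.com/spreadsheets/d/<your-sheet-id>"

# Hypothetical helper: writes a data frame to the named sheet. If the sheet
# does not exist it is added; if it exists, its contents are replaced.
writeResults <- function(data, url, sheetName) {
  googlesheets4::write_sheet(data, ss = url, sheet = sheetName)
}

# writeResults(teamData, sheetUrl, "Team data")
# writeResults(casinoPredictions, sheetUrl, "Casino predictions")
```

Because an existing sheet is overwritten, any conditional formatting you set up on it may need to be reapplied after each write.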