A (completely unnecessary) Bayesian Approach to Age of Empires Matchups

Author

Gio Circo

Show code

library(tidyverse)
library(knitr)
library(brms)

# DATA LOADING & PROCESSING
#================================#
civ_id <- read_csv("civ_ids.csv")
map_id <- read_csv("map_ids.csv")

# get all 1v1 games, filtering out matches with only 1v1
# and with non-missing data on both players
player_game <-
  read_csv("all_games.csv") %>%
  left_join(civ_id, by = c("civilization" = "id")) %>%
  left_join(map_id, by = c("map_id" = "map_id")) %>%
  group_by(id) %>%
  mutate(player = seq_along(civ)) %>%
  filter(sum(player) == 3,
         victory >= 0) %>%
  ungroup() %>%
  mutate(
    elo = case_when(
      average_elo < 1000 ~ "<1000",
      between(average_elo, 1000, 1250) ~ "1000-1250",
      between(average_elo, 1251, 1650) ~ "1250-1650",
      average_elo > 1650 ~ ">1650"),
    elo = fct_relevel(elo, c("<1000","1000-1250","1250-1650"))
  )


# identify top 15 maps
# making up ~90% of all games played
topmap <- player_game %>%
  count(map) %>%
  arrange(desc(n)) %>%
  mutate(prop = n/sum(n),
         cumprop = cumsum(prop)) %>%
  slice(1:15) %>%
  pull(map)

# convert to a wide format

# df1 = civ*civ*elo 
# elo analysis
df1 <-
  player_game %>%
  ungroup() %>%
  select(id, elo, civ, victory, player) %>%
  pivot_wider(names_from = player, values_from = c(victory, civ)) %>%
  group_by(civ_1, civ_2, elo) %>%
  summarise(
    victory_1 = sum(victory_1),
    victory_2 = sum(victory_2),
    count = n()
  )

# df2 = civ*civ*top15map
# maptype analysis
df2 <-
  player_game %>%
  ungroup() %>%
  filter(map %in% topmap) %>%
  select(id, map, civ, victory, player) %>%
  pivot_wider(names_from = player, values_from = c(victory, civ)) %>%
  group_by(civ_1, civ_2, map) %>%
  summarise(
    victory_1 = sum(victory_1),
    victory_2 = sum(victory_2),
    count = n()
  ) 

# HELPER FUNCTIONS
#================================#

# Function to extract a civ v civ matchup
# and combine probabilities (e.g. civ1 v civ2, civ2 v civ1)

civ_match <- function(preds, civ1, civ2, k){
  
  # setup matchup
  # find the matchups where the specified civs are either
  # in the civ1 or civ2 slot, where the "main" slot is
  # civ_1 and the opponent slot is civ_2

  matchup <-
    preds %>%
    filter(
      civ_1 == civ1 | civ_2 == civ1,
      civ_1 == civ2 | civ_2 == civ2
    ) %>%
    mutate(
      main = ifelse(civ_1 == civ1, 1, 0),
      win = case_when(main == 0 ~ 100 - prob, main == 1 ~ prob),
      match = paste0(civ1, "-", civ2)
    )
  
  # combine and export
  # this merges the civ_1 and civ_2 winrates to give us
  # the overall winrate when either played in the main or
  # opponent slot
  matchup %>%
    group_by(.data[[k]], match) %>% 
    summarise(across(c(win, err),mean), .groups = "drop") %>%
    mutate(lo = win - 2*err,
           hi = win + 2*err)
  
}

Age of Empires: Civilization Matchups

I am a big fan of Age of Empires 2 (AOE2) - which has been around for over 20 years and still has a thriving online community. One of the best parts of the game is the complexity resulting from the 42(!) different civilizations in the game. While many are subtly different, there is enough variety in the game to make different match ups between civilizations more or less favored. For example: the Franks are a cavalry-based civilization with a very strong early game, while the Koreans are a beefy siege-based late-game civ. The Franks are often very favored here because they can close out games well before the Koreans can build up in the mid-Castle or Imperial age. Combine this kind of complexity across all the 1,764 possible match ups, you have a lot of possibilities to work with!

With that in mind, I thought it appropriate to apply some of my modeling knowledge to explore civilization vs civilization win rates under different circumstances. While there are a number of different ways to analyze this, I decided to look at win rates based on Elo ranking and map types.

Getting Started

The data we are using represents 5.1 million Age of Empires 2 games hosted as part of a 100gb sqlite database from aoepulse. The data we are specifically focused on represent 1v1 random match games. While there are numerous other game types in AOE (2v2, 3v3, 4v4) 1v1 games are much easier to analyze for civ v. civ matchups. When we think about civilization balance, we almost exclusively focus on these 1v1 games.

Here’s the SQL query I used to pull the data, for full transparency. I’m focusing on games since the launch of the most recent expansion in April 2022 (patch 61591).

Show code

-- query to merge disaggregated civ v civ games
-- to map, elo, and patch ids
-- takes ~ 5 min to run, yields 3.6 mil rows

SELECT
    id,
    average_elo,
    map_id,
    patch_number,
    civilization,
    victory
FROM
(
-- match status, patch id, and elo
        SELECT
             id,
             average_elo,
             map_id,
             patch_number
         FROM matches
         WHERE patch_number >= 61591
 ) t0
INNER JOIN
(
--civ v civ matchups & victory
        SELECT
            match_id,
            civilization,
            victory
        FROM match_players
) t1
ON 
    t0.id = t1.match_id

The important variables we have here are the player civilizations, and the game’s average Elo ranking (a rough approximation for the players’ skill levels). What we’re going to do here is apply a Bayesian approach on this data to develop a robust model to help us understand civilization v. civilization win rates. There are roughly a million different conversations in the AOE2 community about which civilizations are best or worst, and which civilization match ups are the most one-sided.

An Aside: Why is this Useful?

First off - there are many websites that display 1v1 match up data, and I am far from the first person to do this (see: here or here for starters). In fact, some of this information is even available on the main AOE2 website. Why in the world do we need Bayesian statistics to analyze something that has been covered so broadly? Simply put, modelling approaches (especially hierarchical linear models) are good at handling sparse or rare outcomes. I previously touched on this in a previous post where I adjusted warehouse injury rates using a simple varying-intercepts model to account for the (seemingly high) injury rates in warehouses with very few employees. The same issue applies here. Let’s use an example to explain how:

An Example: Britons, Franks, Malians, and Bengalis

In AOE2 there are some civilizations which are much more popular than others. There are also map types which are more popular than others (Arabia being the obvious favorite), and Elo rankings which comprise more or fewer players. Therefore, when we start stratifying the data by civilization * Elo, or civilization * map type, or civilization * map type * Elo we quickly have fewer and fewer cases to look at. Looking at the plot below, we can see that some civs are quite a bit more popular - like the Franks, Mongols and Britons.

Show code

count(player_game, civ) %>% 
  mutate(prop = n/sum(n)) %>%
  ggplot() +
  geom_area(aes(x = prop, y = fct_reorder(civ,prop), group = 1), fill = "darkblue", orientation = "y", alpha = .3) +
  geom_line(aes(x = prop, y = fct_reorder(civ,prop), group = 1), linewidth = 1, color = "darkblue") +
  labs(x = "Play Rate") +
  theme_bw() +
  theme(axis.text = element_text(size = 11, color = "black"),
        axis.title.y = element_blank())

Franks, Mongols, and Britons are the most popular civs played

If we look at a very common match up at the lower Elo ranks - Franks vs Britons we can get a fairly good sampling. In this case we have 3523 games to rely on and see that the raw numbers imply the Franks are favored to win (winning in 53.3% of games played).

Show code

df1 %>%
  filter(civ_1 == "Franks", civ_2 == "Britons", elo == "<1000") %>%
  select(civ_1, civ_2, elo, victory_1, count) %>%
  kable()

civ_1	civ_2	elo	victory_1	count
Franks	Britons	<1000	1880	3523

What about a more esoteric match up? How about Malians versus Bengalis (two of least popular civilizations)? In this example we only have 13 historical games to rely on, in which the Malians won 9 (meaning a 69.2% win rate). While this seems high, we are relying on relatively little data. If I was a betting person I might still favor the Malians, but doI really think their win rate against the Bengalis is really 70%? With only 13 match ups I wouldn’t be too sure.

Show code

df1 %>%
  filter(civ_1 == "Malians", civ_2 == "Bengalis", elo == "<1000") %>%
  select(civ_1, civ_2, elo, victory_1, count) %>%
  kable()

civ_1	civ_2	elo	victory_1	count
Malians	Bengalis	<1000	9	13

So where does that leave us? Well, we can use modelling approaches to pool information about civilization win rates at various Elo ranks to fill in information about specific match ups. In cases where there are few observed match ups we can contribute information about overall win rates at different Elo ranks to help fill in information. In statistical language we call this partial-pooling. In addition, by using a statistical model we will be able to build in reasonable levels of uncertainty as well.

It is difficult to estimate win rates with sparse outcomes

Fitting the Model

The type of model we are fitting is a Bayesian hierarchical linear model (HLM) using a binomial distribution. Specifically we are fitting an aggregated binomial by modeling the number of victories by one player (victory_1) as a proportion of the total number of match ups for each civ * civ * Elo combination. We can then convert these counts to predicted probabilities and use them to help understand different match ups.

This is not a terribly complex model, so we will just throw some generic (weakly-informative) priors over the model standard deviations and intercepts. Here I’m using a student_t(3,0,1) prior for both. There certainly could be some argument for stronger priors (especially if you have more expert knowledge), but in a data set of this size it won’t be very useful. The main goal here is to provide regularization for the sparsely-populated cells.

Show code

# set model priors
model_prior <- c(prior(student_t(3,0,1), class = sd),
                 prior(student_t(3,0,1), class = Intercept))

# FIT 1: civ*civ*elo
brm_fit1 <- brm(
  victory_1 | trials(count) ~ 1 +
    (1 | civ_1 * civ_2 * elo),
  data = df1,
  prior = model_prior,
  family = "binomial",
  file = "elo_fit",
  chains = 4,
  cores = 4,
  backend = "cmdstanr"
)

After fitting the model we need to get predictions on all possible civ * civ * Elo combinations - which, in this case is \(42*42*4=7056\).

Show code

# set up a dataframe to predict against
# containing all civ1 * civ2 * elo combos
# set count to 100 to simulate 100 matchups

post_strat <-
  expand.grid(
    civ_1 = unique(df1$civ_1),
    civ_2 = unique(df1$civ_2),
    elo = unique(df1$elo),
    count = 100
  )

# extract fitted predictions on the response (probability) scale
preds <- fitted(brm_fit1, newdata = post_strat, type = "response")

# put the results in a separate dataframe
# get the point estimate and errors
pred_df <- post_strat %>%
  mutate(prob = preds[,1],
         err = preds[,2])

Voila! Now we have predictions for each type of match up with the associated error estimates. In this case prob refers to the number of games (out of 100) that civ_1 is predicted to win against civ_2 for each Elo.

head(pred_df)

       civ_1  civ_2   elo count     prob      err
1     Aztecs Aztecs <1000   100 47.46138 1.490057
2   Bengalis Aztecs <1000   100 43.88352 2.151690
3    Berbers Aztecs <1000   100 50.28497 1.612578
4  Bohemians Aztecs <1000   100 47.26388 1.837228
5    Britons Aztecs <1000   100 47.37382 1.243221
6 Bulgarians Aztecs <1000   100 54.32245 1.572924

If we want to see our example from above we can get estimates for Franks v. Britons, where we can see our estimates (while being a bit higher) than the observed raw rate, still clearly favor the Franks. The error estimates are also relatively small, given how popular this match up is. For lower-ranked games where the majority of the gameplay is, the margin of error is less than 2%.

Show code

civ_match(pred_df, "Franks", "Britons", k = 'elo')%>%
  kable(digits = 2)

elo	match	win	err	lo	hi
<1000	Franks-Britons	56.53	0.72	55.08	57.97
1000-1250	Franks-Britons	57.36	0.90	55.57	59.15
1250-1650	Franks-Britons	57.49	1.11	55.27	59.70
>1650	Franks-Britons	56.36	1.49	53.39	59.34

We can also look at our Malians-Bengalis match up as well. Again, this matches up with the raw rates where the Malians are clearly favored, but the point estimates are much more reasonable. We also see the error estimates are considerably larger, because we have many fewer observed game to rely on. In the >1000 Elo category the error is more than 3 times higher! In fact, because the error is so large, we can’t completely rule out that at lower Elos the Bengalis might also be favored (however above 1000 Elo it’s clearer that Malians are likely favored).

Show code

civ_match(pred_df, "Malians", "Bengalis", k = 'elo') %>%
  kable(digits = 2)

elo	match	win	err	lo	hi
<1000	Malians-Bengalis	53.44	2.39	48.65	58.23
1000-1250	Malians-Bengalis	56.35	2.24	51.87	60.83
1250-1650	Malians-Bengalis	56.47	2.25	51.97	60.97
>1650	Malians-Bengalis	55.56	2.40	50.75	60.36

A Tale of Two Civs: The Goths and Chinese

We can extend this logic to look at many different match ups at the same time. Let’s look at an exhaustive set of match ups for single civilizations based on Elo ranking. Because our model is quite extensible, so we can get estimates for any given civ. v. civ. match up for each (binned) Elo ranking. An advantage of this approach is that we can visually assess the predicted probabilities for each match up along a variety of Elo rankings. This might help understand how specific match ups change as the player skill levels increase.

For this example let’s focus specifically on the Goths and the Chinese. Why these two specific civilizations? Well, to start, they both have fairly low sub-50% raw win rates: 48.6% for the Goths and 45.7% for the Chinese. However, if we look at their performance across different civilization match ups in different Elo brackets, we might find some interesting patterns that aren’t immediately obvious.

The Goths

To start, we’ll run some code to compute estimates for all possible Goth games in all 4 of the Elo brackets.

Show code

# now do all civs
main_civ = "Goths"

allcivs <- unique(player_game$civ)
allcivs <- allcivs[!allcivs %in% main_civ]


# initalize list to store results
# then run all civ*civ*elo games
alist <- list()

for (i in 1:length(allcivs))
{
  alist[[i]] <- civ_match(pred_df, main_civ, allcivs[i], k = 'elo')
}

match_preds <- do.call(rbind, alist)

# optional, strip out team name for plot
match_preds$match <- gsub("(.*)-", match_preds$match,  replacement = "")

If we look at the overall win rate for Goths (across all civ match ups and all Elos) we see that they have an estimated win rate of about 47.6% - not too great. But what does that tell us about their overall match ups against specific civilizations and specific Elo levels? Using our model results we can plot out all the estimates and plot them against all combinations. What get is below:

Show code

ggplot(match_preds, aes(x = elo)) +
  geom_hline(yintercept = 50, color = "grey90", linewidth = 1.5) +
  geom_line(aes(y = win, group = match), color = "darkblue") +
  geom_point(aes(y = win, group = match), color = "white", fill = "darkblue",shape = 21, stroke = 2) +
  geom_ribbon(aes(ymin = lo, ymax = hi, group = match), alpha = .3, fill = "skyblue") +
  scale_x_discrete(guide = guide_axis(angle = 60)) +
  labs(x = "Elo", y = "Est. Win Rate", title = paste0(main_civ," Estimated Win Rates, by ELO"))  +
  facet_wrap(~match, nrow = 4) +
  theme_bw() +
  theme(axis.text = element_text(size = 10, color = "black"),
        panel.grid.minor = element_blank())

So in this case we see a pretty clear pattern! While the average win rate is pretty low, it is even lower at higher Elo levels. This is likely because at higher levels of play, their weaknesses are even more pronounced. Some of their worst match ups at the 1650+ level are against the Poles or Vikings, where their win rate is in the low 40s! Surprisingly, even match ups where we might think the Goths would be logically favored (like against Britons or Mayans) their win rates plummet as ELO increases. This Elo-related pattern is even more evident if we convert it to a heat map:

Show code

ggplot(match_preds, aes(x = elo, y = match)) +
  geom_tile(aes(fill = win)) +
  coord_equal() +
  khroma::scale_fill_sunset(midpoint = 50) +
  khroma::scale_color_sunset(midpoint = 50) +
  scale_x_discrete(guide = guide_axis(angle = 60)) +
  labs(color = "Win%", fill = "Win%") +
  theme_minimal() +
  theme(axis.title = element_blank(),
        axis.text = element_text(size = 10, color = "black"),
        panel.grid.minor = element_blank())

Goths are largely unfavored for most match ups above 1650 Elo.

Here we can see that at low ELO rankings, a lot of the Goths’ weaknesses (namely weak early economy) is not as big of an issue. As we progress to higher Elos, these weaknesses are likely more easily exploited in higher-level play. However, there are some very one-sided match ups (like versus the Vikings, Turks, Teutons and Poles) which don’t seem to matter much by Elo. Sometimes one civilization is just better at exploiting anothers’ weaknesses!

The Chinese

On the flip side, we can look at the Chinese. Let’s try this again performing the same type of analysis.

Show code

# now do all civs
main_civ = "Chinese"

allcivs <- unique(player_game$civ)
allcivs <- allcivs[!allcivs %in% main_civ]


alist <- list()

for (i in 1:length(allcivs))
{
  alist[[i]] <- civ_match(pred_df, main_civ, allcivs[i], k = 'elo')
}

match_preds2 <- do.call(rbind, alist)

# optional, strip out team name for plot
match_preds2$match <- gsub("(.*)-", match_preds2$match,  replacement = "")

Here we see that, on average, the Chinese have a raw win rate even lower than the Goths. However, this belies an unusal pattern when we examine the model estimates across Elo ranks. Let’s plot them out the same way as above:

Show code

ggplot(match_preds2, aes(x = elo)) +
  geom_hline(yintercept = 50, color = "grey90", linewidth = 1.5) +
  geom_line(aes(y = win, group = match), color = "darkred") +
  geom_point(aes(y = win, group = match), color = "white", fill = "darkred",shape = 21, stroke = 2) +
  geom_ribbon(aes(ymin = lo, ymax = hi, group = match), alpha = .3, fill = "tomato") +
  scale_x_discrete(guide = guide_axis(angle = 60)) +
  labs(x = "Elo", y = "Est. Win Rate", title = paste0(main_civ," Estimated Win Rates, by ELO"))  +
  facet_wrap(~match, nrow = 4) +
  theme_bw() +
  theme(axis.text = element_text(size = 10, color = "black"),
        panel.grid.minor = element_blank())

Chinese win rates rise as Elo increases.

Interesting! In contrast to the Goths’ decline in win rate from low Elos to high, we observe the exact opposite here! As Elo ranks increase, Chinese players are increasingly favored to win. Why? Likely much of the Chinese play style is a bit more difficult to work out (a more high skill ceiling). On the flip side, the Goths are a fairly straightforward infantry civilization that suffers from weak early game economy

Show code

ggplot(match_preds2, aes(x = elo, y = match)) +
  geom_tile(aes(fill = win)) +
  coord_equal() +
  khroma::scale_fill_sunset(midpoint = 50) +
  khroma::scale_color_sunset(midpoint = 50) +
  scale_x_discrete(guide = guide_axis(angle = 60)) +
  labs(color = "Win%", fill = "Win%") +
  theme_minimal() +
  theme(axis.title = element_blank(),
        axis.text = element_text(size = 10, color = "black"),
        panel.grid.minor = element_blank())

Chinese players are favored for many matchups above 1650 rating.

Map Type Favorability

Aside from Civ vs. Civ x Elo match ups, we can also look at how certain civilizations perform on different map types. For our purposes this necessitates fitting a second model. Rather than stratifying by Elo ranking, we are going to instead stratify by map type. In this case we’re going to only focus on the 15 most commonly played maps. The general model is exactly the same as the first, but we’re using some tighter priors (normal(0,1)) to help regularize estimates across the larger number of sparse cells. This just means because there are likely many match ups with few observed games, we need to provide stronger prior information to help the model converge.

Show code

# FIT 2: civ*civ*map
brm_fit2 <- brm(
  victory_1 | trials(count) ~ 1 +
    (1 | civ_1 * civ_2 * map),
  data = df2,
  prior = model_prior2,
  family = "binomial",
  file = "elo_fit_map",
  chains = 4,
  cores = 4,
  backend = "cmdstanr"
)

post_strat_map <-
  expand.grid(
    civ_1 = unique(df2$civ_1),
    civ_2 = unique(df2$civ_2),
    map = unique(df2$map),
    count = 100
  )

preds_map <- fitted(brm_fit2, newdata = post_strat_map, type = "response", allow_new_levels = TRUE)

pred_df_map <- post_strat_map %>%
  mutate(prob = preds_map[,1],
         err = preds_map[,2])

Nomad Maps

We have a lot of options to consider, but I think examining Nomad maps provides an interesting look at how map types can affect win rates. Nomad-type maps are particularly interesting because they deviate substantially from the normal play style (starting with a town center in an established, largely standardized spot). This shakes up the established game up a bit, and gives some civilizations an advantage.

With this in mind, we can see how a civilization like the Spanish fares on all map types. Looking at this, we see a pretty striking pattern - they are incredibly strong on Nomad-type maps! What is interesting about this is that normally unfavorable match ups (like Spanish-Gurjaras) are flipped to being in favor of the Spanish on nomad.

Show code

main_civ = "Spanish"
allcivs <- unique(player_game$civ)
allcivs <- allcivs[!allcivs %in% main_civ]

# map type
alist_map <- list()

for (i in 1:length(allcivs))
{
  alist[[i]] <- civ_match(pred_df_map, main_civ, allcivs[i], k = 'map')
}

match_preds_map <- do.call(rbind, alist)
match_preds_map$match <- gsub("(.*)-", match_preds_map$match,  replacement = "")

ggplot(match_preds_map, aes(x = map, y = match)) +
  geom_tile(aes(fill = win)) +
  coord_equal() +
  khroma::scale_fill_sunset(midpoint = 50) +
  khroma::scale_color_sunset(midpoint = 50) +
  scale_x_discrete(guide = guide_axis(angle = 60)) +
  labs(color = "Win%", fill = "Win%") +
  theme_minimal() +
  theme(axis.title = element_blank(),
        axis.text = element_text(size = 10, color = "black"),
        panel.grid.minor = element_blank())

Spanish players excel in Nomad-type maps for most match ups.

Where are these estimates coming from? Well, we can look at the random effects from our model to see the “favorability” bias for the civ*map type. No surprise, Spanish have the highest boost to their win rate on Nomad (followed by Turks on Arabia and Malians on Nomad). In the case of the Spanish, we estimate that playing on Nomad boosts a Spanish player’s win rate by about 60% (\(exp(.48) = 1.61\)). In a hypothetical Spanish vs. Gurjaras game this is the difference between a 40% win rate on Arabia, versus a 63% win rate on Nomad.

Show code

data.frame(ranef(brm_fit2)$`civ_1:map`) %>%
  arrange(desc(`Estimate.Intercept`)) %>%
  slice(1:5) %>%
  kable(digits = 2, col.names = c("win", "err", "lo", "hi"), caption = "Estimated civilization * map 'favorability' bias")

Estimated civilization * map ‘favorability’ bias
	win	err	lo	hi
Spanish_Nomad	0.48	0.05	0.39	0.58
Turks_Arena	0.47	0.04	0.38	0.55
Malians_Nomad	0.36	0.06	0.24	0.48
Poles_Arena	0.34	0.04	0.26	0.43
Portuguese_Arena	0.33	0.05	0.24	0.43

Show code

civ_match(pred_df_map, "Spanish", "Gurjaras", k = 'map')%>%
  filter(map %in% c("Arabia", "Nomad")) %>%
  kable(digits = 2)

map	match	win	err	lo	hi
Arabia	Spanish-Gurjaras	39.94	1.59	36.77	43.11
Nomad	Spanish-Gurjaras	62.74	2.61	57.52	67.96

Unknown Civ vs. Civ Map Predictions

One of the biggest advantages of this modelling approach is that we can get estimates for civ v. civ x map type even for games where we don’t even have any observed matches! For example, we don’t have any observed match ups between the Aztecs and Dravidians on Socotra. But we can still make predictions about this hypothetical match up based on observed data! Here, we give the Dravidians a slight edge (51%), but given we don’t have a lot of Aztec vs. Dravidian games and even fewer Socotra maps, our uncertainty about this match up is higher. In the 95% credible interval, it is plausible that either civ might be more favored. Simply put, we don’t have enough data to be very confident, but we still can rely on what data we do have to come to a decision!

Show code

civ_match(pred_df_map, "Aztecs", "Dravidians", k = 'map')%>%
  filter(map == "Socotra") %>%
  kable(digits = 2)

map	match	win	err	lo	hi
Socotra	Aztecs-Dravidians	49.06	3.94	41.17	56.95