## 17.4 Exercises

Ideas for corresponding exercises and projects:

### 17.4.1 Reflecting on Pac-Man

The video game of Pac-Man is far too fast, rich, and complicated to be modeled as one of our exercises. However, it is instructive to reflect on what would be required to implement a corresponding model.

How could the environment (i.e., the maze, pellets, time steps and rounds) be represented in a computer simulation?

What would a model of the Pac-Man agent require (in terms of goals, internal states, and actions)?

What would change if we were to model the ghosts instead?

**Bonus:**What’s the name of the 4th ghost in Pac-Man?

### 17.4.2 Learning in coordination and conflict games

The *prisoner’s dilemma* (PD) is a notorious example of a two-player coordination game that shows why two rational players may not cooperate, even if it is in their best interests to do so (see Wikipedia for details).
Table 17.4 shows a payoff matrix of a PD game.

Options | cooperate | defect |
---|---|---|

cooperate |
\((-1, -1)\) |
\((-3, +0)\) |

defect |
\((+0, -3)\) | \((-2, -2)\) |

Your tasks:

Classify the PD’s type of game by describing its payoff matrix.

Implement a population of 10 x 10 agents that play the game for 100 rounds (using the lattice/torus setup from Section 17.2.1). What do they learn? Why?

- Repeat 1. and 2. for the
*battle of the sexes*(BoS) game — a two-player coordination game that pits diverging preferences against each other and thus may involve some conflict (see Wikipedia for details). As the standard version assigns somewhat sexist roles to males and females, Table 17.5 shows a payoff matrix of a BoS game as a dilemma to attend a performance playing either music composed by “Bach” or by “Stravinsky”:

Options | Bach | Stravinsky |
---|---|---|

Bach |
\((3, 2)\) |
\((0, 0)\) |

Stravinsky |
\((0, 0)\) | \((2, 3)\) |

### 17.4.3 Playing rock-paper-scissors (RPS)

Table 17.6 illustrates the payoff matrix of the *rock-paper-scissors* (RPS) game.

Options | rock | paper | scissors |
---|---|---|---|

rock |
\((0, 0)\) | \((-1, +1)\) | \((+1, -1)\) |

paper |
\((+1, -1)\) | \((0, 0)\) | \((-1, +1)\) |

scissors |
\((-1, +1)\) | \((+1, -1)\) | \((0, 0)\) |

Classify this type of game and justify your choices by describing the payoff matrix.

What is the most rational strategy when playing this game against an opponent? Why?

Implement the game for a population of 100 agents arranged in a 10 x 10 lattice/torus design, in which each agent plays its 8 surrounding agents on each of \(N_t = 100\) rounds (as in Section 17.2.1). What do the agents learn?

**Hint:** This exercise is based on a simulation by Wataru Toyokawa, which can be found here.
The original example uses a population of \(42^2 = 1764\) agents, which takes longer, but yields more stable results.

**Bonus**: Let’s try to reproduce an interesting result based on a Japanese children’s game that uses a biased payoff matrix (see Wataru’s remarks on “Gu-ri-co”).

Assume a RPS game in which each winning player receives the number of points that corresponds to the number of characters in the name of her or his chosen option. This results in a biased payoff matrix that yields 8 points for winning with scissors (as`nchar(scissors)`

is 8), 5 points for winning with paper, and 4 points for winning with rock (and ties and losses earning 1 and 0 points, respectively).**Bonus**: Assume that your opponent(s) selects her options randomly, but with biased preferences. Specifically, she plays rock on 50% of all trials, paper on 30% of all trials, and scissors on 20% of all trials. Show that and how your learning agent would beat this agent.

#### Solution

ad 1.: The RPS game is a purely competitive zero-sum game. Its three options are symmetrical in the sense as each option is winning against one and losing against one other option.

ad 2.: The most rational strategy in playing the RPS game is to be maximally unpredictable. This is achieved by selecting each of the three options with a random probability of \(p=1/3\).

ad 3.: We need to re-define the parameters of the game (i.e., options and payoffs), as well as change the auxiliary simulation code to accommodate three options (e.g., their densities).

Here is the resulting code:

Game definition: Change payoff matrices to accommodate three options (and avoid negative payoffs):

```
# Game:
<- 3 # NEW: Number of options
N_opt
# NEW payoff matrices (per player) of the RPS game:
<- matrix(c( 0, -1, +1,
payoff_p1 +1, 0, -1,
-1, +1, 0),
nrow = N_opt, byrow = TRUE) # Player 1 payoffs
<- -1 * payoff_p1 # Player 2 payoffs
payoff_p2
# HACK to avoid negative payoffs (NEW):
<- payoff_p1 + 1
payoff_p1 <- payoff_p2 + 1
payoff_p2
<- function(opt_p1, opt_p2){
get_payoffs
# Look up values in matrices:
<- payoff_p1[opt_p1, opt_p2]
pay_p1 <- payoff_p2[opt_p1, opt_p2]
pay_p2
<- c(pay_p1, pay_p2)
pay_offs
return(pay_offs)
}
## Check:
# get_payoffs(3, 1)
```

Simulation parameters (unchanged):

```
<- 10 # length/width of lattice
LAT <- LAT^2 # population size/number of players
N_p <- 100 # number of games/rounds/time steps
N_t
# The boundary condition of the lattice/torus:
<- function(i) { # i is x or y coordinate:
boundary_cond
if (i < 1) {
return (i + LAT)
else if (i > LAT) {
}
return (i - LAT)
else {
}
return (i)
} }
```

Utility functions (unchanged):

```
# exp_ceiling() limits the exponential of x
# to a maximum value of 2^1023:
<- function(x) {
exp_ceiling
<- exp(x) # compute exp once
exp_x
<- which(is.infinite(exp_x)) # ix of infinite values
ix_inf
if (length(ix_inf) == 0) {
return(exp_x) # return as is
else { # limit infinite values:
}
<- 8.988466e+307 # 2^1023
exp_x[ix_inf]
return(exp_x)
}
}
## Check:
# x <- c(10, 100, 1000) # Note: exp(710) => Inf
# exp(x)
# exp_ceiling(x)
# Simple vector calculation functions (to use in apply()):
= function (num, den) {
divide_vec
return(num/den)
}
= function (Q, beta) {
multiply_vec
return(Q * beta)
}
```

Learner parameters (unchanged):

```
# Learner:
<- 0.10 # exploration rate (epsilon-greedy model)
epsilon <- 0.50 # learning rate/step size parameter
alpha <- 0 # initial payoff expectation
Q_initial <- array(dim = c(N_opt, N_t, N_p))
Q 1, ] <- Q_initial Q[ ,
```

Data structures for storing intermediate results: Adding a `density_3`

vector to store the probability density of a 3rd option.

```
# Storing history:
<- matrix(nrow = N_p, ncol = N_t) # in wide format
choices <- matrix(nrow = N_p, ncol = N_t)
payoffs <- matrix(nrow = N_p, ncol = N_t)
performance
<- array(0, dim = c(N_opt, N_p))
choiceCounter <- matrix(nrow = N_p, ncol = N_t)
riskyChoiceProb <- array(dim = c(N_opt, N_t, N_p))
netChoiceProb 1, ] <- 1/N_opt
netChoiceProb[ ,
<- rep(NA, N_t)
density_1 <- rep(NA, N_t)
density_2 <- rep(NA, N_t) # NEW
density_3
# this is a list where the tactics distribution of each time step will be recorded
<- list() all_choices
```

Running the simulation as a `for`

loop (for each time step \(t\)): Adding a line to collect `density_3`

values on each time step.

```
for (t in 1:N_t) {
# Each individual chooses one option based on his/her choice probability netChoiceProb[ , t, ]:
= apply(netChoiceProb[ , t, ], MARGIN = 2,
choices[ , t] FUN = function(netChoiceProbInstance){
sample(1:N_opt, 1, prob = netChoiceProbInstance, replace = FALSE)})
$step[[t]] <- matrix(choices[ , t], nrow = LAT, ncol = LAT, byrow = TRUE)
all_choices
# Calculate the density of each option:
<- which(choices[ , t] == 1) %>% length() %>% divide_vec(., LAT^2)
density_1[t] <- which(choices[ , t] == 2) %>% length() %>% divide_vec(., LAT^2)
density_2[t] <- which(choices[ , t] == 3) %>% length() %>% divide_vec(., LAT^2) # NEW
density_3[t] # OR: # density_3[t] <- 1 - (density_1[t] + density_2[t])
# Calculate payoffs of each player:
# Assumption: Each player plays the game once against each of the 8 neighbors:
for (p in 1:N_p) {
# the position (address?) of the focal player in the lattice:
<- (p - 1) %/% LAT + 1 # quotient
i <- (p - 1) %% LAT + 1 # remainder
j <- 0 # initialize value
payoffs[p, t]
for (g in c(-1, 0, 1)) { # neighbor x
for (h in c(-1, 0, 1)) { # neighbor y
if (g^2 + h^2 > 0) { # is FALSE iff g == h == 0, which eliminates the focal player:
# Determine current payoffs (given player choices):
<- get_payoffs(opt_p1 = all_choices$step[[t]][i, j],
cur_payoffs opt_p2 = all_choices$step[[t]][boundary_cond(i+g), boundary_cond(j+h)])
<- cur_payoffs[1] # payoff of focal player
cur_payoff_p1
<- payoffs[p, t] + cur_payoff_p1 # accumulate payoffs of focal player
payoffs[p, t]
# if.
}
# for h.
} # for g.
}
# for p.
}
# cur_choices tracks which option was chosen by each individual on this trial:
<- choices[ , t] + N_opt * (1:N_p - 1)
cur_choices
# Updating Q values (only if another trial is following):
if (t < N_t) {
# Copying the previous Q value to the current Q value:
+ 1, ] <- Q[ , t, ]
Q[ , t
<- aperm(Q, c(1, 3, 2)) # transpose dimensions 2 and 3 of Q
QQ dim(QQ) <- c(N_opt * N_p, N_t)
# Q value updating (note: alpha is a step-size parameter which is 1/n; the sample averaging rule):
+ 1] <- QQ[cur_choices, t] + alpha * (payoffs[ , t] - QQ[cur_choices, t])
QQ[cur_choices, t
dim(QQ) <- c(N_opt, N_p, N_t)
<- aperm(QQ, c(1, 3, 2))
Q
###############
## Softmax choice (based only on Q values):
###############
<- (Q[ , t + 1, ] * rep(20, each = N_opt)) %>%
Q_exp apply(., MARGIN = 2, FUN = exp_ceiling) %>% round(., 5)
<- Q_exp %>%
softmaxMatrix apply(1, divide_vec, den = apply(Q_exp, MARGIN = 2, FUN = sum)) %>%
t() %>% round(., 5)
###############
## Softmax end.
###############
## Update choice probability matrix:
<- apply(X = softmaxMatrix, MARGIN = 1, FUN = multiply_vec, beta = (1 - epsilon)) %>%
netMatrix t() + apply(matrix(rep(1/N_opt, N_opt * N_p), N_opt:N_p), MARGIN = 1, FUN = multiply_vec, beta = epsilon) %>%
t() %>% round(., 4)
<- aperm(netChoiceProb, c(1, 3, 2))
netChoiceProbAperm dim(netChoiceProbAperm) <- c(N_opt * N_p, N_t)
dim(netMatrix) <- c(N_opt * N_p, 1)
+ 1] <- netMatrix
netChoiceProbAperm[ , t dim(netChoiceProbAperm) <- c(N_opt, N_p, N_t)
<- aperm(netChoiceProbAperm, c(1, 3, 2))
netChoiceProb
# if (t < N_t).
}
# for t. }
```

Results: Density dynamics of option choices over 100 trials: Adding some details for 3rd option:

```
# Prepare data frame:
<- data.frame(t = 1:N_t,
density_data density_1 = density_1,
density_2 = density_2,
density_3 = density_3 # NEW
)
<- density_data %>%
density_long pivot_longer(cols = c(-t), names_to = 'option', values_to = 'density')
ggplot(density_long) +
geom_hline(yintercept = 1/N_opt, linetype = 'dashed', color = "grey50")+
geom_line(aes(t, density, color = option), size = 1, alpha = 3/4)+
scale_color_manual(values = c('density_1' = usecol(Pinky),
'density_2' = usecol(Seeblau),
'density_3' = usecol(Seegruen)), # NEW
labels = c("Rock", "Paper", "Scissors")) + # NEW
ylim(c(0, 1)) +
labs(title = "Density dynamics of options over time",
x = "Time step t", y = "Option density", color = "Options:") +
theme_ds4psy()
```

**Interpretation:** There is no evidence for any dominant strategy. As soon as one strategy is chosen more frequently than by chance, its opposite strategy becomes more successful. Thus, the population oscillates around the random baseline at \(p=1/3\), which is the optimal strategy in this competitive game with three symmetrical options.

- ad 4.: Here is the biased payoff matrix and some modified parameter settings:

```
# BIASED payoff matrices (per player) of the RPS game:
<- matrix(c(1, 0, 4,
payoff_p1 5, 1, 0,
0, 8, 1),
nrow = N_opt, byrow = TRUE) # Player 1 payoffs
<- matrix(c( 1, 0, 8,
payoff_p2 1, 1, 0,
0, 2, 1),
nrow = N_opt, byrow = TRUE) # Player 2 payoffs
# NEW simulation/agent parameters:
<- 200 # number of games/rounds/time steps
N_t <- 0.01 # exploration rate (epsilon-greedy model)
epsilon
set.seed(468) # reproducible randomness
```

To obtain the following result (shown in Figure 17.6), we reduced the model’s exploration rate `epsilon`

to 0.01 (to obtain smoother curves) and increased the number of trials `n_T`

to 200:

**Interpretation:** It seems that the option with the highest payoff (i.e., “scissors”) is favored over the other options, but not by a wide margin. Interestingly, the second highest option (at the end of 200 trials) is “rock.” Winning with this option yields the least number of points, but it beats the popular “scissors” strategy.
Overall, even the biased payoffs do not allow the population to gravitate toward a single strategy, as any strategy can be beaten by another.

### 17.4.4 Sequential mate search

A fundamental and important social game in the biological realm is provided by the problem of finding a mate to procreate. Mate search simulations are well-suited to illustrate the dynamics of social situations. However, they also contain many simplifying assumptions that may not hold in nature and society. In this exercise, we first create a basic simulation to explore some dynamics, before critically evaluating its results and deficits.

Assume a simulation for the following situation:

Assumption 1:

*Population*:

Imagine a population of simple creatures: \(N_p = 200\) individuals, equally distributed into 2 types (i.e., \(N_p/2 = 100\) individuals per type), with procreation depending on a long-term hetero-sexual partnership.Assumption 2:

*Attractiveness*:

All relevant features of an individual \(i\) for the this task can be summarized in its overall attractiveness score \(A_{i}\). As all individuals of the opposite type share the same criteria for evaluating attractiveness, an individual’s attractiveness for individuals of the other type are fully described by the*rank*of its attractiveness score (i.e., a value of 1–100, assuming no ties). However, the individuals of the population do initially not know their own attractiveness score and do not know the distribution of attractiveness values in the population (i.e., they initially do not know how many individuals with values above 50 exist).Assumption 3:

*Dating*and*marrying*:

Two individuals of opposite types meet in a random fashion in a series of \(N_t = 100\) rounds of “dating.” In each round, the two individuals evaluate each other (based on some strategy that is only known to each individual) and decide whether they want to partner up with their match and make an offer if they do. Given a mutual choice for each other, the couple is matched (“married”) and removed from any further rounds of dating (i.e., married couples stay together).Assumption 4:

*Offspring*and*relationship value*:

In each round after their marriage, a married couple produces an individual offspring with a probability \(P_{off}=.10\). As the attractiveness value of this offspring is the average of its parents, the value of a relationship can be summarized by \(N_{off} \cdot \frac{A_{i} + A_{j}}{2}\), where \(A_{i}\) and \(A_{j}\) are the attractiveness values of the married partners and \(N_{off}\) is the total number of a couple’s offspring.

Your task is to devise a successful strategy for 1/3 (34%) of each type of the population in an environment in which two other strategies exist (and are randomly distributed over all individuals):

1/3 (33%) of the population (of each type) follows an experience-based aspiration level strategy that wants to partner up with any date that exceeds the average attractiveness of the first 10 dates.

1/3 (33%) of the population (of each type) first aims to estimate their own attractiveness by monitoring the offers received on the first 20 dating rounds. They then estimate their own attractiveness (as the average of the attractiveness values of the partners who made positive offers up to this point) and then set their aspiration level to a partner that is at least as attractive as themselves.

Justify why you believe that your strategy is more successful than the other two.

- Create the simulation in R and evaluate the performance of all three strategies (numerically and visually). Which measures (beyond each strategy’s overall value) is informative regarding the success of a strategy?

#### Solution

General simulation parameters:

```
<- 200 # population size
N_p <- 1:N_p/2 # attractiveness ranks of type 1
A_1 <- 1:N_p/2 # attractiveness ranks of type 2
A_2 <- 100 # number of time steps/trials/rounds N_t
```

Note that the assumed value of a relationship can be directly computed when a match is made (at \(t_{match}\)) and is a linear function of \(P_{off}\), the remaining time periods (\(N_{t} - t_{match}\)), and the attractiveness values of both partners (\(A_i + A_j\)):

```
<- function(t_match, p_off, A_1, A_2){
match_value
<- N_t - (t_match + 1) # time left
t_left
<- p_off * t_left # expected number of offspring in time left
N_off
<- (A_1 + A_2)/2 # mean attractiveness of partners
A_mean
* A_mean
N_off
}
```

- Discuss in which ways these assumptions are not valid for human societies (which ones?), how more realistic assumptions would change the corresponding simulation, and what this means for the results from corresponding models.

#### Solution

There are many simplifying assumptions that clearly — and fortunately — do not apply to a realistic population of human beings. For instance:

- Each individual is assumed to belong to one of two types and assumed to seek an exclusive and stable relationship with an individual of the opposite type. Thus, both the population setup and the dating regime disregard the possibility of other genders, non-permanent relationships, and any non hetero-sexual preferences. Similarly, individuals who do not seek a relationship or remain childless are not accounted for and even discounted as worthless by the assumed value function.

Thus, the presence of superficial analogies in structure, process, and terminology (e.g., “attractiveness,” “dating,” “marriage,” etc.) must not trick us into assuming that this model reflects something fundamental about human societies. Essentially, the results of any simulation are only as valid as its assumptions — and we should at least be skeptical when a model makes many unrealistic ones.

Note, however, that we also emphasized that all models are wrong, but may still be useful (in Section 14.1). Thus, reasons for skepticism do not automatically render all model results useless. An advocate of this particular model could claim that it may fail for human societies, but may still mimick important aspects of some animal population. But beyond the flaws that may be fatal for depicting a human society, there are more subtle deficits that may also be violated by most animal organisms. For instance:

All individuals (in the population and within both types) are assumed to share the same preferences.

The assumption that individuals’ strategies are independent of their attractiveness score (due to being distributed randomly) is unrealistic.

An individual’s attractiveness score is an abstraction that serves multiple functions. First, it matters crucially for finding a match, then plays no role in the probability of procreation, but determines the value of a match again.

The round value \(t\) paces the dating regime (i.e., meeting one condidate per round), but also the rate of procreation (and thus the overall value of a partnership). In most animal populations, both time scales would differ from each other and not all individuals may follow the same rate.

Allowing for non-permanent relationships and extra-marital affairs could completely change the dynamics of the situation.

Thus, the simulation may be suited to illustrate some dynamics of social situations, but provides no realistic model of mate search in most human or non-human societies. Interestingly, if we re-frame the two types in terms of supply and demand (i.e., as repeated interactions between sellers and buyers) many people may be less critical and willing to consider the same simulation as a decent economic model of market interactions…