16.1 Introduction

Imagine a simple-minded organism that faces an unknown environment. The organism has a single need (e.g., find food) and a limited repertoire for action (e.g., move, eat, rest). To fulfill its solitary need, it navigates a barren and rugged surface that occasionally yields a meal, and may face a series of challenges (e.g., obstacles, weather conditions, and various potential dangers). Which cognitive and perceptual-motor capacities does this organism require to reach its goal and sustain itself? And which other aspects of the environment does its success depend on?

An ant foraging for food in an environment. (Image from this blog post.)

Figure 16.1: An ant foraging for food in an environment. (Image from this blog post.)

This example is modeled on a basic foraging scenario discussed by Simon (1956), but is reflected in many similar models that study how some organism (e.g., a simple robot) explores and exploits an initially unknown environment (e.g., a grid-world offering both rewards and risks).

Although this may be a toy scenario, it allows distilling some key elements that are required for addressing basic questions of rationality. For instance, the scenario is based on a conceptual distinction between an organism and its environment. As the organism can be simple or complex — think of an ant, a human being, or an entire organization — we will refer to it as an agent. Given some goal, the agent faces the problem of “behaving approximately rationally, or adaptively, in a particular environment” (Simon, 1956, p. 130).

The conceptual distinction between agent and its environment is helpful, but not as simple and straightforward as it seems. For instance, perceptions of environmental states, memories of past events, and cues or traces left by earlier explorations may all be important sources of information — but are these information sources part of the agent or of its environment? Researchers from artificial intelligence and the fields of embodied and distributed cognition have argued that the boundaries are blurred. Additionally, the status of the category labels “agent” vs. “environment” is a matter of perspective: From the perspective of any individual, all other agents are part of the environment.

Note two key challenges on elementary dimensions: Things can be unknown or uncertain. When things are unknown (or risky), we can discover them by exploring or familiarizing us with them. Doing so assumes some flexibility on part of the agent (typically described as acquiring knowledge or learning), but can lead to reliable estimates of the unknown parameters. By contrast, when things are uncertain (e.g., when options or rewards can change in unpredictable ways), even intimate familiarity does not provide reliable estimates or certainty for any particular moment. Thus, both ignorance and uncertainty are problematic, but uncertainty is the more fundamental problem.

Importantly, both problems are inevitable when things are dynamic (i.e., changing). The fact that changes tend to create both ignorance and uncertainty makes it essential that our models can accommodate changes. However, changes are not just a source of problems, but also part of the solution: When aspects of our environment change, processes of adaptation and learning allow us to adjust again. (Note that this may sound clever and profound, but is really quite simple: When things change, we need to change as well — and the reverse is also true.)

Simon addresses the distinction between optimizing and satisficing. Organisms typically adapt well-enough to meet their goals (i.e., satisfice), but often fail to optimize.

Clarify key terminology: What changes when things are dynamic?

  • agents (goals, knowledge and beliefs, policies/strategies, capacities for action)

  • environments (tasks, rewards, temporal and spatial structures)

  • interactions between agents and environments

16.1.1 What is dynamic?

The term dynamics typically implies movement or changes.

By contrast, our simulations so far were static in two ways:

  • the environment was fixed (even when being probabilistic): One-shot games, without dependencies between subsequent states.

  • the range of actions was fixed a priori and behavioral policy of agents did not change)

Three distinct complicating aspects:

  1. Environmental dependencies: Later states in the environment depend on earlier ones.

  2. Learning: Behavior of agents (i.e., their repertoire or selection of actions) changes as a function of experience or time.

  3. Interaction with agents: Environment or outcomes change as a function of (inter-)actions with agents.

Different types of dependencies (non-exhaustive):

  • temporal dependencies: Later states depend on previous ones

  • interactions: Interdependence between environment and an agent’s actions.

  • types of learning: agents can change their behavioral repertoire (actions available) or policies (selection of actions) over time

If the environment changes as a function of its prior states and/or the actions of some agent(s), it cannot be simulated a priori and independently of them (i.e., without considering actions).

Typical programming elements:

  • Trial-by-trial structure of simulations (i.e., using loops or tables that are filled incrementally). As agents or environments change over time, we need to explicitly represent time and typically proceed in a step-wise (i.e., iterative) fashion.
    For instance, an agent may interact with an environment over many decision cycles, observes an outcome, and adjusts its expectations or actions accordingly. Time is typically represented as individual steps \(t\) (aka. periods, rounds, or trials) that range from some starting value (e.g., 0 or 1) to some large number (e.g., \(n_T = 100\)). Often, we will even need inner loops within outer loops (e.g., for running several repetitions of a simulation, to assess the robustness of results).

  • More abstract representations: For instance, rather than explicitly representing each option in an environment, we could simply use a vector of integers to represent a series of chosen options (i.e., leave the state of the environment implicit). Similarly, an agent’s beliefs regarding the reward quality of options may be represented as a vector of their choice probabilities.


  • Environments with some form of “memory” (e.g., planting corn or wheat, changes in crop cycle)

  • Interactions: Environments that change by agent’s actions:

    • TRACS (environmental options change as function of agent’s actions) vs.
    • Tardast (dynamic changes over time and agent actions)

16.1.2 Topics and models addressed

In this chapter, we provide a glimpse on some important families of models, known as
learning agents and multi-armed bandits (MABs).

Typical topics addressed by these paradigms include: Learning by adjusting expectations to observed rewards and trade-off between exploration vs. exploitation.

Typical questions asked in these contexts include: How to learn to distinguish better from worse options? How to optimally allocate scarce resources (e.g., limitations of attention, experience, time)?

We can distinguish several levels of (potential) complexity:

  1. Dynamic agents: Constant environment, but an agent’s beliefs or repertoire for action changes based on experience (i.e., learning agent).

  2. Dynamic environments: Assuming some agent, the internal dynamics in the environment may change. MAB tasks with responsive or restless bandits (e.g., adding or removing options, changing rewards for options).

  3. Benchmarking the performance of different strategies: Evaluating the interactions between an agent (with different strategies) and environments: Agent and/or environment depend on strategies and previous actions.

A further level of complexity will be addressed in the next chapter on Social situations (see Chapter 17): 4. Tasks involving multiple agents: Additional interactions between agents (e.g., opportunities for observing and influencing others, using social information, social learning, including topics of communication, competition, and cooperation, etc.).

16.1.3 Models of agents, environments, and interactions

The key motivation for dynamic simulations is that things that are alive constantly keep changing. Thus, our models need to allow for and capture changes on a variety of levels. The term “things” is used in its widest possible sense (i.e., entities, including objects and subjects), including environments, agents, representations of them, and their interactions.

General issues addressed here (separating different types of representations and tasks):

  • Modeling an agent (e.g., internal states and overt actions)

  • Modeling the environment (e.g., options, rewards, and their probabilities)

A mundane, but essential aspect: Adding data structures for keeping track of changes (in both agents or environments).

Note: When the term “dynamic” makes you think of video games and self-driving cars, the environments and problems considered here may seem slow and disappointing. However, we will quickly see that even relatively simple tasks can present plenty of conceptual and computational challenges.

Key goals and topics of this chapter:

  • Implementing a basic model of learning (e.g., RL, Section 16.2)

  • Implementing dynamic environments (e.g., with uncertain rewards, MAB, Section 16.3)

  • Evaluating dynamic models (e.g., by benchmarking strategy performance, RTA, Section 16.4)


Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138. https://doi.org/10.1037/h0042769