17.5 Conclusion

17.5.1 Summary

This chapter can be summarized by three key questions:

  1. What is being predicted? What type of predictive task is being performed?

  2. How is predictive success evaluated?

  3. What is the baseline performance and the performance of alternative benchmarks?

Asking these three questions is useful for any kind of agent modeling, or for measuring performance more generally. (See Neth, Sims, & Gray, 2016, for applications in assessments of human rationality.)
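The third question (comparing against a baseline) can be sketched in code. The following is a minimal sketch, assuming hypothetical binary outcomes and predictions, that compares a model's accuracy against a trivial baseline that always predicts the most frequent class:

```python
from collections import Counter

def accuracy(truth, pred):
    """Proportion of cases predicted correctly."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def baseline_accuracy(truth):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(truth).most_common(1)[0][1]
    return most_common_count / len(truth)

# Hypothetical binary outcomes and model predictions:
truth = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
pred  = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]

print(accuracy(truth, pred))      # model accuracy: 0.8
print(baseline_accuracy(truth))   # majority-class baseline: 0.7
```

A model that fails to beat such a trivial baseline has not demonstrated predictive success, regardless of its nominal accuracy.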

17.5.2 Beware of biases

A final caveat can be illustrated by a story about inspecting WW-II bomber planes after their missions. Assume that

  • 90% of planes show bullet hits on wings;
  • 10% of planes show bullet hits on tanks.

Where should we add more reinforcements? Intuitively, the wings seem most in need of protection. But the planes we inspect are only those that returned: planes hit in their tanks rarely made it back. Hence, it is the tanks that should be reinforced.

This is an instance of survivorship bias: drawing conclusions from a sample that excludes the cases that failed.
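The bias can also be shown by a small simulation. The following sketch assumes hypothetical numbers: hits land on wings and tanks equally often, but tank hits are usually fatal. Among the surviving planes, wing hits then appear to dominate:

```python
import random

random.seed(17)  # for reproducibility

# Hypothetical assumptions: each plane takes one hit, on the wings or on the
# tank with equal probability, but the two locations differ in survivability.
N = 10_000
P_HIT_TANK = 0.5                         # true chance a hit lands on the tank
P_SURVIVE = {"wing": 0.9, "tank": 0.1}   # survival probability by hit location

survivors = []
for _ in range(N):
    location = "tank" if random.random() < P_HIT_TANK else "wing"
    if random.random() < P_SURVIVE[location]:
        survivors.append(location)

wing_share = survivors.count("wing") / len(survivors)
print(f"Wing hits among surviving planes: {wing_share:.0%}")  # roughly 90%
```

Although both locations are hit equally often, about 90% of the planes available for inspection show wing hits, simply because tank hits rarely return to be counted.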

More generally, we may be susceptible to biases due to the selective availability of data. Thus, when training and evaluating algorithms, we must always ask ourselves where our data comes from.

17.5.3 Resources

Pointers to sources of inspiration and ideas:

Books, chapters, and packages

Articles and blogs