19.5 Conclusion

19.5.1 Summary

This chapter can be summarized by asking three key questions:

  1. What is being predicted? What type of predictive task is being performed?

  2. How is predictive success evaluated?

  3. What is the baseline performance and the performance of alternative benchmarks?

Actually, asking these three questions is useful for any kind of agent modeling, or for measuring performance in general. (See Neth et al., 2016, for applications in assessments of human rationality.)
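The three questions can be made concrete in a minimal sketch (with hypothetical data and labels): we state what is being predicted, pick an accuracy measure, and compare against a simple majority-class baseline (in the spirit of Holte, 1993).

```python
# A minimal sketch (hypothetical data): evaluating predictions
# against a majority-class baseline.

# Q1: What is being predicted? A binary label (e.g., "sick" vs. "healthy").
y_true = ["sick", "healthy", "healthy", "sick", "healthy", "healthy", "healthy", "sick"]
y_pred = ["sick", "healthy", "sick", "sick", "healthy", "healthy", "healthy", "healthy"]

# Q2: How is predictive success evaluated? Here: simple accuracy.
def accuracy(truth, pred):
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

# Q3: What is the baseline? Always predicting the most frequent class.
majority = max(set(y_true), key=y_true.count)
baseline = [majority] * len(y_true)

print(f"Model accuracy:    {accuracy(y_true, y_pred):.2f}")   # 0.75
print(f"Baseline accuracy: {accuracy(y_true, baseline):.2f}") # 0.62
```

A model that cannot beat such a trivial baseline has not demonstrated any predictive success, however impressive its absolute accuracy may sound.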

19.5.2 Beware of biases

A final caveat:

A story: When inspecting WW-II bomber planes that returned from their missions, assume that

  • 90% of planes show bullet hits on wings;
  • 10% of planes show bullet hits on tanks.

Where should we add more reinforcements?

The intuitive answer is the wings, but the data only includes planes that returned: planes hit in the tanks rarely made it back. Hence, it is the tanks that need reinforcing. This is an instance of survivorship bias.

More generally, we may be susceptible to biases due to the availability of data. Thus, when training and evaluating algorithms, we must always ask ourselves where the data is coming from.
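The distortion can be illustrated by a small simulation (all numbers are hypothetical assumptions): hits are equally likely on wings and tanks, but tank hits are usually fatal, so the planes we get to inspect show mostly wing hits.

```python
# A minimal sketch (hypothetical numbers): how conditioning on survival
# distorts the observed distribution of bullet hits.
import random

random.seed(1)

n_missions = 10_000
returned_wing_hits = 0
returned_tank_hits = 0

for _ in range(n_missions):
    # Assume hits are equally likely on wings and tanks...
    hit_location = random.choice(["wing", "tank"])
    # ...but assume tank hits are far more often fatal.
    survives = random.random() < (0.9 if hit_location == "wing" else 0.1)
    if survives:
        if hit_location == "wing":
            returned_wing_hits += 1
        else:
            returned_tank_hits += 1

total = returned_wing_hits + returned_tank_hits
print(f"Wing hits among returning planes: {returned_wing_hits / total:.0%}")
print(f"Tank hits among returning planes: {returned_tank_hits / total:.0%}")
# Although hits were 50/50 overall, the inspected (i.e., returning) planes
# show mostly wing hits: the observed data is conditioned on survival.
```

The same mechanism applies whenever a dataset was collected by a process that filters out some cases before we ever see them.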

19.5.3 Resources

Pointers to sources of inspiration and ideas:

Books, chapters, and packages

Articles and blogs


Baumer, B. S., Kaplan, D. T., & Horton, N. J. (2021). Modern Data Science with R (2nd ed.). Chapman & Hall/CRC. https://mdsr-book.github.io/mdsr2e/
Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1), 63–90. https://doi.org/10.1023/A:1022631118932
Neth, H., Sims, C. R., & Gray, W. D. (2016). Rational task analysis: A methodology to benchmark bounded rationality. Minds and Machines, 26(1-2), 125–148. https://doi.org/10.1007/s11023-015-9368-8
Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368. http://journal.sjdm.org/17/17217/jdm17217.html