Overall, models are indispensable tools for thinking and for making sense of data. They explicate theoretical assumptions, identify essential parts and processes, allow us to reason about and evaluate predictions, provide existence proofs for hypothesized mechanisms, inform actions and policies, and enable scientists and other stakeholders to understand and communicate results.
Despite these benefits, we end on a cautionary note:
The fact that models are indispensable does not imply that we should blindly trust their results, of course.
Like any method or tool, models can be used wisely or unwisely.
Any model-based result is only as good as the assumptions and data that went into its creation.
Even small deviations from seemingly arbitrary premises or slight perturbations in parameters can lead to vastly different results.
However, these crucial dependencies are often disguised by an impression of formal rigor and mathematical precision. A careful examination of models should always take into account their variability and uncertainty. If, however, their results are taken at face value, a blind reliance on models can be dangerous and border on religious idolatry. For instance, Pilkey & Pilkey-Jarvis (2007) argue that many of the quantitative models that politicians and administrators use to guide environmental policies are seriously flawed. When models are based on unrealistic or false assumptions, they can be used to support unwise or even harmful policies.
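The sensitivity to parameters noted above can be illustrated with a minimal sketch in R. It iterates the logistic map, a standard textbook example of a simple model whose output depends strongly on its growth parameter. The function name `logistic_map`, the starting value, and the two parameter values are illustrative assumptions and do not stem from any model discussed in this chapter.

```r
# Logistic map: x[i] = r * x[i - 1] * (1 - x[i - 1])
# (hypothetical helper function, for illustration only)
logistic_map <- function(r, x0 = 0.2, n = 50) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) {
    x[i] <- r * x[i - 1] * (1 - x[i - 1])
  }
  x
}

run_a <- logistic_map(r = 3.90)  # baseline parameter value
run_b <- logistic_map(r = 3.91)  # same model, parameter shifted by 0.01

# Absolute difference between both runs at each step:
gap <- abs(run_a - run_b)

gap[1:5]  # differences in the first steps
max(gap)  # largest difference over all 50 steps
```

Both runs start from the same value and differ only in the second decimal of `r`, yet the size of their differences changes markedly over the 50 steps. Plotting both runs (e.g., with `matplot(cbind(run_a, run_b), type = "l")`) makes this visible. Reporting only one of the two runs would hide this dependency entirely.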
Here is a brief summary of key terms and insights:
Models are (formal or informal) descriptions of phenomena. They are tools or methodological vehicles that help us grasp the world.
Creating and evaluating models is the activity of modeling.
Explanatory models allow us to design, reason about, and explore scenarios.
Thus, models can help to explicate assumptions, parameters, and processes that would otherwise remain implicit and vague.
Explanation is cheap. Successful models should also be good at predicting empirical phenomena. Predictions can be categorical or numerical.
Models serve multiple goals: They are never “true,” but rather aim to be useful. An important use of models lies in making ideas and processes communicable to and verifiable by others. Successful models can also guide actions and policies.
Some pointers to resources for inspiration and ideas on models, modeling, and simulations:
Books on modeling in R
There are many good books on models and modeling. As the tradition of data modeling is much older than that of R or current R packages, students should not be tempted to read only the most recent textbooks, but rather stick to established classics.
Modeling in R and the tidyverse is an area with a very long tradition (under the names of statistics and machine learning) and many active developments. Although we often tend to favor recent tools and resources, many of them are partial and unfinished. Hence, we recommend starting with the basics and acquiring some sound background knowledge that allows you to evaluate the most recent developments.
Solid recommendations include:
- Applied predictive modeling by M. Kuhn & Johnson (2013).
- An introduction to statistical learning (ISLR) by James et al. (2013).
- Modern Data Science with R (2nd edition), available online at https://mdsr-book.github.io/mdsr2e/
Valuable contributions with some rough or unfinished parts:
- Statistical Inference via Data Science (by Chester Ismay and Albert Y. Kim) provides a hands-on approach to modeling in R and the tidyverse.
- The R packages at tidymodels.org provide a collection of tools for modeling and machine learning in accordance with tidyverse principles. The emerging book Tidy Modeling with R (by Max Kuhn and Julia Silge) aims to show how to use them.
Blogs and online sites
The Learning Machines blog (by Holger K. von Jouanne-Diedrich) discusses engaging topics of data science and provides ample inspiration for a wide range of data science projects. The post on Learning Data Science: Modelling Basics is a good place to start (although it discusses quantitative models that we only consider in Chapter 19 on Prediction).
Disentangling data science provides additional inspiration.
The term data science is often used as an alternative name for statistical learning or machine learning.
More models by Leonardo da Vinci:
- Scientific drawings and inventions (on Wikipedia)
- Leonardo: Anatomist (by Nature Video, 2012, on YouTube)
More Kraftwerk soundtracks for computer modeling: