7.2 Classical statistics vs. machine learning
- Two cultures of statistical analysis (Breiman 2001; Molina and Garip 2019, 29)
- Data modeling vs. algorithmic modeling (Breiman 2001)
- \(\approx\) generative modeling vs. predictive modeling (Donoho 2017)
- Generative modeling (classical statistics, Objective: Inference)
- Goal: understand how an outcome is related to inputs
- Analyst proposes a stochastic model that could have generated the data, and estimates the parameters of the model from the data
- Leads to simple and interpretable models BUT often ignores model uncertainty and out-of-sample performance
- Predictive modeling (Objective: Prediction)
- Goal: prediction, i.e., forecast the outcome for unseen (out-of-sample) or future observations
- Analyst treats the underlying generative model for the data as unknown and considers the predictive accuracy of alternative models on new data.
- Leads to complex models that perform well out of sample BUT can produce black-box results that offer little insight into the mechanism linking the inputs to the output (although interpretable ML methods try to address this)
- See also James et al. (2013, Ch. 2.1.1)
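The contrast between the two objectives can be made concrete with a small sketch. Everything here is an assumption for illustration: the data are simulated from a known linear model, the polynomial degrees are arbitrary candidates, and the names `holdout_mse`, `mse_by_degree` and `best_degree` are hypothetical. The first part proposes a model and interprets its estimated parameters (inference); the second part ignores interpretation and ranks candidate models only by their error on held-out observations (prediction).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumption for illustration): y = 1 + 2*x + noise
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=n)

# --- Generative / data modeling: propose y = b0 + b1*x + e, estimate, interpret ---
X = np.column_stack([np.ones(n), x])              # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS estimates of (b0, b1)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])         # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # classical standard errors

# --- Predictive modeling: treat the true model as unknown, compare held-out error ---
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]                # simple train/test split

def holdout_mse(degree):
    """Fit a polynomial of the given degree on the training set and
    return its mean squared error on the test set."""
    coefs = np.polyfit(x[train], y[train], deg=degree)
    pred = np.polyval(coefs, x[test])
    return float(np.mean((y[test] - pred) ** 2))

mse_by_degree = {d: holdout_mse(d) for d in (1, 3, 9)}
best_degree = min(mse_by_degree, key=mse_by_degree.get)

print("slope estimate:", beta[1], "approx. 95% half-width:", 1.96 * se[1])
print("held-out MSE by degree:", mse_by_degree, "best:", best_degree)
```

In the first part the quantity of interest is the slope and its uncertainty; in the second it is only the held-out error, whichever model happens to achieve it.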
References
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Stat. Sci. 16 (3): 199–231.
Donoho, David. 2017. “50 Years of Data Science.” J. Comput. Graph. Stat. 26 (4): 745–66.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.
Molina, Mario, and Filiz Garip. 2019. “Machine Learning for Sociology.” Annu. Rev. Sociol., July.