6.1 Classical statistics vs. machine learning

  • Two cultures of statistical analysis (Breiman 2001; Molina and Garip 2019, 29)
    • Data modeling vs. algorithmic modeling (Breiman 2001)
      • \(\approx\) generative modeling vs. predictive modeling (Donoho 2017)
  • Generative modeling (classical statistics, Objective: Inference)
    • Goal: understand how an outcome is related to inputs
    • Analyst proposes a stochastic model that could have generated the data, and estimates the parameters of the model from the data
    • Leads to simple and interpretable models BUT often ignores model uncertainty and out-of-sample performance
  • Predictive modeling (Objective: Prediction)
    • Goal: prediction, i.e., forecast the outcome for future inputs
    • Analyst treats the underlying generative model for the data as unknown and considers the predictive accuracy of alternative models on new data
    • Leads to complex models that perform well out of sample BUT can produce black-box results that offer little insight on the mechanism linking the inputs to the output
  • See also James et al. (2013, Ch. 2.1.1)
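
The contrast between the two objectives can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the cited sources: the data-generating process (`y = 1 + 2x + noise`), the function names, and the choice of k-nearest neighbors as the "algorithmic" competitor are all assumptions made for this example. Under the generative view the analyst reads off the estimated parameters; under the predictive view the analyst compares candidate models by their error on held-out data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Hypothetical data-generating process: y = 1 + 2x + Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(train_x, train_y, x, k=5):
    """Predict with the mean outcome of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(preds, ys):
    """Mean squared error between predictions and observed outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


train_x, train_y = simulate(200)
test_x, test_y = simulate(200)  # "new data" never used for fitting

# Generative view (inference): estimate and interpret the parameters.
a_hat, b_hat = fit_ols(train_x, train_y)
print(f"estimated intercept={a_hat:.2f}, slope={b_hat:.2f}")

# Predictive view: judge alternative models by out-of-sample error.
ols_mse = mse([a_hat + b_hat * x for x in test_x], test_y)
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)
print(f"test MSE: linear={ols_mse:.2f}, 5-NN={knn_mse:.2f}")
```

The slope estimate has a direct interpretation (expected change in the outcome per unit of input), whereas the k-NN predictor offers no comparable summary and can only be assessed by its held-out error.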

References

Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Stat. Sci. 16 (3): 199–231.

Donoho, David. 2017. “50 Years of Data Science.” J. Comput. Graph. Stat. 26 (4): 745–66.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.

Molina, Mario, and Filiz Garip. 2019. “Machine Learning for Sociology.” Annu. Rev. Sociol. 45: 27–45.