6.4 Terminological differences (2)

Well-established labels in the older literatures vs. new ML terminology
- Sample used to estimate the parameters vs. training sample
- Model is estimated vs. model is trained
- Regressors, covariates, predictors vs. features (or inputs)
- Dependent variable/outcome vs. output
- Regression parameters (coefficients) vs. weights
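
The mapping between the two vocabularies can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names `train` and `predict` are hypothetical, and the one-feature least-squares fit stands in for whatever model is actually used.

```python
from statistics import mean

# Training sample (ML) = sample used to estimate the parameters (statistics).
# Made-up toy data: x holds the feature (regressor), y the output (outcome).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed as y = 1 + 2x


def train(x, y):
    """'Train' (= estimate) a one-feature linear model by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    # Weights (ML) = regression parameters / coefficients (statistics).
    return intercept, slope


def predict(weights, x_new):
    """Predict the output for a new value of the feature."""
    intercept, slope = weights
    return intercept + slope * x_new


weights = train(x_train, y_train)
print("weights (coefficients):", weights)
print("predicted output at x = 6:", predict(weights, 6.0))
```

Whether one says the model was "estimated" or "trained", the computation is the same; only the labels differ.
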
Supervised vs. unsupervised machine learning (James et al. 2013, 1; Athey and Imbens 2019, 689)
- Good analogy: a child in kindergarten sorts toys (with or without a teacher’s input)
- Supervised statistical learning: involves building a statistical model for predicting, or estimating, an output based on one or more inputs
  - We observe both features $x_{i}$ and the outcome $y_{i}$
- Unsupervised statistical learning: There are inputs but no supervising output; we can still learn about relationships and structure from such data
  - We only observe the features $x_{i}$ and try to group the observations into clusters
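
The contrast between the two settings can be illustrated with a minimal sketch on made-up data. The nearest-neighbour rule and the two-cluster k-means loop below are chosen here only for brevity and are not taken from the cited texts; all names (`predict_nearest_neighbor`, `two_means`) are hypothetical.

```python
from statistics import mean

# Made-up toy data: one feature x_i per observation (e.g. the size of a toy).
x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
# Outcomes y_i are available only in the supervised setting ("teacher's input").
y = ["small", "small", "small", "large", "large", "large"]


def predict_nearest_neighbor(x_train, y_train, x_new):
    """Supervised: predict the outcome of the closest labelled observation."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return y_train[nearest]


def two_means(x, n_iter=10):
    """Unsupervised: group x into two clusters with a basic k-means loop."""
    centers = [min(x), max(x)]  # simple deterministic starting values
    assignment = [0] * len(x)
    for _ in range(n_iter):
        # Assign each observation to its nearest cluster center.
        assignment = [
            0 if abs(xi - centers[0]) <= abs(xi - centers[1]) else 1 for xi in x
        ]
        # Move each center to the mean of the observations assigned to it.
        for k in (0, 1):
            members = [xi for xi, a in zip(x, assignment) if a == k]
            if members:
                centers[k] = mean(members)
    return assignment, centers


# Supervised: uses both x and y.
print(predict_nearest_neighbor(x, y, 4.5))

# Unsupervised: uses x only; cluster labels 0/1 carry no meaning by themselves.
assignment, centers = two_means(x)
print(assignment, centers)
```

In the supervised case the labels in `y` guide the prediction; in the unsupervised case the algorithm finds the two groups without ever seeing `y`, and it is up to the analyst to interpret them.
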

References

Athey, Susan, and Guido W. Imbens. 2019. “Machine Learning Methods That Economists Should Know About.” Annual Review of Economics 11 (1): 685–725.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.