\(\text{Unit i} \quad\) | \(Name \quad\) | \(X1_{i}^{Age} \quad\) | \(X2_{i}^{Educ.} \quad\) | \(D_{i}^{Unempl.} \quad\) | \(Y_{i}^{Lifesat.} \quad\) |
---|---|---|---|---|---|
1 | Sofia | 29 | 1 | \({\color{red}{1}}\) | 3 |
2 | Sara | 30 | 2 | \({\color{red}{1}}\) | 2 |
3 | José | 28 | 0 | \({\color{blue}{0}}\) | 5 |
4 | Yiwei | 27 | 2 | \({\color{red}{1}}\) | ? |
5 | Julia | 25 | 0 | \({\color{blue}{0}}\) | 6 |
6 | Hans | 23 | 0 | \({\color{red}{1}}\) | ? |
.. | .. | .. | .. | .. | .. |
1000 | Hugo | 23 | 1 | \({\color{blue}{0}}\) | 8 |
Descriptive/causal inference vs. prediction
Learning outcomes/objective:
- Understand difference between descriptive/causal inference and prediction from a data perspective
- Clarification of different terminology: Inference; Prediction; Forecasting; Imputation; etc.
1 Inference (1): Descriptive inference
- Goal of descriptive inference: Estimate a parameter in a population
- e.g., Research question: What is the average of life satisfaction/unemployment among Mannheim University students?
- Q: Easy to find out? What could be the problem?
- Table 1 displays our sample
- Assuming it were the population, we could add a vector \(R_{i}\) that indicates whether each unit in the population has been sampled (cf. Abadie et al. 2020)
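The role of the sampling indicator \(R_{i}\) can be illustrated with a small simulation. This is a minimal sketch in Python with NumPy: the population values, the population size `N`, the sample size `n`, and the simple random sampling design are all assumptions made for illustration, not data from Table 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: N students with a life-satisfaction score from 0 to 10.
N = 1000
y_population = rng.integers(0, 11, size=N)

# R_i = 1 if unit i was sampled, 0 otherwise (simple random sample of n units).
n = 100
R = np.zeros(N, dtype=int)
R[rng.choice(N, size=n, replace=False)] = 1

# Target parameter (usually unknown) and its estimate from the sampled units.
population_mean = y_population.mean()
sample_mean = y_population[R == 1].mean()

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Descriptive inference is about the gap between these two numbers: only the units with \(R_{i} = 1\) are observed, and the sample mean is a good estimate only if the sampling mechanism is unrelated to the outcome.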
2 Inference (2): Causal inference
- Goal of causal inference: Identify whether particular cause/treatment \(D\) has a causal effect on \(Y\) in a population
- e.g., Research question: What is the causal effect of unemployment \(D\) on life satisfaction \(Y\) among Mannheim students?
\(\text{Unit i} \quad\) | \(Name \quad\) | \(X1_{i}^{Age} \quad\) | \(X2_{i}^{Educ.} \quad\) | \(D_{i}^{Unempl.} \quad\) | \(Y_{i}^{Lifesat.} \quad\) | \(Y_{i}({\color{blue}{0}})\quad\) | \(Y_{i}({\color{red}{1}})\quad\) |
---|---|---|---|---|---|---|---|
1 | Sofia | 29 | 1 | \({\color{red}{1}}\) | 3 | ? | 3 |
2 | Sara | 30 | 2 | \({\color{red}{1}}\) | 2 | ? | 2 |
3 | José | 28 | 0 | \({\color{blue}{0}}\) | 5 | 5 | ? |
4 | Yiwei | 27 | 2 | \({\color{red}{1}}\) | ? | ? | ? |
5 | Julia | 25 | 0 | \({\color{blue}{0}}\) | 6 | 6 | ? |
6 | Hans | 23 | 0 | \({\color{red}{1}}\) | ? | ? | ? |
.. | .. | .. | .. | .. | .. | .. | .. |
1000 | Hugo | 23 | 1 | \({\color{blue}{0}}\) | 8 | 8 | ? |
3 Inference (3): Causal inference
Causal inference: Everyday notion of causality \(\rightarrow\) formalized through the potential outcomes framework (Rubin 1974, ~2012)
- \(\delta_{i} =\) \(Y_{i}({\color{red}{1}}) - Y_{i}({\color{blue}{0}})\), e.g., \(\delta_{Sofia}\) \(= \text{Life satisfaction}_{Sofia}({\color{red}{Unemployed}}) - \text{Life satisfaction}_{Sofia}({\color{blue}{Employed}})\)
- Fundamental problem of causal inference (FPCI; Holland 1986): For each unit we observe either \(Y_{i}({\color{red}{1}})\) or \(Y_{i}({\color{blue}{0}})\), never both … a missing data problem!
- Usual focus on average treatment effect: \(ATE = E[Y_{i}(1) - Y_{i}(0)]\) (or ATT)
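The link between potential and observed outcomes, and the reason a simple comparison of group means need not equal a causal effect, can be written out explicitly. What follows is the standard textbook decomposition, given here as a sketch. It assumes that each unit's observed outcome equals the potential outcome under the treatment actually received (consistency), which the bullets above use implicitly.

\[
Y_{i} = D_{i}\,Y_{i}(1) + (1 - D_{i})\,Y_{i}(0)
\]

\[
E[Y_{i} \mid D_{i}=1] - E[Y_{i} \mid D_{i}=0] = \underbrace{E[Y_{i}(1) - Y_{i}(0) \mid D_{i}=1]}_{ATT} + \underbrace{E[Y_{i}(0) \mid D_{i}=1] - E[Y_{i}(0) \mid D_{i}=0]}_{\text{selection bias}}
\]

The second term vanishes if, for example, unemployment were randomly assigned, so that \(D_{i}\) is independent of \(Y_{i}(0)\). Without such an identification assumption the observed difference in means mixes the average treatment effect on the treated (ATT) with pre-existing differences between the groups.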
Designs, methods & models (with examples from my own research)
- experiments (Bauer et al. 2019, Bauer & Clemm 2021, Bauer et al. 2021, Bauer & Poama 2020), matching, instrumental variables (Bauer & Fatke 2014), regression discontinuity design, difference-in-differences, fixed-effects model (Bauer 2015, 2019), etc. (e.g., Gangl 2010 for overview)
Potential outcomes & identification revolution (Imai 2011):
- Statistical inference: Models + statistical assumptions \(\rightarrow\) Causal inference: Models + statistical assumptions + identification assumptions
4 Inference (4): Missing data perspective
\(\text{Unit i} \quad\) | \(Name \quad\) | \(X1_{i}^{Age} \quad\) | \(X2_{i}^{Educ.} \quad\) | \(D_{i}^{Unempl.} \quad\) | \(Y_{i}^{Lifesat.} \quad\) | \(Y_{i}({\color{blue}{0}})\quad\) | \(Y_{i}({\color{red}{1}})\quad\) |
---|---|---|---|---|---|---|---|
1 | Sofia | 29 | 1 | \({\color{red}{1}}\) | 3 | ? | 3 |
2 | Sara | 30 | 2 | \({\color{red}{1}}\) | 2 | ? | 2 |
3 | José | 28 | 0 | \({\color{blue}{0}}\) | 5 | 5 | ? |
4 | Yiwei | 27 | 2 | \({\color{red}{1}}\) | ? | ? | ? |
5 | Julia | 25 | 0 | \({\color{blue}{0}}\) | 6 | 6 | ? |
6 | Hans | 23 | 0 | \({\color{red}{1}}\) | ? | ? | ? |
.. | .. | .. | .. | .. | .. | .. | .. |
1000 | Hugo | 23 | 1 | \({\color{blue}{0}}\) | 8 | 8 | ? |
- Data perspective: Both causal inference and machine learning are about missing data!
- Causal inference perspective
- Replace (predict) Sofia’s (and others’) missing potential outcome(s) on variable \(\text{Life satisfaction}\) with other people’s observed outcomes!
- Prediction/ML perspective
- Train model to predict missing observations on variable \(\text{Life satisfaction}\) (see “?”s)
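Both perspectives can be sketched on the rows shown in the table above. This is a minimal illustration in Python with NumPy, not a recommended analysis: it uses only the seven named rows (the elided ".." rows are left out), a plain least-squares fit stands in for "a model", and group-mean imputation stands in for the causal replacement step. Both choices are assumptions made for illustration.

```python
import numpy as np

# Columns: age, education, unemployed (D), life satisfaction (Y); NaN marks a "?".
data = np.array([
    [29, 1, 1, 3],       # Sofia
    [30, 2, 1, 2],       # Sara
    [28, 0, 0, 5],       # José
    [27, 2, 1, np.nan],  # Yiwei
    [25, 0, 0, 6],       # Julia
    [23, 0, 1, np.nan],  # Hans
    [23, 1, 0, 8],       # Hugo
])
X, y = data[:, :3], data[:, 3]
D = X[:, 2]
observed = ~np.isnan(y)

# Prediction/ML perspective: fit on rows with observed Y, predict the missing Y.
X_design = np.column_stack([np.ones(len(X)), X])  # add an intercept column
beta, *_ = np.linalg.lstsq(X_design[observed], y[observed], rcond=None)
y_filled = y.copy()
y_filled[~observed] = X_design[~observed] @ beta

# Causal perspective: fill each missing potential outcome with the mean
# observed outcome of the units that actually had that treatment status.
y1_hat = np.where((D == 1) & observed, y, np.nanmean(y[D == 1]))
y0_hat = np.where((D == 0) & observed, y, np.nanmean(y[D == 0]))
ate_hat = (y1_hat - y0_hat).mean()

print("predicted Y for Yiwei and Hans:", y_filled[~observed])
print(f"imputation-based ATE estimate: {ate_hat:.2f}")
```

In the first case the "?"s in the observed outcome column are filled in; in the second, the "?"s in the \(Y_{i}(0)\) and \(Y_{i}(1)\) columns are. With group-mean imputation the resulting estimate coincides with the simple difference in group means, so it has a causal interpretation only under identification assumptions.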
8 Timeline of statistical learning (James et al. 2013, 6–7)
- Beginning of the 19th century: Legendre and Gauss - method of least squares
- Earliest form of linear regression (applied in astronomy; quantitative output values)
- 1936: Fisher - Linear Discriminant Analysis
- 1940s: various authors - Logistic Regression
- Early 1970s: Nelder and Wedderburn - Generalized Linear Models (GLM), of which linear and logistic regression are special cases
- By end of the 1970s: Many more techniques available but almost exclusively linear methods
- Fitting non-linear relationships was computationally infeasible at the time
- By the 1980s: Better computing technology facilitated non-linear methods
- Mid 1980s: Breiman, Friedman, Olshen and Stone - Classification and Regression Trees
- practical implementation including cross-validation for model selection
- 1986: Hastie/Tibshirani coin term “generalized additive models” for a class of non-linear extensions to generalized linear models (+ practical software implementation)
- Since then statistical learning has emerged as a new subfield!
References
Abadie, Alberto, Susan Athey, Guido W Imbens, and Jeffrey M Wooldridge. 2020. “Sampling-Based Versus Design-Based Uncertainty in Regression Analysis.” Econometrica 88 (1): 265–96.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.
Salganik, Matthew J, Ian Lundberg, Alexander T Kindel, Caitlin E Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M Altschul, et al. 2020. “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.” Proc. Natl. Acad. Sci. U. S. A. 117 (15): 8398–8403.