Chapter 14 Applications
This chapter marks a new part (and course): Applications.
In this part, we use what we have learned so far to solve interesting problems. In fact, as the previous chapters essentially provided an introduction to R and exploratory data analysis, the term data science only starts to make sense here.
Note that concepts like modeling and machine learning are popular, ubiquitous, and often used interchangeably. We will try to clarify and disentangle them by asking more specific questions like:
- What types of representations are being used?
- What task(s) are being solved?
- How is performance evaluated?
Topics in this part
The topics addressed in this part include:
- Essentials of R:
  - Objects and data structures
  - Programming
- Modeling:
  - Simulations (e.g., solving Bayesian puzzles, cognitive illusions)
  - Benchmarking (e.g., dynamic environments, bandits, RTA)
  - Prediction (e.g., binary, categorical, decision trees vs. LR, mLR, logR)
  - Learning (e.g., foraging, MAB, RL)
  - Social networks and games (e.g., rock-paper-scissors, tic-tac-toe, …)
- Visualization:
  - Defining and using colors (and color palettes)
  - Tailoring visualizations to tasks (e.g., Bayesian situations, risk perception)
  - R pour l’art (visualizing randomness, plotting text)
Mastering any of these topics and skills requires dedicated practice. The first parts contain regular exercises that deepen and practice the skills conveyed in each chapter. In addition, students will have ample time to work on a project of their choice (towards the end of the course).
14.0.1 Resources
Some pointers to resources for inspiration and ideas:
On models, modeling, and simulations
- All models are right, most are useless: Andrew Gelman’s StatModeling blog (2012-03-04)
Modeling in R
- Later chapters in the Modern Data Science with R book (Baumer et al., 2021).
- Chapters 22 to 25 of the r4ds book (Wickham & Grolemund, 2017).
- Applied predictive modeling book by Kuhn & Johnson (2013).
  Related resources include:
  - The R packages at tidymodels.org provide a collection of tools for modeling and machine learning in accordance with tidyverse principles. The emerging book Tidy Modeling with R (by Max Kuhn and Julia Silge) shows how to use them.
Visualization
- Hannah Yan Han’s Medium page and associated GitHub project
Machine Learning
The term data science is often used as a fancy name for statistical learning or machine learning:
- An introduction to statistical learning (ISLR) book by James et al. (2013).
  Related resources include:
  - StatLearning.com
  - R package ISLR
  - Data school videos on statistical learning
- 100 days of ML code: in Python, but great infographics
Other collections and sources of inspiration
- Learning Machines discusses engaging data science topics that can inspire larger modeling or simulation projects.