15.12 The universal workflow of machine learning
- Defining the problem and assembling a dataset (Chollet and Allaire 2018, Ch. 4.5)
    - Define the problem at hand and the data on which we'll train. Collect this data and, for supervised learning, annotate it with labels
- Choosing a measure of success
    - Choose how we'll measure success on our problem: which metrics will we monitor on our validation data? (see the metric sketch after this list)
- Deciding on an evaluation protocol
    - Determine our evaluation protocol: hold-out validation? K-fold validation? Which portion of the data should we use for validation? (both protocols are sketched after this list)
- Preparing our data (see the preprocessing sketch after this list)
- Developing a model that does better than a baseline, i.e. achieving statistical power: the baseline is a trivial model such as always predicting the most frequent class or guessing at random (see the baseline sketch after this list)
- Scaling up: developing a model that overfits
- Regularizing our model and tuning our hyperparameters based on performance on the validation data (this step and the previous one are sketched in the Keras example after this list)
    - A lot of machine-learning research tends to focus only on this step, but keep the big picture in mind
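The following toy example (all data simulated, not taken from the book) illustrates why the choice of success measure matters: on an imbalanced problem, plain accuracy can make a useless model look good, which is why the metric is fixed before any modelling.

```r
# Hypothetical imbalanced classification problem: 5% positives.
set.seed(1)
y_true <- rbinom(1000, 1, 0.05)
y_pred <- rep(0, 1000)   # a "model" that always predicts the negative class

accuracy <- mean(y_pred == y_true)                              # ~0.95, looks great
recall   <- sum(y_pred == 1 & y_true == 1) / sum(y_true == 1)   # 0, catches nothing
c(accuracy = accuracy, recall = recall)
```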
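A minimal base-R sketch of the two evaluation protocols mentioned above, assuming a generic data frame `df` (simulated here as a placeholder); the 80/20 split and k = 5 are arbitrary choices, not recommendations from the book.

```r
set.seed(123)
df <- data.frame(x = rnorm(100), y = rnorm(100))   # placeholder data
n  <- nrow(df)

# Hold-out validation: set aside ~20% of the rows for validation.
val_idx   <- sample(n, size = round(0.2 * n))
train_set <- df[-val_idx, ]
val_set   <- df[val_idx, ]

# K-fold validation: every row serves as validation data exactly once.
k     <- 5
folds <- sample(rep(1:k, length.out = n))
for (i in 1:k) {
  fold_train <- df[folds != i, ]
  fold_val   <- df[folds == i, ]
  # fit on fold_train, evaluate on fold_val, then average the k scores
}
```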
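For the data-preparation step, one common task is feature scaling. The sketch below (with placeholder matrices `train_x` / `val_x`) centres and scales using statistics computed on the training split only, so no information leaks from the validation data.

```r
set.seed(42)
train_x <- matrix(rnorm(200 * 4), ncol = 4)   # placeholder training features
val_x   <- matrix(rnorm(50 * 4),  ncol = 4)   # placeholder validation features

means <- apply(train_x, 2, mean)
sds   <- apply(train_x, 2, sd)

train_x_scaled <- scale(train_x, center = means, scale = sds)
val_x_scaled   <- scale(val_x,   center = means, scale = sds)   # reuse training statistics
```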
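The baseline in the "better than a baseline" step can be as simple as always predicting the most common class; any real model has to beat this number to demonstrate statistical power. The labels below are simulated purely for illustration.

```r
set.seed(7)
y_train <- sample(c("spam", "ham"), 800, replace = TRUE, prob = c(0.3, 0.7))
y_val   <- sample(c("spam", "ham"), 200, replace = TRUE, prob = c(0.3, 0.7))

majority_class    <- names(which.max(table(y_train)))   # "ham"
baseline_accuracy <- mean(y_val == majority_class)      # roughly 0.7
baseline_accuracy
```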
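Finally, a hedged Keras sketch of the scaling-up and regularization steps, assuming the keras R package with a working backend and hypothetical training matrices `x_train` / `y_train` for a binary classification task: first a deliberately oversized model that should overfit, then a smaller, dropout-regularized variant whose hyperparameters are tuned against the validation split.

```r
library(keras)

# Scaling up: add capacity (more units, more layers) until the model clearly overfits.
big_model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Regularizing: shrink the model, add dropout, and tune on validation performance.
reg_model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")

reg_model %>% compile(
  optimizer = "rmsprop",
  loss      = "binary_crossentropy",
  metrics   = "accuracy"            # the success measure chosen earlier
)

history <- reg_model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 128,
  validation_split = 0.2            # hold-out validation handled inside fit()
)
```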
References
Chollet, François, and J. J. Allaire. 2018. *Deep Learning with R*. 1st ed. Manning Publications.