15.12 The universal workflow of machine learning

  1. Define the problem and assemble a dataset (Chollet and Allaire 2018, Ch. 4.5)
    • Define the problem at hand and the data on which we’ll train. Collect this data and, for supervised learning, annotate it with labels
  2. Choose a measure of success
    • Choose how we’ll measure success on our problem. Which metrics will we monitor on our validation data?
  3. Decide on an evaluation protocol
    • Determine our evaluation protocol: Hold-out validation? K-fold validation? Which portion of the data should we use for validation?
  4. Prepare our data
  5. Develop a model that does better than a baseline (what counts as a sensible baseline?)
  6. Scale up: develop a model that overfits
  7. Regularize our model and tune our hyperparameters based on performance on the validation data
    • A lot of machine-learning research tends to focus only on this step, but keep the big picture in mind
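
The evaluation-protocol and baseline steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the procedure from Chollet and Allaire: the helper names (holdout_split, kfold_splits, majority_baseline, accuracy) are hypothetical, the labels are made-up toy data, and the "model" is only a majority-class baseline that a real model would then need to beat on the validation data.

```python
import random
from collections import Counter


def holdout_split(n_samples, validation_fraction=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * validation_fraction)
    return indices[n_val:], indices[:n_val]


def kfold_splits(n_samples, k=4, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda n: [most_common] * n


# Toy labels (made up for illustration): 70 samples of class 0, 30 of class 1.
labels = [0] * 70 + [1] * 30

# Hold-out validation: a single train/validation split.
train_idx, val_idx = holdout_split(len(labels), validation_fraction=0.2)
predict = majority_baseline([labels[i] for i in train_idx])
val_labels = [labels[i] for i in val_idx]
baseline_acc = accuracy(val_labels, predict(len(val_labels)))
print(f"hold-out baseline accuracy: {baseline_acc:.2f}")

# K-fold validation: average the metric over k train/validation splits.
fold_accs = []
for train_idx_k, val_idx_k in kfold_splits(len(labels), k=4):
    predict_k = majority_baseline([labels[i] for i in train_idx_k])
    val_labels_k = [labels[i] for i in val_idx_k]
    fold_accs.append(accuracy(val_labels_k, predict_k(len(val_labels_k))))
print(f"4-fold baseline accuracy: {sum(fold_accs) / len(fold_accs):.2f}")
```

The hold-out estimate depends on which samples happen to land in the validation set, whereas the K-fold average uses every sample for validation exactly once; that trade-off is one reason K-fold validation is often preferred when little data is available.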

References

Chollet, François, and J. J. Allaire. 2018. Deep Learning with R. 1st ed. Manning Publications.