Chapter 1 Introduction

Developments over the past few years have created great opportunities to use analytics to support and enhance decision making in areas such as business, health care and medicine, government, and not-for-profit organizations. Four trends have converged to create emerging disciplines with labels such as data science or analytics. The four trends are:

  1. An explosion in the amount, variety, and velocity of data.
  2. Data as a driver of strategy.
  3. Greater management sophistication (and expectations) regarding the use of data to support decision making.
  4. Better, faster, and cheaper hardware and software.

In recent years, terms such as business analytics, predictive analytics, data mining, machine learning, big data, and others have been used to describe various aspects of dealing with data.¹ At first many of these terms were thought to represent a rebranding of the discipline of statistics. But statistics is only part of the story.

It is important to conceptualize business analytics as a process rather than a one-shot effort. The business analytics process can be represented as a series of steps. These steps are conducted sequentially, but the process almost always involves iteration, since the results of one step may require that previous steps be re-examined and perhaps redefined. Several process frameworks have been published, including CRISP-DM (Cross-Industry Standard Process for Data Mining, referred to here as CRISP) (“The Crisp-DM User Guide,” n.d.), SEMMA (Sample, Explore, Modify, Model, and Assess) (“SEMMA from SAS,” n.d.), KDD (Knowledge Discovery in Databases) (“KDD and Data Mining,” n.d.), Microsoft’s TDSP (Team Data Science Process) (“What Is TDSP?” n.d.), and others. While the CRISP framework has not been updated since it was developed, it remained the most popular model according to a 2014 KDnuggets survey (“CRISP-DM, Still the Top Methodology for Analytics, Data Mining, or Data Science Projects,” n.d.). Many analysts have developed their own custom models, but the CRISP framework (Figure 1.1) remains a viable approach, especially because of its emphasis on business understanding as the first step in a project.

Figure 1.1: The CRISP model.

1.1 The analytics process model

This chapter focuses on the process of building analytics models. While descriptive and prescriptive models could be considered, this book is mainly concerned with prediction. Descriptive models are often created as a prelude to constructing a predictive model, since it is important to describe the data to inform model building. Also, while prescriptive models are becoming more common, it is usually necessary to have a predictive model to inform the prescriptive recommendations. Furthermore, the line between predictive and descriptive models is frequently blurred when it comes to deployment. Deploying a predictive model to production means that the output or outputs of the predictive model become scores used to support decision making. In fact, predictive models alone, without deployment of some form, are usually considered to be of little value to organizations.
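
To make the idea of deployed scores concrete, the following is a minimal Python sketch of a model whose output becomes a score that feeds a decision. Everything in it is an assumption for illustration only: the function names `churn_score` and `recommend_action`, the two input features, the coefficients, and the 0.5 threshold are all hypothetical and are not taken from any model in this book.

```python
import math


def churn_score(months_inactive: float, support_tickets: int) -> float:
    """Hypothetical predictive model: returns a probability-like score in (0, 1)."""
    # Coefficients are invented for illustration; a real model would be fitted to data.
    z = -2.0 + 0.5 * months_inactive + 0.3 * support_tickets
    return 1.0 / (1.0 + math.exp(-z))


def recommend_action(score: float, threshold: float = 0.5) -> str:
    """Hypothetical decision rule that turns a score into a suggested action."""
    return "contact customer" if score >= threshold else "no action"


# Two made-up customers: (months inactive, number of support tickets).
customers = {"A": (1, 0), "B": (6, 2)}

# Scoring happens first; the decision rule is applied to the score afterwards.
decisions = {
    name: recommend_action(churn_score(*features))
    for name, features in customers.items()
}
```

The score and the decision rule are kept in separate functions because, in a deployed setting, the threshold is often a business choice that can change without refitting the model.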

Despite the widespread use of the CRISP model, I propose a revised modeling process. The revised model (Figure 1.2) groups data preparation, exploratory analytics, and feature engineering into a single step. Based on my experience with creating models, it seems more logical to group these tasks since they are closely related and usually involve iteration. I added “communication” to the deployment step, since creating a clear understanding of the model is important to achieving buy-in by management. Finally, I added a step labeled “Performance monitoring,” since a deployed model should be periodically evaluated to make sure that it is performing as intended. If it is not performing well, the back arrow signals that the process starts anew.

In practice the various steps are likely to be iterative, which is indicated by the dual arrows linking each step (Figure 1.2). Results from one step may require revisiting or revising a previous step, or even more than one step.

Figure 1.2: The process based on modifications of the CRISP model.
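
The performance-monitoring step described above can be sketched in a few lines of Python. This is only one possible check, under stated assumptions: the function name `needs_rebuild`, the use of accuracy as the metric, and the 0.05 tolerance are hypothetical choices made for illustration, not part of the revised process itself.

```python
from statistics import mean


def needs_rebuild(baseline: float, recent_scores: list[float],
                  tolerance: float = 0.05) -> bool:
    """Return True when average recent performance falls more than
    `tolerance` below the baseline measured at deployment time."""
    if not recent_scores:
        return False  # nothing to evaluate yet
    return mean(recent_scores) < baseline - tolerance


# Hypothetical accuracy recorded when the model was deployed.
baseline_accuracy = 0.90

# Periodic evaluations that stay close to the baseline: keep the model.
stable = needs_rebuild(baseline_accuracy, [0.89, 0.91, 0.88])

# Periodic evaluations that have drifted well below the baseline:
# this corresponds to following the back arrow and starting the process anew.
degraded = needs_rebuild(baseline_accuracy, [0.84, 0.80, 0.79])
```

Averaging several recent evaluations, rather than reacting to a single one, is a simple way to avoid restarting the process because of one unusually poor period.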

References


  1. While there are legitimate distinctions and nuances associated with each of these terms, they are all part of a revolution in the use of data to address all types of problems.