The Art of Data Science

7.2 General Framework

We can apply the basic epicycle of analysis to the formal modeling portion of data analysis. We still want to set expectations, collect information, and refine our expectations based on the data. In this setting, these three phases look as follows.

Setting expectations. This takes the form of developing a primary model that represents your best sense of what answers your question. The model is chosen based on whatever information you currently have available.
Collecting information. Once the primary model is set, we will want to create a set of secondary models that challenge the primary model in some way. We will discuss examples of what this means below.
Revising expectations. If our secondary models are successful in challenging our primary model and put the primary model’s conclusions in some doubt, then we may need to adjust or modify the primary model to better reflect what we have learned from the secondary models.

7.2.1 Primary model

It’s often useful to start with a primary model. This model will likely be derived from any exploratory analyses that you have already conducted and will serve as the lead candidate for something that succinctly summarizes your results and matches your expectations. It’s important to realize that at any given moment in a data analysis, the primary model is not necessarily the final model. It is simply the model against which you will compare the secondary models. The process of comparing your primary model to secondary models is often referred to as sensitivity analysis, because you are interested in seeing how sensitive your results are to changes in the model, such as adding or deleting predictors or removing outliers in the data.
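As a minimal sketch of one such sensitivity check, the Python snippet below fits a simple linear regression twice: once on all observations (the primary model) and once with a suspected outlier removed (a secondary model). The data, the helper name ols_slope, and the cutoff used to flag the outlier are all hypothetical and chosen only for illustration. In a real analysis you would fit the models with whatever statistical software you are already using.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: y is roughly 2*x, except for one outlying final point.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 40.0]

# Primary model: all observations.
primary_slope = ols_slope(x, y)

# Secondary model: same model with the suspected outlier removed.
# The cutoff of 30 is an arbitrary assumption made for this example.
keep = [i for i in range(len(x)) if y[i] < 30]
secondary_slope = ols_slope([x[i] for i in keep], [y[i] for i in keep])

print(f"primary slope:   {primary_slope:.2f}")
print(f"secondary slope: {secondary_slope:.2f}")
```

If the two slopes differ substantially, as they do with this made-up data, the primary model's conclusion is sensitive to that single observation, and it is worth investigating the point before relying on the estimate.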

Through the iterative process of formal modeling, you may decide that a different model is better suited as the primary model. This is okay, and is all part of the process of setting expectations, collecting information, and refining expectations based on the data.

7.2.2 Secondary models

Once you have decided on a primary model, you will then typically develop a series of secondary models. The purpose of these models is to test the legitimacy and robustness of your primary model and potentially generate evidence against it. If the secondary models succeed in generating evidence that refutes the conclusions of your primary model, then you may need to revisit the primary model and ask whether its conclusions are still reasonable.
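One common kind of secondary model adds a predictor that the primary model left out. The sketch below, again in Python with hypothetical data and helper names, compares the coefficient on x from a primary model (y on x alone) with the coefficient from a secondary model (y on x and z). To keep the example dependency-free, the secondary coefficient is computed by first removing the linear effect of z from both x and y and then regressing the residuals on each other, which gives the same coefficient as the two-predictor least-squares fit. This is only an illustration of the comparison, not a prescribed method.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Residuals of ys after a least-squares fit on xs."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def adjusted_slope(xs, ys, zs):
    """Coefficient on x in the model y ~ x + z.

    Computed by regressing the z-adjusted y on the z-adjusted x.
    """
    rx = residuals(zs, xs)
    ry = residuals(zs, ys)
    return fit_line(rx, ry)[1]


# Hypothetical data: z is a binary variable related to both x and y.
x = [1, 2, 3, 4, 3, 4, 5, 6]
z = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.1, 1.9, 3.0, 4.1, 7.9, 9.0, 10.1, 10.9]

primary = fit_line(x, y)[1]         # primary model: y ~ x
secondary = adjusted_slope(x, y, z)  # secondary model: y ~ x + z

print(f"primary coefficient on x:   {primary:.2f}")
print(f"secondary coefficient on x: {secondary:.2f}")
```

With this made-up data the coefficient on x drops by roughly half once z is included, which is the kind of evidence that would prompt you to revisit the primary model.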