Chapter 2 Business understanding and problem definition

Before collecting data or thinking about which technique to use, it is critical to understand the business problem. While it may seem obvious that the business problem should be clearly stated, it is frequently reported that the biggest reason for failures of analytics projects is a poor understanding of the problem. It perhaps goes without saying that defining a problem whose solution will add value to an organization requires a good understanding of the business.

The elements of this phase of the analytics process include understanding the business, identifying the stakeholders, recognizing the type of problem, and finally framing the problem.
There is a temptation to take a request from a client or manager without careful thought and to go directly to gathering and analyzing data. If this path is taken, it is very likely that the most important business question will not be answered. This causes wasted time and resources.

The benefits of a careful effort to understand the business and define the problem include:

  1. The work of the analyst becomes more efficient because fewer dead-ends are encountered. Data collection will be more efficient, with a greater focus on getting the right data and ignoring variables that are not relevant. Rather than just getting all the data that can be found, a more directed search for variables can occur. This also provides stronger justification for asking the client for additional data.

  2. Second, business analytics is about adding value. The problem should be well structured so that the results can be implemented and will add value. With the right problem structure, getting a useful solution is more likely. Scattershot model building is likely to lead to poor predictions and, importantly, little guidance as to what is wrong and how to improve the model.

  3. Third, having a clearly defined problem that can be communicated to everyone involved avoids two traps: providing information that is already known, and providing information that is so unusual as to be unbelievable.

2.1 Expert views

Business understanding and defining the problem are among the most important tasks in analytics projects. These quotes from a variety of experts attest to the importance of this phase of the analytics process.

“The difference between great and mediocre data science is not about math or engineering: it is about asking the right question(s)…. No amount of technical competence or statistical rigor can make up for having solved a useless problem.” (Cady 2017)

Einstein is quoted as having said that if he had one hour to save the world he would spend fifty-five minutes defining the problem and only five minutes finding the solution.

“The most serious mistakes are not being made as a result of wrong answers. The truly dangerous thing is asking the wrong questions.”
Peter Drucker

“Successful analytic teams spend more time understanding the business problem and less time wading through lakes of data.” (Taylor 2017)

The following case study demonstrates the importance of delving deeper into a situation and not simply focusing on the approach asked for by the client.1

One of the common applications of analytics is to help predict and understand churn for telecommunications providers. Subscribers to cell phone plans are notorious for cancelling contracts and signing up with another carrier. People churn because they are unhappy with service, or they want a new phone, or they get a good price from another carrier. There are all sorts of reasons that people leave. This can be very expensive for the provider. It is usually far cheaper for the provider to take steps to keep a customer than to go find a new one.

A large cell phone carrier in the United States called in a consultant to help with this problem. The carrier was experiencing considerable churn and asked the business analyst to find a way to predict who was most likely to leave the company and to determine what might be done to prevent that customer from leaving. This seemed like a reasonable request and one that could be addressed with analytics. The cell phone company had lots of data on its customers and a history of those customers that stayed and those that left. The consultant decided to look into the problem before digging into the data.

What the consultant found was that the churn was occurring right around the time that the two-year contract was up. He asked managers in the company for a detailed explanation of how customers were contacted, how they were asked to sign up for a new contract, and what sort of communications were being sent to the customers. It turned out that about two months prior to a customer’s contract being up, the company sent a reminder letter saying that the contract was coming up for renewal and asking that the customer sign up for another two years. This same letter was sent to all customers.

Imagine you are a consumer with a cell phone company and you get a letter from the company saying that your contract ends in 60 days. What behavior does this trigger? Well, for many customers, it got them thinking about shopping around. It turned out that the very efforts to get customers to sign up for another contract were actually motivating many of them to leave the company. Many people don’t know when their contracts are up, and if left alone, they would continue with the company for years without switching. The company’s marketing department was triggering the churn. These letters were stopped and the churn went down.

If the consultant had gone directly to analyzing data, he might have missed this root cause entirely and, in fact, would have been solving the wrong problem. Management wanted a predictive model for churn. The consultant could have built one, but the real solution would not have been found. Understanding the real problem is therefore extremely important.

2.2 Understanding the business

At the start, the goal of a project may be stated only in vague terms. It is important to meet with stakeholders (those with an interest in the results of the project) to refine the objectives into operational terms and to obtain their buy-in. Everyone involved with the project should be considered. Stakeholders can include IT, executives in the customer department (marketing, accounting, finance, etc.), those who may be expected to use the resulting model, and those who might be affected by the project.

The following tasks are needed during the business understanding phase (adapted from the CRISP-DM model):

  1. Determine business objectives; that is, what is the basic goal of the project? In business situations it might be to gain new customers, increase the loyalty of current customers, or reduce losses due to fraud.
  2. Translate the business objectives into the technical metrics that can result from the analysis. For example, if customer loyalty is the problem, the goal of the analysis could be to identify those customers likely to stop using the company’s products or services. This objective might be further refined by stating a needed lead time prior to customer loss so that remedial action could be taken.
  3. Develop a project plan – list the stages, required resources, risks, time duration, contingencies, and evaluation metrics.
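
As a rough illustration of the second task, the sketch below turns the customer-loyalty objective into a concrete target variable with a lead time built in. The function name, the 30-day lead time, and the 60-day prediction window are all assumptions made for the example; they are not taken from the CRISP-DM model or from any particular project.

```python
from datetime import date, timedelta
from typing import Optional


def churn_label(snapshot: date, churn_date: Optional[date],
                lead_days: int = 30, horizon_days: int = 60) -> int:
    """Return 1 if the customer leaves inside the prediction window, else 0.

    The window opens `lead_days` after the snapshot date (time reserved for
    remedial action) and stays open for `horizon_days`. Both defaults are
    hypothetical values chosen for illustration.
    """
    if churn_date is None:  # the customer has not left
        return 0
    window_start = snapshot + timedelta(days=lead_days)
    window_end = window_start + timedelta(days=horizon_days)
    # Customers who leave before the window opens are labeled 0 here;
    # a real project might prefer to exclude them from training instead.
    return int(window_start <= churn_date < window_end)


snapshot = date(2024, 1, 1)
print(churn_label(snapshot, date(2024, 2, 15)))  # leaves inside the window
print(churn_label(snapshot, date(2024, 1, 10)))  # leaves too soon to act on
print(churn_label(snapshot, None))               # still a customer
```

Changing the lead time or the window length changes which customers count as positive cases, which is why these choices belong in the problem definition rather than being left to the modeling stage.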

2.3 Identifying stakeholders

Stakeholders, the people in the organization who care about the problem, can be part of the solution, can potentially derail the solution, can provide the resources needed to arrive at a solution, or can be in a position to take action on the results.

Some questions to ask about stakeholders:

  • Which executives have a stake in the outcome of the analysis?
  • Have these executives been briefed on the problem and the solution approach?
  • Can the key executives provide the necessary resources to implement recommendations?
  • Do the key executives support analytic approaches to making decisions? Are there people with vested interests in a particular solution?
  • Is there a plan for regular communication and feedback with interim results and progress? Do key stakeholders have certain styles of using information and making decisions?2 (Based on Davenport and Kim 2013.)

2.4 Structured versus unstructured problems

The types of problems that could be addressed with analytics are wide-ranging:

  • How can the Red Cross increase blood donations?
  • Should a consumer products company increase ad spending for a certain brand or lower the price to stimulate sales?
  • How do you increase the number of people who agree to be organ donors?
  • How can a university increase its student retention rates?
  • How much of a discount should a resort offer for booking tickets online 6 months early?
  • Should a hotel offer a “surprise discount” on rooms at the time of check-in or should they offer it as people are leaving?
  • How can a casino increase the lifetime value of a customer?
  • Should a particular person be approved for a loan?
  • Is this credit card transaction fraudulent?

While these may seem to be very specific questions, attempting to structure the question to guide an analytics project would likely reveal that the questions are not well structured.

Many challenging analytics problems are unstructured and difficult. Professionals are rewarded for their ability to solve difficult problems, not simply for following rote procedures or relaying memorized information. When faced with an unstructured problem, it is even more important to work on carefully defining it. Figure 2.1, based on (Jonassen 1997), shows some of the characteristics that distinguish well-structured from unstructured problems.

Figure 2.1: Structured versus unstructured problems.

2.5 Framing the problem

The term “framing” means the process of describing and interpreting a situation: framing focuses attention. The problem frame helps define the importance of a problem and also sets the direction for solving it. How a problem or question is framed can lead to very different answers. For example, these two questions ask about the number 10 in different ways which will lead to different answers:

  • What is the sum of 5 plus 5?
  • What two numbers add up to 10?

When a tentative analytics problem is identified, different frames will lead to different analytics approaches. For example, an insurance company may want to reduce the cost of fraudulent claims. Analytics could be used to address this problem, but how the problem is framed will lead to different target variables and probably different analytic techniques. Consider the following different goals associated with reducing fraud:

  • Identify cases that have the highest propensity for fraud.
  • Identify cases where there is the highest likelihood of recovering monies.
  • Identify cases with the largest potential dollar amount of fraud.
  • Identify cases that will maximize the hourly return of investigators assigned to deal with fraudulent cases.
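
The effect of these framings can be sketched with a toy example. Every field name and number below is invented for illustration. In a real project each score would come from a model trained on its own target variable, but even with made-up scores the four goals above pick a different claim to investigate first.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    p_fraud: float    # hypothetical estimated probability of fraud
    amount: float     # dollar amount of the claim
    p_recover: float  # hypothetical chance of recovering money if fraudulent
    hours: float      # hypothetical investigator hours needed


# Invented data, chosen only to make the framings disagree
claims = [
    Claim("A", p_fraud=0.90, amount=1_000, p_recover=0.20, hours=2),
    Claim("B", p_fraud=0.30, amount=50_000, p_recover=0.50, hours=40),
    Claim("C", p_fraud=0.60, amount=8_000, p_recover=0.90, hours=4),
    Claim("D", p_fraud=0.50, amount=3_000, p_recover=0.60, hours=0.5),
]

# One scoring rule per framing of the fraud problem
framings = {
    "propensity": lambda c: c.p_fraud,
    "recovery_likelihood": lambda c: c.p_fraud * c.p_recover,
    "dollar_amount": lambda c: c.p_fraud * c.amount,
    "return_per_hour": lambda c: c.p_fraud * c.p_recover * c.amount / c.hours,
}


def top_claim(framing: str) -> str:
    """Return the id of the claim ranked first under the given framing."""
    return max(claims, key=framings[framing]).claim_id


for name in framings:
    print(f"{name}: investigate claim {top_claim(name)} first")
```

With these numbers each framing puts a different claim at the top of the list, so the choice of frame has to be settled with the business before any modeling begins.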

The importance of problem framing is illustrated in this old story (Shoemaker and Russo 2001).

There is an old story about a Franciscan priest and a Jesuit priest who were both heavy smokers and somewhat troubled by their human fragility, especially about smoking when praying to the Lord.

The Franciscan met with his prefect and asked, “Father, would it be permitted to smoke while praying to the Lord?” The answer was a resounding, “No!”

The Jesuit also sought counsel, but asked his question differently. “Father, when in moments of weakness I smoke, would it be permitted to say a prayer to the Lord?” The answer was, “Yes, of course my son.”

In some cases, the first attempt at a problem statement focuses on symptoms, which may not address the root problem. For example, a symptom might be: “We are losing market share rapidly.” The problem definition might then be: “How do we regain market share?” (Hauch 2018) But this may not direct the analyst team to the real issue.

One suggestion for getting at the root of a problem through framing is to use “the 40-20-10-5 rule.” For problems that are difficult to define quickly, the rule has four basic steps: state your problem in 40 words, cut it down to 20, then to 10, and end up with a 5-word problem statement. If you cannot keep it simple, you have probably not reached the root of the problem yet.
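
A small helper can keep a team honest about the word budgets while drafting. This is only a sketch: the function name and the example drafts are invented, and words are counted naively by splitting on whitespace.

```python
BUDGETS = (40, 20, 10, 5)  # word limits for the four successive drafts


def check_drafts(drafts):
    """Return (budget, word_count, fits) for each draft, in order."""
    if len(drafts) != len(BUDGETS):
        raise ValueError(f"expected {len(BUDGETS)} drafts, got {len(drafts)}")
    results = []
    for budget, draft in zip(BUDGETS, drafts):
        count = len(draft.split())  # naive whitespace word count
        results.append((budget, count, count <= budget))
    return results


# Hypothetical drafts based on the churn case study in this chapter
drafts = [
    "Our mobile subscribers are cancelling their contracts at a rising rate, "
    "mostly around the end of the two-year term, and we need to understand "
    "why they leave and what we can do to keep them.",
    "Subscribers cancel near the end of their contracts, and we need to know why.",
    "Why do subscribers leave when their contracts end?",
    "Why do subscribers leave?",
]

for budget, count, fits in check_drafts(drafts):
    print(f"budget {budget:>2}: {count:>2} words, {'ok' if fits else 'too long'}")
```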

Another data scientist summarized key considerations to guide a team through a problem framing exercise (Arnuld 2020):

  1. See if you can solve the problem without using ML. Sometimes a simple heuristic is good enough.
  2. Be clear about what you want and state it without using ML.
  3. Write your desired outcome in simple English.
  4. What do you expect the model to output?
  5. What kind of ML problem are you solving? For example, if supervised, then what kind: classification or regression? If classification, then what kind: binary or multi-class?
  6. What is a successful model in this case?
  7. What is a failed model in this case? Write all the cases. “Not being able to succeed” is not the only failure case.
  8. Quantify your success. How will you measure it in your model?
  9. Try to keep the model simple. A simple model’s result can justify whether you need a complex model. Complex models are slower to train and difficult to understand.
  10. Build a simple model and deploy it. The biggest gain from ML tends to be the first launch. You can improve later and launch another version.

Example of solving a simpler problem

Often, engineers make things more complicated than necessary. Consider the often-told story of the NASA space pen. As the story goes, during the 1960s NASA focused a major program on developing a pen that would write in zero gravity, while the Soviet space program used the much simpler and already-invented pencil.

I don’t know for sure because I wasn’t there, but I strongly suspect this came about because of a miss in the problem statement. The Americans were working on this problem: How do we get a pen to write in zero gravity? The Soviets were working on a different problem: How do we write in zero gravity? It seems like such a small, subtle difference, but it had a huge impact on what work followed. (Flinchbaugh 2009)

2.6 Summary

Preparing an effective, actionable definition of the problem prior to machine learning is universally recognized as being important. However, specifying exactly how to create such a problem definition cannot be codified in a simple process. As with many complex concepts, it is likely that “you will know it when you see it.” But getting to that point requires creativity and hard work.

Appendix: Some tools for problem definition

Because of the critical importance of problem definition, there are many tools and strategies that have been developed to get started. A few of these are listed here.

Right to left thinking

One approach that can be very helpful is to turn the business objective into a decision to be made. For example, if the problem is that a company needs more customers, this could lead to a goal of getting new customers or working on keeping current customers. These goals will lead to very different data needs and analyses. If the goal is keeping current customers, then a model that predicts which specific customers are likely to leave can lead to actions to reduce the probability of those customers leaving. If on the other hand, the goal is to get new customers, then the model can be used to indicate which potential individuals are most likely to respond favorably to an offer.

Reversing the problem

Clearly identify the problem or challenge, and write it down. Brainstorm the reverse problem to generate reverse solution ideas. Allow the brainstorm ideas to flow freely. Do not reject anything at this stage.

Instead of asking, “How do I solve or prevent this problem?” ask, “How could I possibly cause the problem?” Instead of asking “How do I achieve these results?” ask, “How could I possibly achieve the opposite effect?”

Once you have brainstormed all the ideas to solve the reverse problem, reverse these into solution ideas for the original problem or challenge. Evaluate these solution ideas to determine whether they suggest a potential solution, or at least the attributes of a potential solution. (DeRusha and Wolfson, n.d.)

Open the problem with “whys”

Consider the following:

If I asked you to build a bridge for me, you could go off and build a bridge. Or you could come back to me with another question: “Why do you need a bridge?” I would likely tell you that I need a bridge to get to the other side of a river. Aha! This response opens up the frame of possible solutions. There are clearly many ways to get across a river besides using a bridge. You could dig a tunnel, take a ferry, paddle a canoe, use a zip line, or fly a hot-air balloon, to name a few.

You can open the frame even farther by asking why I want to get to the other side of the river. Imagine that I told you that I work on the other side. This, again, provides valuable information and broadens the range of possible solutions even more. There are probably viable ways for me to earn a living without ever going across the river. Source: (Seelig 2013)

Challenge assumptions

Every problem — no matter how apparently simple it may be — comes with a long list of assumptions attached. Many of these assumptions may be inaccurate and could make your problem statement inadequate or even misguided.

The first step to get rid of bad assumptions is to make them explicit. Write a list and expose as many assumptions as you can — especially those that may seem the most obvious and ‘untouchable.’ That, in itself, brings more clarity to the problem at hand. Essentially, you need to learn how to think like a philosopher.

But go further and test each assumption for validity: think of ways in which it might not be valid, and of the consequences if it is not. What you will find may surprise you: many of those bad assumptions are self-imposed, and with just a bit of scrutiny you are able to safely drop them. Be a skeptic. (Clissold 2021)

Example of challenging assumptions

Consider the interesting example of the mathematician Abraham Wald. During World War II, Wald worked with the Statistical Research Group, a U.S. wartime research program based at Columbia University, and was asked to study the damage suffered by airplanes engaged in combat missions and to use those results to recommend appropriate places for armor reinforcement. Returning planes were assessed for damage and the data was collected and analyzed. Patterns of damage soon became apparent, and military officers concluded that the planes should be reinforced based on those patterns. (Ellenberg 2016)

Mr. Wald took a different approach. He reasoned that the damaged planes which made it back weren’t the real problem. Additional armor was needed most on planes which didn’t make it back. While he didn’t know for certain where the damage was on the lost planes, he could reason that it was different from that of the planes which safely returned: areas with little damage on the returning planes were likely the areas where a hit brought a plane down, so those were the areas to reinforce. His task was to see the problem which wasn’t easily observed.
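
Wald’s reasoning can be checked with a small simulation. This is a sketch under invented assumptions: the four sections, the loss probabilities, and the number of hits per plane are all hypothetical, and hits are spread evenly across sections. The point is only that damage counted on returning planes under-represents the section where hits are most often fatal.

```python
import random

SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit in each section downs the plane
P_LOSS = {"engine": 0.60, "fuselage": 0.10, "wings": 0.10, "tail": 0.15}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Count hits by section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits = dict.fromkeys(SECTIONS, 0)
    returned_hits = dict.fromkeys(SECTIONS, 0)
    for _ in range(n_planes):
        # Each hit lands in a section chosen uniformly at random
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits
        survived = all(rng.random() > P_LOSS[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                returned_hits[s] += 1
    return all_hits, returned_hits


def shares(counts):
    """Convert hit counts into each section's share of the total."""
    total = sum(counts.values())
    return {s: counts[s] / total for s in counts}


all_hits, returned_hits = simulate()
for section in SECTIONS:
    print(f"{section:>8}: all planes {shares(all_hits)[section]:.2f}, "
          f"returning planes {shares(returned_hits)[section]:.2f}")
```

Engine hits make up about a quarter of all hits in this setup, yet they are the least common damage seen on the planes that return. An analyst who looked only at the returning planes would conclude that the engine needs the least protection, which is exactly the assumption Wald challenged.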


  • Chunk up
    • Chunking up is about taking a broader view. Helicopter up to 30,000 feet. Survey the landscape to see the whole system. Ask ‘Why’ things happen to find higher-level purpose. Ask ‘what is this an instance of’ to find a more general classification. Use inductive reasoning to go from specific detail to general theories and explanations.
  • Chunk down
    • Chunking down is about going into detail to find smaller and more specific elements of the system. Ask ‘How’ things happen to find lower-level detail. Ask ‘What, specifically’ to probe for more information. Ask ‘Give me an example’ to get specific instances of a class. Use deductive reasoning to go from general theories and ideas to specific cases and instances.
  • Chunk up and down
    • Chunking up and down go well together as a way of looking differently at the same situation. Chunk up from the existing situation to find a general or broader view. Then chunk down somewhere else.


Problem 1 Virtually all database systems today include capabilities to retrieve records and to create and manipulate tables efficiently. Queries return data from a database that fulfill specific constraints and criteria set by the user. OLAP uses a process of preaggregation to form data cubes, which enables quicker retrieval, often with less computational power. Neither database queries nor OLAP, however, are considered data mining tools. Data mining addresses fundamentally different questions by constructing models of the data.

  • Consider the following business questions. Take a few minutes and “twist” each question into an analytic modeling question.
    • Who are my best customers?
    • What geographic areas do sales come from?
    • How long do customers stay with my company?
    • What do my customers buy, specifically?
    • What share of the customer wallet do I have?
    • Which suppliers are most reliable? What are the characteristics of the most reliable suppliers?
    • Which salespeople provide the most profit?
    • What is the perceived quality of my products?
    • Which products are most profitable?

Problem 2 A bank is interested in running a major promotion to increase its share of the new car loan market. Assuming the promotion is successful, the number of applications for new car loans will increase substantially. Therefore, it was suggested that a predictive analytics model be developed to speed up the process of loan acceptance or rejection, with the specific goal of quickly and automatically identifying applicants that are likely to default on the loan.

A database of 5,500 automobile loans is available from the bank’s information system, covering loans made from July of 1999 through June of 2010.

  • Variables for each loan include:
    • Whether or not the loan resulted in default
    • Amount of loan
    • Percentage of loan to value of automobile
    • Age of applicant
    • Gender of applicant
    • Current debt (monthly payments)
    • Monthly household income
    • Years in current residence
    • Years in previous residence
    • Number of dependents
    • Marital status
    • Credit score
    • Date of the loan
    • Duration of the loan in months

What issues or questions would you have regarding this data set prior to developing a predictive model?


Arnuld. 2020. “Some Key Things I Learned from Google’s ‘Introduction to Machine Learning Problem Framing’ MOOC.”
Cady, Field. 2017. The Data Science Handbook. Hoboken, NJ: John Wiley & Sons.
Clissold, Rachel. 2021. “How to Solve a Problem.”
Davenport, Thomas H., and Jinho Kim. 2013. Keeping up with the Quants: Your Guide to Understanding and Using Analytics. Boston: Harvard Business Review Press.
DeRusha, Karen, and Bill Wolfson. n.d. “Shift Your Lens: The Power of Re-Framing Problems.”
Ellenberg, Jordan. 2016. “Abraham Wald and the Missing Bullet Holes.”
Flinchbaugh, Jamie. 2009. “Leading Lean: Solve the Right Problem.”
Hauch, Brian. 2018. “Finding the True North of a Problem: Problem-Framing Principles for Design-Led Innovation.”
Jonassen, David. 1997. “Instructional Design Models for Well-Structured and Ill-Structured Problem-Solving Learning Outcomes.” Educational Technology Research and Development, 65–94.
Seelig, Tina. 2013. “Shift Your Lens: The Power of Re-Framing Problems.”
Shoemaker, Paul J. H., and J. Edward Russo. 2001. “Managing Frames to Make Better Decisions.” In Wharton on Making Decisions, edited by Stephen J. Hoch, Howard C. Kunreuther, and Robert E. Gunther. John Wiley.
Taylor, James. 2017. “Bringing Business Clarity to CRISP-DM.”

  1. This is based on a workshop presentation by the TMA consulting group.

  2. Some stakeholders may want to see the “bottom line” without a lot of elaboration. Others want a clear logical path drawn out. Still others are more persuaded by anecdotes and stories.