Customer Lifetime Value Analysis Part 1

Bob Rietveld

25-10-2019

The customer is the main source of revenue for most companies, but not all customers are created equal: some provide more value than others. Recent research in both consumer packaged goods and other categories finds that, on average, 20% of customers provide about 70% of revenues (McCarthy and Winer 2019; Kim, Singh, and Winer 2017).

To grow, a company must therefore understand the development of its customer base: who will be my 20% most valuable customers in the future?

Understanding your current and future customer base is therefore essential when designing strategies for growth. Customer lifetime value (CLV) is a (if not the) critical metric to understand about customers: it is the net worth of the future transactions of a customer relationship (Fader, Hardie, and Lee 2005).
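One common way to make this definition concrete (the notation here is ours, not taken from the cited paper) is as a discounted sum of expected future spend:

$$\mathrm{CLV} \;=\; \sum_{t=1}^{T} \frac{E[v_t]\; P(\text{alive at } t)}{(1+d)^{t}}$$

where $E[v_t]$ is the expected spend in period $t$, $P(\text{alive at } t)$ the probability the customer is still active, $d$ a per-period discount rate, and $T$ the planning horizon.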

To understand how the CLV metric can be calculated and what type of insights can be derived from it, we provide a short case study based on a real dataset. In the remaining sections of this post we discuss the dataset, our CLV model, and some preliminary findings.

Our modeling objective

The main problem CLV models try to solve is estimating whether, and how many, future transactions a customer will make. In our setting (non-contractual, continuous), we do not know in advance whether customers are still alive (in contrast to, for example, a subscription-based model like Hello Fresh), and we also do not know in advance how much customers will spend per visit. We use the purchase history of individual customers to model their expected number of future transactions, average spend per transaction, and probability of being alive.

The graph visually represents the modeling task: using historic purchase data, we try to predict the number of future transactions per customer. To do so we split the data in half and use only the first (yellow) subset to train our model. Later we can compare our predictions with the actual transactions (black) and verify how our model performs.
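The post does not show its tooling, but this kind of calibration/holdout split can be sketched with the Python lifetimes package; the file name, column names, and cut-off dates below are hypothetical placeholders for the real schema:

```python
import pandas as pd
from lifetimes.utils import calibration_and_holdout_data

# One row per purchase; column names and dates are illustrative placeholders.
transactions = pd.read_csv("transactions.csv", parse_dates=["date"])

# First half of the observation window (the yellow subset) trains the model,
# the second half is held out to compare predictions against reality.
summary = calibration_and_holdout_data(
    transactions,
    customer_id_col="customer_id",
    datetime_col="date",
    monetary_value_col="sales_value",      # needed later for the spend model
    calibration_period_end="2019-06-30",   # end of training window (placeholder)
    observation_period_end="2019-12-31",   # end of full data (placeholder)
    freq="W",                              # work in weekly time units
)
print(summary.columns)  # frequency_cal, recency_cal, T_cal, ..., *_holdout
```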

The dataset

To see how CLV can be used to define a "playbook" of tactics, we are going to try to predict the future value (in dollars) of each individual customer.

To do so, we will use a dataset of grocery transactions provided by https://www.8451.com/.

Pareto Visualization

To get a better understanding of the data, it's always a good idea to do some exploratory data analysis. Below we show the Pareto ratio of our data, as well as the distribution of transaction frequency and the distribution of sales.

By dividing the sales of the top 20% of customers by the total sales, we obtain the Pareto ratio for this dataset, which is 0.55. This measure indicates the concentration of sales: 20% of customers are responsible for 55% of the revenue. Compared to the roughly 70% industry benchmark this concentration is lower, but it still means a small share of customers generates a disproportionate amount of value.
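The Pareto ratio reduces to a few lines of pandas; a minimal sketch with made-up sales figures:

```python
import pandas as pd

def pareto_ratio(sales_per_customer: pd.Series, top_share: float = 0.20) -> float:
    """Share of total sales generated by the top `top_share` of customers."""
    ranked = sales_per_customer.sort_values(ascending=False)
    n_top = max(1, int(len(ranked) * top_share))
    return ranked.iloc[:n_top].sum() / ranked.sum()

# Ten illustrative customers; the top two (20%) account for 0.65 of sales here.
sales = pd.Series([900, 400, 250, 120, 90, 80, 60, 50, 30, 20])
print(round(pareto_ratio(sales), 2))
```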

If we look at the distributions of sales and transactions across customers, we see a familiar long-tailed pattern reflecting the Pareto ratio.

Our Model

Based on these properties, we use a flexible model (MBG/CNBD-k) that incorporates regularity in customers' intertransaction times, purchase rates that vary across customers, and a constant dropout probability with which each customer can become inactive (for good). This model relaxes a number of assumptions (allowing different purchase and dropout rates across customers) in a way that closely reflects the properties of our grocery dataset.

We use 50% of the data (the first six months) to train the model. The remaining 50% of the data is used to validate the predictions the model makes. Using this 50-50 split allows us to see to what degree we can trust the predictions.
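The MBG/CNBD-k model itself is implemented in the R package BTYDplus; as a hedged Python stand-in, the lifetimes package's ModifiedBetaGeoFitter fits the MBG/NBD model, which is the k = 1 special case (no regularity in intertransaction times):

```python
from lifetimes import ModifiedBetaGeoFitter

# Fit on the calibration half only (`summary` from the earlier split sketch).
# MBG/NBD is the k = 1 special case of the post's MBG/CNBD-k model.
mbg = ModifiedBetaGeoFitter(penalizer_coef=0.01)  # small penalty for stability
mbg.fit(summary["frequency_cal"], summary["recency_cal"], summary["T_cal"])
print(mbg.summary)  # fitted r, alpha, a, b parameters
```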

The graph below describes the cumulative (total) transactions per week. The yellow line denotes the actual number of transactions per week, the black line the predictions from our model. Our model follows the general direction but seems to underpredict the number of transactions.

In the graph below we see the actual versus the predicted number of transactions in the period we use to train the model. On average the model does a fairly good job.

To take a closer look, the graph below depicts the net difference (actual - expected) on a weekly basis. We can see the model gets better at predicting the number of transactions over time.

Using this data we can compute a number of accuracy measures for the performance of the model. On average our model is off by 74 transactions a week (Mean Absolute Error); in percentage terms this amounts to about 2.78% (Mean Absolute Percentage Error).
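For reference, both error measures are straightforward to compute from the weekly actual and predicted transaction counts (the numbers below are made up for illustration):

```python
import numpy as np

def mae(actual, predicted):
    # Mean Absolute Error: average size of the weekly miss, in transactions.
    return np.mean(np.abs(actual - predicted))

def mape(actual, predicted):
    # Mean Absolute Percentage Error: the same miss relative to actuals.
    return np.mean(np.abs((actual - predicted) / actual)) * 100

actual = np.array([2600.0, 2700.0, 2500.0, 2800.0])     # illustrative weekly counts
predicted = np.array([2550.0, 2610.0, 2460.0, 2700.0])
print(mae(actual, predicted), round(mape(actual, predicted), 2))
```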

Going forward, we can be fairly confident that our predictions have merit and can inform downstream decisions on segmentation, marketing tactics, and so on.

CLV distribution

We use a time window of 2 years, so the estimated transactions and CLV represent a forecast for the next two years. Apart from the number of transactions, the model provides the following information per individual customer: the expected spend per transaction, the probability of being alive, and the resulting CLV.
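Continuing the lifetimes stand-in from above, these per-customer quantities can be obtained roughly as follows (a sketch, not the post's exact BTYDplus workflow; the two-year horizon matches the post, the discount rate is an arbitrary illustration):

```python
from lifetimes import GammaGammaFitter

horizon = 104  # two years, in the weekly units used when building `summary`

# Expected number of future transactions and probability of being alive.
summary["pred_txns"] = mbg.conditional_expected_number_of_purchases_up_to_time(
    horizon, summary["frequency_cal"], summary["recency_cal"], summary["T_cal"]
)
summary["p_alive"] = mbg.conditional_probability_alive(
    summary["frequency_cal"], summary["recency_cal"], summary["T_cal"]
)

# Gamma-Gamma spend model: fit on repeat customers with positive spend,
# then combine with the transaction model to get a dollar CLV.
repeat = summary[(summary["frequency_cal"] > 0) & (summary["monetary_value_cal"] > 0)]
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(repeat["frequency_cal"], repeat["monetary_value_cal"])
summary["clv"] = ggf.customer_lifetime_value(
    mbg,
    summary["frequency_cal"], summary["recency_cal"], summary["T_cal"],
    summary["monetary_value_cal"],
    time=24,            # horizon in months (two years)
    discount_rate=0.01, # illustrative monthly discount rate
    freq="W",           # summary was built in weekly units
)
```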

Below are the distributions of the number of transactions and of CLV. We can see the long-tailed shape of the CLV distribution, which confirms that some customers are indeed more valuable than others.

Now that we have a model for the future value of each customer, we can implement data-driven marketing tactics. We provide a number of examples below.

We can now also segment our customers based on their CLV score. In our case we select 4 bins representing high- to low-CLV customer segments; this is an arbitrary number, and you can choose how to divide the CLV segments based on your data.
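With the clv column from the earlier sketch in place, pandas' qcut gives equally populated bins (four here, matching the post; the labels are our own):

```python
import pandas as pd

# Four equally populated CLV bins, lowest to highest.
summary["clv_segment"] = pd.qcut(
    summary["clv"], q=4, labels=["Low", "Medium", "High", "Premium"]
)
print(summary["clv_segment"].value_counts())
```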

To see what the added value of CLV analysis is, we compute the correlation between the sales rank and the CLV rank. We rank customers from 1 to n based on their total known sales and on the estimated CLV. If the correlation between these two ranks were 1, then doing a CLV analysis would be irrelevant, as one could simply use the sum of sales to identify important customers.

We compute the (Kendall) correlation per CLV segment, as this may provide additional insight into where our analysis is most valuable. The results indicate that a reasonable correlation exists for most segments, but there is a difference: especially for the Low CLV segment there appears to be a discrepancy between total sales and CLV. This indicates the analysis is useful in identifying customers in the Low CLV segment who are either likely to churn or may spend more in the future than their past suggests. The per-segment correlations are shown in the table below, followed by a sketch of how to compute them.

CLV Segment   Correlation
Premium       0.88
High          0.87
Medium        0.88
Low           0.68
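These per-segment correlations can be reproduced along the following lines, continuing the hypothetical summary frame from the earlier sketches (Kendall's tau is itself rank-based, so the raw values can be passed directly):

```python
from scipy.stats import kendalltau

# Total known sales per customer, approximated here from the calibration
# summary (frequency x average spend); a real analysis would sum actual sales.
summary["total_sales"] = summary["frequency_cal"] * summary["monetary_value_cal"]

# Kendall's tau between observed sales and estimated CLV, per segment.
for segment, group in summary.groupby("clv_segment"):
    tau, _ = kendalltau(group["total_sales"], group["clv"])
    print(segment, round(tau, 2))
```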

Churn Probability

Part of the analysis is a probability of being alive at the end of the calibration period. The histogram reveals that most customers have either clearly remained alive or clearly churned. A small number of customers (86), however, have a probability of being alive between 0.25 and 0.8. These potential churners represent an aggregated CLV of 78,578.72.
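With the per-customer p_alive estimates from the earlier sketch, this at-risk band can be pulled out in a couple of lines (thresholds taken from the post):

```python
# Customers in the uncertain middle band of the aliveness distribution.
at_risk = summary[summary["p_alive"].between(0.25, 0.8)]
print(len(at_risk), at_risk["clv"].sum())  # headcount and aggregated CLV at risk
```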

Cohort Analysis

A popular way to see how the business is evolving is cohort analysis. Customers are grouped together by a variable; often the acquisition date is used. By plotting the number of customers per CLV segment and acquisition date, we can visualize whether the business is attracting high-value customers. Using this visualization it becomes clear that the business is not attracting growing numbers of premium and high-value customers.
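A minimal sketch of such a cohort table, reusing the hypothetical transactions and summary frames from earlier, with acquisition month as the cohort variable:

```python
# Acquisition cohort = month of each customer's first observed purchase.
first_purchase = transactions.groupby("customer_id")["date"].min()
summary["cohort"] = first_purchase.dt.to_period("M").reindex(summary.index)

# Customers per acquisition month and CLV segment; plotting this table
# shows whether high-value customers are being acquired over time.
cohort_counts = summary.groupby(["cohort", "clv_segment"]).size().unstack("clv_segment")
print(cohort_counts)
```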

Future analysis

In a next post we will explore our CLV segments in more detail to assess their basket size, product preferences, and demographic profile. Furthermore, we can assess the effectiveness of the retailer's strategy in terms of promotions and coupons.

References

Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee. 2005. "RFM and CLV: Using Iso-Value Curves for Customer Base Analysis." Journal of Marketing Research 42 (4): 415–30.

Kim, Baek Jung, Vishal Singh, and Russell S. Winer. 2017. "The Pareto Rule for Frequently Purchased Packaged Goods: An Empirical Generalization." Marketing Letters 28 (4): 491–507.

McCarthy, Daniel M., and Russell S. Winer. 2019. "The Pareto Rule in Marketing Revisited: Is It 80/20 or 70/20?" Marketing Letters 30 (2): 139–50.