# 12 Customer Lifetime Value (CLV)

also known as Lifetime Value (LTV)

“A year’s projected business gives a number that is normally about half of a customer’s full lifetime value, although that can vary by industry.”

CLV model

$CLV_i = \sum_{j = T+1}^{T+N} \frac{p(Buy_{ij}=1)\times \hat{CM}_{ij}}{(1+r)^{j-T}} - \frac{\hat{MC}_{ij}}{(1-r)^{j-T}}$

where

• $$CLV_i$$ = lifetime value for customer i,
• $$p(Buy_{ij})$$ = predicted probability that customer i will purchase in period j,
• $$CM_{ij}$$ = predicted contribution margin provided by customer i in period j,
• $$MC_{ij}$$ = predicted marketing costs directed toward customer i in period j,
• $$j$$ = index for time periods (semiannual or annual)
• $$T$$ = the end of the calibration or observation time frame (usually 1 year projection),
• $$N$$ = total number of prediction periods,
• $$r$$ = semiannual discount factor (e.g., 0.07238 = 15% annual rate)

## 12.1 Example

This example is subscription customers by Sergey Bryl’ in which he adapted model Fader and Hardie (2007)

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tibble' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::lag()    masks stats::lag()
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
##     smiths
library(MLmetrics)
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
##     Recall
# retention rate data
df_ret <- data.frame(month_lt = c(0:7),
case01 = c(1, .531, .452, .423, .394, .375, .356, .346),
case02 = c(1, .869, .743, .653, .593, .551, .517, .491),
case03 = c(1, .677, .562, .486, .412, .359, .332, .310),
case04 = c(1, .631, .468, .382, .326, .289, .262, .241)
) %>%
melt(., id.vars = c('month_lt'), variable.name = 'example', value.name = 'retention_rate')

ggplot(df_ret, aes(x = month_lt, y = retention_rate, group = example, color = example)) +
theme_minimal() +
facet_wrap(~ example) +
scale_color_manual(values = c('#4e79a7', '#f28e2b', '#e15759', '#76b7b2')) +
geom_line() +
geom_point() +
theme(plot.title = element_text(size = 20, face = 'bold', vjust = 2, hjust = 0.5),
axis.text.x = element_text(size = 8, hjust = 0.5, vjust = .5, face = 'plain'),
strip.text = element_text(face = 'bold', size = 12)) +
ggtitle('Retention Rate') Prediciton when we only have values on certain months

# functions for sBG distribution
churnBG <- Vectorize(function(alpha, beta, period) {
t1 = alpha / (alpha + beta)
result = t1
if (period > 1) {
result = churnBG(alpha, beta, period - 1) * (beta + period - 2) / (alpha + beta + period - 1)
}
return(result)
}, vectorize.args = c("period"))

survivalBG <- Vectorize(function(alpha, beta, period) {
t1 = 1 - churnBG(alpha, beta, 1)
result = t1
if(period > 1){
result = survivalBG(alpha, beta, period - 1) - churnBG(alpha, beta, period)
}
return(result)
}, vectorize.args = c("period"))

MLL <- function(alphabeta) {
if(length(activeCust) != length(lostCust)) {
stop("Variables activeCust and lostCust have different lengths: ",
length(activeCust), " and ", length(lostCust), ".")
}
t = length(activeCust) # number of periods
alpha = alphabeta
beta = alphabeta
return(-as.numeric(
sum(lostCust * log(churnBG(alpha, beta, 1:t))) +
activeCust[t]*log(survivalBG(alpha, beta, t))
))
}

df_ret <- df_ret %>%
group_by(example) %>%
mutate(activeCust = 1000 * retention_rate,
lostCust = lag(activeCust) - activeCust,
lostCust = ifelse(is.na(lostCust), 0, lostCust)) %>%
ungroup()

ret_preds01 <- vector('list', 7)
for (i in c(1:7)) {

df_ret_filt <- df_ret %>%
filter(between(month_lt, 1, i) == TRUE & example == 'case01')

activeCust <- c(df_ret_filt$activeCust) lostCust <- c(df_ret_filt$lostCust)

opt <- optim(c(1, 1), MLL)
retention_pred <- round(c(1, survivalBG(alpha = opt$par, beta = opt$par, c(1:7))), 3)

df_pred <- data.frame(month_lt = c(0:7),
example = 'case01',
fact_months = i,
retention_pred = retention_pred)
ret_preds01[[i]] <- df_pred
}
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
## Warning in log(survivalBG(alpha, beta, t)): NaNs produced
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
## Warning in log(survivalBG(alpha, beta, t)): NaNs produced
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
## Warning in log(survivalBG(alpha, beta, t)): NaNs produced
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
## Warning in log(survivalBG(alpha, beta, t)): NaNs produced
ret_preds01 <- as.data.frame(do.call('rbind', ret_preds01))

ret_preds02 <- vector('list', 7)
for (i in c(1:7)) {

df_ret_filt <- df_ret %>%
filter(between(month_lt, 1, i) == TRUE & example == 'case02')

activeCust <- c(df_ret_filt$activeCust) lostCust <- c(df_ret_filt$lostCust)

opt <- optim(c(1, 1), MLL)
retention_pred <- round(c(1, survivalBG(alpha = opt$par, beta = opt$par, c(1:7))), 3)

df_pred <- data.frame(month_lt = c(0:7),
example = 'case02',
fact_months = i,
retention_pred = retention_pred)
ret_preds02[[i]] <- df_pred
}
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced

## Warning in log(churnBG(alpha, beta, 1:t)): NaNs produced
ret_preds02 <- as.data.frame(do.call('rbind', ret_preds02))

ret_preds03 <- vector('list', 7)
for (i in c(1:7)) {

df_ret_filt <- df_ret %>%
filter(between(month_lt, 1, i) == TRUE & example == 'case03')

activeCust <- c(df_ret_filt$activeCust) lostCust <- c(df_ret_filt$lostCust)

opt <- optim(c(1, 1), MLL)
retention_pred <- round(c(1, survivalBG(alpha = opt$par, beta = opt$par, c(1:7))), 3)

df_pred <- data.frame(month_lt = c(0:7),
example = 'case03',
fact_months = i,
retention_pred = retention_pred)
ret_preds03[[i]] <- df_pred
}

ret_preds03 <- as.data.frame(do.call('rbind', ret_preds03))

ret_preds04 <- vector('list', 7)
for (i in c(1:7)) {

df_ret_filt <- df_ret %>%
filter(between(month_lt, 1, i) == TRUE & example == 'case04')

activeCust <- c(df_ret_filt$activeCust) lostCust <- c(df_ret_filt$lostCust)

opt <- optim(c(1, 1), MLL)
retention_pred <- round(c(1, survivalBG(alpha = opt$par, beta = opt$par, c(1:7))), 3)

df_pred <- data.frame(month_lt = c(0:7),
example = 'case04',
fact_months = i,
retention_pred = retention_pred)
ret_preds04[[i]] <- df_pred
}

ret_preds04 <- as.data.frame(do.call('rbind', ret_preds04))

ret_preds <- bind_rows(ret_preds01, ret_preds02, ret_preds03, ret_preds04)

df_ret_all <- df_ret %>%
select(month_lt, example, retention_rate) %>%
left_join(., ret_preds, by = c('month_lt', 'example'))

ggplot(df_ret_all, aes(x = month_lt, y = retention_rate, group = example, color = example)) +
theme_minimal() +
facet_wrap(~ example) +
scale_color_manual(values = c('#4e79a7', '#f28e2b', '#e15759', '#76b7b2')) +
geom_line(size = 1.5) +
geom_point(size = 1.5) +
geom_line(aes(y = retention_pred, group = fact_months), alpha = 0.5) +
theme(plot.title = element_text(size = 20, face = 'bold', vjust = 2, hjust = 0.5),
axis.text.x = element_text(size = 8, hjust = 0.5, vjust = .5, face = 'plain'),
strip.text = element_text(face = 'bold', size = 12)) +
ggtitle('Retention Rate Projections') calculate the average LTV for case03 based on two historical months with a forecast horizon of 24 months and a subscription price of $1 ### LTV prediction ### df_ltv_03 <- df_ret %>% filter(between(month_lt, 1, 2) == TRUE & example == 'case03') activeCust <- c(df_ltv_03$activeCust)
lostCust <- c(df_ltv_03$lostCust) opt <- optim(c(1, 1), MLL) retention_pred <- round(c(survivalBG(alpha = opt$par, beta = opt$par, c(3:24))), 3) df_pred <- data.frame(month_lt = c(3:24), retention_pred = retention_pred) df_ltv_03 <- df_ret %>% filter(between(month_lt, 0, 2) == TRUE & example == 'case03') %>% select(month_lt, retention_rate) %>% bind_rows(., df_pred) %>% mutate(retention_rate_calc = ifelse(is.na(retention_rate), retention_pred, retention_rate), ltv_monthly = retention_rate_calc * 1, ltv_cum = round(cumsum(ltv_monthly), 2)) # average LTV of$9.33. actual data for the observed periods (from 0 to 2nd months) and the predicted retention for the future periods (from 3rd to 24th months)

## 12.2 Referral value (CRV)

“Referrals made by customers after a referral-incentive marketing campaign can be attributed to that campaign for about a year.” . Hence, we usually count those referrals made after one year of a campaign.

2 types of referral:

• Type-one referral: Would not join without referral.
• Type-two referral: still join without referral.

estimated that each customer made about half type-one and half type-two referrals.

Referral value = PV(Type-one referrals) + PV(Type-two referrals)

where

• PV(type-two referrals) = the prevent value of the savings in acquisition costs.

$CRV_i = \sum_{t =1}^{T} \sum_{y = 1}^{n1} \frac{A_{ty} - a_{ty} - M_{ty} + ACQ1_{ty}}{(1+r)^t} + \sum_{t=1}^{T} \sum_{y = n1+1}^{n2} \frac{ACQ2_{ty}}{(1+r)^t}$

where

• T = the number of periods that will be predicted into the future (e.g., typically 1 year)
• $$A_{ty}$$ = the gross margin contributed by customer y who other otherwise would not have bought the product,
• $$a_{ty}$$ = the cost of the referral for the customer y (this is what you set up)
• $$n1$$ = the number of customers who would not join without the referral,
• $$n2-n1$$ = the number of customers who would have joined anyway,
• $$M_{ty}$$ = the marketing costs needed to retain the referred customers,
• r = semiannual discount factor ( used .07238 = 15% annual rate)
• $$ACQ1_{ty}$$ = the savings in acquisition costs from customers who would not join without the referral,
• $$ACQ2_{ty}$$ = the savings in acquisition cost from customers who would have joined anyway.

Customers with high CLV does not guarantee high CRV . Hence, proposed the customer value matrix 