22.1 Attribution Models

22.1.1 Ordered Shapley

Based on (Zhao, Mahboobi, and Bagheri 2018) (to access paper ) Cooperative game theory: look at the marginal contribution of each player in the game, where Shapley value (i..e, the credit assigned to each individual) is the expected value value of the marginal contribution over all possible permutations (e.g., all possible sequences) of the players.

Shapely value considered:

marginal contribution of each player (i.e., channel)
sequence of joining the coalition (i.e., customer journey).

Typically, we can’t apply the Shapley Value method due to computational burden (you need all possible permutations). And a drawback is that all the credit must be divided among your channels, if you have missing channels, then it will distort the estimates of other channels’ estimates.

It’s hard to use Shapley value model for model comparison since we have no “ground truth”

Marketing application:

Pardot Einstein features and more

library("GameTheory")

packages reference

22.1.2 Markov Model

Markov chains maps the movement and gives a probability distribution, for moving from one state to another state. A Markov Chain has three properties:

State space – set of all the states in which process could potentially exist
Transition operator –the probability of moving from one state to other state
Current state probability distribution – probability distribution of being in any one of the states at the start of the process

In mathematically sense

$w_{ij}= P(X_t = s_j|X_{t-1}=s_i),0 \le w_{ij} \le 1, \sum_{j=1}^N w_{ij} =1 \forall i$

where

The Transition Probability ( $w_{ij}$ ) = The Probability of the Previous State ( $X_{t-1}$ ) Given the Current State ( $X_t$ )
The Transition Probability ( $w_{ij}$ ) is No Less Than 0 and No Greater Than 1
The Sum of the Transition Probabilities Equals 1 (i.e., Everyone Must Go Somewhere)

To examine a particular node in the Markov graph, we use removal effect ( $s_i$ ) to see its contribution to a conversion. In another word, the Removal Effect is the probability of converting when a step is completely removed; all sequences that had to go through that step are now sent directly to the null node

Each node is called transition states
The probability of moving from one channel to another channel is called transition probability.

first-order or “memory-free” Markov graph is called “memory-free” because the probability of reaching one state depends only on the previous state visited.

Order 0: Do not care about where the you came from or what step the you are on, only the probability of going to any state.
Order 1: Looks back zero steps. You are currently at a state. The probability of going anywhere is based on being at that step.
Order 2: Looks back one step. You came from state A and are currently at state B. The probability of going anywhere is based on where you were and where you are.
Order 3: Looks back two steps. You came from state A after state B and are currently at state C. The probability of going anywhere is based on where you were and where you are.
Order 4: Looks back three steps. You came from state A after B after C and are currently at state D. The probability of going anywhere is based on where you were and where you are.

22.1.2.1 Example 1

This section is by Analytics Vidhya

data link

# #Install the libraries
# install.packages("ChannelAttribution")
# install.packages("ggplot2")
# install.packages("reshape")
# install.packages("dplyr")
# install.packages("plyr")
# install.packages("reshape2")
# install.packages("markovchain")
# install.packages("plotly")

#Load the libraries
library("ChannelAttribution")
library("ggplot2")
library("reshape")
library("dplyr")
library("plyr")
library("reshape2")
# library("markovchain")
library("plotly")

#Read the data into R
channel = read.csv("images/Channel_attribution.csv", header = T) %>% 
    select(-c(Output))
head(channel, n = 2)

##   R05A.01 R05A.02 R05A.03 R05A.04 R05A.05 R05A.06 R05A.07 R05A.08 R05A.09
## 1      16       4       3       5      10       8       6       8      13
## 2       2       1       9      10       1       4       3      21      NA
##   R05A.10 R05A.11 R05A.12 R05A.13 R05A.14 R05A.15 R05A.16 R05A.17 R05A.18
## 1      20      21      NA      NA      NA      NA      NA      NA      NA
## 2      NA      NA      NA      NA      NA      NA      NA      NA      NA
##   R05A.19 R05A.20
## 1      NA      NA
## 2      NA      NA

The number represents:

1-19 are various channels
20 – customer has decided which device to buy;
21 – customer has made the final purchase, and;
22 – customer hasn’t decided yet.

Pre-processing

for (row in 1:nrow(channel)){
    if (21 %in% channel[row,]){
        channel$convert = 1
    }
}

column = colnames(channel)
channel$path = do.call(paste, c(channel, sep = " > "))
head(channel$path)

## [1] "16 > 4 > 3 > 5 > 10 > 8 > 6 > 8 > 13 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"     
## [2] "2 > 1 > 9 > 10 > 1 > 4 > 3 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"     
## [3] "9 > 13 > 20 > 16 > 15 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"
## [4] "8 > 15 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"
## [5] "16 > 9 > 13 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"
## [6] "1 > 11 > 8 > 4 > 9 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"

for(row in 1:nrow(channel)){
  channel$path[row] = strsplit(channel$path[row], " > 21")[[1]][1]
}
channel_fin = channel[,c(22,21)]
channel_fin = ddply(channel_fin,~path,summarise, conversion= sum(convert))
head(channel_fin)

##                           path conversion
## 1               1 > 1 > 1 > 20          1
## 2              1 > 1 > 12 > 12          1
## 3    1 > 1 > 14 > 13 > 12 > 20          1
## 4      1 > 1 > 3 > 13 > 3 > 20          1
## 5          1 > 1 > 3 > 17 > 17          1
## 6 1 > 1 > 6 > 1 > 12 > 20 > 12          1

Data = channel_fin
head(Data)

##                           path conversion
## 1               1 > 1 > 1 > 20          1
## 2              1 > 1 > 12 > 12          1
## 3    1 > 1 > 14 > 13 > 12 > 20          1
## 4      1 > 1 > 3 > 13 > 3 > 20          1
## 5          1 > 1 > 3 > 17 > 17          1
## 6 1 > 1 > 6 > 1 > 12 > 20 > 12          1

heuristic model

H <- heuristic_models(Data, 'path', 'conversion', var_value='conversion')

## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

##    channel_name first_touch_conversions first_touch_value
## 1             1                     130               130
## 2            20                       0                 0
## 3            12                      75                75
## 4            14                      34                34
## 5            13                     320               320
## 6             3                     168               168
## 7            17                      31                31
## 8             6                      50                50
## 9             8                      56                56
## 10           10                     547               547
## 11           11                      66                66
## 12           16                     111               111
## 13            2                     199               199
## 14            4                     231               231
## 15            7                      26                26
## 16            5                      62                62
## 17            9                     250               250
## 18           15                      22                22
## 19           18                       4                 4
## 20           19                      10                10
##    last_touch_conversions last_touch_value linear_touch_conversions
## 1                      18               18                73.773661
## 2                    1701             1701               473.998171
## 3                      23               23                76.127863
## 4                      25               25                56.335744
## 5                      76               76               204.039552
## 6                      21               21               117.609677
## 7                      47               47                76.583847
## 8                      20               20                54.707124
## 9                      17               17                53.677862
## 10                     42               42               211.822393
## 11                     33               33               107.109048
## 12                     95               95               156.049086
## 13                     18               18                94.111668
## 14                     88               88               250.784033
## 15                     15               15                33.435991
## 16                     23               23                74.900402
## 17                     71               71               194.071690
## 18                     47               47                65.159225
## 19                      2                2                 5.026587
## 20                     10               10                12.676375
##    linear_touch_value
## 1           73.773661
## 2          473.998171
## 3           76.127863
## 4           56.335744
## 5          204.039552
## 6          117.609677
## 7           76.583847
## 8           54.707124
## 9           53.677862
## 10         211.822393
## 11         107.109048
## 12         156.049086
## 13          94.111668
## 14         250.784033
## 15          33.435991
## 16          74.900402
## 17         194.071690
## 18          65.159225
## 19           5.026587
## 20          12.676375

First Touch Conversion: credit is given to the first touch point.
Last Touch Conversion: credit is given to the last touch point.
Linear Touch Conversion: All channels/touch points are given equal credit in the conversion.

Markov model

M <- markov_model(Data, 'path', 'conversion', var_value='conversion', order = 1)

## 
## Number of simulations: 100000 - Convergence reached: 2.05% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (17) is reached: 99.40%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

##    channel_name total_conversion total_conversion_value
## 1             1        82.805970              82.805970
## 2            20       439.582090             439.582090
## 3            12        81.253731              81.253731
## 4            14        64.238806              64.238806
## 5            13       197.791045             197.791045
## 6             3       122.328358             122.328358
## 7            17        86.985075              86.985075
## 8             6        58.985075              58.985075
## 9             8        60.656716              60.656716
## 10           10       209.850746             209.850746
## 11           11       115.402985             115.402985
## 12           16       159.820896             159.820896
## 13            2        97.074627              97.074627
## 14            4       222.149254             222.149254
## 15            7        40.597015              40.597015
## 16            5        80.537313              80.537313
## 17            9       178.865672             178.865672
## 18           15        72.358209              72.358209
## 19           18         6.567164               6.567164
## 20           19        14.149254              14.149254

combine the two models

# Merges the two data frames on the "channel_name" column.
R <- merge(H, M, by='channel_name')

# Select only relevant columns
R1 <- R[, (colnames(R) %in% c('channel_name', 'first_touch_conversions', 'last_touch_conversions', 'linear_touch_conversions', 'total_conversion'))]

# Transforms the dataset into a data frame that ggplot2 can use to plot the outcomes
R1 <- melt(R1, id='channel_name')

# Plot the total conversions
ggplot(R1, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL CONVERSIONS') +
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

and then check the final results.

22.1.2.2 Example 2

Example code by Sergey Bryl’

library(dplyr)
library(reshape2)
library(ggplot2)
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
library(ChannelAttribution)
library(markovchain)
 
##### simple example #####
# creating a data sample
df1 <- data.frame(path = c('c1 > c2 > c3', 'c1', 'c2 > c3'), conv = c(1, 0, 0), conv_null = c(0, 1, 1))
 
# calculating the model
mod1 <- markov_model(df1,
                    var_path = 'path',
                    var_conv = 'conv',
                    var_null = 'conv_null',
                    out_more = TRUE)
 
# extracting the results of attribution
df_res1 <- mod1$result
 
# extracting a transition matrix
df_trans1 <- mod1$transition_matrix
df_trans1 <- dcast(df_trans1, channel_from ~ channel_to, value.var = 'transition_probability')
 
### plotting the Markov graph ###
df_trans <- mod1$transition_matrix
 
# adding dummies in order to plot the graph
df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
                       channel_to = c('(start)', '(conversion)', '(null)'),
                       transition_probability = c(0, 1, 1))
df_trans <- rbind(df_trans, df_dummy)
 
# ordering channels
df_trans$channel_from <- factor(df_trans$channel_from,levels = c('(start)','(conversion)', '(null)', 'c1', 'c2', 'c3'))
df_trans$channel_to <- factor(df_trans$channel_to,levels = c('(start)', '(conversion)', '(null)', 'c1', 'c2', 'c3'))
df_trans <- dcast(df_trans, channel_from ~ channel_to, value.var ='transition_probability')
 
# creating the markovchain object
trans_matrix <- matrix(data = as.matrix(df_trans[, -1]),nrow = nrow(df_trans[, -1]), ncol = ncol(df_trans[, -1]),dimnames = list(c(as.character(df_trans[,1])),c(colnames(df_trans[, -1]))))
trans_matrix[is.na(trans_matrix)] <- 0
# trans_matrix1 <- new("markovchain", transitionMatrix = trans_matrix)
# 
# # plotting the graph
# plot(trans_matrix1, edge.arrow.size = 0.35)

# simulating the "real" data
set.seed(354)
df2 <- data.frame(client_id = sample(c(1:1000), 5000, replace = TRUE),
                  date = sample(c(1:32), 5000, replace = TRUE),
                  channel = sample(c(0:9), 5000, replace = TRUE,
                                   prob = c(0.1, 0.15, 0.05, 0.07, 0.11, 0.07, 0.13, 0.1, 0.06, 0.16)))
df2$date <- as.Date(df2$date, origin = "2015-01-01")
df2$channel <- paste0('channel_', df2$channel)
 
# aggregating channels to the paths for each customer
df2 <- df2 %>%
        arrange(client_id, date) %>%
        group_by(client_id) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  # assume that all paths were finished with conversion
                  conv = 1,
                  conv_null = 0) %>%
        ungroup()
 
# calculating the models (Markov and heuristics)
mod2 <- markov_model(df2,
                     var_path = 'path',
                     var_conv = 'conv',
                     var_null = 'conv_null',
                     out_more = TRUE)

## 
## Number of simulations: 100000 - Convergence reached: 1.40% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (13) is reached: 95.98%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

# heuristic_models() function doesn't work for me, therefore I used the manual calculations
# instead of:
#h_mod2 <- heuristic_models(df2, var_path = 'path', var_conv = 'conv')
 
df_hm <- df2 %>%
        mutate(channel_name_ft = sub('>.*', '', path),
               channel_name_ft = sub(' ', '', channel_name_ft),
               channel_name_lt = sub('.*>', '', path),
               channel_name_lt = sub(' ', '', channel_name_lt))
# first-touch conversions
df_ft <- df_hm %>%
        group_by(channel_name_ft) %>%
        summarise(first_touch_conversions = sum(conv)) %>%
        ungroup()
# last-touch conversions
df_lt <- df_hm %>%
        group_by(channel_name_lt) %>%
        summarise(last_touch_conversions = sum(conv)) %>%
        ungroup()
 
h_mod2 <- merge(df_ft, df_lt, by.x = 'channel_name_ft', by.y = 'channel_name_lt')
 
# merging all models
all_models <- merge(h_mod2, mod2$result, by.x = 'channel_name_ft', by.y = 'channel_name')
colnames(all_models)[c(1, 4)] <- c('channel_name', 'attrib_model_conversions')

library("RColorBrewer")
library("ggthemes")
library("ggrepel")
############## visualizations ##############
# transition matrix heatmap for "real" data
df_plot_trans <- mod2$transition_matrix
 
cols <- c("#e7f0fa", "#c9e2f6", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e", "#eec73a",
          "#e29421", "#e29421", "#f05336", "#ce472e")
t <- max(df_plot_trans$transition_probability)
 
ggplot(df_plot_trans, aes(y = channel_from, x = channel_to, fill = transition_probability)) +
        theme_minimal() +
        geom_tile(colour = "white", width = .9, height = .9) +
        scale_fill_gradientn(colours = cols, limits = c(0, t),
                             breaks = seq(0, t, by = t/4),
                             labels = c("0", round(t/4*1, 2), round(t/4*2, 2), round(t/4*3, 2), round(t/4*4, 2)),
                             guide = guide_colourbar(ticks = T, nbin = 50, barheight = .5, label = T, barwidth = 10)) +
        geom_text(aes(label = round(transition_probability, 2)), fontface = "bold", size = 4) +
        theme(legend.position = 'bottom',
              legend.direction = "horizontal",
              panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8),
              axis.title.x = element_text(size = 24, face = "bold"),
              axis.title.y = element_text(size = 24, face = "bold"),
              axis.text.y = element_text(size = 8, face = "bold", color = 'black'),
              axis.text.x = element_text(size = 8, angle = 90, hjust = 0.5, vjust = 0.5, face = "plain")) +
        ggtitle("Transition matrix heatmap")

# models comparison
all_mod_plot <- reshape2::melt(all_models, id.vars = 'channel_name', variable.name = 'conv_type')
all_mod_plot$value <- round(all_mod_plot$value)
# slope chart
pal <- colorRampPalette(brewer.pal(10, "Set1"))
ggplot(all_mod_plot, aes(x = conv_type, y = value, group = channel_name)) +
        theme_solarized(base_size = 18, base_family = "", light = TRUE) +
        scale_color_manual(values = pal(10)) +
        scale_fill_manual(values = pal(10)) +
        geom_line(aes(color = channel_name), size = 2.5, alpha = 0.8) +
        geom_point(aes(color = channel_name), size = 5) +
        geom_label_repel(aes(label = paste0(channel_name, ': ', value), fill = factor(channel_name)),
                         alpha = 0.7,
                         fontface = 'bold', color = 'white', size = 5,
                         box.padding = unit(0.25, 'lines'), point.padding = unit(0.5, 'lines'),
                         max.iter = 100) +
        theme(legend.position = 'none',
              legend.title = element_text(size = 16, color = 'black'),
              legend.text = element_text(size = 16, vjust = 2, color = 'black'),
              plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8),
              axis.title.x = element_text(size = 24, face = "bold"),
              axis.title.y = element_text(size = 16, face = "bold"),
              axis.text.x = element_text(size = 16, face = "bold", color = 'black'),
              axis.text.y = element_blank(),
              axis.ticks.x = element_blank(),
              axis.ticks.y = element_blank(),
              panel.border = element_blank(),
              panel.grid.major = element_line(colour = "grey", linetype = "dotted"),
              panel.grid.minor = element_blank(),
              strip.text = element_text(size = 16, hjust = 0.5, vjust = 0.5, face = "bold", color = 'black'),
              strip.background = element_rect(fill = "#f0b35f")) +
        labs(x = 'Model', y = 'Conversions') +
        ggtitle('Models comparison') +
        guides(colour = guide_legend(override.aes = list(size = 4)))

Additional concerns:

library(tidyverse)
library(reshape2)
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
library(ChannelAttribution)
# library(markovchain)
library(visNetwork)
library(expm)
library(stringr)
library(purrr)
library(purrrlyr)
 
 
##### simulating the "real" data #####
set.seed(454)
df_raw <- data.frame(customer_id = paste0('id', sample(c(1:20000), replace = TRUE)), date = as.Date(rbeta(80000, 0.7, 10) * 100, origin = "2016-01-01"), channel = paste0('channel_', sample(c(0:7), 80000, replace = TRUE, prob = c(0.2, 0.12, 0.03, 0.07, 0.15, 0.25, 0.1, 0.08))) ) %>%
        group_by(customer_id) %>%
        mutate(conversion = sample(c(0, 1), n(), prob = c(0.975, 0.025), replace = TRUE)) %>%
        ungroup() %>%
        dmap_at(c(1, 3), as.character) %>%
        arrange(customer_id, date)
 
df_raw <- df_raw %>%
        mutate(channel = ifelse(channel == 'channel_2', NA, channel))
head(df_raw, n = 2)

## # A tibble: 2 × 4
##   customer_id date       channel   conversion
##   <chr>       <date>     <chr>          <dbl>
## 1 id1         2016-01-02 channel_7          0
## 2 id1         2016-01-09 channel_4          0

22.1.2.2.1 1. Customers will be at different stage of purchase journey after each conversion.

First-time buyer’s journey will look different from n-times buyer’s (e.g., he will not start at website )

You can create your own code to split data into customers in different stages.

##### splitting paths #####
df_paths <- df_raw %>%
        group_by(customer_id) %>%
        mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>% # add the path's serial number by using the lagged cumulative sum of conversion binary marks
        ungroup()
head(df_paths)

## # A tibble: 6 × 5
##   customer_id date       channel   conversion path_no
##   <chr>       <date>     <chr>          <dbl>   <dbl>
## 1 id1         2016-01-02 channel_7          0       1
## 2 id1         2016-01-09 channel_4          0       1
## 3 id1         2016-01-18 channel_5          1       1
## 4 id1         2016-01-20 channel_4          1       2
## 5 id100       2016-01-01 channel_0          0       1
## 6 id100       2016-01-01 channel_0          0       1

attribution path for first-time buyers:

df_paths_1 <- df_paths %>%
        filter(path_no == 1) %>%
        select(-path_no)

22.1.2.2.2 2. Handle missing data

We might have missing data on the channel or do not want to attribute a path (e.g., Direct Channel). We can either

Remove NA/Channel
Use the previous channel in its place.

In the first-order Markov chains, the results are unchanged because duplicated channels don’t affect the calculation.

##### replace some channels #####
df_path_1_clean <- df_paths_1 %>%
        # removing NAs
        filter(!is.na(channel)) %>%
         
        # adding order of channels in the path
        group_by(customer_id) %>%
        mutate(ord = c(1:n()),
               is_non_direct = ifelse(channel == 'channel_6', 0, 1),
               is_non_direct_cum = cumsum(is_non_direct)) %>%
         
        # removing Direct (channel_6) when it is the first in the path
        filter(is_non_direct_cum != 0) %>%
         
        # replacing Direct (channel_6) with the previous touch point
        mutate(channel = ifelse(channel == 'channel_6', channel[which(channel != 'channel_6')][is_non_direct_cum], channel)) %>%
         
        ungroup() %>%
        select(-ord, -is_non_direct, -is_non_direct_cum)

22.1.2.2.3 3. one vs. multi-channel paths

We need to calculate the weighted importance for each channel because the sum of the Removal Effects doesn’t equal to 1. In case we have a path with a unique channel, the Removal Effect and importance of this channel for that exact path is 1. However, weighting with other multi-channel paths will decrease the importance of one-channel occurrences. That means that, in case we have a channel that occurs in one-channel paths, usually it will be underestimated if attributed with multi-channel paths.

There is also a pretty straight logic behind splitting – for one-channel paths, we definitely know the channel that brought a conversion and we don’t need to distribute that value into other channels.

To account for one-channel path:

Split data for paths with one or more unique channels
Calculate total conversions for one-channel paths and compute the Markov model for multi-channel paths
Summarize results for each channel.

##### one- and multi-channel paths #####
df_path_1_clean <- df_path_1_clean %>%
        group_by(customer_id) %>%
        mutate(uniq_channel_tag = ifelse(length(unique(channel)) == 1, TRUE, FALSE)) %>%
        ungroup()
 
df_path_1_clean_uniq <- df_path_1_clean %>%
        filter(uniq_channel_tag == TRUE) %>%
        select(-uniq_channel_tag)
 
df_path_1_clean_multi <- df_path_1_clean %>%
        filter(uniq_channel_tag == FALSE) %>%
        select(-uniq_channel_tag)
 
### experiment ###
# attribution model for all paths
df_all_paths <- df_path_1_clean %>%
        group_by(customer_id) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  conversion = sum(conversion)) %>%
        ungroup() %>%
        filter(conversion == 1)
 
mod_attrib <- markov_model(df_all_paths,
                           var_path = 'path',
                           var_conv = 'conversion',
                           out_more = TRUE)

## 
## Number of simulations: 100000 - Convergence reached: 1.28% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (19) is reached: 99.92%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

mod_attrib$removal_effects

##   channel_name removal_effects
## 1    channel_7       0.2812250
## 2    channel_4       0.4284428
## 3    channel_5       0.6056845
## 4    channel_0       0.5367294
## 5    channel_1       0.3820056
## 6    channel_3       0.2535028

mod_attrib$result

##   channel_name total_conversions
## 1    channel_7          192.8653
## 2    channel_4          293.8279
## 3    channel_5          415.3811
## 4    channel_0          368.0913
## 5    channel_1          261.9811
## 6    channel_3          173.8533

d_all <- data.frame(mod_attrib$result)
 
# attribution model for splitted multi and unique channel paths
df_multi_paths <- df_path_1_clean_multi %>%
        group_by(customer_id) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  conversion = sum(conversion)) %>%
        ungroup() %>%
        filter(conversion == 1)
 
mod_attrib_alt <- markov_model(df_multi_paths,
                           var_path = 'path',
                           var_conv = 'conversion',
                           out_more = TRUE)

## 
## Number of simulations: 100000 - Convergence reached: 1.21% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (19) is reached: 99.59%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

mod_attrib_alt$removal_effects

##   channel_name removal_effects
## 1    channel_7       0.3265696
## 2    channel_4       0.4844802
## 3    channel_5       0.6526369
## 4    channel_0       0.5814164
## 5    channel_1       0.4343546
## 6    channel_3       0.2898041

mod_attrib_alt$result

##   channel_name total_conversions
## 1    channel_7          150.9460
## 2    channel_4          223.9350
## 3    channel_5          301.6599
## 4    channel_0          268.7406
## 5    channel_1          200.7661
## 6    channel_3          133.9524

# adding unique paths
df_uniq_paths <- df_path_1_clean_uniq %>%
        filter(conversion == 1) %>%
        group_by(channel) %>%
        summarise(conversions = sum(conversion)) %>%
        ungroup()
 
d_multi <- data.frame(mod_attrib_alt$result)
 
d_split <- full_join(d_multi, df_uniq_paths, by = c('channel_name' = 'channel')) %>%
        mutate(result = total_conversions + conversions)
 
sum(d_all$total_conversions)

## [1] 1706

sum(d_split$result)

## [1] 1706

22.1.2.2.4 4. Higher Order Markov Chains

Since the transition matrix stays the same in the first order Markov, having duplicates will not affect the result. But starting from the second order order Markov, you will have different results when skipping duplicates. In order to check the effect of skipping duplicates in the first-order Markov chain, we will use my script for “manual” calculation because the package skips duplicates automatically.

##### Higher order of Markov chains and consequent duplicated channels in the path #####
 
# computing transition matrix - 'manual' way
df_multi_paths_m <- df_multi_paths %>%
        mutate(path = paste0('(start) > ', path, ' > (conversion)'))
m <- max(str_count(df_multi_paths_m$path, '>')) + 1 # maximum path length
 
df_multi_paths_cols <- reshape2::colsplit(string = df_multi_paths_m$path, pattern = ' > ', names = c(1:m))
colnames(df_multi_paths_cols) <- paste0('ord_', c(1:m))
df_multi_paths_cols[df_multi_paths_cols == ''] <- NA
 
df_res <- vector('list', ncol(df_multi_paths_cols) - 1)
 
for (i in c(1:(ncol(df_multi_paths_cols) - 1))) {
         
        df_cache <- df_multi_paths_cols %>%
                select(num_range("ord_", c(i, i+1))) %>%
                na.omit() %>%
                group_by_(.dot = c(paste0("ord_", c(i, i+1)))) %>%
                summarise(n = n()) %>%
                ungroup()
         
        colnames(df_cache)[c(1, 2)] <- c('channel_from', 'channel_to')
        df_res[[i]] <- df_cache
}
 
df_res <- do.call('rbind', df_res)
 
df_res_tot <- df_res %>%
        group_by(channel_from, channel_to) %>%
        summarise(n = sum(n)) %>%
        ungroup() %>%
        group_by(channel_from) %>%
        mutate(tot_n = sum(n),
               perc = n / tot_n) %>%
        ungroup()
 
df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
                       channel_to = c('(start)', '(conversion)', '(null)'),
                       n = c(0, 0, 0),
                       tot_n = c(0, 0, 0),
                       perc = c(0, 1, 1))
 
df_res_tot <- rbind(df_res_tot, df_dummy)
 
# comparing transition matrices
trans_matrix_prob_m <- dcast(df_res_tot, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum)
trans_matrix_prob <- data.frame(mod_attrib_alt$transition_matrix)
trans_matrix_prob <- dcast(trans_matrix_prob, channel_from ~ channel_to, value.var = 'transition_probability')
 
# computing attribution - 'manual' way
channels_list <- df_path_1_clean_multi %>%
        filter(conversion == 1) %>%
        distinct(channel)
channels_list <- c(channels_list$channel)
 
df_res_ini <- df_res_tot %>% select(channel_from, channel_to)
df_attrib <- vector('list', length(channels_list))
 
for (i in c(1:length(channels_list))) {
         
        channel <- channels_list[i]
         
        df_res1 <- df_res %>%
                mutate(channel_from = ifelse(channel_from == channel, NA, channel_from),
                       channel_to = ifelse(channel_to == channel, '(null)', channel_to)) %>%
                na.omit()
         
        df_res_tot1 <- df_res1 %>%
                group_by(channel_from, channel_to) %>%
                summarise(n = sum(n)) %>%
                ungroup() %>%
                 
                group_by(channel_from) %>%
                mutate(tot_n = sum(n),
                       perc = n / tot_n) %>%
                ungroup()
         
        df_res_tot1 <- rbind(df_res_tot1, df_dummy) # adding (start), (conversion) and (null) states
         
        df_res_tot1 <- left_join(df_res_ini, df_res_tot1, by = c('channel_from', 'channel_to'))
        df_res_tot1[is.na(df_res_tot1)] <- 0
         
        df_trans1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum)
         
        trans_matrix_1 <- df_trans1
        rownames(trans_matrix_1) <- trans_matrix_1$channel_from
        trans_matrix_1 <- as.matrix(trans_matrix_1[, -1])
         
        inist_n1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'n', fun.aggregate = sum)
        rownames(inist_n1) <- inist_n1$channel_from
        inist_n1 <- as.matrix(inist_n1[, -1])
        inist_n1[is.na(inist_n1)] <- 0
        inist_n1 <- inist_n1['(start)', ]
         
        res_num1 <- inist_n1 %*% (trans_matrix_1 %^% 100000)
         
        df_cache <- data.frame(channel_name = channel,
                               conversions = as.numeric(res_num1[1, 1]))
         
        df_attrib[[i]] <- df_cache
}
 
df_attrib <- do.call('rbind', df_attrib)
 
# computing removal effect and results
tot_conv <- sum(df_multi_paths_m$conversion)
 
df_attrib <- df_attrib %>%
        mutate(tot_conversions = sum(df_multi_paths_m$conversion),
               impact = (tot_conversions - conversions) / tot_conversions,
               tot_impact = sum(impact),
               weighted_impact = impact / tot_impact,
               attrib_model_conversions = round(tot_conversions * weighted_impact)
        ) %>%
        select(channel_name, attrib_model_conversions)

Since with different transition matrices, the removal effects and attribution results stay the same, in practice we skip duplicates.

22.1.2.2.5 5. Non-conversion paths

We incorporate null paths in this analysis.

##### Generic Probabilistic Model #####
df_all_paths_compl <- df_path_1_clean %>%
        group_by(customer_id) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  conversion = sum(conversion)) %>%
        ungroup() %>%
        mutate(null_conversion = ifelse(conversion == 1, 0, 1))
 
mod_attrib_complete <- markov_model(
        df_all_paths_compl,
        var_path = 'path',
        var_conv = 'conversion',
        var_null = 'null_conversion',
        out_more = TRUE
)

## 
## Number of simulations: 100000 - Convergence reached: 4.05% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (27) is reached: 99.91%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

trans_matrix_prob <- mod_attrib_complete$transition_matrix %>%
        dmap_at(c(1, 2), as.character)
 
##### viz #####
edges <-
        data.frame(
                from = trans_matrix_prob$channel_from,
                to = trans_matrix_prob$channel_to,
                label = round(trans_matrix_prob$transition_probability, 2),
                font.size = trans_matrix_prob$transition_probability * 100,
                width = trans_matrix_prob$transition_probability * 15,
                shadow = TRUE,
                arrows = "to",
                color = list(color = "#95cbee", highlight = "red")
        )
 
nodes <- data_frame(id = c( c(trans_matrix_prob$channel_from), c(trans_matrix_prob$channel_to) )) %>%
        distinct(id) %>%
        arrange(id) %>%
        mutate(
                label = id,
                color = ifelse(
                        label %in% c('(start)', '(conversion)'),
                        '#4ab04a',
                        ifelse(label == '(null)', '#ce472e', '#ffd73e')
                ),
                shadow = TRUE,
                shape = "box"
        )


visNetwork(nodes,
           edges,
           height = "2000px",
           width = "100%",
           main = "Generic Probabilistic model's Transition Matrix") %>%
        visIgraphLayout(randomSeed = 123) %>%
        visNodes(size = 5) %>%
        visOptions(highlightNearest = TRUE)

##### modeling states and conversions #####
# transition matrix preprocessing
trans_matrix_complete <- mod_attrib_complete$transition_matrix
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy %>%
                                       mutate(transition_probability = perc) %>%
                                       select(channel_from, channel_to, transition_probability))
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to, levels = c(levels(trans_matrix_complete$channel_from)))
trans_matrix_complete <- dcast(trans_matrix_complete, channel_from ~ channel_to, value.var = 'transition_probability')
trans_matrix_complete[is.na(trans_matrix_complete)] <- 0
rownames(trans_matrix_complete) <- trans_matrix_complete$channel_from
trans_matrix_complete <- as.matrix(trans_matrix_complete[, -1])
 
 
# creating empty matrix for modeling
model_mtrx <- matrix(data = 0,
                     nrow = nrow(trans_matrix_complete), ncol = 1,
                     dimnames = list(c(rownames(trans_matrix_complete)), '(start)'))
# adding modeling number of visits
model_mtrx['channel_5', ] <- 1000
 
c(model_mtrx) %*% (trans_matrix_complete %^% 5) # after 5 steps
c(model_mtrx) %*% (trans_matrix_complete %^% 100000) # after 100000 steps

22.1.2.2.6 6. Customer Journey Duration

##### Customer journey duration #####
# computing time lapses from the first contact to conversion/last contact
df_multi_paths_tl <- df_path_1_clean_multi %>%
        group_by(customer_id) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  first_touch_date = min(date),
                  last_touch_date = max(date),
                  tot_time_lapse = round(as.numeric(last_touch_date - first_touch_date)),
                  conversion = sum(conversion)) %>%
        ungroup()
 
# distribution plot
ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
        theme_minimal() +
        geom_histogram(fill = '#4e79a7', binwidth = 1)

# cumulative distribution plot
ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
        theme_minimal() +
        stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
        geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
        geom_vline(xintercept = 23, color = '#e15759', size = 1.5, linetype = 2)

### for generic probabilistic model ###
df_multi_paths_tl_1 <- reshape2::melt(df_multi_paths_tl[c(1:50), ] %>% select(customer_id, first_touch_date, last_touch_date, conversion),
                    id.vars = c('customer_id', 'conversion'),
                    value.name = 'touch_date') %>%
        arrange(customer_id)
rep_date <- as.Date('2016-01-10', format = '%Y-%m-%d')
 
ggplot(df_multi_paths_tl_1, aes(x = as.factor(customer_id), y = touch_date, color = factor(conversion), group = customer_id)) +
        theme_minimal() +
        coord_flip() +
        geom_point(size = 2) +
        geom_line(size = 0.5, color = 'darkgrey') +
        geom_hline(yintercept = as.numeric(rep_date), color = '#e15759', size = 2) +
        geom_rect(xmin = -Inf, xmax = Inf, ymin = as.numeric(rep_date), ymax = Inf, alpha = 0.01, color = 'white', fill = 'white') +
        theme(legend.position = 'bottom',
              panel.border = element_blank(),
              panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              axis.ticks.x = element_blank(),
              axis.ticks.y = element_blank()) +
        guides(colour = guide_legend(override.aes = list(size = 5)))

df_multi_paths_tl_2 <- df_path_1_clean_multi %>%
        group_by(customer_id) %>%
        mutate(prev_touch_date = lag(date)) %>%
        ungroup() %>%
        filter(conversion == 1) %>%
        mutate(prev_time_lapse = round(as.numeric(date - prev_touch_date)))
         
# distribution
ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) +
        theme_minimal() +
        geom_histogram(fill = '#4e79a7', binwidth = 1)

# cumulative distribution
ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) +
        theme_minimal() +
        stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
        geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
        geom_vline(xintercept = 12, color = '#e15759', size = 1.5, linetype = 2)

In conclusion, we say that if a customer made contact with a marketing channel the first time for more than 23 days and/or hasn’t made contact with a marketing channel for the last 12 days, then it is a fruitless path.

# extracting data for generic model
df_multi_paths_tl_3 <- df_path_1_clean_multi %>%
        group_by(customer_id) %>%
        mutate(prev_time_lapse = round(as.numeric(date - lag(date)))) %>%
        summarise(path = paste(channel, collapse = ' > '),
                  tot_time_lapse = round(as.numeric(max(date) - min(date))),
                  prev_touch_tl = prev_time_lapse[which(max(date) == date)],
                  conversion = sum(conversion)) %>%
        ungroup() %>%
        mutate(is_fruitless = ifelse(conversion == 0 & tot_time_lapse > 20 & prev_touch_tl > 10, TRUE, FALSE)) %>%
        filter(conversion == 1 | is_fruitless == TRUE)

22.1.2.2.7 7. Channel Comparisons

We can use markov_model with var_value to compare the gross margin among channels.

22.1.2.3 Example 3

This example is from Bounteus

# Install these libraries (only do this once)
# install.packages("ChannelAttribution")
# install.packages("reshape")
# install.packages("ggplot2")

# Load these libraries (every time you start RStudio)
library(ChannelAttribution)
library(reshape)
library(ggplot2)

# This loads the demo data. You can load your own data by importing a dataset or reading in a file
data(PathData)

Path Variable – The steps a user takes across sessions to comprise the sequences.
Conversion Variable – How many times a user converted.
Value Variable – The monetary value of each marketing channel.
Null Variable – How many times a user exited.

Build the simple heuristic models (First Click / first_touch, Last Click / last_touch, and Linear Attribution / linear_touch):

H <- heuristic_models(Data, 'path', 'total_conversions', var_value='total_conversion_value')

## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

Markov model

M <- markov_model(Data, 'path', 'total_conversions', var_value='total_conversion_value', order = 1)

## 
## Number of simulations: 100000 - Convergence reached: 1.46% < 5.00%
## 
## Percentage of simulated paths that successfully end before maximum number of steps (46) is reached: 99.99%
## 
## [1] "*** Looking to run more advanced attribution? Try ChannelAttribution Pro for free! Visit https://channelattribution.io/product"

# Merges the two data frames on the "channel_name" column.
R <- merge(H, M, by='channel_name') 

# Selects only relevant columns
R1 <- R[, (colnames(R)%in%c('channel_name', 'first_touch_conversions', 'last_touch_conversions', 'linear_touch_conversions', 'total_conversion'))]

# Renames the columns
colnames(R1) <- c('channel_name', 'first_touch', 'last_touch', 'linear_touch', 'markov_model') 

# Transforms the dataset into a data frame that ggplot2 can use to graph the outcomes
R1 <- melt(R1, id='channel_name')

Plot the total conversions

ggplot(R1, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL CONVERSIONS') + 
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

The “Total Conversions” bar chart shows you how many conversions were attributed to each channel (i.e. alpha, beta, etc.) for each method (i.e. first_touch, last_touch, etc.).

R2 <- R[, (colnames(R)%in%c('channel_name', 'first_touch_value', 'last_touch_value', 'linear_touch_value', 'total_conversion_value'))]

colnames(R2) <- c('channel_name', 'first_touch', 'last_touch', 'linear_touch', 'markov_model')

R2 <- melt(R2, id='channel_name')

ggplot(R2, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL VALUE') + 
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

The “Total Conversion Value” bar chart shows you monetary value that can be attributed to each channel from a conversion.

References

Zhao, Kaifeng, Seyed Hanif Mahboobi, and Saeed R Bagheri. 2018. “Revenue-Based Attribution Modeling for Online Advertising.” International Journal of Market Research 61 (2): 195–209. https://doi.org/10.1177/1470785318774447.