36.4 Advertising Response (Effectiveness)
Consumer response to advertising
Key issues
Does advertising work?
When, where, why and for how long?
5 effects of ad exposure
Short
Sleeper
Hysteresis
Long
Instant
Simple model of ad response
\[ S_t = \alpha + \beta A_t + \mu_t \]
- Does not capture the carryover effect
The (Koyck 1954) model captures carryover:
\[ S_t = \alpha + \beta A_t + \beta \lambda A_{t-1} + \beta \lambda^2 A_{t-2} + \dots + \epsilon_t \]
This is an infinite distributed-lag (infinite-order moving average) model in which the ad effect decays geometrically, capturing the carryover effect of advertising.
To estimate it, apply the Koyck transformation: lag the equation one period and multiply by \(\lambda\), the carryover rate (\(0 < \lambda < 1\)):
\[ \lambda S_{t-1} = \alpha \lambda + \beta \lambda A_{t-1} + \beta \lambda^2 A_{t-2} + \dots + \lambda \epsilon_{t-1} \]
Subtracting this from the original equation:
\[ \begin{aligned} S_t - \lambda S_{t-1} &= \alpha - \alpha \lambda + \beta A_t + \epsilon_t - \lambda \epsilon_{t-1} \\ S_t &= \alpha (1 - \lambda) + \lambda S_{t-1} + \beta A_t + \epsilon_t - \lambda \epsilon_{t-1} \\ S_t &= \alpha^* + \lambda S_{t-1} + \beta A_t + u_t \end{aligned} \]
where \(\alpha^* = \alpha (1 - \lambda)\) and \(u_t = \epsilon_t - \lambda \epsilon_{t-1}\)
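The following is a quick numerical check of the Koyck transformation. It is a minimal sketch with hypothetical parameters (\(\alpha = 10\), \(\beta = 2\), \(\lambda = 0.6\)) and a random on/off ad schedule. It assumes noise-free sales and no advertising before the first period.

The code builds sales from the infinite-lag form and confirms that the one-period autoregressive form holds exactly. It then recovers \(\alpha(1-\lambda)\), \(\lambda\), and \(\beta\) by least squares. With real, noisy data, the transformed error \(\epsilon_t - \lambda \epsilon_{t-1}\) is correlated with \(S_{t-1}\), so plain OLS would be biased.

```python
import numpy as np

def koyck_sales(ads, alpha, beta, lam):
    """Noise-free Koyck sales S_t = alpha + beta * sum_i lam**i * A_{t-i}.

    Ads before the first period are assumed to be zero, so the infinite sum is exact.
    """
    sales = np.empty(len(ads))
    stock = 0.0  # geometrically decaying ad stock: sum_i lam**i * A_{t-i}
    for t, a in enumerate(ads):
        stock = a + lam * stock
        sales[t] = alpha + beta * stock
    return sales

rng = np.random.default_rng(0)
ads = rng.integers(0, 2, size=50).astype(float)  # hypothetical on/off ad schedule
alpha, beta, lam = 10.0, 2.0, 0.6                # hypothetical parameters
S = koyck_sales(ads, alpha, beta, lam)

# Koyck-transformed form: S_t = alpha*(1 - lam) + lam*S_{t-1} + beta*A_t
rhs = alpha * (1 - lam) + lam * S[:-1] + beta * ads[1:]
print(np.allclose(S[1:], rhs))

# Least squares on [1, S_{t-1}, A_t] recovers (alpha*(1 - lam), lam, beta) when there is no noise
X = np.column_stack([np.ones(len(S) - 1), S[:-1], ads[1:]])
coef, *_ = np.linalg.lstsq(X, S[1:], rcond=None)
print(coef)
```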
Pros:
Turns an infinite-lag series into a first-order autoregressive model
Easy to estimate
\(\lambda\) = carryover (decay) rate of the advertising effect
\(\beta\) = current effect of ad
\(\beta \lambda/ (1- \lambda)\) = carryover effect of ad
\(\beta / (1- \lambda)\) = total effect of advertising
p% duration interval = \(\log (1-p) / \log \lambda\): the number of periods that account for p% of the total effect (with p written as a fraction, e.g., 0.9)
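The summary measures of a Koyck model follow directly from \(\beta\) and \(\lambda\). The sketch below is a minimal illustration that uses a hypothetical helper name and hypothetical values (\(\beta = 2\), \(\lambda = 0.5\)). It computes the current, carryover, and total effects and the p% duration interval, treating p as a fraction.

```python
import math

def koyck_summary(beta, lam, p=0.9):
    """Hypothetical helper: summary measures of a Koyck ad response (0 < lam < 1, 0 < p < 1)."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie in (0, 1)")
    current = beta                        # effect in the period of exposure
    carryover = beta * lam / (1 - lam)    # effect in all later periods
    total = beta / (1 - lam)              # current + carryover
    # periods needed for a fraction p of the total effect: 1 - lam**n = p
    duration = math.log(1 - p) / math.log(lam)
    return {"current": current, "carryover": carryover, "total": total, "duration": duration}

print(koyck_summary(beta=2.0, lam=0.5, p=0.9))
```

With these values, half of the total effect is current and half is carryover. About 3.3 periods are needed to reach 90% of the total effect.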
Including a lagged ad term:
\[ S_t = \alpha + \lambda S_{t-1} + \beta A_t + \beta_1 A_{t-1} + \mu_t \]
Separates inertia from ad carryover
Separates out the decay of multiple independent variables
Identifies the shape of decay
(Clarke 1976) identified a major limitation of the Koyck model:
Aggregation bias: the larger the data interval, the larger the estimated \(\lambda\), the larger the estimated carryover effect, and the longer the estimated duration of the ad effect
It was long assumed that the best data interval is the inter-purchase time, but (Tellis and Franses 2006) showed that the optimal data interval is the unit exposure time (the smallest interval within which advertising occurs only once and at the same time every period)
General Autoregressive Distributed Lag Model (ADL, ARMA)
\[ S_t = \alpha + \lambda_1 S_{t-1} + \lambda_2 S_{t-2} + \dots + \beta_0 A_t + \beta_1 A_{t-1} + \dots + \mu_t \]
Pros:
Rich variety of decay shapes
\(\beta\)s affect the number and position of bumps
\(\lambda\)s affect the speed of decay
Precursor to the Vector Autoregressive (VAR) model
Cons:
Data aggregated at the population level and over time cannot identify ad exposure
Temporal aggregation cannot identify the treated period
Reverse causality: ad spending is set based on expected sales
Multicollinearity among the lagged terms
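Feeding a single unit ad pulse through the equation is the easiest way to see the decay shapes an ADL model can produce. The sketch below is minimal: it uses hypothetical coefficients and drops the intercept and error term. With one autoregressive lag and only a current ad effect, the response decays monotonically (the Koyck shape). A larger coefficient on \(A_{t-1}\) than on \(A_t\) produces a delayed bump.

```python
import numpy as np

def adl_pulse_response(lams, betas, horizon=12):
    """Response of S_t to one unit ad pulse at t = 0 in the ADL model
    S_t = sum_j lams[j-1] * S_{t-j} + sum_l betas[l] * A_{t-l}
    (intercept and error dropped; values before t = 0 are zero)."""
    ads = np.zeros(horizon)
    ads[0] = 1.0
    s = np.zeros(horizon)
    for t in range(horizon):
        ar = sum(lam * s[t - j] for j, lam in enumerate(lams, start=1) if t - j >= 0)
        dl = sum(b * ads[t - l] for l, b in enumerate(betas) if t - l >= 0)
        s[t] = ar + dl
    return s

# Koyck-like monotone decay: one AR lag, current ad effect only
print(adl_pulse_response([0.5], [1.0], horizon=6))
# hypothetical coefficients: a larger lagged ad effect shifts the peak to t = 1
print(adl_pulse_response([0.5], [0.2, 1.0], horizon=6))
```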
Major advances in ad response modeling:
Dis-aggregate data
modeling at individual household, consumer
modeling by day, hour
modeling moment-to-moment
modeling exposure (not spending)
quasi-experiments
difference-in-differences (DID)
Synthetic control
36.4.1 (Tellis, Chandy, and Thaivanich 2000) Direct TV ad
Study Context
A referral is “a call by a customer for the firm’s service” (p. 33)
Theory of message repetition:
A current effect on behavior
A carryover effect on behavior
A non-behavioral effect on attitude and memory
Research questions:
Given current brand equity, what is the effect of advertising on referrals?
Ad placement
Creatives
Time period
Age and repetition
Is marginal benefit greater than marginal cost for advertising?
Model
\[ \begin{aligned} R_t = {} & \alpha + \gamma_1 R_{t-1} + \gamma_2 R_{t-2} + \gamma_3 R_{t-3} + \dots \\ & + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \epsilon_t \end{aligned} \]
where
\(A\) = advertising
\(R\) = referral
Controls: opening hours and time of day.
Expectations:
Morning ads decay more slowly (have longer-lasting effects) than ads at other times
Effects differ across creatives
Transfer function analysis
Temporal patterns: the autocorrelation and partial autocorrelation functions show patterns at the hourly and weekly levels
Lag structure: 3 lags of the dependent variable (referrals) and 4 lags of the independent variable (advertising)
Why include lags of the dependent variable:
Algebraic: without lags of the dependent variable, the lag structure of the independent variable would be infinite
Intuitive: they separate the carryover effect of advertising from inertia.
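The temporal patterns are identified with the sample autocorrelation and partial autocorrelation functions. The following is a minimal NumPy sketch, not the authors' code. The ACF uses the usual sample estimator, and the PACF at lag k is the last coefficient of an OLS AR(k) fit.

The hourly referral series is hypothetical, built with a 24-hour cycle, so its ACF peaks near lag 24. A weekly cycle in hourly data would show up near lag 168.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Partial autocorrelation at lag k = last coefficient of an OLS AR(k) fit with intercept."""
    x = np.asarray(x, dtype=float)
    out = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        # column j holds x_{t-j} aligned with y_t = x_t
        lags = [x[k - j:len(x) - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(len(y))] + lags)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])
    return np.array(out)

# hypothetical hourly referrals over four weeks with a daily cycle plus noise
rng = np.random.default_rng(3)
hours = np.arange(24 * 28)
hourly = 5 + np.sin(2 * np.pi * hours / 24) + 0.3 * rng.standard_normal(len(hours))
r = acf(hourly, 48)
print(r[12], r[24])          # strongly negative at half a day, strongly positive at one day
print(pacf(hourly, 5))
```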
Error patterns:
\[ R_t = \alpha + v(\mathbf{B})A_t +N_t \]
where
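\(R_t, A_t\) are stationary
\(v(\mathbf{B})\) = transfer function of advertising on referrals, where \(v(\mathbf{B}) = C\omega(B)B^b / \delta(B)\), \(B\) is the backshift operator, and \(b\) is the delay
\(N_t = \dfrac{\theta(B)}{\phi(B)(1 - B)^d} a_t\) is the noise process, where \(a_t \sim N(0, \sigma_a^2)\) is white noise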
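The transfer function \(v(\mathbf{B})\) is easier to interpret through its impulse-response weights \(v_0, v_1, \dots\), which give the effect of a one-unit ad on referrals \(k\) periods later. The sketch below is minimal. It assumes \(\omega(B)\) and \(\delta(B)\) are given as plain coefficient lists in increasing powers of \(B\), a convention chosen here for simplicity (Box-Jenkins texts often write \(\delta(B) = 1 - \delta_1 B - \dots\)). It obtains the weights by matching coefficients in \(\delta(B)v(B) = C\omega(B)B^b\).

```python
import numpy as np

def transfer_weights(C, omega, delta, b, n):
    """First n impulse-response weights v_0..v_{n-1} of v(B) = C * omega(B) * B**b / delta(B).

    omega and delta are coefficient lists in increasing powers of B
    (e.g. delta = [1.0, -0.5] means 1 - 0.5B); delta[0] must equal 1.
    """
    if delta[0] != 1:
        raise ValueError("delta(B) must be normalized so that delta[0] == 1")
    numerator = np.zeros(n)  # coefficients of C * omega(B) * B**b
    for i, w in enumerate(omega):
        if b + i < n:
            numerator[b + i] = C * w
    v = np.zeros(n)
    for k in range(n):
        # delta(B) v(B) = numerator(B)  =>  v_k = numerator_k - sum_{i>=1} delta_i * v_{k-i}
        v[k] = numerator[k] - sum(delta[i] * v[k - i] for i in range(1, len(delta)) if k - i >= 0)
    return v

# hypothetical: one-period delay, single numerator term, first-order decay
print(transfer_weights(C=2.0, omega=[1.0], delta=[1.0, -0.5], b=1, n=5))
```

With these hypothetical values, the weights are 0 at lag 0 (the one-period delay) and then 2, 1, 0.5, 0.25. This is a geometric decay like the Koyck model. The advertising part of \(R_t\) is then \(\sum_k v_k A_{t-k}\).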
Advertising Effects (decay)
Total effect of advertising = sum of ad coefficients divided by (1 - sum of lagged-referral coefficients)
\[ \text{Total Effect} = \frac{\sum_{l = 0}^{n} \beta_l }{1 - \sum_{j=1}^{p} \gamma_j} \]
where \(l\) indexes the ad lags and \(j\) indexes the \(p\) lagged-referral terms with coefficients \(\gamma_j\)
and the partial advertising effect \(l\) periods after a unit ad exposure is
\[ TA_l = \beta_l + \sum_{j=1}^{\min(l, p)} \gamma_j TA_{l-j}, \qquad \beta_l = 0 \text{ for } l > n \]
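The total-effect formula can be checked against the partial effects. The sketch below is minimal and uses hypothetical coefficients: three lagged-referral terms, plus the current ad and four ad lags, matching the lag structure described above.

It computes each partial effect recursively as \(TA_l = \beta_l + \sum_{j=1}^{\min(l,p)} \gamma_j TA_{l-j}\). It then confirms that the partial effects sum to \(\sum_l \beta_l / (1 - \sum_j \gamma_j)\) when the lagged-referral coefficients imply a stable process.

```python
import numpy as np

def partial_effects(gammas, betas, horizon):
    """Effect of a one-unit ad l = 0..horizon-1 periods later:
    TA_l = beta_l + sum_{j=1}^{min(l, p)} gamma_j * TA_{l-j}, with beta_l = 0 beyond the last ad lag."""
    ta = np.zeros(horizon)
    for l in range(horizon):
        b = betas[l] if l < len(betas) else 0.0
        ta[l] = b + sum(g * ta[l - j] for j, g in enumerate(gammas, start=1) if l - j >= 0)
    return ta

def total_effect(gammas, betas):
    """Closed form: sum of ad coefficients / (1 - sum of lagged-referral coefficients)."""
    return sum(betas) / (1 - sum(gammas))

gammas = [0.3, 0.1, 0.05]           # hypothetical coefficients on R_{t-1}, R_{t-2}, R_{t-3}
betas = [0.8, 0.4, 0.2, 0.1, 0.05]  # hypothetical coefficients on A_t .. A_{t-4}
ta = partial_effects(gammas, betas, horizon=200)
print(total_effect(gammas, betas), ta.sum())
```

The two numbers agree because the total \(T\) satisfies \(T = \sum_l \beta_l + (\sum_j \gamma_j) T\), which rearranges to the closed form.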
Results
Advertising effects dissipate after 8 hours
Ad effectiveness varies by station
Effectiveness also varies by creative
36.4.2 (Tellis and Franses 2006) Optimal Data Interval for estimating ad response (on sales)
- Such a seminal paper
- The same logic could be applied to finding the optimal interval for estimating the effect of firm announcements on stock performance.
Overly disaggregate data do not cause disaggregation bias
Optimal interval is unit exposure time (not inter-purchase time)
Recovering the true estimates depends on the unit exposure time, not on assumptions about the advertising process
Definitions:
| Term | Definition |
|---|---|
| Data interval | Temporal level of the records |
| Inter-purchase time | Smallest calendar time between any two consumer purchases |
| Duration interval | Length of time that the advertising effect lasts |
| Calendar time | Discrete time period |
| Exposure time | Moment a pulse of advertising first hits a consumer |
| p% duration interval | Length of time that accounts for \(p\)% of the advertising effect |
| Current effect of ad | Portion of the total advertising effect that occurs in the same time period as the exposure |
| Duration interval bias | Difference between the carryover effect estimated at the true interval and that estimated on aggregate data |
The optimal interval balances storage cost against unbiased estimates
Koyck model
- \(s_t, a_t\) are sales and ad at the true microdata interval
\[ s_t = \mu + \beta a_t + \beta \lambda a_{t-1} + \beta \lambda^2 a_{t-2} + \dots + \epsilon_t \]
where
\(\epsilon \sim N(0, \sigma^2_\epsilon)\)
\(\beta\) = current effect of advertising
\(\beta/(1- \lambda)\) = total effect (current plus carryover)
\(\lambda\) = carryover (decay) coefficient, which determines the duration interval
Applying the (Koyck 1954) transformation (i.e., multiplying both sides by \(1 - \lambda L\), where \(L\) is the lag operator, \(L^k y_t = y_{t-k}\)) gives
\[ s_t = (1 - \lambda)\mu + \lambda s_{t-1} + \beta a_t + \epsilon_t - \lambda \epsilon_{t-1} \]
For aggregate data, let \(S_T\) denote the aggregate sales series obtained by summing sales over the \(K\) micro periods from the current period back to the \((K-1)\)th prior period, sampled at the current period
\[ \begin{aligned} S_T &= s_t + s_{t-1}+ s_{t-2}+ \dots + s_{t-(K-1)} \\ & = (1 + L + L^2 + \dots + L^{K-1})s_t \end{aligned} \]
Hence,
\[ \begin{aligned} A_T &= (1 + L + L^2 + \dots + L^{K-1}) a_t \\ \epsilon_T &= (1 + L + L^2 + \dots + L^{K-1}) \epsilon_t \\ S_{T-1} &= (1 + L + L^2 + \dots + L^{K-1}) s_{t-K} \end{aligned} \]
The true aggregate form of the micromodel is
\[ \begin{aligned} S_T = {} & \lambda^K S_{T-1} + \beta A_T + \beta \lambda (1 + \lambda L + \lambda^2 L^2 + \dots + \lambda^{K-2} L^{K-2}) \\ & \times (1 + L + \dots + L^{K-1}) a_{t-1} + \epsilon_T - \lambda^K \epsilon_{T-1} \end{aligned} \]
The bias stems from the fact that
\[ A_{T-1} \neq (1 + \lambda L + \lambda^2 L^2 + \dots + \lambda^{K-2} L^{K-2})(1 + L + \dots + L^{K-1}) a_{t-1} \]
so a model that uses \(A_{T-1}\) as the lagged ad term misses the carryover structure that is lost in aggregation.
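The aggregate form can be checked numerically. The sketch below is minimal and rests on several assumptions: noise-free micro Koyck sales with \(\mu = 0\), hypothetical parameters (\(\beta = 1.5\), \(\lambda = 0.7\), \(K = 3\)), and no ads before the first micro period.

The code sums \(K\) micro periods into each aggregate observation. It then confirms that \(S_T - \lambda^K S_{T-1}\) equals \(\beta A_T + \beta\lambda(1 + \lambda L + \dots + \lambda^{K-2}L^{K-2})(1 + L + \dots + L^{K-1})a_{t-1}\). This follows from factoring \(1 - \lambda^K L^K = (1 - \lambda L)(1 + \lambda L + \dots + \lambda^{K-1}L^{K-1})\).

```python
import numpy as np

rng = np.random.default_rng(1)
beta, lam, K = 1.5, 0.7, 3                    # hypothetical micro parameters and aggregation level
n = 60
a = rng.integers(0, 2, size=n).astype(float)  # hypothetical micro ad series
s = np.empty(n)
stock = 0.0
for t in range(n):
    stock = a[t] + lam * stock                # sum_i lam**i * a_{t-i}
    s[t] = beta * stock                       # noise-free Koyck micro sales (mu = 0)

def agg(x, t):
    """(1 + L + ... + L^{K-1}) x_t: sum of the K micro periods ending at t."""
    return x[t - K + 1:t + 1].sum()

def carry(t):
    """(1 + lam L + ... + lam^{K-2} L^{K-2}) (1 + L + ... + L^{K-1}) a_{t-1}."""
    return sum(lam**i * a[t - 1 - i - j] for i in range(K - 1) for j in range(K))

# check the aggregate identity at every t where all needed lags exist
all_match = all(
    np.isclose(agg(s, t) - lam**K * agg(s, t - K),
               beta * agg(a, t) + beta * lam * carry(t))
    for t in range(2 * K, n)
)
print(all_match)
```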
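With the optimal data interval (one exposure pulse per interval), we can recover the total ad effect
\[ \frac{\beta_1 + \beta_2}{1 - \lambda^K} \]
where \(\beta_1, \beta_2\) are the estimated coefficients on \(A_T\) and \(A_{T-1}\) and \(\lambda^K\) is the estimated coefficient on \(S_{T-1}\). The micro-level carryover coefficient, which determines the true duration interval, is
\[ \hat{\lambda} = \sqrt[K]{\widehat{\lambda^K}} \]
and the current effect is \(\beta\)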
When the data are even more disaggregate than the optimal interval, we only need to adjust the formula to recover the true effects.
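The recovery step can be written as a small helper. The sketch below is minimal. It uses a hypothetical function name and assumes that \(S_T = \lambda^K S_{T-1} + \beta_1 A_T + \beta_2 A_{T-1}\) has already been estimated at the unit exposure interval.

The helper takes the \(K\)th root of the \(S_{T-1}\) coefficient to get the micro-level carryover. It then computes \((\beta_1 + \beta_2)/(1 - \lambda^K)\) and expresses the p% duration interval in micro periods. The input values are invented.

```python
import math

def recover_micro(lam_K_hat, beta1_hat, beta2_hat, K, p=0.9):
    """Hypothetical helper: map aggregate estimates at the unit exposure interval
    back to micro-level quantities."""
    if not 0 < lam_K_hat < 1:
        raise ValueError("the S_{T-1} coefficient must lie in (0, 1)")
    lam_hat = lam_K_hat ** (1.0 / K)                        # K-th root: micro carryover
    total = (beta1_hat + beta2_hat) / (1 - lam_K_hat)       # long-run ad effect
    duration_micro = math.log(1 - p) / math.log(lam_hat)    # p% duration in micro periods
    return lam_hat, total, duration_micro

lam_hat, total, duration_micro = recover_micro(0.343, 0.5, 0.3, K=3)
print(lam_hat, total, duration_micro)
```

Because \(\log \lambda^K = K \log \lambda\), the duration in micro periods is exactly \(K\) times the duration computed in aggregate periods.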
36.4.3 (T. S. Teixeira, Wedel, and Pieters 2010) Ad Pulsing to prevent consumer ad avoidance
Model: probit with MCMC
Data: eye-tracking on 31 commercials for 2000 participants.
New metric to predict attention dispersion based on eye-tracking data.
Optimization of ads:
Problem: minimize avoidance subject to a given level of brand activity
Solution: Pulsing
36.4.4 (Sethuraman, Tellis, and Briesch 2011) Advertising effectiveness meta-analysis
Data: 1960 - 2008, 56 studies.
Average short-term ad elasticity is .12
Advertising elasticity has declined over time.
Advertising elasticity is higher
for durable goods (vs. nondurables)
in the early stage (vs. the mature stage) of the product life cycle
for yearly data (vs. quarterly data)
when advertising is measured in gross rating points (vs. monetary terms)
Long-term ad elasticity is .24
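The following makes these elasticities concrete. It is a minimal sketch that assumes a constant-elasticity (log-log) response \(S = cA^{e}\), used for illustration only and not a claim about the meta-analysis model. It computes the sales change implied by a 10% increase in advertising at the average short-term (.12) and long-term (.24) elasticities.

```python
def sales_change(ad_change, elasticity):
    """Proportional sales change under an assumed constant-elasticity response S = c * A**elasticity."""
    return (1 + ad_change) ** elasticity - 1

short_run = sales_change(0.10, 0.12)   # average short-term elasticity
long_run = sales_change(0.10, 0.24)    # average long-term elasticity
print(f"{short_run:.4f} {long_run:.4f}")
```

A 10% increase in advertising implies roughly a 1.2% short-term and 2.3% long-term increase in sales. This is close to the linear approximation of elasticity times percentage change.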
36.4.5 (Liaukonyte, Teixeira, and Wilbur 2015) TV advertising on online shopping
Impression merging process: human coders
Data: $3.4 billion in spending by 20 brands, with traffic, transaction, and content measures for 1,2224 commercials.
Difference-in-differences: 2-minute pre/post windows around each ad airing (similar to regression discontinuity)
Action-focused content increases direct website traffic and sales conditional on visitation
Information- and emotion-focused content reduces web traffic but increases purchases, with a positive net effect on sales for most brands.
Imagery-focused ad content decreases direct traffic to the website
After the TV ad,
consumers choose whether to visit the website
consumers then decide whether to buy a product
Data:
Online traffic: comScore Media Metrix
Direct traffic
Search engine referrals
Transaction Count
TV Ad Data: Kantar Media
The argument against an endogeneity problem is that brands cannot manipulate the exact time an ad airs (the ad is placed within a 15-minute window, while the research design looks at 4-minute windows). For the 2-hour window analysis, the authors use a difference-in-differences design in which the controls are the largest brands within each product category that did not advertise.
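The core difference-in-differences comparison takes only a few lines. This bare-bones sketch uses hypothetical per-minute visit counts and is not the authors' estimator, which pools many airings and brands. It takes the change in traffic from the pre-airing to the post-airing window for the advertising brand and subtracts the same change for a control brand that did not advertise.

```python
from statistics import mean

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: change for the advertising brand minus change for the control brand."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# hypothetical per-minute website visits in 2-minute windows before and after an airing
did = did_estimate(treat_pre=[100, 102], treat_post=[130, 128],
                   ctrl_pre=[50, 52], ctrl_post=[53, 51])
print(did)
```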
36.4.6 (Tirunillai and Tellis 2017) TV ad on Online chatter: synthetic control
Raw metrics
- Reviews: from Amazon, Epinions, CNET, Twitter, YouTube, Facebook
Volume of reviews
valence of the review (positive vs. negative)
Polarity (entropy)
- Blogs: from Spinn3r
Volume
In-degree (links) of the brand website
In-degree (links) of blog posts
Volume of blogs that gain/lose rank
Using Dynamic factor analysis
\[ \begin{aligned} Y_t &= \xi f_t + \epsilon_t \\ f_t &= \Psi f_{t-1} + \eta_t \end{aligned} \]
where
\(Y_t\) = raw measures of reviews and blogs
\(f_t\) = underlying factors
\(\xi\) = factor loadings
\(\epsilon_t\) = idiosyncratic error
\(\eta_t\) = white noise, where \(E(\epsilon_t \eta'_{t-k})=0\)
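The latent factor in this state-space model can be tracked with a Kalman filter. The sketch below is minimal and assumes a single factor, hypothetical loadings, and known parameters. In practice \(\xi\), \(\Psi\), and the noise variances are estimated, for example by maximum likelihood. The code simulates six hypothetical chatter metrics and runs a scalar-state Kalman filter to estimate \(f_t\) from \(Y_1, \dots, Y_t\).

```python
import numpy as np

def simulate_one_factor(xi, psi, sig_eps, sig_eta, T, rng):
    """Simulate Y_t = xi * f_t + eps_t and f_t = psi * f_{t-1} + eta_t with one latent factor."""
    f = np.zeros(T)
    for t in range(1, T):
        f[t] = psi * f[t - 1] + sig_eta * rng.standard_normal()
    Y = np.outer(f, xi) + sig_eps * rng.standard_normal((T, len(xi)))
    return Y, f

def filter_factor(Y, xi, psi, sig_eps, sig_eta):
    """Kalman-filtered E[f_t | Y_1..Y_t] for the one-factor model with known parameters."""
    T, N = Y.shape
    R = sig_eps**2 * np.eye(N)                   # idiosyncratic error covariance
    m, P = 0.0, sig_eta**2 / (1 - psi**2)        # stationary prior for the factor
    out = np.empty(T)
    for t in range(T):
        m_pred = psi * m                         # predict step
        P_pred = psi**2 * P + sig_eta**2
        S = P_pred * np.outer(xi, xi) + R        # innovation covariance (N x N)
        gain = P_pred * np.linalg.solve(S, xi)   # Kalman gain (length N)
        m = m_pred + gain @ (Y[t] - xi * m_pred) # update step
        P = P_pred * (1 - gain @ xi)
        out[t] = m
    return out

rng = np.random.default_rng(2)
xi = np.array([1.0, 0.8, 0.6, 1.2, 0.9, 0.7])    # hypothetical loadings for six chatter metrics
Y, f_true = simulate_one_factor(xi, 0.8, 0.3, 1.0, 300, rng)
f_hat = filter_factor(Y, xi, 0.8, 0.3, 1.0)
print(np.corrcoef(f_true, f_hat)[0, 1])
```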
Dimension of chatter (using dynamic factor analysis)
Content-based dimensions:
Popularity: loads on volume of reviews and blogs
Negativity: loads positively on negative valence and polarity and negatively on positive valence
Information spread dimensions:
Visibility: loads on the volume of blogs and the in-degree links of the brand website
Virality: loads on volume of blogs that gained rank and in-degree of the blogs
TV ads have a short-term causal positive effect on online chatter (larger for information-spread than for content-based dimensions)
Ad can reduce the negativity in online chatter in the short-term.
Ads can
stimulate conversation online
trigger brand recall
interpret experience: lead consumers to a more favorable assessment of the brand
refute negatives: through greater credibility and persuasiveness
Empirical setting: the "Let’s Do Amazing" campaign, covering the ad duration and the 20 days after the campaign date
Method:
Synthetic control (synthetic brand): the estimated difference might already absorb spillover effects of the focal brand on other brands in the same industry (the authors argue that there was no spillover effect).
No justification is given for the window of 70 days before and 20 days after
To check that YouTube did not substantially affect the results, the authors use data from Visible Measures to assess viewership, and TV viewership from https://tvlistings.zap2it.com/?aid=gapzap, Nielsen TV Ratings, and Stradegy (need to ask about this company).
The authors also use a vector autoregressive (VAR) model to examine the short- and long-term dynamics between the dependent variables (chatter metrics) and the independent variable (advertising).
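The synthetic control idea is to build the comparison brand as a weighted average of donor brands that tracks the focal brand before the campaign. The sketch below is minimal and makes several assumptions:

- the series are standardized and hypothetical;
- matching uses pre-period outcomes only (the full method of Abadie and coauthors also matches on covariates with a weighting matrix);
- the 70-day pre-period and 20-day post-period mirror the window mentioned above;
- the post-campaign lift of 0.5 is invented.

Weights are constrained to be non-negative and sum to one. They are found by projected gradient descent with a sort-based Euclidean projection onto the simplex.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(y_pre, X_pre, iters=5000):
    """Donor weights minimizing 0.5 * ||y_pre - X_pre @ w||^2 subject to w >= 0, sum(w) = 1."""
    n_donors = X_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    step = 1.0 / np.linalg.norm(X_pre, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = X_pre.T @ (X_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(5)
T_pre, T_post, n_donors = 70, 20, 5
donors = rng.standard_normal((T_pre + T_post, n_donors))  # hypothetical standardized donor chatter
true_w = np.array([0.6, 0.0, 0.4, 0.0, 0.0])
treated = donors @ true_w
treated[T_pre:] += 0.5                                     # hypothetical post-campaign lift
w = synthetic_control_weights(treated[:T_pre], donors[:T_pre])
gap = treated[T_pre:] - donors[T_pre:] @ w                 # focal brand minus synthetic brand
print(np.round(w, 3), gap.mean())
```

The estimated campaign effect is the average post-period gap between the focal brand and its synthetic counterpart.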