36.4 Advertising Response (Effectiveness)

Consumer response to advertising

Key issues

Does advertising work?
When, where, why and for how long?

5 effects of ad exposure

Short
Sleeper
Hysteresis
Long
Instant

Simple model of ad response

\[ S_t = \alpha + \beta A_t + \mu_t \]

Does not capture the carryover effect

The (Koyck 1954) model captures carryover

\[ S_t = \alpha + \beta A_t + \beta \lambda A_{t-1} + \beta \lambda^2 A_{t-2} + \dots + \epsilon_t \]

This is a distributed-lag model with an infinite number of lags whose weights decline geometrically, so it captures the carryover effect of advertising

To estimate it, apply the Koyck transformation: lag the equation one period and multiply by $\lambda$ (the carryover effect, $0 < \lambda < 1$)

Then

\[ \lambda S_{t-1} = \alpha \lambda + \beta \lambda A_{t-1} + \beta \lambda^2 A_{t-2} + \dots + \lambda \epsilon_{t-1} \]

Subtracting this from the original equation,

\[ \begin{aligned} S_t - \lambda S_{t-1} &= \alpha - \alpha \lambda + \beta A_t + \epsilon_t - \lambda \epsilon_{t-1} \\ S_t &= \alpha (1 - \lambda) + \lambda S_{t-1} + \beta A_t + \epsilon_t - \lambda \epsilon_{t-1} \\ S_t &= \alpha^* + \lambda S_{t-1} + \beta A_t + u_t \end{aligned} \]

where $\alpha^* = \alpha (1 - \lambda)$ and $u_t = \epsilon_t - \lambda \epsilon_{t-1}$

Pros:

An infinite-lag series reduces to a one-period autoregressive model
Easy to estimate
$\lambda$ is the carryover (decay) of the advertising effect
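
As a rough illustration of the "easy to estimate" point, the sketch below simulates the autoregressive form and fits it by OLS. The parameter values, the uniform ad series, and the helper names `simulate_koyck` and `fit_koyck` are assumptions made for this example. It also assumes the error $u_t$ is white noise; under the strict Koyck derivation $u_t$ is a moving average of $\epsilon_t$ and $\epsilon_{t-1}$, so it is correlated with $S_{t-1}$ and plain OLS is then no longer consistent.

```python
import numpy as np

def simulate_koyck(n, alpha, lam, beta, seed=0):
    """Simulate S_t = alpha + lam * S_{t-1} + beta * A_t + u_t with white-noise u_t."""
    rng = np.random.default_rng(seed)
    ads = rng.uniform(0.0, 10.0, size=n)   # hypothetical ad series
    noise = rng.normal(0.0, 1.0, size=n)   # assumption: u_t is white noise
    sales = np.empty(n)
    prev = alpha / (1.0 - lam)             # start at the no-advertising steady state
    for t in range(n):
        sales[t] = alpha + lam * prev + beta * ads[t] + noise[t]
        prev = sales[t]
    return sales, ads

def fit_koyck(sales, ads):
    """OLS of S_t on [1, S_{t-1}, A_t]; returns (alpha_hat, lam_hat, beta_hat)."""
    X = np.column_stack([np.ones(len(sales) - 1), sales[:-1], ads[1:]])
    coef, *_ = np.linalg.lstsq(X, sales[1:], rcond=None)
    return tuple(coef)

sales, ads = simulate_koyck(n=5000, alpha=10.0, lam=0.6, beta=2.0)
alpha_hat, lam_hat, beta_hat = fit_koyck(sales, ads)
print(f"lambda_hat={lam_hat:.3f}, beta_hat={beta_hat:.3f}")
```

With a long simulated series the estimates should land close to the values used to generate the data.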

$\beta$ = current effect of ad

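$\beta \lambda/ (1- \lambda)$ = carryover effect of ad

$\beta / (1- \lambda)$ = total effect of advertising (current + carryover)

p% duration interval = $\log (1-p) / \log \lambda$ (the number of periods needed to accumulate a share $p$ of the total effect, with $p$ written as a proportion)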
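
The formulas above can be collected in a small helper. This is a minimal sketch: the function name `koyck_effects` and the example values $\beta = 2$, $\lambda = 0.5$ are made up for illustration. The duration formula follows from the cumulative share of the effect after $n$ periods being $1 - \lambda^n$.

```python
import math

def koyck_effects(beta, lam, p=0.9):
    """Current, carryover, and total ad effects plus the p% duration interval."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be a proportion strictly between 0 and 1")
    return {
        "current": beta,
        "carryover": beta * lam / (1.0 - lam),
        "total": beta / (1.0 - lam),
        # share of the total effect realised after n periods is 1 - lam**n
        "duration": math.log(1.0 - p) / math.log(lam),
    }

effects = koyck_effects(beta=2.0, lam=0.5, p=0.9)
print(effects)
```

In this example the current and carryover effects are both 2, the total effect is 4, and 90% of the effect is realised after a little over three periods.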

If we include a lagged ad term

\[ S_t = \alpha + \lambda S_{t-1} + \beta A_t + \beta_1 A_{t-1} + \mu_t \]

we can

Separate inertia from ad carryover
Separate out decay from multiple independent variables
Identify the shape of the decay

(Clarke 1976) found a major limitation of the Koyck model

Aggregation bias: the larger the data interval, the larger the estimated $\lambda$, the larger the estimated carryover effect, and the longer the estimated duration of the ad effect
People used to think the best data interval is the inter-purchase time. But (Tellis and Franses 2006) showed that the unit exposure time is the optimal data interval (the smallest interval within which advertising occurs only once and at the same time every period)

General Autoregressive distributed Lag Model (ADL, ARMA)

\[ S_t = \alpha + \lambda_1 S_{t-1} + \lambda_2 S_{t-2} + \dots + \beta_0 A_t + \beta_1 A_{t-1} + \dots + \mu_t \]

pros:

rich variety of decay shapes
- the $\beta$ coefficients affect the number and position of bumps
- the $\lambda$ coefficients affect the speed of decay
precursor to Vector Autoregressive model (VAR)

cons:

data aggregated at the population level and over time cannot identify individual ad exposure
time-aggregated data cannot identify the treated period
reverse causality: ad spending is set based on expected sales
multicollinearity
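
To make the "variety of decay shapes" point concrete, the sketch below traces the response of sales to a single one-unit ad pulse under an autoregressive distributed lag model. The function name `adl_impulse_response` and all coefficient values are hypothetical, chosen only to show a smooth decay versus a decay with a second bump.

```python
def adl_impulse_response(lams, betas, horizon):
    """Response of S to a one-unit ad pulse at t = 0, for `horizon` periods.

    lams  : coefficients on S_{t-1}, S_{t-2}, ...
    betas : coefficients on A_t, A_{t-1}, ...
    """
    response = []
    for h in range(horizon):
        direct = betas[h] if h < len(betas) else 0.0
        inertia = sum(lam * response[h - j]
                      for j, lam in enumerate(lams, start=1) if h - j >= 0)
        response.append(direct + inertia)
    return response

# Koyck-like case: one ad coefficient, geometric decay
smooth = adl_impulse_response(lams=[0.5], betas=[1.0], horizon=4)
# An extra ad coefficient at lag 2 produces a second bump
bumpy = adl_impulse_response(lams=[0.5], betas=[1.0, 0.0, 1.5], horizon=4)
print(smooth)
print(bumpy)
```

The ad coefficients decide where new bumps enter the response, while the coefficients on lagged sales decide how quickly each bump dies out.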

Major advances in ad response modeling:

Dis-aggregate data
- modeling at individual household, consumer
- modeling by day, hour
- modeling moment-to-moment
- modeling exposure (not $)
quasi-experiments
- DID
- Synthetic control

36.4.1 (Tellis, Chandy, and Thaivanich 2000) Direct TV ad

Study Context

A referral is “a call by a customer for the firm’s service” (p. 33)
Theory of message repetition:
- A current effect on behavior
- A carryover effect on behavior
- A nonbehavioral effect on attitude and memory
Research questions:
- Given current brand equity, what is the effect of advertising on referrals?
  - Ad placement
  - Creatives
  - Time period
  - Age and repetition
- Is marginal benefit greater than marginal cost for advertising?

Model

\[ \begin{aligned} R_t &= \alpha + \gamma_1 R_{t-1} + \gamma_2 R_{t-2} + \gamma_3 R_{t-3} + \dots \\ &\quad + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \epsilon_t \end{aligned} \]

where

$A$ = advertising
$R$ = referral

Controls: Opening hour + time of the day.

Expect:

Morning ads have a longer decay than ads aired at other times
Differences in creatives

Transfer function analysis

temporal patterns: auto correlations + partial auto-correlation show patterns at the hourly and weekly level
Lag structure: 3 lags on the dependent, and 4 lags on the independent (advertising)
- Why include lags of the dependent variable:
  - Algebraic: without lags of the dependent variable, the lag on the independent variable would have to be infinite
  - Intuitive: they separate the carryover effect of advertising from inertia.
Error patterns:

\[ R_t = \alpha + v(\mathbf{B})A_t +N_t \]

where

$R_t, A_t$ stationary
$v(\mathbf{B})$ transfer function of advertising on referrals where $v(\mathbf{B}) = Cw(B)B^b / \delta(B)$ and $B$ is the backshift operator
$N_t = \dfrac{\theta(B)}{\phi(B)(1- B)^d} a_t$ is the ARIMA noise term, where $a_t \sim N(0, \sigma_a^2)$ is white noise

Advertising Effects (decay)

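Total effect of advertising = sum of the ad coefficients divided by (1 - sum of the lag-referral coefficients)

\[ \text{Total Effect} = \frac{\sum_{l = 0}^n \beta_l }{1- \sum_{j=1}^p \lambda_j} \]

where $l$ indexes the advertising lags, $j$ indexes the referral lags, and $\lambda_j$ is the coefficient on the $j$-th lag of referrals (written $\gamma_j$ in the model above)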

and the partial advertising effect at each time period is

\[ TA_{t-l} = \beta_l A_{t-l} + \sum_{j=0}^l \lambda_j TA_{t-l+j} \]
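
The total-effect ratio can be checked numerically: it should equal the sum of the period-by-period responses to a single ad pulse. The sketch below uses made-up coefficients (4 ad terms and 3 referral lags, mirroring the lag structure described earlier). They are not the paper's estimates, and the function names are hypothetical.

```python
def total_effect(ad_coefs, referral_lag_coefs):
    """Sum of ad coefficients divided by (1 - sum of lag-referral coefficients)."""
    return sum(ad_coefs) / (1.0 - sum(referral_lag_coefs))

def pulse_response(ad_coefs, referral_lag_coefs, horizon):
    """Per-period effect on referrals of a one-unit ad pulse at t = 0."""
    out = []
    for h in range(horizon):
        direct = ad_coefs[h] if h < len(ad_coefs) else 0.0
        carried = sum(g * out[h - j]
                      for j, g in enumerate(referral_lag_coefs, start=1)
                      if h - j >= 0)
        out.append(direct + carried)
    return out

betas = [0.4, 0.3, 0.2, 0.1]    # hypothetical ad coefficients
gammas = [0.3, 0.2, 0.1]        # hypothetical lag-referral coefficients

closed_form = total_effect(betas, gammas)
summed = sum(pulse_response(betas, gammas, horizon=200))
print(round(closed_form, 6), round(summed, 6))
```

Both numbers come out at about 2.5, so each unit of advertising in this toy setup eventually generates about 2.5 referrals.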

Results

Advertising effects dissipate after 8 hours
Ad effectiveness varies by station
Ad effectiveness also varies by creative

36.4.2 (Tellis and Franses 2006) Optimal Data Interval for estimating ad response (on sales)

Such a seminal paper
This could also be applied to finding the optimal interval for estimating the effect of firm announcements on stock performance.

Data that are too disaggregate do not lead to a disaggregation bias

Optimal interval is unit exposure time (not inter-purchase time)

Recovering the true estimates depends on the unit exposure time (rather than on assumptions about the advertising process)

Definition:

| Term | Definition |
|---|---|
| Data interval | Temporal level of the records |
| Inter-purchase time | Smallest calendar time between any two consumer purchases |
| Duration interval | Length of time that the advertising effect lasts |
| Calendar time | Discrete time period |
| Exposure time | Moment a pulse of advertising first hits a consumer |
| p% duration interval | Length of time that accounts for $p$% of the advertising effect |
| Current effect of ad | Portion of the total advertising effect that occurs in the same time period as the exposure |
| Duration interval bias | Carryover effect estimated at the true interval minus the carryover effect estimated on aggregate data |

Optimal interval balances between storage cost and estimate unbiasedness

Koyck model

$s_t, a_t$ are sales and ad at the true microdata interval

\[ s_t = \mu + \beta a_t + \beta \lambda a_{t-1} + \beta \lambda^2 a_{t-2} + \dots + \epsilon_t \]

where

$\epsilon \sim N(0, \sigma^2_\epsilon)$
$\beta$ = current effect of advertising
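$\beta/(1- \lambda)$ = long-run (total) effect of advertising, i.e., the current effect plus the carryover $\beta \lambda/(1- \lambda)$
$\lambda$ = carryover (decay) parameter, which determines the duration interval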

Using the (Koyck 1954) transformation (i.e., multiplying both sides by $1 - \lambda L$, where $L$ is the lag operator, $L^k y_t = y_{t-k}$) and suppressing the constant term $(1 - \lambda)\mu$, we get

\[ s_t = \lambda s_{t-1} + \beta a_t + \epsilon_t - \lambda \epsilon_{t-1} \]

For aggregate data, let $S_T$ denote aggregate sales: the sum of micro-level sales over the $K$ periods from $t-(K-1)$ to the current period $t$, sampled at the current period

\[ \begin{aligned} S_T &= s_t + s_{t-1}+ s_{t-2}+ \dots + s_{t-(K-1)} \\ & = (1 + L + L^2 + \dots + L^{K-1})s_t \end{aligned} \]

Hence,

\[ \begin{aligned} A_T &= (1 + L + L^2 + \dots + L^{K-1}) a_t \\ \epsilon_T &= (1 + L + L^2 + \dots + L^{K-1}) \epsilon_t \\ S_{T-1} &= (1 + L + L^2 + \dots + L^{K-1}) s_{t-K} \end{aligned} \]

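The true aggregate form of the micromodel, obtained by multiplying the transformed micromodel by $1 + \lambda L + \dots + \lambda^{K-1} L^{K-1}$ (which turns $1 - \lambda L$ into $1 - \lambda^K L^K$) and then by $1 + L + \dots + L^{K-1}$, is

\[ \begin{aligned} S_T &= \lambda^K S_{T-1} + \beta A_T + \beta \lambda (1 + \lambda L + \lambda^2 L^2 + \dots + \lambda^{K-2} L^{K-2}) \\ &\quad \times (1 + L + \dots + L^{K-1})a_{t-1} + \epsilon_T - \lambda^K \epsilon_{T-1} \end{aligned} \]

The bias stems from the fact that

\[ A_{T-1} \neq (1 + \lambda L + \lambda^2 L^2 + \dots + \lambda^{K-2} L^{K-2}) (1 + L + \dots + L^{K-1})a_{t-1} \]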

because the micro-level timing of advertising within each interval is lost in aggregation
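
The aggregation argument can be checked numerically. Written compactly (noise omitted, $\mu = 0$), the aggregate identity is $S_T - \lambda^K S_{T-1} = \beta \sum_{j=0}^{K-1} \lambda^j L^j A_T$, where $L^j A_T$ is the block sum of advertising shifted back $j$ micro periods. The sketch below verifies this on simulated noise-free data and shows that the carryover part is not a multiple of $A_{T-1}$. All values and the helper `block_sums` are assumptions for illustration.

```python
import numpy as np

def block_sums(x, K, lag=0, first_block=1):
    """Sum x over blocks of K micro periods, each block shifted back by `lag`."""
    n_blocks = len(x) // K
    return np.array([x[T * K - lag:(T + 1) * K - lag].sum()
                     for T in range(first_block, n_blocks)])

rng = np.random.default_rng(1)
K, lam, beta, n_micro = 4, 0.7, 1.5, 400     # hypothetical values
a = rng.uniform(0.0, 1.0, n_micro)           # micro-level advertising
s = np.zeros(n_micro)                        # noise-free micro sales, mu = 0
for t in range(n_micro):
    s[t] = (lam * s[t - 1] if t > 0 else 0.0) + beta * a[t]

S = block_sums(s, K, first_block=0)          # aggregate sales S_T
A = block_sums(a, K, first_block=0)          # aggregate advertising A_T

lhs = S[1:] - lam**K * S[:-1]
rhs = beta * sum(lam**j * block_sums(a, K, lag=j) for j in range(K))

# The part of rhs beyond beta * A_T is the carryover term; try to match it
# with the best-fitting multiple of A_{T-1} and record the worst miss.
carryover_term = rhs - beta * A[1:]
c = (A[:-1] @ carryover_term) / (A[:-1] @ A[:-1])
max_gap = np.max(np.abs(carryover_term - c * A[:-1]))
print(np.allclose(lhs, rhs), max_gap)
```

The identity holds exactly, while no single coefficient on $A_{T-1}$ reproduces the carryover term, which is the source of the bias when only aggregate advertising is observed.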

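With the optimal data interval (one exposure pulse per interval), we can recover the carryover effect

\[ \frac{\beta_1 + \beta_2}{1 - \lambda^K} \]

where $\beta_1$ and $\beta_2$ are the coefficients on current and lagged aggregate advertising ($A_T$ and $A_{T-1}$). The true micro-level $\lambda$, which determines the duration interval, is the $K$-th root of the estimated coefficient on $S_{T-1}$

\[ \sqrt[K]{\widehat{\lambda^K}} \]

and the current effect is $\beta$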
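
A minimal sketch of the recovery step, assuming weekly data built from daily micro periods ($K = 7$) and a made-up coefficient on $S_{T-1}$. The function names and numbers are illustrative only.

```python
import math

def recover_micro_lambda(coef_lagged_sales, K):
    """K-th root of the coefficient on S_{T-1}, which estimates lambda**K."""
    if not 0.0 < coef_lagged_sales < 1.0:
        raise ValueError("coefficient must lie strictly between 0 and 1")
    return coef_lagged_sales ** (1.0 / K)

def duration_interval(lam, p=0.9):
    """Number of micro periods needed to realise a share p of the effect."""
    return math.log(1.0 - p) / math.log(lam)

weekly_coef = 0.8 ** 7                        # hypothetical estimate of lambda**K
lam_day = recover_micro_lambda(weekly_coef, K=7)
days_90 = duration_interval(lam_day, p=0.9)
print(round(lam_day, 4), round(days_90, 2))
```

Here the weekly coefficient of about 0.21 corresponds to a daily carryover of 0.8, and 90% of the effect is realised in a little over 10 days.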

When the data are even more disaggregate than the optimal interval, we just have to adjust the formula to recover the true effects.

36.4.3 (T. S. Teixeira, Wedel, and Pieters 2010) Ad Pulsing to prevent consumer ad avoidance

Model: probit with MCMC
Data: eye-tracking on 31 commercials for 2000 participants.
New metric to predict attention dispersion based on eye-tracking data.
Optimization of ads:
- Problem: minimize avoidance subject to a given level of brand activity
- Solution: Pulsing

36.4.4 (Sethuraman, Tellis, and Briesch 2011) Advertising effectiveness meta-analysis

Data: 1960 - 2008, 56 studies.

Average short-term ad elasticity is .12

Advertising elasticity has declined over time.

Advertising elasticity is higher

for durable goods (vs. nondurables)
in the early stage than in the mature stage of the life cycle
with yearly data than with quarterly data
when advertising is measured in gross rating points rather than in monetary terms

Long-term ad elasticity is .24
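
To read these numbers, recall the usual definition of advertising elasticity $\eta$ as the percentage change in sales per one percent change in advertising. The 10% increase below is a made-up example, not a figure from the paper.

\[ \eta = \frac{\partial S / S}{\partial A / A} \quad \Rightarrow \quad \frac{\Delta S}{S} \approx \eta \frac{\Delta A}{A} \]

\[ \text{Short term: } 0.12 \times 10\% = 1.2\% \qquad \text{Long term: } 0.24 \times 10\% = 2.4\% \]

So, at the average elasticities, a 10% increase in advertising goes with roughly a 1.2% increase in sales in the short term and 2.4% in the long term.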

36.4.5 (Liaukonyte, Teixeira, and Wilbur 2015) TV advertising on online shopping

Impression merging process: human coders
Data: $3.4 bil spending by 20 brands, consists of traffic and transactions and content measures for 1,2224 commercials.
Diff-in-diff: 2-minute pre/post windows (similar to regression discontinuity)
Action-focus content increases direct website traffic and sales conditional on visitation
Info- and emotion-focus content reduces web traffic but increases purchases, with a positive net effect on sales for most brands.
Imagery-focus ad content decreases direct traffic to the website
After the TV ad
1. the consumer chooses whether to visit the website
2. the consumer then decides whether to buy a product
Data:
- Online traffic: comScore Media Metrix
  - Direct traffic
  - Search engine referrals
  - Transaction Count
- TV Ad Data: Kantar Media
The argument for no endogeneity problem is that brands cannot manipulate the exact time an ad airs (the ad is placed within a 15-minute window, while the research design uses 4-minute windows). For the 2-hour window, the authors use a diff-in-diff design in which they pick the largest brands within each product category that did not advertise
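
The diff-in-diff logic with short pre/post windows can be sketched as follows. The visit counts are invented for illustration, with each pair holding the website visits in the window before and after one ad airing. The function name `diff_in_diff` and the data layout are assumptions, not the authors' implementation.

```python
from statistics import mean

def diff_in_diff(treated, control):
    """Mean post-minus-pre change for treated units minus the same for controls.

    treated, control: lists of (pre, post) outcome pairs, one per ad airing.
    """
    treated_change = mean([post - pre for pre, post in treated])
    control_change = mean([post - pre for pre, post in control])
    return treated_change - control_change

# Hypothetical visits in the windows around three ad airings
advertiser_visits = [(120, 150), (100, 128), (90, 113)]
non_advertiser_visits = [(80, 84), (75, 80), (70, 76)]

effect = diff_in_diff(advertiser_visits, non_advertiser_visits)
print(effect)
```

The control brand's change absorbs whatever moves traffic for everyone at that moment, so the remaining 22 visits per airing are attributed to the ad in this toy example.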

36.4.6 (Tirunillai and Tellis 2017) TV ad on Online chatter: synthetic control

Raw metrics

Reviews: from Amazon, Epinions, cnet, twitter, YouTube, Facebook
1. Volume of reviews
2. valence of the review (positive vs. negative)
3. Polarity (entropy)
Blogs: from Spinn3r
1. Volume
2. In-degree (links) of the brand website
3. In-degree (links) of blog posts
4. Volume of blogs that gain/lose rank

Using Dynamic factor analysis

\[ Y_t = \xi f_t + \epsilon_t \\ f_t = \Psi f_{t-1} + \eta_t \]

where

$Y_t$ raw measure of reviews and blogs
$f_t$ is the underlying factors
$\xi$ is the factors loadings
$\epsilon$ idiosyncratic error
$\eta$ = white noise where $E(\epsilon_t \eta'_{t-k})=0$
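
To see what the two equations imply for the data, the sketch below simulates a small dynamic factor system: latent factors follow a first-order autoregression and each raw metric is a loading-weighted mix of the factors plus idiosyncratic noise. The loadings, transition matrix, and dimensions are hypothetical. Estimating the factors from real data (typically with a state-space method) is not shown here.

```python
import numpy as np

def simulate_dfm(xi, psi, n_periods, noise_sd=0.1, seed=0):
    """Simulate Y_t = xi f_t + eps_t with f_t = psi f_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    n_series, n_factors = xi.shape
    f = np.zeros((n_periods, n_factors))
    y = np.zeros((n_periods, n_series))
    for t in range(n_periods):
        prev = f[t - 1] if t > 0 else np.zeros(n_factors)
        f[t] = psi @ prev + rng.normal(size=n_factors)               # eta_t
        y[t] = xi @ f[t] + rng.normal(scale=noise_sd, size=n_series)  # eps_t
    return y, f

# Hypothetical loadings: 4 raw metrics on 2 latent factors
xi = np.array([[1.0, 0.0],
               [0.8, 0.0],
               [0.0, 1.0],
               [0.0, 0.6]])
psi = np.array([[0.7, 0.0],
                [0.0, 0.4]])   # factor persistence

y, f = simulate_dfm(xi, psi, n_periods=500)
corr = np.corrcoef(y[:, 0], f[:, 0])[0, 1]
print(y.shape, f.shape, round(corr, 3))
```

Metrics that load on the same factor move together, which is what lets a handful of factors summarise many raw chatter measures.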

Dimension of chatter (using dynamic factor analysis)

Content-based dimensions:
- Popularity: loads on volume of reviews and blogs
- Negativity: loads positively on negative valence and polarity and negatively on positive valence
Information spread dimensions:
- Visibility: loads on the volume of blogs and the in-degree links of the brand website
- Virality: loads on volume of blogs that gained rank and in-degree of the blogs

TV advertising has a short-lived positive causal effect on online chatter (larger for the information-spread dimensions than for the content-based dimensions)

Ad can reduce the negativity in online chatter in the short-term.

Ad can

Stimulate conversation online
trigger brand recall
Interpreting experience: give more favorable assessment toward the brand
Refute negatives: greater credibility and persuasiveness

Empirical setting: the "Let's Do Amazing" campaign (ad duration), with a post-period of 20 days after the campaign date.

Method:

Synthetic control (synthetic brand): the difference might already account for the spillover effect of the focal brands on other brands in the same industry (authors argue that there was no spillover effect).
No justification for 70 days before and 20 days after
To make sure YouTube did not affect much, the authors use data from Visible Measures to assess viewership, and TV viewership from https://tvlistings.zap2it.com/?aid=gapzap and Nielsen TV Ratings and Stradegy (need to ask about this company).
Authors also use Vector Auto-regressive model to examine the short-term and long-term dynamics between the dependent (chatter metrics) and independent variables (advertising).
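
The core of the synthetic control idea can be sketched in a few lines: choose non-negative donor weights that sum to one so that the weighted donors track the focal brand before the campaign, then read the effect off the post-campaign gap. This is a simplified sketch, not the authors' implementation: it matches on the outcome series only, uses projected gradient descent, and runs on simulated data with a made-up effect of 2. The 70/20 split mirrors the window lengths noted above.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(donors_pre, treated_pre, n_iter=2000):
    """Minimise ||treated_pre - donors_pre @ w||^2 over the simplex."""
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    step = 1.0 / np.linalg.norm(donors_pre, 2) ** 2   # 1 / largest singular value squared
    for _ in range(n_iter):
        grad = donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(0)
n_pre, n_post = 70, 20
trend = np.linspace(0.0, 5.0, n_pre + n_post)[:, None]
donors = rng.normal(size=(n_pre + n_post, 4)) + trend   # hypothetical donor brands
w_true = np.array([0.5, 0.3, 0.2, 0.0])
treated = donors @ w_true
treated[n_pre:] += 2.0                                  # simulated campaign effect

w_hat = synthetic_control_weights(donors[:n_pre], treated[:n_pre])
effect = np.mean(treated[n_pre:] - donors[n_pre:] @ w_hat)
print(np.round(w_hat, 3), round(effect, 3))
```

Because the simulated focal brand is an exact mix of the donors before the campaign, the weights and the effect are recovered almost exactly; with real chatter data the pre-period fit is only approximate.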

References

Clarke, Darral G. 1976. “Econometric Measurement of the Duration of Advertising Effect on Sales.” Journal of Marketing Research 13 (4): 345. https://doi.org/10.2307/3151017.

Koyck, Leendert Marinus. 1954. Distributed Lags and Investment Analysis. Vol. 4. North-Holland Publishing Company.

Liaukonyte, Jura, Thales Teixeira, and Kenneth C. Wilbur. 2015. “Television Advertising and Online Shopping.” Marketing Science 34 (3): 311–30. https://doi.org/10.1287/mksc.2014.0899.

Sethuraman, Raj, Gerard J. Tellis, and Richard A. Briesch. 2011. “How Well Does Advertising Work? Generalizations from Meta-Analysis of Brand Advertising Elasticities.” Journal of Marketing Research 48 (3): 457–71. https://doi.org/10.1509/jmkr.48.3.457.

Teixeira, Thales S., Michel Wedel, and Rik Pieters. 2010. “Moment-to-Moment Optimal Branding in TV Commercials: Preventing Avoidance by Pulsing.” Marketing Science 29 (5): 783–804. https://doi.org/10.1287/mksc.1100.0567.

Tellis, Gerard J., Rajesh K. Chandy, and Pattana Thaivanich. 2000. “Which Ad Works, When, Where, and How Often? Modeling the Effects of Direct Television Advertising.” Journal of Marketing Research 37 (1): 32–46. https://doi.org/10.1509/jmkr.37.1.32.18716.

Tellis, Gerard J., and Philip Hans Franses. 2006. “Optimal Data Interval for Estimating Advertising Response.” Marketing Science 25 (3): 217–29. https://doi.org/10.1287/mksc.1050.0178.

Tirunillai, Seshadri, and Gerard J. Tellis. 2017. “Does Offline TV Advertising Affect Online Chatter? Quasi-Experimental Analysis Using Synthetic Control.” Marketing Science 36 (6): 862–78. https://doi.org/10.1287/mksc.2017.1040.