19.6 Google Trends
Online search volume and news reporting are strongly correlated, while academic publishing lags behind online public interest because of its delayed review and publication process (Nghiem et al. 2016). This result was interpreted to mean that news serves as a conduit between the research community and the public.
Google Trends only returns data at a granularity determined by the length of the requested time window:
| Data Granularity | Data Window |
|---|---|
| Hourly Data | Last 7 days |
| Daily Data | Less than 9 months |
| Weekly Data | Between 9 months and 5 years |
| Monthly Data | Longer than 5 years |
Since the data are indexed to the chosen time window, querying the same keywords over different time windows might yield different index values.
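For example, the same keyword queried over a short and a long window returns different granularities and different index values (the keyword and dates here are only illustrative):
library(gtrendsR)
# a window under 9 months returns daily data; a multi-year window
# returns weekly data, and each series is indexed to its own maximum
short <- gtrends("3m", geo = "US", time = "2021-01-01 2021-06-30")
long <- gtrends("3m", geo = "US", time = "2018-01-01 2023-01-01")
# index values for overlapping dates generally differ between the two
head(short$interest_over_time)
head(long$interest_over_time)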
19.6.1 Relative Search
library(gtrendsR)
# if you are behind a proxy, configure the handle first
# setHandleParameters(
# user = "xxxx",
# password = "*******",
# domain = "mydomain",
# proxyhost = "10.111.124.113",
# proxyport = 8080
# )
keywords = c("7eleven", "3m")
country = c("US")
time = "2010-01-01 2012-01-30" # earliest available data is January 2004
channel = "web"
trends = gtrends(keyword = keywords, geo = country, time = time, gprop = channel)
# components of the returned object
names(trends)
# for this roughly two-year window, interest_over_time holds weekly data
time_trend <- trends$interest_over_time
The hits value is an index relative to the peak search volume in the chosen window (scaled 0 to 100), not an absolute search volume.
The gprop argument sets the channel type: "web" (default), "news", "images", "froogle", or "youtube".
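For instance, reusing the objects defined above, a query restricted to YouTube search instead of web search:
# same keywords, geography, and window, but YouTube search volume
yt_trends <- gtrends(keyword = keywords, geo = country,
                     time = time, gprop = "youtube")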
library(ggplot2)
plot <-
ggplot(data = time_trend, aes(
x = date,
y = hits,
group = keyword,
col = keyword
)) +
geom_line() + xlab('Time') + ylab('Relative Interest (weekly)') + theme_bw() +
theme(
legend.title = element_blank(),
legend.position = "bottom",
legend.text = element_text(size = 12)
) + ggtitle("Google Search Volume")
plot
We can smooth the series to remove seasonality:
plot <-
ggplot(data = time_trend, aes(
x = date,
y = hits,
group = keyword,
col = keyword
)) +
geom_smooth(span = 0.5, se = FALSE) + xlab('Time') + ylab('Relative Interest') +
theme_bw() + theme(
legend.title = element_blank(),
legend.position = "bottom",
legend.text = element_text(size = 12)
) + ggtitle("Google Search Volume")
plot
Alternatively, we can readily use the plot method that the gtrendsR package provides for gtrends objects:
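# the built-in plot method for gtrends objects (itself based on ggplot2)
plot(trends)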
- Scaling method (overlapping method, recommended). A way to get daily data from gtrendsR readily, proposed by Alex Dyachenko and implemented in the get_daily_gtrend() function below:
  1. Get daily estimates for windows shorter than 9 months.
  2. Get monthly estimates for your desired time frame.
  3. Multiply the daily estimates for each month from step 1 by their weights from step 2.
- Concatenation method (normalization/dailydata method). Daily data are concatenated from 1-month queries and normalized by the weekly trends data, which has been done in this post; a rough sketch follows this list.
The comparison in this post suggests the scaling method is more accurate than the normalization method.
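Below is a minimal sketch of the concatenation idea, not the exact code from the post. It assumes a single keyword and a full window longer than 9 months (so the reference query returns weekly data), and it rescales each month's daily series so its mean matches that month's weekly mean; concat_daily_gtrend and its defaults are our own illustrative names.
library(gtrendsR)
library(dplyr)
library(lubridate)
library(purrr)
concat_daily_gtrend <- function(keyword = '7eleven', geo = 'US',
                                from = '2013-01-01', to = '2014-06-30') {
  # reference series: window > 9 months, so Google returns weekly data
  weekly <- gtrends(keyword, geo = geo,
                    time = paste(from, to))$interest_over_time %>%
    mutate(hits = as.integer(ifelse(hits == '<1', '0', hits)),
           ym = format(date, '%Y-%m'))
  # daily series: one 1-month query at a time, concatenated
  starts <- seq(ymd(from), ymd(to), by = 'month')
  daily <- map_dfr(starts, function(s) {
    gtrends(keyword, geo = geo,
            time = paste(s, s + months(1) - days(1)))$interest_over_time
  }) %>%
    mutate(hits = as.integer(ifelse(hits == '<1', '0', hits)),
           ym = format(date, '%Y-%m'))
  # normalize: rescale each month's daily values so their mean equals
  # that month's mean in the weekly reference series
  weights <- weekly %>% group_by(ym) %>% summarise(w = mean(hits))
  daily %>%
    group_by(ym) %>%
    mutate(d = mean(hits)) %>%
    ungroup() %>%
    left_join(weights, by = 'ym') %>%
    mutate(est_hits = ifelse(d > 0, hits * w / d, 0)) %>%
    select(date, keyword, est_hits)
}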
Rate limit:
- 1,400 requests in 4 hours
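To stay under this limit in a long scrape, you can simply space out the calls. A sketch, with our own back-of-the-envelope pause (4 hours divided by 1,400 requests is roughly one request every 10.3 seconds):
library(gtrendsR)
results <- list()
for (kw in c('7eleven', '3m')) { # illustrative keywords
  results[[kw]] <- gtrends(kw, geo = 'US', time = '2013-01-01 2013-02-01')
  Sys.sleep(11) # ~1 request per 11 s keeps us under 1,400 per 4 hours
}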
library(gtrendsR)
library(tidyverse)
library(lubridate)
get_daily_gtrend <-
  function(keyword = c('7eleven', '3M'),
           geo = 'US',
           from = '2013-01-01',
           to = '2013-02-15') {
    # data for the current (incomplete) month is not final yet,
    # so pull 'to' back to the end of the previous month
    if (ymd(to) >= floor_date(Sys.Date(), 'month')) {
      to <- floor_date(ymd(to), 'month') - days(1)
      if (to < from) {
        stop("Specifying 'to' date in the current month is not allowed")
      }
    }
    # step 2 of the scaling method: query the whole window once, sum the
    # hits to monthly totals, and scale by the busiest month to get weights
    aggregated_data <-
      gtrends(keyword = keyword,
              geo = geo,
              time = paste(from, to))
    if (is.null(aggregated_data$interest_over_time)) {
      print('There is no data in Google Trends!')
      return()
    }
    mult_m <- aggregated_data$interest_over_time %>%
      mutate(hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
      group_by(month = floor_date(date, 'month'), keyword) %>%
      summarise(hits = sum(hits)) %>%
      ungroup() %>%
      mutate(ym = format(month, '%Y-%m'),
             mult = hits / max(hits)) %>%
      select(month, ym, keyword, mult) %>%
      as_tibble()
    # start and end dates of each 1-month query window
    pm <- tibble(
      s = seq(ymd(from), ymd(to), by = 'month'),
      e = seq(ymd(from), ymd(to), by = 'month') + months(1) - days(1)
    )
    # step 1: daily estimates, retrieved one 1-month window at a time
    raw_trends_m <- tibble()
    for (i in seq(1, nrow(pm), 1)) {
      curr <- gtrends(keyword,
                      geo = geo,
                      time = paste(pm$s[i], pm$e[i]))
      if (is.null(curr$interest_over_time))
        next
      print(paste(
        'for',
        pm$s[i],
        pm$e[i],
        'retrieved',
        nrow(curr$interest_over_time),
        'days of data (all keywords)'
      ))
      raw_trends_m <- rbind(raw_trends_m,
                            curr$interest_over_time)
    }
    trend_m <- raw_trends_m %>%
      select(date, keyword, hits) %>%
      mutate(ym = format(date, '%Y-%m'),
             hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
      as_tibble()
    # step 3: multiply each daily estimate by its month's weight
    trend_res <- trend_m %>%
      left_join(mult_m, by = c('keyword', 'ym')) %>%
      mutate(est_hits = hits * mult) %>%
      select(date, keyword, est_hits) %>%
      as_tibble() %>%
      mutate(date = as.Date(date))
    return(trend_res)
  }
daily_trend <- get_daily_gtrend(
keyword = c('7eleven', '3M'),
geo = 'US',
from = '2013-01-01',
to = '2013-02-01'
)
head(daily_trend)
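The reconstructed daily series can be plotted just like the weekly series above (daily_trend comes from the get_daily_gtrend() call):
library(ggplot2)
ggplot(data = daily_trend, aes(
  x = date,
  y = est_hits,
  group = keyword,
  col = keyword
)) +
  geom_line() + xlab('Time') + ylab('Estimated Daily Interest') + theme_bw() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  ggtitle("Google Search Volume (daily, rescaled)")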
This method was used in a research paper (Risteski and Davcev 2014).
Similarly, without the gtrendsR package, you can follow Erik Johansson’s method; however, Google no longer serves the URL he used (http://www.google.com/trends/trendsReport?hl=en-US&q=), so you might have to figure out the new endpoint.
19.6.2 Absolute Search
See the Google Trends dataset available via Google BigQuery.
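A minimal sketch of querying that dataset from R with the bigrquery package, assuming a Google Cloud project with billing enabled; "your-gcp-project" is a placeholder, and bigquery-public-data.google_trends.top_terms is Google's public Google Trends table:
library(bigrquery)
# SQL against the public Google Trends dataset: the latest top terms
sql <- "
  SELECT term, week, rank, score
  FROM `bigquery-public-data.google_trends.top_terms`
  WHERE refresh_date = (
    SELECT MAX(refresh_date)
    FROM `bigquery-public-data.google_trends.top_terms`
  )
  ORDER BY week DESC, rank
  LIMIT 25
"
tb <- bq_project_query("your-gcp-project", sql) # placeholder project id
bq_table_download(tb)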