19.6 Google Trends
Online search volume and news reporting are strongly correlated, while academic publishing lags behind online public interest because of its delayed review and publication process (Nghiem et al. 2016). This result was interpreted to mean that news serves as a conduit between the research community and the public.
Google Trends only returns data at a granularity determined by the length of the requested time window:
| Data Granularity | Data Window |
|---|---|
| Hourly Data | Last 7 days |
| Daily Data | Less than 9 months |
| Weekly Data | Between 9 months and 5 years |
| Monthly Data | Longer than 5 years |
Since the data are indexed to the chosen time window, querying the same keywords over different time windows might yield different index values.
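For example, the same keyword queried over a short and a long window returns different granularities and different index values (the keyword and dates here are only illustrative):
library(gtrendsR)
# a window under 9 months returns daily data; a multi-year window
# returns weekly data, and each series is indexed to its own maximum
short <- gtrends("3m", geo = "US", time = "2021-01-01 2021-06-30")
long <- gtrends("3m", geo = "US", time = "2018-01-01 2023-01-01")
# index values for overlapping dates generally differ between the two
head(short$interest_over_time)
head(long$interest_over_time)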
19.6.1 Relative Search
library(gtrendsR)
# if you are behind a proxy, configure the handle first
# setHandleParameters(
# user = "xxxx",
# password = "*******",
# domain = "mydomain",
# proxyhost = "10.111.124.113",
# proxyport = 8080
# )
keywords = c("7eleven", "3m")
country = c("US")
time = "2010-01-01 2012-01-30" # earliest available data is January 2004
channel = "web"
trends = gtrends(keyword = keywords, geo = country, time = time, gprop = channel)
# components of the returned object
names(trends)
# for this roughly two-year window, interest_over_time holds weekly data
time_trend <- trends$interest_over_time
The hits value is an index relative to the peak search volume in the chosen window (scaled 0 to 100), not an absolute search volume.
The gprop argument sets the channel type: "web" (default), "news", "images", "froogle", or "youtube".
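For instance, reusing the objects defined above, a query restricted to YouTube search instead of web search:
# same keywords, geography, and window, but YouTube search volume
yt_trends <- gtrends(keyword = keywords, geo = country,
                     time = time, gprop = "youtube")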
library(ggplot2)
plot <-
ggplot(data = time_trend, aes(
x = date,
y = hits,
group = keyword,
col = keyword
)) +
geom_line() + xlab('Time') + ylab('Relative Interest (weekly)') + theme_bw() +
theme(
legend.title = element_blank(),
legend.position = "bottom",
legend.text = element_text(size = 12)
) + ggtitle("Google Search Volume")
plot
We can smooth the series to remove seasonality:
plot <-
ggplot(data = time_trend, aes(
x = date,
y = hits,
group = keyword,
col = keyword
)) +
geom_smooth(span = 0.5, se = FALSE) + xlab('Time') + ylab('Relative Interest') +
theme_bw() + theme(
legend.title = element_blank(),
legend.position = "bottom",
legend.text = element_text(size = 12)
) + ggtitle("Google Search Volume")
plot
Alternatively, we can readily use the plot method that the gtrendsR package provides for gtrends objects:
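# the built-in plot method for gtrends objects (itself based on ggplot2)
plot(trends)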
- Scaling method (overlapping method, recommended). A way to get daily data from gtrendsR readily, proposed by Alex Dyachenko and implemented in the get_daily_gtrend() function below:
  1. Get daily estimates for windows shorter than 9 months.
  2. Get monthly estimates for your desired time frame.
  3. Multiply the daily estimates for each month from step 1 by their weights from step 2.
- Concatenation method (normalization/dailydata method). Daily data are concatenated from 1-month queries and normalized by the weekly trends data, which has been done in this post; a rough sketch follows this list.
The comparison in this post suggests the scaling method is more accurate than the normalization method.
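Below is a minimal sketch of the concatenation idea, not the exact code from the post. It assumes a single keyword and a full window longer than 9 months (so the reference query returns weekly data), and it rescales each month's daily series so its mean matches that month's weekly mean; concat_daily_gtrend and its defaults are our own illustrative names.
library(gtrendsR)
library(dplyr)
library(lubridate)
library(purrr)
concat_daily_gtrend <- function(keyword = '7eleven', geo = 'US',
                                from = '2013-01-01', to = '2014-06-30') {
  # reference series: window > 9 months, so Google returns weekly data
  weekly <- gtrends(keyword, geo = geo,
                    time = paste(from, to))$interest_over_time %>%
    mutate(hits = as.integer(ifelse(hits == '<1', '0', hits)),
           ym = format(date, '%Y-%m'))
  # daily series: one 1-month query at a time, concatenated
  starts <- seq(ymd(from), ymd(to), by = 'month')
  daily <- map_dfr(starts, function(s) {
    gtrends(keyword, geo = geo,
            time = paste(s, s + months(1) - days(1)))$interest_over_time
  }) %>%
    mutate(hits = as.integer(ifelse(hits == '<1', '0', hits)),
           ym = format(date, '%Y-%m'))
  # normalize: rescale each month's daily values so their mean equals
  # that month's mean in the weekly reference series
  weights <- weekly %>% group_by(ym) %>% summarise(w = mean(hits))
  daily %>%
    group_by(ym) %>%
    mutate(d = mean(hits)) %>%
    ungroup() %>%
    left_join(weights, by = 'ym') %>%
    mutate(est_hits = ifelse(d > 0, hits * w / d, 0)) %>%
    select(date, keyword, est_hits)
}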
Rate limit:
- 1,400 requests in 4 hours
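To stay under this limit in a long scrape, you can simply space out the calls. A sketch, with our own back-of-the-envelope pause (4 hours divided by 1,400 requests is roughly one request every 10.3 seconds):
library(gtrendsR)
results <- list()
for (kw in c('7eleven', '3m')) { # illustrative keywords
  results[[kw]] <- gtrends(kw, geo = 'US', time = '2013-01-01 2013-02-01')
  Sys.sleep(11) # ~1 request per 11 s keeps us under 1,400 per 4 hours
}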
library(gtrendsR)
library(tidyverse)
library(lubridate)
get_daily_gtrend <-
  function(keyword = c('7eleven', '3M'),
           geo = 'US',
           from = '2013-01-01',
           to = '2013-02-15') {
    # data for the current (incomplete) month is not final yet,
    # so pull 'to' back to the end of the previous month
    if (ymd(to) >= floor_date(Sys.Date(), 'month')) {
      to <- floor_date(ymd(to), 'month') - days(1)
      if (to < from) {
        stop("Specifying 'to' date in the current month is not allowed")
      }
    }
    # step 2 of the scaling method: query the whole window once, sum the
    # hits to monthly totals, and scale by the busiest month to get weights
    aggregated_data <-
      gtrends(keyword = keyword,
              geo = geo,
              time = paste(from, to))
    if (is.null(aggregated_data$interest_over_time)) {
      print('There is no data in Google Trends!')
      return()
    }
    mult_m <- aggregated_data$interest_over_time %>%
      mutate(hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
      group_by(month = floor_date(date, 'month'), keyword) %>%
      summarise(hits = sum(hits)) %>%
      ungroup() %>%
      mutate(ym = format(month, '%Y-%m'),
             mult = hits / max(hits)) %>%
      select(month, ym, keyword, mult) %>%
      as_tibble()
    # start and end dates of each 1-month query window
    pm <- tibble(
      s = seq(ymd(from), ymd(to), by = 'month'),
      e = seq(ymd(from), ymd(to), by = 'month') + months(1) - days(1)
    )
    # step 1: daily estimates, retrieved one 1-month window at a time
    raw_trends_m <- tibble()
    for (i in seq(1, nrow(pm), 1)) {
      curr <- gtrends(keyword,
                      geo = geo,
                      time = paste(pm$s[i], pm$e[i]))
      if (is.null(curr$interest_over_time))
        next
      print(paste(
        'for',
        pm$s[i],
        pm$e[i],
        'retrieved',
        nrow(curr$interest_over_time),
        'days of data (all keywords)'
      ))
      raw_trends_m <- rbind(raw_trends_m,
                            curr$interest_over_time)
    }
    trend_m <- raw_trends_m %>%
      select(date, keyword, hits) %>%
      mutate(ym = format(date, '%Y-%m'),
             hits = as.integer(ifelse(hits == '<1', '0', hits))) %>%
      as_tibble()
    # step 3: multiply each daily estimate by its month's weight
    trend_res <- trend_m %>%
      left_join(mult_m, by = c('keyword', 'ym')) %>%
      mutate(est_hits = hits * mult) %>%
      select(date, keyword, est_hits) %>%
      as_tibble() %>%
      mutate(date = as.Date(date))
    return(trend_res)
  }
daily_trend <- get_daily_gtrend(
keyword = c('7eleven', '3M'),
geo = 'US',
from = '2013-01-01',
to = '2013-02-01'
)
head(daily_trend)
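The reconstructed daily series can be plotted just like the weekly series above (daily_trend comes from the get_daily_gtrend() call):
library(ggplot2)
ggplot(data = daily_trend, aes(
  x = date,
  y = est_hits,
  group = keyword,
  col = keyword
)) +
  geom_line() + xlab('Time') + ylab('Estimated Daily Interest') + theme_bw() +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  ggtitle("Google Search Volume (daily, rescaled)")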
This method was used in a research paper (Risteski and Davcev 2014).
Similarly, without the gtrendsR package, you can follow Erik Johansson’s method; however, Google no longer serves the URL he used (http://www.google.com/trends/trendsReport?hl=en-US&q=), so you might have to figure out the new endpoint.
19.6.2 Absolute Search
See the Google Trends dataset available via Google BigQuery.
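A minimal sketch of querying that dataset from R with the bigrquery package, assuming a Google Cloud project with billing enabled; "your-gcp-project" is a placeholder, and bigquery-public-data.google_trends.top_terms is Google's public Google Trends table:
library(bigrquery)
# SQL against the public Google Trends dataset: the latest top terms
sql <- "
  SELECT term, week, rank, score
  FROM `bigquery-public-data.google_trends.top_terms`
  WHERE refresh_date = (
    SELECT MAX(refresh_date)
    FROM `bigquery-public-data.google_trends.top_terms`
  )
  ORDER BY week DESC, rank
  LIMIT 25
"
tb <- bq_project_query("your-gcp-project", sql) # placeholder project id
bq_table_download(tb)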