You will need to install the following packages for this chapter (run the code):
    # install.packages('pacman')
    library(pacman)
    p_load('httr', 'gtrendsR', 'ggplot2', 'dplyr')
- What data/service is provided by the API?
The API is provided by Google.
With Google Trends, one gets access to a largely unfiltered sample of actual search topics (up to 36 hours before your search) and a filtered and representative sample of search topics older than 36 hours, going back to 2004. The data is anonymized and can be obtained from different Google products (“Web search”, “News”, “Images”, “Shopping” and “YouTube”). It can be filtered by category to get the data for the intended meaning of a word, and it is aggregated, which means that the searches of all cities/regions are aggregated to the federal state, country or world level. The results are a standardized measure of search volume for single search terms, for a combination of search terms using operators (see table below), or for comparisons (one input in relation to the other inputs) over a selected time period. Google calculates how much search volume a search term or query had in each region, relative to all searches in that region. Using this information, Google assigns a measure of popularity to search terms (on a scale from 0 to 100), leaving out repeated searches from the same person over a short period of time as well as searches with apostrophes and other special characters.
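The scaling described above can be illustrated with a small sketch. The counts are invented for illustration, and the two-step calculation (share of all searches, then rescaling so that the highest share equals 100) is an assumption based on the description above, not Google's published implementation.

```r
# Hypothetical weekly counts for one region (made-up numbers, not real data)
term_searches  <- c(120, 300, 450, 150)          # searches for the term
total_searches <- c(10000, 12000, 15000, 10000)  # all searches in the region

# Step 1: share of the term among all searches in each week
share <- term_searches / total_searches

# Step 2: rescale so that the week with the highest share gets the value 100
hits <- round(100 * share / max(share))

print(hits)
```

Because the values are relative, a value of 50 would mean half of the peak share in the selected period, not 50 searches.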
|Operator (example)|Result|
|---|---|
|No quotation marks (e.g. Corona symptoms)|You get results for each word in your query|
|Quotation marks (e.g. “Corona symptoms”)|You get results for the exact search phrase|
|Plus sign (e.g. corona +covid)|Acts as an OR operator|
|Minus sign (e.g. corona -symptoms)|Excludes the word after the operator|
- What are the prerequisites to access the API (authentication)?
Google Trends can be used by anyone for free, without an API key, directly in the web browser (no sign-up needed).
- What does a simple API call look like?
Just click here.
- How can we access the API from R (httr + other packages)?
Example using the “httr” package:
    library(httr)
    GET("https://trends.google.com/trends/explore", query = list(q = "Covid", geo = "US"))
- This call returns only HTML output, so we recommend using the gtrendsR package instead.
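To see which URL such a request resolves to without actually sending it, the query list can be turned into a URL first. The following is a minimal sketch using `modify_url()` and `parse_url()` from httr; the parameters `q` and `geo` are taken from the example above, and nothing here contacts Google.

```r
library(httr)

# Build the request URL without sending it (no network access needed)
url <- modify_url(
  "https://trends.google.com/trends/explore",
  query = list(q = "Covid", geo = "US")
)
print(url)

# Split the URL again to inspect the individual query parameters
parsed <- parse_url(url)
print(parsed$query)
```

Pasting the printed URL into a browser should correspond to the simple call made in the browser.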
Example using the “gtrendsR” package:
    # visualizing google searches for the word "corona symptoms" in
    # Germany and Austria in the period 01/01/2020 - 27/04/2021
    library(gtrendsR)
    library(ggplot2)
    library(dplyr)
    data("countries")  # get abbreviations of all countries to filter data
    data("categories") # get numbers of all categories to filter data
    # Simple call (no time argument, so the default time range of gtrends() is used)
    res <- gtrends("corona symptome", geo = c("DE", "AT"))
    plot(res)
- Note (1): Passing a vector created with c() to the keyword argument of the gtrends function allows comparisons of up to five searches (separated by commas).
- Note (2): Using the pattern ‘“xyz”’ in the keyword argument of the gtrends function corresponds to the quotation marks in the table above. All other operators in the table can be used as shown there.
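The two notes can be sketched as follows. The keyword strings reuse the examples from the operator table; the variable names and the `run_query` switch are hypothetical and only there so that the sketch runs without internet access.

```r
# Exact phrase: double quotes inside single quotes, as in Note (2)
kw_phrase <- '"corona symptoms"'

# OR operator and exclusion, written as in the operator table
kw_or      <- "corona +covid"
kw_exclude <- "corona -symptoms"

# Comparison as in Note (1): one vector element per search, at most five
kw_compare <- c("corona", "covid", kw_phrase)
stopifnot(length(kw_compare) <= 5)

# Hypothetical switch: set to TRUE to actually send the query to Google Trends
run_query <- FALSE
if (run_query) {
  library(gtrendsR)
  res <- gtrends(keyword = kw_compare, geo = "DE")
  plot(res)
}
```

Keeping each search as its own vector element is what makes gtrends treat the inputs as a comparison rather than as a single query.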
    # Combination using dplyr and ggplot
    trend <- gtrends(keyword = "corona symptome", geo = c("DE", "AT"),
                     time = "2020-01-01 2021-04-27", gprop = "web")
    # Keep only the time series of search interest
    trend_df <- trend$interest_over_time
    # Convert hits to numbers and dates to Date, then set missing values to 0
    trend_df <- trend_df %>%
      mutate(hits = as.numeric(hits), date = as.Date(date)) %>%
      replace(is.na(.), 0)
    # One line per country
    ggplot(trend_df, aes(x = date, y = hits, group = geo, col = geo)) +
      geom_line(size = 2) +
      scale_x_date(date_breaks = "2 months", date_labels = "%b-%y") +
      labs(color = "Countries") +
      ggtitle("Frequencies for the query -corona symptoms- in the period: 01/01/2020 - 27/04/2021")
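The conversion with as.numeric() followed by replacing NA with 0 matters when the hits column is not numeric. As far as we are aware, gtrendsR can return hits as text when very low values are reported as "<1"; treat this as an assumption and check your own result. The sketch below uses a small invented data frame in place of a real query result and base R only.

```r
# Invented stand-in for trend$interest_over_time (not real Google Trends data)
trend_df <- data.frame(
  date = as.Date(c("2020-01-05", "2020-01-12", "2020-01-19")),
  hits = c("<1", "12", "100"),
  geo  = "DE",
  stringsAsFactors = FALSE
)

# "<1" cannot be parsed as a number, so as.numeric() returns NA (with a warning)
hits_num <- suppressWarnings(as.numeric(trend_df$hits))

# Treat the unparsable low values as 0, mirroring the replace() step above
hits_num[is.na(hits_num)] <- 0
trend_df$hits <- hits_num

print(trend_df)
```

Setting "<1" to 0 slightly understates very low search interest; whether that is acceptable depends on the analysis.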