6.4 Lab: Media Cloud API
Below we will…
- …download data from the MediaCloud API.
- …create a simple graph that visualizes the number of articles referring to Russia.
Start by creating a MediaCloud account to obtain an API key here: https://explorer.mediacloud.org/#/user/profile
Then we have to add this key as an environment variable called MEDIACLOUD_API_KEY. We open the .Renviron file and add the following line: MEDIACLOUD_API_KEY=#your-key-here#. R reads .Renviron at startup, so restart the R session afterwards.
usethis::edit_r_environ(scope = "user")
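Because .Renviron is only read when R starts, it can help to confirm that the key is visible after restarting the session. The following is a minimal check using only base R; it reports whether the variable is set without printing the key itself.

```r
# Sys.getenv() returns an empty string "" when the variable is not set
api_key <- Sys.getenv("MEDIACLOUD_API_KEY")

if (nzchar(api_key)) {
  message("MEDIACLOUD_API_KEY is set (", nchar(api_key), " characters).")
} else {
  message("MEDIACLOUD_API_KEY is not set; edit .Renviron and restart R.")
}
```

Reporting only the length of the key avoids leaking it into rendered output or shared logs.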
Subsequently, we install and load the required R packages, including mediacloud.
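# install.packages('pacman')
library(pacman)
# p_load() installs any missing packages and then loads them;
# dplyr, lubridate and ggplot2 are needed for the aggregation and plot below
p_load('httr', 'stringr', 'dplyr', 'lubridate', 'ggplot2',
       'mediacloud', 'tidytext', 'quanteda')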
Then we check our rate limit, search for a particular news outlet, and download the corresponding stories (only a few, because downloading takes a long time).
# Check your rate limit
mediacloud::check_rate_limit()
# Search for a particular media outlet
mediacloud::search_media("New York Times")
# Collect articles within date range on Russia
# Very recent dates do not seem to work
data_russia <- search_stories(text = "Russia",
                              media_id = 1,
                              # after_date = "2021-01-01",
                              # before_date = "2021-12-31",
                              n = 10)
data_russia
Subsequently, we could explore different questions, e.g., how often and when was Russia mentioned in NYT articles?
First we aggregate the dataset to get the number of articles per day.
data_russia_date <- data_russia %>%
  mutate(date = as_date(publish_date)) %>%
  group_by(date) %>%
  summarise(n = n())
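To see what this aggregation produces without calling the API, the same step can be sketched on a small made-up table. The four timestamps below are invented for illustration, and the sketch assumes that publish_date holds date-times; only the column name is taken from the code above. Here count(date) is used as a shorthand for group_by(date) followed by summarise(n = n()).

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the API result (hypothetical values, not real stories)
toy_stories <- tibble(
  publish_date = ymd_hms(c("2021-03-01 08:15:00", "2021-03-01 17:40:00",
                           "2021-03-02 09:05:00", "2021-03-04 12:30:00"))
)

# Drop the time of day, then count the rows per calendar day
toy_counts <- toy_stories %>%
  mutate(date = as_date(publish_date)) %>%
  count(date)

toy_counts
```

Note that days without any article (here 3 March) do not appear in the result at all; they are missing rather than counted as zero, which matters when reading the plot.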
Subsequently, we can visualize this number across time.
ggplot(data = data_russia_date,
       aes(x = date,
           y = n)) +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%d %b %Y") +
  ylim(0, max(data_russia_date$n))