6.4 Lab: Media Cloud API

Below we will…

  • …download data from the Media Cloud API.
  • …create a simple graph that visualizes the number of articles referring to Russia.

Start by creating a Media Cloud account and obtaining an API key here: https://explorer.mediacloud.org/#/user/profile

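Then we add this key as an environment variable called MEDIACLOUD_API_KEY. Open your user-level .Renviron file with the command below, add the line MEDIACLOUD_API_KEY=your-key-here (replacing your-key-here with your actual key), save the file, and restart R so that the new variable is loaded.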

# Opens your user-level .Renviron file for editing (requires the usethis package)
usethis::edit_r_environ(scope = "user")
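To check that R can see the key without printing the key itself, you can reload the file into the running session and test whether the variable is non-empty. The sketch below assumes the file lives at ~/.Renviron, the user-level location that usethis::edit_r_environ(scope = "user") opens; the helper name api_key_is_set() is hypothetical.

```r
# Hypothetical helper: reload the user-level .Renviron (if it exists)
# into the current session and report whether the key is now set.
api_key_is_set <- function(path = path.expand("~/.Renviron")) {
  if (file.exists(path)) {
    readRenviron(path)  # adds the file's variables to the running session
  }
  nzchar(Sys.getenv("MEDIACLOUD_API_KEY"))  # TRUE if the variable is non-empty
}

api_key_is_set()
```

If this returns FALSE, check the spelling of MEDIACLOUD_API_KEY in the file and make sure there are no spaces around the = sign.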

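Next, we install (if necessary) and load the required R packages, including mediacloud. The p_load() function from pacman installs any missing package before loading it.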

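# install.packages('pacman')
library(pacman)
# tidyverse (dplyr, ggplot2) and lubridate are needed for the aggregation and plot below
p_load('httr', 'stringr', 'mediacloud', 'tidytext', 'quanteda',
       'tidyverse', 'lubridate')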
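p_load() is convenient but adds pacman as an extra dependency. A base-R alternative, sketched here with a hypothetical helper missing_packages(), first checks which packages are not installed yet and only installs those. It relies on system.file(package = p) returning an empty string when package p cannot be found.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(paths)]
}

# Usage (installs anything missing, then attaches everything):
# needed <- c("httr", "stringr", "mediacloud", "tidytext", "quanteda",
#             "tidyverse", "lubridate")
# to_install <- missing_packages(needed)
# if (length(to_install) > 0) install.packages(to_install)
# invisible(lapply(needed, library, character.only = TRUE))
```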

Then we check our rate limit, look up a particular news outlet, and download some of its stories that mention Russia. We only request a few stories (n = 10) because the download takes a long time.

# Check your rate limit
mediacloud::check_rate_limit()

# Search for a particular media outlet
mediacloud::search_media("New York Times")

# Collect stories mentioning "Russia" from the New York Times (media_id = 1).
# Uncomment after_date/before_date to restrict the date range;
# very recent dates do not seem to work.
data_russia <- search_stories(text = "Russia",
                              media_id = 1,
                              #after_date = "2021-01-01",
                              #before_date = "2021-12-31",
                              n = 10)
data_russia
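The date arguments are plain strings, so a malformed date (such as a missing leading zero in "2021-01-1") or a reversed range is easy to miss. As a minimal sketch, the hypothetical helper normalize_date_range() below parses both dates, checks their order, and returns them as "YYYY-MM-DD" strings. That search_stories() expects this format is an assumption inferred from the commented-out arguments, not from the package documentation.

```r
# Hypothetical helper: turn user-supplied dates into "YYYY-MM-DD" strings
# and make sure the range is ordered, before passing them to search_stories().
normalize_date_range <- function(after, before) {
  after_d  <- as.Date(after)   # errors if the string is not a recognizable date
  before_d <- as.Date(before)
  if (after_d > before_d) {
    stop("`after` must not be later than `before`")
  }
  c(after = format(after_d, "%Y-%m-%d"), before = format(before_d, "%Y-%m-%d"))
}

# Usage (assumes search_stories() accepts "YYYY-MM-DD" strings):
# range <- normalize_date_range("2021-01-1", "2021-12-31")
# search_stories(text = "Russia", media_id = 1,
#                after_date = range[["after"]], before_date = range[["before"]],
#                n = 10)
```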

We can now explore different questions, e.g., how often and when Russia was mentioned in NYT articles.

First we aggregate the dataset to get the number of articles per day.

data_russia_date <- data_russia %>%
  mutate(date = as_date(publish_date)) %>%  # keep only the calendar day
  group_by(date) %>%
  summarise(n = n())                        # number of stories per day
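The aggregation above only creates rows for days with at least one story, so days without any mention of Russia are missing rather than counted as zero. A minimal base-R sketch of a zero-filled daily count follows; the helper name count_per_day() and the toy timestamps are invented for illustration, and it assumes publish_date holds date-time strings or values that as.Date() can convert.

```r
# Hypothetical helper: count stories per calendar day, including days with
# zero stories between the first and last publication date.
count_per_day <- function(publish_date) {
  dates <- as.Date(publish_date)                       # drop the time of day
  all_days <- seq(min(dates), max(dates), by = "day")  # every day in the range
  counts <- table(factor(as.character(dates), levels = as.character(all_days)))
  data.frame(date = all_days, n = as.integer(counts))
}

# Toy input (invented timestamps, not Media Cloud data)
toy_dates <- c("2021-03-01 09:00:00", "2021-03-01 17:30:00", "2021-03-03 12:00:00")
count_per_day(toy_dates)
```

With dplyr and tidyr loaded, a similar result can be obtained from data_russia_date with tidyr::complete(date = seq(min(date), max(date), by = "day"), fill = list(n = 0L)).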

Then we plot the number of articles per day over time.

ggplot(data = data_russia_date,
       aes(x = date,
           y = n)) +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%d %b %Y") +
  ylim(0, max(data_russia_date$n)) +
  labs(x = "Publication date", y = "Number of articles mentioning Russia")
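The x-axis above uses monthly breaks; when stories are spread over many months, counts per month can be easier to read than counts per day. A minimal base-R sketch with a hypothetical helper count_per_month() maps each date to the first day of its month before counting (lubridate::floor_date(date, "month") does the same mapping). Months without any stories are simply absent from the result.

```r
# Hypothetical helper: count stories per calendar month.
# Each date is mapped to the first day of its month ("YYYY-MM-01").
count_per_month <- function(publish_date) {
  month_start <- as.Date(format(as.Date(publish_date), "%Y-%m-01"))
  counts <- table(month_start)  # tabulates in chronological order
  data.frame(date = as.Date(names(counts)), n = as.integer(counts))
}
```

The result has the same date and n columns as data_russia_date, so count_per_month(data_russia$publish_date) can be passed to the ggplot() call in its place.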