6.4 Lab: Media Cloud API

Below we will…

  • …download data from the MediaCloud API.
  • …create a simple graph that visualizes the number of articles referring to Russia.

Start by creating a Media Cloud account and obtaining an API key from your profile page: https://explorer.mediacloud.org/#/user/profile

Then we add this key as an environment variable called MEDIACLOUD_API_KEY. Open your user-level .Renviron file with the command below, add the line MEDIACLOUD_API_KEY=your-key-here, save the file, and restart R so that the variable is loaded.

usethis::edit_r_environ(scope = "user")
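Because .Renviron is only read when R starts, it is worth confirming that the key is actually visible before calling the API. The sketch below uses base R only: `Sys.getenv()` reads the variable and `readRenviron()` reloads the file without a restart. The helper name `has_mediacloud_key()` is hypothetical, and the path `~/.Renviron` assumes the user-level file opened by `usethis::edit_r_environ(scope = "user")`.

```r
# Hypothetical helper: TRUE if the Media Cloud key is visible to this session
has_mediacloud_key <- function() {
  nzchar(Sys.getenv("MEDIACLOUD_API_KEY"))
}

# Reload the user-level .Renviron (assumed to live in ~) without restarting R
renviron_path <- path.expand("~/.Renviron")
if (file.exists(renviron_path)) readRenviron(renviron_path)

if (!has_mediacloud_key()) {
  warning("MEDIACLOUD_API_KEY is not set; add it to ~/.Renviron and restart R.")
}
```

The check never prints the key itself, so it is safe to keep in a shared script.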

Next, we install and load the R packages we need, including mediacloud.

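# install.packages('pacman')
# p_load() installs any missing packages and then loads them;
# dplyr, lubridate and ggplot2 are needed for the aggregation and plot below
pacman::p_load('httr', 'stringr', 'mediacloud',
               'tidytext', 'quanteda',
               'dplyr', 'lubridate', 'ggplot2')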

Then we check our rate limit, search for a particular news outlet, and download matching stories. We only request a few stories (n = 10) because downloading takes a long time.

# Check your rate limit

# Search for a particular media
mediacloud::search_media("New York Times")

# Collect articles on Russia from the NYT (media_id = 1);
# uncomment after_date/before_date to restrict the date range.
# Very recent dates do not seem to work.
data_russia <- search_stories(text = "Russia",
                              media_id = 1,
                              #after_date = "2021-01-01",
                              #before_date = "2021-12-31",
                              n = 10)
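If you do uncomment the date range, one way to keep each request small and stay under the rate limit is to split the period into monthly windows and pause between calls. The sketch below is a hypothetical helper, not part of the mediacloud package: `fetch_by_month()` takes any function `fetch(after_date, before_date)` that returns a data frame, so it can wrap `search_stories()`. It assumes that `after_date`/`before_date` accept "YYYY-MM-DD" strings (as in the commented-out lines above), that `from` is the first day of a month, and that a one-second pause is enough; the actual limits depend on your account.

```r
# Hypothetical helper: call `fetch` once per calendar month between
# `from` and `to`, pausing `pause` seconds between consecutive calls,
# and stack the results into one data frame.
fetch_by_month <- function(fetch, from, to, pause = 1) {
  from <- as.Date(from)  # assumed to be the first day of a month
  to <- as.Date(to)
  starts <- seq(from, to, by = "month")
  results <- lapply(seq_along(starts), function(i) {
    # Last day of the month starting at starts[i], capped at `to`
    next_start <- seq(starts[i], by = "month", length.out = 2)[2]
    end <- min(next_start - 1, to)
    if (i > 1) Sys.sleep(pause)  # wait between API calls
    fetch(after_date = format(starts[i]), before_date = format(end))
  })
  do.call(rbind, results)
}

# Possible usage with the mediacloud package (requires a valid API key):
# data_russia_2021 <- fetch_by_month(
#   function(after_date, before_date) {
#     search_stories(text = "Russia", media_id = 1,
#                    after_date = after_date, before_date = before_date,
#                    n = 10)
#   },
#   from = "2021-01-01", to = "2021-12-31")
```

Passing `fetch` as an argument keeps the windowing logic separate from the API call, so it can be tested with a fake function that never touches the network.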

We can then explore different questions, e.g., how often and when Russia was mentioned in NYT articles.

First, we aggregate the dataset to count the number of articles per day.

data_russia_date <- data_russia %>%
  mutate(date = as_date(publish_date)) %>%  # timestamp -> calendar day
  group_by(date) %>%
  summarise(n = n())                        # number of articles per day
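Note that this aggregation only produces rows for days on which at least one article appeared, so a plot of `n` silently skips days with zero mentions. The sketch below, run on a small made-up stand-in for `data_russia` (three invented timestamps, not real Media Cloud data), shows an equivalent shortcut with `dplyr::count()` and uses `tidyr::complete()` to add the missing days with a count of zero. It assumes `publish_date` can be converted with `lubridate::as_date()`, as in the code above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up stand-in for data_russia: three stories spread over two days
toy <- tibble(
  publish_date = as.POSIXct(c("2021-03-01 08:00:00",
                              "2021-03-01 17:30:00",
                              "2021-03-03 12:00:00"), tz = "UTC")
)

toy_date <- toy %>%
  mutate(date = as_date(publish_date)) %>%
  count(date) %>%                                  # same as group_by + summarise(n = n())
  complete(date = seq(min(date), max(date), by = "day"),
           fill = list(n = 0L)) %>%                # add days without stories as n = 0
  arrange(date)
```

Here `toy_date` has three rows, with the empty day 2021-03-02 filled in as zero, which makes gaps visible in the plot instead of hiding them.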

We can then plot these daily counts over time.

ggplot(data = data_russia_date,
       aes(x = date,
           y = n)) +
  geom_point() +
  scale_x_date(date_breaks = "1 month", 
               date_labels =  "%d %b %Y") +