6.4 Lab: Media Cloud API
Below we will…
- …download data from the MediaCloud API.
- …create a simple graph that visualizes the number of articles referring to Russia.
Start by creating a MediaCloud account to obtain an API key here: https://explorer.mediacloud.org/#/user/profile
Then we have to add this key as an environment variable called MEDIACLOUD_API_KEY. We open the .Renviron file, add the following line, and restart R so that the change takes effect: MEDIACLOUD_API_KEY=#your-key-here#.
usethis::edit_r_environ(scope = "user")
Subsequently, we install and load the required R packages, including mediacloud.
# install.packages('pacman')
library(pacman)
# dplyr, lubridate and ggplot2 are needed for %>%, as_date() and ggplot() below
p_load('httr', 'stringr', 'dplyr', 'lubridate', 'ggplot2',
       'mediacloud', 'tidytext', 'quanteda')
Then we check our rate limit, search for a particular news outlet, and download the corresponding stories (only a few, because downloading takes a long time).
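Before sending the first request, it may be worth confirming that the key is actually visible to the current R session. The following is a minimal base-R sketch; `has_env_key()` is a hypothetical helper introduced here for illustration and is not part of the mediacloud package.

```r
# Hypothetical helper (not from the mediacloud package): report whether an
# environment variable is set, without printing its value.
has_env_key <- function(name = "MEDIACLOUD_API_KEY") {
  nzchar(Sys.getenv(name, unset = ""))
}

# Print a reminder if the key from .Renviron has not been picked up.
if (!has_env_key()) {
  message("MEDIACLOUD_API_KEY is not set; edit .Renviron and restart R.")
}
```

If the reminder appears even though the line was added to .Renviron, restarting the R session is usually the missing step, because the file is only read at startup.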
# Check your rate limit
mediacloud::check_rate_limit()
# Search for a particular media outlet
mediacloud::search_media("New York Times")
# Collect articles within date range on Russia
# Very recent dates do not seem to work
data_russia <- search_stories(text = "Russia",
                              media_id = 1,
                              #after_date = "2021-01-01",
                              #before_date = "2021-12-31",
                              n = 10)
data_russia
Subsequently, we could explore different questions, e.g., how often and when was Russia mentioned in NYT articles?
First we aggregate the dataset to get the number of articles per day.
data_russia_date <- data_russia %>%
  mutate(date = as_date(publish_date)) %>%
  group_by(date) %>%
  summarise(n = n())
Subsequently, we can visualize this number across time.
ggplot(data = data_russia_date,
       aes(x = date,
           y = n)) +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%d %b %Y") +
  ylim(0, max(data_russia_date$n))
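The aggregation step can also be tried without an API key. The sketch below uses a small made-up data frame and only base R; the `publish_date` values and their timestamp format are assumptions for illustration, not real MediaCloud output. It also shows one possible way to add days without any article as explicit zeros, since such days are otherwise simply absent from the aggregated data.

```r
# Made-up publication timestamps standing in for data_russia$publish_date.
toy <- data.frame(
  publish_date = c("2021-03-01 08:15:00", "2021-03-01 17:40:00",
                   "2021-03-03 09:05:00"),
  stringsAsFactors = FALSE
)

# Drop the time of day, then count articles per calendar day.
toy$date <- as.Date(substr(toy$publish_date, 1, 10))
counts <- as.data.frame(table(toy$date), stringsAsFactors = FALSE)
names(counts) <- c("date", "n")
counts$date <- as.Date(counts$date)

# Build the full range of days and fill days without articles with zero.
all_days <- data.frame(date = seq(min(counts$date), max(counts$date), by = "day"))
daily <- merge(all_days, counts, by = "date", all.x = TRUE)
daily$n[is.na(daily$n)] <- 0L
daily
```

The resulting `daily` data frame has the same two columns (`date`, `n`) as `data_russia_date`, so it could be passed to the same ggplot call.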