8.25 Lab 8: Twitter’s REST API

We’ll now turn to a different type of Twitter data – static data, either recent tweets or user-level information. This type of data can be retrieved with Twitter’s REST API. We will use the tweetscores package, which Pablo Barberá created to facilitate the collection and analysis of Twitter data.

8.25.1 Searching recent tweets

It is possible to download recent tweets, but only those posted within the last 7 days, and in some cases not all of them.

load("./www/my_oauth")
library(tweetscores)
library(streamR)

searchTweets(q=c("Khashoggi", "Turkey"), 
  filename="./www/survey-tweets.json",
  n=1000, until="2018-10-22", 
  oauth=my_oauth)

tweets <- parseTweets("./www/survey-tweets.json")

What are the most popular hashtags?

library(stringr)
ht <- str_extract_all(tweets$text, "#(\\d|\\w)+")
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))

You can check the stringr documentation for more details about the options for string searches.
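
The same approach works for other patterns – for instance, here is a quick sketch of extracting @-mentions instead of hashtags (the regular expression is just an illustration; adapt it to your needs):

# extract @-mentions with a similar regular expression (illustrative pattern)
mentions <- str_extract_all(tweets$text, "@\\w+")
mentions <- unlist(mentions)
head(sort(table(mentions), decreasing = TRUE))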

8.25.2 Extracting users’ profile information

This is how you would extract information from user profiles:

wh <- c("nantermod", "JayBadran")
users <- getUsersBatch(screen_names=wh,
                       oauth=my_oauth)
str(users)

Which of these has the most followers?

users[which.max(users$followers_count),]
users$screen_name[which.max(users$followers_count)]
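
To rank all of the accounts rather than just pick out the top one, we can sort the data frame returned by getUsersBatch() – a quick sketch using the same followers_count column:

# rank accounts by number of followers
users[order(users$followers_count, decreasing = TRUE),
      c("screen_name", "followers_count")]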

Download up to 3,200 recent tweets from a Twitter account:

getTimeline(filename="./www/JayBadran.json", screen_name="JayBadran", n=100, oauth=my_oauth)

What are the most common hashtags?

tweets <- parseTweets("./www/JayBadran.json")
ht <- str_extract_all(tweets$text, "#(\\d|\\w)+")
ht <- unlist(ht)
head(sort(table(ht), decreasing = TRUE))

8.25.3 Building friend and follower networks

Download friends and followers:

followers <- getFollowers("JayBadran", 
    oauth=my_oauth)
friends <- getFriends("JayBadran", 
    oauth=my_oauth)
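
As a quick check of the network, we can also see how many of the accounts JayBadran follows also follow him back – a minimal sketch, assuming both functions return vectors of user IDs (which is how they are used with getUsersBatch() below):

# overlap between friends and followers
mutual <- intersect(friends, followers)
length(mutual)
# share of friends who follow back
length(mutual) / length(friends)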

What are the most common words that friends of the JayBadran account use to describe themselves on Twitter?

# extract profile descriptions
users <- getUsersBatch(ids=friends, oauth=my_oauth)
# create table with frequency of word use
library(quanteda)
tw <- corpus(users$description[users$description != ""])
twdfm <- dfm(tw, remove = c(stopwords("english"), stopwords("german"),
                            "t.co", "https", "rt", "rts", "http"),
             remove_punct = TRUE)
topfeatures(twdfm, n = 30)
# create wordcloud
par(mar = c(0, 0, 0, 0))
textplot_wordcloud(twdfm, rotation = 0, min_size = 1, max_size = 5, max_words = 100)

8.25.4 Estimating ideology based on Twitter networks

The tweetscores package also includes functions to replicate the method developed in the Political Analysis paper “Birds of a Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data”. For an application of this method, see also this Monkey Cage blog post.

# download list of friends for an account
user <- "JayBadran"
friends <- getFriends(user, oauth=my_oauth)
# estimating ideology with correspondence analysis method
(theta <- estimateIdeology2(user, friends, verbose=FALSE))

# download list of friends for an account
user <- "MullerAltermatt"
friends <- getFriends(user, oauth=my_oauth)
# estimating ideology with correspondence analysis method
(theta <- estimateIdeology2(user, friends, verbose=FALSE))
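
If you want to compare several accounts, a simple loop over handles works – this is only a sketch, assuming estimateIdeology2() returns a single numeric estimate per account, and keep in mind that each getFriends() call counts against the API rate limit:

# estimate ideology for several accounts (the two handles used above)
accounts <- c("JayBadran", "MullerAltermatt")
thetas <- sapply(accounts, function(u){
  fr <- getFriends(u, oauth = my_oauth)
  estimateIdeology2(u, fr, verbose = FALSE)
})
thetas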

8.25.5 Other types of data

The REST API also offers a long list of other endpoints that could be of use at some point, depending on your research interests.

  1. You can search for users related to specific keywords:
users <- searchUsers(q="khashoggi", count=100, oauth=my_oauth)
users$screen_name[1:10]
  2. If you know the IDs of the tweets, you can download them directly from the API. This is useful because tweets cannot be redistributed as part of the replication materials of a published paper, but the list of tweet IDs can be shared (see the short sketch at the end of this list):
# Downloading tweets when you know the ID
getStatuses(ids=c("474134260149157888", "266038556504494082"),
            filename="./www/old-tweets.json",
            oauth=my_oauth)
parseTweets("./www/old-tweets.json")
  3. Lists of Twitter users, compiled by other users, are also accessible through the API.
# download user information from a list
MCs <- getList(list_name="new-members-of-congress", 
               screen_name="cspan", oauth=my_oauth)
head(MCs)

This is also useful if, for example, you’re interested in compiling lists of journalists, since media outlets often maintain such lists on their profiles.

  4. You can also download the list of users who retweeted a particular tweet – unfortunately, it’s limited to the 100 most recent retweets.
# Download list of users who retweeted a tweet (unfortunately, only up to 100)
rts <- getRetweets(id='1054737221507522560', oauth=my_oauth)
# https://twitter.com/manal_alsharif/status/1054737221507522560
users <- getUsersBatch(ids=rts, oauth=my_oauth)
# create table with frequency of word use
library(quanteda)
tw <- corpus(users$description[users$description != ""])
twdfm <- dfm(tw, remove = c(stopwords("english"), stopwords("german"),
                            "t.co", "https", "rt", "rts", "http"),
             remove_punct = TRUE)
# create wordcloud
par(mar = c(0, 0, 0, 0))
textplot_wordcloud(twdfm, rotation = 0, min_size = 1, max_size = 5, max_words = 100)
  5. And one final function converts dates from Twitter’s internal format into one we can work with in R:
# format Twitter dates to facilitate analysis
tweets <- parseTweets("./www/JayBadran.json")
tweets$date <- formatTwDate(tweets$created_at, format="date")
hist(tweets$date, breaks="month")
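
Finally, coming back to point 2 above: if you want to share the list of tweet IDs as part of replication materials, a minimal sketch is to write them to a plain text file (this assumes parseTweets() returns an id_str column, as in the streamR output used throughout this lab):

# save tweet IDs for a replication archive (assumes an id_str column)
old <- parseTweets("./www/old-tweets.json")
writeLines(old$id_str, "./www/old-tweet-ids.txt")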