8.16 Web APIs
- Highly relevant: there are APIs for almost everything
- Classic use case: Companies provide data to developers
- Today: companies also provide access to services via APIs
  - Upload data and get data back
  - e.g., upload German texts and get translations back
8.16.1 APIs
- API = Application Programming Interface
  - a set of structured HTTP requests that return data in a lightweight format
- HTTP = Hypertext Transfer Protocol
  - how browsers and other web clients communicate with servers
Source: Munzert et al. (2014), Figure 9.8
8.16.2 Types of APIs
- RESTful APIs: queries for static information at the current moment (e.g., user profiles, posts, etc.)
- Streaming APIs: real-time data (e.g., new tweets, weather alerts…)
APIs generally have extensive documentation:
- Written for developers, so must be understandable for humans
- What to look for: endpoints and parameters.
- e.g., DeepL API Documentation
Most APIs are rate-limited:
- Restrictions on number of API calls by user/IP address and period of time.
- Commercial APIs may impose a monthly fee
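A client can stay under a rate limit by pausing between calls. The sketch below assumes a hypothetical limit of 60 calls per 60 seconds; `min_delay()` and `call_throttled()` are illustrative helper names, not part of any package, and `toupper` merely stands in for a real API call.

```r
# Smallest pause (in seconds) between calls that keeps us within
# `max_calls` calls per `period_seconds` (the limit values are assumptions).
min_delay <- function(max_calls, period_seconds) {
  period_seconds / max_calls
}

# Apply `fun` to each input, sleeping `delay` seconds between calls.
call_throttled <- function(inputs, fun, delay) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    results[[i]] <- fun(inputs[[i]])
    if (i < length(inputs)) Sys.sleep(delay)  # no pause after the last call
  }
  results
}

delay <- min_delay(max_calls = 60, period_seconds = 60)
out <- call_throttled(c("budapest", "vienna"), toupper, delay)
```

The actual limits differ by provider, so the values passed to `min_delay()` should be taken from the API documentation.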
8.16.3 Connecting with an API
Constructing a REST API call:
- Baseline URL endpoint: https://maps.googleapis.com/maps/api/geocode/json
- Parameters:
?address=budapest
- Authentication token (optional):
&key=XXXXX
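The three pieces above can be assembled into one URL. This is a minimal sketch using `modify_url()` from the httr package; the key `XXXXX` is a placeholder, not a working token.

```r
library(httr)

base_url <- "https://maps.googleapis.com/maps/api/geocode/json"

# modify_url() appends the query parameters to the endpoint and
# URL-encodes their values. "XXXXX" is a placeholder, not a real key.
full_url <- modify_url(
  base_url,
  query = list(address = "budapest", key = "XXXXX")
)

print(full_url)
```

Building the query from a named list avoids pasting `?` and `&` together by hand.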
From R, use the httr package to make a GET request:
library(httr)
r <- GET("https://maps.googleapis.com/maps/api/geocode/json",
         query = list(address = "budapest"))
If the request was successful, the returned status code will be 200; 4xx codes indicate client errors and 5xx codes indicate server errors. If you need to attach data, use a POST request.
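The status code ranges can be expressed as a small helper. `describe_status()` is a hypothetical function written for illustration; it is not part of httr.

```r
# Hypothetical helper: map an HTTP status code to a coarse category.
describe_status <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "other"
  }
}

describe_status(200)
describe_status(404)
describe_status(503)
```

With a response object `r` returned by `GET()`, httr provides the code via `status_code(r)`, and `stop_for_status(r)` turns 4xx and 5xx responses into R errors. To attach data, `POST()` in httr accepts a `body` argument.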
8.16.4 JSON format (responses)
Response is often in JSON format (JavaScript Object Notation)
- To see the raw response text, type:
content(r, "text")
- Data stored in key-value pairs. Why? Lightweight, more flexible than traditional table format.
- Curly brackets enclose objects; square brackets enclose arrays (vectors)
- Use the fromJSON function from the jsonlite package to read JSON data into R
- But many packages have their own specific functions to read data in JSON format; in httr:
content(r, "parsed")
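To illustrate how objects and arrays map to R structures, here is a minimal sketch with `fromJSON()` from the jsonlite package. The JSON string is hand-written for illustration and is not a real API response.

```r
library(jsonlite)

# Hand-written JSON for illustration (not a real API response)
txt <- '{"city": "Budapest", "tags": ["capital", "danube"], "location": {"lat": 47.5, "lng": 19.0}}'

parsed <- fromJSON(txt)

parsed$city          # JSON string -> character value
parsed$tags          # JSON array  -> character vector
parsed$location$lat  # nested JSON object -> nested list element
```

In practice the input would be the raw text of a response, e.g. the result of `content(r, "text")`.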
8.16.5 Authentication
- Many APIs require an access key or token
- An alternative, open standard is called OAuth
  - Connections without sharing username or password, only temporary tokens that can be refreshed
  - The httr package in R implements most cases (examples)
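For key-based access, one common pattern is to keep the key out of the script and read it from an environment variable. The sketch below assumes a hypothetical variable name, `MY_API_KEY`, and a placeholder value; neither comes from any particular API.

```r
# For illustration only: normally the key is set outside the script,
# e.g. in ~/.Renviron. MY_API_KEY is a hypothetical variable name and
# "XXXXX" is a placeholder, not a real key.
Sys.setenv(MY_API_KEY = "XXXXX")

api_key <- Sys.getenv("MY_API_KEY")
if (identical(api_key, "")) {
  stop("MY_API_KEY is not set")
}

# Query list that could be passed on as httr::GET(url, query = query)
query <- list(address = "budapest", key = api_key)
```

This keeps the key out of code that is shared or put under version control.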
8.16.6 R packages
Before starting a new project, it is worth checking whether there is already an R package for that API. Where to look?
- CRAN Web Technologies Task View (but only packages released on CRAN)
- GitHub (including unreleased packages and most recent versions of packages)
- rOpenSci Consortium
Also see this great list of APIs in case you need inspiration (see also here)
8.16.7 Why APIs?
Advantages:
- ‘Pure’ data collection: avoid malformed HTML, no legal issues, clear data structures, more trust in data collection…
- Standardized data access procedures: transparency, replicability
- Robustness: benefits from ‘wisdom of the crowds’
Disadvantages
- They’re not too common (yet!)
- Dependency on API providers
- Lack of natural connection to R
8.16.8 Scraping: Decisions, decisions…
References
Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. 2014. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. John Wiley & Sons.