8.16 Web APIs

  • Highly relevant: there are APIs for almost everything
  • Classic use case: Companies provide data to developers
  • Today
    • Companies provide access to services over API
    • Upload data and get data back
    • e.g., upload German texts and get translations back

8.16.1 APIs

  • API = Application Programming Interface
    • for web APIs: a set of structured HTTP requests that return data in a lightweight format.
  • HTTP = Hypertext Transfer Protocol
    • the protocol that web browsers and other clients use to communicate with web servers.

Figure source: Munzert et al. (2014), Figure 9.8

8.16.2 Types of APIs

  1. RESTful APIs: queries for static information as of the current moment (e.g. user profiles, posts)
  2. Streaming APIs: data delivered in real time as it is produced (e.g. new tweets, weather alerts)

APIs generally have extensive documentation:

  • Written for developers, so it should be readable by humans
  • What to look for: endpoints and parameters.
  • e.g., DeepL API Documentation
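To make "endpoints and parameters" concrete, here is a minimal sketch that assembles a request URL from a base URL, an endpoint path, and a list of query parameters, using modify_url() and parse_url() from httr. The host api.example.com, the endpoint /v1/translate, and the parameter names text and target_lang are hypothetical placeholders; they are not taken from the documentation of DeepL or any other real API.

```r
library(httr)

# Hypothetical base URL and endpoint (placeholders for illustration only)
base_url <- "https://api.example.com"
endpoint <- "/v1/translate"

# Parameters are passed as a named list; httr URL-encodes the values
url <- modify_url(
  base_url,
  path  = endpoint,
  query = list(text = "Guten Morgen", target_lang = "EN")
)
print(url)

# Split the URL back into its components to inspect them
parts <- parse_url(url)
parts$hostname  # the server
parts$path      # the endpoint
parts$query     # the parameters, decoded back into a named list
```

When reading real API documentation, the endpoint tells you which path to request and the parameter table tells you which names may appear in the query list.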

Most APIs are rate-limited:

  • Restrictions on the number of API calls per user or IP address within a given period of time.
  • Commercial APIs may impose a monthly fee
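One simple way to stay under a rate limit is to pause between calls. The sketch below is a base-R illustration, not a feature of any particular API client: throttled_map() and fake_call() are hypothetical helper names, and the interval you need depends on the limits stated in the provider's documentation.

```r
# Apply `fun` to each input, leaving at least `min_interval` seconds
# between the start of one call and the start of the next.
throttled_map <- function(inputs, fun, min_interval = 1) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    started <- Sys.time()
    results[[i]] <- fun(inputs[[i]])
    # Sleep for whatever remains of the interval, except after the last call
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    if (i < length(inputs) && elapsed < min_interval) {
      Sys.sleep(min_interval - elapsed)
    }
  }
  results
}

# Stand-in for an API call; in practice this would wrap GET() or POST()
fake_call <- function(x) toupper(x)

res <- throttled_map(c("budapest", "vienna"), fake_call, min_interval = 0.2)
```

For requests that fail because the limit was hit anyway, httr also provides RETRY(), which re-issues a request after a pause.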

8.16.3 Connecting with an API

Constructing a REST API call:

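From R, use the httr package to make a GET request:

library(httr)
# Send a GET request; the `query` list becomes ?address=budapest in the URL.
# Note: Google's Geocoding API now also requires an API key (`key` parameter).
r <- GET("https://maps.googleapis.com/maps/api/geocode/json",
         query = list(address = "budapest"))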

A successful request returns status code 200; 4xx codes indicate client errors and 5xx codes indicate server errors. If you need to send data along with the request, use a POST request instead.
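The status code ranges can be checked programmatically before a response is used. The helper below is a small illustrative sketch in base R; status_class() is a hypothetical name, not an httr function. The comments at the end list the httr functions that perform the equivalent checks on a real response object.

```r
# Map a numeric HTTP status code to the broad class it belongs to
status_class <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "other"
  }
}

status_class(200)
status_class(404)
status_class(503)

# With library(httr) loaded and a response `r` returned by GET() or POST():
#   status_code(r)      # the numeric code
#   http_error(r)       # TRUE for 4xx and 5xx codes
#   stop_for_status(r)  # turns a 4xx/5xx response into an R error
```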

8.16.4 JSON format (responses)

The response is often in JSON format (JavaScript Object Notation)

  • To see the raw response as text, type: content(r, "text")
  • Data are stored in key-value pairs. Why? Lightweight, and more flexible than a traditional table format.
  • Curly brackets enclose objects; square brackets enclose arrays (vectors)
  • Use the fromJSON function from the jsonlite package to read JSON data into R
  • Many packages also have their own functions for reading JSON; in httr, content(r, "parsed") parses the response for you
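To see how objects and arrays map onto R structures, the sketch below parses a small JSON string with jsonlite::fromJSON(). The JSON text is written by hand for illustration, with approximate values; it is not the output of any real API.

```r
library(jsonlite)

# A hand-written JSON document (illustrative values, not real API output)
json_text <- '{
  "city": "Budapest",
  "districts": ["I", "II", "III"],
  "location": {"lat": 47.5, "lng": 19.04}
}'

parsed <- fromJSON(json_text)

parsed$city          # a JSON string becomes a character vector of length 1
parsed$districts     # a JSON array of strings becomes a character vector
parsed$location$lat  # a nested JSON object becomes a nested named list
```

The same call works on the text of an API response, for example fromJSON(content(r, "text")).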

8.16.5 Authentication

  • Many APIs require an access key or token
  • An alternative open standard is OAuth
  • OAuth allows connections without sharing a username or password; it relies on temporary tokens that can be refreshed
  • httr package in R implements most cases (examples)
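A common pattern for key-based authentication is to keep the key out of the script and attach it to each request as a header. The sketch below assumes a bearer-style Authorization header, which many but not all APIs use; the environment variable name MY_API_KEY, the helper names get_api_key() and auth_header(), and the example URL are all hypothetical. Check the provider's documentation for the header or parameter it actually expects.

```r
library(httr)

# Read the key from an environment variable instead of hard-coding it.
# "MY_API_KEY" is a hypothetical variable name.
get_api_key <- function(var = "MY_API_KEY") {
  key <- Sys.getenv(var)
  if (identical(key, "")) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Build a header configuration that can be passed to GET() or POST().
# The "Bearer <key>" form is an assumption; some APIs use other schemes.
auth_header <- function(key) {
  add_headers(Authorization = paste("Bearer", key))
}

# Usage (not run; the URL is a placeholder):
# r <- GET("https://api.example.com/v1/data", auth_header(get_api_key()))
```

Storing the key in an environment variable (for instance via a .Renviron file) keeps it out of code that is shared or committed to version control.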

8.16.6 R packages

Before starting a new project, it is worth checking whether there is already an R package for that API. Where to look?

  • CRAN Web Technologies Task View (but only packages released on CRAN)
  • GitHub (including unreleased packages and most recent versions of packages)
  • rOpenSci Consortium

Also see this great list of APIs in case you need inspiration (see also here)

8.16.7 Why APIs?

Advantages:

  • ‘Pure’ data collection: avoids malformed HTML, fewer legal issues, clear data structures, more trust in data collection…
  • Standardized data access procedures: transparency, replicability
  • Robustness: benefits from ‘wisdom of the crowds’

Disadvantages:

  • They’re not too common (yet!)
  • Dependency on API providers
  • Lack of natural connection to R

8.16.8 Scraping: Decisions, decisions…


References

Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. 2014. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. John Wiley & Sons.