5.8 (Response) Formats: JSON

  • Response often in JSON or XML format (cf. Munzert et al. 2014, chap. 3)
    • http_type(#): Check the format (MIME type, e.g. "application/json") of the response
  • JSON (JavaScript Object Notation, *.json)
    • Structure: Data stored in key-value pairs. Why? Lightweight and more flexible than a traditional table format (values can be nested).
      • Various data types possible (strings, numbers, booleans, null, nested objects and arrays)
    • Curly brackets enclose objects; square brackets enclose arrays (vectors)
      • objects ({"name": "peter", "phone": "397483"})
      • arrays ([1910, 1911])
    • jsonlite package: Use fromJSON function to read JSON data into R
    • But many packages have their own specific functions to read data in JSON format
    • Syntax example
  • R functions to extract/format content
    • # below stands for the API's response object (created with the httr package)
      • Try replacing it with GET("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/Hadley_Wickham/daily/20170101/20170102")
    • writeLines(content(#, as = "text")): Inspect and print out the raw content
    • content(#, as = "parsed"): Parse content (httr picks a parser based on the content type)
    • library(jsonlite); fromJSON(content(#, as = "text")); fromJSON(content(#, as = "text"), simplifyDataFrame = TRUE): Parse with jsonlite (simplifyDataFrame = TRUE is the default)
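
A minimal sketch of how jsonlite's fromJSON() maps the JSON building blocks listed above onto R structures. It parses the slide's example object and array as literal strings, plus one made-up object mixing data types; it assumes only that the jsonlite package is installed.

```r
library(jsonlite)

# JSON object (curly brackets): key-value pairs become a named R list
person <- fromJSON('{"name": "peter", "phone": "397483"}')
str(person)
# phone stays a character string because it is quoted in the JSON

# JSON array (square brackets): a simple array becomes an atomic vector
years <- fromJSON('[1910, 1911]')
print(years)

# One object mixing data types: string, number, boolean
# (keys and values are made up for illustration)
mixed <- fromJSON('{"label": "a", "count": 3, "active": true}')
str(mixed)
```

Note that fromJSON() keeps a single object as a list rather than forcing it into a table; that is what makes JSON's key-value structure more flexible than a fixed table format.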

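Because calling GET() needs network access, this sketch replaces content(#, as = "text") with a hypothetical JSON string standing in for the response text; the keys (items, timestamp, views) and the numbers are assumptions made up for illustration. It shows what the simplifyDataFrame argument of fromJSON() changes.

```r
library(jsonlite)

# Hypothetical stand-in for content(#, as = "text"): in a live session this
# string would come from the API response; keys and numbers are made up
txt <- '{"items": [
  {"timestamp": "2017010100", "views": 10},
  {"timestamp": "2017010200", "views": 12}
]}'

# Default (simplifyDataFrame = TRUE): an array of objects that share the
# same keys is turned into a data frame with one row per object
parsed <- fromJSON(txt)
views_df <- parsed$items
print(views_df)

# simplifyDataFrame = FALSE: each JSON object stays a separate named list
nested <- fromJSON(txt, simplifyDataFrame = FALSE)
str(nested$items[[1]])
```

With a live response r, one would typically first confirm that http_type(r) returns "application/json" and then pass content(r, as = "text") to fromJSON() in place of txt.
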
References

Munzert, Simon, Christian Rubba, Peter Meißner, and Dominic Nyhuis. 2014. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. John Wiley & Sons.