5.11 RSS: Scraping newspaper websites

RSS feeds

  • Really Simple Syndication (RSS): originally developed as a way to check websites regularly for new content
  • Contains a list of entries (typically each with a title, link, short summary and publication date) and when they were last updated
  • Written in XML (eXtensible Markup Language)
  • Example: The Guardian RSS feed
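Because a feed is plain XML, scraping it comes down to parsing the document and reading a few child elements from each entry. The sketch below does this with Python's standard-library `xml.etree.ElementTree`. It assumes an RSS 2.0 layout (`<rss><channel><item>…</item></channel></rss>`) with the usual `title`, `link`, `description` and `pubDate` elements per item. `SAMPLE_RSS`, `parse_rss` and `fetch_rss` are hypothetical names for illustration, and the sample feed is invented rather than copied from The Guardian.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical sample feed (invented data) in RSS 2.0 structure;
# a newspaper feed such as The Guardian's uses the same element names.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First headline</title>
      <link>https://example.com/article-1</link>
      <description>Short summary of the first article.</description>
      <pubDate>Mon, 06 May 2024 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/article-2</link>
      <description>Short summary of the second article.</description>
      <pubDate>Mon, 06 May 2024 10:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_text)
    entries = []
    # iter() walks the whole tree, so it finds items nested under <channel>
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return entries


def fetch_rss(url):
    """Download a feed's raw bytes (hypothetical helper; needs network access)."""
    with urlopen(url) as response:
        return response.read()


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS):
        print(entry["published"], "|", entry["title"], "|", entry["link"])
```

For a live feed, the same function can be applied to the downloaded bytes, e.g. `parse_rss(fetch_rss(feed_url))`, where `feed_url` is the address of the newspaper's feed. Each entry's `link` then points to the full article page, which can be scraped separately. Atom feeds use `<entry>` elements inside an XML namespace instead of `<item>`, so this sketch would need adapting for them.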