5.9 Scrape dynamic webpages: Selenium

  • Dynamic website
    • Content changes as the user interacts with it (typically rendered by JavaScript in the browser)
    • Has reactive elements (e.g., drop-down menus)
  • General idea: control a real browser programmatically and scrape the page after it has been rendered, since a plain HTTP request often misses JavaScript-generated content
  • RSelenium: R bindings for Selenium WebDriver, a tool originally developed for automated web testing
  • R launches a browser session, and all communication with the website is routed through that session
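  • PhantomJS: a scriptable headless browser (runs without displaying the website); its development has been suspended, so headless Chrome or Firefox is the usual alternative today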
  • Capabilities: fill in forms, type text, click on buttons or other areas of the page, navigate to a new URL, …
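The workflow above can be sketched in R with RSelenium. This is a minimal sketch, not a tested recipe for any particular site: it assumes Firefox, Java and the RSelenium, wdman and rvest packages are installed, and the helper names (start_session, fill_field, click_element, rendered_page, stop_session), the port, the URL and the CSS selectors are hypothetical placeholders. Each helper wraps one capability from the list: launching the session, typing into a form, clicking, and reading the rendered HTML.

```r
# Minimal RSelenium sketch. Assumptions: Firefox, Java and the RSelenium,
# wdman and rvest packages are installed. Helper names, the port, the URL
# and the CSS selectors are hypothetical placeholders.

# Launch a Selenium server plus a browser window. rsDriver() returns a list
# holding $client (the remote driver we send commands to) and $server.
start_session <- function(browser = "firefox", port = 4545L) {
  RSelenium::rsDriver(browser = browser, port = port, verbose = FALSE)
}

# Type text into a form field (located by CSS selector) and press Enter.
fill_field <- function(client, css, text) {
  field <- client$findElement(using = "css selector", value = css)
  field$sendKeysToElement(list(text, key = "enter"))
  invisible(field)
}

# Click a button or any other element located by CSS selector.
click_element <- function(client, css) {
  el <- client$findElement(using = "css selector", value = css)
  el$clickElement()
  invisible(el)
}

# Grab the HTML after the browser has rendered it and parse it with rvest.
rendered_page <- function(client, wait = 2) {
  Sys.sleep(wait)                      # crude pause so JavaScript can finish
  html <- client$getPageSource()[[1]]  # getPageSource() returns a list
  rvest::read_html(html)
}

# Close the browser window and stop the Selenium server.
stop_session <- function(driver) {
  driver$client$close()
  driver$server$stop()
  invisible(NULL)
}

# Example usage (not run here; needs a working browser setup):
# drv <- start_session()
# drv$client$navigate("https://example.com/search")
# fill_field(drv$client, "input[name='q']", "web scraping")
# click_element(drv$client, "button.load-more")
# page <- rendered_page(drv$client)
# rvest::html_text2(rvest::html_elements(page, "h3"))
# stop_session(drv)
```

The fixed Sys.sleep() pause is the simplest way to let dynamic content load; on slower pages a longer wait, or polling until the target element appears, may be needed.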