Big data and Social Science

8.17 Some APIs

  • Wikipedia
  • LinkedIn
  • DeepL
  • Google Translate
  • Facebook
  • Meetup
  • Strava
    • e.g. https://blog.revolutionanalytics.com/2018/01/strava-visualization.html
  • Uber
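
To make the list above more concrete, here is a minimal sketch of how one of these services, Wikipedia, could be queried from R. It assumes the `httr` and `jsonlite` packages are installed and uses the public MediaWiki Action API endpoint with its `extracts` property. The function names (`build_wiki_url`, `extract_intro`, `fetch_intro`) and the sample response are hypothetical and only serve as illustration. Most other APIs in the list additionally require authentication, which this sketch does not cover.

```r
library(httr)      # HTTP requests and URL handling
library(jsonlite)  # JSON parsing

# Build the request URL for the MediaWiki Action API (no request is sent yet).
build_wiki_url <- function(title) {
  modify_url(
    "https://en.wikipedia.org/w/api.php",
    query = list(
      action      = "query",     # standard query module
      prop        = "extracts",  # ask for a text extract of the page
      exintro     = 1,           # only the introduction
      explaintext = 1,           # plain text instead of HTML
      titles      = title,
      format      = "json"
    )
  )
}

# Pull the extract of the first returned page out of a parsed response.
extract_intro <- function(parsed) {
  pages <- parsed$query$pages
  pages[[1]]$extract
}

# Send the request and return the introduction text (needs an internet connection).
fetch_intro <- function(title) {
  resp <- GET(build_wiki_url(title))
  stop_for_status(resp)  # stop with an error on HTTP failures
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                     simplifyVector = FALSE)
  extract_intro(parsed)
}

# Offline demonstration: a hand-written response with the same nesting
# (query -> pages -> <page id> -> extract). The content is a placeholder.
sample_json <- '{"query":{"pages":{"123":{"pageid":123,"title":"Big data","extract":"Placeholder text."}}}}'
sample_intro <- extract_intro(fromJSON(sample_json, simplifyVector = FALSE))
print(sample_intro)
```

With an internet connection, `fetch_intro("Big data")` would send the actual request. Building the URL and parsing the response are kept in separate functions so that the parsing step can be tried out without network access.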