• Preface
  • 1 Introduction
    • 1.1 Script & Material
    • 1.2 About me
    • 1.3 Who are you?
    • 1.4 Content & Objectives
    • 1.5 Overview of some readings
    • 1.6 Tools and software we use
      • 1.6.1 R: Why use it?
      • 1.6.2 R: Where/how to study?
      • 1.6.3 R: Installation and setup
      • 1.6.4 Datacamp
      • 1.6.5 Google Cloud
    • 1.7 Descriptive inference, causal inference & prediction
      • 1.7.1 Descriptive questions
      • 1.7.2 Causal questions
      • 1.7.3 Prediction
    • 1.8 The digital revolution
    • 1.9 How does the internet work? (+ access)
    • 1.10 Technology adoption: United States
    • 1.11 Platform usage (1): Social Media Adoption
    • 1.12 Platform usage (2): Social Media Adoption (Barchart)
    • 1.13 Platform usage (3): Social Networking Young
    • 1.14 Platform usage (4): Daily hours digital media
    • 1.15 What is Computational Social Science (CSS)?
    • 1.16 CSS: Challenges for Social Scientists
    • 1.17 Exercise: What data can reveal about you…
    • 1.18 Exercise: Documentaries by Deutsche Welle
    • 1.19 X-Exercise: Farecast and Google Flu
    • 1.20 X-Exercise: Big Data is not about the data
    • 1.21 X-Exercise: Download your data!
    • 1.22 Presentations: Example Barbera (2015)
    • 1.23 Good practices in data analysis (X)
      • 1.23.1 Reproducibility & Replicability
      • 1.23.2 Reproducibility & Replicability
      • 1.23.3 Why reproducibility?
      • 1.23.4 Reproducibility: My current approach
      • 1.23.5 Reproducibility in practice
    • 1.24 References
  • 2 Big data & new data sources (1)
    • 2.1 For a starter
    • 2.2 What is Big Data?
    • 2.3 Big data: Quotes for a start
    • 2.4 Big data: Definitions
    • 2.5 Big data: The Vs
    • 2.6 Big data: Analog age vs. digital age (1)
    • 2.7 Big data: Analog age vs. digital age (2)
    • 2.8 Big data: Repurposing
    • 2.9 Presentations
    • 2.10 Exercise: Ten common characteristics of big data (Salganik 2017)
    • 2.11 New forms of data: Overview
    • 2.12 Where can we find big data sources?
    • 2.13 References
  • 3 Big data & new data sources (2)
    • 3.1 Presentations
    • 3.2 Example: Salience of issues
    • 3.3 Google trends: Caveats
    • 3.4 Data security & ethics (1): What might happen?
    • 3.5 Data security & ethics (2): Yes…
    • 3.6 Data security & ethics (3): Protection
    • 3.7 Data: Size & dimensions & creation
    • 3.8 Data: How is it stored?
    • 3.9 Data & Databases
    • 3.10 R Database Packages
    • 3.11 SQL: Intro
    • 3.12 SQL: Components of a query
    • 3.13 Lab: Working with a SQL database
      • 3.13.1 Creating an SQL database
      • 3.13.2 Querying an SQL database
      • 3.13.3 Querying multiple SQL tables
      • 3.13.4 Grouping and aggregating
    • 3.14 Exercise: SQL database
    • 3.15 SQL at scale: Strategy
    • 3.16 SQL at scale: Google BigQuery
    • 3.17 Lab (Skip!): Setting up GCP research credits
      • 3.17.1 Google Translation API
    • 3.18 Lab: Google Big Query
    • 3.19 Exercise: Setting up & querying Google BigQuery
    • 3.20 Strategies to work with big data
    • 3.21 References
  • 4 Data collection: APIs
    • 4.1 Web APIs
    • 4.2 API = Application Programming Interface
    • 4.3 Why APIs?
    • 4.4 Scraping: Decisions, decisions…
    • 4.5 Types of APIs
    • 4.6 Some APIs
    • 4.7 R packages
    • 4.8 (Response) Formats: JSON
    • 4.9 (Response) Formats: XML
    • 4.10 Authentication
    • 4.11 Connect to API: Example
    • 4.12 Lab: Scraping data from APIs
    • 4.13 Exercise: Scraping data from APIs
      • 4.13.1 Homework: APIs for social scientists
    • 4.14 X-Lab: Clarify API
    • 4.15 X-Twitter’s APIs
    • 4.16 X-Lab: Twitter’s streaming API
      • 4.16.1 Authenticating
      • 4.16.2 Collecting data from Twitter’s Streaming API
    • 4.17 X-Exercise: Twitter’s streaming API
    • 4.18 X-Lab: Twitter’s REST API
      • 4.18.1 Searching recent tweets
      • 4.18.2 Extracting users’ profile information
      • 4.18.3 Building friend and follower networks
      • 4.18.4 Estimating ideology based on Twitter networks
      • 4.18.5 Other types of data
    • 4.19 X-Exercise: Twitter’s REST API
  • 5 Data collection: Web (screen) scraping
    • 5.1 Web scraping: Basics
      • 5.1.1 Scraping data from websites: Why?
      • 5.1.2 Scraping the web: two approaches
      • 5.1.3 The rules of the game
      • 5.1.4 The art of web scraping
    • 5.2 Screen (Web) scraping
      • 5.2.1 Scenarios
      • 5.2.2 HTML: a primer
      • 5.2.3 HTML: a primer
      • 5.2.4 Beyond HTML
      • 5.2.5 Parsing HTML code
    • 5.3 Lab: Scraping tables
    • 5.4 Exercise: Scraping tables
    • 5.5 Lab: Scraping (more) tables
    • 5.6 Exercise: Scraping (more) tables
    • 5.7 Lab: Scraping unstructured data
    • 5.8 Exercise: Scraping unstructured data
    • 5.9 Scrape dynamic webpages: Selenium
    • 5.10 Lab: Scraping web data behind web forms
      • 5.10.1 Using RSelenium
    • 5.11 RSS: Scraping newspaper websites
    • 5.12 Lab: Scraping newspaper website
  • 6 Machine learning: Introduction
    • 6.1 Classical statistics vs. machine learning
    • 6.2 Machine learning as programming paradigm
    • 6.3 Terminological differences (1)
    • 6.4 Terminological differences (2)
    • 6.5 Prediction: Mean
    • 6.6 Prediction: Linear model (Equation) (1)
    • 6.7 Prediction: Linear model (Equation) (2)
    • 6.8 Prediction: Linear model (Visualization)
    • 6.9 Prediction: Linear model (Estimation)
    • 6.10 Prediction: Linear model (Prediction)
    • 6.11 Exercise: What’s predicted?
    • 6.12 Exercise: Discussion
    • 6.13 Regression vs. Classification
    • 6.14 Overview of Classification
    • 6.15 Assessing Model Accuracy
    • 6.16 The Logistic Model
    • 6.17 LR in R: Predicting Recidivism (1)
    • 6.18 LR in R: Predicting Recidivism (2)
    • 6.19 LR in R: Predicting Recidivism (3)
    • 6.20 Lab: Predicting recidivism (Classification)
      • 6.20.1 Inspecting the dataset
      • 6.20.2 Splitting the datasets
      • 6.20.3 Comparing the scores of black and white defendants
      • 6.20.4 Building a predictive model
      • 6.20.5 Predicting values
      • 6.20.6 Training error rate
      • 6.20.7 Test error rate
      • 6.20.8 Comparison to COMPAS score
      • 6.20.9 Model comparisons
    • 6.21 Exercise
    • 6.22 Resampling methods (1)
    • 6.23 Resampling methods (2): Cross-validation
    • 6.24 Resampling methods (3): Validation set approach
    • 6.25 Resampling methods (4): Leave-one-out cross-validation (LOOCV)
    • 6.26 Resampling methods (5): Leave-one-out cross-validation (LOOCV)
    • 6.27 Resampling methods (6): k-Fold Cross-Validation
    • 6.28 Exercise: Resampling methods
    • 6.29 Lab: Resampling & cross-validation
      • 6.29.1 Simple sampling
      • 6.29.2 Validation set approach
      • 6.29.3 Leave-one-out cross-validation (LOOCV)
      • 6.29.4 k-Fold Cross-Validation
      • 6.29.5 Comparing models
    • 6.30 Classifier performance & fairness (1): False positives & negatives
    • 6.31 Classifier performance & fairness (2)
    • 6.32 Lab: Classifier performance & fairness
      • 6.32.1 Initial evaluation of the COMPAS scores
      • 6.32.2 False positives, false negatives and correct classification
      • 6.32.3 Altering the threshold
      • 6.32.4 Adjusting thresholds
    • 6.33 Other ML methods: Quick overview
    • 6.34 Trade-Off: Prediction Accuracy vs. Model Interpretability
    • 6.35 References
  • 7 Machine Learning: Text classification
    • 7.1 Text as Data
    • 7.2 Language in NLP
    • 7.3 (R-) Workflow for Text Analysis
      • 7.3.1 Data collection
      • 7.3.2 Data manipulation: Basics (1)
      • 7.3.3 Data manipulation: Basics (2)
      • 7.3.4 Data manipulation: Basics (3)
      • 7.3.5 Data manipulation: Tidytext Example (1)
      • 7.3.6 Data manipulation: Tidytext Example (2)
      • 7.3.7 Data manipulation: Tm Example
      • 7.3.8 Data manipulation: Quanteda Example
      • 7.3.9 Data manipulation: Summary
      • 7.3.10 Vectorization: Basics
      • 7.3.11 Vectorization: Tidytext example
      • 7.3.12 Vectorization: Tm example
      • 7.3.13 Vectorization: Quanteda example
      • 7.3.14 Analysis: Unsupervised text classification
      • 7.3.15 Analysis: Topic Models
      • 7.3.16 Analysis: Latent Dirichlet Allocation (1)
      • 7.3.17 Analysis: Latent Dirichlet Allocation (2)
      • 7.3.18 Analysis: Structural Topic Models
    • 7.4 Lab: Structural Topic Model
      • 7.4.1 Setup
      • 7.4.2 Data Pre-processing
      • 7.4.3 Analysis: (Structural) Topic Model
      • 7.4.4 Validation and Model Selection
      • 7.4.5 Visualization and Model Interpretation
      • 7.4.6 Highest word probabilities for each topic
  • 8 Machine learning: Intro to Deep learning
    • 8.1 Artificial, machine and deep learning
    • 8.2 Classical ML: What it does (1)
    • 8.3 Classical ML: What it does (2)
    • 8.4 The ‘deep’ in deep learning (1)
    • 8.5 The ‘deep’ in deep learning (2)
    • 8.6 The ‘deep’ in deep learning (3)
    • 8.7 Understanding how DL works (1)
    • 8.8 Understanding how DL works (2)
    • 8.9 Understanding how DL works (3)
    • 8.10 Achievements of deep learning
    • 8.11 Short-term hype & promise of AI (Ch. 1.1.7, 1.1.8)
    • 8.12 The universal workflow of machine learning
    • 8.13 Getting started: Network anatomy
    • 8.14 Layers: the building blocks of deep learning
    • 8.15 Loss functions and optimizers
    • 8.16 Keras & R packages
    • 8.17 Installation
    • 8.18 Lab: Predicting house prices: a regression example
  • 9 Machine learning: APIs
    • 9.1 Using ML APIs for research: Pros and Cons
    • 9.2 Lab: Using Google ML APIs
      • 9.2.1 Software
      • 9.2.2 Install & load packages
      • 9.2.3 Twitter: Authenticate & load data
      • 9.2.4 Google: Authenticate
      • 9.2.5 Translation API
      • 9.2.6 NLP API: Sentiment
      • 9.2.7 NLP API: Syntax
      • 9.2.8 Analyzing images
      • 9.2.9 References
  • 10 Summary: Computational Social Science
  • 11 References

Computational Social Science: Theory & Application

4.6 Some APIs

  • Wikipedia
  • LinkedIn
  • DeepL
  • Google Translate
  • Facebook
  • Meetup
  • Strava
    • e.g. https://blog.revolutionanalytics.com/2018/01/strava-visualization.html
  • Uber
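
Most of the services above expose a plain HTTP interface that can be queried directly from R. As a minimal sketch (not from the slides; it assumes the httr and jsonlite packages used in the labs), the keyless Wikipedia (MediaWiki) API can be searched like this:

```r
# Minimal sketch: querying the keyless Wikipedia (MediaWiki) API.
library(httr)      # for GET() and content()
library(jsonlite)  # for fromJSON()

# Send a GET request; query parameters are passed as a named list
resp <- GET(
  "https://en.wikipedia.org/w/api.php",
  query = list(
    action   = "query",
    list     = "search",
    srsearch = "computational social science",
    format   = "json"
  )
)

# Parse the JSON response body into R objects
parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# Titles of the matching articles
parsed$query$search$title
```

The commercial APIs on the list (e.g. DeepL, Google Translate, Strava) follow the same request–response pattern but additionally require an API key or OAuth token for authentication (see Section 4.10).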