Computational Social Science

5.6 Some APIs

  • Wikipedia
  • LinkedIn
  • DeepL
  • Google Translate
  • Facebook
  • Meetup
  • Strava
    • e.g. https://blog.revolutionanalytics.com/2018/01/strava-visualization.html
  • Uber
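
To make the list above more concrete, here is a minimal sketch of what a request to one of these services can look like, using Wikipedia as the example. It only builds the request URL for the public MediaWiki Action API and does not send it. The helper name `build_extract_url` and the chosen page title are illustrative assumptions, not part of the course material. The other services listed generally require registration and an API key.

```python
from urllib.parse import urlencode

# Public MediaWiki Action API endpoint for English Wikipedia.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str) -> str:
    """Hypothetical helper: URL for the plain-text intro of one Wikipedia page."""
    params = {
        "action": "query",   # read-only query module
        "format": "json",    # ask for a JSON response
        "prop": "extracts",  # page text extracts
        "exintro": 1,        # restrict to the lead section
        "explaintext": 1,    # plain text instead of HTML
        "titles": title,     # page to look up
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


# Example title chosen for illustration only.
url = build_extract_url("Computational social science")
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return a JSON document containing the page's introductory text.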