• Preface
  • 1 Introduction
    • 1.1 Script & Material
    • 1.2 About me
    • 1.3 Who are you?
    • 1.4 Content & Objectives
    • 1.5 Overview of some readings
    • 1.6 Tools and software we use
      • 1.6.1 R: Why use it?
      • 1.6.2 R: Where/how to study?
      • 1.6.3 R: Installation and setup
      • 1.6.4 Datacamp
      • 1.6.5 Google Cloud
    • 1.7 Descriptive inference, causal inference & prediction
      • 1.7.1 Descriptive questions
      • 1.7.2 Causal questions
      • 1.7.3 Prediction
    • 1.8 The digital revolution
    • 1.9 How does the internet work? (+ access)
    • 1.10 Technology adoption: United States
    • 1.11 Platform usage (1): Social Media Adoption
    • 1.12 Platform usage (2): Social Media Adoption (Bar chart)
    • 1.13 Platform usage (3): Social networking among the young
    • 1.14 Platform usage (4): Daily hours digital media
    • 1.15 What is Computational Social Science (CSS)?
    • 1.16 CSS: Challenges for Social Scientists
    • 1.17 Exercise: What data can reveal about you…
    • 1.18 Exercise: Documentaries by Deutsche Welle
    • 1.19 X-Exercise: Farecast and Google Flu
    • 1.20 X-Exercise: Big Data is not about the data
    • 1.21 X-Exercise: Download your data!
    • 1.22 Presentations: Example Barbera (2015)
    • 1.23 Good practices in data analysis (X)
      • 1.23.1 Reproducibility & Replicability
      • 1.23.2 Reproducibility & Replicability
      • 1.23.3 Why reproducibility?
      • 1.23.4 Reproducibility: My current approach
      • 1.23.5 Reproducibility in practice
    • 1.24 References
  • 2 Big data & new data sources (1)
    • 2.1 For starters
    • 2.2 What is Big Data?
    • 2.3 Big data: Quotes for a start
    • 2.4 Big data: Definitions
    • 2.5 Big data: The Vs
    • 2.6 Big data: Analog age vs. digital age (1)
    • 2.7 Big data: Analog age vs. digital age (2)
    • 2.8 Big data: Repurposing
    • 2.9 Presentations
    • 2.10 Exercise: Ten common characteristics of big data (Salganik 2017)
    • 2.11 New forms of data: Overview
    • 2.12 Where can we find big data sources?
    • 2.13 References
  • 3 Big data & new data sources (2)
    • 3.1 Presentations
    • 3.2 Example: Salience of issues
    • 3.3 Google trends: Caveats
    • 3.4 Data security & ethics (1): What might happen?
    • 3.5 Data security & ethics (2): Yes…
    • 3.6 Data security & ethics (3): Protection
    • 3.7 Data: Size & dimensions & creation
    • 3.8 Data: How is it stored?
    • 3.9 Data & Databases
    • 3.10 R Database Packages
    • 3.11 SQL: Intro
    • 3.12 SQL: Components of a query
    • 3.13 Lab: Working with a SQL database
      • 3.13.1 Creating an SQL database
      • 3.13.2 Querying an SQL database
      • 3.13.3 Querying multiple SQL tables
      • 3.13.4 Grouping and aggregating
    • 3.14 Exercise: SQL database
    • 3.15 SQL at scale: Strategy
    • 3.16 SQL at scale: Google BigQuery
    • 3.17 Lab (Skip!): Setting up GCP research credits
      • 3.17.1 Google Translation API
    • 3.18 Lab: Google Big Query
    • 3.19 Exercise: Setting up & querying Google BigQuery
    • 3.20 Strategies to work with big data
    • 3.21 References
  • 4 Data collection: APIs
    • 4.1 Web APIs
    • 4.2 API = Application Programming Interface
    • 4.3 Why APIs?
    • 4.4 Scraping: Decisions, decisions…
    • 4.5 Types of APIs
    • 4.6 Some APIs
    • 4.7 R packages
    • 4.8 (Response) Formats: JSON
    • 4.9 (Response) Formats: XML
    • 4.10 Authentication
    • 4.11 Connect to API: Example
    • 4.12 Lab: Scraping data from APIs
    • 4.13 Exercise: Scraping data from APIs
      • 4.13.1 Homework: APIs for social scientists
    • 4.14 X-Lab: Clarify API
    • 4.15 X-Twitter’s APIs
    • 4.16 X-Lab: Twitter’s streaming API
      • 4.16.1 Authenticating
      • 4.16.2 Collecting data from Twitter’s Streaming API
    • 4.17 X-Exercise: Twitter’s streaming API
    • 4.18 X-Lab: Twitter’s REST API
      • 4.18.1 Searching recent tweets
      • 4.18.2 Extracting users’ profile information
      • 4.18.3 Building friend and follower networks
      • 4.18.4 Estimating ideology based on Twitter networks
      • 4.18.5 Other types of data
    • 4.19 X-Exercise: Twitter’s REST API
  • 5 Data collection: Web (screen) scraping
    • 5.1 Web scraping: Basics
      • 5.1.1 Scraping data from websites: Why?
      • 5.1.2 Scraping the web: two approaches
      • 5.1.3 The rules of the game
      • 5.1.4 The art of web scraping
    • 5.2 Screen (Web) scraping
      • 5.2.1 Scenarios
      • 5.2.2 HTML: a primer
      • 5.2.3 HTML: a primer
      • 5.2.4 Beyond HTML
      • 5.2.5 Parsing HTML code
    • 5.3 Lab: Scraping tables
    • 5.4 Exercise: Scraping tables
    • 5.5 Lab: Scraping (more) tables
    • 5.6 Exercise: Scraping (more) tables
    • 5.7 Lab: Scraping unstructured data
    • 5.8 Exercise: Scraping unstructured data
    • 5.9 Scraping dynamic webpages: Selenium
    • 5.10 Lab: Scraping web data behind web forms
      • 5.10.1 Using RSelenium
    • 5.11 RSS: Scraping newspaper websites
    • 5.12 Lab: Scraping newspaper website
  • 6 Machine learning: Introduction
    • 6.1 Classical statistics vs. machine learning
    • 6.2 Machine learning as programming paradigm
    • 6.3 Terminological differences (1)
    • 6.4 Terminological differences (2)
    • 6.5 Prediction: Mean
    • 6.6 Prediction: Linear model (Equation) (1)
    • 6.7 Prediction: Linear model (Equation) (2)
    • 6.8 Prediction: Linear model (Visualization)
    • 6.9 Prediction: Linear model (Estimation)
    • 6.10 Prediction: Linear model (Prediction)
    • 6.11 Exercise: What’s predicted?
    • 6.12 Exercise: Discussion
    • 6.13 Regression vs. Classification
    • 6.14 Overview of Classification
    • 6.15 Assessing Model Accuracy
    • 6.16 The Logistic Model
    • 6.17 LR in R: Predicting Recidivism (1)
    • 6.18 LR in R: Predicting Recidivism (2)
    • 6.19 LR in R: Predicting Recidivism (3)
    • 6.20 Lab: Predicting recidivism (Classification)
      • 6.20.1 Inspecting the dataset
      • 6.20.2 Splitting the datasets
      • 6.20.3 Comparing the scores of black and white defendants
      • 6.20.4 Building a predictive model
      • 6.20.5 Predicting values
      • 6.20.6 Training error rate
      • 6.20.7 Test error rate
      • 6.20.8 Comparison to COMPAS score
      • 6.20.9 Model comparisons
    • 6.21 Exercise
    • 6.22 Resampling methods (1)
    • 6.23 Resampling methods (2): Cross-validation
    • 6.24 Resampling methods (3): Validation set approach
    • 6.25 Resampling methods (4): Leave-one-out cross-validation (LOOCV)
    • 6.26 Resampling methods (5): Leave-one-out cross-validation (LOOCV)
    • 6.27 Resampling methods (6): k-Fold Cross-Validation
    • 6.28 Exercise: Resampling methods
    • 6.29 Lab: Resampling & cross-validation
      • 6.29.1 Simple sampling
      • 6.29.2 Validation set approach
      • 6.29.3 Leave-one-out cross-validation (LOOCV)
      • 6.29.4 k-Fold Cross-Validation
      • 6.29.5 Comparing models
    • 6.30 Classifier performance & fairness (1): False positives & negatives
    • 6.31 Classifier performance & fairness (2)
    • 6.32 Lab: Classifier performance & fairness
      • 6.32.1 Initial evaluation of the COMPAS scores
      • 6.32.2 False positives, negatives, and correct classification
      • 6.32.3 Altering the threshold
      • 6.32.4 Adjusting thresholds
    • 6.33 Other ML methods: Quick overview
    • 6.34 Trade-Off: Prediction Accuracy vs. Model Interpretability
    • 6.35 References
  • 7 Machine learning: Intro to Deep learning
    • 7.1 Machine learning
    • 7.2 Evaluating a machine learning model
    • 7.3 Artificial, machine and deep learning
    • 7.4 Classical ML: What it does (1.1.3)
    • 7.5 Classical ML: What it does (1.1.3)
    • 7.6 The ‘deep’ in deep learning
    • 7.7 Understanding deep learning in three figures
    • 7.8 Classical ML vs. Deep learning
    • 7.9 Keras
  • 8 Machine learning: APIs
    • 8.1 ML APIs: An overview
    • 8.2 Example

Computational Social Science: Theory & Application

7.2 Evaluating a machine learning model

  • https://www.jeremyjordan.me/evaluating-a-machine-learning-model/
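
The linked post covers the core ideas of model evaluation: holding out data the model has not seen, choosing a metric, and reading a confusion matrix. Below is a minimal sketch in base R of the validation-set approach (the same idea as in Section 6.29.2); the mtcars data and the am ~ wt + hp model are illustrative assumptions, not taken from the course materials.

```r
# Validation-set approach: fit a logistic regression on a training
# split and evaluate it on held-out data. Uses only base R and the
# built-in mtcars data; the model is illustrative.
set.seed(42)

n <- nrow(mtcars)

# Randomly assign ~70% of rows to the training set
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Predict transmission type (am: 0/1) from weight and horsepower
fit <- glm(am ~ wt + hp, data = train, family = binomial)

# Predicted probabilities on the test set, converted to class labels
probs <- predict(fit, newdata = test, type = "response")
pred  <- ifelse(probs > 0.5, 1, 0)

# Test error rate and confusion matrix
mean(pred != test$am)                      # share misclassified
table(predicted = pred, actual = test$am)  # confusion matrix
```

Evaluating on the held-out split, rather than on the training data, estimates out-of-sample error (compare the training vs. test error rates in Sections 6.20.6 and 6.20.7). The 0.5 cutoff is a default; Section 6.32.3 discusses altering the threshold and its effect on false positives and negatives.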