Preface
1
Introduction
1.1
Script & Material
1.2
About me
1.3
Who are you?
1.4
Content & Objectives
1.5
Overview of some readings
1.6
Tools and software we use
1.6.1
R: Why use it?
1.6.2
R: Where/how to study?
1.6.3
R: Installation and setup
1.6.4
Datacamp
1.6.5
Google Cloud
1.7
Descriptive inference, causal inference & prediction
1.7.1
Descriptive questions
1.7.2
Causal questions
1.7.3
Prediction
1.8
The digital revolution
1.9
How does the internet work? (+ access)
1.10
Technology adoption: United States
1.11
Platform usage (1): Social Media Adoption
1.12
Platform usage (2): Social Media Adoption (Barchart)
1.13
Platform usage (3): Social Networking Young
1.14
Platform usage (4): Daily hours digital media
1.15
What is Computational Social science (CSS)?
1.16
CSS: Challenges for Social Scientists
1.17
Exercise: What data can reveal about you…
1.18
Exercise: Documentaries by Deutsche Welle
1.19
X-Exercise: Farecast and Google Flu
1.20
X-Exercise: Big Data is not about the data
1.21
X-Exercise: Download your data!
1.22
Presentations: Example Barbera (2015)
1.23
Good practices in data analysis (X)
1.23.1
Reproducibility & Replicability
1.23.2
Reproducibility & Replicability
1.23.3
Why reproducibility?
1.23.4
Reproducibility: My current approach
1.23.5
Reproducibility in practice
1.24
References
2
Big data & new data sources (1)
2.1
For a starter
2.2
What is Big Data?
2.3
Big data: Quotes for a start
2.4
Big data: Definitions
2.5
Big data: The Vs
2.6
Big data: Analog age vs. digital age (1)
2.7
Big data: Analog age vs. digital age (2)
2.8
Big data: Repurposing
2.9
Presentations
2.10
Exercise: Ten common characteristics of big data (Salganik 2017)
2.11
New forms of data: Overview
2.12
Where can we find big data sources?
2.13
References
3
Big data & new data sources (2)
3.1
Presentations
3.2
Example: Salience of issues
3.3
Google trends: Caveats
3.4
Data security & ethics (1): What might happen?
3.5
Data security & ethics (2): Yes…
3.6
Data security & ethics (3): Protection
3.7
Data: Size & dimensions & creation
3.8
Data: How is it stored?
3.9
Data & Databases
3.10
R Database Packages
3.11
SQL: Intro
3.12
SQL: Components of a query
3.13
Lab: Working with a SQL database
3.13.1
Creating an SQL database
3.13.2
Querying an SQL database
3.13.3
Querying multiple SQL tables
3.13.4
Grouping and aggregating
3.14
Exercise: SQL database
3.15
SQL at scale: Strategy
3.16
SQL at scale: Google BigQuery
3.17
Lab (Skip!): Setting up GCP research credits
3.17.1
Google Translation API
3.18
Lab: Google Big Query
3.19
Exercise: Setting up & querying Google BigQuery
3.20
References
4
Data collection: APIs
4.1
Web APIs
4.2
API = Application Programming Interface
4.3
Why APIs?
4.4
Scraping: Decisions, decisions…
4.5
Types of APIs
4.6
Some APIs
4.7
R packages
4.8
(Response) Formats: JSON
4.9
(Response) Formats: XML
4.10
Authentication
4.11
Connect to API: Example
4.12
Lab: Scraping data from APIs
4.13
Exercise: Scraping data from APIs
4.13.1
Homework: APIs for social scientists
4.14
X-Lab: Clarify API
4.15
X-Twitter’s APIs
4.16
X-Lab: Twitter’s streaming API
4.16.1
Authenticating
4.16.2
Collecting data from Twitter’s Streaming API
4.17
X-Exercise: Twitter’s streaming API
4.18
X-Lab: Twitter’s REST API
4.18.1
Searching recent tweets
4.18.2
Extracting users’ profile information
4.18.3
Building friend and follower networks
4.18.4
Estimating ideology based on Twitter networks
4.18.5
Other types of data
4.19
X-Exercise: Twitter’s REST API
5
Data collection: Web (screen) scraping
5.1
Web scraping: Basics
5.1.1
Scraping data from websites: Why?
5.1.2
Scraping the web: two approaches
5.1.3
The rules of the game
5.1.4
The art of web scraping
5.2
Screen (Web) scraping
5.2.1
Scenarios
5.2.2
HTML: a primer
5.2.3
HTML: a primer
5.2.4
Beyond HTML
5.2.5
Parsing HTML code
5.3
Lab: Scraping tables
5.4
Exercise: Scraping tables
5.5
Lab: Scraping (more) tables
5.6
Exercise: Scraping (more) tables
5.7
Lab: Scraping unstructured data
5.8
Exercise: Scraping unstructured data
5.9
Scrape dynamic webpages: Selenium
5.10
Lab: Scraping web data behind web forms
5.10.1
Using RSelenium
5.11
Exercise: Scraping web data behind web forms
5.12
RSS: Scraping newspaper websites
5.13
Lab: Scraping newspaper website
6
Machine learning: Introduction
6.1
Classical statistics vs. machine learning
6.2
Machine learning as programming paradigm
6.3
Terminological differences (1)
6.4
Terminological differences (2)
6.5
Predicting: Mean
6.6
Predicting: Linear model (Equation)
6.7
Predicting: Linear model (Visualization)
6.8
Predicting: Linear model (Estimation)
6.9
Predicting: Linear model (Prediction)
6.10
Prediction vs. classification
6.11
Exercise
6.12
Exercise: Discussion
6.13
Regression Versus Classification Problems
6.14
An Overview of Classification
6.15
Why Not Linear Regression?
6.16
Classification: Assessing Model Accuracy
7
Logistic Regression
8
4.4.1 and 4.4.2
8.1
Classification: Predicting recidivism
8.2
Measuring classifier performance
8.3
Measuring Algorithmic Fairness
8.4
Lab: Classification: Predicting recidivism
8.4.1
Inspecting the dataset
8.4.2
Splitting the datasets
8.4.3
Part 1: Comparing the scores of black and white defendants
8.4.4
Part 2: Initial evaluation of the COMPAS scores
8.4.5
Part 3: Altering the threshold
8.4.6
Part 4: Trying to reproduce the COMPAS score
8.4.7
Part 5: Adjusting thresholds
8.5
Classifying ML algorithms
8.6
Supervised machine learning (SML)
8.7
Some SML techniques
8.8
SML social science applications
8.9
Unsupervised machine learning techniques
8.10
Machine Learning: New answers to old questions
8.11
FUTURE ISSUES
8.12
Lab: Simple examples of SML and UML
8.13
Dangers of ML
Computational Social Science: Theory & Application
6.15
Why Not Linear Regression?
Reading: James, Witten, Hastie & Tibshirani, An Introduction to Statistical Learning, Section 4.2.