Preface
1
Introduction
1.1
Script & Material
1.2
About me
1.3
Who are you?
1.4
Content & Objectives
1.5
Overview of some readings
1.6
Tools and software we use
1.6.1
R: Why use it?
1.6.2
R: Where/how to study?
1.6.3
R: Installation and setup
1.6.4
Datacamp
1.6.5
Google Cloud
1.7
Descriptive inference, causal inference & prediction
1.7.1
Descriptive questions
1.7.2
Causal questions
1.7.3
Prediction
1.8
The digital revolution
1.9
How does the internet work? (+ access)
1.10
Technology adoption: United States
1.11
Platform usage (1): Social Media Adoption
1.12
Platform usage (2): Social Media Adoption (Barchart)
1.13
Platform usage (3): Social Networking Young
1.14
Platform usage (4): Daily hours digital media
1.15
What is Computational Social science (CSS)?
1.16
CSS: Challenges for Social Scientists
1.17
Exercise: What data can reveal about you…
1.18
Exercise: Documentaries by Deutsche Welle
1.19
X-Exercise: Farecast and Google Flu
1.20
X-Exercise: Big Data is not about the data
1.21
X-Exercise: Download your data!
1.22
Presentations: Example Barbera (2015)
1.23
Good practices in data analysis (X)
1.23.1
Reproducibility & Replicability
1.23.2
Reproducibility & Replicability
1.23.3
Why reproducibility?
1.23.4
Reproducibility: My current approach
1.23.5
Reproducibility in practice
1.24
References
2
Big data & new data sources (1)
2.1
For a starter
2.2
What is Big Data?
2.3
Big data: Quotes for a start
2.4
Big data: Definitions
2.5
Big data: The Vs
2.6
Big data: Analog age vs. digital age (1)
2.7
Big data: Analog age vs. digital age (2)
2.8
Big data: Repurposing
2.9
Presentations
2.10
Exercise: Ten common characteristics of big data (Salganik 2017)
2.11
New forms of data: Overview
2.12
Where can we find big data sources?
2.13
References
3
Big data & new data sources (2)
3.1
Presentations
3.2
Example: Salience of issues
3.3
Google trends: Caveats
3.4
Data security & ethics (1): What might happen?
3.5
Data security & ethics (2): Yes…
3.6
Data security & ethics (3): Protection
3.7
Data: Size & dimensions & creation
3.8
Data: How is it stored?
3.9
Data & Databases
3.10
R Database Packages
3.11
SQL: Intro
3.12
SQL: Components of a query
3.13
Lab: Working with a SQL database
3.13.1
Creating an SQL database
3.13.2
Querying an SQL database
3.13.3
Querying multiple SQL tables
3.13.4
Grouping and aggregating
3.14
Exercise: SQL database
3.15
SQL at scale: Strategy
3.16
SQL at scale: Google BigQuery
3.17
Lab (Skip!): Setting up GCP research credits
3.17.1
Google Translation API
3.18
Lab: Google BigQuery
3.19
Exercise: Setting up & querying Google BigQuery
3.20
References
4
Data collection: APIs
4.1
Web APIs
4.2
API = Application Programming Interface
4.3
Why APIs?
4.4
Scraping: Decisions, decisions…
4.5
Types of APIs
4.6
Some APIs
4.7
R packages
4.8
(Response) Formats: JSON
4.9
(Response) Formats: XML
4.10
Authentication: Key vs. OAuth
4.11
Connect to API: Example
4.12
Lab: Scraping data from APIs
4.13
Exercise: Scraping data from APIs
4.13.1
Homework: APIs for social scientists
4.14
X-Lab: Clarify API
4.15
X-Twitter’s APIs
4.16
X-Lab: Twitter’s streaming API
4.16.1
Authenticating
4.16.2
Collecting data from Twitter’s Streaming API
4.17
X-Exercise: Twitter’s streaming API
4.18
X-Lab: Twitter’s REST API
4.18.1
Searching recent tweets
4.18.2
Extracting users’ profile information
4.18.3
Building friend and follower networks
4.18.4
Estimating ideology based on Twitter networks
4.18.5
Other types of data
4.19
X-Exercise: Twitter’s REST API
5
Data collection: Web scraping
5.1
Web scraping: Basics
5.1.1
Scraping data from websites: Why?
5.1.2
Scraping the web: two approaches
5.1.3
The rules of the game
5.1.4
The art of web scraping
5.2
Screen (Web) scraping
5.2.1
Scenarios
5.2.2
HTML: a primer
5.2.3
HTML: a primer
5.2.4
Beyond HTML
5.2.5
Parsing HTML code
5.3
Lab: Scraping tables
5.4
Exercise: Scraping tables
5.5
Lab: Scraping (more) tables
5.6
Exercise: Scraping (more) tables
5.7
Exercise 2: Scraping (more) tables
5.8
Lab: Scraping unstructured data
5.9
Exercise: Scraping unstructured data
5.10
Scrape dynamic webpages: Selenium
5.11
Lab: Scraping web data behind web forms
5.12
Exercise: Scraping web data behind web forms
5.13
RSS: Scraping newspaper websites
5.14
Lab: Scraping newspaper website
Computational Social Science: Theory & Application
Session 5
Data collection: Web scraping