Computational Social Science: Theory & Application
6.6 Predicting: Linear model (Equation) (1)
Linear Model = LM = Linear regression model
Aim (normally): model (and also understand) the relationship between an outcome variable and one or more explanatory variables
$$y_i = \underbrace{\beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i}}_{?} + \underbrace{\varepsilon_i}_{?}$$
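To make the equation concrete, here is a minimal sketch in Python that splits each outcome $y_i$ into the two bracketed parts. It assumes the usual reading of a linear model: the first bracket is the part of $y_i$ computed from the explanatory variables, and the second is the leftover error $\varepsilon_i$. The coefficient values, the three observations, and the names `systematic_part`, `observations` and `rows` are all made up for illustration and do not come from the course material.

```python
# Hypothetical coefficients, chosen only for illustration (not estimated from data).
beta0, beta1, beta2 = 1.0, 2.0, -0.5


def systematic_part(x1, x2):
    """Return beta0 + beta1 * x1 + beta2 * x2 for a single unit i."""
    return beta0 + beta1 * x1 + beta2 * x2


# Made-up observations: (x_1i, x_2i, y_i) for three units.
observations = [(1.0, 2.0, 2.5), (2.0, 0.0, 4.0), (0.0, 4.0, -1.0)]

rows = []
for x1, x2, y in observations:
    fitted = systematic_part(x1, x2)  # part of y_i implied by x_1i and x_2i
    error = y - fitted                # epsilon_i: what the linear part leaves over
    rows.append((y, fitted, error))
    print(f"y = {y:5.2f} = {fitted:5.2f} + ({error:5.2f})")
```

Each printed line restates the equation for one unit: the observed outcome equals the linear part plus the error. With real data the coefficients would be estimated rather than picked by hand.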