Unit i | Name | X1: Age | X2: Educ. | D: Unempl. | Y: Lifesat. |
---|---|---|---|---|---|
1 | Sofia | 29 | 1 | 1 | 3 |
2 | Sara | 30 | 2 | 1 | 2 |
3 | José | 28 | 0 | 0 | 5 |
4 | Yiwei | 27 | 2 | 1 | ? |
5 | Julia | 25 | 0 | 0 | 6 |
6 | Hans | 23 | 0 | 1 | ? |
.. | .. | .. | .. | .. | .. |
1000 | Hugo | 23 | 1 | 0 | 8 |
Updated: Jul 18, 2024
Source: https://en.wikipedia.org/wiki/Big_data
Q: Assume we want to predict life satisfaction. What are the features in the table above?
Q: Where does the training data come from? Do we always have the outputs/outcome readily available?
Q: What is the training data you are using, and what is the missing data you want to predict?
Missing data could be future observations, but also observations that are missing in a dataset we have already collected; in other words, missing-data imputation simply predicts missing data points within an existing dataset.
Methods & models: linear/logistic regression; penalized regression; classification and regression trees; nearest neighbors; neural networks/deep learning
Social science examples: Recidivism (Dressel & Farid 2018), deadly conflict (Cederman & Weidmann 2017), divorce (Heyman et al. 2001), mental health (Chancellor & De Choudhury 2020), poverty/wealth (Blumenstock 2015), unemployment (Sundsøy et al. 2017), sentiment (Martínez-Cámara et al. 2014, Bauer & Clemm 2021), vote shares/elections (Stoetzer et al. 2019)
Salganik et al. (2020): “Fragile Families Challenge”
SML can be used both to predict missing observations in an existing dataset (e.g., in Table 1) and to forecast future observations.
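As a minimal sketch of this idea, the snippet below fits a linear regression (the first method in the list above) on the rows of Table 1 with observed life satisfaction and then predicts the outcome for the two units marked "?" (Yiwei and Hans). It uses only numpy; a real analysis would rely on a dedicated library and far more data.

```python
import numpy as np

# Observed rows of Table 1: age, education, unemployed (features)
X = np.array([
    [29, 1, 1],   # Sofia
    [30, 2, 1],   # Sara
    [28, 0, 0],   # José
    [25, 0, 0],   # Julia
    [23, 1, 0],   # Hugo
], dtype=float)
y = np.array([3, 2, 5, 6, 8], dtype=float)  # observed life satisfaction

# Fit linear regression by ordinary least squares (with intercept)
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predict life satisfaction for the units with missing outcomes
X_missing = np.array([
    [27, 2, 1],   # Yiwei
    [23, 0, 1],   # Hans
], dtype=float)
preds = np.column_stack([np.ones(len(X_missing)), X_missing]) @ beta
print(preds)  # imputed life-satisfaction values
```

The same logic carries over to forecasting: the "missing" rows would simply be future observations whose features are known but whose outcome has not yet occurred.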
Unsupervised machine learning (UML): Methods for finding patterns in data
Methods & models: principal component, factor, cluster, latent class, and sequence analysis; topic modelling; community detection
Examples: Find topics in… newspaper articles (Barberà et al. 2021), open-ended survey responses (Bauer et al. 2017), academic publications (McFarland et al. 2013), TED talks (Schwemmer & Jungkunz 2019), media discourses (DiMaggio et al. 2013), state documents (Mohr et al. 2013), tweets (Dahal et al. 2019, Bauer); community detection: Twitter botnets (Lingam et al. 2020)
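To illustrate the "finding patterns without labels" idea, here is a small sketch of principal component analysis (the first method listed above) on simulated survey data, where four items are driven by a single latent factor. The data are invented for illustration; the PCA itself is computed from scratch via the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100 respondents answering 4 survey items that all load on
# one latent factor (e.g., generalized trust), plus measurement noise
latent = rng.normal(size=(100, 1))
loadings = np.array([[0.9, 0.8, 0.7, 0.6]])
items = latent @ loadings + 0.3 * rng.normal(size=(100, 4))

# Principal component analysis via SVD of the centered data matrix
centered = items - items.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of variance per component

print(explained)  # first component should dominate
```

Unlike the supervised example, there is no outcome column here: the method discovers on its own that one dimension summarizes most of the variation across the four items.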
General insights: Social scientists who apply ML are still rare! The distinction between SML and UML is sometimes blurry (e.g., pretrained BERT models).
Big: Large datasets are a means to an end; they are not an end in themselves.
Always-on: Always-on big data enables the study of unexpected events and real-time measurement.
Non-reactive: Measurement in big data sources is much less likely to change behavior.
Incomplete: No matter how big your big data, it probably doesn’t have the information you want.
Inaccessible: Data held by companies and governments are difficult for researchers to access.
Nonrepresentative: Nonrepresentative data are bad for out-of-sample generalizations, but can be quite useful for within-sample comparisons.
Drifting: Population drift, usage drift, and system drift make it hard to use big data sources to study long-term trends.
Algorithmically confounded: Behavior in big data systems is not natural; it is driven by the engineering goals of the systems (the algorithm is a secret!).
Dirty: Big data sources can be loaded with junk and spam.
Sensitive: Some of the information that companies and governments have is sensitive.
Source: Salganik (2017)
Term paper (Hausarbeit) = an in-depth engagement with topics discussed in the seminar
Is the left–right scale a suitable measure of ideology?
Do (negative) experiences influence generalized trust?
How has trust been measured in the past, and how should we measure trust in the future?
A good research question: ends with a question mark; is informative
Common problems: too broad or too vague; too complicated; relevance unclear
Bibliography
Format
Use of AI/ChatGPT
“We estimated the world’s technological capacity to store, communicate, and compute information, tracking 60 analog and digital technologies during the period from 1986 to 2007. In 2007, humankind was able to store 2.9 × 10^20 optimally compressed bytes, communicate almost 2 × 10^21 bytes, and carry out 6.4 × 10^18 instructions per second on general-purpose computers. General-purpose computing capacity grew at an annual rate of 58%. The world’s capacity for bidirectional telecommunication grew at 28% per year, closely followed by the increase in globally stored information (23%). Humankind’s capacity for unidirectional information diffusion through broadcasting channels has experienced comparatively modest annual growth (6%). Telecommunication has been dominated by digital technologies since 1990 (99.9% in digital format in 2007), and the majority of our technological memory has been in digital format since the early 2000s (94% digital in 2007).”
Rather than programmers crafting data-processing rules by hand, could a computer automatically learn these rules by looking at data?
In addition to the summaries in the text, please provide a tabular overview.
Seminar: Digitalisierung, Künstliche Intelligenz und Demokratie