Week 4: Online and text data

For the final week, we will be looking at another type of data that is increasingly widely used, especially given the growing availability of large text corpora from Twitter, Facebook, Google Books, and similar sources: text data. Increasingly, these data come from online sources and are used to answer questions of social scientific relevance. When online and other digitally sourced data are repurposed to these ends, we often refer to them more generally as “digital trace” data. For an effective review of recent trends in this fast-moving domain, see the articles by Lazer et al. (2020) and Edelmann et al. (2020). For a brilliant book-length introduction to this new domain of social research, see Salganik (2018).

We will be looking at two main case study articles. The first, by Nielsen (2019), does not focus on online material as such, but it does take its source material from a set of texts by Salafi preachers collected online. It uses these texts to argue that social movements, even the more patriarchal among them, might make use of female preachers for pragmatic reasons.

The second, by Pan and Siegel (2020), uses Twitter data to gauge the impact of repression on online behaviour in Saudi Arabia. For both papers, it is worth considering questions of sample selection, language translation, and the accuracy of some unsupervised machine learning techniques. These issues are discussed in the three general readings by Grimmer and Stewart (2013), DiMaggio (2015), and Lucas et al. (2015).

The further readings by Siegel and Badaan (2020) and Kubinec and Owen (2021) demonstrate how we can construct quite elaborate research designs from these online materials. Indeed, coming full circle, we see that it is possible to recreate an experimental setting using online data and to measure attitudes using digital trace data alone.

We will also be able to use a portion of this week and my office hours afterward to discuss any questions or concerns you may have regarding the assessment.

Questions to consider in the seminar: Do social media data constitute a valid sample? How do we verify the validity of our sample when it comes to text data more generally? What types of inferences can we make from these datasets? Can we rely on the automatic coding of text data? How might text data be useful for questions of historical and social scientific interest?
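
As a starting point for the question about automatic coding, one common check is to compare automated labels against a hand-coded subsample of the same texts. The following is a minimal sketch in Python of that idea, not a method taken from any of the readings: the labels are made up for illustration, and the function names `percent_agreement` and `cohens_kappa` are hypothetical helpers written for this example.

```python
from collections import Counter


def percent_agreement(hand, auto):
    """Share of documents where the hand code and the automated code match."""
    if len(hand) != len(auto) or not hand:
        raise ValueError("Label lists must be non-empty and of equal length.")
    return sum(h == a for h, a in zip(hand, auto)) / len(hand)


def cohens_kappa(hand, auto):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(hand)
    observed = percent_agreement(hand, auto)
    hand_counts = Counter(hand)
    auto_counts = Counter(auto)
    # Chance agreement: probability both coders pick the same category
    # if each assigned labels independently at their observed rates.
    expected = sum(hand_counts[c] * auto_counts[c] for c in hand_counts) / n**2
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for eight documents (illustration only).
hand_coded = ["political", "political", "other", "other",
              "political", "other", "other", "political"]
auto_coded = ["political", "other", "other", "other",
              "political", "political", "other", "political"]

print(percent_agreement(hand_coded, auto_coded))
print(cohens_kappa(hand_coded, auto_coded))
```

Raw agreement can look high simply because one category dominates, which is why a chance-corrected measure is reported alongside it here. What counts as acceptable agreement is a judgment call that depends on the research question.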

Required reading:

  • Nielsen (2019)
  • Pan and Siegel (2020)

General reading:

  • DiMaggio (2015)
  • Edelmann et al. (2020)
  • Grimmer and Stewart (2013)
  • Lazer et al. (2020)
  • Lucas et al. (2015)
  • Salganik (2018)

Additional case studies reading:

  • Siegel and Badaan (2020)
  • Kubinec and Owen (2021)

References

DiMaggio, Paul. 2015. “Adapting Computational Text Analysis to Social Science (and Vice Versa).” Big Data & Society 2 (2): 1–5.

Edelmann, Achim, Tom Wolff, Danielle Montagne, and Christopher A. Bail. 2020. “Computational Social Science and Sociology.” Annual Review of Sociology 46 (1): 61–81. https://doi.org/10.1146/annurev-soc-121919-054621.

Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97.

Kubinec, Robert, and John Owen. 2021. “When Groups Fall Apart: Identifying Transnational Polarization During the Arab Uprisings.” Political Communication, 36. https://doi.org/10.31235/osf.io/wykmj.

Lazer, David M. J., Alex Pentland, Duncan J. Watts, Sinan Aral, Susan Athey, Noshir Contractor, Deen Freelon, et al. 2020. “Computational Social Science: Obstacles and Opportunities.” Science 369 (6507): 1060–2. https://doi.org/10.1126/science.aaz8170.

Lucas, Christopher, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, and Dustin Tingley. 2015. “Computer-Assisted Text Analysis for Comparative Politics.” Political Analysis 23 (2): 254–77.

Nielsen, Richard A. 2019. “Women’s Authority in Patriarchal Social Movements: The Case of Female Salafi Preachers.” American Journal of Political Science, August, 1–15. https://doi.org/10.1111/ajps.12459.

Pan, Jennifer, and Alexandra A. Siegel. 2020. “How Saudi Crackdowns Fail to Silence Online Dissent.” American Political Science Review 114 (1): 109–25. https://doi.org/10.1017/S0003055419000650.

Salganik, Matthew J. 2018. Bit by Bit: Social Research in the Digital Age. Princeton: Princeton University Press.

Siegel, Alexandra A., and Vivienne Badaan. 2020. “#No2Sectarianism: Experimental Approaches to Reducing Sectarian Hate Speech Online.” American Political Science Review 114 (3): 837–55. https://doi.org/10.1017/S0003055420000283.