10.1 Text as Data

  • Many sources of text data for social scientists:
    • open-ended survey responses, social media data, interview transcripts, news articles, official documents (public records, etc.), research publications, etc.
  • even if the data of interest do not (yet) exist in textual form, they can be turned into usable text: speech recognition, machine translation, crowdworkers, etc.
  • previously: text data were often ignored, read selectively, used anecdotally, or labeled manually by researchers
  • today: a wide variety of text-analytic methods (supervised + unsupervised) and increasing adoption of these methods by social scientists (Wilkerson and Casas 2017)


Wilkerson, John, and Andreu Casas. 2017. “Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges.” Annu. Rev. Polit. Sci. 20 (1): 529–44.