7.1 Text as Data

  • Many sources of text data for social scientists:

    • open-ended survey responses, social media data, interview transcripts, electronic health records, news articles, official documents (laws, regulations, etc.), research publications, digital trace data, etc.
  • even if the data of interest does not (yet) exist in textual form, it can often be converted into text: speech recognition, machine translation, crowdworkers, etc.

  • previously: text data was often ignored, read selectively and used anecdotally, or labeled manually by researchers

  • now: a wide variety of text analysis methods (supervised + unsupervised) and increasing adoption of these methods by social scientists
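
To make "text as data" concrete: most of these methods start by turning documents into numbers. The sketch below is only an illustration, not a method prescribed by these notes. It builds a tiny document-term matrix (one row per document, one column per word) using the Python standard library. The two example sentences and the names `docs`, `tokenize`, `vocab` and `dtm` are made up for this example.

```python
import re
from collections import Counter

# Two invented example documents (assumption: not taken from any real corpus).
docs = [
    "The new law regulates data privacy.",
    "Privacy matters to survey respondents.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Count how often each token occurs in each document.
counts = [Counter(tokenize(doc)) for doc in docs]

# The vocabulary is the sorted set of all tokens seen in any document.
vocab = sorted(set().union(*counts))

# Document-term matrix: rows are documents, columns follow the order of vocab.
dtm = [[c[term] for term in vocab] for c in counts]

for doc, row in zip(docs, dtm):
    print(row, "<-", doc)
```

A matrix like `dtm` is a typical input for both supervised methods (e.g. classifying documents against manual labels) and unsupervised methods (e.g. clustering or topic models). In practice a library tokenizer and vectorizer would usually replace the hand-written `tokenize`.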