Chapter 24 Text Analysis and Text Mining

24.1 Introduction

Most “data analysis” has focused on numeric data, but in recent years methods for analysing and mining text have been developed alongside natural language processing (NLP).

24.2 Theory and methods

Shawn Graham, Ian Milligan, and Scott Weingart (2015) “Topic Modeling: A Hands-On Adventure in Big Data”, chapter four of Exploring Big Historical Data: The Historian’s Macroscope (Graham, Milligan, and Weingart 2015)

See the companion website, Exploring Big Historical Data: The Historian’s Macroscope

24.3 R

The definitive guide

Julia Silge and David Robinson (2016) Tidy Text Mining with R (most recent version dated 2016-12-19)

other general resources

Super User (2019-02-04) “An overview of the NLP ecosystem in R (#nlproc #textasdata)”

24.3.1 Some examples

David Robinson (2016) “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half” (2016-08-09)

Julia Silge (2016) “You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing” (2016-03-08)

24.3.2 Packages

24.3.2.1 CRAN Task View: NLP

CRAN Task View: Natural Language Processing

“This CRAN task view collects relevant R packages that support computational linguists in conducting analysis of speech and language on a variety of levels - setting focus on words, syntax, semantics, and pragmatics.”

24.3.2.2 `{monkeylearn}`

package

CRAN page: monkeylearn: Accesses the Monkeylearn API for Text Classifiers and Extractors

GitHub page: monkeylearn: R package for text analysis with Monkeylearn

articles

24.3.2.3 `{quanteda}`

quanteda.io

package

CRAN page: quanteda: Quantitative Analysis of Textual Data

articles

“Getting Started with quanteda” (package vignette)
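As a rough illustration of the workflow the vignette introduces, the sketch below builds a corpus from three made-up sentences, tokenizes it, removes English stopwords, and counts terms in a document-feature matrix. It assumes a recent quanteda release (version 3 or later, where `dfm()` is applied to a `tokens` object); the example texts and object names are hypothetical.

```r
library(quanteda)

# Three hypothetical documents; the vector names become document names
texts <- c(
  doc1 = "The cat sat on the mat.",
  doc2 = "The dog sat on the log.",
  doc3 = "Cats and dogs are pets."
)

corp <- corpus(texts)

# Tokenize, drop punctuation, lowercase, and remove English stopwords
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))

# Document-feature matrix: one row per document, one column per term
dfmat <- dfm(toks)

# Most frequent terms across the whole corpus
topfeatures(dfmat, 3)
```

No stemming is applied here, so "cat"/"cats" and "dog"/"dogs" remain separate features; `tokens_wordstem()` would merge them, which may or may not suit a given analysis.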

24.3.2.4 `{tidytext}`

package

CRAN page: tidytext: Text Mining using ‘dplyr’, ‘ggplot2’, and Other Tidy Tools

GitHub repo: tidytext on GitHub

articles

Julia Silge (2016) “Term Frequency and tf-idf Using Tidy Data Principles” (2016-06-27)

Julia Silge and David Robinson (2016-10-27) “Introduction to tidytext” (package vignette)
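A minimal sketch of the tidy approach described in these articles: split text into one token per row with `unnest_tokens()`, count words per document, then weight them with `bind_tf_idf()`. Here tf is a word's count divided by the total number of words in its document, and idf is the natural log of (number of documents / number of documents containing the word), so words that occur in every document get a tf-idf of zero. The two one-line documents and the column names `doc` and `text` are hypothetical.

```r
library(dplyr)
library(tidytext)

# Two hypothetical one-line documents
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the cat sat on the mat", "the dog sat on the log")
)

# One row per token (lowercased, punctuation stripped), then count per document
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# Add tf, idf, and tf_idf columns; words present in every document get idf = 0
tfidf <- word_counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

tfidf
```

In this toy corpus "the", "sat", and "on" appear in both documents and score zero, while "cat", "mat", "dog", and "log" each score ln(2)/6, which is why tf-idf is useful for surfacing the words that distinguish one document from the others.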

24.3.2.5 `{tm}`

package

CRAN page: tm: Text Mining Package

articles

Ingo Feinerer (2015) “Introduction to the tm Package: Text Mining in R” (package vignette)

Ingo Feinerer, Kurt Hornik, and David Meyer (2008) “Text Mining Infrastructure in R”, Journal of Statistical Software, 25 (5).
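The tm workflow covered in the vignette and the JSS paper wraps documents in a corpus, cleans them with `tm_map()` transformations, and then builds a document-term (or term-document) matrix. The sketch below assumes a current tm release; the two example sentences are made up.

```r
library(tm)

# Two hypothetical documents wrapped in a volatile (in-memory) corpus
texts <- c("The cat sat on the mat.", "The dog sat on the log.")
corp <- VCorpus(VectorSource(texts))

# Standard cleaning steps; tolower needs content_transformer() to keep the document class
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stripWhitespace)

# Rows are documents, columns are terms, cells are raw counts
dtm <- DocumentTermMatrix(corp)
inspect(dtm)

# Terms occurring at least twice across the corpus
findFreqTerms(dtm, lowfreq = 2)
```

Note that `DocumentTermMatrix()` by default discards terms shorter than three characters (the `wordLengths` control option); pass a `control` list to change this.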

References

Shawn Graham, Ian Milligan, and Scott Weingart. 2015. Exploring Big Historical Data: The Historian’s Macroscope. Imperial College Press. http://www.themacroscope.org/2.0/.
