Chapter 24 Text Analysis and Text Mining
24.1 Introduction
Most “data analysis” has focused on quantitative methods, but in recent years methods for text analysis and text mining have been developed alongside natural language processing (NLP).
24.2 Theory and methods
Shawn Graham, Ian Milligan, and Scott Weingart (2015) “Topic Modeling: A Hands-On Adventure in Big Data”, chapter four of (Shawn Graham and Weingart 2015)
- See the companion website, Exploring Big Historical Data: The Historian’s Macroscope
24.3 R
The definitive guide
Julia Silge and David Robinson (2016) Tidy Text Mining with R (most recent version dated 2016-12-19)
Other general resources
Super User, 2019-02-04, An overview of the NLP ecosystem in R (#nlproc #textasdata)
24.3.1 Some examples
David Robinson (2016) “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half” (2016-08-09)
Julia Silge (2016) “You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing” (2016-03-08)
24.3.2 Packages
24.3.2.1 CRAN Task View: NLP
CRAN Task View: Natural Language Processing
- “This CRAN task view collects relevant R packages that support computational linguists in conducting analysis of speech and language on a variety of levels - setting focus on words, syntax, semantics, and pragmatics.”
24.3.2.2 {monkeylearn}
package
CRAN page: monkeylearn: Accesses the Monkeylearn API for Text Classifiers and Extractors
GitHub page: monkeylearn: R package for text analysis with Monkeylearn
articles
24.3.2.3 {quanteda}
package
CRAN page: quanteda: Quantitative Analysis of Textual Data
articles
“Getting Started with quanteda” (package vignette)
24.3.2.4 {tidytext}
package
CRAN page: tidytext: Text Mining using ‘dplyr’, ‘ggplot2’, and Other Tidy Tools
GitHub repo: tidytext on GitHub
articles
Julia Silge, “Term Frequency and tf-idf Using Tidy Data Principles”, 2016-06-27
Julia Silge and David Robinson (2016-10-27) “Introduction to tidytext” (package vignette)
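As a rough illustration of the tidy workflow these tidytext articles describe, the sketch below tokenizes text into one word per row, counts words per document, and weights them by tf-idf. It assumes the dplyr and tidytext packages are installed; the two short documents and the names `docs` and `doc_words` are hypothetical illustration data, not taken from the sources above.

```r
# Minimal tidytext sketch: tokenize, count, then weight terms by tf-idf.
# Assumes dplyr and tidytext are installed; the toy documents are hypothetical.
library(dplyr)
library(tidytext)

docs <- tibble(
  doc  = c("doc_a", "doc_b"),
  text = c("The cat sat on the mat.", "A dog sat on a log.")
)

doc_words <- docs %>%
  unnest_tokens(word, text) %>%   # one lowercase word per row, punctuation dropped
  count(doc, word) %>%            # n = occurrences of each word in each document
  bind_tf_idf(word, doc, n)       # adds tf, idf and tf_idf columns

# Words shared by both documents ("sat", "on") get idf = log(2/2) = 0,
# so the highest tf_idf values go to words distinctive of one document.
print(arrange(doc_words, desc(tf_idf)))
```

The same pipeline scales to real corpora (for example, one row per book or per tweet), since each step only assumes a data frame with a document column and a text column.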
24.3.2.5 {tm}
package
CRAN page: tm: Text Mining Package
articles
Ingo Feinerer (2015) “Introduction to the tm Package: Text Mining in R” (package vignette)
Ingo Feinerer, Kurt Hornik, David Meyer (2007) “Text Mining Infrastructure in R”, Journal of Statistical Software, 25 (5).