17 Text data
Text is a common type of data, and it is so familiar to us that we hardly notice it as data. This changes as soon as we try to manipulate or analyze text quantitatively. Although most people are used to reading and writing text on a computer, far fewer think about how text data is encoded and parsed.
Note
This chapter is yet to be written, but Chapter 9: Text data of Data Science for Psychologists (Neth, 2023a) provides an overview of its topic.
17.1 Introduction
- See Section 9.1 of the ds4psy book
17.2 Essentials
Basic text-manipulation issues and functions:
- See Sections 9.2 and 9.3 of the ds4psy book
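As a minimal sketch of a few basic text-manipulation functions, the following example uses only base R. The example sentence is made up for illustration. It counts characters, changes case, extracts a substring, and splits a string into words:

```r
# A made-up example sentence (hypothetical data)
s <- "Text is data, too."

n_char <- nchar(s)                         # number of characters (18)
upper  <- toupper(s)                       # convert to upper case
first  <- substr(s, start = 1, stop = 4)   # extract the first word "Text"

# Split into words at single spaces (strsplit() returns a list)
words <- strsplit(s, split = " ")[[1]]
n_words <- length(words)                   # number of words (4)

# Reassemble the words in reverse order
paste(rev(words), collapse = " ")
```

Punctuation stays attached to the words here ("data," and "too."), which is one reason why more flexible pattern-based tools become useful.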
17.3 Advanced text-manipulation
Advanced text manipulation involves the more powerful functions of the stringr package (Wickham, 2022) and regular expressions:
- See Section 9.4: Advanced text-manipulation of the ds4psy book
- See Appendix E: Using regular expressions of the ds4psy book
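To give a rough impression of how stringr functions combine with regular expressions, here is a minimal sketch. It assumes that the stringr package is installed. The example responses and patterns are made up for illustration and are not taken from the ds4psy book:

```r
library(stringr)

# Hypothetical data: free-text answers to an "age" question
x <- c("Age: 23 years", "age 31", "No answer", "AGE: 45")

# Which answers mention "age" (ignoring upper/lower case)?
has_age <- str_detect(x, regex("age", ignore_case = TRUE))

# Extract the first run of digits (NA if there is none) and convert to integer
ages <- as.integer(str_extract(x, "[0-9]+"))

# Replace every digit (regex \d, escaped as "\\d" in R) by "#"
masked <- str_replace_all(x, "\\d", "#")

# Trim both ends and collapse repeated whitespace into single spaces
clean <- str_squish("  too   many    spaces  ")
```

Note that backslashes in regular expressions must be doubled inside R strings ("\\d"), because the string itself already uses the backslash as an escape character.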
17.4 Conclusion
- See some example applications in Section 9.5 of the ds4psy book
17.4.1 Summary
- See Section 9.6 of the ds4psy book
17.4.2 Resources
Take a look at the Posit cheatsheet on stringr:
Figure 17.1: Posit cheatsheet on stringr.
- The contributed cheatsheets also contain a fine reference on Basic Regular Expressions in R.
See additional links at Section 9.8 of the ds4psy book (Neth, 2023a).
17.5 Exercises
Here are six exercises from Section 9.7: Exercises of the ds4psy book (Neth, 2023a):