17 Text data

Text is a common type of data, and so familiar to us that we hardly notice it as data at all. This changes when we try to manipulate or analyze it quantitatively. Although most people are used to reading and writing text on a computer, far fewer think about how text data is encoded and parsed.

Note

This chapter is yet to be written, but Chapter 9: Text data of Data Science for Psychologists (Neth, 2023a) provides an overview of its topic.

17.1 Introduction

17.2 Essentials

Basic text-manipulation issues and functions:
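As a minimal sketch of what such basic operations can look like, the following block uses only base R functions. The vector `fruits` and all result names are hypothetical and chosen purely for illustration; they are not taken from the chapter.

```r
# Hypothetical example vector (an assumption, not data from the chapter):
fruits <- c("apple", "Banana", "cherry pie")

# Count the characters in each element:
n_chars <- nchar(fruits)

# Convert all letters to upper case:
upper <- toupper(fruits)

# Extract the first three characters of each element:
first3 <- substr(fruits, 1, 3)

# Combine all elements into a single string, separated by ", ":
joined <- paste(fruits, collapse = ", ")

# Split one string into words at each space
# (strsplit() returns a list, so we take its first element):
words <- strsplit("cherry pie", split = " ")[[1]]
```

All of these functions are vectorized over their first argument, so they return one result per element of `fruits`. Only `paste()` with `collapse` reduces the vector to a single string.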

17.3 Advanced text-manipulation

Advanced text manipulation relies on the functions of the stringr package (Wickham, 2022) and on regular expressions.
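To illustrate how stringr functions and regular expressions work together, here is a small sketch that detects, extracts, and masks number patterns. The vector `notes`, the `pattern`, and the result names are assumptions made for this example and do not come from the chapter.

```r
library(stringr)

# Hypothetical example data (an assumption, not from the chapter):
notes <- c("Call 555-1234 today", "no number here", "Fax: 555-9876")

# Regular expression: three digits, a hyphen, then four digits
pattern <- "[0-9]{3}-[0-9]{4}"

# Which elements contain a match? (one logical value per element)
has_number <- str_detect(notes, pattern)

# Extract the first match per element (NA if there is none):
numbers <- str_extract(notes, pattern)

# Replace every single digit with "#":
masked <- str_replace_all(notes, "[0-9]", "#")
```

The stringr functions used here take the string vector as their first argument and the pattern as their second, which makes them easy to combine in a pipe.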

17.4 Conclusion

17.4.1 Summary

17.4.2 Resources

Take a look at the Posit cheatsheet on stringr:

Figure 17.1: Text and string manipulation with stringr and regular expressions from Posit cheatsheets.

See additional links at Section 9.8 of the ds4psy book (Neth, 2023a).

17.4.3 Preview

This chapter focused in depth on text (also known as strings or characters) as a particular data type.

17.5 Exercises

i2ds: Exercises

Here are six exercises from Section 9.7: Exercises of the ds4psy book (Neth, 2023a):

17.5.1 Escaping into Unicode

17.5.2 Pasting vectors

17.5.3 Searching color names

17.5.4 Patterns in pi

17.5.5 Naive cryptography

17.5.6 Known unknowns