15 Text data

Text is a common type of data, and one so familiar to us that we hardly notice it as data at all. This changes when we try to manipulate or analyze text in a quantitative fashion. Although most people are used to reading and writing text on a computer, far fewer think about how text data is encoded and parsed.

Note

This chapter is yet to be written, but see Chapter 9: Text data of Data Science for Psychologists (Neth, 2023a).

15.1 Introduction

See Section 9.1 of the ds4psy book (Neth, 2023a).

15.2 Essentials

See Sections 9.2, 9.3 and 9.4 of the ds4psy book (Neth, 2023a).

15.3 Conclusion

See some example applications in Section 9.5 of the ds4psy book (Neth, 2023a).

15.3.1 Summary

See Section 9.6 of the ds4psy book (Neth, 2023a).

15.3.2 Resources

Take a look at the Posit cheatsheet on stringr:


Figure 15.1: Text and string manipulation with stringr and regular expressions from Posit cheatsheets.

See additional links in Section 9.8 of the ds4psy book (Neth, 2023a).

15.3.3 Preview

This chapter focused on text (also known as strings or character data) as a particular data type.

15.4 Exercises

i2ds: Exercises

Here are six exercises from Section 9.7: Exercises of the ds4psy book (Neth, 2023a):

15.4.1 Escaping into Unicode

15.4.2 Pasting vectors

15.4.3 Searching color names

15.4.4 Patterns in pi

15.4.5 Naive cryptography

15.4.6 Known unknowns