E Using regular expressions

ds4psy: Regular expressions (regex) primer

Regular expressions (a.k.a. regex) are character sequences that define a search pattern. The concept originates in theoretical computer science and formal language theory; in practice, such patterns are used for validating text inputs and for finding or replacing patterns in strings of text.

Many R commands involving character data (e.g., the base R functions grep() and strsplit(), and most of the stringr functions discussed in Chapter 9 on Text data) support the use of regular expressions. While regular expressions can be immensely powerful and time-saving tools, their abstract nature and formal appearance often make them seem intimidating. For instance, given a vector dinos that contains the 10 character strings

two moderately cryptic grep() and str_view() commands

would each find the following results:

To offer a glimpse into the potential of regular expressions without requiring too much formal overhead, this appendix provides a gentle introduction to using regular expressions in R.
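As a first taste, the following minimal sketch shows how a few base R functions interpret simple patterns. The vector animals and its contents are hypothetical and chosen only for illustration (they are not the dinos data mentioned above); the patterns use the anchors ^ (start of string) and $ (end of string) and a character class in square brackets.

```r
# Hypothetical example data (not the dinos vector from the text):
animals <- c("cat", "dog", "catfish", "bobcat", "cow")

# grep() returns the indices of all elements that match the pattern.
# "^cat" matches "cat" only at the start of a string:
grep(pattern = "^cat", x = animals)
#> [1] 1 3

# With value = TRUE, grep() returns the matching elements themselves:
grep(pattern = "^cat", x = animals, value = TRUE)
#> [1] "cat"     "catfish"

# grepl() returns one logical value per element.
# "cat$" matches "cat" only at the end of a string:
grepl(pattern = "cat$", x = animals)
#> [1]  TRUE FALSE FALSE  TRUE FALSE

# gsub() replaces every match within each element.
# "[aeiou]" matches any single lowercase vowel:
gsub(pattern = "[aeiou]", replacement = "_", x = animals)
#> [1] "c_t"     "d_g"     "c_tf_sh" "b_bc_t"  "c_w"
```

The same patterns should also work with the corresponding stringr functions, which take the string as their first argument and the pattern as their second.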