A.4 Backreferences
Backreferences solve the problem that one part of a pattern has no knowledge of what an earlier part matched. A backreference comes as a pair: a subexpression (a capture group in parentheses) and a \number
that refers back to the text that subexpression matched.
Find all repeated words (often typos):
text <- "This is a block of of text, several words here are are repeated, and and they should not be."
# \\b restricts the match to whole words
str_view_all(text, "\\b(\\w+) \\1\\b")
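The effect of word boundaries on a repeated-word pattern can be checked with a minimal sketch like the one below. It assumes the stringr package is installed, and the names `loose` and `strict` are only illustrative. Without `\\b`, the group can start in the middle of a word, so the pattern also picks up the `is is` that spans `This is`.

```r
library(stringr)

text <- "This is a block of of text, several words here are are repeated, and and they should not be."

# Without word boundaries, (\\w+) may start in the middle of a word
loose <- str_extract_all(text, "(\\w+) \\1")[[1]]
loose
#> [1] "is is"   "of of"   "are are" "and and"

# With \\b on both sides, only whole repeated words are matched
strict <- str_extract_all(text, "\\b(\\w+) \\1\\b")[[1]]
strict
#> [1] "of of"   "are are" "and and"
```

The first result contains a false positive, which is why the anchored form is usually the safer choice when hunting for typos.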
Another example uses HTML data, where we want to match every well-formed header tag pair. Note that the last pair, <H2>...</H3>,
is invalid:
text <- "<BODY>
<H1>Welcome to my Homepage</H1>
Content is divided into two sections:<BR>
<H2>ColdFusion</H2>
Information about Macromedia ColdFusion.
<H2>Wireless</H2>
Information about Bluetooth, 802.11, and more.
<H2>This is not valid HTML</H3>
</BODY>"
str_extract_all(text, "<[Hh](\\d)>.+</[Hh]\\1>")
#> [[1]]
#> [1] "<H1>Welcome to my Homepage</H1>" "<H2>ColdFusion</H2>"
#> [3] "<H2>Wireless</H2>"
Backreferences are particularly useful when performing replace operations.
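As an illustration of a replace operation, the following sketch removes doubled words by replacing each match with the captured word itself. It assumes the stringr package is installed; the variable name `fixed` is hypothetical. In stringr, `\\1` in the replacement string refers to the text captured by the first group.

```r
library(stringr)

text <- "This is a block of of text, several words here are are repeated, and and they should not be."

# \\1 in the replacement inserts whatever the first group captured,
# so each doubled word collapses to a single copy
fixed <- str_replace_all(text, "\\b(\\w+) \\1\\b", "\\1")
fixed
#> [1] "This is a block of text, several words here are repeated, and they should not be."
```

The same idea extends to patterns with several groups, where `\\2`, `\\3`, and so on refer to later groups in the order their opening parentheses appear.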