A.3 Looking ahead and back

Lookahead specifies a pattern to be matched but not returned. A lookahead is actually a subexpression and is formatted as such: the syntax is a subexpression that begins with ?=, and the text to match follows the = sign. Some refer to this behaviour as “match but not consume”, in the sense that lookbehind and lookahead match a pattern before or after what we actually want to extract, but do not return it.

In the following example, we only want to match “my homepage” when it is followed by </title>, and we do not want </title> in the results.
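The original example is not reproduced here, so the following is only a minimal sketch in Python's standard re module. The input string is a hypothetical stand-in, chosen so that “my homepage” appears once before </title> and once elsewhere.

```python
import re

# Hypothetical input: "my homepage" occurs twice, but only the first
# occurrence is followed by </title>.
text = "<title>my homepage</title> <p>my homepage</p>"

# (?=</title>) asserts that </title> comes next, without consuming it.
pattern = r"my homepage(?=</title>)"

matches = list(re.finditer(pattern, text))
first = matches[0]

print(len(matches))    # 1
print(first.group())   # my homepage
print(first.start())   # 7, i.e. right after "<title>"
```

The match ends where </title> begins, so the closing tag is checked but never becomes part of the result.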

Similarly, ?<= is interpreted as the lookbehind operator, which specifies a pattern that must appear immediately before the text we actually want to extract.

Consider the following example: a database search lists products, and you need only the prices.
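A minimal sketch of the price example, again in Python's re module. The product listing is made up for illustration, since the source does not show the actual search output.

```python
import re

# Hypothetical search output (product codes and prices are invented).
listing = "ABC01: $23.45\nHGG42: $5.31\nCFMX1: $899.00\nTotal items found: 4"

# (?<=\$) requires a literal dollar sign immediately before the match,
# but the dollar sign itself is not returned.
prices = re.findall(r"(?<=\$)[0-9]+\.[0-9]{2}", listing)

print(prices)  # ['23.45', '5.31', '899.00']
```

Note that Python's re module only accepts lookbehind patterns of fixed width, such as the single \$ used here. Other regex engines may be more or less permissive.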

Lookahead and lookbehind operations may be combined, as in the following example.
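One way to sketch the combination, assuming the same kind of title markup as in the lookahead example (the input string is hypothetical): a lookbehind anchors the start of the match and a lookahead anchors the end, so only the text between the tags is returned.

```python
import re

# Hypothetical input string.
html = "<head><title>my homepage</title></head>"

# (?<=<title>) : the match must be preceded by <title>
# .*?          : the text we want, matched lazily
# (?=</title>) : the match must be followed by </title>
title = re.findall(r"(?<=<title>).*?(?=</title>)", html)

print(title)  # ['my homepage']
```

Neither tag appears in the result, because both are matched but not consumed.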

Additionally, (?=) and (?<=) are known as positive lookahead and positive lookbehind. Less commonly used are the negative forms of these two operators, which look for text that does not match the specified pattern.

class     description
(?=)      positive lookahead
(?!)      negative lookahead
(?<=)     positive lookbehind
(?<!)     negative lookbehind

Suppose we want to extract just the quantities but not the prices in the following text:

  • I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.
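A possible solution uses a negative lookbehind to reject numbers preceded by a dollar sign. This is a sketch in Python's re module, and the word-boundary anchors (\b) are an addition not mentioned in the text above. They are needed because otherwise the engine could skip the first digit of a price and match the rest.

```python
import re

text = ("I paid $30 for 100 apples, 50 oranges, and 60 pears. "
        "I saved $5 on this order.")

# Naive attempt: the "3" in "$30" is rejected, but the "0" after it is
# preceded by "3" rather than "$", so it slips through.
naive = re.findall(r"(?<!\$)\d+", text)
print(naive)       # ['0', '100', '50', '60']

# With \b, a match can only start at the beginning of a number, so a
# number that directly follows "$" is rejected as a whole.
quantities = re.findall(r"\b(?<!\$)\d+\b", text)
print(quantities)  # ['100', '50', '60']
```

Swapping (?<!\$) for the positive form (?<=\$) would return the prices instead.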