Lookbehind & Lookahead [Information]

Improve your regular expressions with these two gems

vincevincevince

12:45 pm on Aug 8, 2003 (gmt 0)

Two infrequently used parts of the Perl-compatible regular expression (PCRE) syntax are lookbehind and lookahead, yet they are among the most useful regex tools. They let you test the characters before or after the current position without consuming them, so the tested text does not become part of the match.

Lookahead takes the form (?=X)
Lookbehind takes the form (?<=X)
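To see the "without consuming" part in action, here is a minimal PHP sketch. The subject string 'cost: 15 EUR' is made up for illustration; in both cases the text inside the assertion is checked but left out of the match.

```php
<?php
// Made-up subject string, used only to illustrate the two assertions.
$subject = 'cost: 15 EUR';

// Lookahead (?=X): match digits only if " EUR" follows; " EUR" is not part of the match.
preg_match('/\d+(?= EUR)/', $subject, $ahead);

// Lookbehind (?<=X): match digits only if "cost: " precedes them; "cost: " is not part of the match.
preg_match('/(?<=cost: )\d+/', $subject, $behind);

echo $ahead[0], "\n";  // 15
echo $behind[0], "\n"; // 15
```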

How are they used?

$text="The big hairy dog loves eating minced meat. The interesting parrot hates eating minced meat.";

Suppose we want to replace "minced meat" with "chocolate", but only where the animal hates minced meat:

$text = preg_replace("/(?<=hates eating )minced meat/", "chocolate", $text);

So, we look behind and only match "minced meat" when it is preceded by "hates eating ", but "hates eating " itself is not changed by the replacement, because a lookbehind is not part of the match.
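A complete version of the example, assuming PHP 5 or later with the bundled PCRE functions. Note that preg_replace() returns the new string rather than changing $text in place, so the result has to be assigned:

```php
<?php
$text = "The big hairy dog loves eating minced meat. The interesting parrot hates eating minced meat.";

// Only "minced meat" directly preceded by "hates eating " is replaced;
// the lookbehind text itself stays untouched because it is not part of the match.
$result = preg_replace('/(?<=hates eating )minced meat/', 'chocolate', $text);

echo $result, "\n";
// The big hairy dog loves eating minced meat. The interesting parrot hates eating chocolate.
```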

$text="16541265413285 50-10-06 Mr Joe Blogs. 62511265413285 50-10-06 Mr John Doe.";

Now the task is to change John Doe's account number to 13516512165715, keeping the same sort code. We start by writing a regex that locates a valid account number and sort code for Mr John Doe:

"/[0-9]{14}\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\./"
(14 digits, a space, 2 digits, a hyphen, 2 digits, a hyphen, 2 digits, a space, then "Mr John Doe." exactly.)
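If the pattern gets hard to read, PCRE's x (extended) modifier lets you spread it over several lines with a comment on each part. This sketch is the same regex rewritten that way, used with preg_match_all() to check that exactly one record in $text matches:

```php
<?php
$text = "16541265413285 50-10-06 Mr Joe Blogs. 62511265413285 50-10-06 Mr John Doe.";

// With the x modifier, literal whitespace in the pattern is ignored and
// everything from # to the end of a line is a comment.
$pattern = '/
    [0-9]{14}                      # 14-digit account number
    \s                             # a space
    [0-9]{2}-[0-9]{2}-[0-9]{2}     # sort code, e.g. 50-10-06
    \s                             # a space
    Mr\sJohn\sDoe\.                # the name, ending in a full stop
/x';

$count = preg_match_all($pattern, $text, $matches);
echo $count, "\n";         // 1
echo $matches[0][0], "\n"; // 62511265413285 50-10-06 Mr John Doe.
```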

We need to match all of these things so that we don't accidentally pick up the wrong record and mess up another person's account details. Without a lookahead, we have to capture the sort code and write the rest of the record back out:

$text = preg_replace("/[0-9]{14}\s([0-9]{2})-([0-9]{2})-([0-9]{2})\sMr\sJohn\sDoe\./", '13516512165715 $1-$2-$3 Mr John Doe.', $text);

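This is not only long, it's also wasteful: the sort code is captured in three groups just so it can be written straight back, and the rest of the record has to be retyped in the replacement.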

Using a lookahead we can do:

$text = preg_replace("/[0-9]{14}(?=\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\.)/", "13516512165715", $text);

So, we haven't needed back-references ($1, $2, $3), we've avoided repeating information in the replacement, and the code is a lot easier to understand.
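A quick way to convince yourself the two versions are equivalent is to run both on the same text. Minimal sketch; note that the back-reference version has to put the trailing full stop back by hand, because \. consumes it:

```php
<?php
$text = "16541265413285 50-10-06 Mr Joe Blogs. 62511265413285 50-10-06 Mr John Doe.";

// Without lookahead: capture the sort code and rebuild the rest of the record.
$viaGroups = preg_replace(
    '/[0-9]{14}\s([0-9]{2})-([0-9]{2})-([0-9]{2})\sMr\sJohn\sDoe\./',
    '13516512165715 $1-$2-$3 Mr John Doe.',
    $text
);

// With lookahead: only the 14 digits are matched, so only they are replaced.
$viaLookahead = preg_replace(
    '/[0-9]{14}(?=\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\.)/',
    '13516512165715',
    $text
);

echo $viaLookahead, "\n";
// 16541265413285 50-10-06 Mr Joe Blogs. 13516512165715 50-10-06 Mr John Doe.
var_dump($viaGroups === $viaLookahead); // bool(true)
```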

It may be helpful to know that swapping the = for a ! gives you a NOT (a negative lookahead (?!X) or a negative lookbehind (?<!X)), i.e.:

"/sex (?!chocolate)/" will only match "sex " if it is not followed by "chocolate".

Likewise:

"/(?<!sex) smoking/" will match " smoking" only where it is not preceded by "sex".

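A minimal sketch of both negative forms, reusing the minced meat text from earlier instead of the examples above (the choice of strings is illustrative only):

```php
<?php
$text = "The big hairy dog loves eating minced meat. The interesting parrot hates eating minced meat.";

// Negative lookbehind (?<!X): replace "minced meat" unless it follows "hates eating ".
$notHated = preg_replace('/(?<!hates eating )minced meat/', 'chocolate', $text);

// Negative lookahead (?!X): count occurrences of "eating " NOT followed by "chocolate".
$count = preg_match_all('/eating (?!chocolate)/', $notHated);

echo $notHated, "\n";
// The big hairy dog loves eating chocolate. The interesting parrot hates eating minced meat.
echo $count, "\n"; // 1
```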
Notes:
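This is one reason why people escape < and > in a regex, although strictly they only have a special meaning directly after (?, as in (?<=X) and (?<!X).
Negative lookahead is especially useful when you want to trim text to a nicely sized snippet that ends with a full stop, but not where the full stop is followed by digits (i.e. don't break in the middle of $3.99).
Most uses of lookarounds can be mimicked with back-references, but often at the cost of quite substantial processor resources. Lookarounds also have an advantage over back-references because PCRE treats an assertion like a once-only subpattern: once it has succeeded, the engine does not backtrack into it to try other ways of matching it.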
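The snippet-trimming idea from the notes could look something like this. snippet() is a hypothetical helper, not a PHP built-in; it assumes PHP 7.1+ (scalar type hints and array destructuring in foreach) and cuts at the last full stop inside the limit that is not followed by a digit, falling back to a hard cut if there is none:

```php
<?php
// Hypothetical helper (not from the original post): cut $text at the last full stop
// within the first $limit characters, skipping any full stop followed by a digit.
function snippet(string $text, int $limit): string
{
    // \.(?!\d) matches a full stop only when the next character is not a digit.
    preg_match_all('/\.(?!\d)/', $text, $matches, PREG_OFFSET_CAPTURE);

    $cut = null;
    foreach ($matches[0] as [$match, $offset]) {
        if ($offset >= $limit) {
            break;          // offsets come back in ascending order
        }
        $cut = $offset;     // remember the latest acceptable full stop
    }

    // No acceptable full stop in range: fall back to a hard cut at $limit.
    return $cut === null ? substr($text, 0, $limit) : substr($text, 0, $cut + 1);
}

$text = 'Widgets now cost $3.99 each. Gadgets are unchanged. Offer ends soon.';
echo snippet($text, 40), "\n"; // Widgets now cost $3.99 each.
echo snippet($text, 60), "\n"; // Widgets now cost $3.99 each. Gadgets are unchanged.
```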

References:
PHP manual for Perl-compatible pattern syntax [uk2.php.net...]

[edited by: vincevincevince at 1:34 pm (utc) on Aug. 8, 2003]

Paul in South Africa

1:05 pm on Aug 8, 2003 (gmt 0)

Thank you! A simple and clearly explained solution to a problem that I have been trying to solve for the past two weeks.

I'll admit that the closest I get to a regular expression is "A beer please" when I get to the pub after work.