Lookahead takes the form (?=X)
Lookbehind takes the form (?<=X)
How are they used?
$text="The big hairy dog loves eating minced meat. The interesting parrot hates eating minced meat.";
Suppose we want to replace minced meat with chocolate, but only where the animal hates minced meat:
preg_replace("/(?<=hates eating )minced meat/","chocolate",$text);
So, we look behind and only match when it's preceded by "hates eating ", but we won't be changing "hates eating " at all in our replacement, because it's a lookbehind and so not part of the match.
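To see the effect, here is a small runnable sketch of the same call with its result printed. The variable name $result is only for illustration.

```php
<?php
$text = "The big hairy dog loves eating minced meat. The interesting parrot hates eating minced meat.";

// Lookbehind: match "minced meat" only where "hates eating " comes directly before it.
// The lookbehind text is checked but not consumed, so it is left untouched.
$result = preg_replace('/(?<=hates eating )minced meat/', 'chocolate', $text);

echo $result, PHP_EOL;
// The big hairy dog loves eating minced meat. The interesting parrot hates eating chocolate.
```

Only the parrot's sentence changes; the dog's "loves eating minced meat" fails the lookbehind and is skipped.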
$text="16541265413285 50-10-06 Mr Joe Blogs. 62511265413285 50-10-06 Mr John Doe.";
Now the task is to change John Doe's account number to 13516512165715 with the same sort code. We start with making a regex to locate a valid account number and sort code for Mr John Doe:
"/[0-9]{14}\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\./"
(14 digits, a space, 2 digits, a hyphen, 2 digits, a hyphen, 2 digits, a space, then "Mr John Doe." exactly.)
We need to match all these things to ensure we don't accidentally get the wrong record and mess up another person's account details. Without using a lookahead, we will have to use this:
preg_replace("/[0-9]{14}\s([0-9]{2})-([0-9]{2})-([0-9]{2})\sMr\sJohn\sDoe\./","13516512165715 $1-$2-$3 Mr John Doe.",$text);
This is not only long, it also does needless work: the sort code and the name are captured or retyped only so they can be written straight back.
Using a lookahead we can do:
preg_replace("/[0-9]{14}(?=\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\.)/","13516512165715",$text);
So, we've not had to do back-references, we've avoided having to repeat information in our statement, and the code is a lot easier to understand.
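As a quick check, the sketch below runs both approaches on the sample data and confirms they give the same result. The variable names ($new, $viaGroups, $viaLookahead) are only for illustration.

```php
<?php
$text = "16541265413285 50-10-06 Mr Joe Blogs. 62511265413285 50-10-06 Mr John Doe.";
$new  = '13516512165715';

// Without lookahead: capture the sort code and rebuild the rest of the record by hand.
$viaGroups = preg_replace(
    '/[0-9]{14}\s([0-9]{2})-([0-9]{2})-([0-9]{2})\sMr\sJohn\sDoe\./',
    $new . ' $1-$2-$3 Mr John Doe.',
    $text
);

// With lookahead: only the 14 digits are matched; the rest is checked but not consumed.
$viaLookahead = preg_replace(
    '/[0-9]{14}(?=\s[0-9]{2}-[0-9]{2}-[0-9]{2}\sMr\sJohn\sDoe\.)/',
    $new,
    $text
);

echo $viaLookahead, PHP_EOL;
// 16541265413285 50-10-06 Mr Joe Blogs. 13516512165715 50-10-06 Mr John Doe.
var_dump($viaGroups === $viaLookahead); // bool(true)
```

Joe Blogs' record is left alone in both cases because the name after his account number does not satisfy the pattern.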
It may be helpful to know that ! in place of = gives you a NOT, i.e.:
"/sex (?!chocolate)/" will only match "sex " if it is not followed by chocolate.
Likewise:
"/(?<!sex) smoking/" will match instances of " smoking" only if they are not preceded by sex.
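The negative forms can be tried out in the same way. This is a minimal sketch using made-up sample text (the pet-shop string and the replacement words are my own, not from the examples above):

```php
<?php
$text = "cat food, cat flap, dog food";

// Negative lookahead: match "cat " only when it is NOT followed by "food".
$a = preg_replace('/cat (?!food)/', 'CAT ', $text);
echo $a, PHP_EOL;
// cat food, CAT flap, dog food

// Negative lookbehind: match " food" only when it is NOT preceded by "cat".
$b = preg_replace('/(?<!cat) food/', ' treats', $text);
echo $b, PHP_EOL;
// cat food, cat flap, dog treats
```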
Notes:
This is one reason to take care with < and > in a regex: directly after (? they are part of the lookbehind syntax rather than literal characters.
This is especially useful when you want to trim text to obtain a nice sized snippet to display that ends with a . but not where the . is followed by numbers (i.e. don't break in the middle of $3.99).
Most uses of lookarounds can be mimicked with capturing groups and backreferences, but usually at the cost of a longer pattern and extra work to capture the surrounding text and copy it back. A lookaround checks the context once without making it part of the match.
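The snippet-trimming note above can be sketched roughly as follows. The function name snippet() and the sample text are hypothetical, and the length is counted in bytes, so multibyte text would need extra care.

```php
<?php
// Hypothetical helper: return the longest prefix of $text that is at most $max
// bytes long and ends on a "." that is not followed by a digit.
function snippet(string $text, int $max): string
{
    // Greedy .{0,N} backtracks until it finds a "." that passes the negative lookahead.
    $pattern = '/^.{0,' . max(0, $max - 1) . '}\.(?!\d)/s';
    if (preg_match($pattern, $text, $m)) {
        return $m[0];
    }
    // Fallback when no suitable full stop exists within the limit.
    return substr($text, 0, $max);
}

$text = 'Apples cost $3.99 a bag. Pears cost $4.50 a bag. Plums are out of season.';
echo snippet($text, 40), PHP_EOL;
// Apples cost $3.99 a bag.
```

The 40-byte limit falls in the middle of "$4.50", and the lookahead rejects both price decimal points, so the snippet falls back to the end of the first sentence.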
References:
PHP manual for perl compatible pattern syntax [uk2.php.net...]
[edited by: vincevincevince at 1:34 pm (utc) on Aug. 8, 2003]