Regular Expression question

How to capture everything between tags with RegEx

         

Kadence

9:19 pm on Mar 21, 2004 (gmt 0)

10+ Year Member



What regular expression should I use to capture everything between tags? e.g., given: "<tag> blah blah blah </tag>", I want to capture the " blah blah blah ".

Using preg_match_all with '/<tag>.*<\/tag>/' doesn't work. It skips over the first instance of </tag> and stops at the second.

I've tried using '/<tag>([[:alnum:][:blank:][:punct:]]*)<\/tag>/' but that also doesn't work. Adding the [:punct:] results in the same thing that using .* does--skipping over the first instance of </tag> and stopping at the second. I guess this is because it counts the closing tag </ as punctuation, but then why does it stop at the 2nd instance of the tag?

I'm currently using this code:


$match = array();
$html = 'some html code';
if ($num = preg_match_all('/<tag>([a-zA-Z0-9<i><\/i>\s\.\-\',]*)<\/tag>/', $html, $match)) {
    echo "$num matches were found <br>";
}

This works mostly, and does stop at the first instance of </tag>, but it doesn't always capture everything.

Any help would be appreciated :)

PCInk

9:48 pm on Mar 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



.* only matches anything

.*? matches the shortest possible string (of anything)

<a>.*</a> may run past several closing </a> tags and stop only at the last </a> it can reach

<a>.*?</a> will stop at the first </a> after the <a>, since that gives the shortest match
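
A minimal sketch of the difference, assuming a made-up sample string (the $html value and the $greedy / $lazy variable names are illustrative only, not taken from the original page):

```php
<?php
// Hypothetical input with two separate <a> elements.
$html = '<a>first</a> and <a>second</a>';

// Greedy: .* runs on to the last </a>, so there is a single long match.
preg_match_all('/<a>(.*)<\/a>/', $html, $greedy);

// Lazy: .*? stops at the first </a> it can, giving one match per element.
preg_match_all('/<a>(.*?)<\/a>/', $html, $lazy);

print_r($greedy[1]); // one entry: "first</a> and <a>second"
print_r($lazy[1]);   // two entries: "first" and "second"
```

Note that . does not match a newline by default, so if the text between the tags spans several lines, the s modifier (e.g. '/<a>(.*?)<\/a>/s') is probably needed as well.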

ergophobe

10:43 pm on Mar 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




PCInk wrote: ".* only matches anything"

Specifically, it is "greedy". By default, regex quantifiers (POSIX or PCRE) are greedy, meaning they grab the largest block that satisfies the pattern. Look in the PCRE section of the manual for the switches that control greedy/ungreedy matching.
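
One such switch in PHP's PCRE functions is the U pattern modifier, which makes quantifiers ungreedy by default. A rough sketch, assuming an invented sample string (the $html value here is hypothetical):

```php
<?php
// Hypothetical input with two <tag> elements on one line.
$html = '<tag> one </tag><tag> two </tag>';

// With the U modifier, .* is lazy by default
// (and would become greedy again if written as .*?).
preg_match_all('/<tag>(.*)<\/tag>/U', $html, $match);

print_r($match[1]); // two entries: " one " and " two "
```

Writing .*? without the U modifier should give the same result here; the modifier is mostly handy when every quantifier in a pattern should be lazy.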

Tom

Kadence

11:30 pm on Mar 21, 2004 (gmt 0)

10+ Year Member



Wow thanks a lot! .*? has been very useful to me. It works great!