Forum Moderators: coopster


Preg match pattern

Need to match inside <p> tags

         

fenimor

6:28 am on Mar 21, 2009 (gmt 0)

10+ Year Member



I need to extract the text of HTML paragraphs. The problem is that a <p> is not necessarily closed by a </p>. Moreover, a paragraph may contain other HTML tags, for example:

<p> Some text </p>
<p> More text
<p> Inner <b>tag</b> </p>

As a result I want an array in which every paragraph is one element.

Patterns like
/<p>(.*)<\/?p>/is - will match from the first to the last tag
/<p>([^<]+)<\/?p>/is - doesn't work if there are inner html tags.

So I need something that will match from <p> until the next </?p>. I'm thinking about recursive matching, but I can't create a working pattern.

coopster

12:14 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Will your first pattern work if you add the "ungreedy" modifier?

rob7591

2:06 pm on Mar 21, 2009 (gmt 0)

10+ Year Member



Try what coopster said. I think you can do something like:
/<p>(.*?)<\/?p>/is (The ? after * makes it a "lazy *" so it tries to capture as few characters as possible to still match the pattern).
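A minimal sketch of how the lazy pattern behaves with preg_match_all, using the sample input from the question (the variable names $html, $matches and $paragraphs are just for illustration). One caveat shows up with this input: the opening <p> of the third paragraph is consumed as the terminator of the unclosed second one, so the third paragraph is not captured.

```php
<?php
// Sample input from the question: the second <p> is never closed.
$html = "<p> Some text </p>\n<p> More text\n<p> Inner <b>tag</b> </p>";

// Lazy quantifier: (.*?) stops at the first <p> or </p> that follows.
// The terminating tag is consumed as part of the match.
preg_match_all('/<p>(.*?)<\/?p>/is', $html, $matches);

// $matches[1] holds the captured group of every match; trim the whitespace.
$paragraphs = array_map('trim', $matches[1]);

print_r($paragraphs);
```

With this input only "Some text" and "More text" come back, because the search for the next match resumes after the <p> that ended the second paragraph.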

There's no real easy way to negate a string (or another regular expression). I had a look at the look-ahead and look-behind operators:
[regular-expressions.info...], but I don't really think they will help you.
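That said, a look-ahead may be worth trying here. The sketch below assumes paragraphs open with a plain <p> (no attributes) and end at the next <p>, the next </p>, or the end of the input. The look-ahead (?=...) only checks for the terminating tag without consuming it, so an unclosed <p> does not swallow the opening tag of the following paragraph. Variable names are illustrative only.

```php
<?php
// Sample input from the question: the second <p> is never closed.
$html = "<p> Some text </p>\n<p> More text\n<p> Inner <b>tag</b> </p>";

// (.*?) is lazy; (?=<\/?p>|\z) asserts that a <p>, a </p>, or the end of
// the subject comes next, but leaves it in place for the following match.
preg_match_all('/<p>(.*?)(?=<\/?p>|\z)/is', $html, $matches);

// $matches[1] holds the captured group of every match; trim the whitespace.
$paragraphs = array_map('trim', $matches[1]);

print_r($paragraphs);
```

For the sample input this yields three elements: "Some text", "More text" and "Inner <b>tag</b>". Markup with attributes on <p> or with nested block elements would need a different pattern or a real HTML parser.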