Forum Moderators: coopster


preg_match_all

         

MediumDave

7:58 pm on Apr 22, 2005 (gmt 0)

10+ Year Member



Can anybody tell me how to extract multiple lines of text from a file using regular expressions?

I would like to extract the text between two headings in an HTML file, but I can only extract one line at a time.

For example:

preg_match_all("/<h2>(.*)<h2>/i", $htmlFile, $result)

doesn't work because the <h2> tags are on different lines in the HTML file. Can anybody tell me how to fix it?

Thanks.

Tom

killroy

8:50 pm on Apr 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Add the m modifier to the i modifier you already use:

m (PCRE_MULTILINE)
By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl.

When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.

And make it lazy too:

preg_match_all("/<h2>(.*?)(?=<h2>)/im", $htmlFile, $result)

SN

MediumDave

9:15 pm on Apr 22, 2005 (gmt 0)

10+ Year Member



Thank you, I'll try that.

MediumDave

10:54 pm on Apr 22, 2005 (gmt 0)

10+ Year Member



I tried the line of code:

preg_match_all("/<h2>(.*?)(?=<h2>)/im", $htmlFile, $result)

but did not get any output when I tried to 'echo' an element of $result.

The text I want to extract is spread over more than two lines. Does this make any difference? Can the above code be applied to text that is spread over several lines, rather than just two?

I can get it to work when the expression is on one line only, but not when it's on more than one.
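
A minimal sketch that reproduces this symptom, assuming made-up sample strings ($htmlFile and $oneLine here are hypothetical, not the poster's actual file). The m modifier only changes where ^ and $ match, so the dot still stops at every newline and the multi-line case finds nothing:

```php
<?php
// Hypothetical sample input: the text after the first heading spans several lines.
$htmlFile = "<h2>First</h2>\nline one\nline two\nline three\n<h2>Second</h2>\n";

// With /m only, the dot still cannot cross a newline, so nothing matches.
$countM = preg_match_all("/<h2>(.*?)(?=<h2>)/im", $htmlFile, $resultM);

// Hypothetical single-line input: the same pattern matches here.
$oneLine = "<h2>First</h2> some text <h2>Second</h2>";
$countOneLine = preg_match_all("/<h2>(.*?)(?=<h2>)/im", $oneLine, $resultOneLine);

echo $countM, "\n";              // number of matches in the multi-line string
echo $countOneLine, "\n";        // number of matches in the single-line string
echo $resultOneLine[1][0], "\n"; // captured text from the single-line string
```

Note that preg_match_all fills $result as a nested array, so the captured text sits in $result[1][0], $result[1][1], and so on, not in $result[0] alone.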

coopster

2:29 am on Apr 27, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, MediumDave.

I use the "s" modifier in instances such as these.


s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

Should get you over the hump anyway -- coopster
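
As a rough sketch of the s modifier applied to the pattern from earlier in the thread (the sample $htmlFile contents are an assumption for illustration only):

```php
<?php
// Hypothetical sample input with headings on separate lines.
$htmlFile = "<h2>First</h2>\nline one\nline two\n<h2>Second</h2>\nline three\n<h2>Third</h2>\n";

// s (PCRE_DOTALL) lets the dot match newlines; the lazy quantifier plus
// the lookahead stops each capture just before the next <h2>.
$count = preg_match_all("/<h2>(.*?)(?=<h2>)/is", $htmlFile, $result);

// $result[1] holds the captured group for every match.
foreach ($result[1] as $section) {
    echo trim($section), "\n---\n";
}
```

Two things to be aware of with this sketch: each capture still contains the heading text and its closing </h2>, and the last section is not captured because no <h2> follows it. Changing the lookahead to (?=<h2>|\z) is one way to include that final section.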