parsing strings in perl

help-a-novice question

         

amznVibe

5:00 am on Sep 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am so confused why this doesn't work. Anyone wanna help?
I successfully retrieved a page via perl, but I can't parse a chunk of data out of it.

If I do this to grab the text between the title tags, it works fine:

 if ($c =~ /<title>(.*?)<\/title>/) { $out = $1; }

The above WORKS, but if I try the same thing for head or body, it doesn't return anything:

 if ($c =~ /<body>(.*?)<\/body>/) { $out = $1; }

above does NOT work
(and yes, the body or head tags are there and in lowercase, exact match)

Does this have anything to do with the linefeeds/newlines in the html?

Thanks for any assistance!

amznVibe

5:44 am on Sep 17, 2003 (gmt 0)



Ah, after about an hour of reading I finally understood the answer...
I have to add /sm at the end so that "." can match across newlines.

 if ($c =~ /<body>(.*?)<\/body>/sm) { $out = $1; }
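For anyone who wants to see the difference directly, here is a minimal sketch. The sample page in $c is made up for illustration (the real $c would hold the fetched HTML), and the variable names $plain, $dotall and $title are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample page; the real $c would hold the retrieved HTML.
my $c = "<html><head><title>Test</title></head>\n<body>\nHello\nworld\n</body></html>";

# Without /s, "." will not match a newline, so the body match fails
# and the list assignment leaves $plain undefined.
my ($plain) = $c =~ /<body>(.*?)<\/body>/;

# With /s, "." also matches newlines, so the whole body is captured.
my ($dotall) = $c =~ /<body>(.*?)<\/body>/s;

# The title sits on a single line, so it matches with or without /s.
my ($title) = $c =~ /<title>(.*?)<\/title>/;

print defined $plain ? "plain: matched\n" : "plain: no match\n";
print "dotall: [$dotall]\n";
print "title: $title\n";
```

This only shows why the newlines matter. A real page may also have attributes on the body tag or mixed-case tags, which this pattern does not handle.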

regex is so powerful but sooo hard

timster

7:16 pm on Sep 18, 2003 (gmt 0)



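That works, but the "m" isn't doing anything there, which makes it tricky to read. The "s" and "m" modifiers sound like opposites, but they are independent: "s" changes what "." matches, and "m" changes where "^" and "$" match. Your pattern has no "^" or "$", so only the "s" matters.

s - Treat the string as a Single line ("." also matches newlines)
m - Treat the string as Multiple lines ("^" and "$" also match at the start and end of each line)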
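A quick way to check what each modifier changes is to try both on a small two-line string. This is just a sketch; $text and the result variables are names made up for the example.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first line\nsecond line\n";

# /m only affects the anchors: "^" may now match right after a newline.
my $anchor_plain = ($text =~ /^second/)  ? 1 : 0;
my $anchor_m     = ($text =~ /^second/m) ? 1 : 0;

# /s only affects the dot: "." may now match the newline itself.
my $dot_plain = ($text =~ /first.*second/)  ? 1 : 0;
my $dot_s     = ($text =~ /first.*second/s) ? 1 : 0;

# /m does nothing for the dot, so this still fails to cross the newline.
my $dot_m = ($text =~ /first.*second/m) ? 1 : 0;

print "^second:        plain=$anchor_plain  /m=$anchor_m\n";
print "first.*second:  plain=$dot_plain  /s=$dot_s  /m=$dot_m\n";
```

Only the /m version of the anchored pattern matches, and only the /s version of the dotted pattern matches, so combining the two as /sm is legal and each flag keeps doing its own job.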

So I'd write it:

$out = $1 if ($c =~ /<body>(.*?)<\/body>/s);

Yes, regex is tricky. It's basically a language of its own.