
Perl Server Side CGI Scripting Forum

    
Question - haven't done this in a while
Perl, script
lak12

10+ Year Member



 
Msg#: 4075677 posted 5:12 am on Feb 7, 2010 (gmt 0)

Dear fellows!
I must be really tired, or there is something I am missing. I am trying to figure out why in the world this would not match:

#!/usr/bin/perl -w
$n = 'some test here whatever...
<a href="read.cgi?do.htm"><img src="images/431.png" border=0 width=109 height=250</a>';
$n =~ s/(<a )(.*?)(<\/a>)/$&/; $r = "$&";
$n =~ s/$r/IMAGE2/s;


The whole link with the image will not match in $n when I use $r as the pattern.
Any ideas?
Thanks!

[edited by: phranque at 6:29 am (utc) on Feb 7, 2010]
[edit reason] disabled graphic smileys ;) [/edit]

 

janharders

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4075677 posted 7:04 pm on Feb 8, 2010 (gmt 0)

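well, you'd need to escape the content of $r. otherwise it is interpolated into s/// as a pattern, and the ? in the href is treated as a quantifier (and the . as "any character") instead of as literal text, so the second substitution never matches.
$r = quotemeta("$&");
should work fine.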
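
A minimal sketch of that fix, assuming a sample string modeled on the one in the question (with the img tag closed). The variable names $unescaped_hit and $fixed are made up for illustration; \Q...\E is the inline form of quotemeta.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text modeled on the question; the href contains the
# regex metacharacters "." and "?".
my $n = 'some test here whatever...
<a href="read.cgi?do.htm"><img src="images/431.png" border=0 width=109 height=250></a>';

# Capture the first link, non-greedily.
my ($r) = $n =~ /(<a .*?<\/a>)/s;

# Unescaped: "cgi?do" is read as "cg", an optional "i", then "do",
# so the literal "?" in the text can never match.
my $unescaped_hit = ($n =~ /$r/) ? 1 : 0;

# Escaped: \Q...\E treats every character of $r literally.
(my $fixed = $n) =~ s/\Q$r\E/IMAGE2/;

print "unescaped match: $unescaped_hit\n";
print "$fixed\n";
```

Writing s/\Q$r\E/IMAGE2/ and calling quotemeta($r) beforehand should behave the same way here; which one to use is a matter of taste.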

vit1



 
Msg#: 4075677 posted 8:42 pm on Feb 10, 2010 (gmt 0)

/*
s/(<a )(.*?)(<\/a>)/$&/;
*/
should be s/(<a )(.*)?(<\/a>)/$&/;

[edited by: phranque at 6:51 am (utc) on Feb 11, 2010]
[edit reason] disabled graphic smileys ;) [/edit]

janharders

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4075677 posted 2:18 am on Feb 11, 2010 (gmt 0)


vit1 wrote: "should be s/(<a )(.*)?(<\/a>)/$&/;"

no. a ? placed after the closing bracket quantifies the whole group instead of changing how the * behaves, and in this case that wouldn't even make sense: .* means any character 0 or more times, and putting the ? outside of the brackets means to match "any char 0 or more times" 0 or 1 times (that's pretty much what ? means: bbba?bbb matches bbbbbb and bbbabbb, but not bbbcbbb).

if the ? is put inside the brackets, directly after the *, it makes the unlimited * ungreedy, basically saying "match any character 0 or more times, but as few times as possible". in this case that is necessary, because a greedy .* would happily match past the first closing </a> and run on to the last one in the text.
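
To see the difference, here is a small sketch with a made-up two-link string (the sample text and the variable names are assumptions, not from the original post):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with two links, so greedy and ungreedy matching differ.
my $html = '<a href="one.htm">first</a> and <a href="two.htm">second</a>';

# (.*?) : as few characters as possible, so the match stops at the first </a>.
my ($lazy) = $html =~ /(<a .*?<\/a>)/;

# (.*) : as many characters as possible, so the match runs to the last </a>.
my ($greedy) = $html =~ /(<a .*<\/a>)/;

# (.*)? : the ? makes the whole group optional; the .* inside stays greedy.
my ($optional) = $html =~ /(<a (.*)?<\/a>)/;

print "lazy:     $lazy\n";
print "greedy:   $greedy\n";
print "optional: $optional\n";
```

Only the ungreedy form isolates the first link; the other two swallow everything up to the last </a>.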

in general: unless you're hacking stuff together for a quick fix, it's usually not the best idea to operate on markup languages like html or xml with regexps. not only is it hard to match what you want, you also cannot define complex patterns that you could easily define with something like HTML::TreeBuilder [search.cpan.org], which offers look_down: with it you could simply look for all a-nodes that contain a b-node and an img-node which, itself, has a src matching a certain url pattern.
regexps on html can fix easy problems, but are generally a bad idea, because they tend to break stuff 5 months from now, when nobody remembers they're in effect.
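
A rough sketch of the look_down approach, simplified to "a-nodes that contain an img-node with a matching src". It assumes HTML::TreeBuilder is installed from CPAN; the markup, the file names and the ^images/ pattern are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, not part of core Perl

# Hypothetical markup; file names and the src pattern are made up.
my $html = <<'HTML';
<p>some test here whatever...
<a href="read.cgi?do.htm"><img src="images/431.png" width="109" height="250"></a>
<a href="other.htm">plain text link</a></p>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find every <a> element that contains an <img> whose src matches the pattern.
my @links = $tree->look_down(
    _tag => 'a',
    sub {
        my $img = $_[0]->look_down(_tag => 'img', src => qr{^images/});
        return defined $img;
    },
);

# Collect the href of each matching link before the tree is freed.
my @hrefs = map { $_->attr('href') } @links;
print "$_\n" for @hrefs;

$tree->delete;   # release the parse tree
```

No escaping of the href is needed here, because the link is found by its structure and not by matching its text as a pattern.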

[edited by: phranque at 6:53 am (utc) on Feb 11, 2010]
[edit reason] disabled graphic smileys ;) [/edit]
