Regex for verifying a link

problems finding class=external

         

Marcello

6:40 am on Apr 22, 2005 (gmt 0)

Hi,

I am using a snippet of code in a reciprocal-link script that checks (using LWP) whether a page still has a link to my site.

Here is the code snippet:
foreach (split(m!\<A!i, $text)) {
    next unless (m!^([^\>]*)HREF(\s+)?=(\s+)?\"?([^\"\s\>]+)!i);
    $ThisLink = $4;
    $reciprocal = "no";
    if ($ThisLink =~ /^http\:\/\// ){
        if ($ThisLink =~ /^http\:\/\/www\.widget\.com/ ){$reciprocal = "yes";}
    }
} # END foreach

The code above works well for detecting the reciprocal link.

But some webmasters later add "class=external" to the link, so that it no longer has any PR value.

I want to detect whether the word "external" is present in the <a>...</a> link, and have therefore changed my code, but it does not work.

Changed code:
foreach (split(m!\<A!i, $text)) {
    next unless (m!^([^\>]*)HREF(\s+)?=(\s+)?\"?([^\"\s\>]+)!i);
    $ThisLink = $4;
    $reciprocal = "no";
    if ($ThisLink =~ /^http\:\/\// ){
        if ($ThisLink =~ /^http\:\/\/www\.widget\.com/ ){
            if ($ThisLink =~ m/external/i){$reciprocal = "external";}
            else {$reciprocal = "yes";}
        }
    }
} # END foreach

Any help would be appreciated.

Marcello

7:56 am on Apr 25, 2005 (gmt 0)

Sorry, but I am bumping this one a little.
I'm really stuck here.
Thanks.

timster

6:37 pm on Apr 25, 2005 (gmt 0)

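It looks like you've set up your $ThisLink variable to capture just the string of the URL, e.g., http://www.widget.com. The class attribute is never part of that string, so a test for "external" against $ThisLink cannot succeed.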
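To see the difference, here is a small sketch that runs the HREF pattern from the original post against one made-up chunk. The chunk text is an assumption, chosen to look like what split(m!\<A!i, $text) would hand the loop for a link carrying class="external".

```perl
use strict;
use warnings;

# Hypothetical chunk: everything after "<a" for the tag
# <a class="external" href="http://www.widget.com/">Widgets</a>
my $chunk = ' class="external" href="http://www.widget.com/">Widgets</a>';

# Same pattern as in the original post; the URL ends up in $4.
my $url = '';
if ($chunk =~ m!^([^\>]*)HREF(\s+)?=(\s+)?\"?([^\"\s\>]+)!i) {
    $url = $4;
}

print "captured URL: $url\n";
print "URL contains 'external':   ", ($url   =~ /external/i ? "yes" : "no"), "\n";
print "chunk contains 'external': ", ($chunk =~ /external/i ? "yes" : "no"), "\n";
```

The captured URL holds only the address, while the chunk still holds the rest of the tag, including the class attribute.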

Try changing:

if ($ThisLink =~ m/external/i){$reciprocal = "external";}

to:

if (m/external/i){$reciprocal = "external";}

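Still, that looks like just part of the problem. Should

$reciprocal = "no";

be moved above the foreach? As written, it is reset for every link found, so only the last link on the page decides the result.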
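
Putting both changes together, a minimal sketch might look like the following. The sub name check_reciprocal and the choice to look for "external" only inside the opening tag's attributes (rather than anywhere in the chunk, which would also pick up the word in the link text or in text after the link) are assumptions for illustration, not part of the original script. It also assumes that one clean link is enough to count as "yes", even if another link to the site is marked external.

```perl
use strict;
use warnings;

# check_reciprocal is a hypothetical helper name.
# Returns "yes", "external", or "no" for one page of HTML.
sub check_reciprocal {
    my ($text) = @_;
    my $reciprocal = "no";            # set once, before the loop

    my @chunks = split(m!<A!i, $text);
    shift @chunks;                    # drop the text before the first "<a"

    foreach my $chunk (@chunks) {
        # Require whitespace right after "<a" (skips <abbr>, <area>, ...)
        # and an href inside the opening tag.
        next unless $chunk =~ m!^(\s[^>]*?)HREF\s*=\s*["']?([^"'\s>]+)([^>]*)!i;
        my ($before, $link, $after) = ($1, $2, $3);
        next unless $link =~ m!^http://www\.widget\.com!i;

        # Look for "external" only among the tag's other attributes.
        if ("$before $after" =~ /\bexternal\b/i) {
            $reciprocal = "external" if $reciprocal eq "no";
        }
        else {
            $reciprocal = "yes";      # a clean link was found
        }
    }
    return $reciprocal;
}
```

Here $text would be the page body fetched with LWP, as in the original script, and the call would be something like $reciprocal = check_reciprocal($text);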