LWP - error code 500

Head request returns error for valid URL

         

mark_roach

7:53 pm on Nov 9, 2004 (gmt 0)




I have a simple script which checks that the outbound links on my site are still valid. The relevant bits of the code are as follows:


use LWP::UserAgent;
use HTTP::Request;
use URI::URL;

# Set up the user agent with an identifying name and a 30-second timeout
$ua = LWP::UserAgent->new;
$ua->agent("Myagent");
$ua->timeout(30);

$URL = "http://".$$fields{website};

# Create a request
my $req = HTTP::Request->new(HEAD => $URL);

# Pass request to the user agent and get a response back
$res = $ua->request($req);

# Check the outcome of the response
# (405 Method Not Allowed means the server answered but refuses HEAD)
if ($res->is_success || $res->code() == 405)
{
    ......
}
else
{
    print LOG $$fields{id},' : ',$URL;
    print LOG " - FAILED. Error : ",$res->code()," - ",$res->message(),"\n";
    $link_checked=$$fields{link_checked}+1;
}

This works well for the majority of links; however, for certain URLs I get errors even when the link is valid.
For example:

zhttp://www.site1.co.uk - FAILED. Error : 500 - Can't connect to www.site1.co.uk:80 (connect: Invalid argument)
zhttp://www.site2.co.uk - FAILED. Error : 500 - Can't connect to www.site2.co.uk:80 (Bad hostname 'www.site2.co.uk')

(z added to delink URLs)

The sites it fails on all appear to be ones which are framed; however, there are plenty of other similarly framed sites which work OK.

I notice also that the error message mentions port 80, but I can see no mention in the LWP documentation of how to specify an alternate port, so I cannot test whether this would make a difference.

Any suggestions gratefully received.

Mark

<please sticky the member for specific url examples if needed> - jatar_k

[edited by: jatar_k at 6:04 pm (utc) on Nov. 10, 2004]
[edit reason] removed urls [/edit]

moltar

8:56 pm on Nov 10, 2004 (gmt 0)




The script seems to be fine, but there are a few things to check:

Are you sure that $$fields{website} is always just a domain name, and not already prefixed with http? Otherwise, when you join it with the protocol name you would get http://http://www.example.com.

Maybe some of the addresses use https and cannot be accessed through http?

Does it happen on the same URLs all the time?
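
To rule out the first two points, the stored value could be normalized before the request is built. The following is only a minimal sketch: normalize_url is a hypothetical helper, not part of the original script, and it assumes the field holds either a bare host name or a full http/https URL. It also trims stray whitespace, since a hidden space or newline in a stored host name is one thing worth ruling out when a lookup fails.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): build a checkable URL
# from a stored website field, whether or not it already carries a scheme.
sub normalize_url {
    my ($website) = @_;
    return unless defined $website;

    # Trim stray whitespace that may have been stored with the value.
    $website =~ s/^\s+|\s+$//g;
    return if $website eq '';

    # Keep an existing http:// or https:// prefix; otherwise assume http.
    return $website if $website =~ m{^https?://}i;
    return "http://$website";
}

# Both forms end up with exactly one scheme.
print normalize_url('www.example.com'), "\n";          # http://www.example.com
print normalize_url('https://www.example.com'), "\n";  # https://www.example.com
```

In the script above, this would replace the plain concatenation, e.g. $URL = normalize_url($$fields{website});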

mark_roach

9:32 pm on Nov 11, 2004 (gmt 0)




I strip the http from any URLs prior to entering them into the database, and I have not come across any that use https, so I am quite happy that the URLs are being constructed correctly.

In the main it is the same sites that consistently produce the error; however, sometimes a site may fail on one run and pass on the next. To put the numbers into context, I would imagine less than 1% of valid sites fail the test, but it is frustrating to know that I am removing valid sites from my directory.

Is there anyone out there with similar link checking software that could try a couple of my dodgy URLs for me?
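
Since some sites fail on one run and pass on the next, one option is to retry before counting a link as failed, and to fall back to GET when HEAD keeps failing. This is a rough sketch under stated assumptions, not a tested fix for the sites in question: check_link, the default of three tries, and the one-second pause are all hypothetical choices. The helper only relies on the head(), get(), is_success() and code() methods that LWP::UserAgent and its response objects provide.

```perl
use strict;
use warnings;

# Hypothetical helper: try HEAD a few times, then fall back to one GET.
# $ua is any object with head() and get() methods whose responses support
# is_success() and code(), for example an LWP::UserAgent.
sub check_link {
    my ($ua, $url, $max_tries, $delay) = @_;
    $max_tries ||= 3;                    # assumed default
    $delay = 1 unless defined $delay;    # assumed pause in seconds

    for my $try (1 .. $max_tries) {
        my $res = $ua->head($url);

        # Same acceptance rule as the original script: success, or 405
        # (the server answered but does not allow HEAD).
        return $res if $res->is_success || $res->code == 405;

        sleep $delay if $try < $max_tries;   # brief pause before retrying
    }

    # Last resort: some servers handle HEAD badly but answer GET.
    return $ua->get($url);
}
```

With the script above this might be called as $res = check_link($ua, $URL); followed by the existing is_success test, so that only links which fail every attempt are logged.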