perl webpage checker problem

socket, webserver, response

         

ddoyle

7:18 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



I've been trying to write a Perl script that checks the status of web pages (a link checker), but the responses I get back from a number of web servers don't match what I know about how they work. When I check a page on my own server (which runs IIS 5.0 and hosts multiple web sites), I get a '404' error for a file that exists, yet both Netscape and IE can pull up that page. Any ideas?

The test script is at: <snip> and the source code for this test harness is at: <snip>

I'm stumped.

[edited by: jatar_k at 7:48 pm (utc) on Jan. 10, 2004]
[edit reason] no personal urls thanks [/edit]

JasonD

8:50 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



Sticky me the URLs and I'll have a look

ddoyle

9:34 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



Weird, the URLs showed up when I previewed the message, then vanished when I submitted it.

The test script is at: <snip>

[edited by: jatar_k at 4:40 am (utc) on Jan. 11, 2004]
[edit reason] as above [/edit]

SeanW

10:25 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



The code doesn't show up well formatted, but it looks like you're trying to do HTTP through IO::Socket. Have a look at LWP; it will do the HEAD call for you and parse the error properly. The O'Reilly book "Perl & LWP" is also excellent. I reviewed it here: <snip>

One of the examples at the end of the book is a link-checking spider, much like what you want.

Sean

[edited by: jatar_k at 4:44 am (utc) on Jan. 11, 2004]
[edit reason] no personal urls thanks [/edit]

JasonD

10:27 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



The URLs were deleted by the moderator as no URLs are allowed in the forum. I suggest you delete them before the mod does again.

Now that I have seen the code, though, may I suggest using LWP::UserAgent to check the status of the pages rather than IO::Socket, for two main reasons:

#1. It is MUCH simpler
#2. It supports HTTP 1.1 by default

The script is falling over because it sends a 1.0 request (no problem there), but the server sends back a 1.1 redirection, which the script can't work with.
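For illustration only, here is a minimal sketch of how a hand-rolled checker might read the status line so that a redirect is recognised whichever HTTP version the server answers with. The helper name parse_status_line and the sample status line are made up for this example; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw status line into version, code, and reason.
# Returns a hash reference, or nothing if the line is not a valid status line.
sub parse_status_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip the trailing CRLF (or bare LF)
    if ($line =~ m{\AHTTP/(\d+\.\d+)\s+(\d{3})(?:\s+(.*))?\z}) {
        return { version => $1, code => $2, reason => defined $3 ? $3 : '' };
    }
    return;
}

# Sample status line, invented for illustration.
my $status = parse_status_line("HTTP/1.1 302 Object moved\r\n");
if ($status && $status->{code} =~ /\A3/) {
    print "Redirect ($status->{code}); look for a Location header\n";
}
```

A 3xx code means the real page is wherever the Location header points, so a checker working at the socket level would have to issue a second request itself. LWP does that step for you.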

LWP version (lifted straight from http://search.cpan.org/~gaas/libwww-perl-5.76/lib/LWP/UserAgent.pm):


require LWP::UserAgent;

# Create a user agent with a 10-second timeout and
# pick up proxy settings from the environment.
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

my $response = $ua->get('http://search.cpan.org/');

if ($response->is_success) {
    print $response->content;  # or whatever
}
else {
    die $response->status_line;  # e.g. "404 Not Found"
}
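For a link checker specifically, a HEAD request is usually enough, since only the status matters and not the page body. Below is a rough sketch along those lines. The check_url helper and the command-line interface are assumptions made for this example, not part of the original script. It relies on LWP following redirects for HEAD requests by default.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: issue a HEAD request and return
# (success flag, status line) for the given URL.
sub check_url {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    return ($response->is_success, $response->status_line);
}

my $ua = LWP::UserAgent->new(timeout => 10);
$ua->env_proxy;

# Check every URL given on the command line.
for my $url (@ARGV) {
    my ($ok, $status) = check_url($ua, $url);
    printf "%-4s %s (%s)\n", ($ok ? 'OK' : 'FAIL'), $url, $status;
}
```

Saved as, say, checklinks.pl (a made-up name), it could be run as `perl checklinks.pl http://search.cpan.org/`. Some servers handle HEAD badly, so falling back to get() on failure may be worth considering.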

ddoyle

11:03 pm on Jan 10, 2004 (gmt 0)

10+ Year Member



Thanks. I tried LWP earlier and gave up on it for some reason or another. I'll try going back to it!