


Parse HTML response using Perl

   
5:54 am on Jun 10, 2009 (gmt 0)

5+ Year Member



I need to parse an HTML response using Perl. Could anyone please help?
5:58 am on Jun 10, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Use HTML::TreeBuilder [search.cpan.org].
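A minimal sketch of what that looks like, assuming HTML::TreeBuilder (from the HTML-Tree distribution on CPAN) is installed. The helper name extract_title_and_links and the sample HTML are hypothetical, for illustration only. It builds a parse tree from a string, then uses look_down to pull out the page title and the href of every link:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns the page title followed by the href of
# every <a> element that has one, in document order.
sub extract_title_and_links {
    my ($html) = @_;

    # Build the parse tree from an HTML string.
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # In scalar context look_down returns the first match (or undef).
    my $title_el = $tree->look_down(_tag => 'title');
    my $title    = $title_el ? $title_el->as_text : '';

    # A qr// criterion is matched against the attribute value;
    # qr/./ means "has a non-empty href".
    my @links = map { $_->attr('href') }
                $tree->look_down(_tag => 'a', href => qr/./);

    # Release the tree (required on older HTML-Tree versions
    # that do not use weak references).
    $tree->delete;

    return ($title, @links);
}

# Hypothetical sample input.
my $html = '<html><head><title>Example</title></head>'
         . '<body><a href="/one">One</a> <a href="/two">Two</a></body></html>';

my ($title, @links) = extract_title_and_links($html);
print "Title: $title\n";
print "Link: $_\n" for @links;
```

look_down accepts any mix of tag and attribute criteria, so the same pattern extends to tables, forms, or elements selected by class.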
12:17 pm on Jun 10, 2009 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You could start with this to make a request and handle the response:
#!/usr/local/bin/perl

use LWP::UserAgent;
use HTTP::Request;

my $agent = LWP::UserAgent->new(env_proxy => 1, keep_alive => 1, timeout => 30);
my $url = "http://example.com/";
my $header = HTTP::Request->new(GET => $url);
my $request = HTTP::Request->new('GET', $url, $header);
my $response = $agent->request($request);

# Check the outcome of the response
if ($response->is_success) {

    # parse your response here

} elsif ($response->is_error) {
    print $response->error_as_HTML;
}

1:06 pm on Jun 10, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I believe
my $header = HTTP::Request->new(GET => $url);

should be
my $header = HTTP::Headers->new();

Copy & pasted? ;)

Also, the HTML in phranque's example will be available in $response->content inside the block where you parse it.
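For reference, a sketch of the request built with that fix applied: the third argument to HTTP::Request->new is a header object (an HTTP::Headers instance, or an array ref of field/value pairs), not a second request. The Accept field is only an illustrative assumption, and the request is built and printed but never sent:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;

my $url = "http://example.com/";

# The third argument is a header object, not another HTTP::Request.
# The Accept field is only an illustrative choice.
my $header  = HTTP::Headers->new(Accept => 'text/html');
my $request = HTTP::Request->new('GET', $url, $header);

# Show what would be sent; nothing goes over the network here.
print $request->as_string;
```

The resulting $request can be passed to $agent->request exactly as in phranque's script.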

12:56 pm on Jun 11, 2009 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes, good catch. Here is the corrected version:
#!/usr/local/bin/perl

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $agent = LWP::UserAgent->new(env_proxy => 1, keep_alive => 1, timeout => 30);
my $url = "http://example.com/";
my $request = HTTP::Request->new('GET', $url);
my $response = $agent->request($request);

# Check the outcome of the response
if ($response->is_success) {

    # parse $response->content here

} elsif ($response->is_error) {
    print $response->error_as_HTML;
}
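One way to fill in the "parse here" branch, sketched with HTML::TreeBuilder as suggested earlier in the thread. To keep it runnable offline, the response is a hand-built HTTP::Response standing in for $agent->request($request); that canned page and the choice to collect paragraph text are assumptions for illustration. It uses decoded_content, which, unlike content, undoes any Content-Encoding and decodes the declared charset:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTTP::Response;
use HTML::TreeBuilder;

# Stand-in for $agent->request($request): a canned response so the
# sketch runs without network access (hypothetical page content).
my $html = '<html><head><title>Example Domain</title></head>'
         . '<body><h1>Example Domain</h1><p>First.</p><p>Second.</p></body></html>';
my $response = HTTP::Response->new(
    200, 'OK', ['Content-Type' => 'text/html; charset=UTF-8'], $html);

my @paragraphs;
if ($response->is_success) {
    # new_from_content creates the tree, parses the string and calls eof.
    my $tree = HTML::TreeBuilder->new_from_content($response->decoded_content);

    # Collect the text of every <p> element in document order.
    @paragraphs = map { $_->as_text } $tree->look_down(_tag => 'p');

    $tree->delete;
} elsif ($response->is_error) {
    print $response->error_as_HTML;
}

print "$_\n" for @paragraphs;
```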