Forum Moderators: coopster & jatar k & phranque

Perl for retrieving information from a webpage (javascript/html)

Trying to find the best set of tools to retrieve information from a web page

   
4:22 pm on Jun 5, 2014 (gmt 0)



Hi!

I'm fairly new to Perl, and I'd like to know what you would advise me to use to retrieve information located on a couple of web pages.

Given [HTML] tables [on web pages] what would be the best way to automatically access the pages, retrieve those tables, and manipulate them? Given that they are both in HTML, would it be enough to just parse the tables?

I know I'm asking this in a Perl forum but if MySQL would be more fitting, please let me know.

Note: I am not asking for you to do any coding, I'm just looking for advice on packages, tools and whatnot that I may use for this.

Thank you very much!

[edited by: coopster at 1:34 pm (utc) on Jun 9, 2014]
[edit reason] no site specifics please and thank you! [/edit]

1:37 pm on Jun 9, 2014 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Welcome to WebmasterWorld, sosippus.

If the pages are on the same server, you can simply use the file open, read, and write APIs.

If the pages are on an external server, the most popular tool is cURL. You can open your own sockets and read from and write to them as well, but the cURL library is quite extensive and a handy tool.

As far as parsing the tables goes, you may want to investigate the "tidy" library.
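
To make the two cases above concrete, here is a minimal Perl sketch. Note that it swaps in HTTP::Tiny (a core module since Perl 5.14) in place of cURL, purely so the example has no non-core dependencies. fetch_page is a made-up helper name, and the path and URL in the usage comment are placeholders, not real locations.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Return the HTML of a page, given either an http(s) URL or a local path.
# fetch_page is a hypothetical helper name used only for this sketch.
sub fetch_page {
    my ($source) = @_;

    # External server: fetch over HTTP (HTTP::Tiny stands in for cURL here).
    if ($source =~ m{^https?://}i) {
        my $res = HTTP::Tiny->new(timeout => 10)->get($source);
        die "GET $source failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
        return $res->{content};    # raw bytes, not decoded
    }

    # Same server: plain file I/O.
    open my $fh, '<:encoding(UTF-8)', $source
        or die "Cannot open $source: $!\n";
    local $/;                      # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Usage (placeholder locations):
#   my $html = fetch_page('/path/to/page.html');
#   my $html = fetch_page('http://www.example.com/page.html');
```

For https URLs, HTTP::Tiny also needs IO::Socket::SSL installed. Either way you end up with the page's HTML in a string, ready to hand to whatever parser you choose.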

8:35 pm on Aug 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use Mojo::UserAgent and/or Mojo::DOM. For example (untested, but see the links below for more info):


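use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Fetch the page (the URL needs a scheme) and collect every table row
my $rows = $ua->get('http://www.foo.com')->res->dom->find('table tr');

# Print each row as one tab-separated line of cell text
$rows->each(sub {
    my $r = shift;
    print $r->find('td')->map(sub { $_->text })->join("\t") . "\n";
});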



Mojo::UserAgent
[mojolicio.us...]

Mojo::DOM
[mojolicio.us...]
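
If you want to manipulate the tables rather than just print them, one option is to collect each row into a Perl array first. The following is only a sketch: it parses a small inline sample (made up for illustration) with Mojo::DOM so it runs without network access, and table_rows is a hypothetical helper name, not part of Mojolicious.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample markup standing in for a fetched page.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apples</td><td>3</td></tr>
  <tr><td>pears</td><td>5</td></tr>
</table>
HTML

# Turn every <tr> into an array ref holding the text of its cells.
sub table_rows {
    my ($markup) = @_;
    my @rows;
    for my $tr (Mojo::DOM->new($markup)->find('table tr')->each) {
        my @cells = map { $_->text } $tr->find('th, td')->each;
        push @rows, \@cells if @cells;    # skip rows with no cells
    }
    return \@rows;
}

my $rows = table_rows($html);

# One tab-separated line per row, header row included.
print join("\t", @$_), "\n" for @$rows;
```

Once the rows are in an array of arrays you can sort, filter, or load them into a database as needed. With a real page you would pass the fetched HTML to table_rows in place of the sample string.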
 
