Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

Perl for retrieving information from a webpage (javascript/html)
Trying to find the best set of tools to retrieve information from a webpage

Msg#: 4677772 posted 4:22 pm on Jun 5, 2014 (gmt 0)


I am fairly new to Perl, and I would like to know what you would advise me to use to retrieve information located on a couple of web pages.

Given [HTML] tables [on web pages], what would be the best way to automatically access the pages, retrieve those tables, and manipulate them? Given that they are both in HTML, would it be enough to just parse the tables?

I know I'm asking this in a Perl forum but if MySQL would be more fitting, please let me know.

Note: I am not asking you to do any coding; I'm just looking for advice on packages, tools and whatnot that I may use for this.

Thank you very much!

[edited by: coopster at 1:34 pm (utc) on Jun 9, 2014]
[edit reason] no site specifics please and thank you! [/edit]



coopster (WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member)

Msg#: 4677772 posted 1:37 pm on Jun 9, 2014 (gmt 0)

Welcome to WebmasterWorld, sosippus.

If the pages are on the same server, you can simply read them with the standard file open, read, and write APIs.
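
For the same-server case, a minimal sketch in core Perl might look like the following. The helper name `read_page` and the path are hypothetical, and the file is assumed to be UTF-8 encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole contents of a local HTML file.
# Assumes the file is UTF-8 encoded.
sub read_page {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    local $/;               # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Example call with a placeholder path, run only if the file exists.
my $path = '/var/www/html/report.html';
print length(read_page($path)), " characters read\n" if -e $path;
```

The returned string can then be handed to whichever HTML parser you settle on.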

If the pages are on an external server, the most popular tool is cURL. You can also open your own sockets and read from and write to them, but the cURL library is quite extensive and a handy tool.
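
As a rough illustration of the external-server case, here is a sketch that uses HTTP::Tiny instead of cURL itself, only because HTTP::Tiny ships with Perl (5.14 and later) and needs no extra install. The helper name `fetch_page` and the 10-second timeout are assumptions made for this example.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch a URL and return the response body,
# or die with the HTTP status if the request fails.
sub fetch_page {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->get($url);
    die "Fetch failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

# Fetch the URL given on the command line, if any.
if (my $url = shift @ARGV) {
    print fetch_page($url);
}
```

Note that HTTP::Tiny needs IO::Socket::SSL installed before it can fetch https URLs.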

As for parsing the tables, you may want to investigate the "tidy" library.


WebmasterWorld Senior Member 10+ Year Member

Msg#: 4677772 posted 8:35 pm on Aug 14, 2014 (gmt 0)

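Use Mojo::UserAgent and/or Mojo::DOM. For example (untested, but see the links below for more info):

use strict;
use warnings;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Fetch the page and collect every table row.
my $rows = $ua->get('www.foo.com')->res->dom->find('table tr');

# Print the cells of each row as one tab-separated line.
# map('text') is used because newer Mojolicious releases no longer provide pluck.
for my $r ($rows->each) {
    print $r->find('td')->map('text')->join("\t") . "\n";
}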
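
If the HTML is already in a string (for instance, read from a local file), Mojo::DOM can be used on its own without Mojo::UserAgent. The sketch below assumes Mojolicious is installed; the sample markup and the `@table` variable are made up for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up sample markup standing in for a downloaded page.
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apples</td><td>3</td></tr>
  <tr><td>pears</td><td>5</td></tr>
</table>
HTML

# Turn each data row into an array reference of cell texts.
# Header rows contain only <th> cells, so they yield no <td> matches
# and are skipped.
my @table;
for my $row (Mojo::DOM->new($html)->find('table tr')->each) {
    my $cells = $row->find('td')->map('text')->to_array;
    push @table, $cells if @$cells;
}

print join("\t", @$_), "\n" for @table;
```

Keeping the rows as an array of arrays makes later manipulation (sorting, filtering, writing to a database) a matter of ordinary Perl list handling.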



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved