Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

Creating/Setting up a web crawler or spider
How to do it, what to use, any suggestions?

 9:31 pm on Aug 19, 2005 (gmt 0)

I am trying to set up a simple web crawler that goes out and crawls specific URLs. Could someone point me in the right direction? I have never configured or built a crawler before. I found some material on wget, but I am not sure that is the right approach. Any suggestions on creating or configuring one would be helpful. Thanks!



 1:05 am on Aug 20, 2005 (gmt 0)

[chat11.com]

This website offers an extensive collection of links on spider development (including some free spider implementations).

I'd recommend that you consider a PHP solution if this is one of your first projects - you'll find tons of helpful documentation at:

[us3.php.net]
[webreference.com]


 11:00 pm on Aug 20, 2005 (gmt 0)

Search CPAN for LWP::UserAgent; it should have everything you need to set up a simple crawler.
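
For what it's worth, here is a minimal sketch of what fetching a fixed list of URLs with LWP::UserAgent could look like. The agent string, the timeout value, and the helper names make_agent and fetch_page are made up for this example; treat it as a starting point, not a finished crawler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent with a timeout and an identifying agent string.
# 'ExampleCrawler/0.1' is a placeholder name chosen for this sketch.
sub make_agent {
    my $ua = LWP::UserAgent->new(
        agent   => 'ExampleCrawler/0.1',
        timeout => 15,
    );
    return $ua;
}

# Fetch one URL. Returns the decoded body on success, undef on failure.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    unless ($response->is_success) {
        warn "Failed $url: ", $response->status_line, "\n";
        return;
    }
    return $response->decoded_content;
}

# URLs given on the command line are fetched in order.
if (@ARGV) {
    my $ua = make_agent();
    for my $url (@ARGV) {
        my $html = fetch_page($ua, $url);
        next unless defined $html;
        printf "%s: %d characters\n", $url, length $html;
    }
}
```

Run it as: perl fetch.pl http://example.com/ (the script name is arbitrary). From there you would parse each page and decide which links to queue next.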


 4:27 pm on Aug 21, 2005 (gmt 0)

There are two indispensable books from O'Reilly: Perl & LWP, and Spidering Hacks.

The first talks about building spiders and parsing pages. The second really gets into the specifics of getting information out of pages.

Others have mentioned PHP and LWP. I really recommend WWW::Mechanize. It's built on LWP, so you can always fall back to plain LWP, but it adds a bunch of things that make it easy to follow links and submit forms.
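
To give a rough idea of the link-following side, this sketch loads one page with WWW::Mechanize and lists every link on it as an absolute URL. The helper name page_links is invented for the example, and autocheck is turned off here only so that failures can be handled by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Return the absolute URLs of all links on the page Mechanize currently holds.
sub page_links {
    my ($mech) = @_;
    return map { $_->url_abs->as_string } $mech->links;
}

# The first command-line argument is the page to start from.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->get($ARGV[0]);
    if ($mech->success) {
        print "Title: ", ($mech->title // '(none)'), "\n";
        print "$_\n" for page_links($mech);
    }
    else {
        warn "Fetch failed: ", $mech->status, "\n";
    }
}
```

Forms work along the same lines through the module's submit_form method; check its documentation for the arguments your form needs.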



 6:56 am on Aug 22, 2005 (gmt 0)

In addition to LWP::UserAgent, be sure to also check out LWP::Simple and the LWP cookbook.
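
As a small illustration of how little code LWP::Simple needs, this sketch saves one URL to a local file with getstore and checks the returned status with is_success. The wrapper name save_url is just something picked for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Save a URL to a local file. Returns true if the status code indicates success.
sub save_url {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    return is_success($status);
}

# Usage: perl save.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    print save_url($url, $file)
        ? "Saved $url to $file\n"
        : "Could not fetch $url\n";
}
```

LWP::Simple also has get, which returns the page content as a string (or undef on failure) if you would rather not write to disk.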



 12:31 am on Sep 4, 2005 (gmt 0)

Check out LWP::RobotUA
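
A sketch of how it might be set up: LWP::RobotUA behaves like LWP::UserAgent but consults each site's robots.txt and pauses between requests to the same host. The agent name, contact address, 10-second delay, and the helper name make_robot are all placeholders for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Build a robots.txt-aware agent. Both the agent name and the contact
# address are required by LWP::RobotUA; the values here are placeholders.
sub make_robot {
    my $ua = LWP::RobotUA->new(
        agent => 'ExampleCrawler/0.1',
        from  => 'webmaster@example.com',
    );
    # delay() is expressed in minutes, so this is 10 seconds per host.
    $ua->delay(10 / 60);
    return $ua;
}

# URLs given on the command line are requested in order.
if (@ARGV) {
    my $ua = make_robot();
    for my $url (@ARGV) {
        my $response = $ua->get($url);
        print "$url: ", $response->status_line, "\n";
    }
}
```

URLs that robots.txt disallows come back as a failed response rather than being fetched, so the status line tells you what happened.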

See some other posts: [webmasterworld.com...]

Great book: ISBN 0-596-00313-7


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved