
Perl Server Side CGI Scripting Forum

    
Creating/Setting up a web crawler or spider
How to do it, what to use, any suggestions?
firedancer5

5+ Year Member



 
Msg#: 4209 posted 9:31 pm on Aug 19, 2005 (gmt 0)

I am trying to set up a simple web crawler that goes out and crawls specific URLs. I have never configured or built a crawler before, so could someone point me in the right direction? I found some material on wget, but I am not sure whether that is the right approach. Any suggestions on creating or configuring one would be helpful. Thanks!

 

dlefree

5+ Year Member



 
Msg#: 4209 posted 1:05 am on Aug 20, 2005 (gmt 0)

[chat11.com ]

This website offers an extensive list of links on spider development (including some free spider implementations).

I'd recommend that you consider a PHP solution if this is one of your first projects - you'll find tons of helpful documentation at:

[us3.php.net ]
[webreference.com ]

lexipixel

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4209 posted 11:00 pm on Aug 20, 2005 (gmt 0)

Search CPAN for LWP::UserAgent; it should have everything you need to set up a simple crawler.
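
To make the "simple crawler" idea concrete, here is a minimal sketch of fetching a fixed list of URLs. It is written in Python with the standard library rather than Perl's LWP::UserAgent, so treat it as an illustration of the flow, not of the LWP API. The `USER_AGENT` string and the `fetch` and `crawl` names are hypothetical.

```python
import urllib.request
from typing import Callable, Dict, Iterable

USER_AGENT = "ExampleCrawler/0.1"  # hypothetical name; identify your own crawler here


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch one URL and return its body decoded as text."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(urls: Iterable[str], fetcher: Callable[[str], str] = fetch) -> Dict[str, str]:
    """Fetch each URL once, skipping duplicates and URLs that fail."""
    pages: Dict[str, str] = {}
    for url in urls:
        if url in pages:
            continue  # already fetched this one
        try:
            pages[url] = fetcher(url)
        except OSError as error:  # URLError, HTTPError and timeouts are OSError subclasses
            print(f"skipping {url}: {error}")
    return pages
```

Passing the fetcher in as a parameter lets you try the loop against canned pages before pointing it at a live site.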

SeanW

10+ Year Member



 
Msg#: 4209 posted 4:27 pm on Aug 21, 2005 (gmt 0)

There are two indispensable books by O'Reilly... Perl & LWP, and Spidering Hacks.

The first talks about building spiders and parsing pages. The second really gets into the specifics of getting information out of pages.

Others have mentioned using PHP and LWP. I really recommend WWW::Mechanize. It's based on LWP, so you can always fall back on plain LWP, but it adds a bunch of things that make it easy to follow links and submit forms.
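
Following links comes down to pulling the href values out of a page and resolving them against the page's own URL. The sketch below shows that step in Python's standard library. It is not WWW::Mechanize, just an approximation of the idea, and the `LinkCollector` and `extract_links` names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web_link = absolute.startswith(("http://", "https://"))
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique absolute links in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The returned links can be fed back into whatever queue of URLs the crawler is working through.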

Sean

wdr1



 
Msg#: 4209 posted 6:56 am on Aug 22, 2005 (gmt 0)

In addition to LWP::UserAgent, be sure to also check out LWP::Simple & the LWP cookbook.

[search.cpan.org...]
[search.cpan.org...]

Stuart_S

5+ Year Member



 
Msg#: 4209 posted 12:31 am on Sep 4, 2005 (gmt 0)

Check out LWP::RobotUA
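
As I understand it, the point of LWP::RobotUA is that the crawler respects robots.txt and pauses between requests. Here is a rough sketch of those two behaviours in Python's standard library, not in the Perl module itself. The `PoliteGate` class, its method names, and the one-second default delay are assumptions made for illustration.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Check URLs against robots.txt rules and space out requests."""

    def __init__(self, robots_txt, user_agent, delay_seconds=1.0):
        self.user_agent = user_agent
        self.delay_seconds = delay_seconds  # assumed default; tune per site
        self._last_request = None
        self._rules = RobotFileParser()
        # Parse robots.txt text that the caller has already downloaded.
        self._rules.parse(robots_txt.splitlines())

    def allowed(self, url):
        """Return True if the robots.txt rules permit fetching url."""
        return self._rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep until delay_seconds have passed since the previous call."""
        if self._last_request is not None:
            remaining = self.delay_seconds - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

A crawler would call `allowed(url)` before each fetch and `wait_turn()` right before sending the request.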

See some other posts: [webmasterworld.com...]
[webmasterworld.com...]

Great book: ISBN 0-596-00313-7
