Forum Moderators: coopster & jatar k & phranque

Creating/Setting up a web crawler or spider

How to do it, what to use, any suggestions?



9:31 pm on Aug 19, 2005 (gmt 0)

10+ Year Member

I am trying to set up a simple web crawler that goes out and crawls specific URLs. Could someone point me in the right direction? I have never configured or built a crawler before, so I could use a nudge. I found some material on wget, but I am not sure that is the right approach. Any suggestions would be helpful in my quest for information on creating or configuring one. Thanks!


1:05 am on Aug 20, 2005 (gmt 0)

10+ Year Member

[chat11.com ]

This website offers an extensive collection of links on spider development (including some free spider implementations).

I'd recommend that you consider a PHP solution if this is one of your first projects - you'll find tons of helpful documentation at:

[us3.php.net ]
[webreference.com ]


11:00 pm on Aug 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Search CPAN for LWP::UserAgent; it should have everything you need to set up a simple crawler.
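A minimal sketch of what that looks like in practice. The module and its methods are real; the URL list, user-agent string, and timeout are placeholder choices you would substitute with your own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical seed list -- replace with the specific URLs you want to crawl.
my @urls = ('http://www.example.com/');

my $ua = LWP::UserAgent->new(
    agent   => 'MyCrawler/0.1',   # identify your crawler honestly
    timeout => 10,                # give up on slow servers
);

for my $url (@urls) {
    my $response = $ua->get($url);
    if ( $response->is_success ) {
        my $html = $response->decoded_content;
        print "Fetched $url (", length($html), " bytes)\n";
        # ... parse $html and queue any new links here ...
    }
    else {
        warn "Failed $url: ", $response->status_line, "\n";
    }
    sleep 1;   # be polite: pause between requests
}
```

From there a real crawler is mostly bookkeeping: a queue of URLs to visit and a hash of URLs already seen.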


4:27 pm on Aug 21, 2005 (gmt 0)

10+ Year Member

There are two indispensable O'Reilly books: Perl & LWP, and Spidering Hacks.

The first covers building spiders and parsing pages. The second really gets into the specifics of extracting information from pages.

Others have mentioned PHP and LWP. I really recommend WWW::Mechanize. It's built on LWP, so you can always drop down to it, but it adds convenient methods for following links and submitting forms.
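To illustrate the difference: with WWW::Mechanize, listing every link on a page takes no hand-written HTML parsing at all. The target URL below is a placeholder; the methods shown are Mechanize's own:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1 makes any failed request die with a useful message.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Placeholder page -- substitute the site you actually want to crawl.
$mech->get('http://www.example.com/');

# Every link on the page, already resolved to absolute URLs:
for my $link ( $mech->links ) {
    print $link->url_abs, "\n";
}

# Following a link by its text, or filling in a form, is one call each:
# $mech->follow_link( text_regex => qr/next page/i );
# $mech->submit_form( fields => { q => 'search term' } );
```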



6:56 am on Aug 22, 2005 (gmt 0)

In addition to LWP::UserAgent, be sure to also check out LWP::Simple and the LWP cookbook.
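LWP::Simple trades the object-oriented interface for a few plain functions, which is plenty for quick one-off fetches. A short sketch (the URL and output filename are placeholders):

```perl
use strict;
use warnings;
use LWP::Simple;

# get() returns the page body, or undef on failure.
my $html = get('http://www.example.com/');
defined $html or die "Couldn't fetch the page";
print "Got ", length($html), " bytes\n";

# getstore() saves a URL straight to disk and returns the HTTP status code.
my $status = getstore( 'http://www.example.com/', 'example.html' );
print "HTTP status: $status\n";
```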



12:31 am on Sep 4, 2005 (gmt 0)

10+ Year Member

Check out LWP::RobotUA
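LWP::RobotUA is worth singling out because it is a drop-in replacement for LWP::UserAgent that fetches and obeys each site's robots.txt for you, and rate-limits requests. A sketch, assuming a placeholder agent name and contact address:

```perl
use strict;
use warnings;
use LWP::RobotUA;

my $ua = LWP::RobotUA->new(
    agent => 'MyCrawler/0.1',     # your crawler's name (placeholder)
    from  => 'me@example.com',    # contact address, required for robots
);

# delay() is measured in minutes; 1/60 of a minute = 1 second between hits.
$ua->delay( 1 / 60 );

my $response = $ua->get('http://www.example.com/');
print $response->status_line, "\n";
```

Requests to paths disallowed by robots.txt come back as errors instead of being fetched, so your crawler stays well-behaved without extra code.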

See some other posts: [webmasterworld.com...]

Great book: ISBN 0-596-00313-7

