

Perl Server Side CGI Scripting Forum

Creating a web spider - What language? Perl, Python, PHP?

 5:08 pm on Jul 14, 2005 (gmt 0)

I am creating a web spider to gather statistical data from the web pages of vendors who don't have a data feed.

The gathered data will go into a MySQL database.

Am I better off going with Perl, Python, or PHP?




 5:18 pm on Jul 14, 2005 (gmt 0)

My preference is Perl with WWW::Mechanize, HTML::TreeBuilder, and HTML::TokeParser. Between O'Reilly's Spidering Hacks and the Perl & LWP book, that's all you need to know.



 5:21 pm on Jul 14, 2005 (gmt 0)

If you know all 3 languages, I definitely agree with Sean. If you have more experience in one of them, then it may just be easier to write it in that.


 5:43 pm on Jul 14, 2005 (gmt 0)

Yesterday I bought both O'Reilly's Spidering Hacks and the Perl & LWP book, hoping they would help get me going in the right direction.

Hopefully I'm off to a good start.

I have had some experience in both PHP and Perl, but have never looked at Python. I've just heard that it has some powerful web crawling capabilities.

Thanks for the input.
I'd appreciate others' experience as well.



 6:05 pm on Jul 14, 2005 (gmt 0)

Perl can do a lot for ya! If the job is not extremely complex, I suggest looking at a wget and Perl combination. wget has a good crawler already built in. It can feed the data via a pipe to a Perl script. You can use the Perl script just for parsing (what it's best at). Everyone is happy :). Building a good crawler is a lot of work in any language.
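
For what it's worth, the parsing half of that pipeline might look roughly like the sketch below. It is written in Python (one of the three languages asked about) rather than Perl, purely as an illustration, and uses only the standard library. The assumptions are mine, not from the thread: the vendor page keeps its data in plain HTML table rows, the script is saved as parse_rows.py, and the URL in the usage line is a placeholder. The names TableRowParser and extract_rows are made up for this example.

```python
import sys
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumption: the data of interest sits in ordinary HTML tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell with runs of whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        # Finish any open cell, then keep the row if it has cells.
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


if __name__ == "__main__":
    # Read whatever wget piped in and print one tab-separated line per row.
    for row in extract_rows(sys.stdin.read()):
        print("\t".join(row))
```

It would be fed from wget along the lines of `wget -q -O - http://vendor.example/prices.html | python parse_rows.py`, where `-q` silences wget's progress output and `-O -` writes the fetched page to standard output. Loading the resulting rows into MySQL is left out here, since that needs a database driver outside the standard library.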


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved