Creating a web spider - What language? Perl, Python, PHP?

   
5:08 pm on Jul 14, 2005 (gmt 0)

10+ Year Member



I am creating a web spider to gather statistical data from the web pages of vendors who don't have a data feed.

The gathered data will go into a MySQL database.

Am I better off going with Perl, Python, or PHP?

Custodian

5:18 pm on Jul 14, 2005 (gmt 0)

10+ Year Member



My preference is Perl with WWW::Mechanize, HTML::TreeBuilder, and HTML::TokeParser. Between O'Reilly's Spidering Hacks and the Perl & LWP book, that's all you need to know.

Sean

5:21 pm on Jul 14, 2005 (gmt 0)

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If you know all three languages, I definitely agree with Sean. If you have more experience in one of them, it may just be easier to write it in that.

5:43 pm on Jul 14, 2005 (gmt 0)

10+ Year Member



Yesterday I bought both O'Reilly's Spidering Hacks and the Perl & LWP book, hoping they would help get me going in the right direction.

Hopefully I'm off to a good start.

I have had some experience in both PHP and Perl, but have never looked at Python. I've just heard that it has some powerful web crawling capabilities.

Thanks for the input.
I'd appreciate others' experience as well.

Custodian

6:05 pm on Jul 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perl can do a lot for ya! If the job is not extremely complex, I suggest looking at a wget and Perl combination. wget already has a good crawler built in, and it can feed the data via a pipe to a Perl script. You can use the Perl script just for parsing (what it's best at). Everyone is happy :). Building a good crawler is a lot of work in any language.
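
For anyone who would rather try the same fetch-then-parse split in Python (one of the three languages asked about), here is a rough sketch of the parsing half. It is only an illustration, not the Perl setup described above: the names LinkAndTextParser, parse_html, and main are made up for this example, and it relies only on the standard library's html.parser. Real vendor pages will need their own extraction rules.

```python
import sys
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collect href values and visible text from an HTML document.

    Hypothetical helper for illustration; not from any spidering library.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or stylesheet content.
        if not self._skip_depth and data.strip():
            self.text_chunks.append(data.strip())


def parse_html(html):
    """Return (links, text_chunks) extracted from an HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.text_chunks


def main(stream=sys.stdin):
    """Read one HTML page from a stream (stdin by default) and print its links."""
    links, _text = parse_html(stream.read())
    for link in links:
        print(link)
```

To use it behind wget, save it as parse.py (an assumed file name), add a call to main() as the last line, and run something like `wget -q -O - http://example.com/ | python parse.py`, where example.com stands in for the vendor page. wget does the fetching and the script only parses, which is the division of labor suggested above.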