Creating a web spider - What language? Perl, Python, PHP?
Custodian msg:443174 5:08 pm on Jul 14, 2005 (gmt 0)
I am creating a web spider to gather statistical data from the web pages of vendors who don't have a data feed.
The gathered data will go into a MySQL database.
Am I better off going with Perl, Python, or PHP?
SeanW msg:443175 5:18 pm on Jul 14, 2005 (gmt 0)
My preference is Perl with WWW::Mechanize, HTML::TreeBuilder, and HTML::TokeParser. Between O'Reilly's Spidering Hacks and the Perl & LWP book, that's all you need to know.
jatar_k msg:443176 5:21 pm on Jul 14, 2005 (gmt 0)
If you know all 3 languages, I definitely agree with Sean. If you have more experience in one of them, it may just be easier to write it in that one.
Custodian msg:443177 5:43 pm on Jul 14, 2005 (gmt 0)
Yesterday I bought both O'Reilly's Spidering Hacks and the Perl & LWP book, hoping they would help get me going in the right direction.
Hopefully I'm off to a good start.
I have had some experience in both PHP and Perl, but have never looked at Python. I've just heard that it has some powerful web crawling capabilities.
Thanks for the input.
I'd appreciate others' experience as well.
moltar msg:443178 6:05 pm on Jul 14, 2005 (gmt 0)
Perl can do a lot for ya! If the job is not extremely complex, I suggest looking at a combination of wget and Perl. wget has a good crawler already built in. It can feed the data via a pipe to a Perl script. You can use the Perl script just for parsing (what it's best at). Everyone is happy :). Building a good crawler is a lot of work in any language.
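For anyone who wants to see the shape of that pipeline, here is a rough sketch of the parsing half. It is written in Python rather than Perl purely for illustration, using only the standard library. The script name parse_links.py, the helper names, and the example URL are all made up for this sketch, and a real job would pull out whatever vendor data you need rather than just links.

```python
import sys
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Read the page that wget wrote to stdout and print one link per line.
    for link in extract_links(sys.stdin.read()):
        print(link)
```

It could be fed from wget like this (-q silences progress output, -O - writes the page to stdout):

wget -q -O - http://www.example.com/ | python parse_links.py

The point of the split is the same as above: wget does the fetching, and the script only has to parse.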