RSS Spider

How to...

         

johnjameson

12:14 pm on Sep 25, 2006 (gmt 0)




Hello

I want to start a blog search engine.
Its purpose will be to bring RSS feeds and tags together, just like Syndic8.

For this I need an RSS spider that will look for RSS feeds on its own.
Should it be written in Java or PHP?
Is one available for free somewhere, or do I have to buy a solution?

Can anyone help me?

Thanks in advance!

gbulmash

9:10 pm on Sep 28, 2006 (gmt 0)




The spider can be written in just about any language: PHP, Java, Perl, Python, Ruby, C++... The best one to use will depend on the platform on which you're planning to run it. Most importantly, whatever you do, I'd suggest making sure that it's a respectful spider that honors the robots.txt conventions. If it doesn't, you may find some of the best blogs blocking your spider or feeding it garbage.
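
To make the "respectful spider" idea concrete, here is a minimal sketch in Python (chosen only for illustration; any of the languages above would do). It assumes the spider has already downloaded a page and that site's robots.txt, looks for feed autodiscovery links in the HTML, and keeps only the feed URLs that robots.txt allows. The names `ExampleFeedBot`, `FeedLinkFinder`, and `discover_feeds` are made up for this example and do not come from any existing spider.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# MIME types commonly used in feed autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url, robots_txt, user_agent="ExampleFeedBot"):
    """Return the feed URLs found in `html` that robots.txt lets us fetch.

    `user_agent` is a hypothetical bot name; a real spider should use its own.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return [url for url in finder.feeds if robots.can_fetch(user_agent, url)]


# Hypothetical sample data, standing in for content fetched over HTTP.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

SAMPLE_HTML = """\
<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="alternate" type="application/rss+xml" href="/private/feed.xml">
<link rel="stylesheet" type="text/css" href="/style.css">
</head><body>Hello</body></html>
"""

if __name__ == "__main__":
    print(discover_feeds(SAMPLE_HTML, "http://example.com/", SAMPLE_ROBOTS))
```

A real spider would also need to fetch the pages, rate-limit its requests, and cache each site's robots.txt, none of which is shown here.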

Some spiders are free, and some you have to buy. But really, the spider is just the first part of the formula. Once the spider collects all the data, you need to construct an efficient database schema, a parser to insert all the data into the database, a backend to access the database, and a frontend to interact with the users... and that's just scratching the surface. There's a lot of other logic you'll need to work out.
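
As a rough illustration of the schema, parser, and backend pieces, the sketch below parses an RSS 2.0 document with Python's standard library, stores the items and their tags in SQLite, and looks entries up by tag. The two-table schema, the function names, and the sample feed are all assumptions made for this example; a production search engine would need a far more capable design (Atom support, full-text search, deduplication, and so on).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical minimal schema: one row per entry, one row per (entry, tag).
SCHEMA = """
CREATE TABLE entries (
    id       INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL,
    title    TEXT NOT NULL,
    link     TEXT NOT NULL UNIQUE
);
CREATE TABLE entry_tags (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    tag      TEXT NOT NULL,
    PRIMARY KEY (entry_id, tag)
);
"""


def store_feed(db, feed_url, rss_xml):
    """Parse an RSS 2.0 document and insert its items; return how many were new."""
    root = ET.fromstring(rss_xml)
    new_entries = 0
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # skip items we cannot identify
        cur = db.execute(
            "INSERT OR IGNORE INTO entries (feed_url, title, link) VALUES (?, ?, ?)",
            (feed_url, title, link),
        )
        if cur.rowcount == 0:
            continue  # link already stored by an earlier crawl
        new_entries += 1
        # RSS 2.0 <category> elements are treated as tags here.
        for category in item.findall("category"):
            tag = (category.text or "").strip().lower()
            if tag:
                db.execute(
                    "INSERT OR IGNORE INTO entry_tags (entry_id, tag) VALUES (?, ?)",
                    (cur.lastrowid, tag),
                )
    db.commit()
    return new_entries


def search_by_tag(db, tag):
    """Backend query: return (title, link) pairs for entries carrying `tag`."""
    rows = db.execute(
        "SELECT e.title, e.link FROM entries e "
        "JOIN entry_tags t ON t.entry_id = e.id "
        "WHERE t.tag = ? ORDER BY e.id",
        (tag.lower(),),
    )
    return rows.fetchall()


# Hypothetical sample feed, standing in for data the spider collected.
SAMPLE_RSS = """\
<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><link>http://example.com/1</link>
        <category>Python</category><category>RSS</category></item>
  <item><title>Second post</title><link>http://example.com/2</link>
        <category>RSS</category></item>
</channel></rss>
"""

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    store_feed(db, "http://example.com/feed.xml", SAMPLE_RSS)
    print(search_by_tag(db, "rss"))
```

The `UNIQUE` constraint on `link` combined with `INSERT OR IGNORE` is one simple way to keep repeated crawls of the same feed from duplicating entries.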

If you feel up to the task, consider looking at some of the open-source spiders and spider frameworks available on sourceforge.net as a starting point.

Cheers,

Greg