Freeware crawlers

are there any....

         

futureX

10:47 pm on Apr 9, 2003 (gmt 0)

10+ Year Member



I only ask because I would like to set up a search engine/portal of my own, which will only crawl specified sites (sites that, for argument's sake, deal with widgets). Is there any software I can download to spider sites that is free, or at least doesn't cost a packet?

I would like something that could give relevant keyword results at least. :)
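For the "only crawl specified sites" part, the core of it is a host check before each fetch. A minimal sketch in Python, assuming the allowed sites are known up front; the hostnames and the `is_allowed` helper are made up for illustration and do not come from any particular crawler:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: placeholder hostnames, not real widget sites.
ALLOWED_HOSTS = {"widgets.example.com", "reviews.example.org"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL is http(s) and its host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # hostname is already lowercased by urlsplit; it is None for odd URLs.
    host = parts.hostname or ""
    return host in allowed_hosts
```

A spider would call something like this on every link it extracts and simply drop anything that fails the check, so it never wanders off the chosen sites.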

jeremy goodrich

10:58 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



moved post

I would start by doing a search for 'gnu search engine' or similar; you are bound to find many, many open source projects out there.

I've used a few. Any one of them might do, or perhaps none, depending on what features you want.

ht://Dig is a pretty common / well known open source search engine, and there are many others written in Perl, PHP, C, and C++. Even some in Java that I've seen.

Without knowing the specifics of your project, it's pretty tough to say, "use this one" or "try that one" :)

figment88

11:21 pm on Apr 9, 2003 (gmt 0)

futureX

2:10 am on Apr 10, 2003 (gmt 0)

10+ Year Member



Thanks guys. I know I wasn't very specific, but I basically want to run the spider from my computer (which may require an Apache install) and build an index of widget news articles / reviews that can be searched.

One that can index PHP pages would be good too :)

I think I have a long task ahead of me, as I want to integrate a directory into it too.

I'll go search and try what I find :)
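For reference, the "index that can be searched" is usually an inverted index: a map from each word to the pages containing it. A minimal sketch in Python, assuming the page text has already been fetched; `tokenize`, `build_index` and `search` are illustrative names, not functions from any of the packages in this thread:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of URLs whose text contains it.

    `docs` is a dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

This does no ranking at all, only matching; relevance scoring is where the real packages earn their keep.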

jeremy goodrich

2:22 am on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you are doing a directory / search combination, try Gossamer Threads. Very highly recommended, and there is a 'free' option with some small limitations.

Indexing PHP would just be indexing the HTML output of the PHP, so anything that will index HTML will also handle pages generated by Perl, ColdFusion, ASP, etc.
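To illustrate: the indexer only ever sees the HTML the server sends back, so the text extraction step is the same whatever generated the page. A rough sketch using Python's standard `html.parser`; the `TextExtractor` class and `visible_text` helper are illustrative names, not taken from any of the tools mentioned here:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Whether the URL ends in .php, .asp or .cgi makes no difference to this step.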

Though the query strings are a whole other issue :) Good luck with it.
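The usual trouble with query strings is that the same page turns up under many URLs (session IDs, parameters in a different order), so a spider can index it over and over. One common workaround, sketched here in Python, is to normalize each URL before deciding whether it has been seen. The parameter names in `SESSION_PARAMS` are an assumption; the right list depends on the sites being crawled:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session-style parameter names; adjust for the sites being crawled.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def canonical_url(url):
    """Drop session parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    query.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )
```

Keeping a set of canonical URLs already visited then stops the crawler from fetching the same article once per session ID.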

Oh, btw - the larbin mentioned above is very good, just not that flexible, built with C++.

PhpDig is essentially a PHP version of ht://Dig, which is in C++.

If you want to hack the code, etc, you'll probably want something else. RuterSearch or Ruter Search (if it's still out there...) is in Perl or PHP, and simple enough you can take a hack at it. :) Freeware, too.

In short, lots and lots of options... hence my asking for some more specifics. :) I mangled a few scripts together to make a 'better SE' myself once. It can be fun & very addictive.

Psycho1

3:01 am on Apr 10, 2003 (gmt 0)

10+ Year Member



>>> If you are doing a directory / search combination, try Gossamer Threads. Very highly recommended, and there is a 'free' option with some small limitations.

I just looked through Gossamer's website and I didn't spot any free version. Did you mean the shareware download they offer? I imagine they have it set up so it stops working after so many weeks? Maybe I'm just blind and didn't see it :)

carfac

3:18 am on Apr 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Psycho:

>>> Did you mean the Shareware download they offer?

There is a version called Links that is Perl, and is free for non-commercial use. It is very powerful for a smaller type site. There is also a large community of "modders" there to help you customize it for exactly your needs.

I use Links' daddy, Links SQL. Much more powerful, and much faster. But it is expensive, too. If your site grows and you find the need to upgrade, there are easy upgrade paths from the shareware to the SQL version.

I would heartily recommend either, based on your needs.

Oh, and no, the basic Links does NOT shut down after a week or a month.

dave