Also, you can add the functionality of the above directly onto this forum: [webmasterworld.com...] (msg #9).
Be careful with the code though--you can mess things up [webmasterworld.com].
Any fresh ideas to put forth?
Search engine requirements:
index: 1+ million pages (scalable to 5 million pages)
update once a night
handle a dynamic environment
moderate box requirements
scalable to 1 million views a day in traffic
Currently, there is only one public product I know of that can handle such a system.
Why not get Google to sponsor / donate one of their search appliances? Or does that turn into a conflict of interest? Or have they stopped selling them?
I've also had very good luck with mnogo (search dot mnogo dot ru) and am also looking at nutch.org for a small (about 1TB) local search / archive engine.
Finally, I'm not sure what your architecture is like, so this may be a completely incorrect recommendation. Have you taken a look at MySQL's fulltext ( [dev.mysql.com...] ) capabilities for implementing something that goes through the various databases within the system and reports the matches?
Please don't tell me you use grep at the moment. ;)
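To make the MySQL fulltext suggestion a bit more concrete, here is a rough sketch of what the index and query might look like. The table name `posts`, the columns `title` and `body`, and the index name `ft_posts` are made up for illustration; nothing here reflects the forum's real schema. The function only builds the SQL and parameters, so it assumes you would hand them to whatever MySQL driver you already use.

```python
# Hypothetical schema: a table "posts" with text columns "title" and "body".
# One-time setup statement (run it through your own MySQL client or driver).
CREATE_INDEX_SQL = "CREATE FULLTEXT INDEX ft_posts ON posts (title, body)"


def build_fulltext_query(terms, limit=10):
    """Return (sql, params) for a natural-language MATCH ... AGAINST search.

    The %s placeholders follow the parameter style used by common
    Python MySQL drivers; adjust if your driver uses a different style.
    """
    sql = (
        "SELECT id, title, MATCH(title, body) AGAINST (%s) AS score "
        "FROM posts "
        "WHERE MATCH(title, body) AGAINST (%s) "
        "ORDER BY score DESC "
        "LIMIT %s"
    )
    # The search terms are passed twice: once for scoring, once for filtering.
    return sql, (terms, terms, int(limit))
```

The column list in MATCH() has to be the same list the FULLTEXT index was created on, which is why both are written as (title, body) here.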
>> Or does that turn into a conflict of interest?
I realize you asked this of Brett directly, but I'd like to add I remember it brought up by some members that they would be uncomfortable with having any particular search engine's hands in the backend here.
Think about the privacy implications for sticky mail and the supporters forum... :(
Of course I can't find the thread -- a Google site search for "webmasterworld site search" isn't too useful, as the terms are on every page. ;)
[webmasterworld.com...]
If I was faced with the latter situation I think I would probably create a program to port the static pages into a db and then search them from there. Would be kind of an exciting challenge I think!
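As a very small sketch of that "port the static pages into a db" idea, assuming SQLite and a plain LIKE match rather than any real ranking: the `pages` table, the helper names, and the in-memory database are all invented for the example, and a real version would read the files from disk and use a proper fulltext index.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so phrase searches behave predictably.
    return " ".join(" ".join(parser.chunks).split())


def build_db(pages):
    """pages: mapping of path -> raw HTML (hypothetical input format)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [(path, html_to_text(html)) for path, html in pages.items()],
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return paths whose text contains the term (case-insensitive for ASCII)."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT path FROM pages WHERE body LIKE ? ESCAPE '\\' ORDER BY path",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]
```

A LIKE scan reads every row, so this would not hold up at a million pages; it only shows the shape of the port-then-search approach.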
>> privacy implications for sticky mail and the supporters forum...
None. It will still follow robots.txt exclusions AND will not be able to access cookied content.
I agree that's how it should work. But privacy concerns are never based on the best-case situation -- they're based on the "what if?" scenarios. If everything worked out as expected, privacy would never be a concern to anyone.
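For anyone who wants to check what a well-behaved crawler would and wouldn't fetch under robots.txt exclusions, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rules, the /stickymail/ and /supporters/ paths, the example.com URLs, and the "ExampleBot" agent name are all hypothetical, not this site's actual setup. It also only covers the "how it should work" case; it says nothing about a crawler that ignores the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders, not real forum URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stickymail/
Disallow: /supporters/
"""


def allowed_urls(robots_txt, agent, urls):
    """Return the URLs that the given user agent may fetch under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]
```

Calling allowed_urls(ROBOTS_TXT, "ExampleBot", [...]) with a mix of public and excluded paths should leave only the public ones in the result.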