is there a more useful search tool?


vabtz

6:59 pm on Oct 6, 2004 (gmt 0)



The search tool on this site is simply awful. Is a more useful one ever going to be added?

photon

7:28 pm on Oct 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On Google: "site:webmasterworld.com putyoursearchtermshere" (works for any given site crawled by Google).
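For anyone who wants to generate that query from a script, here is a minimal sketch. The function name `site_search_url` is made up for illustration, and the `https://www.google.com/search?q=` endpoint is an assumption about the public search URL rather than anything stated in this thread.

```python
from urllib.parse import urlencode


def site_search_url(site: str, terms: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator.

    Hypothetical helper: the endpoint below is assumed, not taken from the thread.
    """
    query = f"site:{site} {terms}"
    # urlencode percent-escapes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("webmasterworld.com", "search appliance"))
```

The same string works typed straight into the search box; the code only saves the escaping.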

Also, you can add the functionality of the above directly onto this forum: [webmasterworld.com...] (msg #9).

Be careful with the code though--you can mess things up [webmasterworld.com].

Brett_Tabke

2:55 pm on Oct 7, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> Is there ever going to be a more useful one added?

Any fresh ideas to put forth?

Search engine requirements:

index: 1+ million pages (scalable to 5 million pages)
update once a night.
handle dynamic environment.
moderate box requirements
scalable to 1 million views a day in traffic.

Currently, there is only one public product I know of that can handle such a system.
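To put the traffic figure from the requirements above in per-second terms, here is a rough back-of-the-envelope sketch. It assumes views are spread evenly over 24 hours, which real traffic never is, so treat the result as a floor for peak load rather than a sizing number. The name `average_rate` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: float) -> float:
    """Average events per second, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1 million views a day, the figure from the requirements list.
    print(round(average_rate(1_000_000), 1))  # prints 11.6
```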

vabtz

3:40 pm on Oct 7, 2004 (gmt 0)



That's infinitely larger than anything I have ever dealt with, and I am sure that any suggestions I could come up with have been made already.

shri

6:42 am on Oct 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Brett,

Why not get Google to sponsor / donate one of their search appliances? Or does that turn into a conflict of interest? Or have they stopped selling them?

I've also had very good luck with mnogo (search dot mnogo dot ru) and am also looking at nutch.org for a small (about 1TB) local search / archive engine.

Finally, I'm not sure what your architecture is like, so this may be a completely incorrect recommendation. Have you taken a look at MySQL's fulltext ( [dev.mysql.com...] ) capabilities for implementing something that goes through the various databases within the system and reports the matches?

Please don't tell me you use grep at the moment. ;)

whoisgregg

7:56 pm on Oct 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Or does that turn into a conflict of interest?

I realize you asked this of Brett directly, but I'd like to add that I remember some members saying they would be uncomfortable with any particular search engine having its hands in the backend here.

Think about the privacy implications for sticky mail and the supporters forum... :(

Of course I can't find the thread -- a Google site search for "webmasterworld site search" isn't too useful, as the terms are on every page. ;)

bcolflesh

8:25 pm on Oct 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe Brett is holding out for Google to offer him an appliance? He mentions the cost factor in a previous thread:

[webmasterworld.com...]

mfishy

9:19 pm on Oct 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<<The only current off-the-shelf search that would work here would be a Google search appliance at over $20k.>>

Fork it over, Brett... what is that, 5 extra Pubcon sponsors? :)

shri

4:59 am on Oct 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> privacy implications for sticky mail and the supporters forum...

None. It will still follow robots.txt exclusions AND will not be able to access cookied content.
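A minimal sketch of the robots.txt half of that claim, using Python's standard `urllib.robotparser`. The rules, the `/private/` path, the `example.com` URLs and the `examplebot` agent name are all made up for illustration; the real exclusions on this site are not given in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the actual robots.txt for this site is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, agent: str = "examplebot") -> bool:
    """Return True if a well-behaved crawler may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("https://example.com/forum/thread.htm"))  # prints True
    print(allowed("https://example.com/private/mail.htm"))  # prints False
```

This only shows what a compliant crawler would skip. It says nothing about cookied content, and nothing about a crawler that ignores the file.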

musicales

3:30 pm on Oct 11, 2004 (gmt 0)

10+ Year Member



Are we to assume from this that the site dynamically generates complete static pages and that there is no database storage of individual messages? As far as I understand, most of the main database apps would be able to handle full-text search over 1 million plus records without a great deal of difficulty. If, however, you are effectively having to search 1 million plus static HTML pages, then I can certainly see the problem, but I would equally be surprised if this was the way things were set up.

If I were faced with the latter situation, I think I would probably create a program to port the static pages into a DB and then search them from there. Would be kind of an exciting challenge, I think!
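A toy sketch of what I mean, assuming the pages can simply be read as HTML strings. It strips the tags, stores the text in SQLite, and keeps a plain word-to-page table as the index. All table and function names here are made up, and a real million-page setup would want a proper full-text engine rather than this.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a page (script/style content is not filtered here)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())  # collapse whitespace


def build_index(pages: dict) -> sqlite3.Connection:
    """Load {path: html} into an in-memory DB with a simple inverted index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, path TEXT, body TEXT)")
    db.execute("CREATE TABLE words (word TEXT, page_id INTEGER, PRIMARY KEY (word, page_id))")
    for path, html in pages.items():
        body = page_text(html)
        cur = db.execute("INSERT INTO pages (path, body) VALUES (?, ?)", (path, body))
        words = set(re.findall(r"[a-z0-9]+", body.lower()))
        db.executemany("INSERT INTO words VALUES (?, ?)", [(w, cur.lastrowid) for w in words])
    return db


def search(db: sqlite3.Connection, query: str) -> list:
    """Return paths of pages containing every word in the query, sorted by path."""
    terms = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    if not terms:
        return []
    marks = ",".join("?" * len(terms))
    rows = db.execute(
        "SELECT p.path FROM words w JOIN pages p ON p.id = w.page_id "
        f"WHERE w.word IN ({marks}) GROUP BY p.id HAVING COUNT(*) = ? ORDER BY p.path",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]
```

Usage would be along the lines of `db = build_index({"a.htm": "<p>some text</p>"})` followed by `search(db, "some text")`. Matching is on whole words only, with no ranking or stemming.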

whoisgregg

11:14 pm on Oct 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> privacy implications for sticky mail and the supporters forum...

>> None. It will still follow robots.txt exclusions AND will not be able to access cookied content.

I agree that's how it should work. But privacy concerns are never based on the best situation -- they're based on the "what if?" scenarios. If everything worked out as expected, privacy would never be a concern to anyone.

papamaku

9:06 am on Oct 13, 2004 (gmt 0)

10+ Year Member



Surely Nutch is the obvious choice - it's open source, and I'm sure it will need less than Google's $20k worth of hardware.