
Alternative Search Engines Forum

    
How Is Clustering Worked Out?
zootreeves




msg:460952
 1:03 pm on Nov 6, 2005 (gmt 0)

I want to add clustering (Vivisimo style) to the search engine I'm developing at the moment. I've spent a while trying to figure out how to do it, but I haven't found a decent way yet.

I was wondering whether any search engine owners (I know some visit this forum) have done this type of thing before. If so, how was it done?

 

Lord Majestic




msg:460953
 4:47 pm on Nov 6, 2005 (gmt 0)

A better question is whether people actually care about clustering that much. I certainly don't, and it seems to me that the apparent lack of success of the best-known clustering search engines indicates that people prefer a Google-style interface so long as the matches are relevant.

bostonBeans




msg:460954
 1:34 pm on Nov 7, 2005 (gmt 0)

In my experience, clustering is more valuable in enterprise search applications than in consumer search applications. Not sure who you are building your engine for, but as Lord points out, if you are building a consumer application your efforts may be better served elsewhere.

As for how it is done, I am not familiar with the technical aspects, but from a higher level, clustering is just identifying commonality. I have seen (enterprise) clustering done on things as simple as the file system directory and on things as abstract as 'topical themes' (like Vivisimo). Interestingly enough, the simpler solutions seemed to get more usage.

Sorry I don't have any answers for you, but hopefully this gets your thoughts rolling again.

-bB
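
For what it's worth, the simplest kind of clustering mentioned above, grouping hits by file system directory, can be sketched in a few lines of Python. This is only an illustration, not how any particular enterprise product does it; the function name cluster_by_directory and the sample paths are made up for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def cluster_by_directory(paths):
    """Group result paths by their parent directory (hypothetical helper)."""
    clusters = defaultdict(list)
    for path in paths:
        # The parent directory is the "commonality" the hits are grouped on.
        parent = str(PurePosixPath(path).parent)
        clusters[parent].append(path)
    return dict(clusters)


# Made-up search hits, purely for illustration.
hits = [
    "/docs/hr/leave-policy.doc",
    "/docs/hr/benefits.doc",
    "/docs/eng/build-guide.txt",
]
clusters = cluster_by_directory(hits)
for directory, members in clusters.items():
    print(directory, len(members))
```

Each key is a directory and each value is the list of hits under it, which is enough to render a "results by folder" sidebar.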

ByronM




msg:460955
 1:55 pm on Nov 7, 2005 (gmt 0)

While Google may not cluster the results per se, as others visually do, they're probably doing it behind the scenes.

After running my search engine for the past 2 years and looking at the query history, it's a little scary how badly people search. Very rarely are people searching for an answer to something vs searching for everything about something.

For example, a lot of people just search for "boat", "car", "mortgage" and simple terms like that. Turning clustering on by default, either by sorting the results in your SERPs or by presenting them graphically, would save these people lots of time by creating categories of results that match what they're looking for.
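
To make the "categories of results" idea concrete, here is a rough Python sketch that splits the results for a one-word query like "boat" into labelled groups. It is a deliberately naive heuristic and not Vivisimo's or anyone else's actual algorithm: each title is filed under whichever of its non-query words appears in the most titles. The function name label_clusters, the stopword list and the sample titles are all assumptions for the example.

```python
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real engine would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def label_clusters(query, titles):
    """Group titles under their most widely shared non-query term."""
    query_terms = set(query.lower().split())
    term_sets = []
    for title in titles:
        terms = {
            word
            for word in title.lower().split()
            if word not in STOPWORDS and word not in query_terms
        }
        term_sets.append(terms)

    # Document frequency: in how many titles does each term occur?
    doc_freq = Counter(term for terms in term_sets for term in terms)

    clusters = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        if terms:
            # Highest document frequency wins; ties go to the
            # alphabetically first term because the input is sorted.
            label = max(sorted(terms), key=lambda term: doc_freq[term])
        else:
            label = "other"
        clusters[label].append(title)
    return dict(clusters)


# Made-up result titles for the query "boat".
titles = [
    "Boat insurance quotes",
    "Cheap boat insurance",
    "Boat rental in Miami",
    "Miami boat rental deals",
]
clusters = label_clusters("boat", titles)
for label, members in clusters.items():
    print(label, members)
```

On this sample the two insurance titles land in one group and the two Miami rental titles in another. Real clustering engines do far more (phrase extraction, snippet text, overlapping clusters), so treat this as a starting point only.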

BTW, Google isn't hard to compete against because they don't graphically cluster; they're just so branded into everyone's mind, and so huge as a business, that the little search engines that do more than advertising and marketing technologies don't get much "face time".

Lord Majestic




msg:460956
 3:04 pm on Nov 7, 2005 (gmt 0)

google isn't hard to compete against

Send your resume/CV to billg @ microsoft.com immediately :)

Perhaps clustering may help. However, people who cluster do not use their own search engines; they are in effect just presenting other search engines' output differently, and I see no point in using a site that will produce similar results to the search engine that I use anyway.

People have gotten used to simple text searching. Habits are very hard to abandon, and this is why clustering search engines have a big acceptance problem.

I'd prefer a search engine to index twice as many pages rather than cluster the existing ones.

zootreeves




msg:460957
 8:21 pm on Nov 7, 2005 (gmt 0)

OK, thanks for all your replies. I guess you are right; not many people actually use the clustered links.

I guess my time would be better spent trying to think of an idea that's actually useful... back to the drawing board...

ByronM




msg:460958
 2:24 pm on Nov 8, 2005 (gmt 0)


Send your resume/CV to billg @ microsoft.com immediately :)

Are you a politician?

Google does cluster; they just don't represent it graphically. The algo they run and the processing that happens behind the scenes cluster the results to build a standard output page.

I said Google is hard to compete against not because they do or don't "visually" cluster, but because they're a behemoth of a brand ;)

Clustering, stemming and ontology are all parts of the semantics used to create an understandable and searchable index of the web. While Google may minimize the impact of these technologies, I'm betting it does happen.

Lord Majestic




msg:460959
 2:31 pm on Nov 8, 2005 (gmt 0)

Are you a politician?

Not sure about this - when I make promises I firmly intend to keep them; depending on where you live, that may or may not be an integral part of being a politician ;)

Google sure clusters, but not necessarily by topic: they cluster by domain for fast site: searches, they cluster data by similarity, but do they cluster by page topic? I don't think so. There were a few recent examples of split screens whose results seem to come from different topics, but this is done (now) very rarely. I am actually in favour of that type of mild clustering where it matters, i.e. when good matches fall into very distinct topics.
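
As an illustration of the "cluster by domain" idea only (nothing here is known about how Google actually stores its index), grouping result URLs by host takes a few lines of Python with the standard library. The helper name cluster_by_domain and the example URLs are invented for the sketch.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_by_domain(urls):
    """Group URLs by host name (hypothetical helper for illustration)."""
    clusters = defaultdict(list)
    for url in urls:
        # hostname is lower-cased by urlsplit and is None if the URL has no host.
        host = urlsplit(url).hostname or ""
        clusters[host].append(url)
    return dict(clusters)


# Made-up result URLs.
urls = [
    "http://example.com/boats/sail",
    "http://example.com/boats/motor",
    "http://example.org/mortgage",
]
clusters = cluster_by_domain(urls)
for host, members in clusters.items():
    print(host, len(members))
```

With results grouped this way, a site: style lookup becomes a single dictionary access instead of a scan over every result.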

ByronM




msg:460960
 2:54 pm on Nov 8, 2005 (gmt 0)

Not sure about this -

Just asking because you were good at cherrypicking my comment hehe


when I make promises I firmly intend to keep them; depending on where you live, that may or may not be an integral part of being a politician ;)

good man!


Google sure clusters, but not necessarily by topic: they cluster by domain for fast site: searches, they cluster data by similarity, but do they cluster by page topic? I don't think so. There were a few recent examples of split screens whose results seem to come from different topics, but this is done (now) very rarely. I am actually in favour of that type of mild clustering where it matters, i.e. when good matches fall into very distinct topics.

I'm not sure how that differs from visually representing these as clusters of the search vs pretending they're "natural" SERPs.

While mozdex isn't anywhere near the scale of Google yet, try a search with clustering enabled and let us know how/why that would detract from the overall experience.

If we wanted to, it would be just as easy to remove the side display and show them as natural results expanded from the query.

But yes, clustering won't really make or break anyone these days. It's really a technology inherent to a large search engine and almost unknown to most users. However, some people like to be visually navigated through related searches, and that is where some of the niche engines really excel.

Lord Majestic




msg:460961
 4:50 pm on Nov 8, 2005 (gmt 0)

cherrypicking my comment

ehehe, a politician has to be careful about making comments ;)

I'd say mozdex clustering is pretty reasonable - it's an addition to the main listing and your results pages are pretty clean. I'd say well done!

I suppose the reason clustering is not picking up is that most people use non-clustering search engines and clustering alone is not good enough, especially considering that many clustering search engines are merely meta-searchers (not mozdex).

Clustering is nice, but I'd rather index twice as much data or improve anti-spam algorithms. Clustering or not, if the stuff I want is not in the top 10 then I am going to rephrase my query; it is faster for me to type it than to click on one of the clustered groups.
