Teoma Development



2:35 pm on Dec 26, 2005 (gmt 0)

10+ Year Member

I'm a programmer and webmaster, and I'd like to find in-depth information on Teoma and how it works. Does anyone know where I can find information on how Teoma works: its algorithm, ranking, source code, etc.?


8:12 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Teoma has been rather willing to share information about their algorithm -- which uses insights from Jon Kleinberg's HITS algorithm to identify "web communities". Teoma also conquered quite a technical challenge in finding a way to give rapid results, building those community clusters on the fly. I've often wondered if the scalability of that solution isn't a major reason why AJ/Teoma hasn't taken on Google in a big way.
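For readers unfamiliar with how a link-based cluster gets built at query time, here is a minimal sketch of the "root set to base set" expansion described in Kleinberg's HITS paper. It is not Teoma's code. The function names, the `out_links` / `in_links` dictionaries, and the `max_in_per_page` value are all assumptions made for illustration.

```python
def build_base_set(root_set, out_links, in_links, max_in_per_page=50):
    """Expand a query's root set into a base set, HITS-style.

    root_set:  pages returned by a text search for the query
    out_links: dict mapping page -> pages it links to (hypothetical index)
    in_links:  dict mapping page -> pages linking to it (hypothetical index)
    max_in_per_page: illustrative cap on in-linking pages added per root page
    """
    base = set(root_set)
    for page in root_set:
        # Add every page the root page points to.
        base.update(out_links.get(page, []))
        # Add a capped number of pages that point to the root page.
        base.update(list(in_links.get(page, []))[:max_in_per_page])
    return base


def induced_subgraph(base, out_links):
    """Keep only the links whose source and target are both in the base set."""
    return {p: [q for q in out_links.get(p, []) if q in base] for p in base}
```

The hub and authority scores are then computed on this small subgraph rather than on the whole web, which is why the expansion has to be fast if results are to come back in seconds.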

Mike Grehan has a very informative interview with Paul Gardi, SVP Search at Ask Jeeves/Teoma online:

At the end of the interview there's a link to a free pdf with more information about HITS and linkage based algorithms.

<fixed spelling>

[edited by: tedster at 5:12 pm (utc) on Dec. 30, 2005]


12:46 pm on Dec 27, 2005 (gmt 0)

10+ Year Member


Yes, I understand they base the majority of the ranking on the HITS principle. Is there anywhere that provides information on how they managed to improve upon it?


5:56 pm on Dec 30, 2005 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

As I understand it, the main algorithm can be described as HITS, modified by CLEVER [almaden.ibm.com], further modified by work done in the DISCOWEB [cse.lehigh.edu] project. That last link offers some good detail on the math involved, as well as further source papers in the footnotes.
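As a rough illustration of the HITS part only, here is a minimal sketch of the hub/authority iteration in plain Python. It does not include the CLEVER or DiscoWeb modifications and it is not Teoma's implementation. The function name `hits`, the graph format, and the fixed iteration count are assumptions for the example.

```python
import math


def hits(graph, iterations=50):
    """Basic HITS on a small link graph.

    graph: dict mapping page -> list of pages it links to (illustrative format)
    Returns (hubs, auths), two dicts of L2-normalized scores.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub score: sum of authority scores of the pages linked to.
        new_hubs = {
            n: sum(new_auths[dst] for dst in graph.get(n, [])) for n in nodes
        }

        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {n: v / a_norm for n, v in new_auths.items()}
        hubs = {n: v / h_norm for n, v in new_hubs.items()}

    return hubs, auths
```

Usage: `hubs, auths = hits({"h1": ["a1", "a2"], "h2": ["a1"]})` ranks `a1` as the stronger authority, since both hub pages point to it, and `h1` as the stronger hub.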

I also always assumed that there was a pinch of HILLTOP [cs.toronto.edu] thrown in to limit manipulation by affiliated websites, but I can't find confirmation anywhere. Teoma's "topic distillation" seems to be more of an alternative to the Hilltop approach.

Beyond that, as I said earlier, the big deal for Teoma was creating a way to retrieve and cluster the results with a runtime measured in seconds rather than minutes -- but that is more operational than algorithmic. I don't think anything like exact source code is publicly available.

Another good starting point is this pdf, also from Mike Grehan:

