Teoma Development

     
2:35 pm on Dec 26, 2005 (gmt 0)

10+ Year Member



I'm a programmer and webmaster, and I'd like to find in-depth information on Teoma and how it works. Does anyone know where I can find information on its algorithm, ranking, source code, etc.?
8:12 pm on Dec 26, 2005 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Teoma has been rather willing to share information about their algorithm -- which uses insights from Jon Kleinberg's HITS algorithm to identify "web communities". Teoma also conquered quite a technical challenge in finding a way to give rapid results, building those community clusters on the fly. I've often wondered if the scalability of that solution isn't a major reason why AJ/Teoma hasn't taken on Google in a big way.

Mike Grehan has a very informative interview with Paul Gardi, SVP Search at Ask Jeeves/Teoma online:
[e-marketing-news.co.uk...]

At the end of the interview there's a link to a free pdf with more information about HITS and linkage based algorithms.

<fixed spelling>

[edited by: tedster at 5:12 pm (utc) on Dec. 30, 2005]

12:46 pm on Dec 27, 2005 (gmt 0)

10+ Year Member



Hi

Yes, I understand they base the majority of the ranking on the HITS principle. Is there anywhere that provides information on how they managed to improve upon it?

5:56 pm on Dec 30, 2005 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



As I understand it, the main algorithm can be understood as HITS, modified by CLEVER [almaden.ibm.com], further modified by work done in the DISCOWEB [cse.lehigh.edu] project. That last link offers some good detail on the math involved, as well as further source papers in the footnotes.
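For anyone who wants to see the core idea in code, here is a minimal sketch of the basic HITS iteration as Kleinberg published it: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both vectors renormalized on every pass. This is not Teoma's code, and it includes none of the CLEVER or DiscoWeb modifications. The function names `hits` and `normalize` and the dict-of-lists link format are assumptions made up for illustration.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Plain HITS on a small link graph.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical input format, chosen for readability).
    Returns (hub, authority) dicts keyed by page.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the (fresh) authority scores of pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ()))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    # Two pages pointing at a third: "c" is the authority, "a" and "b" are hubs.
    hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
    print(sorted(auth.items(), key=lambda kv: -kv[1]))
```

On a real graph the two score vectors settle on the principal "community"; the other communities correspond to further eigenvectors, which this sketch does not compute.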

I also always assumed that there was a pinch of HILLTOP [cs.toronto.edu] thrown in to limit manipulation by affiliated websites, but I can't find confirmation anywhere. Teoma's "topic distillation" seems to be more of an alternative to the Hilltop approach.

Beyond that, as I said earlier, the big deal for Teoma was creating a way to retrieve and cluster the results with a runtime measured in seconds rather than minutes -- but that is more operational than algorithmic. I don't think anything like exact source code is publicly available.
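To give a feel for why the query-time step is expensive, here is a sketch of the subgraph construction from Kleinberg's original HITS paper: start from a "root set" of text-search hits, then grow it into a "base set" by adding the pages the root pages link to and a capped number of pages linking to them. How Teoma actually builds and clusters its subgraphs is not public, so this is only an illustration of the published procedure. `build_base_set`, `induced_links` and the two link dictionaries are hypothetical names; the cap of 50 in-links per page is the value suggested in Kleinberg's paper.

```python
def build_base_set(root_set, out_links, in_links, max_in_per_page=50):
    """Expand a root set of search hits into a HITS base set.

    root_set:  iterable of pages returned by a text search.
    out_links: dict page -> list of pages it links to.
    in_links:  dict page -> list of pages linking to it.
    """
    base = set(root_set)
    for page in root_set:
        # Add everything the root page points to.
        base.update(out_links.get(page, ()))
        # Add at most max_in_per_page pages pointing at it, so one very
        # popular page cannot blow up the size of the subgraph.
        base.update(list(in_links.get(page, ()))[:max_in_per_page])
    return base


def induced_links(base, out_links):
    """Keep only the links whose source and target are both in the base set."""
    return {
        page: [dst for dst in out_links.get(page, ()) if dst in base]
        for page in base
    }


if __name__ == "__main__":
    out_links = {"r": ["x"], "y": ["r"], "z": ["q"]}
    in_links = {"r": ["y"]}
    base = build_base_set(["r"], out_links, in_links)
    print(sorted(base))
    print(induced_links(base, out_links))
```

Every page in the root set costs at least two link lookups here, and the hub/authority iteration then has to run on the resulting graph, all before the user sees a result.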

Another good starting point is this pdf, also from Mike Grehan:
[searchguild.com...]

 
