| 2:35 pm on Dec 26, 2005 (gmt 0)|
I'm a programmer and webmaster, and I'd like to find in-depth information on Teoma and how it works. Does anyone know where I can find information on Teoma's algorithm, ranking, source code, etc.?
| 8:12 pm on Dec 26, 2005 (gmt 0)|
Teoma has been rather willing to share information about their algorithm -- which uses insights from Jon Kleinberg's HITS algorithm to identify "web communities". Teoma also conquered quite a technical challenge in finding a way to give rapid results, building those community clusters on the fly. I've often wondered if the scalability of that solution isn't a major reason why AJ/Teoma hasn't taken on Google in a big way.
Mike Grehan has a very informative interview with Paul Gardi, SVP Search at Ask Jeeves/Teoma online:
At the end of the interview there's a link to a free pdf with more information about HITS and linkage based algorithms.
[edited by: tedster at 5:12 pm (utc) on Dec. 30, 2005]
| 12:46 pm on Dec 27, 2005 (gmt 0)|
Yes, I understand they base the majority of the ranking on the HITS principle. Is there anywhere that provides information on how they managed to improve upon it?
| 5:56 pm on Dec 30, 2005 (gmt 0)|
As I understand it, the main algorithm can be understood as HITS, modified by CLEVER [almaden.ibm.com], further modified by work done in the DISCOWEB [cse.lehigh.edu] project. That last link offers some good detail on the math involved, as well as further source papers in the footnotes.
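For anyone who wants to see the basic idea in code, here is a minimal Python sketch of plain textbook HITS: each page gets a hub score and an authority score, and the two are updated from each other until they settle. This is not Teoma's implementation, and it leaves out the CLEVER and DiscoWeb modifications entirely. The function names, the fixed iteration count, and the sample link graph are all made up for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Basic HITS. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: sum the (new) authority scores of pages linked to.
        new_hubs = {
            p: sum(new_auths[dst] for dst in links.get(p, ()))
            for p in pages
        }

        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


# Hypothetical toy graph: three "hub" pages pointing at two "authority" pages.
example_links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
}

hubs, auths = hits(example_links)
best_authority = max(auths, key=auths.get)
best_hub = max(hubs, key=hubs.get)
```

In the toy graph, page "a" ends up with the highest authority score because every hub links to it, and the hubs that link to both authorities outrank the one that links to only one. In real HITS the graph is a query-specific neighbourhood of pages rather than a hand-written dictionary, which is part of why doing it at query time is expensive.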
I also always assumed that there was a pinch of HILLTOP [cs.toronto.edu] thrown in to limit manipulation by affiliated websites, but I can't find confirmation anywhere. Teoma's "topic distillation" seems to be more of an alternative to the Hilltop approach.
Beyond that, as I said earlier, the big deal for Teoma was creating a way to retrieve and cluster the results with a runtime measured in seconds rather than minutes -- but that is more operational than algorithmic. I don't think anything like exact source code is publicly available.
Another good starting point is this pdf, also from Mike Grehan: