Tedster said elsewhere:
"I poked around a bit. It looks like Gerard Salton died in 1995. He was the first and principal person to promote the concepts of Inverse Document Frequency (IDF) and Term Frequency (TF) for automated information retrieval systems.
Whenever I read about search engine algorithms, these terms are everywhere. Looks like these are our roots."
Recommended reading for anyone looking for a deeper understanding of information retrieval.
Search for it here [cs-tr.cs.cornell.edu]
<quote>"We can no longer predict what's going to happen with relevancy ranking—it's gotten beyond us at this point." This is because Northern Light allows some relevance factors to be at odds with each other (for example, a new document with timely, fresh information vs. a popular document that's been on the Web for years).</quote>
That fits right in with some of the other pieces I'm starting to put together.
Here's my working thesis: Information Retrieval (IR), of the kind Salton helped to develop, was the first approach to web indexing and search. More recently there has been a movement to also use various linguistic tools from the field of Document Retrieval (DR). I'm learning that DR is a wildly different discipline from IR -- in many ways, its methods are even in opposition.
Where IR ranks relevance on a spectrum, DR looks for a match, either yes or no. DR is related to natural language and meaning. This may be a piece of what the Teragram Corp is working on with AltaVista, for instance.
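To make the spectrum vs. yes/no contrast a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the toy documents, the function names, and the particular TF-IDF weighting (length-normalized term frequency times log inverse document frequency, one common textbook variant). It is not what any actual search engine does, just the general idea of scoring every document versus accepting or rejecting it.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration only.
DOCS = {
    "a": "salton vector space model for information retrieval",
    "b": "retrieval of documents by boolean match",
    "c": "cooking pasta at home",
}

def tokenize(text):
    # Naive whitespace tokenizer; real systems do far more.
    return text.lower().split()

def tf_idf_scores(query, docs):
    """Ranked view: every document gets a score on a spectrum."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for w in tokens.values() if term in w)
            if df == 0:
                continue  # term appears nowhere, contributes nothing
            tf = counts[term] / len(words)   # term frequency, length-normalized
            idf = math.log(n_docs / df)      # rarer terms weigh more
            score += tf * idf
        scores[name] = score
    return scores

def boolean_match(query, docs):
    """Yes/no view: a document matches only if it has every query term."""
    terms = set(tokenize(query))
    return {name: terms <= set(tokenize(text)) for name, text in docs.items()}

if __name__ == "__main__":
    query = "information retrieval"
    ranked = sorted(tf_idf_scores(query, DOCS).items(),
                    key=lambda item: item[1], reverse=True)
    print("ranked:", ranked)
    print("boolean:", boolean_match(query, DOCS))
```

With the query "information retrieval", the scored version puts document "a" first, gives "b" a smaller nonzero score for its partial overlap, and gives "c" zero. The yes/no version accepts "a" and simply rejects the other two, with no notion of "partly relevant".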
What, exactly, will this mean? As the quote above says, "we can no longer predict."
I just started reading this online book [dcs.gla.ac.uk] by C.J. van Rijsbergen at the University of Glasgow. Besides doing a linear read, you can click on "Index" and get a huge list of concepts, then pop around to the pages that address those concepts. Mighty helpful, and a nice reference for these sometimes obscure terms.