In a report on a search engine conference held last year, I read this:
<quote>"We can no longer predict what's going to happen with relevancy ranking—it's gotten beyond us at this point." This is because Northern Light allows some relevance factors to be at odds with each other (for example, a new document with timely, fresh information vs. a popular document that's been on the Web for years).</quote>
That fits right in with some of the other pieces I'm starting to put together.
Here's my working thesis: Information Retrieval (IR), of the kind Salton helped to develop, was the first approach to web indexing and search. More recently there has been a movement to also use various linguistic tools from the field of Document Retrieval (DR). I'm learning that DR is a wildly different discipline from IR -- in many ways, its methods are even opposed to IR's.
Where IR ranks relevance on a spectrum, DR looks for a match, either yes or no. DR is related to natural language, and meaning. This may be a piece of what the Teragram Corp is working on with Alta Vista, for instance.
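To make the spectrum-versus-yes/no distinction concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three toy documents, the function names `exact_match` and `ranked_score`, and the scoring rule (the fraction of query terms found). It is not how any real engine scores pages, only a toy showing the two styles of answer side by side.

```python
# Toy corpus: names and texts are invented for this illustration.
docs = {
    "a": "search engine ranking and relevance",
    "b": "relevance ranking for web search engines",
    "c": "cooking with fresh herbs",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def exact_match(query, text):
    """Yes/no answer: True only if every query term appears in the text."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


def ranked_score(query, text):
    """Spectrum answer: fraction of query terms found, from 0.0 to 1.0."""
    words = set(tokenize(text))
    terms = tokenize(query)
    if not terms:
        return 0.0
    return sum(term in words for term in terms) / len(terms)


query = "web search ranking"

# Yes/no style: each document either matches or it does not.
matches = {name: exact_match(query, text) for name, text in docs.items()}

# Ranked style: every document gets a score, and we sort best-first.
ranking = sorted(docs, key=lambda name: ranked_score(query, docs[name]), reverse=True)

print(matches)
print(ranking)
```

With the yes/no test, document "a" is simply rejected because it lacks the word "web". With the ranked score, it still lands in second place on a partial match, which is the difference in behavior described above.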
What, exactly, will this mean? As the quote above says, "we can no longer predict."
I just started reading this online book [dcs.gla.ac.uk] by C.J. van Rijsbergen at the University of Glasgow. Besides doing a linear read, you can click on "Index" and get a huge list of concepts, then pop around to the pages that address those concepts. Mighty helpful, and a nice reference for these sometimes obscure terms.