A Theory of Indexing - G. Salton

     

NFFC

10:14 am on Jan 20, 2001 (gmt 0)




Tedster said elsewhere:

<quote>
I poked around a bit. It looks like Gerard Salton died in 1995. He was the first and principal person to promote the concepts of Inverse Document Frequency (IDF) and Term Frequency (TF) for automated information retrieval systems.
Whenever I read about search engine algorithms, these terms are everywhere. Looks like these are our roots.
</quote>
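
For anyone new to the two terms in that quote, here is a minimal sketch of one common TF-IDF weighting, assuming raw term counts for TF and log(N / df) for IDF. The function name `tf_idf` and the whitespace tokenizer are my own illustration, not anything taken from Salton's paper, and real engines use many variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: TF is the raw count of a term in a document and
    IDF is log(N / df), where df is the number of documents that
    contain the term. This is only one of several formulations.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = ["search engine ranking", "search engine index", "search theory"]
weights = tf_idf(docs)
```

A term that appears in every document ("search" here) gets a weight of zero, while a term found in only one document ("ranking") gets the highest weight. That is the intuition behind IDF: rare terms discriminate better than common ones.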

Recommended reading for anyone looking for a deeper understanding of information retrieval.

Search for it here [cs-tr.cs.cornell.edu]

tedster

1:16 pm on Jan 21, 2001 (gmt 0)




In a report on a search engine conference held last year, I read this:

<quote>"We can no longer predict what's going to happen with relevancy ranking—it's gotten beyond us at this point." This is because Northern Light allows some relevance factors to be at odds with each other (for example, a new document with timely, fresh information vs. a popular document that's been on the Web for years).</quote>

That fits right in with some of the other pieces I'm starting to put together.

Here's my working thesis: Information Retrieval (IR), of the kind Salton helped to develop, was the first approach to web indexing and search. More recently there has been a movement to also use various linguistic tools from the field of Document Retrieval (DR). I'm learning that DR is a wildly different discipline from IR -- in many ways, its methods are even in opposition.

Where IR ranks relevance on a spectrum, DR looks for a match, either yes or no. DR is related to natural language, and meaning. This may be a piece of what the Teragram Corp is working on with Alta Vista, for instance.
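
To make the contrast concrete, here is a rough sketch of the two behaviours described above: a yes/no match versus a ranked list. Both functions and the scoring rule (total occurrences of the query terms) are hypothetical toys for illustration, not how any particular engine or the Teragram software works.

```python
def boolean_match(query, doc):
    """Yes/no answer: does the document contain every query term?"""
    doc_terms = set(doc.lower().split())
    return all(term in doc_terms for term in query.lower().split())


def ranked(query, docs):
    """Order documents by a toy relevance score.

    Assumption: the score is simply the total number of times the
    query terms occur in the document; higher scores come first.
    """
    q_terms = query.lower().split()

    def score(doc):
        tokens = doc.lower().split()
        return sum(tokens.count(t) for t in q_terms)

    return sorted(docs, key=score, reverse=True)


docs = [
    "salton indexing theory",
    "indexing indexing salton salton",
    "cooking recipes",
]
matches = [d for d in docs if boolean_match("salton indexing", d)]
ordering = ranked("salton indexing", docs)
```

The Boolean filter keeps the first two documents and treats them as equals, while the ranked version returns all three in order of score, with the keyword-heavy second document on top and the cooking page last.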

What, exactly, will this mean? As the quote above says, "we can no longer predict."

I just started reading this online book [dcs.gla.ac.uk] by C.J. van Rijsbergen at the University of Glasgow. Besides doing a linear read, you can click on "Index" and get a huge list of concepts, then pop around to the pages which address those concepts. Mighty helpful, and a nice reference for these sometimes obscure terms.

NFFC

9:05 pm on Feb 28, 2001 (gmt 0)




Just bumping this one back up in case anyone missed it!

It really is well worth the effort to read any paper by Gerard Salton; A Theory of Indexing is a good place to start.