A Theory of Indexing - G. Salton

     
10:14 am on Jan 20, 2001 (gmt 0)

nffc (Senior Member)
joined: June 22, 2000
posts: 3604


Tedster said elsewhere:

<quote>
I poked around a bit. It looks like Gerard Salton died in 1996. He was the first and principal person to promote the concepts of Inverse Document Frequency (IDF) and Term Frequency (TF) for automated information retrieval systems.
Whenever I read about search engine algorithms, these terms are everywhere. Looks like this is our roots.
</quote>

Recommended reading for anyone looking for a deeper understanding of information retrieval.

Search for it here [cs-tr.cs.cornell.edu]
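
For anyone new to the two terms in Tedster's quote, here is a rough sketch of how TF and IDF are commonly combined into a single term weight. It is only an illustration under my own assumptions (whitespace tokenizing, length-normalized TF, plain log IDF, a made-up three-document corpus and a hypothetical `tf_idf` helper), not the exact formulation from any particular Salton paper.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the doc / number of tokens in the doc
    idf = log(number of docs / number of docs containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Made-up corpus, purely for illustration.
docs = [
    "search engine ranking",
    "search engine indexing theory",
    "cooking with garlic",
]
weights = tf_idf(docs)
```

In this toy corpus "ranking" ends up with a higher weight than "search" in the first document, because "search" also appears in the second one. A term that appears in every document gets a weight of zero.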

1:16 pm on Jan 21, 2001 (gmt 0)

tedster (Senior Member)
joined: May 26, 2000
posts: 37301


In a report on a search engine conference held last year, I read this:

<quote>"We can no longer predict what's going to happen with relevancy ranking—it's gotten beyond us at this point." This is because Northern Light allows some relevance factors to be at odds with each other (for example, a new document with timely, fresh information vs. a popular document that's been on the Web for years).</quote>

That fits right in with some of the other pieces I'm starting to put together.

Here's my working thesis: Information Retrieval (IR), such as Salton helped to develop, was the first approach to web indexing and search. More recently there is a movement to also use various linguistic tools from the field of Document Retrieval (DR). I'm learning that DR is a wildly different discipline from IR -- in many ways, its methods are even in opposition.

Where IR ranks relevance on a spectrum, DR looks for a match, either yes or no. DR is related to natural language, and meaning. This may be a piece of what the Teragram Corp is working on with AltaVista, for instance.
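
To make the "spectrum" versus "yes or no" distinction concrete, here is a toy sketch. The function names, the sample documents, and the overlap score are all my own assumptions for illustration; real engines use far richer scoring, and nothing here reflects what Teragram or any search engine actually does.

```python
def boolean_match(query, doc):
    """Yes/no matching: True only if every query term occurs in the document."""
    doc_terms = set(doc.lower().split())
    return all(term in doc_terms for term in query.lower().split())

def overlap_score(query, doc):
    """Graded matching: fraction of query terms found in the document (0.0 to 1.0)."""
    query_terms = query.lower().split()
    doc_terms = set(doc.lower().split())
    if not query_terms:
        return 0.0
    return sum(term in doc_terms for term in query_terms) / len(query_terms)

# Made-up documents, purely for illustration.
docs = [
    "a theory of indexing",
    "indexing the web",
    "garden tools",
]
query = "theory indexing"

# Yes/no: a document is either in the result set or it is not.
matches = [doc for doc in docs if boolean_match(query, doc)]

# Spectrum: every document gets a score and a position in the ranking.
ranked = sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)
```

The yes/no test keeps only the document containing both query terms, while the graded score still places the partial match ("indexing the web") ahead of the unrelated one.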

What, exactly, will this mean? Like the above quote says, "we can no longer predict."

I just started reading this online book [dcs.gla.ac.uk] by C.J. van Rijsbergen at the University of Glasgow. Besides doing a linear read, you can click on "Index" and get a huge list of concepts, then pop around to the pages that address those concepts. Mighty helpful, and a nice reference for these sometimes obscure terms.

9:05 pm on Feb 28, 2001 (gmt 0)

nffc (Senior Member)
joined: June 22, 2000
posts: 3604


Just bumping this one back up in case anyone missed it!

It really is well worth the effort to read any paper by Gerard Salton; A Theory of Indexing is a good place to start.

 
