
SEM Research Topics Forum

    
A Theory of Indexing - G. Salton
NFFC




 
Msg#: 28 posted 10:14 am on Jan 20, 2001 (gmt 0)


Tedster said elsewhere:

<quote>
I poked around a bit. It looks like Gerard Salton died in 1996. He was the first and principal person to promote the concepts of Inverse Document Frequency (IDF) and Term Frequency (TF) for automated information retrieval systems.
Whenever I read about search engine algorithms, these terms are everywhere. Looks like this is our roots.
</quote>

Recommended reading for anyone looking for a deeper understanding of information retrieval.

Search for it here [cs-tr.cs.cornell.edu]
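
For anyone unfamiliar with the TF and IDF terms in the quote above, here is a minimal Python sketch of one common textbook weighting, assuming raw term counts for TF and log(N / df) for IDF. The function name tf_idf and the sample documents are made up for illustration; this is not taken from Salton's paper and is not how any particular search engine is known to work.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (one textbook variant among many):
      tf  = raw count of the term in the document
      idf = log(N / df), df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical sample documents
docs = [
    "search engine ranking",
    "search engine indexing theory",
    "cooking recipes",
]
w = tf_idf(docs)
```

A term that shows up in most documents (like "search" here) gets a lower weight than a term that shows up in only one (like "ranking"), which is the basic idea behind IDF.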

 

tedster




 
Msg#: 28 posted 1:16 pm on Jan 21, 2001 (gmt 0)

In a report on a search engine conference held last year, I read this:

<quote>"We can no longer predict what's going to happen with relevancy ranking—it's gotten beyond us at this point." This is because Northern Light allows some relevance factors to be at odds with each other (for example, a new document with timely, fresh information vs. a popular document that's been on the Web for years).</quote>

That fits right in with some of the other pieces I'm starting to put together.

Here's my working thesis: Information Retrieval (IR), such as Salton helped to develop, was the first approach to web indexing and search. More recently there is a movement to also use various linguistic tools from the field of Document Retrieval (DR). I'm learning that DR is a wildly different discipline from IR -- in many ways, its methods are even in opposition.

Where IR ranks relevance on a spectrum, DR looks for a match, either yes or no. DR is concerned with natural language and meaning. This may be part of what Teragram Corp is working on with AltaVista, for instance.
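
The contrast between a yes/no match and ranking on a spectrum can be illustrated with a rough Python sketch. The names boolean_match and overlap_score, the query, and the documents are all hypothetical, and the scoring (fraction of query terms found) is just a simple stand-in, not what Teragram, AltaVista, or any other engine actually does.

```python
def boolean_match(query, doc):
    """Yes/no: True only if every query term occurs in the document."""
    doc_terms = set(doc.lower().split())
    return all(term in doc_terms for term in query.lower().split())

def overlap_score(query, doc):
    """Graded: fraction of query terms found in the document (0.0 to 1.0)."""
    doc_terms = set(doc.lower().split())
    q_terms = query.lower().split()
    return sum(term in doc_terms for term in q_terms) / len(q_terms)

# Hypothetical query and documents
query = "salton indexing theory"
docs = [
    "a theory of indexing by salton",
    "salton on term frequency",
    "web design tips",
]

# Yes/no view: each document either matches or it does not.
matches = [d for d in docs if boolean_match(query, d)]

# Spectrum view: every document gets a score and they are sorted by it.
ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
```

With the yes/no test only the first document qualifies, while the graded score still places the second document ahead of the third because it shares one query term.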

What, exactly, will this mean? As the quote above says, "we can no longer predict."

I just started reading this online book [dcs.gla.ac.uk] by C.J. van Rijsbergen at the University of Glasgow. Besides doing a linear read, you can click on "Index" and get a huge list of concepts, then pop around to the pages which address those concepts. Mighty helpful, and a nice reference for these sometimes obscure terms.

NFFC




 
Msg#: 28 posted 9:05 pm on Feb 28, 2001 (gmt 0)

Just bumping this one back up in case anyone missed it!

It really is well worth taking the effort to read any paper by Gerard Salton; A Theory of Indexing is a good place to start.
