Deprecated - Altavista, Alltheweb.com Forum

FAST index: stop words, size and coverage
"Careful about what we put into the catalog"

 7:27 am on Jun 14, 2002 (gmt 0)

Discussion expanded from incoming links [webmasterworld.com] topic, with in-depth clarification:

Another point I'd love to have some clarification on:

Stopwords and size of index. FAST doesn't utilize stopwords. Does that imply you have to store more indexed text than you'd have to when using stopwords? Is that a limiting factor in increasing your index?

Hi Heini,

Stopwords are indexed as part of our index, but are given very little weight during ranking, of course. The reason to have them there is to offer TRUE phrase matching (like "to be or not to be", "the best of the who"). Our algorithms handle this in a clever way, and it has no impact on scaling at all. ;-)

- Knut Magne / FAST

[edited by: Marcia at 3:20 am (utc) on June 17, 2002]
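
To illustrate the phrase-matching point above, here is a minimal sketch of a positional inverted index that keeps stop words so exact phrases can be matched. It is not FAST's implementation, which is not described in this thread; the STOPWORDS list, the weights, and all function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical stop-word list; the terms stay in the index and are
# only down-weighted when ranking.
STOPWORDS = {"to", "be", "or", "not", "the", "of"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Map each term, stop words included, to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def term_weight(term):
    """Assumed ranking weights: stop words count for very little."""
    return 0.1 if term in STOPWORDS else 1.0


def phrase_match(index, phrase):
    """Return sorted ids of documents containing the exact phrase."""
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # The phrase matches if each term sits at consecutive positions.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "to be or not to be that is the question",
    2: "the best of the who",
    3: "not to be confused with who is best",
}
index = build_index(docs)
print(phrase_match(index, "to be or not to be"))
print(phrase_match(index, "the best of the who"))
```

If stop words were dropped at indexing time, the first query would have no terms left to look up, which is why a phrase made up entirely of stop words needs them stored with their positions.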



 1:57 pm on Jun 14, 2002 (gmt 0)

Thanks, Knut Magne

The true phrase matching is a cool feature. All the better if it doesn't drain on your resources.

Scalability is one of the features FAST always emphasizes - do you plan to enlarge your index even further over the next months?


 11:18 am on Jun 16, 2002 (gmt 0)

Our scalability is a key feature in most of our large-scale enterprise installations (like FirstGov, eBay, Reuters). In the Web Search arena, we focus more on reach and coverage than on the actual size number, and if you do studies on our index, you will find our coverage to be quite superior.

- Knut Magne / FAST


 2:26 pm on Jun 16, 2002 (gmt 0)

Hi Knut Magne,
Thanks for your answers :)

>reach and coverage

By this do you mean that you focus more on getting out into the corners of the web and finding special/unique content, rather than on the sheer number of pages?

To me, coverage in search engine terms has always been how much of the "audience" you cover - how many users you reach. Could you elaborate a bit on this?


 3:00 pm on Jun 16, 2002 (gmt 0)

It's hard to reveal direct parts of our roadmap, of course. But our goal is to serve the best possible search service for our customers and users. That implies having a large enough catalog, but we need to be careful about what we put into the catalog. So being able to cover the most important parts of the web, at a detail level that is optimal for our users - that's where our focus lies.

- Knut Magne / FAST


 7:51 am on Jun 17, 2002 (gmt 0)

Hey, great to have someone from FAST on the board! Hope you'll be sticking around Knut and thanks for the information ;)



 9:06 am on Jun 17, 2002 (gmt 0)

Well, size does matter. Sure, weeding out duplicates is an important factor in keeping up the quality of an index.
Nevertheless, given the size of the web, even the elite-class indexes of FAST and Google only reflect a small part of what's really there.

And then there are alternative file formats. With the new PDF indexing (some 14 million indexed PDFs, if I'm correct), FAST has started to go in this direction.

Without any doubt the ability to index a variety of file formats is there - the corporate search technology from FAST indexes all kinds of files.

I certainly wonder: How large should an index ideally be, and what defines which files are worth indexing?


 1:42 pm on Jun 17, 2002 (gmt 0)

Thank you for sharing some of your knowledge with us Knut Magne. I hope you will be tuned in to this discussion forum regularly. :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved