
Deprecated - Altavista, Alltheweb.com Forum

    
FAST index: stop words, size and coverage
"Careful about what we put into the catalog"
KnutRisvik

10+ Year Member



 
Msg#: 507 posted 7:27 am on Jun 14, 2002 (gmt 0)

Discussion expanded from incoming links [webmasterworld.com] topic, with in-depth clarification:

Another point I'd love to have some clarification on:

Stopwords and size of index. FAST doesn't use stopwords. Does that imply you have to store more indexed text than you would when using stopwords? Is that a limiting factor in increasing your index?


Hi Heini,

Stopwords are indexed as part of our index, but are given very little weight during ranking, of course. The reason to have them there is to offer TRUE phrase matching (like "to be or not to be", "the best of the who"). Our algorithms handle this in a clever way, and it has no impact on scaling at all. ;-)

- Knut Magne / FAST

[edited by: Marcia at 3:20 am (utc) on June 17, 2002]
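To illustrate why keeping stop words in the index matters for exact phrase matching, here is a minimal sketch of a positional inverted index. It is not FAST's algorithm, which the post does not describe; the `STOPWORDS` set, the function names, and the 0.1 weight are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stop-word list: these terms stay in the index and are
# only down-weighted when scoring (the 0.1 value is an arbitrary assumption).
STOPWORDS = {"to", "be", "or", "not", "the", "of"}


def build_index(docs):
    """Map each term to {doc_id: [positions]}, keeping stop words."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return sorted ids of documents that contain the exact phrase."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # The phrase matches if the terms occur at consecutive positions.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


def term_weight(term):
    """Give stop words very little weight during ranking."""
    return 0.1 if term in STOPWORDS else 1.0


docs = {
    1: "to be or not to be that is the question",
    2: "not to be confused with to be or to not be",
    3: "the question is open",
}
index = build_index(docs)
print(phrase_match(index, "to be or not to be"))
```

If stop words were dropped at indexing time, the query "to be or not to be" would be left with no terms at all, and document 2 could not be told apart from document 1. The cost of this naive version is that every stop word carries a long postings list, which is the storage concern raised in the question.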

 

heini

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 507 posted 1:57 pm on Jun 14, 2002 (gmt 0)

Thanks, Knut Magne

The true phrase matching is a cool feature. All the better if it doesn't drain on your resources.

Scalability is one of the features FAST always emphasizes - do you plan to enlarge your index even further over the next months?

KnutRisvik

10+ Year Member



 
Msg#: 507 posted 11:18 am on Jun 16, 2002 (gmt 0)

Our scalability is a key feature in most of our large-scale enterprise installations (like FirstGov, eBay, Reuters). In the Web search arena, we focus more on reach and coverage than on the actual size number, and if you do studies on our index, you will find our coverage to be quite superior.

- Knut Magne / FAST

Rumbas

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 507 posted 2:26 pm on Jun 16, 2002 (gmt 0)

Hi Knut Magne,
Thanks for your answers :)

>reach and coverage

By this do you mean that you focus more on getting out into the corners of the web and finding special/unique content, rather than on the sheer number of pages?

To me, coverage in search engine terms has always been how much of the "audience" you cover - how many users you reach. Could you elaborate a bit on this?

KnutRisvik

10+ Year Member



 
Msg#: 507 posted 3:00 pm on Jun 16, 2002 (gmt 0)

It's hard to reveal specific parts of our roadmap, of course. But our goal is to provide the best possible search service for our customers and users. That implies having a large enough catalog, but we need to be careful about what we put into the catalog. So being able to cover the most important parts of the web, at a level of detail that is optimal for our users - that's where our focus lies.

- Knut Magne / FAST

Nick_W

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 507 posted 7:51 am on Jun 17, 2002 (gmt 0)

Hey, great to have someone from FAST on the board! Hope you'll be sticking around, Knut, and thanks for the information ;)

Nick

heini

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 507 posted 9:06 am on Jun 17, 2002 (gmt 0)

Well, size does matter. Sure, weeding out duplicates is an important factor in maintaining the quality of an index.
Nevertheless, given the size of the web, even the elite-class indexes of FAST and Google only reflect a small part of what's really there.

And then there are alternative file formats. With the new PDF indexing (some 14 million indexed PDFs, if I'm correct), FAST has started to move in this direction.

Without any doubt the ability to index a variety of file formats is there - the corporate search technology from FAST indexes all kinds of files.

I certainly wonder: How large should an index ideally be, and what defines which files are worth indexing?

Tor

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 507 posted 1:42 pm on Jun 17, 2002 (gmt 0)

Thank you for sharing some of your knowledge with us, Knut Magne. I hope you will tune in to this discussion forum regularly. :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved