FAST index: stop words, size and coverage

"Careful about what we put into the catalog"

     
7:27 am on Jun 14, 2002 (gmt 0)

New User

10+ Year Member

joined:Aug 22, 2001
posts:9
votes: 0


Discussion expanded from incoming links [webmasterworld.com] topic, with in-depth clarification:

Another point I'd love to have some clarification on:

Stopwords and size of index. FAST doesn't use a stopword list. Does that imply you have to store more indexed text than you would when using stopwords? Is that a limiting factor in growing your index?


Hi Heini,

Stopwords are included in our index, but are of course given very little weight during ranking. The reason to have them there is to offer TRUE phrase matching (like "to be or not to be", "the best of the who"). Our algorithms handle this in a clever way, and it has no impact on scaling at all. ;-)
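To make the idea concrete, here is a minimal sketch of exact phrase matching over a positional index that keeps stopwords. It is only an illustration, not FAST's actual implementation, which isn't described here; the function names (`build_index`, `phrase_match`, `idf`), the sample documents, and the use of a smoothed IDF as the "low weight" for common words are all assumptions.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [positions]}; stopwords are kept."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return the sorted ids of documents containing the exact phrase."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        # The phrase matches at `start` if term k sits at position start + k.
        for start in index[terms[0]][doc_id]:
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


def idf(index, term, n_docs):
    """Smoothed inverse document frequency: very common words score low."""
    df = len(index[term]) if term in index else 0
    return math.log((n_docs + 1) / (df + 1))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "To be or not to be that is the question",
    2: "Not to be confused with the best of the rest",
    3: "The Best of The Who",
}
index = build_index(docs)

print(phrase_match(index, "to be or not to be"))
print(phrase_match(index, "the best of the who"))
print(idf(index, "the", len(docs)), idf(index, "who", len(docs)))
```

With a typical stopword list, the first query would be stripped down to nothing and the second to little more than "best", which would also match document 2. Keeping the word positions lets both queries resolve to a single document, while the IDF-style weight keeps a word like "the" from influencing ranking.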

- Knut Magne / FAST

[edited by: Marcia at 3:20 am (utc) on June 17, 2002]

1:57 pm on June 14, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Jan 31, 2001
posts:4404
votes: 0


Thanks, Knut Magne

True phrase matching is a cool feature. All the better if it isn't a drain on your resources.

Scalability is one of the features FAST always emphasizes - do you plan to enlarge your index even further over the next months?

11:18 am on June 16, 2002 (gmt 0)

New User

10+ Year Member

joined:Aug 22, 2001
posts:9
votes: 0


Our scalability is a key feature in most of our large-scale enterprise installations (like FirstGov, eBay, Reuters). In the Web search arena, we focus more on reach and coverage than on the actual size number, and if you study our index, you will find our coverage to be quite superior.

- Knut Magne / FAST

2:26 pm on June 16, 2002 (gmt 0)

Moderator from DK 

WebmasterWorld Administrator 10+ Year Member

joined:Oct 23, 2000
posts:2541
votes: 4


Hi Knut Magne,
Thanks for your answers :)

>reach and coverage

By this, do you mean that you focus more on getting out into the corners of the web and finding special, unique content, rather than on the sheer number of pages?

To me, coverage in search engine terms has always been how much of the "audience" you cover - how many users you reach. Could you elaborate a bit on this?

3:00 pm on June 16, 2002 (gmt 0)

New User

10+ Year Member

joined:Aug 22, 2001
posts:9
votes: 0


It's hard to reveal specific parts of our roadmap, of course. But our goal is to provide the best possible search service for our customers and users. That implies having a large enough catalog, but we need to be careful about what we put into the catalog. So being able to cover the most important parts of the web, at a level of detail that is optimal for our users - that's where our focus lies.

- Knut Magne / FAST

7:51 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Hey, great to have someone from FAST on the board! Hope you'll be sticking around, Knut, and thanks for the information ;)

Nick

9:06 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Jan 31, 2001
posts:4404
votes: 0


Well, size does matter. Sure, weeding out duplicates is an important factor for keeping the quality of an index.
Nevertheless, given the size of the web, even the elite-class indexes of FAST and Google reflect only a small part of what's really there.

And then there are alternative file formats. With the new PDF indexing (some 14 million indexed PDFs, if I'm correct), FAST has started to move in this direction.

Without any doubt the ability to index a variety of file formats is there - the corporate search technology from FAST indexes all kinds of files.

I certainly wonder: How large should an index ideally be, and what defines which files are worth indexing?

Tor

1:42 pm on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 31, 2000
posts:786
votes: 0


Thank you for sharing some of your knowledge with us, Knut Magne. I hope you will tune in to this discussion forum regularly. :)