Another point I'd love to have some clarification on:
Stopwords and size of index: FAST doesn't use stopwords. Does that mean you have to store more indexed text than you would if you used stopwords? Is that a limiting factor when increasing the size of your index?
Hi Heini,
Stopwords are included in our index, but of course they are given very little weight during ranking. The reason to keep them is to offer TRUE phrase matching (like "to be or not to be" or "the best of the who"). Our algorithms handle this in a clever way, and it has no impact on scaling at all. ;-)
- Knut Magne / FAST
[edited by: Marcia at 3:20 am (utc) on June 17, 2002]
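FAST doesn't describe its actual algorithm here, so what follows is only a minimal Python sketch, assuming a plain positional inverted index, of why keeping stopwords enables true phrase matching while still letting ranking weight them low. The class `PositionalIndex`, its methods, and the IDF-style weight are hypothetical illustrations, not FAST's implementation.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric runs; stopwords are NOT removed.
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Hypothetical positional inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.num_docs = 0

    def add(self, doc_id, text):
        self.num_docs += 1
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def idf(self, term):
        # Smoothed inverse document frequency: terms found in almost every
        # document (typically stopwords) get a weight near zero for ranking.
        df = len(self.postings.get(term, {}))
        return math.log((self.num_docs + 1) / (df + 1))

    def phrase_search(self, phrase):
        terms = tokenize(phrase)
        if not terms:
            return set()
        # Candidate documents must contain every term of the phrase.
        docs = set(self.postings.get(terms[0], {}))
        for t in terms[1:]:
            docs &= set(self.postings.get(t, {}))
        hits = set()
        for d in docs:
            # Keep only start positions where term i appears at start + i.
            starts = set(self.postings[terms[0]][d])
            for offset, t in enumerate(terms[1:], start=1):
                starts &= {p - offset for p in self.postings[t][d]}
            if starts:
                hits.add(d)
        return hits
```

Because every word keeps its position, a query such as "to be or not to be" can be verified word by word; an index that dropped "to", "be", "or" and "not" would have nothing left to match. Ranking can still discount those terms through a low weight such as `idf("the")`.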
>reach and coverage
By this, do you mean that you focus more on reaching into the corners of the web and finding special, unique content, rather than on the sheer number of pages?
To me, coverage in search engine terms has always meant how much of the "audience" you cover: how many users you reach. Could you elaborate a bit on this?
- Knut Magne / FAST
And then there are alternative file formats. With the new PDF indexing (some 14 million indexed PDFs, if I'm correct), FAST has started to move in this direction.
Without any doubt, the ability to index a variety of file formats is there: FAST's corporate search technology indexes all kinds of files.
I certainly wonder: How large should an index ideally be, and what defines which files are worth indexing?