"The number of listed pages will be tripled to 1.8 billion webpages. Fast is aware that experts believe the Web has more pages than this. However, Fast will throw out duplicates and "junk" (probably doorway pages and pages with little or no text)."
http://www.pandia.com/sw-2001/48-fast.html
And that says nothing about their ability to actually give you the most relevant pages. If there are 2,000,000 pages out there that match a search about buying a cell phone, but 1,990,000 of those are people talking about their pet cat or their trip to Jamaica who happen to mention "cell phone" and buying something, then I would much rather have only the 10,000 relevant pages show up as results for that search.
Unless Fast is able to work out that algo, I doubt they're going to improve their image much with regard to delivering the best content. It seems as if they'll just end up as the same huge, bloated index they were a few months ago, only bigger.
-qianxing
I've seen their algo eliminate duplicates pretty well recently. Interestingly, though, which copy survives seems to depend on which page they index first...
The potential for this is interesting - imagine someone mirroring your website and then getting crawled first. Your site would be excluded from the index as duplicate material.
I do not think this is cause for concern: if you are an online store, even your telephone number and the addresses of your CGI scripts would have to be identical for the mirror to count as a duplicate, so if anything you gain even more exposure :)
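To make the two points above concrete, here is a rough sketch of "first crawled wins" duplicate elimination. Nobody outside Fast knows how their algo really works, so everything here is an assumption: the exact-match fingerprint, the function names, and the example URLs are all made up for illustration.

```python
import hashlib

def fingerprint(text: str) -> str:
    # Assumption: pages count as duplicates only if their text matches exactly
    # after collapsing whitespace and case. A real engine may be fuzzier.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def build_index(crawl_order):
    """crawl_order: iterable of (url, page_text) in the order pages were crawled."""
    index = {}     # fingerprint -> first URL seen with that content
    dropped = []   # (duplicate_url, url_that_was_kept)
    for url, text in crawl_order:
        fp = fingerprint(text)
        if fp in index:
            dropped.append((url, index[fp]))  # later copy loses
        else:
            index[fp] = url                   # first copy wins
    return index, dropped

# Hypothetical crawl: the mirror is reached before the original site.
pages = [
    ("http://mirror.example/", "Buy cell phones. Call 555-0100."),
    ("http://original.example/", "Buy cell phones. Call 555-0100."),
    ("http://reseller.example/", "Buy cell phones. Call 555-0199."),
]
index, dropped = build_index(pages)
kept_urls = sorted(index.values())
```

With this crawl order the original site is the one that gets dropped, which is the worry raised above. The reseller page survives because a single different phone number changes the fingerprint, which is the point about exact matches.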
As for Fast growing even bigger, I wish that they (and all other search engines) would incorporate a much more powerful search interface, similar to LEXIS/NEXIS, where you can do full and complicated Boolean searches, limit results by date or size, search by proximity to other words, and so on. The works.
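For anyone wondering what a proximity search buys you, here is a toy sketch. It is not how LEXIS/NEXIS or Fast implement it; the `near` helper, the tokenizer, and the two sample documents are hypothetical and only show the idea of requiring two words to appear within a few words of each other.

```python
def tokenize(text):
    # Naive tokenizer: split on whitespace, strip common punctuation, lowercase.
    return [w.strip('.,!?;:"()').lower() for w in text.split()]

def near(text, a, b, max_distance):
    """True if words a and b occur within max_distance words of each other."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Hypothetical documents: both mention "buy" and "phone".
docs = {
    "shop": "Buy a new cell phone today with free shipping.",
    "diary": "My cat knocked my phone off the table. Later I went to buy a ticket to Jamaica.",
}

# A plain Boolean AND matches both; the proximity test keeps only the shop page.
boolean_hits = [n for n, t in docs.items() if "buy" in tokenize(t) and "phone" in tokenize(t)]
near_hits = [n for n, t in docs.items() if near(t, "buy", "phone", 4)]
```

This is the cat-and-Jamaica problem from earlier in the thread in miniature: the Boolean query returns both pages, while the proximity query filters out the one that only mentions the words in passing.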