heini - 11:53 am on Jan 11, 2003 (gmt 0)
I think that's true. There's lots of stuff not worth indexing, insofar as it doesn't benefit the user finding it in an index. Still, I think size matters for two reasons:
Fast has made different statements as to the size of their db.
They did show off when they were bigger than G for a while.
They also said something to the effect that they were not keen on adding every page there is, and that there's a lot of stuff out there not worth indexing.
- There is an awful lot of quality information in the deep web, which none of the engines has even touched upon. All engines have taken steps to get to that content, by adding new file types and by starting to crawl databases. Look at what Fast has announced for this summer, going for "universal search", where the approach is not to find the best single result for a query, but to offer a bundle of information related to a query from different sources.
I think Google is heading in the same direction. Can't remember the exact quote, but didn't Brin state they were trying to basically make all information available to a searcher?
- Second, a large index is important for fine-tuning any algo that relies on link popularity in any form. Doing a worldwide index means a search engine has to map the web as completely as possible in order to avoid distorted maps.
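To make the "distorted maps" point concrete, here is a rough sketch in Python. It uses a plain PageRank-style power iteration as a stand-in for "link popularity in any form"; that is my assumption for illustration, not a claim about how any engine actually scores pages. The toy web, the page names (`hub`, `rival`, `a` to `d`) and the helpers `pagerank` and `crawl_subset` are all made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {page: [outlinks]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


def crawl_subset(links, crawled):
    """Keep only crawled pages, and only the links between them."""
    return {
        page: [t for t in targets if t in crawled]
        for page, targets in links.items()
        if page in crawled
    }


# Hypothetical toy web: three pages point at "hub", one points at "rival".
full_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["rival"],
    "hub": ["a"],
    "rival": ["d"],
}

# An incomplete crawl that never found a, b and c.
partial_web = crawl_subset(full_web, {"hub", "rival", "d"})

full = pagerank(full_web)
partial = pagerank(partial_web)

print("full index:    hub=%.3f rival=%.3f" % (full["hub"], full["rival"]))
print("partial index: hub=%.3f rival=%.3f" % (partial["hub"], partial["rival"]))
```

On the full map `hub` comes out ahead of `rival`; on the partial map the order flips, simply because the pages that vouch for `hub` were never crawled. Real link graphs are vastly bigger, but the mechanism is the same: whatever the crawler misses can't vote.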