Yeah, Fast is not as good at detecting dupes as they say. Maybe exact dupes, character for character, but at 90% similarity and below they don't do too well, and neither does Google. Inktomi seems to be the king on that one, but who can tell these days. Still, 100 million is a stinkin' lot of pages.
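For anyone wondering what "90% and below" could mean in practice, here is a minimal sketch of one common way to score near-duplicates: word shingles compared with Jaccard similarity. This is only an illustration. Nothing here is known to be what Fast, Google, or Inktomi actually run, and the function names, the shingle size `k`, and the 0.9 threshold are all assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0  # two empty documents count as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.9, k=3):
    """True if the shingle sets overlap at or above the (assumed) threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy cat"
    print(jaccard(shingles(a), shingles(b)))
    print(is_near_duplicate(a, a), is_near_duplicate(a, b))
```

An exact-match check only catches the character-for-character case. A score like this degrades gradually as pages diverge, which is why the threshold you pick matters so much.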
The difference between 100 million documents and 500 million is monumental. The level of complexity is thousands of times greater.
I split the other topic [webmasterworld.com] on Fast off over to the Fast board.
I wonder if the Norwegian scientists from the University of Trondheim who are behind Fast really intend to be in the search engine business at all. That requires a lot more capital than the search engine software business. If you read the technological description at www.fast.no, it seems to me that they are arguing their case to search engine companies that are hard pressed by the need to upgrade their hardware. And they have been successful in selling software licenses, as we all know. Any thoughts, anyone?
OK, this may sound really lame, but why couldn't an SE maintain multiple databases, access them the way MetaCrawler does, and then eliminate the dupes?
It was Inktomi that first developed the 'parallel' search engine using just that technique.
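The "multiple databases, then eliminate the dupes" idea can be sketched roughly as below: query every database in parallel, merge the hits, and keep one entry per URL. This is a toy under stated assumptions, not a description of Inktomi's actual design. Each "database" is just a hypothetical dict of URL to page text, the scoring is a plain term count, and dedup here is by exact URL only.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query):
    """Score each page in one toy database (dict of url -> text) by term count."""
    terms = query.lower().split()
    hits = []
    for url, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append((url, score))
    return hits


def parallel_search(shards, query, limit=10):
    """Query all shards concurrently, then merge and drop duplicate URLs."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, query), shards))

    best = {}  # url -> highest score seen across shards
    for hits in per_shard:
        for url, score in hits:
            if url not in best or score > best[url]:
                best[url] = score

    # Highest score first; ties broken by URL so the order is stable.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    shards = [
        {"a.com": "fast search engine", "b.com": "cooking recipes"},
        {"a.com": "fast search engine", "c.com": "search search tips"},
    ]
    print(parallel_search(shards, "search"))
```

Note that the merge step only removes the same URL coming back from two databases. Catching the same page under different URLs, or pages that are 90% alike, is the harder near-duplicate problem discussed earlier in the thread.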