2/3rds of the servers? It's so absurdly high, I just couldn't believe it when I read it. I still don't really.
If you calculated it out based on an estimated 3 billion indexable open documents on the web and 23 million servers (according to other search engines), that would put Fast's index at about 400 to 450 million pages, absolute maximum.
If you (not me!) include purely affiliate sites with little added value, sites linked from link farms, auto-redirections, near duplicates, third-level domain duplicates, as well as the many sites that are obvious spam, you may get fairly close.
There are no banned servers. In our experience, we have seen spam on 2/3 of all web servers; however, we do not blacklist 2/3 of all servers. It is just an interesting statistic but doesn't really have any consequences.
I think this clears things up (I received this 5 minutes ago in my mailbox).
Brett, maybe another "Humble apologies"? :)
Being the target of more or less sophisticated spamming, I had basically two choices:
- build a better algo
- try to scare people to death
Building a better algo is the more difficult solution. People tend to go the easy way.
I'd perhaps make a statement somewhere saying many, many servers were completely blacklisted and banned, and then, when the message is out, make another statement saying, oh well, it's just stats at this time...
It's not just us web promoters who are experts in PR - search engines are in the same industry.
Seeing - not hearing - is believing.
To provide some more clarification regarding the 2BB index size: many of you have noticed that we have been crawling aggressively over the past 3 weeks. We have been adding about 100MM URLs to the index (high-quality URLs, no dupes, full-text index). We will continue to aggressively expand the size of the database; however, if we determine that there are not 2BB interesting Web documents on the web, we will not sacrifice quality for quantity.
I hope this clarifies.