softplus - 10:53 pm on May 10, 2006 (gmt 0)
Perhaps: Especially if they're moving towards an automated spam-reporting system, they are without question going to run into sites left and right that are perfectly legitimate but where a look in the code could be misleading for a "simple crawler". "White text on white background? Oh, didn't spot the black background image" :-) If this is happening, it would look pretty much like what's happening now: lots of simple spammy sites are reporting problems, and lots of normal sites are reporting problems too. Since Google can't possibly look at all the sites in their index before putting something like this live, they can only compare against known-spam and known-nonspam sites, accepting a certain false-positive rate in order to get a higher share of spam sites automatically penalized / banned. Great, only 0.5% false positives (just pulling a number out of my hat)? What would that be? A few million sites? :-( Add to that some possible issues with their proxy/cache system, and you're in for a ride....
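To put rough numbers on that hat-pulled 0.5%, here is a minimal sketch of the arithmetic. The site counts are purely hypothetical (nobody outside Google knows the real figures), and `expected_false_positives` is just an illustrative helper name:

```python
def expected_false_positives(clean_sites: int, false_positive_rate: float) -> int:
    """Expected number of legitimate sites wrongly flagged as spam."""
    return round(clean_sites * false_positive_rate)


# Hypothetical counts of legitimate sites; none of these figures come from Google.
for clean_sites in (10_000_000, 100_000_000, 1_000_000_000):
    flagged = expected_false_positives(clean_sites, 0.005)  # 0.5%, the made-up rate
    print(f"{clean_sites:>13,} clean sites -> {flagged:>9,} wrongly flagged")
```

At 0.5%, "a few million" wrongly flagged sites would take on the order of a billion legitimate sites. With smaller counts the total shrinks in proportion, but even tens of thousands of false positives is a lot of unhappy webmasters.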
These statements make me think that G has ratcheted up its spam definitions and a good percentage of the newly excluded pages may be ones that are now getting labelled in the index as spam when previously they were considered clean.
- they're moving to an automatic spam system based on their Mozilla crawler (hidden text, hidden links, etc.), compared to a previously mostly manual-review-based system
- they're moving towards partial spam-penalties (compared to full bans)
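
To illustrate the "white text on white background" trap, here is a rough sketch of how a simple automated hidden-text check could misfire. Everything in it is hypothetical (the `ElementStyle` type, both check functions, the file name); it is not how Google's crawler works, only an example of the kind of rule that would produce false positives:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementStyle:
    """Hypothetical, already-resolved style values for one block of text."""
    text_color: str
    background_color: str
    background_image: Optional[str] = None


def naive_is_hidden_text(style: ElementStyle) -> bool:
    # Simple rule: text color equals background color, so flag it as hidden.
    return style.text_color.lower() == style.background_color.lower()


def cautious_is_hidden_text(style: ElementStyle) -> bool:
    # With a background image present, the colors alone say nothing about
    # visibility, so this variant declines to flag the element.
    if style.background_image:
        return False
    return naive_is_hidden_text(style)


# Legitimate page: white text over a black background image.
legit = ElementStyle("#FFFFFF", "#ffffff", background_image="black-tile.png")
# Classic hidden text: white on white, no image.
spam = ElementStyle("#ffffff", "#ffffff")

print(naive_is_hidden_text(legit), cautious_is_hidden_text(legit))
print(naive_is_hidden_text(spam), cautious_is_hidden_text(spam))
```

The cautious version has its own hole, of course: a spammer only needs to add a white background image to slip past it. Without actually rendering the page, any rule like this trades false positives against missed spam.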