crobb305 - 5:09 pm on Oct 16, 2012 (gmt 0) [edited by: crobb305 at 5:20 pm (utc) on Oct 16, 2012]
Google is an ICANN-accredited registrar even though they do not offer registration services, so they have access to whois data that you don't (original owner, domain history, etc.). I always thought that ICANN did a sleazy deal with Google in awarding them registrar status, knowing full well that Google only wanted the data for internal reasons. I believe they broke and are still breaking their own rules in that regard.
Well, supposing there are other businesses with websites in the same town that use the same UPS Store, creating some level of "similarity" (or in his terms, "suspiciously similar"), then we could still see a lot of false positives if one or two of those sites are, indeed, spammy. I thought 98% similar was a bit extreme given that the only overlap was a single Whois element (they did not share any other characteristics, e.g., AdSense accounts or web host).
Another point he made in the article that I think is important, and that may apply in my case of OOP, is:
"any type of scraped, synonymised or obviously poorly written text would be a clear spam signal."
I don't scrape and I write well, but I have had a tendency to use synonyms. I may have to rethink that strategy.