---- Google's 950 Penalty (part 4) - or is it Phrase Based Re-ranking?
Marcia - 6:48 am on Feb 8, 2007 (gmt 0)
 If the document is included in the SPAM_TABLE, then the document's relevance score is down weighted by a predetermined factor. For example, the relevance score can be divided by a factor (e.g., 5). Alternatively, the document can simply be removed from the result set entirely.
 The search result set is then resorted by relevance score and provided back to the client.
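The step quoted above reduces to a simple re-rank pass. This is just a minimal sketch of my reading of it - the function and variable names (rerank, spam_table, etc.) are made up, and only the divide-by-5 example and the remove-entirely alternative come from the patent text itself:

```python
# Sketch of the quoted re-ranking step: documents flagged in a spam
# table have their relevance score divided by a predetermined factor
# (the patent's example is 5), or are dropped from the result set
# entirely, and the results are then re-sorted by adjusted score.
# All names here are illustrative, not from the patent.

DOWNWEIGHT_FACTOR = 5  # the patent's example divisor

def rerank(results, spam_table, remove=False):
    """results: list of (doc_id, relevance_score) pairs."""
    adjusted = []
    for doc_id, score in results:
        if doc_id in spam_table:
            if remove:
                continue  # drop the flagged document entirely
            score /= DOWNWEIGHT_FACTOR  # down-weight instead of removing
        adjusted.append((doc_id, score))
    # resort by relevance score, highest first, before returning to the client
    return sorted(adjusted, key=lambda t: t[1], reverse=True)
```

Note how a down-weighted page doesn't vanish - it just sinks well down the result set, which is exactly what a "950" looks like from the outside.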
I've never for one second considered that this patent wasn't seriously connected to this "penalty." There's a fine line between defining a penalty vs. a filter, but the way the patent reads, a page can either be de-indexed altogether or, for all practical purposes, put on the equivalent of a "spam blacklist." My guess is that in the latter case it's done by pre-processing, with results filtered accordingly at query time.
In addition, the inbound anchor text scoring described in the patent could boost the document out of the danger zone if even a single new IBL shows up.
I've seen exactly that happen with a page that recovered, and what was done was based on an "intuition" formed after reading through the patent (which I've read about 10 times already).
It's obviously phrase-based, and while there may be more factors involved, that remains the likely suspect. It takes analyzing and isolating the particulars, which is no easy task because it's complex - no doubt by design.
What imho is of major concern is the possibility of something being tripped by usage of anchor text/headings/titles - not only because of the "normal" scraper pages, but because some are doing even more than just duplicating strings of text in anchors, and are now playing around with swiping full pages.
[edited by: Marcia at 6:55 am (utc) on Feb. 8, 2007]