Abstract - Systems and methods that identify manipulated articles are described. In one embodiment, a search engine implements a method comprising determining at least one cluster comprising a plurality of articles, analyzing signals to determine an overall signal for the cluster, and determining if the articles are manipulated articles based at least in part on the overall signal.
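To make the three steps in that abstract concrete, here is a rough Python sketch. It is only an illustration and not the patent's actual method: the clustering key, the per-article signal scores, the use of a simple average as the "overall signal", and the 0.5 threshold are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean


def flag_manipulated_clusters(articles, threshold=0.5):
    """Group articles into clusters and flag clusters with a high overall signal.

    articles: iterable of (cluster_key, url, signal_score) tuples.
    cluster_key, signal_score, and threshold are hypothetical stand-ins;
    the abstract does not say how clusters or signals are computed.
    """
    # Step 1: determine clusters comprising a plurality of articles.
    clusters = defaultdict(list)
    for cluster_key, url, score in articles:
        clusters[cluster_key].append((url, score))

    flagged = {}
    for key, members in clusters.items():
        if len(members) < 2:
            continue  # a single article is not a "plurality"
        # Step 2: combine the per-article signals into one overall signal
        # (a plain mean here, purely as an assumption).
        overall = mean(score for _, score in members)
        # Step 3: decide based at least in part on the overall signal.
        if overall >= threshold:
            flagged[key] = [url for url, _ in members]
    return flagged


# Invented sample data: (cluster key, article URL, signal score).
sample = [
    ("host-a", "a/1", 0.9),
    ("host-a", "a/2", 0.7),
    ("host-b", "b/1", 0.1),
    ("host-b", "b/2", 0.2),
    ("host-c", "c/1", 0.95),
]
result = flag_manipulated_clusters(sample)
print(result)
```

The point of the sketch is that the decision is made per cluster, so one odd-looking page on its own would not be flagged in this toy version, while a whole group of pages with strong signals would be.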
William Slawski at SEO by the Sea has an excellent overview of this new patent, which was granted yesterday (2007-11-27)...
Google Patent on Web Spam, Doorway Pages, and Manipulative Articles [seobythesea.com...]
The identification of manipulative documents, how they might be grouped together, and how they could be treated by the search engine is described in some detail. That treatment might include removal of pages from the search index, reductions in rankings for pages, and possibly a change in how quality scores (PageRank) are calculated for links from manipulative pages.
It is definitely worth a read and contains some very interesting information, as Google's patents usually do. Slawski has a great way of deciphering them and explaining them to his audience. :)
I run a directory that quotes blurbs from the technical pages I have listed. I believe long tail searches for the title or subject of the original documents often bring up my directory as well as the original. (Some scientific abstract libraries do the same thing.)
I am wondering if that's too close to the way spammers lift a little copy and create a doorway page or other content. Is a patent like this going to whack me, if not today, then in the future?