MikeNoLastName - 12:27 am on Nov 22, 2012 (gmt 0)
I am stymied by what G considers copying and copyright infringement these days. A couple of days ago I ran cop---ape on one of our pages that used to rank in the top 5 of the SERPs and found a site that shows as "has 4,024 words matching 33% of the page". It's a 100K+ size page. It was obviously copied using our page as its seed data, and entire paragraphs are letter for letter the same. The only reason it is not an EXACT match is that they apparently copied it about 2 years ago and have not changed it since, whereas we update the page weekly. I submitted a copyright infringement claim via WMT, explaining that the copy was of the version which existed in 2010 and is still available on archive dot org, and it came back as rejected, URL not infringing!
Many of our former top pages are updated daily, so by the time a copy shows up in the search results, chances are SOMETHING is going to be different. How much has to match for it to count as a copyright violation? Can we now just copy pages, change one line, and avoid a copyright violation?
So my main point is: if G does not consider an obvious 33% of a page being copied to be copyright infringement, shouldn't that mean it should NOT count it toward a duplication penalty either? I'm betting they do, which leaves us with no option to get rid of the copies other than rewriting our own work.