I've come to understand and agree that post-Panda rankings can be strongly affected by text duplication, either on-site or off-site.
On the one hand, G claims they have become quite good at determining who the source is and who the copiers are... I say hogwash, but that really doesn't matter in the following example.
In the past, the primary method of determining whether you had a duplication issue, and who was being recognized as the authority, was to search for a snippet of your unique text in "quotes". If you came up below the top or, worse, in the subsequent "there are other pages not shown similar to this" section, then you were not considered the primary source.
Our example is a website that has thousands of pages, a lot of them past news archives, which was hit hard by Panda 1.0 and after. Consider a 10-paragraph news article written from scratch, uploaded to the domain, submitted via PUSH to Google, added to the sitemap, and fetched as Googlebot in WMT, plus a few other methods. It gets indexed within hours and can be seen as the ONLY result on G when searching. Within 6-36 hours it gets copied in its entirety by a couple of other sites, posted, and indexed. One is a WordPress site which tags every topic in the article and generates a dozen entries which all show up in G. Both are PR 0-1 while the original site is PR 2-4. NOW, the same quoted snippet search shows the original (with the oldest time in the G index listing) only at the very bottom of the supplemental results!
However, when searching using the same snippet WITHOUT quotes, the original almost always shows up well ABOVE the other two sites!
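For anyone who wants to repeat the quoted versus unquoted check on their own pages, here is a rough Python sketch of it. It only builds the two query strings and reports where your domain sits in a list of result URLs that you collect yourself; the function names and the example domains are made up for illustration, and it does not fetch anything from G.

```python
from urllib.parse import quote_plus, urlparse


def build_queries(snippet):
    """Return (exact_match_query, loose_query), both URL-encoded.

    The exact-match form wraps the snippet in double quotes, the
    loose form is the same text without quotes.
    """
    text = " ".join(snippet.split())  # collapse stray whitespace
    return quote_plus(f'"{text}"'), quote_plus(text)


def position_of(domain, result_urls):
    """1-based position of the first result hosted on `domain`, else None.

    `result_urls` is a list you gathered by hand, in the order the
    results were shown (hypothetical input, not scraped here).
    """
    for i, url in enumerate(result_urls, start=1):
        host = urlparse(url).netloc.lower()
        if host == domain or host.endswith("." + domain):
            return i
    return None
```

Run `position_of` once on the results of the quoted search and once on the unquoted search; a large gap between the two positions is the behaviour described above.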
What can this mean? Who does G really see as the original, and who is getting penalized for duplication? The ultimate question is: should it be necessary for the originator to noindex these pages once they are copied, to avoid a duplication penalty on the rest of the site based on these kinds of results? And finally, does noindexing duplicated pages, while keeping them available for on-site readers, actually eliminate duplication penalties, or does G still read them and evaluate them for penalties? (I know there was a prior thread which claimed they do, but nothing more substantial ever came of that claim.)
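If you do go the noindex route, it is worth confirming that the archived pages really carry the directive while still being served to readers. Below is a minimal Python sketch, assuming the noindex is delivered as a robots meta tag in the page HTML (it does not look at HTTP headers); the class and function names are hypothetical. It only tells you whether the tag is present, not whether G still evaluates the page content.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content looks like "noindex, follow"
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html):
    """True if the page HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Feed it the saved HTML of a copied article page; `True` means the page asks to be left out of the index while remaining readable on the site.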