The problem Google needs to solve:
If there are a thousand sites mirroring DMOZ.org's content, the "correct" way for Google to handle this is to largely ignore the duplicate content and only "count" DMOZ.org, unless someone is searching the content from within a particular domain. In summary, it wouldn't hurt to index the dupe content, but you would want those results hidden from searches. You also would NOT want to penalize the original content, which is DMOZ.org. In most cases, however, Google will have no way to dynamically determine which content is the "original" one.
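A minimal sketch of that idea, assuming a hypothetical index in which every duplicate cluster has exactly one member flagged as canonical (the Page structure, field names, and filter below are my own invention, not anything Google has published):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    url: str
    domain: str
    cluster_id: int      # pages carrying the same content share a cluster
    is_canonical: bool   # e.g., the dmoz.org copy in a cluster of DMOZ mirrors

def visible(page: Page, site_restriction: Optional[str]) -> bool:
    """Duplicates stay indexed, but only show up in site-restricted searches."""
    if site_restriction is not None:
        return page.domain == site_restriction
    return page.is_canonical

def filter_results(results: list[Page], site_restriction: Optional[str] = None) -> list[Page]:
    return [p for p in results if visible(p, site_restriction)]
```

Under this scheme a general search surfaces only the dmoz.org copy, while a search confined to a mirror's domain still finds the mirrored pages, unpenalized.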
The Google solution:
Penalize all pages with duplicate content (although dmoz.org seems to be the exception). The penalty results in lower PageRank.
The problem(s) caused by this solution:
There are many legitimate reasons for duplicate content. For example, mirrors. PHP.net has nice documentation, and that is the original page. But if you are in Sweden, you may prefer to view the content from a mirror in Sweden. So perhaps php.net "deserves" the higher PageRank, but should the page in Sweden have a "low" PageRank just because it is a mirror? Maybe. But then a search for "PHP manual +Sweden -America" may give less relevant sites a higher ranking because of the lower PR on that mirrored page.
But a stronger example of where this causes a problem is where you have a database and want to display the same data in different ways. Let's say it's an articles database. You may want to sort by author, by category, or by title of article. Each view may differ by only 10%. But it is original content, good content. Why should all of the pages have a lower PageRank?
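To make that concrete, here is a toy sketch (the article records are invented) showing how three sort views of one database become near-duplicate pages even though every one of them is original content:

```python
# The same article records rendered in three sort orders. Each "view" page
# carries identical underlying content, just reordered.

articles = [
    {"title": "Intro to Widgets", "author": "Smith", "category": "Basics"},
    {"title": "Advanced Widgets", "author": "Jones", "category": "Expert"},
    {"title": "Widget History",   "author": "Brown", "category": "Basics"},
]

def render(rows):
    return "\n".join(f"{r['title']} by {r['author']} ({r['category']})" for r in rows)

by_title    = render(sorted(articles, key=lambda r: r["title"]))
by_author   = render(sorted(articles, key=lambda r: r["author"]))
by_category = render(sorted(articles, key=lambda r: r["category"]))

# The three pages share every line; a duplicate filter that ignores ordering
# would call them 100% duplicates despite the content being original.
assert set(by_title.splitlines()) == set(by_author.splitlines()) == set(by_category.splitlines())
```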
Is there a better solution?:
Seems to me that rather than penalize ALL of the sites with duplicate content by giving them a lower PR, it would be better to leave the PR as is and decide on a kind of "penalty" based on the search being done. So when a search is done, dynamically choose which of the dupe content is "most relevant," "most important," and perhaps "highest PR without any penalty," and hide the rest.
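As a sketch of that proposal (the Page fields and the relevance() scorer below are hypothetical; this shows the shape of the idea, not how Google actually ranks):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str
    cluster_id: int   # pages with duplicate content share a cluster
    pagerank: float   # left intact; no duplicate penalty is ever applied

def relevance(page: Page, query: str) -> float:
    # Hypothetical relevance: fraction of query terms found in the page.
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(t in page.text.lower() for t in terms) / len(terms)

def dedupe(results: list[Page], query: str) -> list[Page]:
    """Per query, surface only the best member of each duplicate cluster."""
    clusters = defaultdict(list)
    for page in results:
        clusters[page.cluster_id].append(page)
    # The losers are merely hidden for this query, never penalized globally.
    return [max(members, key=lambda p: (relevance(p, query), p.pagerank))
            for members in clusters.values()]
```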
What do you Google experts think?
P.S. Anyone notice Yahoo putting googlism in their cool websites link today? I searched googlism on googlism and had almost no responses...
Otherwise it would be too easy to kill competition. You would just go to one free web provider, download an entire competitor's site with, let's say, Teleport Pro, upload it, and at the next update they are both PR 0.
I think they can't just do a simple "both get PR 0."
E.g., www.sitea.com, www.siteb.com, etc. all have the same content, but sitea is number 1 in the SERPs, siteb is number 2, and the overall owner of the sites reaps the benefits of dominating that keyword.
This is what Google is penalising, as it is anti-competitive and of no use to the user.
On the issue of databases displaying listings, I don't think there is a problem. Duplicating content within a single site can hold no benefit (other than spammy keyword repetition, but that's a separate issue), so I don't think Google will pay much attention to it.
E.g., 1,000 pages with exactly the same content are not going to do the site any good even if Google ignored them. Only one page (well, one page and maybe an index page) will be displayed in the SERPs, so internal dup content holds no benefit and will only serve to alienate visitors.
The DMOZ-data-driven sites could probably be subject to a penalty for dup content on such a mass scale, but at the end of the day the dup content is links and not information.
IMHO, it is beneficial for the surfer to have such a large information directory in so many locations; it's more likely that they will find what they want. However, if information were duplicated on the same scale, it would be a different issue, because it holds no benefit for the repeat visitor.
Am I making sense here? I don't know - it's Friday afternoon... :)
JOAT
"They should definitely just ignore the page with lower PR, that is, put the site with the higher score into the results." - JonB
If Google were to adopt this policy then I think it would unfairly assume that the higher PR page is the originator of the content.
I think it might be best to do as heretic suggests and "leave the PR as is and decide a kind of 'penalty' based on the search done." However, I think it would be important for Google to have some way of indicating that the rankings were hindered because of duplicate content.
That way the website owners could sort things out on their own. For example, if somebody hijacked my content, I could send them a "friendly" email stating:
It has been brought to my attention that you have taken my original, copyrighted content and incorporated it on your website without my authorization. This has resulted in a loss of search engine traffic to my site which translates to a loss in revenue on my behalf. Please remove all of my content from your website immediately to avoid any further legal action regarding this matter.
If the duplicate site/content was owned by the same person then they would have to live with the “penalty” or stop duplicating their own content on multiple sites.
Of course Google would not determine this only by PR. There are other factors that Google has: how long the site was in the index, how many links are pointing to it, the quality of those links, what category it is in, etc. It is easy to filter out a total copy.
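Those factors could be folded into a simple score; the weights below are pure guesses on my part, just to show that PR need not be the only input when picking the likely original:

```python
# Hypothetical "which copy is the original?" score built from the factors
# above: time in the index, inbound link count, and link quality.

def originality_score(days_in_index: int, inbound_links: int,
                      avg_link_quality: float) -> float:
    return (1.0 * days_in_index          # older pages are more likely original
            + 5.0 * inbound_links        # well-linked pages have earned trust
            + 500.0 * avg_link_quality)  # links from quality pages count extra

def pick_original(candidates):
    """candidates: list of (url, days_in_index, inbound_links, avg_link_quality)."""
    return max(candidates, key=lambda c: originality_score(*c[1:]))

# pick_original([("dmoz.org/page", 1500, 900, 0.9),
#                ("mirror.example/page", 60, 12, 0.2)])
# keeps the dmoz.org copy and filters out the total copy.
```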
Also, what about legal mirrors, that is, mirrors meant to divide the traffic and bandwidth? Many .org sites probably have mirrors, etc.
I still think punishing BOTH sites would be unfair if there is no cross-linking or other proof that would tell it is "cloaking."
Of course, double results or mirrors are no good for search quality. I think Google looks at many, many factors when ranking pages, so a complete mirror would just get a lower rating and thus be eliminated from the results.
Sending an email to one that intentionally wants to harm your site would do no good.
If the first email didn't help, then perhaps a second "not so friendly" email sent by a lawyer might. Perhaps a third "informative" email sent to their hosting company, affiliate sites, sites linking to them, etc. might help.
Disclaimer: I am not a lawyer and do not really know what I am talking about. :)
I don’t think Google should play the role of judge and jury when it comes to determining who the originator of the content is. If they tried to do this then they might have to play the role of the defendant. However, I think Google is well positioned to play the role of unbiased evidence gatherers.
The primary site is a USA destination guide where each state has its own pages of content about regions, attractions, transport etc.
Then several years later we decide to launch a separate, stand-alone domain as a Florida accommodation guide. We put in the accommodation pages, and as part of building the content we add tours, theme parks, and (this is the important part) re-use the text from the Florida regional pages of the original USA site.
The ODP editors say this is OK, being a valuable addition to their Florida tourism category. Yahoo take the $299 and put it in their directory. The site ranks top 5 with MSN, AV, FAST for "Florida accommodation".
But that search on Google now has the site wwwaaaaayyyyyyy down at the end of the search results. However, the PR is still the same as it was pre-September.
So we are left with trying to figure out if the site is being slammed for duplicate content no matter how well intentioned it may be, or is there some other factor involved?
During that 8- or 9-month period, the same articles were at the "network of sites" and on my new independent site. There were differences in the package or "container" around the article content, but the underlying content was identical except for some minor updates on the new site. In other words, a four-page article titled "Traveling with Widgets" on the old site was still a four-page article titled "Traveling with Widgets" on the new site.
I was worried that Google would consider my new site a "mirror site" of the old one, but that didn't happen. Instead, both sets of articles were listed in Google. In some cases the new version placed higher on Google's SERP; in other cases, the old version placed higher. Over time, the new versions crept up while the old versions slipped down, but neither the old nor the new versions were penalized by Google.
From this experience, I've come to believe that "mirror" content is just one of the factors that Google uses to determine whether a page is legitimate. I'm guessing that, if two pages are absolutely identical (right down to the navigation code), Google may flag one of them as spam. But if the two pages have different page titles, navigation code, etc., Google may think "Oh, that's a legitimate syndication" and leave well enough alone unless it finds other spam techniques like hidden text, repeated keywords, or cloaking. (It would almost have to take that approach, or it would be penalizing every newspaper site that uses stories and features from AP or Reuters.)
see this [google.com] for example.
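One plausible way to draw that line is to strip the "container" (navigation, template code) and compare only the body text, for example with word shingles. A rough sketch under those assumptions; strip_boilerplate() is a crude stand-in, and none of this is Google's documented method:

```python
import re

def strip_boilerplate(html: str) -> str:
    # Crude placeholder: drop markup, keep text. Real navigation/template
    # removal would be far more involved.
    return re.sub(r"<[^>]+>", " ", html)

def shingles(text: str, w: int = 4) -> set:
    # Overlapping w-word windows; robust to small edits and reflowed layout.
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 0))}

def body_similarity(page_a: str, page_b: str) -> float:
    """Jaccard similarity of the two pages' body-text shingles."""
    a = shingles(strip_boilerplate(page_a))
    b = shingles(strip_boilerplate(page_b))
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Two copies of the same AP story wrapped in different site templates score
# near 1.0 on body text; a page that merely quotes a paragraph scores far lower.
```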
In most cases, however, Google will have no way to dynamically determine which content is the "original" one.
Not true. There is a very easy way: the oldest page is the accurate one, and it is NOT penalized in any way, shape, or form.
The Google solution:
Penalize all pages with duplicate content (although dmoz.org seems to be the exception). The penalty results in lower PageRank.
Not true. I care for 4 sites with 100% duplicate content. The main site is PR 8 and the clones are PR 3s. The main site was never affected at all. I've seen the same thing dozens upon dozens of times.
Not true. There is a very easy way: the oldest page is the accurate one, and it is NOT penalized in any way, shape, or form.
The oldest page may not be the accurate one, as in the example that I gave. Let's say that (as in my case) a Webmaster moves his content to a new host and the old host leaves the outdated pages on its servers. The new pages are the "authorized" pages and the old pages are, in effect, pirated--but more importantly, from the user's perspective, the old pages aren't being maintained and the new ones are.
Not all such cases involve unauthorized use, of course. Let's take another hypothetical example:
Consider Dr. Doe, a professor of German history at the University of Iowa. Dr. Doe posts a collection of resources on the history department's server at the University of Iowa. Two years later, he's offered a job at Harvard, and he recreates his resource list on its history department's server. But his old pages are still online at the University of Iowa because (a) he isn't search-savvy and has never heard of Google's "mirror sites" policy, or (b) he wanted to leave his resources list behind at Iowa for other people to use and modify if they wanted to do so.
So which list of resources is the most accurate one? To Dr. Doe and most users, the newer list (the one at Harvard) is the list that Google should feature in its search results.