Duplicate Content Penalty - myth or not?

bw3ttt

3:57 am on Aug 12, 2007 (gmt 0)

10+ Year Member



I'd like to hear your thoughts regarding the duplicate content penalty. I've been researching on the web and it seems about 50% of webmasters believe the penalty is real while the other 50% say it is a myth.

Basically I have a .co.uk site which is mirrored in the US on a .com domain. I don't really want to kill off the .com mirror, but according to some webmasters mirroring the content will hurt both sites. Other webmasters say that Google will simply give preference to the higher-ranking .co.uk domain, so there is no real harm done.

Can anyone with some experience on the matter shed some light?

cbpayne

5:13 am on Aug 12, 2007 (gmt 0)

10+ Year Member



It's not a penalty but a filter. Why would any search engine want to waste resources on more than one copy of the same thing?

tedster

5:46 am on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only time I've seen something like a true penalty is when a technical error serves the same content for hundreds or thousands of URLs. The most common of these errors is serving a "custom 404 page" that doesn't really send a 404 http header, but instead uses a 302 redirect for the requested URL and then serves the custom page with a 200 OK status.

If googlebot gets a trail of these URLs through whatever source, at some point it just stops wasting resources on the domain - and I've seen ranking troubles follow at that point. I think they've done a lot in the past year or so to catch this kind of thing - when you see googlebot asking for some off-the-wall URL, it may be a test of your 404 error handling. As long as the http status of the response is 404, you're fine.
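A quick way to check this on your own site is a sketch like the one below (Python 3 standard library only; the probe path and the example.com domain are placeholders). It requests a URL that shouldn't exist and reports whether the server answers with a real 404 or with the 302-then-200 pattern described above.

```python
# Minimal 404-handling probe. Assumes Python 3; the probe path is made up
# so the server should have nothing to serve for it.
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as errors instead of silently following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check_404_handling(base_url: str) -> None:
    test_url = base_url.rstrip("/") + "/this-page-should-not-exist-xyz123"
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(test_url)
        print(f"{resp.status} OK - bad: a missing page should not return 200")
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print("404 - good: the server sends a real 404 status")
        elif 300 <= e.code < 400:
            print(f"{e.code} redirect - bad: the 302-to-custom-page pattern")
        else:
            print(f"{e.code} - unexpected status")

check_404_handling("http://www.example.com")
```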

JohnRoy

7:24 am on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> it seems about 50% of webmasters believe the penalty is real while the other 50% say it is a myth.

I assume you mean "the filter is real".

bw3ttt

4:04 pm on Aug 12, 2007 (gmt 0)

10+ Year Member



> I assume you mean "the filter is real".

Essentially my question is whether both sites are penalized or whether one of the sites is simply suppressed. People on various webmaster boards seem to be saying that both can happen.

Logically you would think that the duplicated content would be suppressed, but that there would be no -30 or -950 penalty applied to both sites. There are just too many news articles etc. that appear on multiple web sites.

I like having a country-specific mirror, so I'd rather not nuke my American site unless it is the cause of my poor SERP rankings. I do not even show up for my own domain name. I'm doing something they don't like, and I've been lily-white in my promo techniques, so I'm becoming very frustrated.

inbound

4:36 pm on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My experience of the duplicate content filter (having a site with 170,000 pages that are often similar to competitors) is that there needs to be quite a high level of similarity before filtering takes place. In your case you have 100% similarity so the filter will apply (with some caveats).

How does the filter work?

I believe that the filtering is done only on the snippets and titles (comparing whole pages for random searches would be too time-consuming). Results are retrieved as normal, and then pages that are too similar to a higher-ranking result are removed or demoted on a per-query basis.
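This is only my guess at the mechanism, but a per-query de-dup pass over titles and snippets could be as simple as the sketch below (Python; the word-set comparison and the 0.8 threshold are invented for illustration). Results arrive in rank order, and anything too similar to a higher-ranking result gets dropped.

```python
# Hypothetical per-query duplicate filter. Results are (title, snippet)
# pairs in rank order; the similarity measure and threshold are invented.

def word_set(text: str) -> set[str]:
    return set(text.lower().split())

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0

def dedupe(results: list[tuple[str, str]], threshold: float = 0.8):
    kept: list[tuple[str, str]] = []
    for title, snippet in results:
        candidate = word_set(title + " " + snippet)
        if all(jaccard(candidate, word_set(t + " " + s)) < threshold
               for t, s in kept):
            kept.append((title, snippet))  # distinct enough: keep it
        # otherwise it is filtered as a duplicate of a higher-ranking result
    return kept

results = [
    ("Blue Widgets - Widget Co UK", "We sell the finest blue widgets in Britain."),
    ("Blue Widgets - Widget Co US", "We sell the finest blue widgets in Britain."),
    ("Widget News", "Industry report on widget manufacturing trends."),
]
print(dedupe(results))  # the near-identical .com listing is filtered out
```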

In practice this means having two identical sites is not an issue if all of the other ranking factors have been taken care of. Assuming you have the same content and the same quality of inbound links, and the only real difference is that the .co.uk is hosted in the UK and the .com is hosted in the US, you should find that a US search shows the .com and a UK search shows the .co.uk.

The real problems come into play when you have an uneven amount of link weight going to the two sites: Google may rank the better-linked site higher in the 'wrong' market, and once the de-duping is done your 'correct' site goes AWOL. I have found that geographically identifiable links can have a profound effect on which site Google chooses to show (of course these are not always easy to obtain). Try to get some good country-specific links to each site you run (government links are nice).

The theory that only the snippets and titles are analysed answers the otherwise unanswerable question of "what percentage of my page should be different". The answer is that the page should be different enough that any snippets chosen by Google differ from those of other sites. This means that pages should be different right down to (roughly) sentence level.

But remember, in order to do the calculations quickly enough, it's entirely plausible that 100% replication in any of the snippets or titles may be the trigger, so slight rephrasing could work.
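To make the sentence-level point concrete, here is a toy comparison (my own illustration, not anything Google has confirmed) of word-shingle overlap between an exact copy and a genuinely rephrased sentence:

```python
# Toy shingle comparison: an exact copy scores 1.0, while a real rephrasing
# shares no 3-word shingles at all. All sentences here are invented.

def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

original  = "our blue widgets are the cheapest in the uk and ship worldwide"
copied    = "our blue widgets are the cheapest in the uk and ship worldwide"
rephrased = "we ship blue widgets worldwide at the lowest uk prices"

print(overlap(original, copied))     # 1.0 - identical, would trip the filter
print(overlap(original, rephrased))  # 0.0 - no shared shingles
```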

We should remember that elaborate algorithms take time and resources, and those costs climb steeply at the scale of data Google processes. Often we ascribe too much complexity to these processes, and Google never corrects us on that, as they see perceived complexity as a good thing.

There may be more to the process that leads to a larger 'penalty': for example, counts of duplicates on each domain or page (incremented only as real-time queries are answered) could trigger some action once the percentage of ranked results being filtered passes a given point. But overall, if we avoid sentence-level similarity, we may not need to worry about the issue.