Forum Moderators: Robert Charlton & goodroi
Basically I have a .co.uk site which is mirrored in the US with a .com domain. I don't really want to kill off the .com mirror, but according to some webmasters mirroring the content will hurt both sites. Other webmasters say that the higher-ranking .co.uk domain will simply be given preference by Google, so there is no real harm done.
Can anyone with some experience on the matter shed some light?
If googlebot gets a trail of these URLs through whatever source, at some point it just stops wasting resources on the domain - and I've seen ranking troubles follow at that point. I think they've done a lot in the past year or so to catch this kind of thing - when you see googlebot asking for some off-the-wall URL, it may be a test of your 404 error-handling. As long as the HTTP status of the response is 404, you're fine.
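You can check this yourself before googlebot does: request a made-up URL and look at the status code, making sure you get a real 404 rather than a "soft 404" page served with status 200. The sketch below uses a throwaway local server so it is self-contained; in practice you would point the check at your own domain (the handler and paths here are placeholders, not anything Google-specific).

```python
# Verify that a nonsense URL returns a hard 404 status, not a 200.
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen
from urllib.error import HTTPError

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":           # only the homepage exists
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"home")
        else:                          # everything else is a hard 404
            self.send_error(404)
    def log_message(self, *args):      # keep the demo quiet
        pass

def check(url):
    """Return the HTTP status code for a URL, including error statuses."""
    try:
        return urlopen(url).getcode()
    except HTTPError as e:
        return e.code

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

status_home = check(base + "/")                        # 200
status_missing = check(base + "/some-off-the-wall-url")  # 404
print(status_home, status_missing)
```

If the second request comes back 200, your error page is a soft 404 and that is exactly the kind of thing worth fixing.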
I assume you mean "the filter is real".
Essentially my question is whether both sites are penalized or whether one of the sites is simply suppressed. People on various webmaster boards seem to be saying that both can happen.
Logically you would think that the duplicated content would be suppressed, but that there would be no -30 or -950 penalty applied to both sites. There are just too many news articles etc. that appear on multiple web sites.
I like having a country-specific mirror so I'd rather not nuke my American site unless it is the cause of my poor SERP rankings. I do not even show up for my own domain name. I'm doing something they don't like, and I've been lily-white in my promo techniques, so I'm becoming very frustrated.
How does the filter work?
I believe that the filtering is done only on the snippets and titles (comparing whole pages for arbitrary searches would be too time-consuming). Results are retrieved as normal, and then pages that are too similar to a higher-ranking result are removed or demoted (on a per-query basis).
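The per-query de-duping described above could be sketched roughly like this: walk the results in rank order and drop any whose title/snippet is near-identical to a result already kept. To be clear, the similarity measure and the 0.9 threshold are my own guesses for illustration, not anything Google has published.

```python
# Rough sketch of per-query snippet/title de-duplication.
from difflib import SequenceMatcher

def dedupe(results, threshold=0.9):
    """results: list of (title, snippet) tuples, best-ranked first."""
    kept = []
    for title, snippet in results:
        text = title + " " + snippet
        # Too similar to a higher-ranked result -> filtered for this query.
        if any(SequenceMatcher(None, text, k).ratio() >= threshold
               for k in kept):
            continue
        kept.append(text)
    return kept

serp = [
    ("Widgets UK", "Buy widgets with free UK delivery."),
    ("Widgets US", "Buy widgets with free UK delivery."),  # near-duplicate
    ("Widget news", "Industry report on widget demand."),
]
print(len(dedupe(serp)))  # the mirrored snippet is filtered out, leaving 2
```

Note that the lower-ranked duplicate is the one that disappears, which matches the observation that the 'wrong' site can go AWOL once de-duping runs.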
In practice this means having 2 identical sites is not an issue if all of the other ranking factors have been taken care of. Assuming you have the same content, the same quality of inbound links and the only real difference in the sites is that the .co.uk is hosted in the UK and the .com is hosted in the US you should find that a US search shows the .com and the UK search shows the .co.uk.
The real problems come into play when an uneven amount of link weight goes to the sites: Google may rank the better-linked site higher in the 'wrong' market, and then once the de-duping is done your 'correct' site goes AWOL. I have found that geographically identifiable links can have a profound effect on which site Google chooses to show (of course these are not always easy to obtain). Try to get some good country-specific links to each site you run (government links are nice).
The theory about just the snippets/titles being analysed answers the otherwise unanswerable question of "what percentage of my page should be different". The answer is that the page should be different enough that any snippets chosen by Google differ from those on other sites. This means that pages should be different right down to (roughly) sentence level.
But remember, in order to do the calculations quickly enough, it's entirely plausible that 100% replication (in any of the snippets or titles) may be the trigger, so slight rephrasing could work.
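A crude self-audit along those lines: split two pages into sentences and see which ones match verbatim. If any sentence is shared word-for-word, a snippet built from it could be 100% identical across the two sites. This is purely a heuristic of mine, not a known Google test, and the sample page text is made up.

```python
# Find sentences that appear verbatim on both pages.
import re

def sentences(text):
    """Split text into a set of normalised sentences."""
    return {s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()}

def shared_sentences(page_a, page_b):
    return sentences(page_a) & sentences(page_b)

uk = "We ship widgets nationwide. Orders arrive in two days."
us = "We ship widgets nationwide. Orders usually arrive within two days."
print(shared_sentences(uk, us))  # only the unrephrased sentence matches
```

The second sentence was lightly rephrased and drops out of the intersection, which is exactly the kind of slight rewording suggested above.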
We should remember that elaborate algorithms take time and resources, and those costs go up dramatically at the scale of data that Google processes. Often we ascribe too much complexity to the processes, and Google never corrects us on that, as they see perceived complexity as a good thing.
There may be more to the process that leads to a larger 'penalty', such as counts of duplicates per domain/page (accumulated as real-time queries are answered) triggering some action once the percentage of filtered results passes a given point. But overall, if we avoid sentence-level similarity, we may not need to worry about the issue.