I have a forum that is well indexed by Google. But, as with most dynamic web tools, the same content can be accessible from different URLs. (I am using SMF, if anyone's interested).
I have used .htaccess mod_rewrite and robots.txt extensively to create SE-friendly URLs and to keep duplicate URLs for the same content out of the index. But, until recently, I had overlooked one area.
The URL for each thread is something like this:
http://www.example.com/forum/index.php/topic,1234.0.html
But each posting within a thread has a URL similar to this:
http://www.example.com/forum/index.php/topic,1234.msg321.html
Now, Google has been happily indexing both types of URL and, in search results, simply choosing one over the other to display. The other URL is consigned to the supplemental index, which is kind of fine; I don't need both indexed.
The first 'root' URL would obviously be the ideal one to keep, if only because it is often linked to internally. But Google seems to prefer the second type of URL, partly because that is the URL used in the 'Latest Posts' feature on the front page (so it is the most accessible to Googlebot) and also because external web sites sometimes link to a specific post rather than the 'root' of the thread.
My dilemma is whether to exclude the second format of URL via robots.txt, to try to get some kind of consistency.
Or should I not fix what isn't broken? I have not suffered any kind of duplicate content 'penalty' so far (touch wood), and blocking these URLs now might exclude hundreds of pages.
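For clarity, the kind of rule I've been weighing looks like this (note that the * wildcard in Disallow is a Googlebot extension, not part of the original robots.txt standard, so other crawlers would ignore it):

User-agent: Googlebot
# Block the per-post .msg URLs but leave the thread 'root' URLs crawlable
Disallow: /forum/index.php/topic,*.msg

That pattern would catch /forum/index.php/topic,1234.msg321.html but not /forum/index.php/topic,1234.0.html.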
Any thoughts?
[edited by: tedster at 7:07 pm (utc) on Nov. 1, 2006]
[edit reason] use example.com [/edit]
Now, Google has been happily indexing both types of URL and, in search results, simply choosing one over the other to display.
In my opinion, you shouldn't try to manipulate that situation. Good for you for recognizing that there is potential for chaos; if trouble ever does start, you'll know what to do. But it sounds like Google is handling the duplicate issue just fine in your case, and trying to "fix" something here might actually make trouble.
Indeed. I tried the robots rule for a week or two and it seemed to slow down the indexing process somewhat. But that makes sense, as the 'root' URLs are buried one level deeper than the .msg URLs.
I've gone back to the way it was and I'll see how it goes.
Any other opinions welcome.
It seems to me that, in essence, these different URLs are not actually duplicate content in the true sense. And perhaps all this thinking about blocking duplicate URLs is too much of a linear approach to a subject that... well, isn't all that linear. Especially when we are talking about content management systems and forums.
For example, when Google indexes one of the .msg URLs on my forum, it is essentially indexing a specific item of information.
And I've noticed, on occasion, that when searching for the same thread using slightly different keywords, Google puts one version of the thread before the other in its results (i.e. it is not always the same URL that is supplemental, or in the main index). So, rather than the different URLs being treated as 'duplicates', Google is quite cleverly acknowledging that the web isn't made up of lots of static documents with a beginning and an end, but is rather a cyclical repository of information that can be stored and accessed in numerous different ways. Especially in the case of forums, where a thread can be extremely long and cover a range of subjects and viewpoints, it seems only logical that Google would index that thread at different points, depending on how its robots see the information being presented.
It seems to me that this is quite different to a web site having duplicate versions of its content scattered around different places.
Or am I missing the whole point here...?
Anyone else experiencing this? My eBay listings contain duplicate content for approximately 8,000 products. On relevant keyword searches, I find that my eBay listing URLs come up high in the SERPs while my own site's URLs are buried in supplemental. Is this causing a dupe penalty?
Most of the pages went supplemental in early October, and I have waited for a rebound, which has not happened. I also submitted a reinclusion request. The site has a PR 5 index page and many URLs below it with PR 4 and lower. The URLs with PR are indexed, but the thousands of category and product URLs are supplemental. I have compared the site against competitors' sites with similar products, PR, and backlinks, and they are not having the supplemental issue.
I have in place 301 redirects for canonical issues, and also to change PHP queries to friendly HTML URLs.
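For context, the canonical-host piece is the usual .htaccess pattern (with example.com standing in for my actual domain):

RewriteEngine On
# Canonical host: 301 any non-www request over to the www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]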
Link to URL "B" on the page, and set up an internal rewrite from "B" to "A" as well as an external 301 redirect from "A" to "B".
If a user asks for "A" they are redirected to "B" with a 301. If they ask for "B" then the URL is silently and internally rewritten as "A" and the content is served.
This does not cause a loop because one of them is an internal rewrite, and one is a physical redirect.
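A sketch of that in .htaccess, using a made-up product.php script as the example (the THE_REQUEST check is one way to make sure the 301 only fires on what the client originally asked for, so the internal rewrite below can't re-trigger it):

RewriteEngine On

# External 301: a request for "A" (the dynamic URL) is redirected to "B".
# THE_REQUEST holds the original request line, untouched by internal
# rewrites, so the rewritten request never matches this condition.
RewriteCond %{THE_REQUEST} ^GET\ /product\.php\?id=([0-9]+)\ HTTP
RewriteRule ^product\.php$ /product-%1.html? [R=301,L]

# Internal rewrite: a request for "B" is silently served by "A";
# the visitor's address bar still shows /product-123.html.
RewriteRule ^product-([0-9]+)\.html$ /product.php?id=$1 [L]

The trailing "?" on the redirect target drops the old query string from the new URL.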