Internal duplicate content - has this been ramped up?

Whitey

11:31 pm on Sep 27, 2011 (gmt 0)

Duplicate content remains one of the more complex aspects of SEO to manage, and some good threads have been running on WebmasterWorld since 2005. Google has continued to update its guidelines since then, most recently in July 2011:

[googlewebmastercentral.blogspot.com...]
[google.com...]

However, it seems the filters were tightened under the Panda updates, and that caught folks by surprise - both within sites and between sites. We see several senior members, some with great experience, mention addressing it internally on their sites. So why were so many caught by surprise when it happened?

g1smd is one notable webmaster who has contributed greatly to duplicate content conversations, and who mentioned duplicate tidy-ups on WebmasterWorld as part of his Panda recovery.

What has changed, and what has been tightened up? Has Google stepped beyond pure "filtering" of non-priority duplicates and now classified non-priority pages as "low quality"? Has duplicate detection extended beyond meta titles and descriptions to weigh more heavily the body content that may be repeated across a site?

Whitey

1:32 am on Sep 28, 2011 (gmt 0)

What is duplicate content?

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries. [googlewebmastercentral.blogspot.com...]
Just to push the point and focus a bit further: a lot of shopping/product sites and forums were among those reported as being hit. I'm seeking to exclude external content duplication here, to help maintain focus.
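
For the store-item case in that quote, the usual remedy is to collapse every URL variant to one canonical URL and reference it in a rel="canonical" tag. Here is a minimal sketch, assuming the duplicates come from query parameters; the parameter names and function are hypothetical, not from this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that spawn duplicate URLs for the same
# item: session IDs, tracking tags, sort orders, and the like.
NOISE_PARAMS = {"sessionid", "utm_source", "utm_medium", "sort"}

def canonical_url(url: str) -> str:
    """Strip noise parameters so every variant of an item URL maps
    to a single form, suitable for a <link rel="canonical"> tag."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in NOISE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

print(canonical_url("http://example.com/item?id=42&sort=price&sessionid=abc"))
# -> http://example.com/item?id=42
```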

g1smd

6:56 am on Sep 28, 2011 (gmt 0)

Panda recovery.

I wasn't hit. :) I am helping a site that might have been.

Whitey

1:59 pm on Sep 28, 2011 (gmt 0)

I am helping a site that might have been.

Thanks for clarifying that - but do you believe the Panda algo was ramped up to a greater degree, enough to cause that site to tip over? I notice you say "might have been".

Sgt_Kickaxe

2:08 pm on Sep 28, 2011 (gmt 0)



There can be no one solid "way" of doing things.

The internet is far from flat. In Google's eyes there are many types of pages, and different rules apply to each subset. If you've got informational pages, it makes sense to quote sources, so some duplication is probably of little consequence. On a product page, however, duplication is a no-no; in fact, if you show the same descriptions as other sites you won't rank well. And if a page is pulling in navigational visitors it doesn't matter what's on the page, Google will rank you tops (navigational meaning keywords aimed at finding your specific site).

londrum

2:12 pm on Sep 28, 2011 (gmt 0)

I was hit by Panda, and I think it was because of internal duplicate content too.

I have a large section on my site which lists events. It's all hand-written, original text, so there are no duplicate issues there, but I allow people to sort the events by category and date. So, naturally, if an event runs for several days (or even months) it will appear on every date page.
I can't alter that text for every date, because I'd have to write about 50 different snippets for each event, which is ludicrous, and I can't 'noindex' those pages either, because people frequently search for events by date in the SERPs, and I get a lot of traffic from them.

So I am screwed, basically, and there is no way around having a load of internal duplicate text.

If Google is ranking sites on this, then I think it's crazy, because there are lots and lots of cases where it's perfectly legitimate to have the same text appear on loads of different pages.

g1smd

2:25 pm on Sep 28, 2011 (gmt 0)

I can't entirely confirm what the cause of the poor site performance is; only that once the site has been fixed, watching what happens to traffic and sales over the next few months might give some clues.

Calendars are a big source of infinite duplicate content, especially pages with no entries! Make sure that search engines cannot follow the "next day" link forever and end up spidering dates hundreds or thousands of years into the future (or past).
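
A minimal sketch of that idea: bound the calendar to the window of dates that can actually hold events, and simply omit the "next day" / "previous day" links at the edges. The window and URL scheme below are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical crawl window: no events exist outside it, so no
# calendar page outside it should ever be linked.
EARLIEST = date(2005, 1, 1)
LATEST = date.today() + timedelta(days=365)

def next_day_link(current: date) -> Optional[str]:
    """URL for the next day's page, or None at the boundary.
    None means the link is not rendered at all, which is stronger
    than nofollow: the spider simply has nowhere to go."""
    nxt = current + timedelta(days=1)
    return f"/events/{nxt.isoformat()}" if nxt <= LATEST else None

def prev_day_link(current: date) -> Optional[str]:
    prv = current - timedelta(days=1)
    return f"/events/{prv.isoformat()}" if prv >= EARLIEST else None
```

Empty dates inside the window could additionally serve a noindex meta tag, so they never accumulate in the index.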

Whitey

3:57 pm on Sep 28, 2011 (gmt 0)

Just a reminder - I'm steering the question down the internal duplication line, otherwise it'll become unfocused and morph into other complexities of duplicate content.

What about the same content on parent/child page verticals, typical on shopping or forum sites? It never used to be a problem: provided the metas were different, Google would just filter one out and show the page most relevant to the query.

But is it a problem now, if the content is not substantially different?

tedster

5:03 pm on Sep 28, 2011 (gmt 0)

I allow people to sort the events by category and date. So, naturally, if an event runs for several days (or even months) it will appear on every date page... I can't 'noindex' those pages either, because people frequently search for events by date in the SERPs, and I get a lot of traffic from them.

So I am screwed, basically, and there is no way around having a load of internal duplicate text.

If you are getting a lot of traffic, then you're not screwed and shouldn't "borrow trouble" based on an abstract idea.
But if you have lost a lot of traffic, then I'd consider blocking the various types of "sort pages" and having just certain default lists indexed. Especially on large sites, this approach has been a real help for many.
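
A minimal sketch of that approach, assuming the sort order arrives as a query parameter (the parameter name and default value here are hypothetical): leave one default list indexable and mark every other variant noindex,follow.

```python
DEFAULT_SORT = "date"  # the one list variant we want indexed

def robots_meta(sort_param: str = DEFAULT_SORT) -> str:
    """Robots meta tag for a listing page. Only the default sort
    stays indexable; other sort/filter variants get noindex,follow,
    so crawlers still pass through their links without indexing
    the duplicate lists."""
    if sort_param == DEFAULT_SORT:
        return '<meta name="robots" content="index,follow">'
    return '<meta name="robots" content="noindex,follow">'

# e.g. /events?sort=price renders with noindex,follow, while
# /events (or ?sort=date) remains indexable.
```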

londrum

5:38 pm on Sep 28, 2011 (gmt 0)

But if you have lost a lot of traffic, then I'd consider blocking the various types of "sort pages" and having just certain default lists indexed.


But that's the problem: they are not your run-of-the-mill "sort pages", they are date pages.
For example, people might search for

"exhibitions in timbuktu"
"exhibitions in timbuktu october"
"exhibitions in timbuktu november"

If an exhibition spans two months then it will appear on all three pages, but you can't noindex any of them. There's no way around it.