Forum Moderators: Robert Charlton & goodroi

Duplicate content - filtering by degree?

Whitey

11:33 pm on Jul 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wanted to test a theory that Google invokes a penalty / filter by degrees for duplicate content as a proportion of the overall pages on a site.

Is anyone seeing in WMT that the number of duplicate pages is proportionate to the penalty outcome? I guess the exact mathematics will never be known, but is there an indicator that the degree of duplication plays a part?

e.g.

100% duplicate content = -950+
-30 / -40 / -50 at lesser degrees of duplication

tedster

11:53 pm on Jul 6, 2008 (gmt 0)


Are you talking about same-site duplicates, or duplicates across different domains? I'm sure that Google handles these situations differently.

Also - I'd make a strong distinction between filtering and penalties. A page can easily be filtered out of a search with no penalty applied - and that's the most common experience, by far. A drop in ranking is much more likely to indicate a loss of trust, most often involving off-page factors but also on-page things like keyword-crammed pages, etc.

I suppose excessive duplication across a website could also trigger a drop in ranking, but why even bother unless there appears to be some intentional manipulation going on? Just filter them out.

Whitey

1:11 am on Jul 7, 2008 (gmt 0)


Are you talking about same-site duplicates, or duplicates across different domains? I'm sure that Google handles these situations differently.

That's a good point. WMT doesn't provide info on off-page duplication, just on-site duplication. So I guess I'm focusing on internal duplication/similarity of URLs, meta descriptions and titles.

But what has me scratching my head is that if complete duplication has occurred through canonical problems, for example, then the whole site is usually brought down in the SERPs, giving the appearance of what's often reported as [or confused with] a -950 penalty [or filter].

So if an entire site can disappear with "total" duplication, can partial duplication bring down a site by degrees, giving the appearance of a -30/40/50 type penalty/filter?

I was hoping some folks who are experiencing those penalty/filters might look at their WMT accounts to see what proportion and type of duplicate content they have.
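Whitey's focus - internal duplication of titles and meta descriptions - can be checked without waiting on WMT's reports. Here's a minimal Python sketch (the page URLs and markup are hypothetical placeholders, not from anyone's actual site) that groups pages by identical title or meta description:

```python
# Sketch: flag internal duplicate titles and meta descriptions, the same
# kind of on-site duplication WMT reports. Feed it url -> HTML pairs from
# your own crawl; the examples used here are purely illustrative.
from collections import defaultdict
from html.parser import HTMLParser

class HeadScraper(HTMLParser):
    """Collect the <title> text and the meta description of one page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def find_duplicates(pages):
    """pages: dict of url -> html. Return {duplicated text: [urls]}."""
    by_title = defaultdict(list)
    by_desc = defaultdict(list)
    for url, html in pages.items():
        scraper = HeadScraper()
        scraper.feed(html)
        by_title[scraper.title.strip()].append(url)
        by_desc[scraper.description.strip()].append(url)
    dupes = {}
    for mapping in (by_title, by_desc):
        for text, urls in mapping.items():
            if text and len(urls) > 1:
                dupes[text] = sorted(urls)
    return dupes
```

This only surfaces exact matches, which is what WMT's duplicate title/description report counts; near-duplicates need a similarity measure on top.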

Quadrille

1:11 am on Jul 7, 2008 (gmt 0)


There is no penalty for duplicate content, just filters.

There are other issues, like age, that may have an effect; some duplicates linger for ages before being 'bumped', while identical news stories appear on 333 sites as soon as they're first FTP'd.

Also, by and large, Google works page-by-page, not site-by-site - I doubt there are any site-wide issues with duplicate content, just page issues.

The main area of doubt is how much of a page has to be duplicated to trip the filter. About half is my guess.

And if you can predict which page will go, when two appear, then you'll go far:

It isn't necessarily age.
It isn't necessarily page rank.
It isn't necessarily the largest site, or the smaller one.
It isn't necessarily the copyright owner.

Whitey

1:22 am on Jul 7, 2008 (gmt 0)


Also, by and large, Google works page-by-page, not site-by-site

But the filtering effect of page by page can be site-wide.

Therefore what's the effect of partial duplication?

helpnow

2:30 am on Jul 7, 2008 (gmt 0)


I agree with site-wide. I BELIEVE I am currently experiencing a site-wide reduction in ranking due to duplicate meta descriptions and title tags. GWT says I have 15,187 duplicate meta descriptions and 18,029 duplicate title tags. (Curiously, the number increases every day by a hundred or so, because Google is still reporting new duplicates it found 11 days ago...)

My rankings have slipped from Page 1, #1, #2 or #3, to things like Page 1 #4, Page 2 #1, Page 3 #4, etc. Not a complete disaster, but a definite reduction.

Now then, not all of the pages that dropped were suffering from these duplicate content issues. Of the 3 specific examples listed above, only 1 had a duplicate meta description. So the effect is site-wide, triggered by the fact that only some pages have a problem.

Oh, site:domain.com gives me 378,000 results.

So, for sure, I am suffering a site-wide reduction, and I assume it is due to some duplicate content I have here and there on my site.

Curiously, I also still have some Page 1, #1 rankings too. Is this because I have started to come back? I don't know...

But there is no correlation between which pages went down in the rankings, and which had duplicate content. Could be that the ones that had duplicate content were always filtered down a bit, not sure yet - haven't spent the time to analyze that. Too busy fixing my duplicates... ; )

Oh, for the record, I lost my ranking June 4. Like an idiot, it took me till July 2 to figure out it may be these duplicates (I was happily blaming it on another issue, but that is now resolved and I have other reasons now to believe that was a non-issue). By the end of July 2, I had programmatically addressed, say, 90% of my duplicates. So, I still have about 10% to fix which need to be addressed a bit more closely. Meanwhile, while I fix the remaining 10% or so, I suspect 90% of a fix is good enough to restore most of my rankings, so assuming a 10-15 day cycle to get my fixes into the SERPs, I am hoping for a partial/full restore by July 12-17. I'll continue to post and keep you updated until that time, whenever it comes. ; )

tedster

2:31 am on Jul 7, 2008 (gmt 0)


One common effect could be sending the near-duplicates to the supplemental index, or whatever that has evolved into.

No matter how Google deals with any given case, the fix is still pretty clear. Patch up the website's technology so that duplicate, triplicate, and octuplicate versions of URLs cannot all resolve with a 200. Find ways not to generate stub pages or thin pages - or at least keep Google from indexing them. Make sure that titles and meta descriptions are unique and page-specific for the pages that you do let Google index.
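tedster's first point - no duplicate version of a URL should resolve with a 200 - can be audited by enumerating the usual variant spellings of each URL and probing each one's status code. A sketch covering only the most common www/trailing-slash variants (example.com is a placeholder domain; real sites have more variant sources, such as index.html, query strings, and session IDs):

```python
# Sketch: enumerate the common duplicate spellings of one URL. Probe each
# variant (e.g. with curl or an HTTP library) and confirm that only the
# canonical form returns 200 and the others 301-redirect to it.
from urllib.parse import urlsplit, urlunsplit

def url_variants(url):
    """Return www/non-www x trailing-slash spellings of a URL, sorted."""
    scheme, host, path, query, frag = urlsplit(url)
    other_host = host[4:] if host.startswith("www.") else "www." + host
    with_slash = path if path.endswith("/") else path + "/"
    without_slash = path.rstrip("/") or "/"
    variants = set()
    for h in (host, other_host):
        for p in (with_slash, without_slash):
            variants.add(urlunsplit((scheme, h, p, query, frag)))
    return sorted(variants)
```

For a typical page this yields four spellings; if all four return 200, Google sees four competing copies of the same content.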

Quadrille

9:10 am on Jul 7, 2008 (gmt 0)


I BELIEVE I am currently experiencing a site-wide reduction in ranking due to duplicate meta descriptions and title tags.

No-one can say for sure that you are mistaken; but I strongly suspect that you are.

First, I believe that duplicate content and duplicate tags are slightly different issues, and I suspect a duplicate title tag can get a page completely delisted - though I have no guess as to why this sometimes seems to happen but not always.

I have had the problem of duplicate meta descriptions, and the affected pages went 'supplemental-ish' - once I'd fixed the pages, full listing returned.

There was zero effect on other pages.

I agree that if you've got a widespread page problem, its effects seem no different to a site problem - but a page problem needs the pages fixing, and it's important not to lose sight of that.

From all that I've experienced, seen on others' sites, and read, I really don't see any evidence of site problems as a result of duplication; so if you have them, then I think you'd be wiser to look for a second issue, rather than assume it's a duplication issue. Once you fix the duplication issue, you'll know for sure within a few weeks, of course.

g1smd

11:42 am on Jul 7, 2008 (gmt 0)


Quasi~parallel discussion: [webmasterworld.com...]

tedster

4:33 pm on Jul 7, 2008 (gmt 0)


Yes, let's try to differentiate these two parallel discussions.

The thread that g1smd linked to is about duplicate TITLES and DESCRIPTIONS - as identified in WMT.
This thread's topic is duplicate CONTENT. WMT does not tell you about duplicate content.