Duplicate Content

Page-by-page or site-wide penalties?


Pedent

10:49 am on Nov 29, 2004 (gmt 0)

10+ Year Member



Am I right in thinking that Google deals with duplicate content on a page-by-page basis, rather than site-wide? I have a well-established site with all original content, and I would like to add some classic texts that are now in the public domain (they'll be useful for visitors). If these pages get picked up as duplicate content, then I guess Google will drop them. That's fine, but would adding this content jeopardise my original content's place in G's index?

prairie

2:34 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



This is a great question. My guess is that too much similarity between two hosts could lead to one of them dropping out of visibility, but I can't say for sure.

Does anyone have the necessary experience to answer this one?

phantombookman

3:03 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



Unless you have loads of dup content, in my experience just the page gets dropped.

Why not add to the page or make it different? Add some value to the page, above and below the duplicate content, and it will rank.
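Roughly this shape, for example - everything here is a placeholder, so substitute your own titles and commentary:

    <h1>Your own title for the work</h1>
    <p>A unique introduction: who wrote it, when, and why it matters.</p>

    <!-- the public domain text itself -->
    <p>...chapter text...</p>

    <p>Your own closing notes or commentary on the chapter.</p>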

I am not sure of the percentage difference required, but it cannot be that great - I see the wiki scrapers etc. ranking well!

Pedent

3:21 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



Thanks for the responses so far. The original page-by-page or site-wide question still stands, but more detail on the level of difference required to avoid a page penalty would also be useful: would a single paragraph at the top describing the content and significance of the work be enough?

jetboy_70

3:58 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



Unless you have loads of dup content, in my experience just the page gets dropped.

I'd second that. The pages are still in the SERPs, but they're listed as URLs only.

There does seem to be a threshold of duplication where a whole site can be penalized though, regardless of whether it also contains a small amount of unique content.

prairie

4:23 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



jetboy_70, do you know if duplication detection is limited to on-page text, or does it extend to things like link structure?

jetboy_70

4:36 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



I've been led to believe (by those who know what they're talking about) that the structure of the data plays a big part. If your structure's different then you can get away with a lot more duplicate content. No first-hand experience though.

zeus

5:11 pm on Nov 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How much needs to be changed? What about a small change in the title and the H1 at the top - is that enough?

prairie

4:55 pm on Dec 4, 2004 (gmt 0)

10+ Year Member



How much needs to be changed? What about a small change in the title and the H1 at the top - is that enough?

Obviously I don't know, but I get the impression that the filter is quite tight.

For the filter to work well, it would have to ignore duplicate content that shows up in a repeated context.

mark1615

5:00 pm on Dec 5, 2004 (gmt 0)

10+ Year Member



Here is a related subject. We have a site hosted by a well-known ecommerce/shopping-cart host. Store inventory lives on individual ASP product pages, and the host offers a service that generates static HTML pages from the dynamic ones. You end up with two copies of each product page - ASP and HTML. It is our view that G takes a dim view of this. Any thoughts?

jetboy_70

8:05 pm on Dec 5, 2004 (gmt 0)

10+ Year Member



Unless you are taking steps to prevent one or the other set of pages from getting spidered, it's an unnecessary risk.

What kind of steps? Not linking the pages into your site structure; rewriting requests for the dynamic URLs to the static URLs; or blocking spider access with either robots.txt or robots meta tags. Any one of those options would do the trick.
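For example, the robots.txt route might look like this, assuming the dynamic copies all sit under a /dynamic/ path (adjust for your own URLs):

    # keep all spiders away from the dynamic duplicates
    User-agent: *
    Disallow: /dynamic/

Or, page by page, a robots meta tag in the head of whichever version you want left out:

    <meta name="robots" content="noindex,follow">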

vincevincevince

8:59 pm on Dec 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Page by page, certainly.

The real question is whether duplicate content filters apply to PARTS of pages or to whole pages.

Patrick Taylor

10:52 pm on Dec 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've added dozens of pages of classic texts to one of my sites. Those texts already existed on lots of other sites, some better established than mine - for those texts, at least. My pages are doing nicely.

In fact the texts were rather long, so I broke them down into bite-size chunks (with "next" and "previous" links), meaning they run to far more pages per text than the corresponding sets on competing sites. Whether this helped to reduce the duplication as seen by Google, I can't say for sure, but I doubt Google is too concerned about this sort of thing. I believe the risky sort of duplicate content is where pages are more or less identical in all respects, giving rise to the suspicion of spam.
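The navigation is nothing elaborate - each chunk just links to its neighbours, along these lines (the filenames and titles are hypothetical):

    <title>The Work, Part 3 of 12</title>
    ...
    <a href="part-02.html">previous</a> | <a href="part-04.html">next</a>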

Patrick Taylor

11:29 pm on Dec 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Further... after a little experiment...

Doing a Google search for a passage from one of my classic texts without quotation marks, my page is #1 out of 33,700. Searching with quotation marks, Google returns only two results (my page not among them), but with an option to repeat the search with the omitted results included. That repeat returns 48 pages on various sites containing the text, and my page is #4 out of the 48.
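In other words - with the phrase below standing in for the real passage:

    a line or two from the classic text        -> broad match: my page #1 of 33,700
    "a line or two from the classic text"      -> exact phrase: 2 results shown; 48 with the omitted results included, mine #4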

Purely anecdotal, but it circumstantially suggests that this is not seen as duplicate content of the spam variety, i.e. the kind that might hurt a site more generally.