Forum Moderators: open
Display the text in question on two of the three pages as images. It seems so simple, but I've never seen anyone propose it. Would this work?
Perhaps there's a good reason I've never seen this mentioned and I'm just missing something obvious :-)
This used to be so, but these days Google is aggressively dropping pages from the main index. So if the two pages have sufficient duplicate content, one or both may be dropped.
I also have direct experience of pages with common content where, for a particular query, Google brings up the wrong page (from my point of view). This happens when the common content contains, say, keyword X, which doesn't appear in the unique content of page B but is used a few times in the subject matter of page A. When querying keyword X, Google brings up page B.
So there is some use for the proposed method of hiding common content. You might call it reverse cloaking: you want the user to see something but hide it from the robots!
Put the common content on a separate page C; then in A and B, insert an iframe:
<iframe src="C-url"></iframe>
The C content will then get loaded at the iframe location. Furthermore, robots will consider C a separate page altogether, so it will not be indexed as part of A or B.
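If you also want to keep C itself out of the index (not just out of A and B), a robots meta tag in C's source is the usual route. A minimal sketch, not something stated in the thread:

```html
<!-- in the <head> of the C page: keep the shared block out of the index entirely -->
<meta name="robots" content="noindex">
```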
Note: the guide's recommendations against using specific attributes are valid for old browsers, but I wouldn't concern yourself with them; you will likely get very few "challenged ones"! ;)
[idocs.com...]
As far as percentages are concerned, I seem to recollect Brett mentioning a figure of about 15% some time ago. Can anyone confirm?
I missed this the first time around... this is controlled by the number of link generations between pages and the number of pages duplicated.
Examples:
If you have two pages (an original and a carbon copy) linked to from the same adjacent pages, 15%-20% of the content can be duplicated (this includes site nav bars, which account for roughly the extra 5% over Brett's suggestion).
If the pages (an original and a carbon copy) are not linked to from the same page, then the more links between them (counting the shortest path), the more you can duplicate.
To stress: the number of duplicate pages is a major factor. Google works on pattern recognition, and the more of a pattern it can profile, the greater the risk of a penalty.
I tested 100% duplication on 10 pages, each 3 link generations away from the others; all survived (which also explains why external mirrors/affiliates last: no direct linkage).
Unfortunately, a client wanted to duplicate 15K pages (one for each city), adding about 20% random content. Google ranked them all for 4 months, then killed 14,800 of them.
For argument's sake, though: there is no legitimate reason to duplicate other than as a "short cut" (saving time), and you should always go back and rewrite duplicate pages. First and foremost, it's a better strategy; and if for no other reason, penalties cannot be easily identified, and in the long term that "short cut" will become a major liability.
I built a directory, and a customer buying traffic stole a copy of the whole site. I did do a % check on 2 of the pages in question; they were roughly 50% the same.
My original site was launched in July 2003, the copy of the site appeared around December 2003. Our Google rating vanished around March 2004. The copy of our site sits pretty in Google.
Our site has shown a PR of 5 since November 2003, but our pages are just nowhere to be seen. They are in the Google index; they just rank at something like 850th position.
Any advice?
Just a quick question: when people talk about duplicate content, is this based on visible text only, or on the complete HTML source of the page?
Bots are blind in a graphical sense; they rely on reading code. Therefore duplication tends to be a combination of everything between <body> and </body>.
This is most often "text copy", but a page made solely of images can gain a dup penalty as well.
You can edit things like nav bars (e.g. one page has a left nav bar, another a right nav bar); because "the code" arrangement is different, it reduces the chance of penalties on purely duplicated text... but there are no firm rules in this grey area.
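For anyone wanting to run the kind of rough "% check" mentioned earlier in the thread, here is a quick sketch in Python. The sample strings are invented, and difflib's ratio is only an approximation of whatever Google actually measures:

```python
# Rough similarity check between two pages' HTML source.
# difflib's SequenceMatcher gives a 0.0-1.0 ratio over the raw strings,
# which is a crude but serviceable stand-in for a "% dup" figure.
from difflib import SequenceMatcher

def dup_percentage(html_a: str, html_b: str) -> float:
    """Return a rough duplication percentage for two HTML strings."""
    return SequenceMatcher(None, html_a, html_b).ratio() * 100

# Hypothetical near-duplicate pages (placeholder markup, not real sites)
page_a = "<body><p>Widgets for sale in London</p></body>"
page_b = "<body><p>Widgets for sale in Leeds</p></body>"
print(f"{dup_percentage(page_a, page_b):.0f}% similar")
```

Note that comparing the full source (tags and all) matches the point above: nav bars and table markup count toward the percentage, not just the visible text.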
If someone steals and copies your entire site, and then gets a couple of good links, and suddenly rates higher than the original copy, what can you do?
Copyright infringement is the greater issue here, and if you take care of that, you will likely fix your ranking problem, provided nothing else is creating a negative effect.
Google offers some exceptional advice on the Digital Millennium Copyright Act (DMCA); read it and follow through.
[google.com...]
Additionally, a helpful tool will make it a little easier to be precise in your complaint.
[copyscape.com...]
If you use server-side scripting, you could put the text in a normal text file, read it server-side, and then display it either as HTML or as JavaScript, depending on the page that calls it.
If you wanted to, you could also have a link that displays just the text, for sight-impaired or mobile phone users.
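A minimal sketch of that server-side idea in Python (the function name and sample text are illustrative, not from the thread). The shared text lives in one place; pages that should "hide" it from robots request the JavaScript version, which writes the text client-side, while other pages get plain HTML:

```python
# Render a shared block of text either as plain HTML or as a
# document.write() script, depending on which page is calling.
# Robots that don't execute JavaScript won't see the scripted version.
import html
import json

def render_shared(text: str, as_js: bool) -> str:
    escaped = "<p>" + html.escape(text) + "</p>"
    if as_js:
        # json.dumps produces a safely quoted JavaScript string literal
        return f"document.write({json.dumps(escaped)});"
    return escaped

shared = "Same boilerplate paragraph used on several pages."
print(render_shared(shared, as_js=False))  # HTML for pages that may be indexed
print(render_shared(shared, as_js=True))   # JS for pages hiding the dup block
```

The text-only link mentioned above would simply serve the `as_js=False` output on its own page.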
So, just to be clear, you're saying that the bots will include all of the HTML source in their duplication calculations, including all the <table><tr><td>... etc. code.
OK - at least that is now cleared up for me. Thanks.
1. Two pages, right up to two entire websites, with 100% duplicate text will survive "filters" if one uses tables and the other is tableless (totally CSS).
2. Changing table column and row orientations will produce a similar effect, as will alternating div placement in CSS.
3. Note that a main nav bar is duplicate content (the same on every page, whether we think of it that way or not); it counts toward a certain percentage of "fair dup"... and you can easily see this at news services that share duplicate stories but whose navigation is often totally different.
...and you can truly see the extent of duplication you can get away with in directories that have pre-developed categories but display no listings for long periods: the naming convention (off-page factors) biases the pages "differently" (KEY POINT), whereas intentional duplication designed for manipulation links to and biases pages "similarly"... thus a penalty.
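Point 1 can be illustrated with a trivial fragment (the copy is invented). The visible text is identical, but the markup around it, and therefore the source a bot reads, is quite different:

```html
<!-- table-based version of the page -->
<table>
  <tr><td class="content">Widget specs, prices and delivery terms.</td></tr>
</table>

<!-- tableless (CSS) version of the same copy -->
<div class="content">Widget specs, prices and delivery terms.</div>
```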