
Duplicate Content Workaround

     
12:00 am on Oct 7, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 6, 2004
posts:19
votes: 0


On my site there are three pages that have a majority of duplicate content with each other. I tried to make it so that this is not the case, but due to the nature of the subject, and for logical reasons, it makes more sense to leave it that way. I'm sincerely aiming to provide valuable content and I would prefer not to say I am trying to "trick" Google. However, I was wondering if this would work:

Display the text in question on two of the three pages as images. It seems so simple, but I've never seen anyone propose it. Would this work?

Perhaps there's a good reason I've never seen this mentioned and I'm just missing something obvious :-)

12:23 am on Oct 7, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0



Why don't you just exclude the duplicate pages?

In robots.txt:

User-agent: *
Disallow: /duplicate-page.html

or in the page itself:

<meta name="robots" content="noindex">

12:35 am on Oct 7, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 6, 2004
posts:19
votes: 0


I did use <meta name="robots" content="noindex">, but I would like the page to be indexed by Google.
12:46 am on Oct 7, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 25, 2002
posts:207
votes: 0


that should work, since google cannot read what's inside the image; it will read only the text part of your page.

sounds like a good way of isolating dupe content that is necessary for the presentation.

2:29 pm on Oct 7, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 11, 2003
posts:71
votes: 0


In my experience, Google does not "punish" duplicate pages but rather ignores the duplicates; it "decides" which one is likely the original and displays that one as if there were no duplicates. You will only see the duplicates when you click the "show omitted results" link (or whatever it's called) at the end of the search results.
If you have the text as a picture, the same thing happens, so there is no need to augment the loading time of your page with an image file.
3:40 pm on Oct 7, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 25, 2002
posts:207
votes: 0


>>google does not "punish" duplicate pages

this used to be so, but these days google is aggressively dropping pages from the main index. so if the 2 pages have sufficient duplicate content, one or both may be dropped.

i also have direct experience of pages with common content where, for a particular query, google brings up the wrong page (from my point of view). this happens when the common content contains, say, keyword X that is used a few times in the subject matter of page A but not in the non-dup content of page B; when querying keyword X, google brings up page B.

so there is some use to the proposed method of hiding common content. you might call it reverse cloaking: you want the user to see something but hide it from the robots!

3:45 pm on Oct 7, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


If you can - throw the content into an iframe... now one page supports the 3 unique ones, and on the visitor side the pages appear to be "as is".
3:47 pm on Oct 7, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:1001
votes: 0


fathom: very nice, elegant answer.
4:31 pm on Oct 7, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 19, 2004
posts:71
votes: 0


Renee,
How much duplicate content per page, i.e. in percentage terms, does Google appear to tolerate?
Anyone know?
4:54 pm on Oct 7, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 25, 2002
posts:207
votes: 0


>>fathom: very nice, elegant answer.

yes indeed! very elegant, fathom

midhurst,

i have no idea of the % dup that triggers google. i've never really done any serious study. i would be very interested in what others have found.

9:57 pm on Oct 7, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 6, 2004
posts:19
votes: 0


>>If you can - throw the content into an iframe... now one page supports the 3 unique ones, and on the visitor side the pages appear to be "as is".

Could someone please explain to a computer-challenged person what fathom means?

5:27 pm on Oct 8, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Nov 25, 2002
posts:207
votes: 0


say you have 2 pages (A, B) with common content. take the common content from A and B and place it in a new page C.

then in A and B, insert an iframe:

<iframe src="C-url"></iframe>

the C content will then get loaded at the iframe location. furthermore, robots will consider C a separate page altogether, so it will not be indexed as part of A or B.
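To make the setup concrete, here is a minimal sketch of what page A might look like; the filenames (page-a.html, common.html) and the iframe sizing are hypothetical examples, not anything from the thread:

```html
<!-- page-a.html (hypothetical filename): unique content plus the shared block.
     page-b.html would look the same, with its own unique copy. -->
<html>
<head>
  <title>Page A</title>
</head>
<body>
  <p>Unique content that only appears on page A...</p>

  <!-- the common content lives in common.html (page C) and is pulled in
       at view time; robots treat it as a separate page, not as part of A -->
  <iframe src="common.html" width="100%" height="300" frameborder="0"></iframe>
</body>
</html>
```

If you don't want C showing up in results on its own, you could additionally disallow it in robots.txt or give it a noindex meta tag, as suggested earlier in the thread.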

5:39 pm on Oct 8, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:June 29, 2004
posts:81
votes: 0


Thanks Renee for your explanation of the iframe.

KaMran

6:00 pm on Oct 8, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


Step by step guide on iFrames.

Note: the guide's recommendations against using specific attributes are valid for old browsers - but I wouldn't concern yourself with them - you will likely get very few "challenged ones"! ;)

[idocs.com...]

6:18 pm on Oct 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 31, 2002
posts:880
votes: 0


'Display the text in question on two of the three pages as images'
I did exactly that for 200 pages on a site. The only problem I had was that the graphic text was not as clear as I would have liked. There were no apparent duplicate problems.
As far as percentages are concerned, I seem to recollect Brett mentioning a figure of about 15% some time ago. Can anyone confirm?
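Nobody in the thread has hard numbers, and Google has never published a threshold; but if you want to estimate the overlap between two of your own pages, a rough word-shingle comparison is easy to script. The sketch below is purely illustrative - it is not Google's actual duplicate detection:

```javascript
// Rough duplicate-content estimate via word shingles (n-word sequences).
// NOT Google's algorithm - just a way to eyeball overlap between pages.
function shingles(text, n) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; i + n <= words.length; i++) {
    set.add(words.slice(i, i + n).join(" "));
  }
  return set;
}

// Jaccard similarity of the two pages' shingle sets, as a percentage.
function overlapPercent(textA, textB, n = 3) {
  const a = shingles(textA, n);
  const b = shingles(textB, n);
  let common = 0;
  for (const s of a) if (b.has(s)) common++;
  const union = a.size + b.size - common;
  return union === 0 ? 0 : (100 * common) / union;
}

const pageA = "widgets are great and widgets are cheap here at our store";
const pageB = "widgets are great and widgets are cheap at the other shop";
console.log(overlapPercent(pageA, pageB).toFixed(1) + "% shingle overlap");
```

Running something like this over visible text (or over the full HTML source, per the later discussion) at least gives you a consistent number to compare pages with.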
6:39 pm on Oct 8, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


As far as percentages are concerned, I seem to recollect Brett mentioning a figure of about 15% some time ago. Can anyone confirm?

I missed this the first time around... this is controlled by the number of link generations between pages and by the number of pages duplicated.

Examples:

If you have two pages (an original and a carbon copy) linked to from the same adjacent pages, 15%-20% of the content can be duplicated (this includes site nav bars - which accounts for the extra 5% on top of Brett's suggestion).

If the pages (an original and a carbon copy) are not linked to from the same page, then the more links between them (noting the shortest path), the more you can duplicate.

Stress: the number of duplicate pages is a major factor - Google works on pattern recognition, and the more of a pattern it can profile, the greater the risk of a penalty.

I tested 100% duplication on 10 pages - with 3 link generations between each, all survived (this also explains why external mirrors / affiliates last - there is no direct linkage).

Unfortunately a client wanted to duplicate 15K pages (one for each city), adding random content at about 20%. Google ranked them all for 4 months, then killed 14,800 of them.

For argument's sake though - there is no legitimate reason to duplicate other than as a short-cut (to save time), and you should always go back and rewrite duplicate pages, first and foremost; it's a better strategy, and if for no other reason, because penalties cannot be easily identified, and in the long term that "short cut" will become a major liability.

7:50 pm on Oct 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 6, 2004
posts:9
votes: 0


Thanks for all that information fathom.

Just a quick question: When people talk about duplicate content, is this working on visible text only, or the complete html source of the page?

Just something I've never been sure about.

Thanks

7:54 pm on Oct 8, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:June 11, 2002
posts:568
votes: 0


If someone steals and copies your entire site, and then gets a couple of good links and suddenly ranks higher than the original, what can you do?

I built a directory, and a customer buying traffic stole a copy of the whole site. I did a % check on 2 of the pages in question; they were roughly 50% the same.

My original site was launched in July 2003; the copy of the site appeared around December 2003. Our Google ranking vanished around March 2004. The copy of our site sits pretty in Google.

Our site has shown a PR of 5 since November 2003, but our pages are just nowhere to be seen. They are in the Google index; they just rank at around 850th position.

Any advice?

10:30 pm on Oct 8, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


Just a quick question: When people talk about duplicate content, is this working on visible text only, or the complete html source of the page?

Bots are blind in a graphical sense... they rely on reading code. Therefore duplication tends to be a combination of everything between <body> and </body>.

This most often means "text copy", but a page made solely of images can gain a dup penalty as well.

You can edit things like nav bars (e.g. one page has a left nav bar and another has a right nav bar), and because "the code" arrangement is different, that reduces the chance of penalties on purely duplicated text... but there are no firm rules in this grey area.

10:42 pm on Oct 8, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


If someone steals and copies your entire site, and then gets a couple of good links and suddenly ranks higher than the original, what can you do?

I built a directory, and a customer buying traffic stole a copy of the whole site. I did a % check on 2 of the pages in question; they were roughly 50% the same.

My original site was launched in July 2003; the copy of the site appeared around December 2003. Our Google ranking vanished around March 2004. The copy of our site sits pretty in Google.

Our site has shown a PR of 5 since November 2003, but our pages are just nowhere to be seen. They are in the Google index; they just rank at around 850th position.

Any advice?

Copyright infringement is the greater issue here, and if you take care of that you will likely fix your ranking problem - provided nothing else is creating a negative effect.

Google offers some exceptional advice on the Digital Millennium Copyright Act (DMCA) - read it, follow through.

[google.com...]

Additionally, a helpful tool will make it a little easier to be precise in your complaint.

[copyscape.com...]

8:36 am on Oct 9, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:June 11, 2002
posts:568
votes: 0


>fathom

The DMCA does not quite do it for me. The site in question is a directory, and I do not own the copyright on 10,000's of business names and addresses.

My point is that Google does not seem to look at which site came first.

11:27 am on Oct 9, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 6, 2003
posts:125
votes: 0


I use JavaScript to display common text.

If you use server-side scripting, you could keep the text in a normal text file, read it server-side, then output it either as HTML or as JavaScript depending on the page that calls it.

If you wanted to, you could also have a link that displays just the text, for sight-impaired or mobile phone users.
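A minimal sketch of the server-side half of that idea, written here as Node-style JavaScript (the function names are hypothetical, and the common text is assumed to be already loaded into a string): the same text can be emitted as plain HTML for one page, or wrapped in document.write() for pages that should keep it out of the crawled copy, since 2004-era crawlers generally did not execute JavaScript.

```javascript
// Serve one block of common text either as HTML or as a JS include.
// Function names are hypothetical examples.

// Escape the text for safe embedding in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the common text as an HTML paragraph...
function asHtml(text) {
  return "<p>" + escapeHtml(text) + "</p>";
}

// ...or as a script that pages can pull in via <script src="...">,
// escaping double quotes so the string literal stays valid.
function asJs(text) {
  const html = asHtml(text).replace(/"/g, '\\"');
  return 'document.write("' + html + '");';
}

const common = "This paragraph appears on several pages.";
console.log(asHtml(common)); // plain HTML version
console.log(asJs(common));   // JS-include version
```

The page that should "own" the text in Google's eyes gets the HTML version; the others include the JS version, so their visible content is identical to the user but different to the robot.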

8:08 pm on Oct 9, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


If I understand correctly, this is similar to DMOZ & the Google directory (but Google had licensed permission to use a copy).

In general, if you are the oldest copy, this isn't a problem.

8:59 pm on Oct 9, 2004 (gmt 0)

New User

10+ Year Member

joined:Oct 6, 2004
posts:9
votes: 0


Bots are blind in a graphical sense... they rely on reading code. Therefore duplication tends to be a combination of everything between <body> and </body>.

This most often means "text copy", but a page made solely of images can gain a dup penalty as well.

You can edit things like nav bars (e.g. one page has a left nav bar and another has a right nav bar), and because "the code" arrangement is different, that reduces the chance of penalties on purely duplicated text... but there are no firm rules in this grey area.

So, just to be clear, you're saying that the bots will include all of the HTML source in their duplication calculations, including all the <table><tr><td>... etc. code.

OK - at least that is now cleared up for me. Thanks.

1:56 pm on Oct 11, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fathom is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 5, 2002
posts:4110
votes: 109


Yes - and to be a bit more clear:

1. Two pages, right up to two websites, with 100% duplicate text will survive the "filters" if one is built with tables and the other is tableless (totally CSS).

2. Changing table column & row orientations will produce a similar effect, as will alternate div placement in CSS.

3. Note that main nav bars are duplicate content (the same on every page, whether we think of it that way or not), so they count toward a certain percentage of "fair dup"... you can easily see this at news services that share duplicate stories - but whose navigation is often totally different.

...and you can truly see the extent of duplication you can get away with in directories that have pre-developed categories but no listings displayed for long periods - the naming conventions (off-page factors) bias the pages "differently" (KEY POINT), whereas intentional duplication designed for manipulation gets "linked to" and biased "similarly"... thus a penalty.

 
