Forum Moderators: open
Display the text in question on two of the three pages as images. It seems so simple, but I've never seen anyone propose it. Would this work?
Perhaps there's a good reason I've never seen this mentioned and I'm just missing something obvious :-)
This used to be so, but these days Google is aggressively dropping pages from the main index. So if the two pages have sufficient duplicate content, one or both may be dropped.
I also have direct experience of pages with common content where, for a particular query, Google brings up the wrong page (from my point of view). This happens when the common content contains, say, keyword X, which doesn't appear in the unique content of page B but is used a few times in the subject matter of page A. When querying keyword X, Google brings up page B.
So there is some use for the proposed method of hiding common content. You might call it reverse cloaking: you want the user to see something but hide it from the robots!
Put the common content on a separate page C; then in A and B, insert an iframe:
<iframe src="C-url"></iframe>
The C content will then get loaded at the iframe location. Furthermore, robots will consider C a separate page altogether, so it will not be indexed as part of A or B.
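If you also want to keep C itself out of the index (not just out of A and B), a robots meta tag in C's source is the usual route. A minimal sketch, not something stated in the thread:

```html
<!-- in the <head> of the C page: keep the shared block out of the index entirely -->
<meta name="robots" content="noindex">
```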
Note: the guide's recommendations against using specific attributes are valid for old browsers, but I wouldn't concern yourself with them; you will likely get very few "challenged ones"! ;)
[idocs.com...]
As far as percentages are concerned, I seem to recollect Brett mentioning a figure of about 15% some time ago. Can anyone confirm?
I missed this the first time around... this is controlled by the number of link generations between pages and the number of pages duplicated.
Examples:
If you have two pages (an original and a carbon copy) linked to from the same adjacent pages, 15%-20% of the content can be duplicated (this includes site nav bars, which account for roughly the extra 5% over Brett's suggestion).
If the pages (an original and a carbon copy) are not linked to from the same page, then the more links between them (counting the shortest path), the more you can duplicate.
To stress: the number of duplicate pages is a major factor. Google works on pattern recognition, and the more of a pattern it can profile, the greater the risk of a penalty.
I tested 100% duplication on 10 pages, each 3 link generations away from the others; all survived (which also explains why external mirrors/affiliates last: no direct linkage).
Unfortunately, a client wanted to duplicate 15K pages (one for each city), adding about 20% random content. Google ranked them all for 4 months, then killed 14,800 of them.
For argument's sake, though: there is no legitimate reason to duplicate other than as a "short cut" (saving time), and you should always go back and rewrite duplicate pages. First and foremost, it's a better strategy; and if for no other reason, penalties cannot be easily identified, and in the long term that "short cut" will become a major liability.
I built a directory, and a customer buying traffic stole a copy of the whole site. I did do a % check on 2 of the pages in question; they were roughly 50% the same.
My original site was launched in July 2003, the copy of the site appeared around December 2003. Our Google rating vanished around March 2004. The copy of our site sits pretty in Google.
Our site has shown a PR of 5 since November 2003, but our pages are just nowhere to be seen. They are in the Google index; they just rank at something like 850th position.
Any advice?
Just a quick question: when people talk about duplicate content, is this based on visible text only, or on the complete HTML source of the page?
Bots are blind in a graphical sense; they rely on reading code. Therefore duplication tends to be a combination of everything between <body> and </body>.
This is most often "text copy", but a page made solely of images can gain a dup penalty as well.
You can edit things like nav bars (e.g. one page has a left nav bar, another a right nav bar); because "the code" arrangement is different, it reduces the chance of penalties on purely duplicated text... but there are no firm rules in this grey area.
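For anyone wanting to run the kind of rough "% check" mentioned earlier in the thread, here is a quick sketch in Python. The sample strings are invented, and difflib's ratio is only an approximation of whatever Google actually measures:

```python
# Rough similarity check between two pages' HTML source.
# difflib's SequenceMatcher gives a 0.0-1.0 ratio over the raw strings,
# which is a crude but serviceable stand-in for a "% dup" figure.
from difflib import SequenceMatcher

def dup_percentage(html_a: str, html_b: str) -> float:
    """Return a rough duplication percentage for two HTML strings."""
    return SequenceMatcher(None, html_a, html_b).ratio() * 100

# Hypothetical near-duplicate pages (placeholder markup, not real sites)
page_a = "<body><p>Widgets for sale in London</p></body>"
page_b = "<body><p>Widgets for sale in Leeds</p></body>"
print(f"{dup_percentage(page_a, page_b):.0f}% similar")
```

Note that comparing the full source (tags and all) matches the point above: nav bars and table markup count toward the percentage, not just the visible text.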
If someone steals and copies your entire site, and then gets a couple of good links, and suddenly rates higher than the original copy, what can you do?
Copyright infringement is the greater issue here, and if you take care of that, you will likely fix your ranking problem, provided nothing else is creating a negative effect.
Google offers some exceptional advice on the Digital Millennium Copyright Act (DMCA); read it and follow through.
[google.com...]
Additionally, a helpful tool will make it a little easier to be precise in your complaint.
[copyscape.com...]
If you use server-side scripting, you could put the text in a normal text file, read it server-side, and then display it either as HTML or as JavaScript, depending on the page that calls it.
If you wanted to, you could also have a link that displays just the text, for sight-impaired or mobile phone users.
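A minimal sketch of that server-side idea in Python (the function name and sample text are illustrative, not from the thread). The shared text lives in one place; pages that should "hide" it from robots request the JavaScript version, which writes the text client-side, while other pages get plain HTML:

```python
# Render a shared block of text either as plain HTML or as a
# document.write() script, depending on which page is calling.
# Robots that don't execute JavaScript won't see the scripted version.
import html
import json

def render_shared(text: str, as_js: bool) -> str:
    escaped = "<p>" + html.escape(text) + "</p>"
    if as_js:
        # json.dumps produces a safely quoted JavaScript string literal
        return f"document.write({json.dumps(escaped)});"
    return escaped

shared = "Same boilerplate paragraph used on several pages."
print(render_shared(shared, as_js=False))  # HTML for pages that may be indexed
print(render_shared(shared, as_js=True))   # JS for pages hiding the dup block
```

The text-only link mentioned above would simply serve the `as_js=False` output on its own page.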
So, just to be clear, you're saying that the bots will include all of the HTML source in their duplication calculations, including all the <table><tr><td>... etc. code.
OK - at least that is now cleared up for me. Thanks.
1. Two pages, right up to two entire websites, with 100% duplicate text will survive "filters" if one uses tables and the other is tableless (totally CSS).
2. Changing table column and row orientations will produce a similar effect, as will alternating div placement in CSS.
3. Note that a main nav bar is duplicate content (the same on every page, whether we think of it that way or not); it counts toward a certain percentage of "fair dup"... and you can easily see this at news services that share duplicate stories but whose navigation is often totally different.
...and you can truly see the extent of duplication you can get away with in directories that have pre-developed categories but display no listings for long periods: the naming convention (off-page factors) biases the pages "differently" (KEY POINT), whereas intentional duplication designed for manipulation links to and biases pages "similarly"... thus a penalty.
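Point 1 can be illustrated with a trivial fragment (the copy is invented). The visible text is identical, but the markup around it, and therefore the source a bot reads, is quite different:

```html
<!-- table-based version of the page -->
<table>
  <tr><td class="content">Widget specs, prices and delivery terms.</td></tr>
</table>

<!-- tableless (CSS) version of the same copy -->
<div class="content">Widget specs, prices and delivery terms.</div>
```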