This post refers to this older post [webmasterworld.com].
I can think of 3 main approaches:
1. Including the duplicate content in an external JS file, assigning it to variables, and writing it into some divs with innerHTML (old style).
2. Using XMLHttpRequest (GET) to retrieve the data in XML format and then putting it into the page (rough sketch below).
3. Doing an Ajax POST and retrieving the XML content that way (slower, as the page gets processed twice by the server, but it seems the safest because, in my research, Google cannot do POST requests).
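Roughly what I have in mind for #2, just as a sketch; the URL, the XML structure and the div id are invented for the example, and older IE would need new ActiveXObject("Microsoft.XMLHTTP") instead of XMLHttpRequest:

var xhr = new XMLHttpRequest();
xhr.open("GET", "/shared/boilerplate.xml", true); // invented URL for the shared block
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        // assumes the response looks like <content>...shared text...</content>
        var node = xhr.responseXML.getElementsByTagName("content")[0];
        document.getElementById("boilerplate").innerHTML = node.firstChild.nodeValue;
    }
};
xhr.send(null);

#3 would look the same except for opening with "POST" and passing the request parameters to send().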
A 4th method would be to use an encryption that Google can't break (I would know how to do that), but I am reluctant to cheat, or even to appear to be cheating, as I care about my site... so I don't want to use it.
I think this is a very important and complex topic. What do you think is the best approach to get rid of the duplication? Do you think Google could get suspicious about any of these approaches? I don't mind Google reading the content, but I'd like it to give that text less weight than the unique text on the page.
Please give me your opinion even if you are not sure or haven't tested it; I would like to hear all opinions, and maybe we can find a good answer.
I'm inclined to favour method #2 above, simply because it'd be easy to tuck your content into a CMS or into plain *.html files, and loading the content wouldn't take much longer than loading other dependent files like images, *.js and *.css files.
However, I suspect (without proof) that bots go trawling for any URLs found within scripts. For instance, if I happen to inject this script onto a page:
var dummy = "http://www.example.com/dummypage.html"
My theory is that Googlebot will visit that URL and try to index it, even though it's just a string in a JS variable, never actually used for anything.
'Twould be a simple hypothesis to test...
A fifth option is to use iframes, the contents of which theoretically wouldn't "count" as being on the same URL as the page that contains them. Again, another untested hypothesis. There are people who specialize in this kind of myth debunking; I'm disappointed that none of them are contributing here...
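Untested sketch of what that would look like (the URL is invented for the example); the repeated block lives at its own address and gets pulled into the host page:

<iframe src="http://www.example.com/boilerplate.html" width="600" height="200" frameborder="0" scrolling="no"></iframe>

In theory, whatever is inside boilerplate.html then gets indexed against that URL rather than against every page that frames it.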
My native language is Spanish, not English... do I understand correctly that you are trying to include duplicate content while keeping Google from seeing it?
So, you're suggesting good old cloaking... and possible penalties. Number 2 has my vote.
|use iframes, the contents of which theoretically wouldn't "count" as being on the same URL as the page which contains it |
I'm using this method on a few sites.
Iframes are treated similarly to links from the host page to the page displayed. Content is indexed only for the URL in the frame. You can use the iframe on as many pages as you want, and it'll promote the importance of its target; on my sites the framed URL even has PageRank with no actual links pointing to it.
And once the content is separated this way, you can decide whether or not to add noindex or nocache to it if you want... I haven't done this, but it only makes sense that it would work as it does on any other URL.
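For the noindex/nocache part it would just be the usual meta tag in the head of the framed page, something like this (noarchive being the usual name for the nocache idea with Google):

<meta name="robots" content="noindex, noarchive">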
I use method #6 - I don't have or use potentially duplicate content on my site.
I guess the fact that I've never seen (or more accurately, never noticed) these in SERPs might be evidence to the contrary, but just because something doesn't rank well doesn't mean it wouldn't throw a wrench into my SEO.
If you had content in external JS files, could you exclude your js directory in robots.txt?
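Something like this, assuming the scripts all live under a /js/ directory:

User-agent: *
Disallow: /js/

Though that only keeps well-behaved bots from fetching the scripts; the content is still there for anyone who requests the files directly.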
Where is the danger of duplicate content when the bot can't even render the pages?
I'd welcome comments from the experts...