I have a site that contains only articles / white papers. Each article also contains a printer friendly and PDF version. This is of course the same content but without ads, navigation, or branding information.
Do you guys exclude these from search engines on the premise that the actual article will show up for the search, or do you allow these to be indexed?
Also, have you had problems with people linking to the non-branded content?
Thanks in advance
Mitchell
Do you guys exclude these from search engines on the premise that the actual article will show up for the search, or do you allow these to be indexed?
As Jalinder points out above, the best way to keep those pages out of the index is to use the Robots META tag with noindex:
<meta name="robots" content="noindex">
On a side note, once I picked up on CSS a few years ago, I stopped making duplicate pages for print versions. Now we utilize a print stylesheet and apply classes to blocks of content that should not print. For example, in my CSS...
.none{display:none;}
And then in my content that I don't want to print...
<div class="none"></div>
There is much that can be done using CSS and print stylesheets. There are plenty of topics here at WebmasterWorld to review on this subject.
Print Style Sheets [google.com]
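To make the print stylesheet idea a bit more concrete, here is a minimal sketch of a single page that serves both screen and print. The class name none is the one from the example above; the page content, and the choice of an inline style block with media="print" rather than a separate linked file, are assumptions for illustration only.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example article</title>
  <!-- Rules in this block apply only when the page is printed -->
  <style media="print">
    .none { display: none; }
  </style>
</head>
<body>
  <!-- Shown on screen, dropped from the printout -->
  <div class="none">Site navigation, ads, and branding (placeholder)</div>

  <!-- Shown on screen and printed as usual -->
  <div>
    <h1>Article title (placeholder)</h1>
    <p>Article text (placeholder).</p>
  </div>
</body>
</html>
```

With something like this in place there is only one URL per article, so there is no separate print page for a spider to find in the first place.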
Would you also use the nofollow attribute? If you have noindex in the meta robots tag but not nofollow, will the links on that page be spidered, i.e. will the dupe PDFs get indexed?
Is there a way to embed noindex metadata into a PDF? Or indeed, does noindex act as a barrier?
Completely agree about the use of CSS for on and off screen media. Made the same comment myself this morning - also applies to text only versions and low graphics versions.
Would be interested to know if mixed HTML and Flash sites could be seen as duplicate content too (if there is embedded HTML in the .swf).
Would you also use the nofollow attribute? If you have noindex in the meta robots tag but not nofollow, will the links on that page be spidered, i.e. will the dupe PDFs get indexed?
I would have used the nofollow up until about 2 weeks ago when someone I know and trust on matters related to indexing shared a little insight with me.
If there is a link anywhere on the web to those documents you don't want indexed, Googlebot is going to follow it. And so will other spiders. You can drop it in there for good measure but I don't think it will produce the desired result.
<meta name="robots" content="noindex, nofollow">
I figure if I can keep the bot from indexing that page, it can follow whatever it wants.
You can also use the new <a rel="nofollow"> link relationship attribute on those links if you really want to get granular, as encyclo would say. ;)
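Putting those two pieces together might look like the sketch below. The file name print-version.html and the link text are hypothetical; the meta tag and the rel attribute are the ones discussed above. How any particular spider treats them is not something this sketch can guarantee.

```html
<!-- In the <head> of the print-friendly page
     (hypothetical file: print-version.html).
     Asks spiders not to index this page; links on it may still be followed. -->
<meta name="robots" content="noindex">

<!-- In the main article: the link to the print version, marked nofollow -->
<a href="print-version.html" rel="nofollow">Print this page</a>
```

The noindex on the print page is the part doing the real work here. The rel="nofollow" on the link is the optional, more granular extra, since a spider can still reach the print page through any other link on the web.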
But in the current update the site has lost many rankings, probably because of a penalty. That's why I wanted to know whether print versions are considered duplicates. Thanks for the info.
How about using <a rel="nofollow"> on the "Print this page" link from the main/graphical article? Or will linking with nofollow affect the ranking of the main article?
In this case I'm worried about a Google penalty, even though I think it would help viewers to have both indexed.
I'm still not sure if one part of the question has been answered. Does Google consider PDF and HTML files with the same content to be duplicates?