

Printer Friendly and PDF Version

Do you exclude in robots.txt?


mrowton

7:45 pm on Feb 4, 2005 (gmt 0)

10+ Year Member



Hello all,

I have a site that contains only articles and white papers. Each article also has a printer-friendly version and a PDF version. These are of course the same content, but without ads, navigation, or branding information.

Do you guys exclude these from search engines on the premise that the actual article will show up for the search, or do you allow these to be indexed?

Also, have you had problems with people linking to the non-branded content?

Thanks in advance

Mitchell
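The thread title asks about robots.txt, so here is a minimal sketch of that approach. It assumes, hypothetically, that the print and PDF copies live under /print/ and /pdf/ directories (the example.com URLs are placeholders too) and uses Python's standard urllib.robotparser to check which URLs a crawler that honors robots.txt would skip. Keep in mind that robots.txt controls crawling rather than indexing as such.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /print/ and /pdf/ are assumed locations for the
# printer-friendly and PDF copies, not paths taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /pdf/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler honoring these rules may fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.com/articles/widgets.html",
        "http://www.example.com/print/widgets.html",
        "http://www.example.com/pdf/widgets.pdf",
    ):
        # Disallow rules match by path prefix, so anything under /print/ or /pdf/ is skipped.
        print(url, "->", "crawl" if crawlable(rp, url) else "skip")
```

Because the rules match by prefix, keeping all print and PDF copies in their own directories lets two Disallow lines cover every article.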

Jalinder

2:25 pm on Feb 6, 2005 (gmt 0)

10+ Year Member



I also want to know if printer-friendly pages are considered duplicate content by search engines.

I noticed one well optimized site using <META NAME="ROBOTS" CONTENT="NOINDEX"> in the print version pages.

Hoping for some guidance from experts.
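Search engines don't publish how they detect duplicates, so this is only a generic illustration of one well-known technique, word shingling with Jaccard similarity, and not Google's actual method. It shows why a print version that drops the ads and navigation but keeps the article text still looks very close to the original page at the text level. The sample text and the 4-word shingle size are arbitrary assumptions.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(doc1: str, doc2: str, k: int = 4) -> float:
    """Shingle-based similarity between two documents."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


if __name__ == "__main__":
    body = ("printer friendly pages repeat the article text without ads "
            "navigation or branding so the words match")
    # The "full" page wraps the same body in made-up navigation and ad text.
    full_page = "home about contact " + body + " advertisement buy widgets now"
    print(similarity(body, body))       # 1.0
    print(similarity(full_page, body))  # 0.65
```

With a fixed amount of navigation and ad text, the longer the article body is, the closer the score between the full page and its print version gets to 1.0.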

pageoneresults

2:06 pm on Feb 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you guys exclude these from search engines on the premise that the actual article will show up for the search, or do you allow these to be indexed?

As Jalinder points out above, the best way to keep those pages out of the index is the Robots META tag with noindex:

<meta name="robots" content="noindex">
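As a rough illustration of what that tag tells a crawler, here is a minimal sketch using Python's standard html.parser module that collects the directives from any robots meta tag on a page. It is only a model of the idea, not how any particular search engine parses the tag; it treats the uppercase form Jalinder quoted and the lowercase form above the same way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the lowercased robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(robots_directives('<META NAME="ROBOTS" CONTENT="NOINDEX">'))
    # {'noindex'}
    print(sorted(robots_directives('<meta name="robots" content="noindex, nofollow">')))
    # ['nofollow', 'noindex']
```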

On a side note, once I picked up on CSS a few years ago, I stopped making duplicate pages for print versions. Now we use a print stylesheet and apply a class to blocks of content that should not print. For example, in my print CSS...

.none{display:none;}

And then in my content that I don't want to print...

<div class="none"></div>

There is much that can be done using CSS and print stylesheets. There are plenty of topics here at WebmasterWorld to review on this subject.

Print Style Sheets [google.com]

limbo

2:36 pm on Feb 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Pageone

Would you also use the nofollow attribute? If you have noindex in the meta robots tag but not nofollow, will the links on that page be spidered, i.e. will the dupe PDFs get indexed?

Is there a way to embed noindex metadata in a PDF? Or does noindex even act as a barrier?

Completely agree about the use of CSS for on-screen and off-screen media. Made the same comment myself this morning - it also applies to text-only and low-graphics versions.

Would be interested to know whether mixed HTML and Flash sites could be seen as duplicate content too (if there is embedded HTML in the .swf).

pageoneresults

2:50 pm on Feb 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Would you also use the nofollow attribute? If you have noindex in the meta robots tag but not nofollow, will the links on that page be spidered, i.e. will the dupe PDFs get indexed?

I would have used the nofollow up until about 2 weeks ago when someone I know and trust on matters related to indexing shared a little insight with me.

If there is a link anywhere on the web to those documents you don't want indexed, Googlebot is going to follow it. And so will other spiders. You can drop it in there for good measure but I don't think it will produce the desired result.

<meta name="robots" content="noindex, nofollow">

I figure if I can keep the bot from indexing that page, it can follow whatever it wants.

You can also add the new nofollow link relationship to those links if you really want to get granular, as encyclo would say. ;)

<a rel="nofollow">
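To make the link-level option concrete, here is a minimal sketch of a hypothetical crawler's link extractor that sorts links into followed and rel="nofollow" groups, using Python's standard html.parser. Real search engines may treat nofollow links differently, and the /articles/ and /print/ paths are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Split <a href> links into followed and rel="nofollow" groups.

    Models a hypothetical crawler that simply declines to queue nofollow
    links; it says nothing about what any real search engine does.
    """

    def __init__(self) -> None:
        super().__init__()
        self.follow: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: (value or "") for name, value in attrs}
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel_values = attributes.get("rel", "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.follow.append(href)


if __name__ == "__main__":
    page = """
    <a href="/articles/widgets.html">Read the article</a>
    <a href="/print/widgets.html" rel="nofollow">Print this page</a>
    """
    collector = LinkCollector()
    collector.feed(page)
    print(collector.follow)    # ['/articles/widgets.html']
    print(collector.nofollow)  # ['/print/widgets.html']
```

As pageoneresults notes, this only covers links on your own pages; a link to the print version from anywhere else on the web can still lead a spider to it.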

limbo

3:02 pm on Feb 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cheers Pageone

I had a feeling it would be optimistic to expect to stop Googlebot once it had decided to crawl a page. I like the idea of using the <a rel="nofollow"> attribute and getting a little more granular ;).

Jalinder

4:22 pm on Feb 7, 2005 (gmt 0)

10+ Year Member



We had all our print versions in Google's index, and a lot of them were bringing in traffic too, especially the ones where the main article was split across several pages.

But in the current update the site has lost many rankings, probably due to a penalty. That's why I wanted to know if print versions are considered duplicates. Thanks for the info.

How about using <a rel="nofollow"> on the "Print this page" link from the main (graphical) article? Or will linking with nofollow affect the ranking of the main article?

mrowton

1:37 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



I see several people searching for my particular keywords plus .pdf. If you publish long technical documents, people seem to trust PDFs a little more.

In this case I'm worried about a Google penalty, even though I think it would help visitors to have both indexed.

I'm still not sure one part of the question has been answered: does Google consider PDF and HTML files with the same content to be duplicates?