
Avoid indexing print page

How do I force Google not to include the print page?

iwannano1

7:34 pm on Jan 14, 2007 (gmt 0)

10+ Year Member



Hi

Almost all of my articles have a print button that opens a new URL:
http://example.com/article.php/123455 -> article
http://example.com/article.php/123455/print -> print page

However, Google and Yahoo give higher preference to the print page URL http://example.com/article.php/123455/print, so most users land on the print page, which has no ads.

How do I force Google not to include the print page? With the robots.txt file? With webserver redirection?

I cannot change the site's URL structure, as it is linked from various other web pages.

TIA

[edited by: pageoneresults at 8:22 pm (utc) on Jan. 14, 2007]
[edit reason] Examplified URI References [/edit]

Robert Charlton

8:57 pm on Jan 14, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'd use the robots meta tag in the head section of each print page...

<meta name="robots" content="noindex, nofollow">

The robots meta tag seems to provide the best insulation from Google. In theory, it stops Google from indexing a page as well as references (links) to it.

If you use robots.txt to block the print directory, Google will still spider and index references it finds that lead to the print pages, but it won't spider the pages themselves. For this reason, you should not use both robots.txt and the robots meta tag. If you use both, Google will see the robots.txt first and won't spider the pages. Thus, it won't see the robots meta tag, and you may end up with the links to your print pages indexed.
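The ordering argument above can be modeled in a few lines of Python. This is a toy sketch, not how Googlebot actually works: it assumes a crawler that consults robots.txt first (using the standard-library urllib.robotparser) and reads a page's robots meta tag only if it is allowed to fetch the page. The function names, the "Googlebot" user-agent string and the single-URL Disallow rule are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Compare the name case-insensitively so "Robots" also matches.
        if (attrs.get("name") or "").lower() != "robots":
            return
        for token in (attrs.get("content") or "").split(","):
            self.directives.add(token.strip().lower())


def crawl_decision(robots_lines, url, html, agent="Googlebot"):
    """Toy crawler: check robots.txt first, then (if allowed) the meta tag."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    if not rp.can_fetch(agent, url):
        # The page is never fetched, so its meta tag is never seen.
        # Per the post above, the bare URL may still be listed from links.
        return "blocked by robots.txt: meta tag unseen, URL may still be listed"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "fetched: noindex seen, page dropped"
    return "fetched and indexed"


if __name__ == "__main__":
    url = "http://example.com/article.php/123455/print"
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    # Hypothetical rule that blocks this one print URL.
    both = ["User-agent: *", "Disallow: /article.php/123455/print"]
    print(crawl_decision(both, url, page))  # robots.txt plus meta tag
    print(crawl_decision([], url, page))    # meta tag only
```

In the first call the crawler stops at robots.txt and never learns about the noindex directive; in the second it fetches the page, sees noindex, and drops it, which is the behavior the meta-tag-only approach relies on.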

[edited by: Robert_Charlton at 9:02 pm (utc) on Jan. 14, 2007]

iwannano1

10:30 pm on Jan 14, 2007 (gmt 0)

10+ Year Member



Hello,

Thanks for the tip.

My final solution is as follows:

robots.txt

User-agent: *
Disallow: /print.php

OK, I have added the following tag to the print.php header file:

<meta name="Robots" content="noindex" />

The print link on each article page (for example, article.php/12555/) is as follows:


<a href="/print.php?id=12555" title="Print This Page" rel="nofollow">Print This Page</a>
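If the Disallow line is kept, it is worth confirming that it is actually in effect. Under the robots exclusion convention, Disallow lines belong to a group that starts with a User-agent line, and Python's standard-library urllib.robotparser ignores a Disallow line that has no User-agent line above it. The sketch below uses that parser to test both forms against the print and article URLs from this post; the helper name blocked() and the "Googlebot" agent string are assumptions for illustration, and real crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_txt (the file's text) disallows url for agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(agent, url)


# A Disallow line on its own, with no User-agent line above it.
WITHOUT_AGENT = "Disallow: /print.php\n"
# The same rule inside a group that applies to every crawler.
WITH_AGENT = "User-agent: *\nDisallow: /print.php\n"

PRINT_URL = "http://example.com/print.php?id=12555"
ARTICLE_URL = "http://example.com/article.php/12555/"

if __name__ == "__main__":
    for name, rules in [("without User-agent", WITHOUT_AGENT),
                        ("with User-agent", WITH_AGENT)]:
        print(name,
              "| print blocked:", blocked(rules, PRINT_URL),
              "| article blocked:", blocked(rules, ARTICLE_URL))
```

Because robots.txt rules match by path prefix, "/print.php" also covers "/print.php?id=12555", while the article URLs stay crawlable.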

I hope this will now stop Google and Yahoo. Please comment on this final solution.

My question is: should I remove the Disallow: /print.php line from the robots.txt file?

Robert Charlton

1:55 am on Jan 15, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



My question is: should I remove the Disallow: /print.php line from the robots.txt file?

iwannano1 - I feel that the meta robots tag by itself is the better solution.

If you're going to use the meta robots tag, you should not also use the robots.txt disallow, for the reasons I describe in my post. I suggest you reread it.

agerhart

5:56 pm on Jan 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could also try using a non-search-engine-friendly link format for those print links so that spiders don't crawl them.