Forum Moderators: Robert Charlton & goodroi

Google indexing Drupal's /comment/reply/* despite robots.txt

smithaa02

4:24 pm on Jul 12, 2012 (gmt 0)

10+ Year Member



The default Drupal robots.txt clearly blocks this path:

Disallow: /comment/reply/

Yet Google has indexed a LARGE number of these comment pages. This seems to be a widespread problem, and I'm not the only one dealing with it. Any ideas for fixes?

Some have suggested that robots.txt only prevents pages from being crawled, not from being indexed. If so, what is the solution? Putting meta noindex tags into the templates would be a pain. Are there any .htaccess or robots.txt tricks for keeping Google out of these pages?

g1smd

6:36 pm on Jul 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What User-agent directive precedes that line?

smithaa02

7:15 pm on Jul 12, 2012 (gmt 0)

10+ Year Member



User-agent: *

netmeg

7:38 pm on Jul 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is Google indexing new comment pages as well? Or could the comment pages already have been in G before it processed the robots.txt? (Probably unlikely if that's the default, but not every CMS actually installs a robots.txt by default, so one has to ask.)

aakk9999

7:48 pm on Jul 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt does not stop indexing; it stops crawling. If the comment pages that are in the index show only a "Google made up" title in the SERPs, that would indicate Google is honouring robots.txt.

Another thing to check is whether you have a separate group for the Google user agent in your robots.txt. If you do, Google will ignore the "User-agent: *" group and follow only its own group. In that case, unless /comment/reply/ is also disallowed under the Google user agent, robots.txt will not block the comment pages for Google.
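
A rough way to check the user-agent point locally is Python's standard urllib.robotparser module. This is only a sketch: the robots.txt contents, the /admin/ rule, the "OtherBot" name, and the can_fetch helper are made up for illustration and are not taken from the poster's site. Python's parser is also not Google's parser, so treat the result as an approximation of group selection, not as proof of what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the catch-all group blocks the comment reply
# pages, but a separate Googlebot group exists and does not repeat the rule.
ROBOTS_WITH_GAP = """\
User-agent: *
Disallow: /comment/reply/

User-agent: Googlebot
Disallow: /admin/
"""

# Same file with the rule repeated inside the Googlebot group.
ROBOTS_FIXED = """\
User-agent: *
Disallow: /comment/reply/

User-agent: Googlebot
Disallow: /admin/
Disallow: /comment/reply/
"""


def can_fetch(robots_txt, agent, path):
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    path = "/comment/reply/123"
    for label, text in (("with gap", ROBOTS_WITH_GAP), ("fixed", ROBOTS_FIXED)):
        for agent in ("Googlebot", "OtherBot"):
            print(label, agent, can_fetch(text, agent, path))
```

In the first file the crawler that has its own group is allowed to fetch the reply page, while a crawler without one falls back to the "User-agent: *" group and is blocked. Repeating the Disallow line inside the specific group closes the gap. As noted above, this only affects crawling, not whether an already-known URL appears in the index.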