
Forum Moderators: goodroi


Google not respecting robots.txt

Found over 500 pages in Google's index that are forbidden by robots.txt

   
5:34 pm on Jan 19, 2007 (gmt 0)

10+ Year Member



I can't believe this is true.

In robots.txt I disallowed crawlers from indexing one page even before that page existed on the site (3 months ago). Today I found out that these pages can be found in Google's index. At first I thought there was a problem with my robots.txt rules, so I took the destination URL of the page from the SERP and pasted it into the Google Webmaster Tools checker, which tests URLs against the robots.txt file on the web server. The result was: "Blocked by line 2"!

Has anyone experienced something like this?
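The same kind of check the Webmaster Tools tester performs can be reproduced offline with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not Google's own matcher: the page name `/secret-page.html` and the host `www.example.com` are hypothetical, and the stdlib parser only does plain prefix matching (it does not expand `*` wildcards inside paths). It answers only whether a crawler that obeys the rules may fetch the URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the blocked page below is an assumption,
# not the poster's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret-page.html
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may not fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/secret-page.html",
                "https://www.example.com/other-page.html"):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

A single-file rule like this blocks that exact path (and anything that starts with it) under the stdlib parser, so a "Blocked" result here only tells you the rule itself matches.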

12:06 am on Jan 22, 2007 (gmt 0)

WebmasterWorld Senior Member quadrille is a WebmasterWorld Top Contributor of All Time 10+ Year Member



No direct experience, but I've heard tales of this happening when the URL has incoming links from other sites.

And if the site is dynamic, there may be other routes to the same page.

Xenu might be your friend?

1:56 am on Jan 22, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What are you finding in the Google index? Is Google listing the URL only, or is it also listing a title and snippet?

1:59 am on Jan 22, 2007 (gmt 0)

10+ Year Member



Maybe, just maybe, Google only obeys robots.txt for whole directories and not for individual files.

5:54 am on Jan 22, 2007 (gmt 0)

10+ Year Member



I'm not sure about Googlebot, but the robots.txt checker in Webmaster Tools is case-sensitive. Blocking /bob* in your robots file will not block /Bob* in the tester. This may be common knowledge, but it stumped me the other day. Our silly purchased cart software had mixed case all over the place...
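The case-sensitivity point can be illustrated with Python's standard-library `urllib.robotparser`, which compares paths as case-sensitive prefixes. This is a sketch under assumptions, not how Googlebot or the Webmaster Tools tester are implemented: the cart paths and host are hypothetical, and the rule is written as `/bob` rather than `/bob*` because the stdlib parser does not expand `*` wildcards (with prefix matching the trailing `*` is redundant anyway).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modeled on the post: block everything under /bob.
# Written as "/bob" because urllib.robotparser does plain prefix matching
# and treats "*" inside a path literally.
ROBOTS_TXT = "User-agent: *\nDisallow: /bob\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Only the lowercase path matches the case-sensitive prefix "/bob".
for path in ("/bob/cart.html", "/Bob/cart.html", "/BOB/cart.html"):
    url = "https://www.example.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
# prints:
# /bob/cart.html blocked
# /Bob/cart.html allowed
# /BOB/cart.html allowed
```

If software generates mixed-case URLs, one option is to list each case variant as its own Disallow line, since a rule for one spelling does not cover the others.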
 
