
Forum Moderators: goodroi

Google not respecting robots.txt

Found >500 pages in the Google index that are forbidden by robots.txt

5:34 pm on Jan 19, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 29, 2002
votes: 0

I can't believe this is true.

In robots.txt I disallowed crawlers from indexing one page even before that page existed on the site (3 months ago). Today I found out that the pages can be found in the G index. At first I thought this was a problem with my robots.txt instructions, but I took the destination URL of the page from the SERP and pasted it into the Google Webmaster tool that tests URLs against the robots.txt file from the web server. The result was: Blocked by line 2!

Anyone experienced something like that?
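
For anyone who wants to repeat that kind of test locally, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_blocked` helper, and the example.com URLs are all made up for illustration; the real file was not posted in this thread, and this parser is not Google's tester.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the actual file from the site is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Placeholder URLs on example.com, not the poster's site.
page_blocked = is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/private-page.html")
index_blocked = is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html")

print("private page blocked:", page_blocked)
print("index page blocked:", index_blocked)
```

A check like this only tells you whether a compliant crawler is allowed to fetch the URL. It does not tell you what a search engine has chosen to list.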

12:06 am on Jan 22, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member quadrille is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 22, 2002
votes: 0

No direct experience, but I've heard tales of this happening when the URL has incoming links from other sites.

And if the site is dynamic, it may be that there are other routes to the same page.

Xenu might be your friend?

1:56 am on Jan 22, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
votes: 335

What are you finding in the Google index? Is Google listing the URL only, or is it also listing a title and snippet?

1:59 am on Jan 22, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 7, 2001
votes: 0

Maybe, just maybe Google only obeys robots.txt for whole directories and not individual files.

5:54 am on Jan 22, 2007 (gmt 0)

Junior Member from US 

10+ Year Member

joined:Mar 16, 2004
votes: 0

I'm not sure about Googlebot, but the robots.txt checker in Webmaster Tools is case-sensitive. Blocking /bob* in your robots.txt file will not block /Bob* in the tester. May be common knowledge, but it stumped me the other day. Silly purchased cart software had mixed cases all over the place...
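
The case issue is easy to reproduce outside the tester. This is a rough sketch with Python's standard `urllib.robotparser`, which does plain prefix matching on the path, so the rule is written as `/bob` rather than `/bob*` (this parser does not treat `*` in a path as a wildcard). The rule, the `may_fetch` helper, and the example.com URLs are assumptions for illustration only, and the behavior shown is that of the Python parser, not necessarily Googlebot's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule modeled on the /bob example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bob
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str) -> bool:
    """Return True if a generic crawler may fetch the path on a placeholder host."""
    return parser.can_fetch("*", "http://www.example.com" + path)


lower_allowed = may_fetch("/bob-cart.html")  # path matches the rule's case
upper_allowed = may_fetch("/Bob-cart.html")  # same path, capital B

print("/bob-cart.html allowed:", lower_allowed)
print("/Bob-cart.html allowed:", upper_allowed)
```

With mixed-case URLs like the cart software produces, each case variant would need its own Disallow line under this kind of matching.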