

Sitemaps, Meta Data, and robots.txt Forum

Google not respecting robots.txt
Found >500 pages in the Google index that are forbidden by robots.txt

 5:34 pm on Jan 19, 2007 (gmt 0)

I can't believe this is true.

In robots.txt I disallowed crawlers from one page even before that page existed on the site (3 months ago). Today I found out that those pages can be found in the Google index. At first I thought the problem was in my robots.txt instructions, but I took the destination URL of the page from the SERP and pasted it into the Google Webmaster Tools checker, which tests URLs against the robots.txt file on the web server. The result was: Blocked by line 2!
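
The same check can be reproduced locally with Python's standard urllib.robotparser. This is a minimal sketch assuming a hypothetical robots.txt whose line 2 disallows a hypothetical /private-page.html (the post does not name the real page or domain). Python's parser is not Googlebot, so a "blocked" result here only confirms the rule says what you meant, not how Google treated it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: line 1 names the crawler, line 2 is the rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""

def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("http://www.example.com/private-page.html",
                "http://www.example.com/public-page.html"):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```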

Anyone experienced something like that?



 12:06 am on Jan 22, 2007 (gmt 0)

No direct experience, but I've heard tales of this happening when the URL has incoming links from other sites.

And if the site is dynamic, there may be other routes to the same page.

Xenu might be your friend?
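
To illustrate the "other routes" point, here is a minimal sketch using only the Python standard library: it pulls every link out of one HTML page, resolves each against the page URL, and reports which linked URLs the robots.txt rules still allow. The page, its links, and the /index.php?p=private-page route are all hypothetical; on a real site you would feed it the pages gathered by a crawling link checker.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical rule blocking a single page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""

# Hypothetical page from a dynamic site: two links reach the same content.
PAGE_URL = "http://www.example.com/index.php"
PAGE_HTML = """
<a href="/private-page.html">direct link</a>
<a href="index.php?p=private-page">same content via a query string</a>
<a href="/about.html">about</a>
"""

class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def crawlable_links(page_url: str, html: str, robots_txt: str,
                    agent: str = "Googlebot") -> dict[str, bool]:
    """Map each linked URL to True if robots.txt lets `agent` fetch it."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return {url: rules.can_fetch(agent, url) for url in collector.links}

if __name__ == "__main__":
    for url, ok in crawlable_links(PAGE_URL, PAGE_HTML, ROBOTS_TXT).items():
        print("allowed" if ok else "blocked", url)
```

The query-string route is reported as allowed because the Disallow value only matches URLs whose path starts with /private-page.html.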


 1:56 am on Jan 22, 2007 (gmt 0)

What are you finding in the Google index? Is Google listing the URL only, or is it also listing a title and snippet?


 1:59 am on Jan 22, 2007 (gmt 0)

Maybe, just maybe, Google only obeys robots.txt for whole directories and not for individual files.
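
For comparison, the robots.txt convention itself treats each Disallow value as a URL-path prefix, so a rule can name a single file as easily as a directory. A minimal sketch with Python's urllib.robotparser, which implements that prefix matching; the paths are hypothetical, and this shows what the convention specifies, not what Googlebot actually did in this case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one directory and one individual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /private-page.html
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def allowed(path: str) -> bool:
    """True if the hypothetical rules let Googlebot fetch this path."""
    return rules.can_fetch("Googlebot", "http://www.example.com" + path)

if __name__ == "__main__":
    for path in ("/archive/2006/old.html",   # inside the blocked directory
                 "/private-page.html",       # the single blocked file
                 "/private-page.html?x=1",   # prefix match also covers variants
                 "/public-page.html"):       # not matched by any rule
        print(path, "allowed" if allowed(path) else "blocked")
```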


 5:54 am on Jan 22, 2007 (gmt 0)

I'm not sure about Googlebot, but the robots.txt checker in Webmaster Tools is case-sensitive. Blocking /bob* in your robots file will not block /Bob* in the tester. It may be common knowledge, but it stumped me the other day. The silly cart software I'd purchased had mixed case all over the place...
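
Python's urllib.robotparser also compares paths with a plain, case-sensitive prefix match, so it shows the same behaviour. It does not support Google's * wildcard extension, so this sketch writes the rule as /bob, which under prefix matching covers the same URLs as a trailing /bob*. The paths are hypothetical, and the result says nothing definitive about Googlebot; one workaround for mixed-case cart URLs is a separate Disallow line for each case variant the software actually emits.

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the subset of `paths` that robots_txt disallows for `agent`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [p for p in paths if not rules.can_fetch(agent, "http://www.example.com" + p)]

# Hypothetical cart URLs in three letter cases.
PATHS = ["/bob/cart.php", "/Bob/cart.php", "/BOB/cart.php"]

# Lowercase rule only: the mixed-case URLs slip through.
ONE_CASE = "User-agent: *\nDisallow: /bob\n"

# Workaround: one Disallow line per case variant in use.
ALL_CASES = "User-agent: *\nDisallow: /bob\nDisallow: /Bob\nDisallow: /BOB\n"

if __name__ == "__main__":
    print(blocked_paths(ONE_CASE, PATHS))   # ['/bob/cart.php']
    print(blocked_paths(ALL_CASES, PATHS))  # ['/bob/cart.php', '/Bob/cart.php', '/BOB/cart.php']
```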


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved