
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Google not respecting robots.txt
Found >500 pages in Google index that are forbidden by robots.txt
roots




msg:3225118
 5:34 pm on Jan 19, 2007 (gmt 0)

I can't believe this is true.

In robots.txt I disallowed crawlers from indexing one page even before that page existed on the site (3 months ago). Today I found out that the pages can be found in the Google index. At first I thought this was a problem with my robots.txt instructions, but I took the destination URL of a page from the SERP and pasted it into the Google Webmaster tool that tests URLs against the robots.txt file from the web server. The result was: Blocked by line 2!

Has anyone experienced something like that?
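
For anyone who wants to repeat this kind of check outside the webmaster tool, here is a minimal Python sketch. The robots.txt content, the paths, and the function name first_blocking_line are made up for illustration; the thread does not show the actual file. It is a deliberately simplified prefix matcher (it ignores User-agent groups, Allow rules, wildcards and comments), not the logic Google's tester uses.

```python
# Hypothetical robots.txt content, NOT the poster's real file.
ROBOTS_TXT = "User-agent: *\nDisallow: /private-page.html\n"


def first_blocking_line(robots_txt, path):
    """Return the 1-based line number of the first Disallow rule whose
    value is a prefix of `path`, or None if no rule matches.

    Simplification: every Disallow line is treated as applying to all
    crawlers; Allow rules, wildcards and comments are not handled.
    """
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        field, _, value = line.partition(":")
        value = value.strip()
        if field.strip().lower() == "disallow" and value and path.startswith(value):
            return number
    return None


print(first_blocking_line(ROBOTS_TXT, "/private-page.html"))  # 2
print(first_blocking_line(ROBOTS_TXT, "/public.html"))        # None
```

A match here only says that a rule covers the URL path. It says nothing about whether that URL is listed in a search index.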

 

Quadrille




msg:3227077
 12:06 am on Jan 22, 2007 (gmt 0)

No direct experience, but I've heard tales of this happening when the URL has incoming links from other sites.

And if the site is dynamic, there may be other routes to the same page.

Xenu might be your friend?
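
To illustrate the "other routes" idea, here is a rough sketch using Python's standard urllib.robotparser. The rule, the example.com URLs, and the helper name unblocked are all assumptions for the sake of the example; on a real site the list of candidate URLs would come from a crawl of the site's own links (for instance with a link checker such as Xenu).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only one exact path is disallowed.
ROBOTS_LINES = ["User-agent: *", "Disallow: /private-page.html"]


def unblocked(urls, robots_lines=ROBOTS_LINES, agent="Googlebot"):
    """Return the URLs from `urls` that the given rules do NOT block."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(agent, url)]


# Hypothetical alternate routes that might serve the same content.
candidates = [
    "http://www.example.com/private-page.html",
    "http://www.example.com/index.php?page=private-page",
    "http://www.example.com/print/private-page.html",
]

for url in unblocked(candidates):
    print("not covered by robots.txt:", url)
```

With these made-up rules, only the first URL is blocked. The other two would need their own Disallow lines if they serve the same page.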

goodroi




msg:3227145
 1:56 am on Jan 22, 2007 (gmt 0)

What are you finding in the Google index? Is Google listing the URL only, or is it also listing a title and snippet?

piskie




msg:3227146
 1:59 am on Jan 22, 2007 (gmt 0)

Maybe, just maybe, Google only obeys robots.txt for whole directories and not individual files.

Eathan




msg:3227266
 5:54 am on Jan 22, 2007 (gmt 0)

I'm not sure about Googlebot, but the robots.txt checker in webmaster tools is case sensitive. Blocking /bob* in your robots file will not block /Bob* in the tester. May be common knowledge, but it stumped me the other day. Silly purchased cart software had mixed cases all over the place...
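
The case-sensitivity point can be seen with Python's standard urllib.robotparser, which also compares paths case-sensitively. This is only a sketch: the example.com host, the file names, and the helper is_blocked are invented, and the plain prefix rule /bob stands in for /bob*, since a trailing wildcard adds nothing to a prefix match. It shows how one parser behaves and is not a statement about Googlebot itself.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Hypothetical rule: block everything whose path starts with lowercase /bob.
parser.parse(["User-agent: *", "Disallow: /bob"])


def is_blocked(path):
    """True if the rules above block `path` on the made-up host."""
    return not parser.can_fetch("*", "http://www.example.com" + path)


print(is_blocked("/bob-cart.html"))  # True: same case as the rule
print(is_blocked("/Bob-cart.html"))  # False: capital B does not match /bob
```

If cart software generates mixed-case URLs, one way around it under this matching behaviour is to list each case variant that actually occurs as its own Disallow line.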


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved