
Sitemaps, Meta Data, and robots.txt Forum

Googlebot disobeys the conditions in robots.txt
googlebot disobeys conditions set under user-agent: *

 11:28 am on Sep 16, 2010 (gmt 0)

The GWT robots.txt test shows that Googlebot disobeys the conditions specified under user-agent: * and only takes into account the conditions specified under user-agent: googlebot.

For example, I disallow three folders, namely x, y and z, for all bots, and the folder w for Googlebot only. But the robots.txt test option in Google Webmaster Tools ignores the conditions specified for all robots (i.e. under *).

Is this a bug, or is this how the robots directives work? I would appreciate it if anyone could shed some light on this.

Thanks in advance.



 11:31 am on Sep 16, 2010 (gmt 0)

99% of google disobeying robots.txt turns out to be an error with the robots.txt file (typo, incorrectly formatted etc) and not google making a mistake.

i would suggest you double check your robots.txt to make sure it is ordered properly, has zero typos, has the necessary spaces & line breaks, is uploaded to the right place and is 100% error free.


 11:58 am on Sep 16, 2010 (gmt 0)

goodroi, thanks for the response.

I think you have misunderstood what I meant. To be clear, from the above example, the test robots.txt option in the webmaster tools shows that google only reads the conditions specified under googlebot and ignores those specified under *.

For example, check on this sample.

user-agent: *

Disallow: /x
Disallow: /y
Disallow: /z

user-agent: googlebot

Disallow: /w

So my question here is, will Googlebot be disallowed from only w, or from all 4 (x, y, z & w)?

The tool in GWT says only w.

Please clarify. Will be of huge help.



 12:17 pm on Sep 16, 2010 (gmt 0)

Block or remove pages using a robots.txt file - Webmaster Tools Help -
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449&from=40360&rd=1 [google.com]:
Each section in the robots.txt file is separate and does not build upon previous sections.

For example:
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/

In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
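
The "sections do not build on each other" rule can be tried locally. What follows is only a sketch using Python's standard-library `urllib.robotparser`, which is not Google's parser but applies the same group selection here: a specific user-agent group, if one matches, is used instead of the `*` group. The sample is the x/y/z/w file from earlier in the thread, written without blank lines between each User-agent line and its rules, since Python's parser treats a blank line as the end of a group. The helper name `allowed_paths` and the bot name `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The sample from the thread: three folders blocked for everyone,
# one extra folder blocked for Googlebot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /x
Disallow: /y
Disallow: /z

User-agent: googlebot
Disallow: /w
"""


def allowed_paths(agent, paths, robots_txt=ROBOTS_TXT):
    """Return {path: True/False} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


paths = ["/x", "/y", "/z", "/w"]
googlebot = allowed_paths("Googlebot", paths)   # uses only the googlebot group
other = allowed_paths("SomeOtherBot", paths)    # falls back to the * group

print("Googlebot:   ", googlebot)
print("SomeOtherBot:", other)
```

With this parser, Googlebot is blocked from /w only, while the made-up bot is blocked from /x, /y and /z but not /w, which matches what the GWT test tool reported.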


 12:37 pm on Sep 16, 2010 (gmt 0)

Hi phranque,

Thanks for the clarification.

Highly appreciated :))

grandma genie

 5:34 am on Oct 25, 2010 (gmt 0)

I thought Googlebot was doing something similar on my site today. I have my robots.txt set up like this for all bots:
User-Agent: *
Disallow: /osc/popup_image.php

But Googlebot showed up today doing this:

66.249.71.nn - - [24/Oct/2010:06:25:48 -0400] "GET /osc/popup_image.php?pID=3339 HTTP/1.1" 200 709 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:53 -0400] "GET /osc/popup_image.php?pID=1775 HTTP/1.1" 200 703 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:58 -0400] "GET /osc/popup_image.php?pID=3105 HTTP/1.1" 200 736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I also have this further down where I list all the specific bots:

User-agent: Googlebot-Image
Disallow: /

What am I doing wrong? Googlebot totally descended on my site today like gangbusters.



 5:59 am on Oct 25, 2010 (gmt 0)

do you have a section for regular "Googlebot"?
have you tested your robots.txt in the Crawler access section of GWT?

grandma genie

 4:38 pm on Oct 25, 2010 (gmt 0)

I do not have a section for the regular Googlebot. I have tested my robots.txt in the Crawler access section of GWT and it is working. Googlebot has access to most of my site, but the specific file referenced is blocked by robots.txt. Do I need to put an asterisk behind the popup_image.php* ?


 7:55 pm on Oct 25, 2010 (gmt 0)

I used to have my robots.txt file with some bots totally disallowed and one single section for those parts of my site which were disallowed to everyone.

About a year ago, Google started poking into the disallowed parts, even picking up images (though the image bot is entirely disallowed), so I wrote a specific section for Gbot alone, detailing everything it has to keep its hands off, and that works.


 9:20 pm on Oct 25, 2010 (gmt 0)

When you have a section for Googlebot, Google reads ONLY that section.

Here's a detailed thread from 4 years ago... [webmasterworld.com...]


 1:34 am on Oct 26, 2010 (gmt 0)

Do I need to put an asterisk behind the popup_image.php* ?

the robots exclusion protocol for robots.txt matches each Disallow value left-to-right as a prefix of the requested URL path, so the trailing wildcard should not be necessary.
perhaps the Googlebot-Image section is triggering something.
i would try repeating the wildcard User-Agent exclusion in a Googlebot-specific section.
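
A rough sketch of that suggestion, again leaning on Python's `urllib.robotparser` as a stand-in (an assumption: it is not Googlebot, so it only shows what the file says, not how Google will behave). The helper `repeat_wildcard_rules` is hypothetical; it writes the shared rule under `*` and repeats it in a Googlebot-specific group. The check at the end also shows the prefix match: the rule has no trailing asterisk, yet the URL with the `?pID=` query string from the log is still blocked.

```python
from urllib.robotparser import RobotFileParser


def repeat_wildcard_rules(agent, shared_rules, extra_rules=()):
    """Build robots.txt text with a '*' group plus an `agent` group
    that repeats the shared rules (and any agent-only extras)."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in shared_rules]
    lines.append("")  # blank line separates the two groups
    lines.append(f"User-agent: {agent}")
    lines += [f"Disallow: {path}" for path in list(shared_rules) + list(extra_rules)]
    return "\n".join(lines) + "\n"


robots_txt = repeat_wildcard_rules("Googlebot", ["/osc/popup_image.php"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Prefix matching: no trailing '*' on the rule, but the query-string URL still matches.
blocked = not parser.can_fetch("Googlebot", "/osc/popup_image.php?pID=3339")
print("blocked for Googlebot:", blocked)
```

Repeating the rules costs a few duplicate lines but means the result does not depend on whether a given crawler falls back to the `*` group.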

grandma genie

 2:05 am on Oct 26, 2010 (gmt 0)

I tried creating a section just for the googlebot and it looks like that did the trick. I'll see if they are still obeying the robots.txt tomorrow. Thanks all.

grandma genie

 1:23 am on Oct 28, 2010 (gmt 0)

Just a follow-up to my last post. Yes, the changes in robots.txt did the trick. Googlebot is now obeying all directives. Thanks for all the help.


 11:26 am on Oct 28, 2010 (gmt 0)

it's good that you were able to fix that, but the GWT Help documentation i linked to above doesn't match the results you observed.

grandma genie

 12:16 am on Oct 30, 2010 (gmt 0)

Perhaps Google should change their docs. I just did the same thing with MSN and that works, too.
