| 11:31 am on Sep 16, 2010 (gmt 0)|
99% of the time, google "disobeying" robots.txt turns out to be an error in the robots.txt file (typo, incorrect formatting, etc.) and not google making a mistake.
i would suggest you double check your robots.txt to make sure it is ordered properly, has zero typos, has the necessary spaces & line breaks, is uploaded to the right place and is 100% error free.
| 11:58 am on Sep 16, 2010 (gmt 0)|
goodroi, thanks for the response.
I think you have misunderstood what I meant. To be clear, from the above example, the test robots.txt option in the webmaster tools shows that google only reads the conditions specified under googlebot and ignores those specified under *.
For example, check on this sample.
So my question here is, will Googlebot be disallowed from only w or from all 4 (x, y, z & w)?
The tool in GWT says only W.
Please clarify. Will be of huge help.
| 12:17 pm on Sep 16, 2010 (gmt 0)|
Block or remove pages using a robots.txt file - Webmaster Tools Help:
|Each section in the robots.txt file is separate and does not build upon previous sections. |
In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
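To see the same rule outside GWT, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Google's parser, but it also picks a matching named section over the `*` section instead of merging them. The robots.txt content and the bot name `SomeOtherBot` are hypothetical, chosen only to mirror the /folder2/ wording above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: folder names are placeholders, not the poster's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/
"""

def build_parser(text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own section, so only /folder2/ is off limits to it.
# Any other bot falls back to the * section, so only /folder1/ is off limits.
for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/folder1/page.html", "/folder2/page.html"):
        print(agent, path, "allowed" if parser.can_fetch(agent, path) else "blocked")
```

The point is that the Googlebot section replaces the `*` section for Googlebot rather than adding to it.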
| 12:37 pm on Sep 16, 2010 (gmt 0)|
Thanks for the clarification.
Highly appreciated :))
| 5:34 am on Oct 25, 2010 (gmt 0)|
I thought Googlebot was doing something similar on my site today. I have my robots.txt set up like this for all bots:
But Googlebot showed up today doing this:
66.249.71.nn - - [24/Oct/2010:06:25:48 -0400] "GET /osc/popup_image.php?pID=3339 HTTP/1.1" 200 709 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:53 -0400] "GET /osc/popup_image.php?pID=1775 HTTP/1.1" 200 703 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:58 -0400] "GET /osc/popup_image.php?pID=3105 HTTP/1.1" 200 736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I also have this further down where I list all the specific bots:
What am I doing wrong? Googlebot totally descended on my site today like gangbusters.
| 5:59 am on Oct 25, 2010 (gmt 0)|
do you have a section for regular "Googlebot"?
have you tested your robots.txt in the Crawler access section of GWT?
| 4:38 pm on Oct 25, 2010 (gmt 0)|
I do not have a section for the regular Googlebot. I have tested my robots.txt in the Crawler access section of GWT and it is working. Googlebot has access to most of my site, but the specific file referenced is blocked by robots.txt. Do I need to put an asterisk behind the popup_image.php* ?
| 7:55 pm on Oct 25, 2010 (gmt 0)|
I used to have my robots.txt file with some bots totally disallowed and one single section for those parts of my site which were disallowed to everyone.
About a year ago, Google started poking into the disallowed parts, even picking up images etc. (though the image bot is entirely disallowed), so I wrote a specific section for Googlebot alone detailing everything it has to keep its hands off, and that works.
| 9:20 pm on Oct 25, 2010 (gmt 0)|
When you have a section for Googlebot, Google reads ONLY that section.
Here's a detailed thread from 4 years ago... [webmasterworld.com...]
| 1:34 am on Oct 26, 2010 (gmt 0)|
|Do I need to put an asterisk behind the popup_image.php* ? |
the robots exclusion protocol matches Disallow paths left-to-right as prefixes, so the trailing wildcard should not be necessary.
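a quick way to check the left-to-right matching is Python's standard `urllib.robotparser`. this is only a sketch: the Disallow line is an assumption based on the path in the log lines quoted earlier (the poster's actual robots.txt is not shown), and `ExampleBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: a plain path with no trailing wildcard.
rules = [
    "User-agent: *",
    "Disallow: /osc/popup_image.php",
]

prefix_parser = RobotFileParser()
prefix_parser.parse(rules)

# The rule is a prefix of the requested URL, so the query string is covered too.
popup_allowed = prefix_parser.can_fetch("ExampleBot", "/osc/popup_image.php?pID=3339")

# A different script under /osc/ does not start with the rule, so it stays open.
index_allowed = prefix_parser.can_fetch("ExampleBot", "/osc/index.php")

print("popup_image.php?pID=3339 allowed:", popup_allowed)
print("index.php allowed:", index_allowed)
```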
perhaps the Googlebot-Image section is triggering something.
i would try repeating the wildcard User-Agent exclusion in a Googlebot-specific section.
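as a rough sketch of that suggestion, a small helper can write the same Disallow lines once under `*` and once under Googlebot, so the two sections cannot drift apart. the function name and the example path are hypothetical; swap in your own list of paths.

```python
def robots_with_repeated_rules(disallow_paths, agents=("*", "Googlebot")):
    """Return robots.txt text with identical Disallow lines in one section per agent."""
    sections = []
    for agent in agents:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in disallow_paths]
        sections.append("\n".join(lines))
    # Sections are separated by a blank line, as robots.txt expects.
    return "\n\n".join(sections) + "\n"

# Hypothetical path, taken from the log excerpt earlier in the thread.
robots_text = robots_with_repeated_rules(["/osc/popup_image.php"])
print(robots_text)
```

the output can be pasted into the robots.txt test tool in GWT before uploading.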
| 2:05 am on Oct 26, 2010 (gmt 0)|
I tried creating a section just for the googlebot and it looks like that did the trick. I'll see if they are still obeying the robots.txt tomorrow. Thanks all.
| 1:23 am on Oct 28, 2010 (gmt 0)|
Just a follow-up to my last post. Yes, the changes in robots.txt did the trick. Googlebot is now obeying all directives. Thanks for all the help.
| 11:26 am on Oct 28, 2010 (gmt 0)|
it's good that you were able to fix that, but the GWT Help documentation i linked to above doesn't match the results you observed.
| 12:16 am on Oct 30, 2010 (gmt 0)|
Perhaps Google should change their docs. I just did the same thing with MSN and that works, too.