99% of the time, Google "disobeying" robots.txt turns out to be an error in the robots.txt file (a typo, incorrect formatting, etc.) and not Google making a mistake.
I would suggest you double-check your robots.txt to make sure it is ordered properly, has zero typos, has the necessary spaces and line breaks, is uploaded to the right place, and is 100% error free.
goodroi, thanks for the response.
I think you have misunderstood what I meant. To be clear, in the above example, the "Test robots.txt" option in Webmaster Tools shows that Google only reads the conditions specified under Googlebot and ignores those specified under *.
For example, check on this sample.
So my question here is: will Googlebot disallow only w, or all 4 (x, y, z and w)?
The tool in GWT says only w.
Please clarify. It will be a huge help.
Block or remove pages using a robots.txt file - Webmaster Tools Help:
|Each section in the robots.txt file is separate and does not build upon previous sections. |
In this example only the URLs matching /folder2/ would be disallowed for Googlebot.
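To make that rule concrete, here is a minimal Python sketch using the standard library's urllib.robotparser, which applies the same precedence: a section naming the bot wins, and the * section is only a fallback. The robots.txt text below is a hypothetical reconstruction of the x, y, z and w question, not the original sample, and the /x/ style paths and the blocked_paths helper are made up for illustration. It shows what one parser does, not a guarantee about how Googlebot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: x, y, z are blocked for everyone, w only for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /x/
Disallow: /y/
Disallow: /z/

User-agent: Googlebot
Disallow: /w/
"""


def blocked_paths(user_agent, paths, robots_txt=ROBOTS_TXT):
    """Return the subset of paths that user_agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    paths = ["/x/page", "/y/page", "/z/page", "/w/page"]
    # Googlebot has its own section, so only that section is consulted.
    print("Googlebot:", blocked_paths("Googlebot", paths))
    # A bot with no named section falls back to the * section.
    print("OtherBot: ", blocked_paths("OtherBot", paths))
```

With this parser, Googlebot is blocked only from /w/page, while a bot with no section of its own is blocked from the x, y and z paths. That matches what the GWT test tool reported.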
Thanks for the clarification.
Highly appreciated :))
I thought Googlebot was doing something similar on my site today. I have my robots.txt set up like this for all bots:
But Googlebot showed up today doing this:
66.249.71.nn - - [24/Oct/2010:06:25:48 -0400] "GET /osc/popup_image.php?pID=3339 HTTP/1.1" 200 709 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:53 -0400] "GET /osc/popup_image.php?pID=1775 HTTP/1.1" 200 703 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.nn - - [24/Oct/2010:06:25:58 -0400] "GET /osc/popup_image.php?pID=3105 HTTP/1.1" 200 736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I also have this further down where I list all the specific bots:
What am I doing wrong? Googlebot totally descended on my site today like gangbusters.
Do you have a section for regular "Googlebot"?
Have you tested your robots.txt in the Crawler access section of GWT?
I do not have a section for the regular Googlebot. I have tested my robots.txt in the Crawler access section of GWT and it is working. Googlebot has access to most of my site, but the specific file referenced is blocked by robots.txt. Do I need to put an asterisk behind the popup_image.php* ?
I used to have my robots.txt file with some bots totally disallowed and one single section for those parts of my site which were disallowed to everyone.
It must be about a year ago now that Google started poking into the disallowed parts, even picking up images etc. (though the image bot is entirely disallowed), so I wrote a specific section for Gbot alone detailing everything it has to keep its hands off, and that works.
When you have a section for Googlebot, Google reads ONLY that section.
Here's a detailed thread from 4 years ago... [webmasterworld.com...]
|Do I need to put an asterisk behind the popup_image.php* ? |
The robots exclusion protocol matches a Disallow value as a prefix, left to right from the start of the URL path, so the trailing wildcard should not be necessary.
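As a rough illustration of that prefix matching, here is a small Python sketch. The rule /osc/popup_image.php is an assumption (the actual Disallow line was not posted), the is_disallowed helper is made up for this example, and /osc/product_info.php is an invented path used only as a counterexample. The request path is taken from the log lines above.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if any non-empty Disallow value is a prefix of url_path.

    An empty Disallow value blocks nothing, so it is skipped.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    rules = ["/osc/popup_image.php"]  # assumed rule, no trailing wildcard
    # The query string comes after the matched prefix, so the rule still applies.
    print(is_disallowed("/osc/popup_image.php?pID=3339", rules))
    # A different script under /osc/ does not start with the rule.
    print(is_disallowed("/osc/product_info.php", rules))
```

The first check matches because the rule is a prefix of the requested path, query string included. The second does not.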
Perhaps the Googlebot-Image section is triggering something.
I would try repeating the exclusions from the wildcard (User-agent: *) section in a Googlebot-specific section.
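One way to keep the repeated rules from drifting apart is to generate the file from a single list. The Python sketch below is only an illustration: render_robots_txt is a hypothetical helper, and the Disallow path is assumed rather than taken from the poster's real file. It writes the shared rules once under * and again under each named bot, then checks the result with the standard library's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(shared_rules, named_agents):
    """Emit a '*' section plus a copy of the same rules for each named agent."""
    sections = []
    for agent in ["*", *named_agents]:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in shared_rules]
        sections.append("\n".join(lines))
    # Sections are separated by a blank line.
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    text = render_robots_txt(["/osc/popup_image.php"], ["Googlebot"])
    print(text)

    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # False here means the parser treats the URL as blocked for Googlebot.
    print(parser.can_fetch("Googlebot", "/osc/popup_image.php?pID=3339"))
```

Because the Googlebot section carries its own copy of every rule, the outcome no longer depends on whether a crawler falls back to the * section.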
I tried creating a section just for Googlebot and it looks like that did the trick. I'll see if they are still obeying the robots.txt tomorrow. Thanks all.
Just a follow-up to my last post. Yes, the changes in robots.txt did the trick. Googlebot is now obeying all directives. Thanks for all the help.
It's good that you were able to fix that, but the GWT Help documentation I linked to above doesn't match the results you observed.
Perhaps Google should change their docs. I just did the same thing with MSN and that works, too.