Forum Moderators: Robert Charlton & goodroi
Are you sure that it is really Googlebot (verified by doing a reverse DNS lookup on the IP, then a forward DNS lookup on the resulting hostname to confirm it resolves back to the same IP)?
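For anyone who hasn't done that check before, here's a minimal sketch of the reverse-then-forward DNS verification using only the standard library (the helper name is mine, not from any Google tool):

```python
import socket

def is_real_googlebot(ip):
    """Rough check of a claimed Googlebot IP.

    1. Reverse DNS: the hostname should end in googlebot.com or google.com.
    2. Forward DNS: that hostname must resolve back to the same IP.
    Either lookup failing means we can't verify, so we return False.
    """
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
    return ip in forward_ips
```

A spoofed user agent coming from, say, a residential IP fails at step 1, which is why UA strings alone prove nothing.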
How do you know that they're ignoring the directive?
Using browser-gathered data to know where a visitor went? Very possible too, but with huge implications when I'm telling Googlebot not to go there via robots.txt.
re: 403, is there a proven method of shutting the door on all Googlebot activity, since robots.txt is apparently not enough? They do occasionally visit with other user agents, such as when a member of the ratings team takes a look from somewhere like India.
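If you want to serve the 403 at the server level, here's one way it might look on Apache 2.4+ with mod_setenvif and mod_authz_core (a sketch, not tested on your setup; note UA matching is spoofable, so for strict enforcement you'd deny by verified IP ranges instead):

```apache
# Flag any request whose User-Agent claims to be Googlebot.
SetEnvIfNoCase User-Agent "Googlebot" block_googlebot

<RequireAll>
    Require all granted
    # Deny (403) flagged requests.
    Require not env block_googlebot
</RequireAll>
```

This returns 403 to anything identifying as Googlebot, including the legitimate crawler, so it's a blunt instrument compared to robots.txt plus noindex.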
What happens when you use WMT's Fetch as Googlebot?
If you want to run a simple test, create a new piece of content and block it via robots.txt. Then link to it from other pages on your site. When I've done this in the past, the URL of the disallowed page will still show in their index, but Google displays no other information about the page. They know it exists, but haven't looked at it. And because they know it exists, they'll index it.
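The robots.txt side of that test is just a single Disallow rule; a minimal example (the /noindex-test/ path is made up for illustration):

```
User-agent: *
Disallow: /noindex-test/
```

Then link to a page under that path from elsewhere on the site and watch whether the bare URL turns up in the index.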
The only plausibly "legit" reason for this would be if Google wants to check the response code before it includes the URL in its index (with no title and no snippet, as it does for pages blocked by robots.txt but linked from elsewhere).