Forum Moderators: open
I have known for a while that GoogleBot is not too keen on complying with a robots.txt exclusion on the cgi-bin directory, but this has reached a point where my bandwidth is seriously suffering. I am at the point of deciding to feed the bot 403s, as I don't need an overage fee on my bandwidth. Has anyone else implemented such a ban based on the user agent?
Your experience is very unusual. I suggest checking your /robots.txt and confirming that Googlebot really is fetching those URLs.
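For reference, a /robots.txt exclusion for the cgi-bin directory would look like this (assuming the file sits in the document root and the scripts live under /cgi-bin/):

```
User-agent: *
Disallow: /cgi-bin/
```

A well-behaved crawler that reads this should not request anything under /cgi-bin/ at all, which is why it saves bandwidth where 403s do not.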
If Google gets 403s or 404s then the pages won't appear in Google's index (the same goes for NOINDEX in the robots meta tag), but this won't save bandwidth the way a /robots.txt exclusion does, since Google still has to request the page to see the response.
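For pages you can edit directly, the meta tag version mentioned above goes in the page head:

```html
<meta name="robots" content="noindex">
```

Again, the bot has to fetch the page to see the tag, so this keeps URLs out of the index but does nothing for bandwidth.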
Often, there is confusion because Google lists URLs that it has been asked not to crawl in /robots.txt
These would be URL-only listings, of course, as Google would not have fetched the URLs.
Often, there is confusion because Google lists URLs that it has been asked not to crawl in /robots.txt
Yes, Google is listing the URLs in the SERPs when I do a site: search.
The bandwidth is definitely being used; I can see it in my log files. I am definitely going to look at blocking them with a rewrite.
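A sketch of what such a user-agent block could look like in an Apache .htaccess file (this assumes mod_rewrite is available; the [NC] flag makes the match case-insensitive and [F] returns a 403 Forbidden):

```apache
RewriteEngine On
# Return 403 to any client identifying as Googlebot requesting /cgi-bin/
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^cgi-bin/ - [F]
```

Note that the user-agent string can be spoofed, so this blocks anything claiming to be Googlebot, not just Google's actual crawler.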