Forum Moderators: open
I have known for a while that GoogleBot is not too keen on complying with a robots.txt exclusion on the cgi-bin directory, but this has reached a point where my bandwidth is seriously suffering. I am at the point of deciding to feed the bot 403s, as I don't need an overage fee on my bandwidth. Has anyone else implemented such a ban based on the user agent?
Your experience is very unusual. I suggest checking your /robots.txt and confirming that Googlebot really is fetching those URLs.
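For reference, a /robots.txt exclusion for the cgi-bin directory would look like this (assuming the file sits in the document root and the scripts live under /cgi-bin/):

```
User-agent: *
Disallow: /cgi-bin/
```

A well-behaved crawler that reads this should not request anything under /cgi-bin/ at all, which is why it saves bandwidth where 403s do not.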
If Google gets 403s or 404s then the pages won't appear in Google's index (the same goes for NOINDEX in the robots meta tag), but this won't save bandwidth the way a /robots.txt exclusion does, since Google still has to request the page to see the response.
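For pages you can edit directly, the meta tag version mentioned above goes in the page head:

```html
<meta name="robots" content="noindex">
```

Again, the bot has to fetch the page to see the tag, so this keeps URLs out of the index but does nothing for bandwidth.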
Often, there is confusion because Google lists URLs that it has been asked not to crawl in /robots.txt
These would be URL-only listings, of course, as Google would not have fetched the URLs.
Often, there is confusion because Google lists URLs that it has been asked not to crawl in /robots.txt
Yes, Google is listing the URLs in the SERPs when I do a site: search.
The bandwidth is definitely being used; I can see it in my log files. I am definitely going to look at blocking them with a rewrite.
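A sketch of what such a user-agent block could look like in an Apache .htaccess file (this assumes mod_rewrite is available; the [NC] flag makes the match case-insensitive and [F] returns a 403 Forbidden):

```apache
RewriteEngine On
# Return 403 to any client identifying as Googlebot requesting /cgi-bin/
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^cgi-bin/ - [F]
```

Note that the user-agent string can be spoofed, so this blocks anything claiming to be Googlebot, not just Google's actual crawler.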