|MSNBot killing my server, not this again!|
It's back again and frankly this is annoying me. I don't like blocking the bot; even if it only sends me 1% of my daily traffic, I'd like to give it a fair shot. But when it comes to load, I'd rather have my normal 0.50 than 5, 10 or even 15 at times.
I have spoken to a few engineers over there. Since the bot ignores robots.txt commands, they promised to set a delay at their end, but it has had little effect.
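For anyone following along, the robots.txt delay under discussion is the Crawl-delay directive. Here is a rough sketch of what such a rule looks like and how to check that it parses as intended, using Python's standard urllib.robotparser. The 10-second value and the helper name crawl_delay_for are just assumptions for illustration, and a rule that parses correctly says nothing about whether the bot will honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask msnbot to wait 10 seconds between requests,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "msnbot"))
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```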
So Bing, I assume you read here: sort your bot out. It's a massive resource hog, and how the hell do you expect to win over webmasters and searchers alike if we are all blocking your bot? Right now I'm having to resort to adding a line in my .htaccess.
Could you elaborate on what type of site you're running (e.g. news, blog, informational, static or dynamic, etc.) and what else you're doing to try to slow the bot down besides blocking it? E.g. serving Last-Modified headers, ETag headers, Expires, etc.
I ask because I work on a couple of decent sized sites which are both dynamic, but behave as if they are static (serve full headers, including different expiration times by file type, etc.) and haven't ever had an issue with MSNBot at all. It usually requests everything twice in a row, but the issue you are talking about is definitely not an 'everyone issue' so knowing the differences in situations would be good, IMO, and might help figure out what's causing the issue for you and not everyone...
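To make the "serve full headers" idea concrete, here is a minimal sketch of conditional-request handling for a dynamic page, assuming you can get at the rendered body and a last-modified timestamp. The function name conditional_response, the plain-dict headers, and the one-hour default max-age are all made up for the example; a real site would hook this into its own framework and pick expiry times per file type.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def conditional_response(body, mtime, request_headers, max_age=3600):
    """Return (status, headers, body), answering 304 when the client copy is current.

    body            -- rendered page as bytes
    mtime           -- last-modified time as a Unix timestamp
    request_headers -- dict of incoming request headers (hypothetical shape)
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }

    # A matching ETag means the crawler already has this exact content.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""

    # Otherwise fall back to comparing modification dates.
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if int(mtime) <= parsedate_to_datetime(since).timestamp():
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # Unparseable date: ignore it and send the full page.

    return 200, headers, body
```

A crawler that replays the ETag or Last-Modified value it was given gets an empty 304 instead of a freshly generated page, which is where the load saving would come from.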
If no other cause is found, and crawl-delay in robots.txt plus the above cache, expiry, and ETag header suggestions don't help and this problem persists, you have the option to serve a 503 Service Unavailable response accompanied by a Retry-After header. If you set the Retry-After time at 5 to 15 seconds, your problem should be alleviated.
The above is based on the HTTP protocol. I personally have no idea whether msnbot will handle it correctly.
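As a rough illustration of the 503 plus Retry-After approach, something like the following could sit in front of the page handler. The function names, the load threshold of 2.0, and the "msnbot" user-agent token are assumptions for the sketch, not anything Bing documents, and os.getloadavg() is only available on Unix-like systems.

```python
import os

def throttle_decision(user_agent, load, max_load=2.0, retry_after=10,
                      bot_tokens=("msnbot",)):
    """Return (503, headers) if a matching bot should back off, else None.

    load        -- current 1-minute load average
    max_load    -- hypothetical threshold above which bots are turned away
    retry_after -- seconds to advertise in Retry-After (5 to 15 was suggested)
    bot_tokens  -- lowercase substrings identifying the crawler's user agent
    """
    is_bot = any(token in user_agent.lower() for token in bot_tokens)
    if is_bot and load > max_load:
        return 503, {"Retry-After": str(retry_after)}
    return None  # Serve the request normally.

def current_load():
    """1-minute load average (Unix only)."""
    return os.getloadavg()[0]
```

Normal visitors are never affected, and the bot is only turned away while the load is actually high, so it still gets its fair shot when the server is quiet.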
Also, don't complain too loudly to them about msnbot's behavior. I did that several years ago, and the site I was complaining about is still "banned" at Bing, although none of the techs can see any problem in the tools available to them, and all report that the site is *not* blocked, despite the fact that it no longer shows even for its own domain, and there is a "Some results have been removed" message at the bottom of the screen. It's a non-profit, informational site, so I'm not losing any money (or sleep) over it.
I would also check that the IP number of the bot does in fact belong to Bing. Pretending to be MSNBot is pretty easy to do.
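One common way to do that check is a reverse DNS lookup followed by a forward lookup to confirm it. Below is a sketch, assuming genuine crawler hosts resolve to names ending in .search.msn.com (verify that suffix against Bing's current webmaster documentation before relying on it). The function name is_genuine_msnbot is made up, and the resolver arguments exist only so the logic can be exercised without touching the network.

```python
import socket

def is_genuine_msnbot(ip, suffix=".search.msn.com",
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a host under suffix AND that
    host resolves back to the same ip (forward-confirmed reverse DNS)."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # No PTR record or lookup failure.
        return False

    if not hostname.lower().rstrip(".").endswith(suffix):
        return False  # Claims to be the bot but lives on someone else's network.

    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # Anyone can fake a PTR record; the forward lookup has to agree.
    return ip in addresses
```

DNS lookups are slow, so on a busy server you would want to cache the result per IP rather than run this on every request.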
Aside: that's an interesting phenomenon, JD. Did you (for a time) tell msnbot to noindex, nofollow? I wonder if, somewhere in the depths of Bing's database, you are still on a list of sites that msnbot has effectively banned itself from crawling.