Forum Moderators: goodroi
1. Why ban Googlebot-Image? Is it just to save time because this is a text-based site?
2. Why ban User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT) and other such browser identification strings? They aren't spidering, are they?
3. I can see why you disallow Forum 9 (Foo). But why Forums 19 (Community Center) and 29 (Commercial exchange)?
4. Are all the bots listed well-behaved or do you also have to ban some of them via rewrites?
2- Some bots do support robots.txt and do identify themselves with standard user-agent strings, so listing them catches a few.
3- Off-topic referrals and local-related discussions.
4- I spend one to two hours a day tracking down bad bots and banning them, their IP, or their entire ISP. It isn't bad when you have a few dozen or a few hundred pages, but when you get into 70-80k pages, bots can tear a system up in no time flat without sufficient tracking and response systems. It is a very serious problem that threatens the entire system on a daily basis. We are having a discussion about moving to a subscription-based system, and stopping rogue bots would be my #1 reason for doing so.
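For readers wondering what a "tracking and response system" might look like, here is a minimal sketch in Python. It is not the setup described above, only an illustration under assumptions: the class name RateTracker, the thresholds, and the idea of banning on a per-IP sliding window are all hypothetical.

```python
from collections import defaultdict, deque


class RateTracker:
    """Hypothetical sketch: ban an IP that makes more than max_hits
    requests within any window of `window` seconds."""

    def __init__(self, max_hits=30, window=10.0):
        self.max_hits = max_hits          # assumed threshold, tune per site
        self.window = window              # sliding window length in seconds
        self.hits = defaultdict(deque)    # ip -> timestamps of recent requests
        self.banned = set()

    def record(self, ip, now):
        """Record one request at time `now` (seconds).
        Return True if the request should be served, False if refused."""
        if ip in self.banned:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.banned.add(ip)
            return False
        return True
```

A real deployment would also need to expire bans, whitelist known good crawlers, and handle whole address ranges, none of which this sketch attempts.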
Community Center has quite a few on-topic threads. Maybe they've started in the wrong place and a moderator will move them to a more relevant forum.
Is subscription-only the best way to stop bots? (I'll be brief here, as it is obviously being discussed at length elsewhere.) A simple mechanism to trip them up at the doorway might be a sign-on screen with human-readable instructions on how to sign on as a guest -- such as typing in a displayed random string.
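To make the random-string idea concrete, here is a rough sketch in Python. The function names new_challenge and check_answer are made up for illustration, and the sketch assumes the string is shown in some form a bot cannot simply read out of the page source (for example, rendered as an image), which is not covered here.

```python
import hmac
import secrets
import string

# Uppercase letters and digits only, so case-insensitive matching is simple.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Return a random string to display on the guest sign-on screen."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, typed):
    """Compare what the visitor typed against the stored challenge,
    ignoring case and surrounding whitespace."""
    return hmac.compare_digest(
        expected.upper().encode(), typed.strip().upper().encode()
    )
```

The server would store the challenge with the visitor's session, show it on the sign-on page, and only admit the guest once check_answer returns True.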
I was looking for a complete bot ban list (not that I have the problems you do -- I'm not that successful, and what you have should make you happy -- it's a problem of success) so I'm pleased to see that the comments in your robots.txt let other people use it freely.
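Before reusing someone else's ban list, it may be worth checking what it actually blocks. The sketch below uses Python's standard urllib.robotparser on a small sample file. The sample rules and the /forum9/ path are assumptions made up for the example, not a copy of the robots.txt discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules, loosely modeled on the questions above.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /forum9/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# A banned agent is refused everywhere; other agents only lose /forum9/.
for agent, path in [
    ("Googlebot-Image", "/index.html"),
    ("SomeBot", "/forum9/thread1"),
    ("SomeBot", "/forum3/thread1"),
]:
    print(agent, path, rules.can_fetch(agent, path))
```

Keep in mind this only shows what a well-behaved bot would do with the file. Bots that ignore robots.txt are not affected by anything in it.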