Close to perfect .htaccess ban list [webmasterworld.com].
The same forum also provides quite a lot of detail about .htaccess implementation and about adding rogue robots to the ban list.
There's no such thing as a cut and paste .htaccess, but near perfect ain't bad.
If you head down the thread a good 50+ posts, the list gets pretty comprehensive, with reasonably straightforward instructions on applying it. Keep your eyes open; it's been a while since I read that thread.
There is a script (and several derivatives) posted here that is close to what you are asking for. The script does not block based on whether robots.txt is requested, it blocks based on whether robots.txt is obeyed. The problem with blocking based on robots.txt requests is that you have to "track" by IP number or hostname (which might change, a la Google) and you have to remember the robots.txt request for each IP address or hostname - sometimes for several days. This results in a database that is large, difficult to determine purge criteria for, and generally a pain.
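To make the "obeyed, not requested" distinction concrete, here is a minimal Python sketch. It is not the script from the thread, only an illustration of the idea under some assumptions: the trap path `/trap/`, the `TrapBanList` class, and the in-memory set are all hypothetical names chosen for the example. robots.txt disallows the trap path, so any client that fetches it has ignored robots.txt and gets banned. Nothing is recorded for clients that behave.

```python
# Hypothetical trap path. It is disallowed in robots.txt and linked only
# where a crawler would find it, so well-behaved robots never request it.
TRAP_PATH = "/trap/"

ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"


class TrapBanList:
    """Ban clients that disobey robots.txt (illustrative sketch only)."""

    def __init__(self):
        # Only violators are stored. There is no per-visitor record of
        # who fetched robots.txt, so nothing needs purging for good bots.
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status this request would receive."""
        if ip in self.banned:
            return 403  # already caught in the trap earlier
        if path.startswith(TRAP_PATH):
            self.banned.add(ip)  # ignored the Disallow rule: ban it
            return 403
        return 200
```

For example, a client at 192.0.2.1 that requests `/trap/page.html` is refused from then on, while a client that only fetches `/robots.txt` and ordinary pages is never recorded at all.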
Here's a link to a later thread, and you can follow the backlinks in the thread to get back to the original post by key_master containing the background and theory of operation: [webmasterworld.com...]
It works well.
Jim