In addition to banning certain 'suspicious' user-agent strings, you might want to look at the header information the client supplies, and also at the IP range a supposed visitor is accessing your site from. YMMV ;)
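Here is a minimal sketch of how the three checks (user-agent, headers, IP range) could be combined. It assumes a Python request handler that already has the client IP and a dict of request headers. The blocklist substrings and the 203.0.113.0/24 network (a documentation-only range) are hypothetical placeholders, not a recommended list. The "no Accept header" rule is likewise just an example heuristic.

```python
import ipaddress

# Hypothetical placeholder lists; substitute your own.
BLOCKED_UA_SUBSTRINGS = ("emailcollector", "webzip", "harvest")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range


def should_block(remote_ip: str, headers: dict[str, str]) -> bool:
    """Return True if the request trips any of the three checks."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}

    # 1. User-agent: block empty UAs and case-insensitive substring matches.
    ua = h.get("user-agent", "").lower()
    if not ua or any(s in ua for s in BLOCKED_UA_SUBSTRINGS):
        return True

    # 2. Header sanity (example heuristic): real browsers send an Accept header.
    if "accept" not in h:
        return True

    # 3. Source IP range (mixed IPv4/IPv6 comparisons simply return False).
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Each check is cheap and independent. Keeping the lists as data rather than hard-coded conditions makes them easier to update as new offenders show up in your logs.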
Msg#: 4255039 posted 10:39 pm on Jan 19, 2011 (gmt 0)
The problem with those lists is that they only block bots that are both bad (ia_archiver isn't, if that's archive.org's bot) and stupid. They don't stop the ones that really want the information. Blocking harvesters is pretty much the same problem as blocking spam bots, so you might want to look at "Bad Behavior". It's a solution based on client fingerprinting: it tries to identify bots that pose as regular browsers and denies them access.
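The fingerprinting idea can be illustrated with a small sketch. Note that this is a hypothetical heuristic, not Bad Behavior's actual rule set. A client whose User-Agent claims to be a browser ("Mozilla/...") but omits headers that mainstream browsers normally send on page requests is probably something else wearing a browser's UA. The choice of headers checked here is an assumption for illustration.

```python
# Hypothetical heuristic: headers that mainstream browsers normally send
# with ordinary page requests.
EXPECTED_BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def fingerprint_mismatches(headers: dict[str, str]) -> list[str]:
    """Return the expected browser headers missing from a request whose
    User-Agent claims to be a browser. An empty list means no mismatch."""
    # Normalize header names, since HTTP header names are case-insensitive.
    h = {k.lower(): v for k, v in headers.items()}

    # Only apply the check to clients posing as browsers; other UAs are
    # better handled by plain user-agent rules.
    if not h.get("user-agent", "").startswith("Mozilla/"):
        return []

    return [name for name in EXPECTED_BROWSER_HEADERS if name not in h]
```

Returning the list of missing headers instead of a bare yes/no makes it easier to log why a request was flagged and to tune the rule before you start denying access based on it.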