Forum Moderators: goodroi
...and exclude the rest
The most annoying bots, the ones that spider for email addresses or who knows what, will just ignore your robots.txt.
The regular ones are best dealt with in .htaccess, using mod_rewrite or IP denial, if you are on Apache.
Some mask their user-agents... Some just can't be stopped. Most hit once and never return, so banning IPs just swells your .htaccess file.
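To give a rough idea of what the mod_rewrite approach looks like, here is a small Python sketch that prints the .htaccess rules for a list of user-agent patterns. The function name and the two example patterns are just placeholders I picked for illustration, and the patterns are passed through as-is, so they need to be valid regular expressions without spaces.

```python
def build_ua_block_rules(ua_patterns):
    """Return mod_rewrite lines that answer 403 Forbidden to matching user-agents.

    ua_patterns: regular expressions matched against the User-Agent header.
    They are inserted unescaped, so they must not contain spaces.
    """
    # With no conditions, the RewriteRule would block every request,
    # so emit nothing at all for an empty list.
    if not ua_patterns:
        return ""

    lines = ["RewriteEngine On"]
    last = len(ua_patterns) - 1
    for i, pattern in enumerate(ua_patterns):
        # Conditions are chained with OR; the final one must not carry OR.
        flags = "[NC]" if i == last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example patterns only; substitute the agents you see in your own logs.
    print(build_ua_block_rules(["^EmailSiphon", "^WebZIP"]))
```

This only helps against bots that send an honest user-agent string, which is the limitation mentioned above.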
There are lots of threads here on how to do this.
Good luck.
I also make sure that bots can't actually find e-mail addresses on my web site.
For some methods see
[projecthoneypot.org...]
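One simple trick, which may or may not be among the methods on that page, is to write the address as HTML character references so it never appears as plain text in the page source. A minimal Python sketch, with a made-up function name and address:

```python
import html


def obfuscate_email(address):
    """Encode every character of the address as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    encoded = obfuscate_email("webmaster@example.com")  # placeholder address
    print(encoded)
    # Browsers decode the references, so visitors still see the real address.
    print(html.unescape(encoded))
```

Browsers render this normally, but it only defeats harvesters that look for a literal address in the raw HTML. A harvester that decodes entities first will still find it.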
Using key_master's and xlcus' bad-bots scripts, you can block bad bots based on their behaviour.
key_master's script [webmasterworld.com] (Perl) uses invisible 'trap' links seeded into your pages which are disallowed by robots.txt. A bad bot that ignores robots.txt, or never fetches it, follows those links. That triggers the script, which adds the offending bot's IP address to a denied-IP list in .htaccess.
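The core of the trap idea can be sketched in a few lines. This is not key_master's code (that one is Perl), just a Python illustration of the "add the IP to the deny list" step, with a function name I made up and the older "Deny from" syntax assumed:

```python
import ipaddress


def add_denied_ip(htaccess_text, ip):
    """Return htaccess_text with a 'Deny from <ip>' line appended, unless already present."""
    # Validate first so a garbage value can never be written into .htaccess.
    ip = str(ipaddress.ip_address(ip))  # raises ValueError if not an IP address
    deny_line = f"Deny from {ip}"

    lines = htaccess_text.splitlines()
    if deny_line in lines:
        return htaccess_text  # already banned, nothing to do
    lines.append(deny_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    current = "Order Allow,Deny\nAllow from all\n"
    # 192.0.2.10 is a documentation address standing in for the bot's IP.
    print(add_denied_ip(current, "192.0.2.10"))
```

A real trap script would be called when the hidden URL is requested, would read the visitor's IP from the server environment, and would need file locking so that two simultaneous hits can't corrupt .htaccess.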
xlcus' script [webmasterworld.com] (PHP) blocks access based on the speed of requests. It's good for catching scrapers and site downloaders. Again, once the trap is sprung, the offender's IP address is added to a denied-IP list in .htaccess.
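The speed-based check can be illustrated with a sliding window per IP address. Again, this is not xlcus' code (that one is PHP); it is a Python sketch of the general idea, and the class name and the thresholds are arbitrary assumptions:

```python
from collections import defaultdict, deque


class RateTrap:
    """Flag an IP that makes more than max_requests within window_seconds."""

    def __init__(self, max_requests=10, window_seconds=5.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, now):
        """Record one request at time `now` (seconds); return True if the trap is sprung."""
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


if __name__ == "__main__":
    trap = RateTrap(max_requests=3, window_seconds=1.0)
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, trap.record("192.0.2.10", t))
```

Pick the thresholds generously. A human opening several pages in tabs, or a legitimate search engine crawling quickly, should not trip the trap.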
Once a month, you can go through and prune the list to keep your .htaccess file's size reasonable.
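Pruning can be automated if each ban records when it was added. The sketch below assumes a hypothetical convention, not something the scripts above necessarily do: every "Deny from" line is preceded by a comment line of the form "# trapped YYYY-MM-DD". Apache does not allow a comment on the same line as a directive, hence the separate line.

```python
import re
from datetime import date, timedelta

# Hypothetical marker line written just before each "Deny from" entry.
STAMP = re.compile(r"^# trapped (\d{4}-\d{2}-\d{2})$")


def prune_denied(htaccess_text, today, max_age_days=30):
    """Drop stamped 'Deny from' entries older than max_age_days; keep everything else."""
    cutoff = today - timedelta(days=max_age_days)
    lines = htaccess_text.splitlines()
    kept = []
    i = 0
    while i < len(lines):
        match = STAMP.match(lines[i])
        is_pair = (
            match is not None
            and i + 1 < len(lines)
            and lines[i + 1].startswith("Deny from ")
        )
        if is_pair and date.fromisoformat(match.group(1)) < cutoff:
            i += 2  # skip both the stamp and its Deny line
            continue
        kept.append(lines[i])
        i += 1
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    text = (
        "Order Allow,Deny\nAllow from all\n"
        "# trapped 2024-01-01\nDeny from 192.0.2.10\n"
        "# trapped 2024-03-01\nDeny from 192.0.2.20\n"
    )
    print(prune_denied(text, today=date(2024, 3, 10)))
```

Unstamped Deny lines are left alone, so bans you added by hand survive the pruning.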
Note that the links above lead to threads with modified/enhanced versions of these scripts. These threads contain links to the originals, and I have credited the original authors here.
Jim