Every single day I see new robots in my logs (...and when I research them, I find new sites hosting scraped copies of other people's robot lists... boo-hiss). On my sites, the unbridled proliferation of bots and crawlers from every country imaginable is both intriguing and irritating, because the ratio of bandwidth used to value returned is so poor. So I block all but a handful.
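For readers wondering what "block all but a handful" looks like as logic, here is a minimal Python sketch of an allowlist policy. It is not the poster's actual setup. The allowlisted tokens, the "looks like a bot" heuristic, and the choice to block an empty User-Agent are all assumptions made for illustration.

```python
import re

# Hypothetical allowlist: lowercase substrings identifying the handful of
# crawlers a site might choose to admit. These two are illustrative only.
ALLOWED_BOT_TOKENS = ("googlebot", "bingbot")

# Rough heuristic (an assumption) for User-Agents that announce themselves as
# robots. It misses bots posing as browsers and may flag a few odd browsers.
BOT_HINT = re.compile(r"bot|crawl|spider|slurp|fetch")


def is_allowed(user_agent: str) -> bool:
    """Decide whether to serve a request under a block-all-but-a-handful policy."""
    ua = user_agent.strip().lower()
    if not ua:
        return False  # assumption: no User-Agent at all is treated as a robot
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return True  # one of the admitted robots
    if BOT_HINT.search(ua):
        return False  # a self-identified robot that is not on the allowlist
    return True  # looks like an ordinary browser
```

User-Agent strings are trivially spoofed, so a match on "googlebot" alone proves nothing; a real deployment would typically confirm claimed search-engine crawlers separately, for example with a reverse DNS lookup.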
So anyway, here's a linked list of my most reliable 'robot research' sources. (Many of the URLs point to the first of many pages of data, usually listed and linked alphabetically.) These are original sites compiling and offering their own data, just as they have been doing for years. The webmasters (and their programs) do obsessively hard and terrifically good work, and I tip my hat to every single one.
List of User-Agents (Spiders, Robots, Crawler, Browser) [psychedelix.com]
The Best of the Rest:
SUMMARY.NET -- Known Robots [summary.net] (live site demo)
Stefan Helbing's Table of bad robots [helbing.nu]
KLOTH.NET -- List of Bad Bots [kloth.net] (alas, not as current as it once was)
John A Fotheringham's Search engine robots that visit your web site [jafsoft.com]
AWM-Webmaster.com -- Browser, Spider, Robots und Crawlers [awm-webmaster.com]
Note: You could go slightly goofy trying to combine the preceding pages' entries in .htaccess, etc., in order to block or otherwise handle the worst one by one. (I know. I tried:) There are simply too danged many of the evil spawn...
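The merging chore the note describes is at least mechanical. Below is a minimal Python sketch (not .htaccess syntax) that merges several User-Agent token lists, drops blanks and case-insensitive duplicates, and compiles the result into one case-insensitive substring pattern. The function names are hypothetical, and the sketch does nothing about the real problem the note raises: the lists keep growing.

```python
from __future__ import annotations

import re
from typing import Iterable


def merge_bot_lists(*lists: Iterable[str]) -> list[str]:
    """Merge User-Agent token lists, dropping blanks and case-insensitive duplicates."""
    seen: dict[str, str] = {}
    for source in lists:
        for name in source:
            token = name.strip()
            if token and token.lower() not in seen:
                seen[token.lower()] = token  # keep the first spelling encountered
    return sorted(seen.values(), key=str.lower)


def compile_blocker(tokens: Iterable[str]) -> re.Pattern[str]:
    """Build one case-insensitive regex matching any listed token as a substring."""
    alternatives = [re.escape(t) for t in tokens]
    if not alternatives:
        return re.compile(r"(?!)")  # an empty list must match nothing, not everything
    return re.compile("|".join(alternatives), re.IGNORECASE)
```

With two small illustrative lists such as `["EmailCollector", "WebCopier "]` and `["webcopier", "HTTrack"]`, `merge_bot_lists` returns `["EmailCollector", "HTTrack", "WebCopier"]`, and the compiled pattern matches a User-Agent containing `httrack` in any letter case.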