The purpose of this thread is to collect a list of common default HTTP user agent names used by various programming libraries and command line tools. Many of these user agents may actually be in use by a real spider whose default user agent string, for whatever reason, was never reset, even though most are easily changed to reflect the spider name.
Hopefully this list will help people understand what tools are being used, which may provide some insight into the spider's purpose, and serve as a useful quick reference in the future.
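To illustrate how easily a default user agent can be changed, here is a minimal sketch in Python using the standard urllib.request module. The spider name "ExampleSpider" and the example.com URLs are made up for illustration only. If the header is not set, urllib identifies itself with its own default, "Python-urllib/" followed by the Python version, which is exactly the kind of string this thread is collecting.

```python
import urllib.request

# Hypothetical spider name and info URL, used only for illustration.
SPIDER_UA = "ExampleSpider/1.0 (+http://www.example.com/bot.html)"


def build_request(url, user_agent=SPIDER_UA):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Building the request does not touch the network.
req = build_request("http://www.example.com/")

# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen() would then send the custom string instead of the library default.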
Let me kick off this list with a few entries of my own:
Example: "Jakarta Commons-HttpClient/3.0.1"
An HTTP client protocol library; see Apache.org for more details.
Example: "Snoopy v1.2.3"
Snoopy appears to be a PHP class with one version on SourceForge (see the snoopy project), and it is definitely included in WordPress.
Example: "Wget/1.10.2 (Red Hat modified)"
"GNU wget (wget) is a freely available network utility to retrieve files from the World Wide Web, using HTTP (Hyper Text Transfer Protocol) and FTP (File Transfer Protocol), the two most widely used Internet protocols."
libwww-perl is a general-purpose application library for retrieving HTTP documents, used by Perl scripts, typically on Linux servers. This particular library is often associated with hacking attempts and botnet attacks [webmasterworld.com] and should be blocked in general.
curl or libcurl-agent
Example: "curl/7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5"
There are several variations on this user agent, but they all relate to cURL, a command line tool for transferring files with URL syntax that is often called from scripts on Linux servers.
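As a quick way to put the entries above to use, here is a rough Python sketch that checks whether a user agent string from a log file starts with one of the default names listed so far. The table and the function name identify_tool are my own invention for illustration, and simple prefix matching is an assumption that will miss variations not yet posted in this thread.

```python
# Default user agent prefixes taken from the entries above, paired with a
# short tool description. The table layout is hypothetical.
KNOWN_DEFAULTS = [
    ("Jakarta Commons-HttpClient", "Jakarta Commons HttpClient"),
    ("Snoopy", "Snoopy (PHP class)"),
    ("Wget", "GNU wget"),
    ("curl", "cURL"),
    ("libcurl", "cURL"),
]


def identify_tool(user_agent):
    """Return the tool description for a known default user agent, else None."""
    ua = user_agent.strip().lower()
    for prefix, tool in KNOWN_DEFAULTS:
        # Case-insensitive prefix match against the listed default names.
        if ua.startswith(prefix.lower()):
            return tool
    return None


for sample in (
    "Jakarta Commons-HttpClient/3.0.1",
    "Snoopy v1.2.3",
    "Wget/1.10.2 (Red Hat modified)",
):
    print(sample, "->", identify_tool(sample))
```

New entries posted to the list can be added to the table as another prefix and description pair.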
I'll add more later, and please feel free to add others to this list.
NOTE: This isn't a discussion thread; it's an information posting thread only, so if you feel the need to discuss one of the listed user agents in detail, please start a new thread.
[edited by: incrediBILL at 8:39 pm (utc) on July 14, 2009]