Feb 2008 thread [webmasterworld.com]
I don't recall exactly when I added the UA to my robots.txt:
User-agent: T-H-U-N-D-E-R-S-T-O-N-E
Disallow: /
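For anyone who wants to sanity-check a rule like this before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `ExampleBot` agent are made up for illustration, and this only shows how Python's parser reads the rule, not how Thunderstone's crawler interprets it.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: T-H-U-N-D-E-R-S-T-O-N-E
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text.

    Hypothetical helper: it parses the text locally instead of fetching it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # The named UA token is shut out of the whole site...
    print(is_allowed(ROBOTS_TXT, "T-H-U-N-D-E-R-S-T-O-N-E"))
    # ...while an agent with no matching record is unaffected.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "/any/page.html"))
```

Note that the check is against the UA token in the robots.txt record, which is not the same string as the full User-Agent header a bot sends.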
Today it visited, read robots.txt, and left:
206.183.1.zz - - [14/Aug/2009:11:02:06 +0100] "GET /robots.txt HTTP/1.0" 200 4858 "-" "Mozilla/4.0 (compatible; http ://search.thunderstone.com/texis/websearch/about.html)"
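If you'd rather pull visits like this out of the log with a script than by eye, a rough sketch follows. It assumes the log is in the usual Apache "combined" format, as the line above appears to be; the pattern and the names `LOG_PATTERN`, `parse_log_line`, and `is_robots_fetch` are my own, not anything from Apache or Thunderstone.

```python
import re

# Assumed layout: Apache "combined" log format
# host ident user [time] "method path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_log_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_robots_fetch(entry: dict) -> bool:
    """True when the request was a successful GET of /robots.txt."""
    return (
        entry["method"] == "GET"
        and entry["path"] == "/robots.txt"
        and entry["status"] == "200"
    )


if __name__ == "__main__":
    sample = (
        '206.183.1.zz - - [14/Aug/2009:11:02:06 +0100] '
        '"GET /robots.txt HTTP/1.0" 200 4858 "-" '
        '"Mozilla/4.0 (compatible; http ://search.thunderstone.com'
        '/texis/websearch/about.html)"'
    )
    entry = parse_log_line(sample)
    print(entry["agent"], is_robots_fetch(entry))
```

A bot that honours the Disallow should show only robots.txt fetches like this one; any other path under the same UA would be worth a closer look.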
If you modify their UA URL (remove "search" and add "www"), their website provides more specific and accurate details than the UA link.
copilot.thunderstone.com
Mozilla/4.0 (compatible; [search.thunderstone.com...]
robots.txt? YES
-----
1.) Thunderstone's been crawling DMOZ.org for years, as a kind of free demo:
"This engine searches the Open Directory Project's growing catalog of over 4 million web sites. The raw data is offered by Open Directory as RDF/XML data files. Our search engine downloads this data, converts it into a Texis database, and provides a categorized, searchable interface."
[thunderstone.com...]
And here's their iteration:
So if you're in DMOZ and like/get any referral traffic, chances are that's why "copilot.thunderstone.com" comes around.
-----
2.) HOWEVER, other sites can use Thunderstone's free online search and/or install a free version of their engine and initially misconfigure the crawling prefs. (Not that I'd know this first-hand or anything...) Plus, budding bot-runners can override the default rule of reading and heeding robots.txt.
-----
3.) So even though I've installed and use a paid-for version of their Webinator program, I block Thunderstone (as Host and UA) and T-H-U-N-D-E-R-S-T-O-N-E (as UA), ditto anything Webinator (other than my own semi-branded UA). I don't want countless others crawling around, accidentally or on purpose. FWIW
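The blocking logic described in 3.) could look roughly like the sketch below. This is only an illustration of the idea, not the poster's actual setup: the function name `should_block`, the constants, and the `ExampleSite-Webinator` UA are all hypothetical, and it assumes the visitor's hostname has already been looked up (e.g. via reverse DNS) elsewhere.

```python
# Substrings to refuse in the User-Agent header (matched case-insensitively).
BLOCKED_UA_TOKENS = ("thunderstone", "t-h-u-n-d-e-r-s-t-o-n-e", "webinator")

# Host suffix to refuse; hostname lookup is assumed to happen elsewhere.
BLOCKED_HOST_SUFFIX = "thunderstone.com"

# Hypothetical stand-in for the site owner's own semi-branded crawler UA.
OWN_UA = "ExampleSite-Webinator"


def should_block(user_agent: str, hostname: str = "") -> bool:
    """Return True if the request should be refused.

    The owner's own UA is let through first. A UA string is trivially
    spoofable, so a real setup would probably also pin it to a known IP.
    """
    ua = user_agent.lower()
    if OWN_UA.lower() in ua:
        return False

    host = hostname.lower().rstrip(".")
    if host == BLOCKED_HOST_SUFFIX or host.endswith("." + BLOCKED_HOST_SUFFIX):
        return True

    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    print(should_block("SomeBot/1.0", "copilot.thunderstone.com"))
    print(should_block("ExampleSite-Webinator/1.0"))
```

In practice most people would do this in the server config rather than in application code; the function just makes the matching rules explicit.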