jdMorgan - 4:01 am on Jan 5, 2011 (gmt 0)
On the subject of MJ12 "sighting reports," I'd like to offer the observation that at one point, I was getting hundreds of fake MJ12 requests per month, and zero real MJ12 requests. After implementing the MJ12 validation scheme on several servers, the number of fake MJ12 'bots continued to rise for a few months, and then began to slowly decline. Now after about a year, I hardly ever see a fake MJ12 'bot attempting to gain access; the vast majority of requests with the MJ12 user-agent are legitimate.
So, for other distributed-bot owners out there: If you want to prevent spoofers from using your user-agent string and damaging your 'bot's reputation, implement a simple validation scheme like MJ12bot did. It works.
On the use of encrypted keys as proposed above: Using an encrypted key is all well and good, but it requires a script to do the checking on the server end -- not something that "Joe average Webmaster" is going to be able to implement easily. Checking for a specific string in a specific HTTP header and/or appended to the user-agent string is sufficient to the task of discriminating a legitimate distributed 'bot from a spoofer.
And if there is any suspicion that the 'secret passphrase' has been compromised, one can simply change it.
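For anyone who wants to see the logic of that check spelled out, here is a minimal sketch in Python. The header name (X-Bot-Auth), the passphrase, and the function name are all made up for illustration; they are not MJ12bot's actual values, which are whatever the 'bot operator publishes to Webmasters. The same test can usually be expressed as a plain server access rule with no script at all, which is the point of keeping it simple.

```python
import hmac

# Hypothetical values for illustration only; the real header name and
# passphrase are whatever the 'bot operator has published to Webmasters.
BOT_TOKEN = "MJ12bot"
AUTH_HEADER = "x-bot-auth"
SECRET = "example-passphrase"


def classify_request(headers, secret=SECRET):
    """Return 'not-bot', 'legit', or 'spoofed' for one request's headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "")

    # Requests that do not claim to be the 'bot are none of our business here.
    if BOT_TOKEN.lower() not in user_agent.lower():
        return "not-bot"

    # Accept the passphrase either in the dedicated header...
    supplied = h.get(AUTH_HEADER, "")
    in_header = hmac.compare_digest(supplied.encode(), secret.encode())
    # ...or appended to the user-agent string.
    in_user_agent = secret in user_agent

    return "legit" if (in_header or in_user_agent) else "spoofed"
```

Changing a compromised passphrase is then just a matter of changing the one value being compared: a request that still carries the old string is classified as spoofed once a new secret is passed in.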
There are perfect solutions and there are simple solutions. But only some of the simple methods actually solve the problem they're intended to address. The simplest solution that effectively solves a problem is termed "elegant." After noting the almost total disappearance of spoofed MJ12 requests, that's the term I'd use here.