Forum Moderators: Ocean10000

Suggesting an improvement to Majestic's validation


9:25 am on Oct 25, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 8, 2002
posts: 2335
votes: 0

I just read the thread about Majestic providing webmasters with a validation key in its crawl to "prove" it's a real bot.

A better system would be to "sign" the crawl using secret keys and the url crawled. This way, if the "secret" gets found out, it's unusable.

I believe if you check how Facebook and Twitter handle their "connect" initiatives, you will find a good example of it.

Something signed with Majestic's private key can be verified by any webmaster who has Majestic's public key....

So if the key gets found out, it's only good for that url, not sitewide.
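
For illustration, here is a minimal sketch of what per-URL signing could look like. It is not Majestic's scheme. It swaps in an HMAC over the URL with a shared secret (roughly the style of request signing the "connect" APIs use) instead of a public/private key pair, and the names `sign_url` and `verify_crawl` are hypothetical.

```python
import hashlib
import hmac


def sign_url(shared_secret: bytes, url: str) -> str:
    """Return a hex HMAC-SHA256 signature that is tied to one URL."""
    return hmac.new(shared_secret, url.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_crawl(shared_secret: bytes, url: str, presented_signature: str) -> bool:
    """Check that the signature sent by the bot matches the URL it requested."""
    expected = sign_url(shared_secret, url)
    # Compare as bytes, in constant time, so the check does not leak timing hints.
    return hmac.compare_digest(
        expected.encode("utf-8"), presented_signature.encode("utf-8")
    )


# Hypothetical usage: the secret is assumed to be agreed between bot owner and webmaster.
secret = b"example-shared-secret"
signature = sign_url(secret, "http://example.com/page-a")
print(verify_crawl(secret, "http://example.com/page-a", signature))
print(verify_crawl(secret, "http://example.com/page-b", signature))
```

A signature captured from one request only validates that one url, so a spoofer who sniffs it cannot reuse it for the rest of the site.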
3:23 pm on Nov 26, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2041
votes: 1

I can't speak to the validation scheme, sorry, but I can at least report this MJ12bot sighting circa four hours ago:

Mozilla/5.0 (compatible; MJ12bot/v1.3.3; http://www.majestic12.co.uk/bot.php?+)
robots.txt? Yes


MJ12bot v1.3.0 Implements Ground Breaking Validation Capability [webmasterworld.com]

MJ12bot - When you're a hammer, everything looks like a nail [webmasterworld.com]
4:01 am on Jan 5, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
votes: 0

On the subject of MJ12 "sighting reports," I'd like to offer the observation that at one point, I was getting hundreds of fake MJ12 requests per month, and zero real MJ12 requests. After implementing the MJ12 validation scheme on several servers, the number of fake MJ12 'bots continued to rise for a few months, and then began to slowly decline. Now after about a year, I hardly ever see a fake MJ12 'bot attempting to gain access; the vast majority of requests with the MJ12 user-agent are legitimate.

So, for other distributed-bot owners out there: If you want to prevent spoofers from using your user-agent string and damaging your 'bot's reputation, implement a simple validation scheme like MJ12bot did. It works.

On the use of encrypted keys as proposed above: Using an encrypted key is all well and good, but it requires a script to do the checking on the server end, which is not something that "Joe average Webmaster" is going to be able to implement easily. Checking for a specific string in a specific HTTP header and/or appended to the user-agent string is sufficient to the task of discriminating a legitimate distributed 'bot from a spoofer.

And if there is any doubt that the 'secret passphrase' has been compromised, one can simply change it.
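
To make the logic of that check concrete, here is a rough sketch in Python. In practice it would more likely be a one-line condition in the server configuration than a script. The header name `X-Bot-Key`, the function name and the passphrase are all made up for the example and are not what MJ12bot actually sends.

```python
import hmac


def looks_like_genuine_bot(headers, passphrase, bot_token="MJ12bot",
                           header_name="X-Bot-Key"):
    """Return True if a request claiming to be the bot also carries the passphrase.

    The passphrase may arrive in a dedicated header or appended to the
    user-agent string; header_name is a hypothetical placeholder.
    """
    if not passphrase:
        raise ValueError("passphrase must not be empty")
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")
    if bot_token not in user_agent:
        return False  # not claiming to be the bot at all
    header_value = normalized.get(header_name.lower(), "")
    in_header = hmac.compare_digest(
        header_value.encode("utf-8"), passphrase.encode("utf-8")
    )
    in_user_agent = user_agent.endswith(passphrase)
    return in_header or in_user_agent


# Hypothetical requests: one carrying the passphrase, one spoofing the user-agent only.
genuine = {"User-Agent": "Mozilla/5.0 (compatible; MJ12bot/v1.3.3)", "X-Bot-Key": "s3cret"}
spoofed = {"User-Agent": "Mozilla/5.0 (compatible; MJ12bot/v1.3.3)"}
print(looks_like_genuine_bot(genuine, "s3cret"))
print(looks_like_genuine_bot(spoofed, "s3cret"))
```

Changing the passphrase then means changing one string on the server and telling the 'bot owner the new value.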

There are perfect solutions and there are simple solutions. But only some of the simple methods actually solve the problem they're intended to address. The simplest solution that effectively solves a problem is termed "elegant." After noting the almost total disappearance of spoofed MJ12 requests, that's the term I'd use here.

10:19 pm on Jan 5, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
votes: 18

After adding mj12 to robots.txt the spate of bots died down and finally ceased, at least on the web pages (I assume robots.txt is still hit occasionally).

I also now notice an absence of "fake" mj12 bots which prompts me to wonder:

a) were they really fake or simply out of date and out of control; OR

b) did the drivers of the fakes simply lose interest; OR

c) were the bots themselves gradually updated by the fakers (who in that case were running genuine mj12's).

I doubt very much the mj12's were (eg) renamed nutch bots, or we would still be seeing them.

Credit to Majestic for obeying robots.txt. Nevertheless, I still will not permit distributed bots on my server. There is far too little control over them OR too much hassle setting up for them (eg mj12's validation).