Careful with the bings. They've been moving into all kinds of obscure new IPs. (Apparently they missed the early-90s /8 handout by about five minutes, so they have to grab IPs where they can find them.) If it's an unfamiliar range and the visitor begins by asking for robots.txt, it may be worth looking them up.
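For the lookup, a common approach (and, as far as I know, the one Bing itself recommends) is forward-confirmed reverse DNS. You reverse-resolve the IP and check that the hostname is under search.msn.com. Then you resolve that hostname forward and make sure it maps back to the same IP. Below is a minimal Python sketch that assumes the search.msn.com suffix. The resolver functions are parameters, so you can test without network access. Note that socket.gethostbyname_ex only returns IPv4 addresses.

```python
from __future__ import annotations

import socket
from typing import Callable

# Assumption: genuine Bing crawler IPs reverse-resolve to hosts under this suffix.
BING_SUFFIX = ".search.msn.com"


def reverse_lookup(ip: str) -> str:
    """PTR lookup; gethostbyaddr returns (hostname, aliases, addresses)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> list[str]:
    """IPv4 lookup; gethostbyname_ex returns (name, aliases, addresses)."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_bingbot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], list[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be Bing."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    # Step 1: the PTR name must sit under the Bing crawler domain.
    if not host.endswith(BING_SUFFIX):
        return False
    # Step 2: that name must resolve back to the same IP, so a spoofed PTR fails.
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Anything that fails either step is treated as not-Bing. A spoofer can point their own PTR record at something.search.msn.com, but they can't make Microsoft's forward DNS point back at their IP.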
Isn't MJ12 the one where you sign up to get SEO data (similar to GWT, Google Webmaster Tools) in exchange for letting them crawl your site? If so, they give you an access code so you can allow only their authentic bot. I did this for a while but didn't see the benefit, so I stopped allowing them.
Well, ###. So if I have no idea what you're talking about, it means that every "MJ12" bot I've ever met is fake? :)
Not at all.
The access code is just so you can allow the real one and block the others. If I remember correctly, you sign up for an account at MJ12, and during that process you choose an access code (example: lucy24); then he confirms via email. You then use that access code in a mod_access filter that allows MJ12 bots with the code in their UA string and denies any MJ12 bots without it. I think you also need to specifically allow his bot in robots.txt (I never did understand this part).
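Here is the allow/deny logic of that filter, sketched in Python purely for illustration. On the server it would be a mod_access rule, not Python. The sketch assumes the bot identifies itself with the token MJ12bot and that the access code appears somewhere in the UA string as a plain substring. The code lucy24 is just the example from this post, and the function name is hypothetical.

```python
# Hypothetical access code chosen at sign-up ("lucy24" is just the example from the post).
ACCESS_CODE = "lucy24"


def mj12_status(user_agent: str, access_code: str = ACCESS_CODE) -> int:
    """Return 200 to serve the request or 403 to refuse it (hypothetical helper)."""
    if "mj12bot" not in user_agent.lower():
        return 200  # not claiming to be MJ12; this filter leaves it alone
    if access_code in user_agent:
        return 200  # carries our code: treat as the authentic bot
    return 403  # claims to be MJ12 but lacks the code: deny
```

The code match is case-sensitive on purpose, since the code is something you picked and the real bot should echo it exactly.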
This opt-in filter worked very well. I only discontinued it because I didn't use the SEO data he offers; you might. He obviously uses the site data he collects for his business, but at least he is up front about it and offers a trade-off, unlike a lot of the domain-info scrapers out there.
I didn't subscribe to Majestic's philosophy, and I also decided it was easier to block all MJ bots, both with robots.txt (for the legit ones) and with a 403 (for the illegitimate ones). I have to say the legit ones do seem to obey robots.txt.
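Here is a minimal Python sketch of both layers, assuming the crawler's robots.txt token is MJ12bot. The robots.txt check uses the standard library's urllib.robotparser. That parser matches only the part of the agent name before the first "/", so the check passes the bare token rather than a full UA string. The 403 layer is a hypothetical helper standing in for whatever server rule you actually use.

```python
from urllib.robotparser import RobotFileParser

# Layer 1: robots.txt that shuts out MJ12bot and leaves everyone else alone.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow:
"""


def robots_allows(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """What a compliant crawler with this token should conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def server_status(user_agent: str) -> int:
    """Layer 2 (hypothetical): 403 anything still claiming MJ12bot, since it ignored robots.txt."""
    return 403 if "mj12bot" in user_agent.lower() else 200
```

The robots.txt rule only stops bots that choose to honor it. The 403 catches anything that keeps using the MJ12bot name regardless.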