Blue Communication AS

Majestic-12?

         

Bewenched

1:41 am on Nov 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not really sure if this is a good guy or bad guy.

IP: 81.191.159.***
Company Name: Blue Communication AS
Location: Norway
Time Spent: 01:42:47
Browser Type: Unknown
Referring URL: [majestic12.co.uk...]

[edited by: volatilegx at 1:20 am (utc) on Nov. 8, 2007]
[edit reason] obfuscated ip address [/edit]

wilderness

1:43 am on Nov 8, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



He'll be along shortly to tell you what an angel he is ;)

BTW, this UA changed recently as well.

Lord Majestic

12:30 pm on Dec 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apologies for not responding earlier...

This appears to be our bot and I like to think that we are the good guys :)

Note one thing here though: in the last 2 months a lot of fake MJ12bot requests have appeared. They claim to be MJ12bot v1.0.8, but this is a fake user-agent; those guys appear to run their own crawler on what looks like a pretty big botnet of some kind. Now those are the bad guys for sure: they don't (unlike us) obey robots.txt, and their crawling appears to put more pressure on sites than ours :(

So, to sum up: the bot that visited you is good. We will actually be doing a huge search engine release next month, the kind of index that will be interesting to any webmaster who does SEO ;)

Oh, forgot one more thing: wilderness, we are no angels, but we obey robots.txt and listen to webmasters. We were generally getting 1 webmaster email per 1 billion crawled URLs; I don't think this is a bad ratio. Only recently did we start getting lots of emails, but all of them are about the fake MJ12bot v1.0.8. We are responsible members of the Internet community. This does not make us angels, but I think it firmly keeps us on the good side of things.

The UA has indeed changed recently to be more in line with those used by modern search engines:

Mozilla/5.0 (compatible; MJ12bot/v1.2.1; [majestic12.co.uk...]

This was done probably for the same reason bigger search engines changed theirs: some sites really do want the Mozilla bit in the user-agent. Robots.txt rules are not affected by this change in any way.
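As a rough illustration of why a robots.txt rule survives the header change: a minimal Python sketch, assuming the crawler matches rules on its product token (MJ12bot) rather than on the full header. The robots.txt content, the /forum/ path and the example.com URLs are made up for illustration, and the user-agent string is cut short where the original post elides the URL.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the path is made up.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /forum/
"""

# New-style header from the post. The trailing URL is elided in the
# original, so it is left out here as well.
NEW_STYLE_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.2.1)"


def product_token(user_agent):
    """Return the bot name that precedes '/v<version>', or None."""
    match = re.search(r"(MJ12bot)/v[\d.]+", user_agent)
    return match.group(1) if match else None


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The rule is looked up by token, so the "Mozilla/5.0 (compatible; ...)"
# wrapper around it plays no part in the match.
token = product_token(NEW_STYLE_UA)
forum_allowed = parser.can_fetch(token, "http://www.example.com/forum/thread1")
home_allowed = parser.can_fetch(token, "http://www.example.com/index.html")
print(token, forum_allowed, home_allowed)
```

Note that the token is passed to `can_fetch`, not the full header; how a given crawler actually does this matching is up to that crawler.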

[edited by: Lord_Majestic at 12:39 pm (utc) on Dec. 16, 2007]

webdude

4:31 pm on Jan 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Mr. Lord Majestic sir,

One of my servers was pretty much killed yesterday while being crawled by several bots from several IPs.

82.246.152.***
87.68.146.***
86.142.233.***
91.64.50.***

All of these bots were marked as [majestic12.co.uk...]

Is there a way of slowing this puppy down a bit? CPU consumption was at 80% - 99% for about a half hour while these slurped up one of my forums.

Any help would be greatly appreciated. I am currently blocking these IPs.

Thanks!

[edited by: volatilegx at 2:32 am (utc) on Jan. 17, 2008]
[edit reason] obfuscated ip addresses [/edit]

Lord Majestic

5:07 pm on Jan 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What was the exact user-agent? If it's not the current one, Mozilla/5.0 (compatible; MJ12bot/v1.2.1; [majestic12.co.uk...], then it is fake (details on this are on our page).
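The check described above could be sketched in Python roughly as follows. This is only an illustration: the function name and the result labels are invented here, and the version numbers are the ones quoted in this thread. A header that matches the current format only claims to be current, since any client can send any header.

```python
import re

CURRENT_VERSION = "1.2.1"     # version named as current in this thread
KNOWN_FAKE_VERSION = "1.0.8"  # version reported as fake in this thread

_MJ12_RE = re.compile(r"MJ12bot/v(\d+(?:\.\d+)*)")


def classify_mj12_user_agent(user_agent):
    """Label a User-Agent header by what it claims to be.

    Returns one of: 'not-mj12bot', 'known-fake', 'claims-current',
    'unrecognised'. The labels are hypothetical names for this sketch.
    """
    match = _MJ12_RE.search(user_agent)
    if match is None:
        return "not-mj12bot"
    version = match.group(1)
    if version == KNOWN_FAKE_VERSION:
        return "known-fake"
    if (version == CURRENT_VERSION
            and user_agent.startswith("Mozilla/5.0 (compatible; MJ12bot/")):
        return "claims-current"
    return "unrecognised"


if __name__ == "__main__":
    for ua in (
        "MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)",
        "Mozilla/5.0 (compatible; MJ12bot/v1.2.1; ...",
        "Mozilla/5.0 (Windows)",
    ):
        print(classify_mj12_user_agent(ua), "<-", ua)
```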

A few exact log requests would be handy.

The IP 91.64.50.xx (please obscure it) is actually ours. You probably mixed fake and good bot requests together, which is understandable when you are under stress from the fake bot, but we have nothing to do with those. Anyone can fake a user-agent; it's the same as email spammers faking From: fields. I often get spam from "myself" :(

webdude

7:19 pm on Jan 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is all the info I have in the logs. I switched out my IP and Domain name...

2008-01-15 14:31:39 78.58.** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 2364 219 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
2008-01-15 18:23:49 78.58.85.** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 2405 220 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -

2008-01-15 20:35:39 87.68.146.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 12678 255 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
2008-01-15 20:35:52 87.68.146.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 14032 253 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -

2008-01-15 21:21:36 82.246.152.*** - W3SVC25 123.123.123.123 80 GET /example.uri - 200 0 40937 160 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
2008-01-15 21:21:44 82.246.152.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 2345 219 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
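For anyone wanting to see how many requests each address made with this user-agent, here is a minimal Python sketch. It assumes the field layout seen in the lines above (client IP as the third whitespace-separated field, user-agent as the second-to-last); other IIS logging configurations will order fields differently, so treat the positions as assumptions. The sample data is copied from the excerpt above.

```python
from collections import Counter

FAKE_MARKER = "MJ12bot/v1.0.8"

# Three lines copied from the log excerpt (addresses already obfuscated).
SAMPLE_LOG = """\
2008-01-15 20:35:39 87.68.146.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 12678 255 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
2008-01-15 20:35:52 87.68.146.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 14032 253 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
2008-01-15 21:21:44 82.246.152.*** - W3SVC25 123.123.123.123 80 GET /example.uri 200 0 2345 219 MyWebSite.com MJ12bot/v1.0.8+(http://majestic12.co.uk/bot.php?+) -
"""


def count_fake_hits(log_text):
    """Count requests per client IP whose user-agent contains FAKE_MARKER."""
    hits = Counter()
    for line in log_text.splitlines():
        # Skip blank lines and '#' directive lines found in W3C-style logs.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Assumed positions: third field is the client IP, and the
        # user-agent sits just before the trailing referer field.
        client_ip, user_agent = fields[2], fields[-2]
        if FAKE_MARKER in user_agent:
            hits[client_ip] += 1
    return hits


fake_hits = count_fake_hits(SAMPLE_LOG)
for ip, count in fake_hits.most_common():
    print(ip, count)
```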

[edited by: volatilegx at 2:35 am (utc) on Jan. 17, 2008]
[edit reason] obfuscated ip addresses [/edit]

Lord Majestic

7:33 pm on Jan 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Those that you show are fake for sure. MJ12bot/v1.0.8 is not ours; if you see it you can ban the IP on sight. It is 100% fake.

I am curious what the user-agent was on 91.64.50.xx, since this is one of ours. Do you have some requests in the log for this IP, so we can check that the user-agent is good?

If you check our bots page you will find some specific advice on how to ban this fake bot in Apache.

webdude

1:54 pm on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We got hit hard again last night by this "fake" bot. It pulled about a gig in a very short amount of time.

I went to your site and all relevant links. The solutions are well and good for Apache; unfortunately, I am running IIS. Do you know of any solutions for IIS? I would really like to fix this problem. I suppose I could block all the IPs listed, but that doesn't seem feasible... being a virus, the number of IPs could expand exponentially.

Lord Majestic

3:03 pm on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately an IP block is not feasible. As you say, the number of IPs is way, way too high. It's basically a botnet and there is really no viable solution against those :(

I don't know how to do user-agent based pattern exclusion on IIS. However, perhaps your site is script based? In that case you should be able to read the user-agent and return a 403 when it matches MJ12bot/v1.0.8.
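As a sketch of that script-level approach, here is what the check might look like as a Python WSGI wrapper. This is an illustration only, not something from the Majestic-12 bots page: the names `block_fake_mj12bot` and `site` are hypothetical, and on an IIS site the same idea would be written in whatever scripting language the site already uses.

```python
FAKE_MARKER = "MJ12bot/v1.0.8"


def block_fake_mj12bot(app):
    """Wrap a WSGI app so requests carrying the fake user-agent get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if FAKE_MARKER in user_agent:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is passed through to the real application untouched.
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_fake_mj12bot(site)
```

Rejecting the request before any forum page is rendered keeps the per-request cost low, although the server still has to accept each connection, so this reduces load rather than removing it.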

One thing you can be certain about is that it is not us doing it. It is a virus that is now caught by Kaspersky and AVG, but the problem is that not everyone runs those; if they did, we'd have no botnets :(

[edited by: Lord_Majestic at 3:04 pm (utc) on Jan. 17, 2008]

wilderness

4:13 pm on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



unfortunately, I am running IIS. Do you know of any solutions for IIS?

Here's an old thread on IIS

[webmasterworld.com...]

I provided another IIS link that was located on the web recently, which I failed to bookmark.

[webmasterworld.com...]

Lord Majestic

4:32 pm on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, good one wilderness, I will link to those pages from our bots page to help block this baddie in IIS - all is forgiven ;)

webdude

7:27 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I loaded an ISAPI filter called WebKnight. It works great and actually has a lot more built into it than just a referrer block. It even has global header replace functions. Very slick.

Goodbye, bad bots! Thanks for your input, Lord Majestic, et al.

By the way, in the robots black and white list, they list your robot as one of the good guys.

Lord Majestic

7:45 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, good to hear you found a solution to your problem! :)

I am going to reference the good WebKnight people from our bots page to help others protect themselves from bad bots. It is a relief to know they classified us as one of the good ones! :)

mcneely

1:29 am on Feb 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



About a year ago, or a little after, the Majestic ID began to show up across multiple sites here.

I'm a fuddy-duddy then, in that unless I can actually see a searchable DB, I won't let newfound UA IDs in.

Going to a web page that *explains a bot isn't good enough until I can see the results of the parsing efforts directly, either on our own sites or on someone else's. Parsing utilities sometimes run around for years before they ever show anything that remotely resembles a searchable DB. Some never put up a search at all.

The Majestic ID or UA has only just recently gotten through, and never mind that it's in the robots.txt. It checks robots.txt (it's denied, of course) and then proceeds to go after any other pages it can find.

So, version 8, 1 or other versions that just happen to pop up are denied via .htaccess.

The current Majestic Search DB is fine I suppose. Short of the *handler/exception errors broadcast across the top of the page after performing a query, it provides, though not exactly on spot, results.

I've nothing but respect for you and your ongoing project Lord Majestic, but I would prefer to wait.

One day, hopefully soon, your project will mature beyond the abuses it's currently faced with.

Best of the best of luck.

If I see anything else unusual, that hasn't been discussed here or on your site, I'll pop in and tell you and everyone else about it.

Of that, you can be sure.

Lord Majestic

1:36 am on Feb 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks! :)

This fake bot sure caused some major grief. User-agents are easily faked, unfortunately, just like From email fields :(

We have just launched a super big DB that is a big stepping stone forward, a very big stone ;)