
Forum Moderators: Ocean10000 & incrediBILL


MJ12bot

When you're a hammer, everything looks like a nail

   
2:47 am on Jun 21, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



This distributed bot has been discussed (and kicked, and its creator argued with) for years [google.com]. As the following sampling shows, MJ12bot's misbehavior remains unchanged:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.5; [majestic12.co.uk...]

06-12: From: Server Farm
robots.txt? NO

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.4; [majestic12.co.uk...]

05-18: From: Server Farm
robots.txt? NO

05-07: From: . (that's a dot; misconfigured server)
robots.txt? NO

05-07: From: Server Farm
robots.txt? NO

05-02: From: 205.209.158.num
robots.txt? YES
Fake referer? YES

04-11: From: . (misconfigured server)
robots.txt? NO

Seems to me the long-standing, oft-proffered claim that the bot obeys robots.txt is misleading when MJ12bot doesn't request the file in the first place.
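For anyone who wants to repeat the "robots.txt? YES/NO" tally from their own logs, here is a rough Python sketch. It assumes the Apache/Nginx "combined" log format, and the function name `robots_txt_report` is made up for illustration. It only reports whether each client host sending the bot's UA ever requested /robots.txt; it does not check the order of requests, and because the bot is distributed, a per-host view may not tell the whole story.

```python
import re
from collections import defaultdict

# Assumption: "combined" log format. Adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def robots_txt_report(lines, bot_token="MJ12bot"):
    """Map each host using the bot's UA to True if it ever asked for /robots.txt."""
    asked = defaultdict(bool)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or bot_token not in m.group("agent"):
            continue  # unparseable line, or a different user agent
        host = m.group("host")
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        asked[host] = asked[host] or path == "/robots.txt"
    return dict(asked)
```

Feed it an open log file (for example `robots_txt_report(open("access.log"))`, where the file name is a placeholder) and look for hosts mapped to False.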

[edited by: incrediBILL at 3:22 am (utc) on June 21, 2009]
[edit reason] Removed URL [/edit]

9:00 pm on Jun 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let me know when it's ready too.

[edited by: incrediBILL at 9:28 pm (utc) on June 25, 2009]
[edit reason] fixed formatting [/edit]

9:40 pm on Jun 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll post in this thread, guys. Thanks for your feedback. I hope that once this is done you will have some extra confidence that we are the good guys in this...

1:24 am on Jul 26, 2009 (gmt 0)

WebmasterWorld Administrator incredibill: Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



Any news?

3:33 pm on Jul 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sent a PM to Bill, but I'll post here too: I have not forgotten about this. It is a commitment that I will deliver on; I just had lots of other things to finish this month. Very busy :(

I am going to send Bill info on the proposed scheme, and it can be discussed here as well (if anyone is interested). Once you are happy that the proposed approach is acceptable, I'll get it implemented. I think there is a good chance that we'll have this working (in test phase) by mid-August.

7:13 pm on Aug 31, 2009 (gmt 0)

redhat



The following 6 messages were cut out to new thread by incredibill. New thread at: search_engine_spiders/3983454.htm [webmasterworld.com]
8:52 am on Sep. 3, 2009 (PST -8)

7:13 pm on Aug 31, 2009 (gmt 0)

redhat



The following 41 messages were cut out to new thread by incredibill. New thread at: search_engine_spiders/3985384.htm [webmasterworld.com]
12:12 am on Sep. 7, 2009 (PST -8)

7:26 am on Sep 7, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Okay, I'm confused, sorry. I've yet to register anywhere, to either opt in or opt out of anything. A few hours ago, a UA named "MJ12bot" from two disparate servers hit two sites 90 minutes apart, each time asking for robots.txt. The faked referers contained different 27-character strings and each site's name:

SITE #1:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.4; [majestic12.co.uk...]
Host: [yada-yada].craw.blueyonder.co.uk

Referers:

[majestic12.co.uk...]
[majestic12.co.uk...]

SITE #2:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.5; [majestic12.co.uk...]
IP: 205.209.170.nn

Referer:

[majestic12.co.uk...]

QUESTIONS:

If those two different 27-character strings (10 nos hyphen 16 nos) were/are each site's 'key', how'd two, erm, somebody elses get them? Or were those just more fake bots and fake referers? Or...?

TIA

[edited by: incrediBILL at 8:15 am (utc) on Sep. 7, 2009]
[edit reason] clean up [/edit]

7:18 pm on Sep 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If those two different 27-character strings (10 nos hyphen 16 nos) were/are each site's 'key'

These numbers are not site keys. They are debug params that we send only when fetching robots.txt; they let us identify the "bucket" those URLs were in, as well as the exact crawler that crawled it. A bucket can contain up to 10k URLs from all sorts of sites, and the same domain can be present in multiple buckets, so these numbers can't be used for crawler or site identification, even though I've never seen fakes send such a referer (they usually don't bother getting robots.txt in the first place).

The new "ident" feature is user-configurable, and it will be sent consistently by the new version of the bot: v1.3.0 (or higher).
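To spot these debug params in a log, a pattern for the shape described earlier in the thread (10 digits, a hyphen, 16 digits) may be enough. The sketch below is only a guess at how one might look for it: the helper name `find_debug_token` is hypothetical, the position of the token inside the referer is assumed to be arbitrary, and nothing here says which group is the bucket and which is the crawler.

```python
import re

# 10 digits, a hyphen, 16 digits: the 27-character shape described in the thread.
# The lookarounds reject longer digit runs on either side.
DEBUG_TOKEN = re.compile(r"(?<!\d)(\d{10})-(\d{16})(?!\d)")

def find_debug_token(referer):
    """Return the two digit groups if the referer carries such a token, else None."""
    m = DEBUG_TOKEN.search(referer)
    return m.groups() if m else None
```

As the post says, a match does not identify a site or prove the request is genuine; it only shows the referer has the same shape.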
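Since the "ident" feature is tied to v1.3.0 or higher, a log filter could first check which version a UA string claims. This is a minimal sketch based only on the UA shape quoted in this thread ("MJ12bot/v1.2.5"); the function name `claims_ident_capable` is invented for illustration, and a user agent is trivially faked, so this says what the string claims and nothing more.

```python
import re

# Matches the version part of UA strings such as "MJ12bot/v1.2.5".
UA_VERSION = re.compile(r"MJ12bot/v(\d+)\.(\d+)\.(\d+)")

def claims_ident_capable(user_agent, minimum=(1, 3, 0)):
    """True if the UA string claims an MJ12bot version at or above `minimum`."""
    m = UA_VERSION.search(user_agent)
    if not m:
        return False
    # Compare as integer tuples so that 1.10.0 sorts above 1.3.0.
    return tuple(int(part) for part in m.groups()) >= minimum
```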
