Forum Moderators: Ocean10000 & incrediBILL


MJ12bot

When you're a hammer, everything looks like a nail

     
2:47 am on Jun 21, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


This distributed bot has been discussed (and kicked, and its creator argued with) for years [google.com]. As the following sampling shows, MJ12bot's misbehavior remains unchanged:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.5; [majestic12.co.uk...]

06-12: From: Server Farm
robots.txt? NO

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.4; [majestic12.co.uk...]

05-18: From: Server Farm
robots.txt? NO

05-07: From: . (that's a dot; misconfigured server)
robots.txt? NO

05-07: From: Server Farm
robots.txt? NO

05-02: From: 205.209.158.num
robots.txt? YES
Fake referer? YES

04-11: From: . (misconfigured server)
robots.txt? NO

Seems to me the long-standing, oft-proffered claim that the bot obeys robots.txt is specious and misleading when MJ12bot doesn't ask for it in the first place.
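
For anyone who wants to run the same kind of tally on their own logs, here is a minimal sketch. It assumes the Apache/Nginx "combined" log format, and the names (LOG_LINE, robots_requests_by_host) are made up for illustration. It only reports, per client host, whether a request for /robots.txt appears in the lines you feed it; with a distributed bot, it cannot tell you whether some other node fetched robots.txt on that host's behalf.

```python
import re
from collections import defaultdict

# Assumption: logs are in the common "combined" format. Adjust for other layouts.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def robots_requests_by_host(lines, agent_token="MJ12bot"):
    """Map each client host whose UA contains agent_token to True/False:
    did that host request /robots.txt anywhere in these lines?"""
    seen = defaultdict(bool)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        if agent_token.lower() not in m.group("agent").lower():
            continue  # some other user agent
        host = m.group("host")
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        seen[host] = seen[host] or path == "/robots.txt"
    return dict(seen)
```

Usage would be something like robots_requests_by_host(open("access.log")), then eyeballing the hosts that map to False.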

[edited by: incrediBILL at 3:22 am (utc) on June 21, 2009]
[edit reason] Removed URL [/edit]

9:00 pm on June 25, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Let me know when it's ready too.

[edited by: incrediBILL at 9:28 pm (utc) on June 25, 2009]
[edit reason] fixed formatting [/edit]

9:40 pm on June 25, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


I'll post in this thread, guys. Thanks for your feedback. I hope that once this is done you will have some extra confidence that we are the good guys in this...

1:24 am on July 26, 2009 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14622
votes: 87


Any news?
3:33 pm on July 26, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


Sent a PM to Bill, but I will post here too: I have not forgotten about this. It is a commitment that I will deliver on; I just had lots of other things to finish this month. Very busy :(

I am going to send Bill info on the proposed scheme. It can be discussed here as well (if anyone is interested). Once you are happy that the proposed approach is acceptable, I'll get it implemented. I think there is a good chance that we'll have this working (in test phase) by mid-August.

System

7:13 pm on Aug 31, 2009 (gmt 0)

redhat

 
 


The following 6 messages were cut out to new thread by incredibill. New thread at: search_engine_spiders/3983454.htm [webmasterworld.com]
8:52 am on Sep. 3, 2009 (PST -8)

System

7:13 pm on Aug 31, 2009 (gmt 0)

redhat

 
 


The following 41 messages were cut out to new thread by incredibill. New thread at: search_engine_spiders/3985384.htm [webmasterworld.com]
12:12 am on Sep. 7, 2009 (PST -8)
7:26 am on Sept 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Okay, I'm confused, sorry. I've yet to register anywhere, to either opt in or opt out of anything. A few hours ago, a UA named "MJ12bot" from two disparate servers hit two sites 90 minutes apart, each time asking for robots.txt. Faked referers contained different 27-character strings and each site's name --

SITE #1:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.4; [majestic12.co.uk...]
Host: [yada-yada].craw.blueyonder.co.uk

Referers:

[majestic12.co.uk...]
[majestic12.co.uk...]

SITE #2:

UA: Mozilla/5.0 (compatible; MJ12bot/v1.2.5; [majestic12.co.uk...]
IP: 205.209.170.nn

Referer:

[majestic12.co.uk...]

QUESTIONS:

If those two different 27-character strings (10 nos hyphen 16 nos) were/are each site's 'key' -- how'd two, erm, somebody elses get them? Or were those just more fake bots and fake referers? Or--?

TIA

[edited by: incrediBILL at 8:15 am (utc) on Sep. 7, 2009]
[edit reason] clean up [/edit]

7:18 pm on Sept 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


If those two different 27-character strings (10 nos hyphen 16 nos) were/are each site's 'key'

These numbers are not site keys. They are debug params that we send only when getting robots.txt; they allow us to identify the "bucket" those URLs were in, as well as the exact crawler that crawled it. A bucket can contain up to 10k URLs from all sorts of sites, and the same domain can be present in multiple "buckets", hence these numbers can't be used for crawler/site identification, even though I've never seen fakes show such a referer (they usually don't bother getting robots.txt in the first place).
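
If you want to spot these debug params in your own logs, a rough sketch follows. It assumes only the shape described earlier in this thread (10 digits, a hyphen, 16 digits, 27 characters in all); the full referer URL layout is not shown here, so the pattern looks for the token anywhere in the string. DEBUG_TOKEN and extract_debug_token are hypothetical names, not anything from the bot itself.

```python
import re

# Assumption: the token is 10 digits, a hyphen, then 16 digits (27 characters),
# as described in this thread. The lookarounds reject longer digit runs.
DEBUG_TOKEN = re.compile(r"(?<!\d)(\d{10}-\d{16})(?!\d)")


def extract_debug_token(referer):
    """Return the 27-character debug token found in a referer string, or None."""
    m = DEBUG_TOKEN.search(referer)
    return m.group(1) if m else None
```

Since the same domain can turn up in several buckets, a match tells you only that the request carried a debug param of this shape, not which site or crawler it belongs to.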

The new "ident" feature is user-configurable, and it will be sent consistently by the new version of the bot: v1.3.0 (or higher).
