MJ12conan - Crawler, Spider, and User Agent ID forum at WebmasterWorld

MJ12conan

Browser, not bot

Lord Majestic

2:02 pm on Nov 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

User-agent: MJ12conan/1.0.0 (browser, not a bot) (http://www.example.com?+)

Note: URL removed to comply with the T&Cs here; this is the explanatory text from the site:

-----------------------------------------------------
MJ12conan is a specialised browser that is used to test visual content analysis technology that is under development by Majestic-12.

If you came to this page from a link in your log file then please be aware that MJ12conan is NOT a bot or crawler: it works no differently from a browser and requires manual input of a URL into the equivalent of a "location" field. Since it's just a browser, the robots.txt standard is not supported, as it does not apply to browsers that are driven by humans.

-----------------------------------------------------

I am posting this in the hope of confirming my assumption that it is right not to comply with robots.txt in this case. I know that some of you have very strong feelings about bots not complying with robots.txt (and I agree with you insofar as bots are concerned), but this tool is not a bot but a special kind of browser. It can't and won't be used for crawling (analysed data will be discarded; it's mainly for visual validation of content analysis technology), and I expect the number of actual requests made to be pretty close to 0, so there is a very good chance you will never see it.

It's still not too late to change the tool to support robots.txt, but I wanted to validate my theory that, since it's a browser rather than a bot, robots.txt does not apply to it.
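For reference, if the tool were changed to support robots.txt, the check before each fetch might look roughly like the sketch below. It uses Python's standard urllib.robotparser; the robots.txt content, the /private/ path, and the is_allowed helper are illustrative assumptions, not anything MJ12conan actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed in memory so no network access is needed.
ROBOTS_TXT = """\
User-agent: MJ12conan
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "MJ12conan/1.0.0"
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/page.html"):
        print(url, is_allowed(ROBOTS_TXT, agent, url))
```

A real tool would fetch each site's own robots.txt before the page request; whether a human-driven browser should do that at all is the question being asked here.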

GaryK

9:36 pm on Nov 2, 2005 (gmt 0)

If I understand you correctly this specialized browser will be used by a human to check pages that were crawled by your bot. If so then the browser should not be looking at anything that's disallowed by robots.txt, right?

Lord Majestic

9:39 pm on Nov 2, 2005 (gmt 0)

If I understand you correctly this specialized browser will be used by a human to check pages that were crawled by your bot.

Incorrect -- this specialised browser is used to look at pages chosen by the human (they have to be typed or pasted into the equivalent of a location bar), just as with Firefox and IE. It is totally unrelated to my bot (MJ12bot), which supports robots.txt and will continue to do so. The reason I picked a special user-agent is that I know you guys watch for visitors who don't request images, and I don't want anybody to draw incorrect conclusions about the nature of requests made to your servers.

jdMorgan

1:31 am on Nov 3, 2005 (gmt 0)

Hmm...

I'd like:

"Mozilla/5.0 (compatible; Conan/1.0.0 browser; http://example.com/conan.html)"

or something similar. It might save you some trouble with sites that require "Mozilla/" for browser sniffing. Also I'd suggest a distinct page (conan.html in the example above) telling webmasters/log-checkers just what you posted above. To do otherwise is to invoke the wrath of Crom! ;)
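To illustrate the sniffing concern: a site that gates content on a "Mozilla/" prefix might do something as crude as the sketch below. The passes_mozilla_sniff function is hypothetical and not taken from any particular site; the two strings are the ones quoted in this thread.

```python
# The two user-agent strings discussed in this thread.
ORIGINAL_UA = "MJ12conan/1.0.0 (browser, not a bot) (http://www.example.com?+)"
SUGGESTED_UA = "Mozilla/5.0 (compatible; Conan/1.0.0 browser; http://example.com/conan.html)"


def passes_mozilla_sniff(user_agent: str) -> bool:
    """Crude browser sniff: accept only user-agents that start with 'Mozilla/'."""
    return user_agent.startswith("Mozilla/")


if __name__ == "__main__":
    for ua in (ORIGINAL_UA, SUGGESTED_UA):
        print(passes_mozilla_sniff(ua), ua)
```

Under a check like this, the original string would be turned away and the suggested one would get through.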

Edited -- Almost forgot the main point: If a human types or cut/pastes the URL into this Conan thing, then it's not a 'bot. Only automated user-agents can reasonably be required to request and honor robots.txt. As a matter of fact, I get suspicious when I see browsers (or alleged browsers) looking at robots.txt.
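A rough sketch of that kind of log check follows. The (user_agent, path) record format, the sample records, and the function name are assumptions for illustration; real access logs would need parsing first, and declared crawlers that also use a "Mozilla/" prefix would have to be excluded separately.

```python
def suspicious_robots_requests(records):
    """Return user-agents that look like browsers yet requested /robots.txt.

    records is an iterable of (user_agent, path) pairs. This is a crude
    heuristic: it treats any 'Mozilla/' prefix as a claim to be a browser.
    """
    flagged = []
    for user_agent, path in records:
        claims_browser = user_agent.startswith("Mozilla/")
        if claims_browser and path == "/robots.txt" and user_agent not in flagged:
            flagged.append(user_agent)
    return flagged


# Hypothetical sample records, not taken from any real log.
SAMPLE = [
    ("Mozilla/5.0 (Windows; U; Windows NT 5.1)", "/index.html"),
    ("Mozilla/4.0 (compatible; MSIE 6.0)", "/robots.txt"),
    ("ExampleCrawler/1.0", "/robots.txt"),
]

if __name__ == "__main__":
    print(suspicious_robots_requests(SAMPLE))
```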

Jim

[edited by: jdMorgan at 1:34 am (utc) on Nov. 3, 2005]

GaryK

1:34 am on Nov 3, 2005 (gmt 0)

To do otherwise is to invoke the wrath of Crom!

Or worse, wilderness. ;)

jdMorgan

1:37 am on Nov 3, 2005 (gmt 0)

or those who use his well-documented A/P block lists... :)

Jim

wilderness

1:53 am on Nov 3, 2005 (gmt 0)

Or worse, wilderness.

No dissension will be allowed from the ranks ;)

Lord Majestic

1:59 am on Nov 3, 2005 (gmt 0)

"Mozilla/5.0 (compatible; Conan/1.0.0 browser; http://example.com/conan.html)"

Ummm, I suppose that since it's not a bot but a browser, it would be logical to use a Mozilla-like user-agent string - thanks for this idea :)

My point about the robots.txt bot/browser issue was the main one; thanks for sharing your opinion on that too!