

Search Engine Spider and User Agent Identification Forum

    
MSNBot posing as Internet Explorer 7 and not reading robots.txt?
robzilla

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4587076 posted 9:33 am on Jun 24, 2013 (gmt 0)

It's running JavaScript and throwing off my statistics. So far it's coming from these IP addresses, which appear to be in legitimate MS blocks:

65.55.212.93
65.55.213.45
65.55.213.46
65.55.213.52
65.55.213.58
65.55.215.242
65.55.215.249
131.253.24.54
131.253.24.74

What are they up to? Hasn't MSNBot been retired?

65.55.212.93 - - [21/Jun/2013:22:24:36 +0200] "GET /[removed] HTTP/1.1" 200 7215 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)"

 

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 7:31 pm on Jun 24, 2013 (gmt 0)

Could be screenshots, could be anything. Things other than spiders aren't technically required to ask for robots.txt, and even if it is a spider, it can technically share the cached robots.txt already requested by BingBot. Googlebot and the Google Media thing used for AdSense share a cache, so one may ask for robots.txt while the other doesn't; nothing wrong with that.

The easiest way to find out is to set a simple spider trap, exclude BingBot from the page or folder with a robots.txt rule, and then see if the other MS stuff honors that rule or falls into the spider trap along with all the others.
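A minimal sketch of that test, in Python. The trap folder `/trap/`, the robots.txt content, and the helper name `trap_hits` are illustrative assumptions, not anyone's actual setup. The first part checks that the rule really excludes bingbot and nothing else; the second pulls out the log entries that requested the trap anyway.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only bingbot is told to stay out of the trap folder.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /trap/
"""

TRAP_PREFIX = "/trap/"  # assumed trap location, not taken from the thread

# Combined Log Format: client IP first, request line in the first quoted field.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*"')


def trap_hits(log_lines):
    """Return (ip, path) for every request that fetched something under the trap."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2).startswith(TRAP_PREFIX):
            hits.append((m.group(1), m.group(2)))
    return hits


# Parse the rule set so it can be queried with can_fetch(user_agent, path).
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
```

Any IP from the MS ranges that shows up in `trap_hits(...)` either never read the rule or does not consider itself bound by the bingbot entry.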

Personally, these things don't bother me, as I only allow Bingbot and anything else coming from those ranges gets bounced off on their ass.

BTW, don't forget MS now has cloud computing just like Amazon does so you may be seeing something not written by MS crawling from their IP space.

What's the rDNS of the IPs being used? If they say they're for bingbot then it's probably something internal to MS.

robzilla

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4587076 posted 9:36 am on Jun 25, 2013 (gmt 0)

It's something like msnbot-65-55-213-52.search.msn.com for all these IPs.

Sure hope they're not taking screenshots with IE7 ;-)
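For anyone who wants to check the rDNS programmatically, here is a rough Python sketch of a forward-confirmed reverse DNS lookup. The function name `is_confirmed_msn_host` and the decision to accept only the `.search.msn.com` suffix (taken from the hostname quoted above) are assumptions of this sketch. The lookups are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket

# Suffix taken from the hostname quoted in this thread; treating it as the
# only acceptable suffix is an assumption of this sketch.
EXPECTED_SUFFIX = ".search.msn.com"


def _reverse(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_confirmed_msn_host(ip, reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS check.

    True only if the PTR name ends with EXPECTED_SUFFIX and that name
    resolves back to the same IP. Lookup failures count as unconfirmed.
    """
    try:
        hostname = reverse(ip)
        if not hostname.lower().endswith(EXPECTED_SUFFIX):
            return False
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Calling `is_confirmed_msn_host("65.55.213.52")` performs live DNS lookups, so the answer depends on whatever the records say at the time. The forward step matters because anyone who controls the PTR record for their own IP space can make it say anything.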

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 10:03 am on Jun 25, 2013 (gmt 0)

Not so surprisingly, they may use that version just because it's least likely to be blocked and most commonly supported.

wilderness

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 12:23 pm on Jun 25, 2013 (gmt 0)

robzilla,
you might recheck your logs and see if these are using the broken UA with the double-trailing spaces.

131.253.24.47 - - [24/Jun/2013:03:07:39 -0600] "GET / HTTP/1.1" 403 606 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;   SLCC1;   .NET CLR 1.1.4325;   .NET CLR 2.0.40607;    .NET CLR 3.0.30729;    .NET CLR 3.5.30707)"

Key_Master

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4587076 posted 2:31 pm on Jun 25, 2013 (gmt 0)

This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.

robzilla

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4587076 posted 4:49 pm on Jun 25, 2013 (gmt 0)

you might recheck your logs and see if these are using the broken UA with the double-trailing spaces.

They are, indeed. Thanks for that, makes it easier to block these requests from my statistics.
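A small Python sketch of how such requests might be dropped from a log before running statistics. It assumes Combined Log Format with the user agent as the last quoted field, and the helper names `has_broken_ua` and `drop_broken_ua` are made up for illustration. It only looks for the extra spaces after semicolons seen in the log line quoted earlier in the thread, so treat it as a heuristic rather than a reliable bot signature.

```python
import re

# The user agent is the last quoted field in Combined Log Format.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Two or more spaces directly after a semicolon inside the user agent.
EXTRA_SPACES_RE = re.compile(r"; {2,}")


def has_broken_ua(log_line):
    """True if the line's user agent has a run of spaces after a semicolon."""
    m = UA_RE.search(log_line)
    return bool(m and EXTRA_SPACES_RE.search(m.group(1)))


def drop_broken_ua(log_lines):
    """Keep only the lines whose user agent is normally spaced."""
    return [line for line in log_lines if not has_broken_ua(line)]
```

Feeding the access log through `drop_broken_ua` before the stats step leaves ordinary single-spaced MSIE user agents untouched.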

wilderness

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 6:22 pm on Jun 25, 2013 (gmt 0)

I've posted these a couple of times.
The forum breaks one or more due to spaces. I've tried alt-insertion.

SetEnvIf User-Agent " ; " keep_out
SetEnvIf User-Agent " \( " keep_out
SetEnvIf User-Agent "; " keep_out

incrediBILL

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 10:39 pm on Jun 25, 2013 (gmt 0)

Yeah, the spaces around a semicolon: one of the oldest UA bugs out there, but still an effective trap.

This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.


Share so everyone can benefit.

I'd say it's worth a thread of its own IMO.

Now I'm back to work on a new honeypot site ;)

wilderness

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 11:04 pm on Jun 25, 2013 (gmt 0)

I'd say it's worth a thread of its own IMO.


There's a couple of old and similar threads.
lucy started one about the plain-Jane browsers that was MS focused.

wilderness

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4587076 posted 2:01 am on Jun 26, 2013 (gmt 0)

SetEnvIf User-Agent "; " keep_out


FWIW, the correct syntax is with two trailing spaces.

SetEnvIf User-Agent ";  " keep_out

