Forum Moderators: Ocean10000 & incrediBILL

MSNBot posing as Internet Explorer 7 and not reading robots.txt?

     
9:33 am on Jun 24, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:966
votes: 68


It's running JavaScript and throwing off my statistics. So far it's coming from these IP addresses, which appear to be in legitimate MS blocks:

65.55.212.93
65.55.213.45
65.55.213.46
65.55.213.52
65.55.213.58
65.55.215.242
65.55.215.249
131.253.24.54
131.253.24.74

What are they up to? Hasn't MSNBot been retired?

65.55.212.93 - - [21/Jun/2013:22:24:36 +0200] "GET /[removed] HTTP/1.1" 200 7215 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)"
7:31 pm on June 24, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Could be screen shots, could be anything. Things other than spiders aren't technically required to ask for robots.txt, and even if it is a spider, it can share the cached robots.txt already requested by BingBot. Googlebot and the Google Media thing used for AdSense share a cache, so one may ask for robots.txt while the other doesn't; nothing wrong with that.

The easiest way to find out is to set a simple spider trap, exclude BingBot from the page or folder with a robots.txt rule, and then see whether the other MS stuff honors that rule or falls into the spider trap along with all the others.
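
A minimal sketch of the robots.txt side of such a trap, using Python's standard `urllib.robotparser` to confirm the rule parses the way you intend before you deploy it. The `/trap/` folder and the `is_allowed` helper are hypothetical names for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bingbot is told to stay out of /trap/,
# everything else is left unrestricted.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /trap/

User-agent: *
Disallow:
"""


def is_allowed(user_agent, path):
    """Return True if ROBOTS_TXT permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed("bingbot", "/trap/page.html"))
    print(is_allowed("bingbot", "/index.html"))
```

If requests from the MS ranges still show up under the trap folder in your logs, whatever is making them is not applying the bingbot rule.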

Personally, these things don't bother me as I only allow Bingbot, and anything else coming from those ranges gets bounced off on its ass.

BTW, don't forget MS now has cloud computing just like Amazon does so you may be seeing something not written by MS crawling from their IP space.

What's the rDNS of the IPs being used? If they say they're for bingbot then it's probably something internal to MS.
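
A rough sketch of how that rDNS check could be scripted: look up the PTR record, check the hostname suffix, then resolve the hostname forward again and make sure it maps back to the same IP. The function name `verify_msn_host` and the `.search.msn.com` suffix are assumptions for illustration; the resolver arguments are injectable only so the logic can be tried without network access.

```python
import socket


def verify_msn_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the PTR hostname if ip passes a forward-confirmed
    reverse DNS check for the assumed suffix, otherwise None."""
    try:
        hostname = reverse(ip)[0]          # reverse (PTR) lookup
    except OSError:                        # covers socket.herror / gaierror
        return None
    if not hostname.lower().rstrip(".").endswith(".search.msn.com"):
        return None
    try:
        addresses = forward(hostname)[2]   # forward (A) lookup
    except OSError:
        return None
    # Only trust the hostname if it resolves back to the original IP.
    return hostname if ip in addresses else None


if __name__ == "__main__":
    print(verify_msn_host("65.55.213.52"))
```

The forward lookup matters because anyone who controls the reverse zone for an IP block can make the PTR record say whatever they like.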
9:36 am on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:966
votes: 68


It's something like msnbot-65-55-213-52.search.msn.com for all these IPs.

Sure hope they're not taking screenshots with IE7 ;-)
10:03 am on June 25, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Not so surprisingly, they may use that version just because it's least likely to be blocked and most commonly supported.
12:23 pm on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


robzilla,
you might recheck your logs and see if these are using the broken UA with the double trailing spaces.

131.253.24.47 - - [24/Jun/2013:03:07:39 -0600] "GET / HTTP/1.1" 403 606 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;   SLCC1;   .NET CLR 1.1.4325;   .NET CLR 2.0.40607;    .NET CLR 3.0.30729;    .NET CLR 3.5.30707)"
2:31 pm on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.
4:49 pm on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:966
votes: 68


you might recheck your logs and see if these are using the broken UA with the double trailing spaces.

They are, indeed. Thanks for that, it makes it easier to exclude these requests from my statistics.
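
For filtering on the statistics side, something along these lines might do. It is only a sketch: it assumes combined log format with the user-agent as the last quoted field, and the names `BROKEN_UA`, `UA_FIELD` and `is_broken_ua` are made up for the example.

```python
import re

# Two or more spaces directly after a semicolon in the user-agent.
BROKEN_UA = re.compile(r";[ ]{2,}")
# Assumes the user-agent is the last double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_broken_ua(log_line):
    """Return True if the line's user-agent has the multi-space bug."""
    match = UA_FIELD.search(log_line)
    return bool(match and BROKEN_UA.search(match.group(1)))


def filter_log(lines):
    """Yield only the lines that should count towards statistics."""
    for line in lines:
        if not is_broken_ua(line):
            yield line
```

Run it against the raw log files rather than anything copied from a forum post, since forum software tends to collapse the extra spaces.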
6:22 pm on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


I've posted these a couple of times.
The forum breaks one or more due to spaces. I've tried alt-insertion.

SetEnvIf User-Agent " ; " keep_out
SetEnvIf User-Agent " \( " keep_out
SetEnvIf User-Agent "; " keep_out
10:39 pm on June 25, 2013 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Yeah, the spaces around a semicolon: one of the oldest UA bugs out there, but still an effective trap.

This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.


Share so everyone can benefit.

I'd say it's worth a thread of its own IMO.

Now I'm back to work on a new honeypot site ;)
11:04 pm on June 25, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


I'd say it's worth a thread of its own IMO.


There are a couple of old and similar threads.
lucy started one about the plain-Jane browsers that was MS focused.
2:01 am on June 26, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


SetEnvIf User-Agent "; " keep_out


FWIW, the correct syntax is with two trailing spaces.

SetEnvIf User-Agent ";  " keep_out