Forum Moderators: Ocean10000 & incrediBILL

MSNBot posing as Internet Explorer 7 and not reading robots.txt?

     
9:33 am on Jun 24, 2013 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



It's running JavaScript and throwing off my statistics. So far it has been coming from these IP addresses, which appear to be in legitimate Microsoft blocks:

65.55.212.93
65.55.213.45
65.55.213.46
65.55.213.52
65.55.213.58
65.55.215.242
65.55.215.249
131.253.24.54
131.253.24.74

What are they up to? Hasn't MSNBot been retired?

65.55.212.93 - - [21/Jun/2013:22:24:36 +0200] "GET /[removed] HTTP/1.1" 200 7215 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)"
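A minimal Python sketch for keeping these hits out of the statistics, assuming Apache's combined log format as in the line above. It parses a log line and flags requests that come from one of the addresses listed in this post with an MSIE 7.0 user agent. The regex and the function names (`parse_combined`, `is_reported_msn_hit`, `filter_stats`) are hypothetical illustrations, not part of any existing tool.

```python
import re

# Addresses listed in the opening post of this thread.
REPORTED_IPS = {
    "65.55.212.93", "65.55.213.45", "65.55.213.46", "65.55.213.52",
    "65.55.213.58", "65.55.215.242", "65.55.215.249",
    "131.253.24.54", "131.253.24.74",
}

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse_combined(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def is_reported_msn_hit(line):
    """True if a reported address sent the request with an MSIE 7.0 user agent."""
    fields = parse_combined(line)
    if fields is None:
        return False
    return fields["ip"] in REPORTED_IPS and "MSIE 7.0" in fields["ua"]


def filter_stats(lines):
    """Drop the flagged requests so they don't count in the statistics."""
    return [line for line in lines if not is_reported_msn_hit(line)]
```

Matching on the exact addresses is deliberately narrow; new addresses from the same blocks would slip through until they are added to the set.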
7:31 pm on Jun 24, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Could be screenshots, could be anything, since agents other than spiders aren't technically required to ask for robots.txt. Even if it is a spider, it can technically share the cached robots.txt already requested by Bingbot. Googlebot and the Google Media thing used for AdSense (Mediapartners-Google) share a cache, and one may ask for robots.txt while the other doesn't; nothing wrong with that.

The easiest way to find out is to set a simple spider trap, exclude Bingbot from that page or folder with a robots.txt rule, and then see whether the other MS stuff honors that rule or falls into the spider trap along with all the others.
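A rough Python sketch of that test, assuming a hypothetical trap folder `/trap/` that is linked only where a human wouldn't click it and is disallowed for Bingbot in robots.txt. It uses the standard library's `urllib.robotparser` to confirm the rule really excludes Bingbot, then lists the clients that requested the trap anyway. The path, the robots.txt text and the function names are illustrative assumptions.

```python
import re
from urllib.robotparser import RobotFileParser

TRAP_PATH = "/trap/"  # hypothetical trap folder

# robots.txt rule that excludes Bingbot from the trap.
ROBOTS_TXT = "User-agent: bingbot\nDisallow: /trap/\n"

# Pull the client IP, requested path and user agent out of a combined-format line.
REQUEST_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def trap_disallowed_for_bingbot(robots_txt=ROBOTS_TXT):
    """Sanity check: the robots.txt rule must actually block bingbot from the trap."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch("bingbot", TRAP_PATH)


def trapped_clients(log_lines):
    """Map each client IP that requested the trap to the first UA it used there."""
    hits = {}
    for line in log_lines:
        m = REQUEST_RE.match(line)
        if m and m.group("path").startswith(TRAP_PATH):
            hits.setdefault(m.group("ip"), m.group("ua"))
    return hits
```

Any Microsoft address that shows up in `trapped_clients` either ignored the Bingbot rule or never read robots.txt at all.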

Personally, these things don't bother me, as I only allow Bingbot, and anything else coming from those ranges gets bounced off on their ass.
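A minimal Python sketch of that "only Bingbot from MS space" policy. The two blocks, 65.52.0.0/14 and 131.253.0.0/16, are an assumption chosen because they cover every address reported in this thread; check current whois data before blocking on them. The function names are hypothetical.

```python
import ipaddress

# Assumed Microsoft address blocks covering the IPs reported in this thread;
# verify against current whois data before relying on them.
MS_NETWORKS = [
    ipaddress.ip_network("65.52.0.0/14"),
    ipaddress.ip_network("131.253.0.0/16"),
]


def from_ms_space(ip):
    """True if the address falls inside one of the assumed MS blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in MS_NETWORKS)


def should_bounce(ip, user_agent):
    """Block anything from the assumed MS ranges unless its UA identifies as bingbot."""
    return from_ms_space(ip) and "bingbot" not in user_agent.lower()
```

A UA string alone can be faked, so a reverse-DNS check on the allowed Bingbot requests is a sensible second test.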

BTW, don't forget MS now has cloud computing (Azure), just like Amazon does, so you may be seeing something not written by MS crawling from their IP space.

What's the rDNS of the IPs being used? If they say they're for bingbot then it's probably something internal to MS.
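A minimal Python sketch of the usual forward-confirmed reverse DNS check: look up the PTR name, require that it ends in `search.msn.com` (the domain Microsoft's crawlers resolve to), then resolve that name forward and make sure it maps back to the same IP. The resolver functions are passed in as parameters (defaulting to the real `socket` calls) only so the logic can be tried without network access; `verify_ms_crawler` is a hypothetical name.

```python
import socket

# Microsoft's crawler hostnames end in search.msn.com
MS_CRAWLER_SUFFIX = ".search.msn.com"


def verify_ms_crawler(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: return the hostname if `ip` maps to a
    *.search.msn.com name that resolves back to `ip`, otherwise None."""
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    if not host.lower().rstrip(".").endswith(MS_CRAWLER_SUFFIX):
        return None
    try:
        addresses = forward(host)[2]
    except OSError:
        return None
    return host if ip in addresses else None
```

The forward lookup matters because whoever controls an IP's reverse zone can make its PTR record say anything; only the owner of search.msn.com controls the forward answer.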
9:36 am on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



It's something like msnbot-65-55-213-52.search.msn.com for all these IPs.

Sure hope they're not taking screenshots with IE7 ;-)
10:03 am on Jun 25, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Not surprisingly, they may use that version simply because it's the least likely to be blocked and the most commonly supported.
12:23 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



robzilla,
you might recheck your logs and see if these are using the broken UA with the double trailing spaces.

131.253.24.47 - - [24/Jun/2013:03:07:39 -0600] "GET / HTTP/1.1" 403 606 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;   SLCC1;   .NET CLR 1.1.4325;   .NET CLR 2.0.40607;    .NET CLR 3.0.30729;    .NET CLR 3.5.30707)"
2:31 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.
4:49 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



you might recheck your logs and see if these are using the broken UA with the double trailing spaces.

They are, indeed. Thanks for that, makes it easier to block these requests from my statistics.
6:22 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I've posted these a couple of times.
The forum breaks one or more of them because it collapses the spaces. I've tried alt-insertion.

SetEnvIf User-Agent " ; " keep_out
SetEnvIf User-Agent " \( " keep_out
SetEnvIf User-Agent "; " keep_out
10:39 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yeah, the spaces around a semicolon: one of the oldest UA bugs out there, but still an effective trap.

This bot is used to test JavaScript on a page. It can be detected using a JavaScript timing event. I can post a code example if you like.


Share so everyone can benefit.

I'd say it's worth a thread of its own IMO.

Now I'm back to work on a new honeypot site ;)
11:04 pm on Jun 25, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I'd say it's worth a thread of its own IMO.


There are a couple of old, similar threads.
lucy started one about the plain-Jane browsers that was MS-focused.
2:01 am on Jun 26, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



SetEnvIf User-Agent "; " keep_out


FWIW, the correct syntax is with two trailing spaces.

SetEnvIf User-Agent ";  " keep_out
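To try the patterns against logged user agents before putting them in .htaccess, here is a small Python sketch that applies the same three regular expressions, with the last one written with two spaces after the semicolon. SetEnvIf matches its pattern as an unanchored, case-sensitive regex, so `re.search` is a reasonable stand-in; the function name `keep_out` just mirrors the environment variable above.

```python
import re

# The three SetEnvIf patterns from this thread; the last one uses
# two spaces after the semicolon.
KEEP_OUT_PATTERNS = [re.compile(p) for p in (r" ; ", r" \( ", r";  ")]


def keep_out(user_agent):
    """True if the UA matches any of the malformed-spacing patterns."""
    return any(p.search(user_agent) for p in KEEP_OUT_PATTERNS)
```

The two spaces matter: a single-space `"; "` pattern would match nearly every normal UA (for example `compatible; MSIE`), not just the broken ones.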
 
