MSN's many cloaked bots.

Mass undocumented activity in search.msn.com ranges

Pfui

5:46 pm on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What with so many ongoing threads about MSN's msnbot-related crawlers and their various (mis)behaviors, I wasn't sure where to put yet another example of a cloaked UA. So here's a new thread containing a bunch of MSN's stealth UAs, including this one I just found prowling around in a CGI-related directory that's explicitly denied to all bots six ways to Sunday:

msnbot-65-55-165-15.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)

Are these actually deceptive "cloak detectors"? Hmm. Here are just some of the cloaked UAs mentioned in recent threads:

From: "MSN's cloak-crawling again: Twitter / Tweets [webmasterworld.com]"

70.37.13.98
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

From: "Mozilla/4.0: MSN strikes (out) again. [webmasterworld.com]"

65.55.234.160
Mozilla/4.0

From: "MSN fakes referrers [webmasterworld.com]" (see thread for loads more)

msnbot-65-55-104-70.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)

msnbot-65-55-104-60.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)

Last but not least...

Here's the Official Word on MSNBot: "Bing Webmaster Center Help [help.live.com]". As of this post, "The web crawler used by Bing is also known as MSNBot" -- a.k.a.:

msnbot
msnbot-media
msnbot-newsblogs
msnbot-products

There's nary a hint of the countless cloaked, bot-acting UAs hailing from bare MSN IPs and .search.msn.com. Looks like when it comes to our own sites, we're not supposed to fool them, but it's okay for them to fool us. Tsk.
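A rough way to flag the pattern described above in your own logs is to compare the reverse-DNS name against the documented crawler names. The sketch below assumes you already have the rDNS hostname for each hit (bare IPs with no PTR record, like 70.37.13.98, are out of its scope) and treats any *.msn.com visitor whose UA lacks one of the four documented msnbot tokens as a likely cloaked crawler. The function names are hypothetical, and this is a heuristic, not anything Microsoft publishes.

```python
# Documented MSNBot product tokens, per the Bing Webmaster Center list quoted above.
OFFICIAL_TOKENS = ("msnbot", "msnbot-media", "msnbot-newsblogs", "msnbot-products")


def is_msn_host(rdns: str) -> bool:
    """True if the reverse-DNS name sits under msn.com (e.g. *.search.msn.com)."""
    host = rdns.lower().rstrip(".")
    return host == "msn.com" or host.endswith(".msn.com")


def declares_msnbot(user_agent: str) -> bool:
    """True if the UA contains a documented token followed by '/' (e.g. 'msnbot/2.0b')."""
    ua = user_agent.lower()
    return any(token + "/" in ua for token in OFFICIAL_TOKENS)


def looks_cloaked(rdns: str, user_agent: str) -> bool:
    """Hypothetical heuristic: MSN hostname, but a browser-style or bare UA."""
    return is_msn_host(rdns) and not declares_msnbot(user_agent)


if __name__ == "__main__":
    hits = [
        ("msnbot-65-55-165-15.search.msn.com",
         "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1)"),
        ("msnbot-65-55-3-199.search.msn.com",
         "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"),
    ]
    for host, ua in hits:
        print(host, "cloaked?", looks_cloaked(host, ua))
```

The check deliberately ignores IP ranges and relies on the hostname, since the ranges themselves keep growing.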

dstiles

8:43 pm on Mar 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's included in the registry as "MICROSOFT-DYNAMIC-HOSTING" so I assume it's at least similar.

caribguy

7:48 am on Mar 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a surefire way to get banned:

70.37.164.92 - - [26/Feb/2010:06:34:58 -0600] "GET / HTTP/1.1" 403 274 "-" "-"
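Requests like the one above (no referrer, no User-Agent) can be picked out mechanically. A minimal sketch, assuming Apache's "combined" log format, optionally prefixed by a virtual-host field as in some of the log lines quoted in this thread; the regex and helper names are illustrative, not from any particular tool.

```python
import re
from typing import Optional

# Apache "combined" format, optionally preceded by a %v virtual-host field.
LOG_LINE = re.compile(
    r'^(?:(?P<vhost>\S+) )?(?P<client>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse(line: str) -> Optional[dict]:
    """Split one log line into named fields, or return None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def blank_agent(entry: dict) -> bool:
    """True when the client sent no User-Agent (logged as "-") or an empty one."""
    return entry["agent"].strip() in ("", "-")


if __name__ == "__main__":
    line = ('70.37.164.92 - - [26/Feb/2010:06:34:58 -0600] '
            '"GET / HTTP/1.1" 403 274 "-" "-"')
    entry = parse(line)
    print(entry["client"], entry["status"], blank_agent(entry))
```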

caribguy

1:55 am on Mar 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This visit resulted from a Facebook fan page post which contained links to both pages:

www.example.com 65.52.16.152 - - [06/Mar/2010:19:27:40 -0600] "GET /folder/page HTTP/1.1" 403 289 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
www.example.com 65.52.16.152 - - [06/Mar/2010:19:27:41 -0600] "GET /folder/otherpage HTTP/1.1" 403 291 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

dstiles

10:51 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If anyone is using hard-coded IPs to filter msnbot: they have recently added a LOT of bot IPs to the 207.46.n.n range.

[edit dstiles]

Forgot to mention: the rDNS entries are in two different formats:

msnbot-207-46-nnn-nnn.msn.com

msnbot-207-46-nnn-nnn.search.msn.com
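Rather than hard-coding ranges, a common alternative is forward-confirmed reverse DNS: look up the PTR name for the visiting IP, check it against the two naming formats above, then resolve that name forward and make sure it points back to the same IP. A sketch of that check, assuming the msnbot-a-b-c-d naming holds for every genuine crawler IP (nothing quoted here guarantees that); the resolver functions are injectable so the logic can be tested offline, and all names are hypothetical.

```python
import re
import socket
from typing import Callable, List

# The two rDNS formats reported above:
#   msnbot-207-46-nnn-nnn.msn.com
#   msnbot-207-46-nnn-nnn.search.msn.com
MSNBOT_RDNS = re.compile(
    r'^msnbot-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.(?:search\.)?msn\.com\.?$',
    re.IGNORECASE,
)


def rdns_matches_ip(hostname: str, ip: str) -> bool:
    """True if the hostname uses the msnbot-a-b-c-d format and embeds this IP."""
    m = MSNBOT_RDNS.match(hostname)
    return m is not None and ".".join(m.groups()) == ip


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]           # PTR lookup


def _forward(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]      # A-record lookup (IPv4 only)


def forward_confirmed(ip: str,
                      reverse: Callable[[str], str] = _reverse,
                      forward: Callable[[str], List[str]] = _forward) -> bool:
    """Reverse-resolve, check the name format, then resolve forward to the same IP."""
    try:
        host = reverse(ip)
        return rdns_matches_ip(host, ip) and ip in forward(host)
    except OSError:          # socket.herror / socket.gaierror: no PTR or no A record
        return False
```

Because the check is name-based, new ranges such as the 207.46.n.n additions pass without any list maintenance, as long as their PTR and A records agree.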

caribguy

7:24 pm on Mar 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And Microsoft Research is back at it too... With a "new and improved [webmasterworld.com]" UA ;)

131.107.151.126 - - [25/Mar/2010:00:11:22 -0800] "GET /robots.txt HTTP/1.0" 403 277 "-" "MSRBOT"

Pfui

2:16 am on Mar 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This isn't cloaked but look who's/what's using HEAD requests now...

msnbot-65-55-37-159.search.msn.com - - [24/Mar/2010:21:47:54 -0700] "HEAD /dir/file.html HTTP/1.1" 403 0 "-" "-"

Just the one. More to come?

Pfui

4:20 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep. Another new whatsit from .search.msn.com. Check out the end of the UA (the trailing "._"):

msnbot-65-55-3-199.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

robots.txt? Yes

1.) Considering msnbot usually hits a few pages at a time, tops, from multiple IPs, this one was also different in that it hit almost every page. Crawl rate was ~1 page/sec.

2.) How atypical is this critter? Only six Goo results for "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._" as of this posting.

3.) I may reroute this until we know more, because I like neither the looks of that ._ quasi-suffix nor the fact that this is apparently verrrry new (...or newly, improperly altered).
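If you want to reroute or log only this variant rather than msnbot as a whole, one option is to split the UA into token, version, and whatever trails the closing parenthesis. A sketch, assuming the "normal" form is exactly token/version (+URL) with nothing after it; the pattern and function name are illustrative, not anything Microsoft documents.

```python
import re
from typing import Optional

# Assumed shape: documented token, "/", version, space, "(+URL)", then anything extra.
MSNBOT_UA = re.compile(
    r'^(?P<token>msnbot(?:-media|-newsblogs|-products)?)/(?P<version>\S+) '
    r'\(\+(?P<url>[^)]*)\)(?P<suffix>.*)$'
)


def trailing_suffix(user_agent: str) -> Optional[str]:
    """Return text after the closing parenthesis ('' for the usual form),
    or None if the UA doesn't look like msnbot at all."""
    m = MSNBOT_UA.match(user_agent.strip())
    return m.group("suffix") if m else None


if __name__ == "__main__":
    for ua in ("msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
               "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._",
               "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"):
        print(repr(trailing_suffix(ua)))
```

A non-empty suffix marks the odd variant; None means the request isn't claiming to be msnbot at all.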

tpeacock

8:12 am on Apr 6, 2010 (gmt 0)

10+ Year Member



I had the same experience on April 1. I noticed the "._" at the end of the user-agent string because the bot took all of the site's files (230-240) at the same crawl rate Pfui mentioned, all from the same IP. Quite different behavior from the typical msnbot!

Thomas
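Crawl rates like the ~1 page/sec described in these two posts can be estimated from the bracketed Apache timestamps. A minimal sketch, assuming all timestamps come from one client and use Apache's default %d/%b/%Y:%H:%M:%S %z layout (month abbreviations are parsed with strptime, so an English/C locale is assumed); the function name is hypothetical.

```python
from datetime import datetime
from typing import Iterable

APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"   # e.g. 06/Mar/2010:19:27:40 -0600


def crawl_rate(timestamps: Iterable[str]) -> float:
    """Average requests per second between the first and last hit.

    Counts the gaps between consecutive requests (n - 1) over the elapsed
    time; returns 0.0 for fewer than two hits and inf if they all share
    one second (Apache logs only have one-second resolution).
    """
    times = sorted(datetime.strptime(t, APACHE_TIME) for t in timestamps)
    if len(times) < 2:
        return 0.0
    elapsed = (times[-1] - times[0]).total_seconds()
    return (len(times) - 1) / elapsed if elapsed else float("inf")
```

For example, the two hits in caribguy's Facebook-triggered visit (19:27:40 and 19:27:41) work out to 1.0 request per second.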

tpeacock

8:19 am on Apr 6, 2010 (gmt 0)

10+ Year Member



Correction: this one came from msnbot-65-55-3-139.search.msn.com.

Thomas

Pfui

6:19 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looks like it's officially propagating. From another site:

msnbot-65-55-3-210.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

robots.txt? Yes

Followed one link, not another. (The site is a person's very brief one-pager.)

If I do say so myself, the UA name 'change' looks/is seriously stupid. Why not, oh, "msnbot/2.0c"? Or "msnbot/2.01"? Whatever.

Pfui

3:09 am on Apr 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



See also Jim's March 18th-onward sightings of the same "._" variation:

Wanted: Crawler Quality Assurance Engineer
[webmasterworld.com...]

Pfui

8:28 am on Apr 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If it's not one oddity, it's another --

msnbot-65-55-24-143.search.msn.com
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

robots.txt? NO
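The "robots.txt? Yes/NO" checks in these posts can be automated. A sketch, assuming the log has already been reduced to (client, path) pairs; the function name is hypothetical. It groups by exact client name, so a crawl that fetches robots.txt from one msnbot IP and pages from another will still be flagged.

```python
from typing import Iterable, Set, Tuple


def skipped_robots(requests: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return clients that requested something but never /robots.txt."""
    seen: Set[str] = set()
    asked: Set[str] = set()
    for client, path in requests:
        seen.add(client)
        # Ignore any query string when checking for the robots file.
        if path.split("?", 1)[0] == "/robots.txt":
            asked.add(client)
    return seen - asked
```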