Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

MSN's Stealth Missions

 3:40 pm on Oct 8, 2011 (gmt 0)

Reports of stealth and abuse by MSN/Bing --

MSN's many cloaked bots. [webmasterworld.com...]
MSN's many cloaked bots. Again. [webmasterworld.com...]

-- keep getting a little long in the page count so here we go, again, after a brief recap of top problems...

1.) Cloaked / bare (no rDNS) IPs from (partial listing):


2.) Atypical hit patterns like this now-common 'no UA, no robots.txt, no referrer, 11-hits-to-same-file' visit from


3.) Atypical, 'unofficial' UAs from .search.msn.com domains akin to this morning's visit from:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)

robots.txt? NO

Cloaked UAs include:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)

robots.txt? NO

And last but not least, the ongoing oddity:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)._
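For anyone who wants to flag that oddity in their own logs, here is a minimal sketch in Python. The function name is made up for illustration, and it only checks for the exact trailing `)._` shown above, nothing more general.

```python
def is_odd_msnbot_ua(user_agent):
    """Return True for an msnbot UA that ends with the stray ')._' suffix.

    Hypothetical helper: it matches only the pattern quoted in this thread
    and makes no claim about which UAs Microsoft considers official.
    """
    ua = user_agent.strip()
    return ua.startswith("msnbot/") and ua.endswith(")._")


if __name__ == "__main__":
    print(is_odd_msnbot_ua("msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"))
```

Whether to block, log, or just tag such hits is a site-by-site call; the check itself only identifies them.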



 7:36 pm on Oct 8, 2011 (gmt 0)

A while ago bingdude visited the bing forum hereabouts and I mentioned this. He promised to get it looked into. No recent activity from him so we can only assume bing has begun taking on the google policy of popping in once and then departing for ever. :(


 7:46 pm on Oct 8, 2011 (gmt 0)

I was hoping this lot was going to be resolved several months ago.

I am very close to blocking the whole lot for good. When I have some spare time the trigger will be pulled.


 6:58 am on Oct 9, 2011 (gmt 0)

I banned the ones ending in )._ ages ago.

And I am already gradually banning the ones with 'unofficial' UAs, started with my high volume sites first.

Haven't decided about the "no rDNS" IPs yet.
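One common way to decide on the "no rDNS" IPs is a forward-confirmed reverse DNS check: look up the PTR name, require the expected suffix, then resolve that name forward and confirm it maps back to the same IP. The sketch below is only an illustration. The function name and the injectable `reverse`/`forward` parameters are assumptions added so the logic can be tried without live DNS, and the `.search.msn.com` suffix is simply the one seen in this thread.

```python
import socket


def is_verified_msn_host(ip, suffix=".search.msn.com", reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (sketch, not an official method).

    reverse(ip) -> hostname and forward(hostname) -> list of IPs are
    hypothetical hooks; by default they use the standard socket calls.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
    except OSError:
        return False  # bare IP: no rDNS record at all
    if not host.lower().rstrip(".").endswith(suffix):
        return False  # rDNS exists but is not under the expected domain
    try:
        return ip in forward(host)  # the name must resolve back to the same IP
    except OSError:
        return False
```

A bare IP fails this check by design, so it separates "claims to be MSN and proves it" from everything else; what to do with the failures is still a policy decision.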


 3:51 pm on Oct 17, 2011 (gmt 0)

During last night's wee hours, a cloaked IP and basically a scraper UA:

robots.txt? NO

Absolutely not okay.


 8:58 pm on Oct 18, 2011 (gmt 0)

Further to my posting 8th Oct above:

On that day I stickied bingdude asking him to reappear. Seems like my genie-invocation spell failed: no reply from him nor has he been seen in the Bing forum for a long time. Rapped knuckles for being helpful?

Gone the way of all the google visitors. Sad, I was beginning to have hope! :(


 11:05 pm on Oct 18, 2011 (gmt 0)

Thank you for trying. But you know, in the major-SE scheme of things, I reckon we're but fleas on the rumps of elephants: insignificant, annoying, and dependent on the ride.


 10:45 am on Oct 23, 2011 (gmt 0)
Firefox 7.0

robots.txt? NO


 6:45 pm on Oct 24, 2011 (gmt 0)

Ten minutes ago, again w/ Wget from a kin IP of the last Wget (

robots.txt? NO

Anyone else seeing any repeatedly/intentionally rogue UAs?


 9:07 pm on Oct 24, 2011 (gmt 0)

Yes, I see variations of those all the time and have for years. I dismissed them long ago and they're filtered out of what I actually spend time following up on; maybe naively, but nonetheless.

The flags I used to research would take 2-3 hours every morning. I now filter the usual suspects (defined or not) and only spend time on the actual threats. Got it down to about an hour now :)


 7:30 pm on Nov 7, 2011 (gmt 0)

In a twitter swarm:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO

grandma genie

 4:30 am on Nov 9, 2011 (gmt 0)

I'm getting the stealth visits too from microsoft:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

Not acting anything like a bot.


 4:26 am on Nov 13, 2011 (gmt 0)

This just in. Note the (in)famous UA. Am amazed they're still using it:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

robots.txt? Yes


 8:28 pm on Nov 13, 2011 (gmt 0)

I still have a block on that UA. I thought they would have fixed it along with the DNS update but hadn't yet checked it. Ah, well.


 10:26 pm on Nov 14, 2011 (gmt 0)

They're hammering my server right now using the following UA:
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)"

More details here: [webmasterworld.com...]


 2:18 pm on Nov 15, 2011 (gmt 0)

Exact same IP and UA as reported on 11-07 above, but not post-tweet this time. I give.
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO


 7:00 pm on Nov 15, 2011 (gmt 0)

Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?

Just thought I'd point it out in case someone wasn't paying attention ;)


 8:03 pm on Nov 15, 2011 (gmt 0)

Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?

It's joined the ranks of Mozilla. AppleWebKit is even used in the UA string of Android (their arch rival.) Guess it comes down to who was there first gets to name the mountain.


 7:38 pm on Nov 16, 2011 (gmt 0)

Sneaky little thing, MSN. From a visit today:

1n:24:33 /robots.txt - - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:05 /dir1/page1.asp - - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:07 /dir1/page1.asp - - Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534~~(KHTML, like Gecko)

It comes as bingbot and gets the page; two seconds later it comes as a "regular UA" and gets nothing.

Instead of using their resources on frivolities they should crawl as a proper bot and get people's web sites indexed.

(ps : ~~ = double space)
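The pattern in that log excerpt (a bingbot fetch followed seconds later by a browser-like UA asking for the same path) can be picked out mechanically. A rough sketch in Python, assuming the log has already been parsed into `(seconds, path, user_agent)` tuples sorted by time and pre-filtered to the IPs of interest; the function name and the 5-second window are illustrative choices, not anything Bing documents.

```python
def find_shadow_fetches(entries, window=5):
    """Find non-bingbot requests that repeat a bingbot request for the same path.

    entries: list of (seconds, path, user_agent), sorted by time (assumed format).
    Returns a list of (path, delay_in_seconds, second_user_agent).
    """
    pairs = []
    for i, (t1, path1, ua1) in enumerate(entries):
        if "bingbot" not in ua1.lower():
            continue
        for t2, path2, ua2 in entries[i + 1:]:
            if t2 - t1 > window:
                break  # entries are sorted, so nothing later can be in range
            if path2 == path1 and "bingbot" not in ua2.lower():
                pairs.append((path1, t2 - t1, ua2))
    return pairs
```

Fed the three requests above (with the minute and second offsets turned into seconds), it would report the page1.asp refetch with a delay of 2.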


 8:54 pm on Nov 16, 2011 (gmt 0)

I suspect the webkit UAs are doing something like google's web preview OR checking for viruses OR... Who knows? It's probably bot-ish but not pure bot.

Pity bingdude won't visit here. He's back in the Bing forum at present but for how long, who knows?


 2:24 am on Dec 31, 2011 (gmt 0)

Beats heck outta me what MSN was doing today with this laughable UA:

Mozilla/4.0 (compatible

msnbot-157-55-17-117.search.msn.com [projecthoneypot.org...]

1n:28:33 /dir/filename.html [302'd to...]
1n:28:34 /botbait/ [403]

robots.txt? NO


 3:13 am on Dec 31, 2011 (gmt 0)

@ Pfui

Yup, I reported that a couple days ago:



 2:47 pm on Jan 9, 2012 (gmt 0)

'No UA, no robots.txt, no referrer, 11-hits-to-same-file' visits from Microsoft's Dynamic Hosting IPs now:

NetRange: -

During the first week of Jan., two days apart:
04:19:33 /dir/filename20.html
04:19:45 /dir/filename20.html
04:19:56 /dir/filename20.html
04:20:07 /dir/filename20.html
04:20:19 /dir/filename20.html
04:20:30 /dir/filename20.html
04:20:42 /dir/filename20.html
04:20:53 /dir/filename20.html
04:21:04 /dir/filename20.html
04:21:16 /dir/filename20.html
04:21:27 /dir/filename20.html
21:35:43 /dir/filename41.html
21:35:54 /dir/filename41.html
21:36:06 /dir/filename41.html
21:36:17 /dir/filename41.html
21:36:29 /dir/filename41.html
21:36:40 /dir/filename41.html
21:36:51 /dir/filename41.html
21:37:03 /dir/filename41.html
21:37:14 /dir/filename41.html
21:37:25 /dir/filename41.html
21:37:37 /dir/filename41.html
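A run like this is easy to flag automatically. Below is a small Python sketch that groups consecutive hits on the same file when the gaps stay short. It assumes the hits are already narrowed to one client on one day and given as `("HH:MM:SS", path)` pairs in log order; the names and the thresholds (11 hits, 15-second gap) are illustrative, taken from the pattern above rather than from any official source.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def find_bursts(hits, min_hits=11, max_gap=15):
    """Return (path, count) for each run of closely spaced hits on one file.

    hits: list of ("HH:MM:SS", path) in log order, one client, one day (assumed).
    """
    bursts, run = [], []
    for hms, path in hits:
        t = to_seconds(hms)
        # Close the current run when the file changes or the gap is too long.
        if run and (path != run[-1][1] or t - run[-1][0] > max_gap):
            if len(run) >= min_hits:
                bursts.append((run[0][1], len(run)))
            run = []
        run.append((t, path))
    if len(run) >= min_hits:
        bursts.append((run[0][1], len(run)))
    return bursts
```

Combining this with a check for an empty UA and no robots.txt request would narrow it to the exact visit type described, at the cost of needing those fields in the parsed log.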


 8:01 pm on Jan 9, 2012 (gmt 0)

The MS equivalent of AWS? I've had the range - blocked for two years now.


 12:22 am on Jan 10, 2012 (gmt 0)

If only blocks stopped them from coming...


 10:45 pm on Jan 11, 2012 (gmt 0)

Oh, come on:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MSN 9.0;MSN 9.1;MSN 9.6;MSN 10.0;MSN 10.2; MSNbMSNI; MSNmen-us; MSNcIA)

robots.txt? NO
Referrer? YES

The referrer was legit. But the hit from a bare-IP MSN IP? Beats heck outta me. Employee, maybe -- 403'd because MSN plays fast and loose with its hordes, and hoards, of cloaked bots.


 8:01 pm on Jan 24, 2012 (gmt 0)

Came across a new (to me) MS IP range today... -

One of the IPs was used as a proxy for an unspecified forwarding IP so the range could include proxies or it could be a "broadband" range (with local proxy/firewall). I could get no DNS information about the range, only the whois data.


 4:48 am on Feb 20, 2012 (gmt 0)

Continuing the theme of wtf-ness:

I'd assumed that in the course of January [webmasterworld.com] I got to know all the major robot players. Today during a routine check of Bing/MS IPs, which normally results in dead silence*, I ran smack into a pile of msnbots.

Nothing new about msnbot/2.0b-- and that's just the point. Its owners [onlinehelp.microsoft.com] say it's been put out to pasture, replaced by the bingbot.** The specialized msnbot-media is still on the job, but I had to go all the way back to May of 2011 for the last vanilla msnbot. What's up? Does the msnbot know something about the Social Security system that it's not telling? Was the MSN retirement package not all that it expected?

In the middle of the msnbots-- did it think it could hide?-- was a whole slew of msnbot-NewsBlogs (their plural). They too have been around for years; they're mentioned in assorted WebmasterWorld threads. I have never met one before. (Never = since April 2011 when I started saving raw logs.)

They made a total of 16 successful requests. Half were for robots.txt, always taken in pairs. The other half were for...

Let me backtrack here. For a long time I had one unusually fat file that was inordinately popular with the wrong kind of robots. It also got the occasional search-engine hit, most of them from humans who were clearly looking for something else. Wasted time and bandwidth on all sides. A couple weeks back I cut off the first 5% of the file and saved it under the name of the original fat version. The old one got tucked away behind a new name, a nofollow link and a noindex meta tag. If humans want to read the whole thing they're welcome. Robots can jolly well go on a diet.

The newly arrived blogbot read this slimmed-down file eight separate times.

The newly pulled-from-retirement msnbot puttered around here and there-- including a single serving of robots.txt-- presumably hoping I wouldn't notice when it, too, read the slim file twice... followed by the fat file.

Well, hey. It's not google. It doesn't have to pay attention to the "nofollow" directive. And the file's already indexed, so it's not like it's seeing anything it hasn't seen dozens of times before.

* Figure of speech. It's really the computer's "Bzzt!" sound meaning "Nope, nothing here." The bingbot and the msnbot-media have already been filtered out; the plainclothes bot is blocked.
** They also say, quote, "Bing does not share IP addresses for our crawlers." I'll trade you a 65\.5[2-5]\. for a 157\.(5[4-9]|60). Anyone got a spare 207\.46\.?
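Those three patterns can be dropped straight into a quick check. A minimal Python sketch follows; the function name is made up, a trailing `\.` has been added to the 157 pattern so all three stop on an octet boundary, and the prefixes are only what is quoted in this post, not a complete or current list of Microsoft ranges.

```python
import re

# Prefixes as quoted in the post; treat them as one poster's observation.
MS_PREFIXES = [
    re.compile(r"65\.5[2-5]\."),
    re.compile(r"157\.(5[4-9]|60)\."),
    re.compile(r"207\.46\."),
]


def in_quoted_ms_ranges(ip):
    """Return True if the dotted-quad string starts with one of the prefixes."""
    return any(pattern.match(ip) for pattern in MS_PREFIXES)


if __name__ == "__main__":
    print(in_quoted_ms_ranges("157.55.17.117"))
```

`re.match` anchors at the start of the string, so no leading `^` is needed in the patterns.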

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved