Moderators-- I consider this an SSID topic. But you may decide it's more suited to a bing/msn forum. This paragraph will self-destruct in 60 seconds.
I am not the only person who has been alternately puzzled and vexed by this humanoid. Can we take it as read that the Microsoft corporation does not employ an army of humans with elderly computers and too much time on their hands? The thing is a robot. But, unlike all other bing-affiliated robots, it doesn't ask for robots.txt every five minutes. Instead it asks for it ... never. Admittedly it could be reading over the bingbot's shoulder, and it has never yet asked for a roboted-out page. But it's the principle of the thing.
Starting in early October I've been tracking it. This involves a two-pronged approach: first unblocking the plainclothes bingbot globally-- including letting it run wild in piwik-- and then flagging it in logs. Disclaimer: I suspect its behavior changed during the time I've been tracking it. I mean by coincidence, not for Schroedinger's-cat reasons. We Shall See.
First discovery: there are two of them. Possibly two and a half.
User Agents
#1: MSIE 7. The exact configuration varies from one visit to the next. It's always Windows NT 5.0 or 5.1 (i.e. Windows 2000 or XP, dating back to before 2006) with assorted .NET CLR add-ons seemingly at random. According to piwik, it always has a resolution of 800x600. (!)
#2: MSIE 9. This one is always exactly
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
(with extra spaces on either side of "WOW64;") and a resolution of 1024x768.
#3 plausible exception: In October, one visited page included a midi file. For this, the robot switched to the "contype" UA. (I looked this up. It's what earlier MSIE versions used for some types of media files.)
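For anyone who wants to flag these in their own logs, here is a rough sketch of a UA sorter based only on the three patterns described above. The function and constant names are made up for illustration. Whitespace is collapsed before comparing, since the extra spaces around "WOW64;" may not survive copying. Note that a UA match alone proves nothing: plenty of real humans send the same strings, so this only means something in combination with an IP check.

```python
import re

# The MSIE 9 string as quoted in the post, with runs of spaces collapsed.
MSIE9_EXACT = ("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; "
               "Trident/5.0; WOW64; Trident/5.0)")

# MSIE 7 on Windows NT 5.0 or 5.1; whatever follows (.NET CLR etc.) varies.
MSIE7_RE = re.compile(r"MSIE 7\.0; Windows NT 5\.[01]\b")


def normalise(ua: str) -> str:
    """Collapse every run of whitespace to a single space."""
    return " ".join(ua.split())


def classify_ua(ua: str) -> str:
    """Return 'msie9', 'msie7', 'contype' or 'other' (hypothetical labels)."""
    ua = normalise(ua)
    if ua == MSIE9_EXACT:
        return "msie9"
    if MSIE7_RE.search(ua):
        return "msie7"
    if ua.lower() == "contype":
        return "contype"
    return "other"
```

Anything that comes back as "other" is simply not one of the three patterns listed here; it says nothing about whether the visitor is human.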
IP
#1 "older ranges": 65.55.211-213, ..215, ..217-218 and 131.253.23-26, ..36 (I checked back: they really do seem to skip .214 and .216 at all times, and they're selective in the 131.253 area). The 65.55 range is identical to the one used by msnbot-media. Usually MSIE 7, sometimes MSIE 9.
#2 "newer range": 199.30.24-25, always MSIE 9. I never saw this range before April, and then only for Preview until early October, when it started doing plainclothes duty as well.
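The ranges above boil down to a lookup on the first three octets. A minimal sketch, assuming IPv4 addresses in dotted form as they appear in an ordinary access log; the octet lists are just the ones I've seen so far and are not an official list, and the function name is hypothetical.

```python
# Third octets observed so far, keyed by the first two octets.
# These come from my own logs, not from any published Microsoft list.
OBSERVED = {
    (65, 55): ("older", {211, 212, 213, 215, 217, 218}),
    (131, 253): ("older", {23, 24, 25, 26, 36}),
    (199, 30): ("newer", {24, 25}),
}


def classify_ip(ip: str) -> str:
    """Return 'older', 'newer' or 'other' for a dotted-quad IPv4 address."""
    parts = ip.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return "other"
    a, b, c, _ = (int(p) for p in parts)
    label, third_octets = OBSERVED.get((a, b), ("other", set()))
    return label if c in third_octets else "other"
```

The skipped octets (.214 and .216) fall through to "other" on purpose, so that a visit from one of them would stand out instead of being silently lumped in.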
Behavior
Both UAs from both IPs pick up non-page files: css, js, midi, pdf. Only MSIE9, and then only from 199.30, picks up images. All non-page files give the page as referer. They never ask for the favicon.
Older ranges: These are more leisurely visits, lasting anywhere from a second or two to (rarely) 20 seconds from beginning to end. On rare occasions these IPs don't pick up, or don't act on, javascript (in my case generally piwik). IPs can vary within a visit, even hopping between 65.55 and 131.253.
199.30 range: Fully humanoid apart from favicon. Typically in and out within a second or two, exactly like a human; piwik.js is typically requested after all images, reflecting its physical location in the html. Identical IP for the duration of each visit.
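The traits I've been comparing between the two ranges (how long a visit lasts, whether the IP stays put, whether images and the favicon get fetched) can be pulled out of a visit mechanically. A sketch, assuming the requests have already been grouped into visits as (time, ip, path) tuples; the grouping itself, the tuple layout and the list of image extensions are my own assumptions.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumed list of image extensions; adjust to whatever the site serves.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


def summarise_visit(requests):
    """Summarise one visit given a list of (datetime, ip, path) tuples."""
    times = [t for t, _, _ in requests]
    ips = {ip for _, ip, _ in requests}
    # Drop any query string before looking at the file extension.
    paths = [path.split("?", 1)[0] for _, _, path in requests]
    return {
        "seconds": (max(times) - min(times)).total_seconds(),
        "single_ip": len(ips) == 1,
        "images": any(PurePosixPath(p).suffix.lower() in IMAGE_EXTS for p in paths),
        "favicon": any(p.endswith("/favicon.ico") for p in paths),
    }
```

Going by what I've seen so far, a 199.30 visit should come out with single_ip and images both true and favicon false, while an older-range visit would have images false and might not keep a single IP.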
Around the middle of October there was a cluster of 199.30 visits (humanoid with images) coming almost exactly 30 seconds after a visit to the same page by Bing Preview, in each case using the identical IP for both. Each preview seems to have been triggered by text search, not image. (This is assuming that image search is always accompanied by an image fetch.)
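Spotting these pairs by eye gets old fast. A rough sketch of the matching rule (same IP, same page, about 30 seconds apart), assuming each visit has already been reduced to a dict with "time", "ip", "path" and "kind" keys. Those key names, the "preview"/"plainclothes" labels and the 5-second tolerance are all my own inventions for illustration.

```python
from datetime import datetime, timedelta


def find_followups(visits, gap=timedelta(seconds=30),
                   tolerance=timedelta(seconds=5)):
    """Pair each Preview visit with plainclothes visits to the same page
    from the same IP that arrive roughly `gap` later."""
    previews = [v for v in visits if v["kind"] == "preview"]
    plain = [v for v in visits if v["kind"] == "plainclothes"]
    pairs = []
    for p in previews:
        for q in plain:
            delta = q["time"] - p["time"]
            if (q["ip"] == p["ip"] and q["path"] == p["path"]
                    and abs(delta - gap) <= tolerance):
                pairs.append((p, q))
    return pairs
```

This is a plain nested loop, which is fine for a month of one small site's logs; for anything bigger it would make sense to index the visits by IP and page first.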
Aside: I honestly don't know what the scoop is with Bing Preview. There's never any information about a search-- or indeed any referer at all. And, as noted elsewhere, I can't even figure out how to get a Preview when searching in my own persona. And, finally, am I the only one who thinks it's funny that Bing Preview uses webkit rather than some form of MSIE?
Javascript
Both UAs and both IPs act on javascript, whether or not they get images. In my case, this generally means piwik. They send a full information packet, not the administrative pixel sent out to visitors with scripting turned off.
Especially interesting detail: One of the mid-month Preview-plus-plainclothes visits-- and also an earlier Preview alone-- was to one of the rare pages that uses javascript for something other than analytics. Thanks to this page, I know that both Preview and the MSIE 9 robot claim to have the Euphemia font, but not a third-party font that the page also tests for. (Is there any way to fake this? I guess theoretically yes, but surely more trouble than it's worth.) Exactly what I'd expect of a human with the same UA.
Conclusion
... none, actually. I still have no idea what the thing(s) is/are for. But so far it hasn't done anything really egregious.