lucy24 - 11:06 pm on Nov 7, 2013 (gmt 0)
Moderators-- I consider this an SSID topic. But you may decide it's more suited to a bing/msn forum. This paragraph will self-destruct in 60 seconds.
I am not the only person who has been alternately puzzled and vexed by this humanoid. Can we take it as read that the Microsoft corporation does not employ an army of humans with elderly computers and too much time on their hands? The thing is a robot. But, unlike all other bing-affiliated robots, it doesn't ask for robots.txt every five minutes. Instead it asks for it ... never. Admittedly it could be reading over the bingbot's shoulder, and it has never yet asked for a roboted-out page. But it's the principle of the thing.
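One way to confirm the "never asks for robots.txt" observation is to group log hits by client and list whoever never requested /robots.txt. Here is a minimal Python sketch. It assumes the log has already been parsed into (key, path) pairs, where the key is whatever you group by (IP, UA, or both). The function name is hypothetical.

```python
def clients_without_robots(entries):
    """Return the keys that fetched something but never asked for /robots.txt.

    `entries` is assumed to be an iterable of (key, path) pairs already
    extracted from the access log; key might be an IP, a UA, or a tuple.
    """
    seen = set()
    asked_robots = set()
    for key, path in entries:
        seen.add(key)
        if path == "/robots.txt":
            asked_robots.add(key)
    # Everyone who showed up, minus everyone who checked robots.txt.
    return seen - asked_robots
```

On its own this doesn't prove the thing is ignoring robots.txt. As noted above, it could be relying on the bingbot's copy.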
Since early October I've been tracking it. This involves a two-pronged approach: first unblocking the plainclothes bingbot globally (including letting it run wild in piwik), and then flagging it in logs. Disclaimer: I suspect its behavior changed during the time I've been tracking it. I mean by coincidence, not for Schroedinger's-cat reasons. We Shall See.
First discovery: there are two of them. Possibly two and a half.
#1: MSIE 7. The exact configuration varies from one visit to the next. It's always Windows NT 5.0 or 5.1 (i.e. Windows 2000 or XP, dating back to before 2006) with an apparently random assortment of .NET CLR add-ons. According to piwik, it always has a resolution of 800x600. (!)
#2: MSIE 9. This one is always exactly
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
(with extra spaces on either side of "WOW64;"), reporting a resolution of 1024x768.
#3, a plausible exception: in October, one visited page included a MIDI file. For this, the robot switched to the "contype" UA. (I looked this up: it's what earlier MSIE versions used when requesting some types of media files.)
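For flagging in logs, the UA side can be sketched like this in Python. It's an illustration, not a definitive rule set:

- The MSIE 7 pattern only checks the fixed part of the string (MSIE 7.0 on Windows NT 5.0 or 5.1).
- "contype" is assumed to arrive as the bare string.
- Whitespace is collapsed before comparing the MSIE 9 string, because the extra spaces around "WOW64;" don't always survive copying.

Real humans still running MSIE 7 on XP will match too. This only means something when combined with the IP ranges. The labels returned are hypothetical.

```python
import re

# UA #2 as quoted above. Whitespace is collapsed before comparing, since the
# extra spaces around "WOW64;" are easily lost in copying.
MSIE9_UA = ("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; "
            "Trident/5.0; WOW64; Trident/5.0)")

# UA #1: MSIE 7 on Windows NT 5.0 or 5.1. The .NET CLR tokens vary from
# visit to visit, so only the fixed part is matched.
MSIE7_RE = re.compile(r"MSIE 7\.0; Windows NT 5\.[01][;)]")


def normalize(ua):
    """Collapse runs of whitespace to single spaces."""
    return " ".join(ua.split())


def classify_ua(ua):
    """Return 'msie9', 'msie7', 'contype', or None (hypothetical labels)."""
    if normalize(ua) == normalize(MSIE9_UA):
        return "msie9"
    if MSIE7_RE.search(ua):
        return "msie7"
    # Assumption: the media-fetch UA is the bare string "contype".
    if ua.strip().lower() == "contype":
        return "contype"
    return None
```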
IP range #1, the "older ranges": 65.55.211-213, ..215, ..217-218 and 131.253.23-26, ..36. (I checked back: they really do seem to skip .214 and .216 at all times, and they're selective within 131.253.) The 65.55 block is the same range used by msnbot-media. Usually MSIE 7, sometimes MSIE 9.
IP range #2, the "newer range": 199.30.24-25, always MSIE 9. I never saw this range before April, and then only for Bing Preview until early October, when it started doing plainclothes duty as well.
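The IP side uses the ranges exactly as listed, with third octets only, so the gaps at 65.55.214 and .216 are explicit. This is again a hypothetical Python sketch. The group names "older" and "newer" are just labels.

```python
# Third-octet sets per two-octet prefix, as listed in the post.
# 65.55.214 and 65.55.216 are deliberately absent; 131.253 is partial.
RANGES = {
    "older": {
        "65.55": {211, 212, 213, 215, 217, 218},
        "131.253": {23, 24, 25, 26, 36},
    },
    "newer": {
        "199.30": {24, 25},
    },
}


def ip_group(ip):
    """Return 'older', 'newer', or None for a dotted-quad IPv4 string."""
    parts = ip.split(".")
    if len(parts) != 4:
        return None
    prefix = ".".join(parts[:2])
    try:
        third = int(parts[2])
    except ValueError:
        return None
    for name, prefixes in RANGES.items():
        if third in prefixes.get(prefix, ()):
            return name
    return None
```

Pairing `ip_group()` with a UA check is what separates the plainclothes visits from genuine MSIE users.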
Both UAs from both IP ranges pick up non-page files: css, js, midi, pdf. Only MSIE 9, and then only from 199.30, picks up images. All non-page files give the page as referer. They never ask for the favicon.
199.30 range: fully humanoid apart from the favicon. Typically in and out within a second or two, exactly like a human; piwik.js is typically requested after all images, reflecting its physical location in the HTML. The IP stays the same for the duration of each visit.
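To check the per-visit behavior (favicon, images, referers), something along these lines would work. It assumes hits from one IP have already been grouped into a visit as (path, referer) pairs. The extension lists are illustrative guesses at what counts as a "non-page file", and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative guesses at file types; adjust to the site's actual content.
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")
ASSET_EXT = (".css", ".js", ".mid", ".midi", ".pdf") + IMAGE_EXT


def summarize_visit(page, requests):
    """page: path of the HTML page; requests: (path, referer) pairs from one visit."""
    assets = [(p, r) for p, r in requests if p.lower().endswith(ASSET_EXT)]
    return {
        "favicon": any(p == "/favicon.ico" for p, _ in requests),
        "images": any(p.lower().endswith(IMAGE_EXT) for p, _ in assets),
        # True if every non-page file names the page itself as referer.
        "assets_refer_to_page": all(urlparse(r).path == page for _, r in assets),
    }
```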
Around the middle of October there was a cluster of 199.30 visits (humanoid, with images) arriving almost exactly 30 seconds after a Bing Preview visit to the same page, in each case from the identical IP. Each preview seems to have been triggered by a text search, not an image search. (This assumes that an image search is always accompanied by an image fetch.)
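The Preview-then-humanoid pairing can be looked for mechanically. The idea is to remember the last Bing Preview hit per (IP, page) and flag a non-Preview hit on the same pair that lands about 30 seconds later. Here is a Python sketch with these assumptions:

- Hits are (timestamp, ip, path, ua) tuples sorted by time.
- Preview identifies itself with "BingPreview" somewhere in the UA.
- The 3-second slack is an arbitrary choice.

```python
from datetime import datetime, timedelta


def preview_followups(hits, target=timedelta(seconds=30), slack=timedelta(seconds=3)):
    """Yield (preview_time, followup_time, ip, path) for humanoid follow-ups.

    A follow-up is a non-Preview hit on the same (ip, path) arriving within
    `slack` of `target` after the most recent BingPreview hit on that pair.
    """
    last_preview = {}
    for ts, ip, path, ua in hits:
        key = (ip, path)
        if "BingPreview" in ua:
            last_preview[key] = ts
            continue
        start = last_preview.get(key)
        if start is not None and abs((ts - start) - target) <= slack:
            yield start, ts, ip, path
            # Only pair each Preview hit with one follow-up.
            del last_preview[key]
```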
Aside: I honestly don't know what the scoop is with Bing Preview. There's never any information about a search, or indeed any referer at all. And, as noted elsewhere, I can't even figure out how to get a Preview when searching in my own persona. Finally, am I the only one who thinks it's funny that Bing Preview uses WebKit rather than some form of MSIE?
... none, actually. I still have no idea what the thing(s) is/are for. But so far it hasn't done anything really egregious.