- Search Engines
-- Search Engine Spider and User Agent Identification
---- msnbot-media


lucy24 - 7:57 pm on Dec 13, 2012 (gmt 0)


:: bump ::

Do we have an ongoing thread about the plainclothes bingbot? Things are getting a bit tangled.

In today's "D'oh!" moment I went over to the Bing WMT discussion boards in search of enlightenment. (Link will probably only work if you are signed in to Bing.)

From March 2010 [bing.com], final post in a long thread, from a non-bing-affiliated human:

It turns out that the traffic you're seeing isn't really the MSNBot search indexer - it's Bing Translator (AKA Microsoft Translator / Windows Live Translator).

If a user crawls your site and then translates the page into their local language through this tool then you will see the request coming from a 65.55 IP address which MAY (not always) reverse DNS to say "msnbot". However it's a real human requesting this page, and you should not really attempt to block it unless "msnbot" is in the user-agent string.

The translate server is proxying the request and you will therefore see the user's user-agent string - not the MSNBOT one.

It seems Microsoft are repurposing IP addresses and not updating the reverse DNS names for them, so many translate server IP addresses reverse lookup to an MSN bot address.

There are several different ways of using their service - you can use Page > Translate with Live Search in IE8, or from the Windows Live Toolbar, or you can click "translate this page" from a bing.com search result screen.

For people whose approach to translators is Shoot To Kill, this makes things easy.
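For anyone who wants to try the quoted rule on their own logs, here is a minimal Python sketch. It takes the quoted post at face value: a request is a crawler only if "msnbot" is in the user-agent, and a 65.55 address whose reverse DNS says "msnbot" but whose user-agent does not is treated as a possible translator proxy. The function name, the labels and the sample hostname are my own placeholders, not anything Bing documents. The reverse DNS name is passed in as a string (it could come from something like `socket.gethostbyaddr`), so the sketch does no network lookups itself.

```python
def classify_request(ip: str, rdns_host: str, user_agent: str) -> str:
    """Apply the quoted rule of thumb to a single request.

    Labels are hypothetical:
      "crawler"             - "msnbot" appears in the user-agent string
      "possible-translator" - 65.55.x.x address, rDNS says "msnbot",
                              but the user-agent looks like a browser
      "other"               - anything else
    """
    if "msnbot" in user_agent.lower():
        return "crawler"
    if ip.startswith("65.55.") and "msnbot" in rdns_host.lower():
        return "possible-translator"
    return "other"


# Sample values below are made up for illustration.
print(classify_request("65.55.0.1", "msnbot-65-55-0-1.example.net", "Mozilla/5.0"))
print(classify_request("65.55.0.1", "msnbot-65-55-0-1.example.net", "msnbot/2.0b"))
```

Note that this only encodes the poster's claim. It says nothing about whether the claim is right.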

Problem is, I am not sure I believe it. I pored over logs from the visit I fortuitously quoted above, and there were no human requests for images during the relevant time period-- and both of those pages come with lots of keyboard diagrams. In fact the only related image requests from around that time came from the YandexBot, whose attention seems to have been caught by a Yandex search. And if Yandex is in collusion with Bing Translate it is definitely Stop The Presses time.

Next I got Bing to dig up one of my non-English pages-- which was not easy, in spite of the lang attributes that they claim to recognize-- and asked for a translation. Logs say:

131.253.36.194 - - [13/Dec/2012:11:48:18 -0800] "GET /ebooks/perez/PerezEsp.html HTTP/1.1" 200 12818 "-" "{my browser here}"

All associated files-- CSS, images etc-- are logged as:

{my IP} - - [13/Dec/2012:11:48:19 -0800] "GET /ebooks/ebookstyles.css HTTP/1.1" 200 2999 "http://131.253.14.66/proxy.ashx?h=KJ1kesesgM4REwrijiOEFN1Hnv3Pe7dI&a={percent-encoded filename}" "{my browser}"

-- a format guaranteed to yield a page rich with No Hotlinks images.
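To pick this pattern out of a log file, something like the following Python sketch might do. It assumes the logs are in the usual Apache combined format, as the two lines above appear to be, and it treats "/proxy.ashx?" in the referer as the sign of a translator-proxied request, which is only what this one test showed. The regular expression, the function names, the stand-in visitor IP (203.0.113.7), the user-agent string and the `a=FILENAME` value are placeholders of mine.

```python
import re

# Apache combined log format: ip, identd, user, [time], "request",
# status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def via_translate_proxy(entry):
    """True if the referer looks like the proxy.ashx URL seen in the test."""
    return "/proxy.ashx?" in entry["referer"]


# Stand-in lines modelled on the two log entries quoted above.
page_line = (
    '131.253.36.194 - - [13/Dec/2012:11:48:18 -0800] '
    '"GET /ebooks/perez/PerezEsp.html HTTP/1.1" 200 12818 "-" "ExampleBrowser/1.0"'
)
asset_line = (
    '203.0.113.7 - - [13/Dec/2012:11:48:19 -0800] '
    '"GET /ebooks/ebookstyles.css HTTP/1.1" 200 2999 '
    '"http://131.253.14.66/proxy.ashx?h=KJ1kesesgM4REwrijiOEFN1Hnv3Pe7dI&a=FILENAME" '
    '"ExampleBrowser/1.0"'
)

for line in (page_line, asset_line):
    entry = parse_line(line)
    print(entry["ip"], entry["request"], via_translate_proxy(entry))
```

The page itself arrives from the translate server with no referer, so only the supporting files get flagged. Matching the two up would mean pairing requests by timestamp and user-agent, which this sketch does not attempt.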

So... Nice try, but I don't think it's the answer. At least not this month, this year. There may be more recent Bing threads that I couldn't find. If not, I guess the next step is to try asking again. Noteworthy that in the earlier thread, nobody from Bing/MSN stepped in to explain.


Thread source:: http://www.webmasterworld.com/search_engine_spiders/4470273.htm