msnbot-media


lucy24 - 9:53 pm on Jun 27, 2012 (gmt 0)


Stop me if you've heard this one. While experimenting with an alternative log-wrangling script I ran smack dab into:

131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /robots.txt HTTP/1.1" 200 533 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /hovercraft/images/kabloona.jpg HTTP/1.1" 200 44328 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /hovercraft/caribou.html HTTP/1.1" 200 10970 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"


and

131.253.41.223 - - [26/Jun/2012:07:53:18 -0700] "GET /robots.txt HTTP/1.1" 200 533 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
131.253.41.223 - - [26/Jun/2012:07:53:18 -0700] "GET /hovercraft/images/yesno.jpg HTTP/1.1" 200 38878 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
131.253.41.223 - - [26/Jun/2012:07:53:19 -0700] "GET /hovercraft/caribou.html HTTP/1.1" 200 10970 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"


That is obviously The Real Thing; I'd recognize that pattern anywhere. robots.txt, one image, page the image lives on. For comparison purposes, the same day's logs include

207.46.199.163 - - [26/Jun/2012:08:50:38 -0700] "GET /robots.txt HTTP/1.1" 200 533 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
207.46.199.163 - - [26/Jun/2012:08:50:38 -0700] "GET /images/perez.jpg HTTP/1.1" 200 5781 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
207.46.199.163 - - [26/Jun/2012:08:50:38 -0700] "GET / HTTP/1.1" 200 2180 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
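For anyone who wants to pull the same pattern out of their own logs, here is a rough sketch of how the robots.txt / one image / page-the-image-lives-on sequence could be spotted. It is not the log-wrangling script mentioned above; the names LOG_LINE, IMAGE_EXT and media_triplets are made up for illustration, and it assumes Apache combined log format like the lines quoted in this post.

```python
import re

# Apache combined log format:
# ip identd user [timestamp] "method path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: these extensions are enough to recognise an image request.
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")


def parse(lines):
    """Yield one dict per line that matches the combined log format."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            yield m.groupdict()


def media_triplets(lines, agent_prefix="msnbot-media"):
    """Return (ip, image, page) for each robots.txt -> image -> page run
    made by the same IP with a user-agent starting with agent_prefix."""
    hits = [h for h in parse(lines) if h["agent"].startswith(agent_prefix)]
    found = []
    for i in range(len(hits) - 2):
        a, b, c = hits[i:i + 3]
        same_ip = a["ip"] == b["ip"] == c["ip"]
        is_image = b["path"].lower().endswith(IMAGE_EXT)
        is_page = (c["path"] != "/robots.txt"
                   and not c["path"].lower().endswith(IMAGE_EXT))
        if same_ip and a["path"] == "/robots.txt" and is_image and is_page:
            found.append((a["ip"], b["path"], c["path"]))
    return found


# The first three log lines quoted in this post.
UA = '"msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"'
SAMPLE = [
    '131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /robots.txt HTTP/1.1" 200 533 "-" ' + UA,
    '131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /hovercraft/images/kabloona.jpg HTTP/1.1" 200 44328 "-" ' + UA,
    '131.253.41.45 - - [26/Jun/2012:06:20:22 -0700] "GET /hovercraft/caribou.html HTTP/1.1" 200 10970 "-" ' + UA,
]

if __name__ == "__main__":
    print(media_triplets(SAMPLE))
```

Matching on the user-agent string alone proves nothing about who is really behind the request, of course; this only finds the visits worth a closer look.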


But what the bleep bleep is 131.253? We've met 131.107.0; there have been occasional threads about it, most recently in March 2012 [webmasterworld.com].

Turns out 131.253.21-47 (really: I checked the adjacent numbers on both sides) belongs to Microsoft. Somewhere along the line they must have subleased it from the company that owns the rest of the 131.253 block. Further cursory research tells me I have never* met this address before.
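If you want to test addresses against that span, a minimal sketch with Python's standard ipaddress module might look like this. It assumes "131.253.21-47" means third octet 21 through 47 inclusive, which is only what I observed by poking at the neighbors, not an official allocation; LOW, HIGH and in_observed_range are names invented for the example.

```python
import ipaddress

# Assumption: the Microsoft-held span runs from 131.253.21.0 to 131.253.47.255,
# based on the observation in this post rather than on registry data.
LOW = ipaddress.IPv4Address("131.253.21.0")
HIGH = ipaddress.IPv4Address("131.253.47.255")


def in_observed_range(ip):
    """True if ip falls inside the observed 131.253.21-47 span."""
    return LOW <= ipaddress.IPv4Address(ip) <= HIGH


# CIDR blocks that together cover exactly the same span,
# handy for pasting into an allow/deny list.
BLOCKS = list(ipaddress.summarize_address_range(LOW, HIGH))

if __name__ == "__main__":
    for addr in ("131.253.41.45", "131.253.41.223", "207.46.199.163"):
        print(addr, in_observed_range(addr))
    for block in BLOCKS:
        print(block)
```

The 207.46 address from the comparison visit falls outside this span, as expected, since it comes from a different block.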

What gives? Anyone else seen recent visits from this neighborhood?


* I didn't bother to unzip & check older logs, so "never" = within the past year.


Thread source: http://www.webmasterworld.com/search_engine_spiders/4470273.htm