It's still crawling from 131.107.65.nn
It's still not honoring robots.txt
But now, its stated purpose has changed from (helping) Live Search understand the rate of change of web pages and understand non-404 error pages, to (analyzing) the web for Microsoft's Search and Ads services.
Somewhat nebulous details found here:
It's not impossible. It's certainly possible to deny access to many of Google's IPs and/or "tools" without affecting the crawls by their primary bot.
Sure, but with Google, there are multiple tools and user agents for the exact same IP. Some IP addresses rotate between any of Google Wireless Transcoder or translate.google.com or Google Keyword Tool or Google Site Verification or AppEngine-Google or blank user agent or regular browser user agent. So if I block the IP, I don't know exactly what I'm blocking. Plus these IPs are scattershot all over the place, like guerrilla warfare.
At least Microsoft has the courtesy to quarantine all its rogue agents under one IP range.
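For anyone who wants to test a "one range" rule offline before putting it in a server config, here is a minimal Python sketch. It assumes the whole 131.107.0.0/16 block is the range in question, which is only a guess based on the 131.107.x.x addresses quoted in this thread; `BLOCKED_NETWORKS` and `is_blocked` are made-up names for illustration.

```python
import ipaddress

# Assumption: treat all of 131.107.0.0/16 as one range, based only on the
# 131.107.x.x addresses quoted in this thread. Adjust to your own findings.
BLOCKED_NETWORKS = [ipaddress.ip_network("131.107.0.0/16")]


def is_blocked(ip_text):
    """Return True if ip_text falls inside any of the blocked networks."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Obfuscated or malformed addresses (e.g. "131.107.65.nn") never match.
        return False
    return any(ip in net for net in BLOCKED_NETWORKS)
```

Running real addresses from your own logs through `is_blocked` first should show what a range-wide deny would actually catch, including any msnbot visits that come from the same block.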
www.example.com 131.107.0.ab "GET /webpage.html HTTP/1.1" 200 10418 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.aa "GET /somefolder HTTP/1.1" 200 11242 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /thispage.html HTTP/1.1" 200 10458 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /oldfolder/redirect HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /newfolder/redirect/target HTTP/1.1" 200 4352 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /nicetry HTTP/1.1" 403 281 "-" "-"
www.example.com 131.107.0.abc "GET /some/hotlinked/image.jpg HTTP/1.1" 302 28372 "http://forums.example.net/index.php?topic=123456.0" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MS-RTC EA 2)"
www.example.com 131.107.0.abc "GET / HTTP/1.1" 200 8573 "http://forums.example.net/index.php?topic=123456.0" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MS-RTC EA 2)"
www.example.com 131.107.0.aa "GET /robots.txt HTTP/1.1" 200 411 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET / HTTP/1.1" 200 9899 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET /robots.txt HTTP/1.1" 200 411 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET / HTTP/1.1" 200 9875 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.65.nnn "GET /robots.txt HTTP/1.1" 200 204 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /myfolder/mypage.html HTTP/1.1" 200 3297 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /styles.css HTTP/1.1" 200 1709 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /robots.txt HTTP/1.1" 200 204 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /myfolder/mypage.html HTTP/1.1" 200 3319 "-" "MSR-ISRCCrawler"
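If you want to check for yourself whether an agent is honoring robots.txt, log lines in the layout shown above can be run against your robots.txt with Python's standard `urllib.robotparser`. This is only a rough sketch: the regular expression assumes exactly the field order in these excerpts (host, IP, request, status, size, referer, user agent), and `parse_line` and `disallowed_hits` are hypothetical helper names.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumes the field order seen in the excerpts above:
# host ip "METHOD path protocol" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def disallowed_hits(robots_txt, log_lines):
    """List (ip, agent, path) for requests that robots_txt disallows for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        entry = parse_line(line)
        # Fetching robots.txt itself is always fine, so skip it.
        if entry is None or entry["path"] == "/robots.txt":
            continue
        if not parser.can_fetch(entry["agent"], entry["path"]):
            hits.append((entry["ip"], entry["agent"], entry["path"]))
    return hits
```

Feed it the text of your own robots.txt and the relevant lines from your log. Note that `can_fetch` follows the standard library's matching rules, which may not line up exactly with how any particular crawler interprets the file.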
And the "human looking" traffic that comes in without a referer from 65.55.n.n is now subject to closer scrutiny.
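One way to express that kind of "closer scrutiny" as a filter, purely as a sketch: it reads "65.55.n.n" as the whole 65.55.0.0/16 block and treats any user agent starting with "Mozilla/" as browser-like. Both are my assumptions, and `needs_scrutiny` is a made-up name.

```python
import ipaddress

# Assumption: "65.55.n.n" is read as the whole 65.55.0.0/16 block.
WATCHED_NETWORK = ipaddress.ip_network("65.55.0.0/16")


def needs_scrutiny(ip_text, referer, user_agent):
    """Flag browser-like requests that arrive without a referer from the watched range."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Obfuscated or malformed addresses are not flagged.
        return False
    no_referer = referer in ("", "-")
    # Crude heuristic: most browser user agents start with "Mozilla/".
    looks_like_browser = user_agent.startswith("Mozilla/")
    return ip in WATCHED_NETWORK and no_referer and looks_like_browser
```

What you then do with a flagged request (log it, rate-limit it, serve a 403) is a separate decision.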
As always, YMMV...