Msg#: 4003514 posted 5:33 am on Oct 8, 2009 (gmt 0)
This spider was discussed here [webmasterworld.com] some time ago.
It's still crawling from 131.107.65.nn, and it's still not honoring robots.txt.
But now its stated purpose has changed from (helping) Live Search understand the rate of change of web pages and understand non-404 error pages to (analyzing) the web for Microsoft's Search and Ads services.
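For reference, honoring robots.txt just means the crawler checks the rules for its own user-agent token before each fetch. Here is a minimal sketch using Python's standard urllib.robotparser. It assumes a hypothetical robots.txt that shuts MSR-ISRCCrawler out of the whole site. The thread doesn't say exactly which token the bot matches on, so treat that part as an assumption:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block MSR-ISRCCrawler everywhere, allow everyone else.
# (Assumption: the bot would match on the "MSR-ISRCCrawler" token.)
ROBOTS_TXT = """\
User-agent: MSR-ISRCCrawler
Disallow: /

User-agent: *
Disallow:
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("MSR-ISRCCrawler", "http://example.com/page.html"))
print(allowed("SomeOtherBot", "http://example.com/page.html"))
```

A well-behaved bot runs this check and skips the URL when it gets False. The complaint here is that this crawler fetches anyway.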
Msg#: 4003514 posted 2:11 pm on Oct 8, 2009 (gmt 0)
They write that MSR-ISRCCrawler is "typically from 18.104.22.168" but they don't disclose the range. That IP still doesn't have a reverse DNS entry. And they don't explain the other crap coming from 131.107.*
Msg#: 4003514 posted 7:16 pm on Oct 8, 2009 (gmt 0)
It's not impossible.
It's certainly possible to deny access to many of Google's IPs and/or "tools" without affecting the crawls by their primary bot.
Sure, but with Google, there are multiple tools and user agents behind the exact same IP. Some IP addresses rotate between Google Wireless Transcoder, translate.google.com, Google Keyword Tool, Google Site Verification, AppEngine-Google, a blank user agent, or a regular browser user agent. So if I block the IP, I don't know exactly what I'm blocking. Plus these IPs are scattershot all over the place, like guerrilla warfare.
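You can see the ambiguity in your own access logs by grouping requests by client IP and listing the distinct user agents each address sent. This is a minimal sketch that assumes the log lines have already been parsed into (ip, user_agent) pairs. The sample data is hypothetical and uses RFC 5737 documentation addresses, not real Google IPs:

```python
from collections import defaultdict

def agents_by_ip(entries):
    """Map each client IP to the set of distinct user-agent strings it sent."""
    seen = defaultdict(set)
    for ip, user_agent in entries:
        seen[ip].add(user_agent)
    return dict(seen)

# Hypothetical log sample (documentation addresses, illustrative agents).
sample = [
    ("192.0.2.10", "Google Wireless Transcoder"),
    ("192.0.2.10", "AppEngine-Google"),
    ("192.0.2.10", ""),  # blank user agent
    ("198.51.100.7", "Mozilla/5.0"),
]

for ip, agents in agents_by_ip(sample).items():
    if len(agents) > 1:
        # Blocking this IP would block every one of these agents at once.
        print(ip, sorted(agents))
```

Any IP that lists more than one agent is exactly the "don't know what I'm blocking" case.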
At least Microsoft has the courtesy to quarantine all its rogue agents under one IP range.
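The upside of a single range is that one rule covers all of it. Here is a minimal sketch with Python's ipaddress module. It assumes the "131.107.*" mentioned above can be treated as the 131.107.0.0/16 block. Nobody in the thread has confirmed the exact allocation, so adjust the network list to what actually shows up in your logs:

```python
import ipaddress

# Assumption: "131.107.*" from this thread treated as the whole /16 block.
BLOCKED_NETWORKS = [ipaddress.ip_network("131.107.0.0/16")]

def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    # Membership across IPv4/IPv6 versions is simply False, not an error.
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("131.107.65.10"))
print(is_blocked("192.0.2.1"))
```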