
Forum Moderators: Ocean10000 & incrediBILL


MSR-ISRCCrawler repurposed

Now with analysis "for Microsoft's Search and Ads services"

   
5:33 am on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



This spider was discussed here [webmasterworld.com] some time ago.

It's still crawling from 131.107.65.nn
It's still not honoring robots.txt

But now its stated purpose has changed from (helping) Live Search understand the rate of change of web pages and understand non-404 error pages, to (analyzing) the web for Microsoft's Search and Ads services.

Somewhat nebulous details found here:

[research.microsoft.com...]
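
For anyone who wants to confirm the robots.txt behavior on their own logs, here is a minimal Python sketch. It replays the paths a crawler requested against a site's robots.txt rules using the standard library's urllib.robotparser. The robots.txt content and the helper name disallowed_hits are hypothetical, made up for illustration; only the user agent string and the request paths echo what appears in this thread.

```python
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [
        path for path in requested_paths
        if not parser.can_fetch(user_agent, path)
    ]


# Hypothetical robots.txt; substitute the real file from your own site.
ROBOTS_TXT = """\
User-agent: MSR-ISRCCrawler
Disallow: /myfolder/
"""

# Paths taken from the crawler's log entries.
hits = disallowed_hits(
    ROBOTS_TXT,
    "MSR-ISRCCrawler",
    ["/myfolder/mypage.html", "/styles.css"],
)
print(hits)
```

Any path in the result was fetched despite a matching Disallow rule, which is the evidence you would want before calling a bot non-compliant.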

2:11 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



They write that MSR-ISRCCrawler is "typically from 131.107.65.41" but they don't disclose the range. That IP still doesn't have a reverse DNS entry. And they don't explain the other crap coming from 131.107.*
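
A rough Python sketch of the reverse DNS check, for those who want to script it. It does a reverse lookup and then a forward lookup to see whether the name maps back to the same IP. The function name reverse_dns_status and the status labels are my own invention; the resolver arguments default to the standard socket calls and are only parameters so the function can be exercised without a live lookup.

```python
import socket


def reverse_dns_status(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Classify an IP as 'no-ptr', 'unconfirmed' or 'confirmed'.

    Returns a (status, hostname) tuple; hostname is None when there is
    no reverse DNS entry at all.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ("no-ptr", None)
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return ("unconfirmed", hostname)
    # Forward-confirmed only if the name resolves back to the same IP.
    status = "confirmed" if ip in addresses else "unconfirmed"
    return (status, hostname)


# Usage (performs real DNS lookups, so results depend on when you run it):
# print(reverse_dns_status("131.107.65.41"))
```

A "no-ptr" result corresponds to the situation described above, where the address has no reverse DNS entry to check against.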

5:15 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



MS will never garner any credibility from this IP range and should abandon the range.

6:57 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



MS will never garner any credibility from this IP range and should abandon the range.

I wish I had the impunity to dare suggest the same for some of Google's IPs.

7:05 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wish I had the impunity to dare suggest the same for some of Google's IPs.

It's not impossible.

It's certainly possible to deny access to many of Google's IPs and/or "tools" without affecting the crawls by their primary bot.

7:16 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



It's not impossible.

It's certainly possible to deny access to many of Google's IPs and/or "tools" without affecting the crawls by their primary bot.

Sure, but with Google, there are multiple tools and user agents for the exact same IP. Some IP addresses rotate between any of Google Wireless Transcoder or translate.google.com or Google Keyword Tool or Google Site Verification or AppEngine-Google or blank user agent or regular browser user agent. So if I block the IP, I don't know exactly what I'm blocking. Plus these IPs are scattershot all over the place, like guerrilla warfare.

At least Microsoft has the courtesy to quarantine all its rogue agents under one IP range.

12:16 am on Oct 9, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I have an equal-opportunity policy when it comes to blocking. A bot's privilege of accessing my server is granted based on its perceived merits. That privilege has just been revoked for 131.107.0.0/16. Sample log entries from that range:


www.example.com 131.107.0.ab "GET /webpage.html HTTP/1.1" 200 10418 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.aa "GET /somefolder HTTP/1.1" 200 11242 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /thispage.html HTTP/1.1" 200 10458 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /oldfolder/redirect HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /newfolder/redirect/target HTTP/1.1" 200 4352 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; WOW64; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)"
www.example.com 131.107.0.ab "GET /nicetry HTTP/1.1" 403 281 "-" "-"
www.example.com 131.107.0.abc "GET /some/hotlinked/image.jpg HTTP/1.1" 302 28372 "http://forums.example.net/index.php?topic=123456.0" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MS-RTC EA 2)"
www.example.com 131.107.0.abc "GET / HTTP/1.1" 200 8573 "http://forums.example.net/index.php?topic=123456.0" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; MS-RTC EA 2)"
www.example.com 131.107.0.aa "GET /robots.txt HTTP/1.1" 200 411 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET / HTTP/1.1" 200 9899 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET /robots.txt HTTP/1.1" 200 411 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.0.aa "GET / HTTP/1.1" 200 9875 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
www.example.com 131.107.65.nnn "GET /robots.txt HTTP/1.1" 200 204 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /myfolder/mypage.html HTTP/1.1" 200 3297 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /styles.css HTTP/1.1" 200 1709 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /robots.txt HTTP/1.1" 200 204 "-" "MSR-ISRCCrawler"
www.example.com 131.107.65.nnn "GET /myfolder/mypage.html HTTP/1.1" 200 3319 "-" "MSR-ISRCCrawler"

And the "human looking" traffic that comes in without a referer from 65.55.n.n is now subject to closer scrutiny.

As always, YMMV...
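
To make the policy concrete, here is a minimal Python sketch of the two rules described above: deny everything from 131.107.0.0/16, and flag browser-looking requests that arrive without a referer from the 65.55 range. Reading "65.55.n.n" as 65.55.0.0/16 is an assumption, as are the function name classify and the "deny" / "scrutinize" / "allow" labels; what "closer scrutiny" involves is left to you.

```python
import ipaddress

# Range whose access was revoked in the post above.
DENIED = ipaddress.ip_network("131.107.0.0/16")
# Assumption: "65.55.n.n" is treated here as the whole /16.
WATCHED = ipaddress.ip_network("65.55.0.0/16")


def classify(ip, referer, user_agent):
    """Return 'deny', 'scrutinize' or 'allow' for one logged request."""
    addr = ipaddress.ip_address(ip)
    if addr in DENIED:
        return "deny"
    no_referer = referer in ("", "-")
    looks_like_browser = user_agent.startswith("Mozilla/")
    if addr in WATCHED and no_referer and looks_like_browser:
        return "scrutinize"
    return "allow"


print(classify("131.107.65.41", "-", "MSR-ISRCCrawler"))
print(classify("65.55.1.2", "-", "Mozilla/4.0 (compatible; MSIE 7.0)"))
```

The same logic can be expressed in server configuration; the sketch is only meant for sifting through existing logs before deciding what to block.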