MSR-ISRCCrawler

Looking for music royalty violations?

         

keyplyr

10:07 am on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Amazed that a WW search did not return any mention of this Microsoft Research crawler. It's been on my sites for a week, taking only HTML files at 5-second intervals: 20 or 30 pages, then leaving for a day or two and returning.

131.107.151.93 - - [09/May/2008:01:28:41 -0400] "GET /robots.txt HTTP/1.1" 200 6115 "-" "MSR-ISRCCrawler"
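
For anyone who wants to pull hits like this out of their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log format seen in the line above; the LOG_PATTERN and parse_line names are illustrative only, and other servers or custom log formats would need a different pattern.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format, as in the line quoted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamp looks like 09/May/2008:01:28:41 -0400
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields

sample = ('131.107.151.93 - - [09/May/2008:01:28:41 -0400] '
          '"GET /robots.txt HTTP/1.1" 200 6115 "-" "MSR-ISRCCrawler"')
hit = parse_line(sample)
print(hit["ip"], hit["path"], hit["status"], hit["agent"])
```

Filtering on hit["agent"] then gives a quick way to see which files the crawler took and how far apart the requests were.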

What is ISRC?

The ISRC (International Standard Recording Code) is the international identification system for sound recordings and music video recordings. Each ISRC is a unique and permanent identifier for a specific recording, which can be permanently encoded into a product as its digital fingerprint. Encoded ISRCs provide the means to automatically identify recordings for royalty payments.

wilderness

4:14 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another plagiarism bot?

incrediBILL

5:18 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This actually comes from Microsoft and I believe it's MSR (MicroSoft Research) ISRC (Internet Study [of] Result Codes) Crawler. Supposedly this thing is performing a study of soft 404 responses to help msnbot properly detect whether it's getting a real results page or a 404.

wilderness

5:24 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This actually comes

Perhaps on your sites ;)
The Class B has been eating 403s on my sites since 2003.

incrediBILL

5:30 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With all the high level blocks you have, how your site gets crawled is a mystery!

... or gets visitors for that matter! ;)

[edited by: incrediBILL at 5:30 pm (utc) on May 9, 2008]

wilderness

5:58 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My widgets are simply too interesting and in addition not available in other places.

Course ya have to be interested in widgets ;)

keyplyr

7:52 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This actually comes from Microsoft and I believe it's MSR (MicroSoft Research) ISRC (Internet Study [of] Result Codes) Crawler. Supposedly this thing is performing a study of soft 404 responses to help msnbot properly detect whether it's getting a real results page or a 404 - incrediBILL

Thanks Bill. However, if this M$ bot was only checking response codes, I don't think it would hit 3 of my music sites a dozen times in a week and not follow the links to my other non-music sites. Still, it's a mystery.

incrediBILL

7:57 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It beat up my site pretty good recently and my site has nothing to do with music.

keyplyr

11:43 pm on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Everything is about music :)

Ocean10000

4:12 am on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




<Headers>
<header name="From" value="isrc-bot@microsoft.com" />
<header name="User-Agent" value="MSR-ISRCCrawler" />
</Headers>

Well, this bot takes robots.txt and ignores it, and gets an instant prize of a 403 status code.

As for music, my site has none whatsoever. The only thing I can guess is that my site, like some of the others mentioned here, has bot-blocking code active to stop unwanted pests, and its goal might be to see how different applications respond to its requests, for whatever reason.

[edited by: Ocean10000 at 4:12 am (utc) on May 10, 2008]

keyplyr

12:18 am on May 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, this bot takes robots.txt and ignores it... - Ocean10000

Yup, it ignores robots.txt, so now it gets banned.

pavlovapete

6:11 am on May 21, 2008 (gmt 0)

10+ Year Member



I too have desires to block this MSR bot.

Do you think banning it will have a negative impact on our (very small) MS search traffic?

wilderness

6:54 am on May 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you think banning it will have a negative impact on our (very small) MS search traffic?

None whatsoever!
I've had the Class B denied since 2003 and have many pages listed with MSN.
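
The two kinds of block mentioned in this thread, by User-Agent and by Class B, can be sketched in Python as below. This is only an illustration under stated assumptions: the /16 is inferred from the single address in the log line quoted at the top of the thread (131.107.151.93), and should_block and status_for are hypothetical helper names, not part of any server or framework.

```python
import ipaddress

# Assumption: the "Class B" is the /16 containing 131.107.151.93,
# the address from the log line quoted earlier in this thread.
BLOCKED_NETWORKS = [ipaddress.ip_network("131.107.0.0/16")]

# Matched case-insensitively as a substring of the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ["msr-isrccrawler"]

def should_block(remote_ip, user_agent):
    """Return True if the request matches a blocked agent or network."""
    agent = (user_agent or "").lower()
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)

def status_for(remote_ip, user_agent):
    """Hypothetical helper: the HTTP status a front controller would send."""
    return 403 if should_block(remote_ip, user_agent) else 200

print(status_for("131.107.151.93", "MSR-ISRCCrawler"))
print(status_for("192.0.2.10", "Mozilla/5.0"))
```

Blocking the whole range is the blunter of the two: it catches the crawler even if it changes its User-Agent, but it also denies anything else coming from those addresses.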

koan

6:38 am on May 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's been fast crawling my site for hours now, averaging a page every 4-5 seconds or so. From reading this thread, I see no benefit in letting it eat my bandwidth. I'm blocking it for all my sites.

What is it with Microsoft and their disregard for webmasters' interests? That and their referer spamming... they're not getting any love from me.

blend27

1:53 pm on Aug 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We just had this thing plow through one of the sites I oversee.

The funny thing is that it requested every URI, but in lower case.

e.g. instead of /Widgets/BlueWidgets.cfm it requested /widgets/bluewidgets.cfm
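
One way to spot this lower-casing pattern in a log is to compare each requested path against the site's real paths, ignoring case. The Python sketch below assumes you can produce a list of the site's canonical URIs; build_case_index and classify_request are illustrative names, and the second known path is made up for the example.

```python
def build_case_index(known_paths):
    """Map each lower-cased path to its canonical spelling."""
    return {path.lower(): path for path in known_paths}

def classify_request(requested_path, case_index):
    """Return (kind, canonical_path) for one requested path.

    kind is "exact" when the spelling matches a known path,
    "case_mismatch" when it matches only after lower-casing,
    and "unknown" when no known path matches at all.
    """
    canonical = case_index.get(requested_path.lower())
    if canonical is None:
        return ("unknown", None)
    if canonical == requested_path:
        return ("exact", canonical)
    return ("case_mismatch", canonical)

# Hypothetical site map; the first path is the example from this post.
index = build_case_index(["/Widgets/BlueWidgets.cfm", "/Widgets/RedWidgets.cfm"])
print(classify_request("/widgets/bluewidgets.cfm", index))
print(classify_request("/Widgets/BlueWidgets.cfm", index))
```

A client whose requests nearly all come back as case_mismatch is behaving like the crawler described here. On a case-sensitive server those same requests would normally show up as 404s.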