
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Rogue bot from microsoft
Msrbot
marodhum




msg:3582349
 7:30 pm on Feb 22, 2008 (gmt 0)

Today I got hits from Microsoft's MSRBOT, all returning 404s. My log excerpts:
131.107.151.zzz - "GET /not-there-1.txt HTTP/1.0" 404 42 "-" "MSRBOT (http://research.microsoft.com/research/sv/msrbot/"
131.107.151.zzz - "GET /not-there-2.txt HTTP/1.0" 404 42 "-" "MSRBOT (http://research.microsoft.com/research/sv/msrbot/"
The last octet of the IP address is a three-digit number. I typed three x's there; I don't know why they were changed to z's!

There are some strange things in its behaviour:
a) It is asking for non-existent pages.
b) When I tried to view the URL it supplies in the UA string, I couldn't reach anything.

After googling, I found one page here [research.microsoft.com].
This entry is from that page:
Why is MSRBot trying to download incorrect links from my server? Or from a server that doesn't exist? Because MSRBot obtains the list of links to crawl by extracting them from documents on the web, there must be an incorrect link available on the web. To determine the location of this links, look at the referral field in your web server log.

Now, how I am supposed to look for the referer when it is blank is anybody's guess. ;)
I found this post [webmasterworld.com] in WebmasterWorld.
Note that before, they were crawling from above.net (IP 209.249.11.x), and now they are coming from IPs owned by Microsoft.
Since their behaviour is suspicious, I banned them through .htaccess.
Is anybody else seeing this?
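
For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that pulls out MSRBOT requests along with whatever referer they sent. It assumes the abbreviated line format quoted above (host, request, status, size, referer, user agent); a full Apache combined log has extra fields such as the timestamp, so the pattern would need adjusting. The names LINE_RE and msrbot_hits are made up for this example.

```python
import re

# Pattern for the abbreviated log lines quoted in this post.
# Assumption: real logs have more fields (ident, user, timestamp),
# so this regex is only a starting point.
LINE_RE = re.compile(
    r'^(?P<host>\S+) - "(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def msrbot_hits(lines):
    """Return one dict per line whose user agent mentions MSRBOT."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "MSRBOT" in match.group("agent").upper():
            hits.append(match.groupdict())
    return hits


# The two lines from the excerpt above, with the last octet still masked.
sample = [
    '131.107.151.zzz - "GET /not-there-1.txt HTTP/1.0" 404 42 "-" '
    '"MSRBOT (http://research.microsoft.com/research/sv/msrbot/"',
    '131.107.151.zzz - "GET /not-there-2.txt HTTP/1.0" 404 42 "-" '
    '"MSRBOT (http://research.microsoft.com/research/sv/msrbot/"',
]

for hit in msrbot_hits(sample):
    # A referer of "-" means the client sent none, so there is no link to trace.
    print(hit["path"], hit["status"], hit["referer"])
```

With a blank referer, as here, the output only confirms that there is nothing to follow up on.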

 

wilderness




msg:3582504
 11:06 pm on Feb 22, 2008 (gmt 0)

There are hordes of references to this IP in the archives.
They've been eating 403s from my sites steadily for weeks.

There's a recent thread where Brett advised that the range was utilized by MSN's research team.

Here are some very old threads:
[webmasterworld.com...]
[webmasterworld.com...]

marodhum




msg:3582615
 3:32 am on Feb 23, 2008 (gmt 0)

Thanks Don. I visited not only those two posts but some others linked from there. So they have been doing this since 2003, and nobody from Microsoft cared to answer anything?
Well, at least in my case, I am going to ban the full 131.107 range.
As
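
Before banning a whole range, it can help to check which logged addresses would actually fall inside it. A small Python sketch follows, assuming "the full 131.107 range" means the CIDR block 131.107.0.0/16. The addresses tested are hypothetical stand-ins, since the real last octets are masked in this thread.

```python
import ipaddress

# Assumption: "the full 131.107 range" is read as 131.107.0.0/16.
BLOCKED = ipaddress.ip_network("131.107.0.0/16")


def is_blocked(addr):
    """Return True if the dotted-quad address lies inside BLOCKED."""
    return ipaddress.ip_address(addr) in BLOCKED


# Hypothetical addresses: one in the Microsoft range discussed here,
# one in the older 209.249.11.x range mentioned earlier in the thread.
for addr in ("131.107.151.1", "209.249.11.1"):
    print(addr, is_blocked(addr))
```

Note that the older above.net addresses would not be caught by this block and would need their own rule.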

Romeo




msg:3583073
 11:50 pm on Feb 23, 2008 (gmt 0)

... at least that MSRBOT seems to obey the robots.txt.

I just saw one lonely "GET /robots.txt HTTP/1.0" from MSRBOT in last week's logs. As that bot was disallowed /, it apparently moved on and left my sites alone.
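
To sanity-check that a robots.txt rule really covers the bot, Python's standard urllib.robotparser can evaluate the file offline. This is only a sketch: the rules below are a hypothetical example using the token "MSRBOT", not the actual file from the post above, and a compliant crawler may match user-agent tokens differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows everything for MSRBOT only.
ROBOTS_TXT = """\
User-agent: MSRBOT
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# MSRBOT is shut out of every path, including the odd probe URLs.
print(parser.can_fetch("MSRBOT", "/not-there-1.txt"))

# Agents without a matching entry are still allowed by this file.
print(parser.can_fetch("SomeOtherBot", "/index.html"))
```

This only shows what a well-behaved crawler should do with the file; a bot that ignores robots.txt still has to be blocked at the server.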

Megaclinium




msg:3584166
 7:35 pm on Feb 25, 2008 (gmt 0)

I got a 'not-there-2.txt' hit also. Thought it was just me m$ was after :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved