msnbot/2.0b

New crawler from MS in the works?

GaryK

5:04 pm on Feb 1, 2009 (gmt 0)

msnbot/2.0b ( [search.msn.com...]
131.107.0.95
tide525.microsoft.com

Read robots.txt and then left, so I don't know if it actually obeys it or not.

Is this really a new msnbot in beta?

wilderness

3:37 pm on Feb 13, 2009 (gmt 0)

Consistency at its best ;)

65.55.106.132 - - [13/Feb/2009:12:12:13 +0000] "GET /robots.txt HTTP/1.1" 200 5023 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.132 - - [13/Feb/2009:12:13:14 +0000] "GET /MyFolder/MyPage.html HTTP/1.0" 200 41524 "-" "msnbot/2.0b"
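Those two lines show one IP fetching robots.txt with the full user-agent string over HTTP/1.1, then a page with a bare `msnbot/2.0b` over HTTP/1.0. Below is a minimal Python sketch for spotting that kind of per-IP drift across a whole log. It assumes Apache's combined log format. The regex and the function name `ua_variants_by_ip` are my own, not anything from Microsoft's documentation.

```python
import re
from collections import defaultdict

# Apache "combined" format: ip ident user [time] "request" status size "referer" "user-agent"
# (assumption: your server writes this format)
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def ua_variants_by_ip(lines):
    """Map each IP to the distinct (user-agent, protocol) pairs it used,
    keeping only IPs that presented more than one variant."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        seen[m["ip"]].add((m["ua"], m["proto"]))
    return {ip: variants for ip, variants in seen.items() if len(variants) > 1}

log = [
    '65.55.106.132 - - [13/Feb/2009:12:12:13 +0000] "GET /robots.txt HTTP/1.1" 200 5023 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.132 - - [13/Feb/2009:12:13:14 +0000] "GET /MyFolder/MyPage.html HTTP/1.0" 200 41524 "-" "msnbot/2.0b"',
]
for ip, variants in ua_variants_by_ip(log).items():
    print(ip)
    for ua, proto in sorted(variants):
        print("   ", proto, ua)
```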

GaryK

9:25 pm on Feb 13, 2009 (gmt 0)

Do they get credit for being consistent in their inconsistency?

wilderness

9:34 pm on Feb 13, 2009 (gmt 0)

Only credit I'm able to extend is "eat 403's"

santapaws

9:00 am on Feb 16, 2009 (gmt 0)

For the last two days this bot, from the range 65.52.0.0 - 65.55.255.255, has been ignoring robots.txt and hitting the spider trap. Is this for real? When it gets banned on one IP, it comes back from another. I guess I need to block the whole range? A major company like this throwing out the rule book for ethical spidering?
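The range quoted here, 65.52.0.0 - 65.55.255.255, collapses to a single CIDR block, so a range check is easy with Python's standard `ipaddress` module. This is a sketch only. `in_blocked_range` is a hypothetical helper name. Blocking a whole /14 will also catch any legitimate Microsoft crawling from it; the msnbot/1.1 requests quoted later in this thread come from the same range.

```python
import ipaddress

# 65.52.0.0 - 65.55.255.255 summarizes to a list of CIDR networks (here, a single /14)
BLOCKED = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("65.52.0.0"),
    ipaddress.IPv4Address("65.55.255.255"),
))

def in_blocked_range(ip: str) -> bool:
    """Return True if the dotted-quad address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(BLOCKED)  # [IPv4Network('65.52.0.0/14')]
for ip in ("65.55.106.132", "65.55.51.112", "131.107.0.95"):
    print(ip, in_blocked_range(ip))
```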

GaryK

5:49 pm on Feb 16, 2009 (gmt 0)

ethical spidering

That's a term I'm unfamiliar with. ;)

It was back again the last few days. This time it read not only robots.txt but the default root page as well. So for me it's been entirely respectful of robots.txt.

santapaws

9:08 am on Feb 19, 2009 (gmt 0)

Still coming in from different IPs and hitting the bot trap. Is this thing for real?

GaryK

4:21 pm on Feb 19, 2009 (gmt 0)

We've already established that MS says the user agent is legit. From there though you have to take your own steps to ensure it's not being spoofed.
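One common way to check for spoofing is forward-confirmed reverse DNS. First look up the hostname for the visiting IP and confirm it sits under a domain you trust. Then resolve that hostname forward and make sure it maps back to the same IP. A minimal Python sketch follows. The suffix list is my assumption, based on the tide525.microsoft.com hostname in the first post and the search.msn.com domain in the user-agent, so check it against Microsoft's current guidance. The resolvers are passed in as parameters only so the logic can be tested offline.

```python
import socket
from typing import Callable, Iterable

# Assumed trusted suffixes; NOT an official list, verify before relying on it.
TRUSTED_SUFFIXES = (".search.msn.com", ".microsoft.com")

def default_reverse(ip: str) -> str:
    # PTR lookup: primary hostname for the address
    return socket.gethostbyaddr(ip)[0]

def default_forward(host: str) -> list[str]:
    # A-record lookup: every IPv4 address the hostname resolves to
    return socket.gethostbyname_ex(host)[2]

def is_verified_crawler(
    ip: str,
    suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], list[str]] = default_forward,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    try:
        host = reverse(ip).lower().rstrip(".")
    except OSError:
        return False  # no PTR record, so the claim cannot be verified
    if not host.endswith(tuple(suffixes)):
        return False  # hostname is outside the trusted domains
    try:
        return ip in forward(host)  # hostname must resolve back to the same IP
    except OSError:
        return False
```

In production you would call `is_verified_crawler(ip)` with the default resolvers and cache the answer per IP, since two DNS lookups on every request would be slow.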

santapaws

5:10 pm on Feb 19, 2009 (gmt 0)

"Is this thing for real" is a figure of speech.

SoyDevon

3:20 am on May 25, 2009 (gmt 0)

Hi, this is a very interesting discussion to me, as I've gotten quite annoyed with msnbot 2.0 and the MSIE requests from MS IPs that put fake referers in my logs. This should've been resolved months ago. It was bad enough that the bot never listened to my 301's, but this is ridiculous.

I recently noticed that msnbot 2.0b has actually been hitting some files that my robots.txt tells it to ignore! That kept happening, so I 403'd the bot, and later the entire IP range these MSIE requests keep coming from (because of the fake referer spam from an IE browser in the same range). Now look what I found in my logs today (and this is just a sample!):

65.55.106.203 - - [24/May/2009:13:16:09 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:17:20 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:19:28 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:22:31 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:26:34 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.51.112 - - [24/May/2009:13:31:27 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:31:59 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:13:38:15 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.231 - - [24/May/2009:13:45:27 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.231 - - [24/May/2009:13:53:48 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:14:03:11 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.51.115 - - [24/May/2009:14:03:22 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.104.16 - - [24/May/2009:14:06:51 -0600] "GET /robots.txt HTTP/1.1" 403 787 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.104.16 - - [24/May/2009:14:06:52 -0600] "GET /poetry/ HTTP/1.1" 403 784 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.106.231 - - [24/May/2009:14:13:21 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.203 - - [24/May/2009:14:25:05 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.231 - - [24/May/2009:14:26:26 -0600] "GET /about/ HTTP/1.1" 403 821 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

I block bad Java bots, and they don't even do this to me.
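The sample above shows the same IPs asking for robots.txt over and over after being refused. The Python sketch below groups robots.txt requests by IP and lists the gap in seconds between successive attempts, which makes the pattern easier to see. In the first few lines of this sample the gaps grow with each retry. It assumes the Apache combined log format, and the function name is illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime

# Only the leading fields of a combined-format line are needed here
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

def robots_retry_gaps(lines):
    """Return {ip: [seconds between successive /robots.txt requests]}."""
    times = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m["path"] != "/robots.txt":
            continue
        # timestamps look like 24/May/2009:13:16:09 -0600
        times[m["ip"]].append(datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"))
    gaps = {}
    for ip, stamps in times.items():
        stamps.sort()
        gaps[ip] = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
    return gaps

sample = [
    '65.55.106.203 - - [24/May/2009:13:16:09 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.203 - - [24/May/2009:13:17:20 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.203 - - [24/May/2009:13:19:28 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.203 - - [24/May/2009:13:22:31 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.106.203 - - [24/May/2009:13:26:34 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.51.112 - - [24/May/2009:13:31:27 -0600] "GET /robots.txt HTTP/1.1" 403 825 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]
for ip, g in robots_retry_gaps(sample).items():
    print(ip, g)
# 65.55.106.203 [71, 128, 183, 243]
# 65.55.51.112 []
```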

caribguy

4:27 am on May 25, 2009 (gmt 0)

SoyDevon, be glad Spinn3r hasn't found you yet. 403's are like crystal meth to them.

Mokita

6:20 am on Jun 2, 2009 (gmt 0)

I've just been researching this beta bot due to its bad or strange behaviour and stumbled across a message from mid-April in the Microsoft Webmaster Forums with a title of "msnbot is using MY robots.txt to crawl YOUR site".

(I'm not sure that quoting from other forums is allowed at WebmasterWorld, so refrained from doing so.)

A mod replied on April 15 saying the issue should be fixed shortly, but I was still seeing requests for disallowed directories right up until 17 May.
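To check this against your own logs, Python's standard `urllib.robotparser` can replay logged request paths against your robots.txt. This is a sketch. The robots.txt content, the `/private/` directory and the log lines are made up for illustration. Note that `urllib.robotparser` only does plain prefix matching on Disallow paths, so it will not honour wildcard extensions that some crawlers support.

```python
import re
from urllib.robotparser import RobotFileParser

# Combined log format: capture ip, request path and user-agent
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def disallowed_requests(robots_txt: str, log_lines):
    """Yield (ip, path, user_agent) for logged requests that robots.txt forbids."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also records a load time, which can_fetch() requires
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None:
            continue
        path, ua = m["path"], m["ua"]
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is expected
        if not rp.can_fetch(ua, path):
            yield m["ip"], path, ua

# Hypothetical robots.txt and log lines, for illustration only
ROBOTS = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""
LOG = [
    '192.0.2.10 - - [17/May/2009:10:00:00 +0000] "GET /private/notes.html HTTP/1.1" 200 512 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [17/May/2009:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.20 - - [17/May/2009:10:01:00 +0000] "GET /private/notes.html HTTP/1.1" 200 512 "-" "SomeOtherBot/1.0"',
]
for hit in disallowed_requests(ROBOTS, LOG):
    print(hit)
```

The third line is not flagged because `SomeOtherBot` falls under the `*` group, which only disallows `/cgi-bin/`.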

The original message makes scary reading, and you have to wonder how M$ unleashed such a badly flawed product on unsuspecting webmasters in Feb and then took over a month to fix it after they were notified in April.

The weird behaviour I am still seeing is that the bot will request a page successfully, then about one minute later request it again with no change in any of the details shown in the logs (user agent, IP, etc.), but this time receive a redirect because it did not send an "Accept-Encoding" header.

So if they know that not sending that header is causing problems, why don't they fix it instead of requesting every page twice?
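To measure how often this double fetch happens, you could scan the log for the same IP and user-agent requesting the same URL twice within a short window and getting a different status the second time. A Python sketch follows. The 120-second window, the 301 status and the sample lines are assumptions for illustration, and it again assumes the combined log format.

```python
import re
from datetime import datetime, timedelta

LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
WINDOW = timedelta(seconds=120)  # assumed threshold for "about one minute later"

def repeat_fetches(lines, window=WINDOW):
    """Yield (ip, path, first_status, second_status, gap_seconds) when the same
    IP and user-agent re-request a URL within `window` and get a different status."""
    last = {}  # (ip, ua, path) -> (time, status) of the previous request
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["ip"], m["ua"], m["path"])
        prev = last.get(key)
        if prev is not None:
            gap = when - prev[0]
            if gap <= window and prev[1] != m["status"]:
                yield m["ip"], m["path"], prev[1], m["status"], int(gap.total_seconds())
        last[key] = (when, m["status"])

# Hypothetical log lines, for illustration only
LOG = [
    '192.0.2.30 - - [01/Jun/2009:08:00:00 +0000] "GET /page.html HTTP/1.1" 200 4096 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [01/Jun/2009:08:01:02 +0000] "GET /page.html HTTP/1.1" 301 230 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [01/Jun/2009:09:00:00 +0000] "GET /other.html HTTP/1.1" 200 1024 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]
for hit in repeat_fetches(LOG):
    print(hit)
```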

[edited by: Mokita at 6:27 am (utc) on June 2, 2009]

