Poor Microsoft. They can't even spell "compatible" or format a UA properly. :)
Looks like I owe you an apology, Don. The week after I told you MS has never pulled any sneaky stuff with me, they went and did just that!
No robots.txt. It looks like they're going after default pages only. Hit me twice two days apart. Sure wish I could ban them by IP but they share a similar range with MSN's legitimate bots. I can probably ban at the C class. Would this be a dumb move on my part?
Is anybody aware of which UA is their main crawler?
I grew weary of the MSN bot variations and began by denying the newest in the 74 Class.
Then added all the following to my robots a few days later:
User-agent: MSNPTC
Disallow: /
User-agent: msnbot-MM
Disallow: /
User-agent: msnbot-products
Disallow: /
User-agent: MSRBOT
Disallow: /
User-agent: msnbot-media
Disallow: /
As a result, I'm not getting spidered by anything at MS.
lanshanbot
msnbot-news
msnbot-NewsBlogs
msnbot
msnbot/1.0-MM
EDIT: I found the thread.
[webmasterworld.com...]
[edited by: GaryK at 12:57 am (utc) on Oct. 16, 2006]
65.55.212.184 - - [16/Oct/2006:04:20:53 -0700] "GET /robots.txt HTTP/1.0" 200 4527 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.54.188.148 - - [15/Oct/2006:23:21:56 -0700] "GET /robots.txt HTTP/1.0" 200 4540 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
-----------
left the IP ranges alone.
added the following three lines:
SetEnvIf User-Agent "msnbot-MM" keep_out
SetEnvIf User-Agent "msnbot-products" keep_out
SetEnvIf User-Agent "msnbot-media" keep_out
I'll update on the results.
Has no distinction between the different bots as advised in:
[webmasterworld.com...]
#:3028975
You might want to use SetEnvIfNoCase -- MS doesn't enforce a corporate User-agent-name standard, apparently, so sometimes the second part of the UA name is capitalized in the request header, and sometimes it isn't.
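To illustrate why the case-insensitive variant matters, here is a minimal Python sketch that mimics the matching idea only; it is not how Apache implements SetEnvIfNoCase. The function name keep_out and the BLOCKED_PATTERNS list are hypothetical, borrowed from the names used in the posts above, and the test strings are illustrative rather than captured from logs.

```python
import re

# Hypothetical list of User-Agent fragments to flag, taken from the posts above.
BLOCKED_PATTERNS = ["msnbot-MM", "msnbot-products", "msnbot-media"]


def keep_out(user_agent: str, ignore_case: bool = True) -> bool:
    """Return True if any blocked fragment occurs in the User-Agent string.

    With ignore_case=False this behaves like a case-sensitive match,
    so "msnbot-Products" would slip past a "msnbot-products" pattern.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return any(
        re.search(re.escape(pattern), user_agent, flags)
        for pattern in BLOCKED_PATTERNS
    )


# Illustrative User-Agent values with differing capitalization.
for ua in ("msnbot-Products", "msnbot-products", "msnbot/1.0"):
    print(ua, keep_out(ua, ignore_case=False), keep_out(ua))
```

With the case-sensitive setting, only the exact spelling in the pattern list is caught; the case-insensitive setting catches both capitalizations while still leaving the plain msnbot/1.0 name alone.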
I have noticed that the msnbots have apparently been "improved" sometime in the past 48 hours. msnbot/1.0 (search) will now accept records specifying "msnbot/", while "msnbot-media", "msnbot-news", or "msnbot-Products" will no longer match that string (because of the trailing slash). In other words, the following now works correctly to allow msnbot, but deny all of the MSN specialty bots:
# Allow msnbot/0.9 and msnbot/1.0 (search bots) except for cgi-bin
User-agent: msnbot/
Disallow: /cgi-bin
#
# Disallow all others
User-agent: *
Disallow: /
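As a rough sketch of why the trailing slash separates the bots, the following Python models record selection under one assumption: the crawler picks the first record whose User-agent token is a case-insensitive substring of its own name, and otherwise falls back to the "*" record. That rule is inferred from the behaviour described above, not from any documented MSN logic, and the names RECORDS and disallowed_paths are hypothetical.

```python
# Records from the robots.txt above, as (User-agent token, Disallow paths).
RECORDS = [
    ("msnbot/", ["/cgi-bin"]),
    ("*", ["/"]),
]


def disallowed_paths(bot_name, records=RECORDS):
    """Return the Disallow paths of the record that applies to bot_name.

    Assumption: a token applies when it is a case-insensitive substring
    of the bot's name; the "*" record is used only if nothing else applies.
    """
    name = bot_name.lower()
    fallback = []
    for token, paths in records:
        if token == "*":
            fallback = paths
        elif token.lower() in name:
            return paths
    return fallback


for bot in ("msnbot/0.9", "msnbot/1.0", "msnbot-media/1.0"):
    print(bot, "->", disallowed_paths(bot))
```

Under that assumption, "msnbot/" is found inside "msnbot/1.0" but not inside "msnbot-media/1.0", because the character after "msnbot" there is a hyphen, so the specialty bot falls through to the catch-all record.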