Msn?

         

wilderness

3:32 pm on Mar 22, 2006 (gmt 0)




66.240.231.89 - - [21/Mar/2006:22:35:01 -0800] "GET /myfolder/mypage.html HTTP/1.1" 200 39596 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

This one grabbed many pages at 3-second intervals,
all without requesting robots.txt or any images.
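
One way to pull those facts out of an access log is sketched below in Python. It assumes the Combined Log Format shown in the log line above; `parse_line` and `summarize` are names made up for this illustration, not part of any tool mentioned in the thread.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Example timestamp: 21/Mar/2006:22:35:01 -0800
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def summarize(lines, agent_prefix="msnbot"):
    """Per IP: hit count, whether robots.txt was fetched, shortest gap between hits."""
    by_ip = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None or not rec["agent"].lower().startswith(agent_prefix):
            continue
        by_ip.setdefault(rec["ip"], []).append(rec)

    report = {}
    for ip, recs in by_ip.items():
        recs.sort(key=lambda r: r["time"])
        gaps = [
            (later["time"] - earlier["time"]).total_seconds()
            for earlier, later in zip(recs, recs[1:])
        ]
        report[ip] = {
            "hits": len(recs),
            "fetched_robots": any(r["path"] == "/robots.txt" for r in recs),
            "min_gap_seconds": min(gaps) if gaps else None,
        }
    return report
```

Passing the lines of an access log to `summarize` gives one entry per IP that identified itself as msnbot, which makes a fast crawler that never asked for robots.txt easy to spot.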

I couldn't care less whether it's a valid MSN bot or not.
MSN has far too many bots spidering already.
This one doesn't crawl in an accepted manner and ignores the standards that MSN's bots normally comply with.

Pfui

8:35 pm on Mar 22, 2006 (gmt 0)




I think it's a fake msnbot because I've never seen legit versions hailing from anywhere other than Microsoft domains, and the IP you provided tracks back to "California Regional Intranet, Inc." (Cari.net) -- an apparent server farm.

FWIW, after I saw my first fake msnbot, I restricted the UA to MS-only hosts:

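# Forbid any client whose UA starts with "msnbot" unless it comes from a Microsoft host.
# Consecutive RewriteCond lines are ANDed, so every negated test below must pass.
RewriteCond %{HTTP_USER_AGENT} ^msnbot
RewriteCond %{REMOTE_HOST} !^[^.]+\.search\.msn\.com$
RewriteCond %{REMOTE_HOST} !^[^.]+\.msn\.com$
RewriteCond %{REMOTE_HOST} !^[^.]+\.phx\.gbl$
# hotmail: 64.4.0.0 - 64.4.63.255
# Assumed intent: exempt this range, so the test is negated and the third octet is limited to 0-63.
RewriteCond %{REMOTE_ADDR} !^64\.4\.([0-9]|[1-5][0-9]|6[0-3])\.
RewriteRule ^.*$ - [F]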

Notes:

The preceding code works for me, but my code can always be streamlined and/or written more properly/efficiently :)

If copy-pasting, be sure there's a space before each exclamation mark -- this forum's programming strips them out.
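
The host restriction above can also be checked outside Apache. Below is a rough Python sketch of a forward-confirmed reverse DNS check: look up the hostname for the IP, test it against the same three domain suffixes used in the rewrite conditions, then resolve that hostname back and confirm it returns the original IP. The function names and the injectable `reverse`/`forward` parameters are assumptions made for illustration, and the plain suffix match is looser than the single-label regexes in the rules.

```python
import socket

# Suffixes copied from the rewrite conditions in this thread.
# Treat them as assumptions: operators can change their hostnames over time.
TRUSTED_SUFFIXES = (".search.msn.com", ".msn.com", ".phx.gbl")


def host_is_trusted(hostname, suffixes=TRUSTED_SUFFIXES):
    """True if hostname ends with one of the trusted domain suffixes."""
    host = hostname.rstrip(".").lower()
    return any(host.endswith(suffix) for suffix in suffixes)


def verify_bot_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the IP.

    `reverse` and `forward` default to the standard library resolvers and can
    be swapped for stubs when testing without network access.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        # No PTR record or lookup failure: cannot confirm the claim.
        return False
    if not host_is_trusted(hostname):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # A spoofed PTR record will not resolve back to the caller's address.
    return ip in addresses
```

The forward lookup is there because a PTR record is controlled by whoever owns the IP block, so a reverse lookup alone can be made to say anything.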