More info here: [webmasterworld.com]
Thanks, marcs. Note the new user agent. I saw 131.107.137.xxx referenced in some of the other threads. Has anybody seen any IPs other than the one referenced above?
It doesn't act any differently than it did when they weren't identifying themselves.
Tripped my dime-store trap.
I had a few visits today, but all from a different address:
I am not a LS advertiser.
So far, just one hit:
184.108.40.206 - - [18/Jun/2003:23:53:06 -0600] "GET /robots.txt HTTP/1.1" 200 5282 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
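For anyone wanting to pull these hits out of their own logs, here's a rough Python sketch that parses one line in Apache's "combined" log format (the format of the line above) and flags MSNBOT by its user-agent string. The regex and the names `parse_line` and `is_msnbot` are my own assumptions; adjust the pattern if your LogFormat differs.

```python
import re

# One line of Apache's "combined" format:
# host ident authuser [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # Apache logs "-" as the byte count when no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def is_msnbot(entry):
    """Case-insensitive, so MSNBOT, MsNBOT and msnbot all count."""
    return "msnbot" in entry["agent"].lower()


line = ('184.108.40.206 - - [18/Jun/2003:23:53:06 -0600] "GET /robots.txt HTTP/1.1" '
        '200 5282 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"')
entry = parse_line(line)
print(entry["host"], entry["path"], entry["status"], is_msnbot(entry))
# 184.108.40.206 /robots.txt 200 True
```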
The earlier version (before it had a UA or name) came to one of our sites several weeks ago, but this latest version, fully identified as msnbot, visited from 220.127.116.11 and took quite a few of our pages today.
It will be interesting to see what they do with it...
Looks like they've got a range of IPs going here.
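If you want to see how wide that range is in your own logs, one quick approach is to collect the addresses that sent the MSNBOT user agent and bucket them by /24 with Python's `ipaddress` module. The /24 grouping and the `group_by_prefix` name are just my choice, and the sample addresses are documentation-range placeholders, not real MSNBOT IPs.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(ips, prefix=24):
    """Bucket IPv4 addresses by their /prefix network (default /24)."""
    groups = defaultdict(set)
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[net].add(ipaddress.ip_address(ip))
    # Sort networks and addresses numerically rather than as strings.
    return {str(net): [str(a) for a in sorted(addrs)]
            for net, addrs in sorted(groups.items())}


# Placeholder addresses from the documentation ranges, not real MSNBOT IPs.
seen = ["192.0.2.77", "198.51.100.5", "192.0.2.10", "192.0.2.77"]
for net, addrs in group_by_prefix(seen).items():
    print(net, addrs)
```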
Grabbed robots.txt, then index.html, then robots.txt again, then went deep, all from the same IP. No robots.txt violations.
18.104.22.168 - - [19/Jun/2003:02:05:37 -0400] "GET /robots.txt HTTP/1.1" 200 2507 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
22.214.171.124 - - [19/Jun/2003:02:05:38 -0400] "GET / HTTP/1.1" 200 32464 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
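To double-check the "no robots.txt violations" part on your own site, you can feed your robots.txt and the bot's requested paths to Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt, the crawl list and the helper name `robots_violations` are all made up for illustration. Note that `robotparser` does plain prefix matching, as in the original robots.txt convention, so it won't interpret wildcard patterns.

```python
from urllib.robotparser import RobotFileParser


def robots_violations(robots_txt, agent, paths):
    """Return the requested paths that robots.txt disallows for this user agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [p for p in paths if not rp.can_fetch(agent, p)]


# Made-up robots.txt and crawl, in the order described above:
# robots.txt, the home page, robots.txt again, then deeper pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""
UA = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
crawl = ["/robots.txt", "/", "/robots.txt", "/articles/one.html", "/private/notes.html"]

print(robots_violations(ROBOTS_TXT, UA, crawl))  # ['/private/notes.html']
print(crawl[0] == "/robots.txt")  # did it ask for robots.txt first? True
```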
It seems well behaved vis-à-vis robots.txt, but they do grab file types other than HTML. Of the 275 requests MSNBOT made to my office web server today, 40 were for PDF documents, and there were scattered others for PostScript files and some binary datasets with odd filename extensions. No sign yet, though, that they will be grabbing GIFs, JPEGs, etc.
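A rough way to get that kind of breakdown for your own server is to tally the bot's requested paths by file extension. A Python sketch, assuming you've already pulled out the MSNBOT request paths (for example with a log parser); the sample paths and the `extension_counts` name are hypothetical, and counting extensionless paths as HTML is my own simplification.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def extension_counts(paths):
    """Count request paths by lower-cased file extension ('' for none, e.g. '/')."""
    counts = Counter()
    for p in paths:
        path = urlsplit(p).path  # ignore any ?query string
        counts[posixpath.splitext(path)[1].lower()] += 1
    return counts


# Hypothetical MSNBOT request paths pulled from a log.
paths = ["/", "/index.html", "/papers/a.pdf", "/papers/b.PDF",
         "/papers/c.ps", "/data/run1.dat", "/search?q=x.pdf"]
counts = extension_counts(paths)
# Treat extensionless paths (directory indexes, scripts) as HTML pages.
non_html = sum(n for ext, n in counts.items() if ext not in ("", ".html", ".htm"))
print(counts.most_common(), non_html, "of", len(paths), "were not HTML")
```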
If you don't want MSNBOT grabbing images, PDFs, etc., you'll need to modify your RewriteRules accordingly. See the discussion in [webmasterworld.com...] on how to do so.
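Roughly, such a rule keys on two things: the User-Agent header and the requested file's extension. As an illustration only (not the rules from that thread), the comments show a mod_rewrite pair along those lines, and the Python function mimics the decision it would make so you can try the patterns against paths from your logs. The extension list and the `should_forbid` name are hypothetical.

```python
import re

# The kind of mod_rewrite rules this mimics (an illustration, not tested config):
#   RewriteEngine On
#   RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]
#   RewriteRule \.(gif|jpe?g|png|pdf|ps)$ - [NC,F]
# NC = case-insensitive match, "-" = no substitution, F = answer 403 Forbidden.
BOT_UA = re.compile(r"^msnbot", re.IGNORECASE)
BLOCKED_EXT = re.compile(r"\.(gif|jpe?g|png|pdf|ps)$", re.IGNORECASE)


def should_forbid(user_agent, path):
    """True if the rules above would return 403 for this request.

    `path` is the URL path without the query string, which is also what
    RewriteRule matches against.
    """
    return bool(BOT_UA.search(user_agent)) and bool(BLOCKED_EXT.search(path))


UA = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
for path in ("/index.html", "/papers/a.pdf", "/img/logo.GIF"):
    print(path, should_forbid(UA, path))
```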
It's going deep over here and is well behaved. It seems they are going to spider widely, since I'm not in LookSmart or any other paid directory/program either.
I've decided to ban MSNbot for the moment. It seems to generate a lot of rubbish, like requests for
which are all 404s. I've sent them some site logs, but it's happened on various sites and I can't be bothered being used as a guinea pig for their problems.
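One way to do the ban is through robots.txt, since the reports in this thread say MSNBOT honors it. A minimal example, checked with Python's `urllib.robotparser`; the robots.txt content and the Googlebot UA string are just illustrations. This only stops bots that respect robots.txt; a hard block would need a server-side rule instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut msnbot out entirely, leave other robots mostly alone.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

UA = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
# robotparser matches on the part of the UA before the first "/", case-insensitively.
for agent in (UA, "Googlebot/2.1"):
    print(agent.split("/")[0], rp.can_fetch(agent, "/index.html"))
```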