Msnbot/0.1

First time I've seen this one.

     
1:50 pm on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 22, 2001
posts:2450
votes: 0


UA "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
IP 131.107.137.47

It left a referring URL. Submitting through LookSmart's paid submission program will get you crawled by this bot. It obeys robots.txt. The index is not yet on search.msn.com, but they say they intend to add it in the future.

5:28 pm on June 17, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 11, 2003
posts:442
votes: 0


More info here : [webmasterworld.com ]
8:13 pm on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 22, 2001
posts:2450
votes: 0


Thanks marcs... note the new User Agent. I saw 131.107.137.xxx referenced in some of the other threads... has anybody seen any IPs other than the one referenced above?
12:16 pm on June 18, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Doesn't act any differently than it did when they weren't identifying themselves.
Tripped my dime-store trap.
11:45 am on June 19, 2003 (gmt 0)

Moderator from AU 

WebmasterWorld Administrator anallawalla is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 3, 2003
posts:3701
votes: 3


I had a few visits today but all from a different address:

131.107.163.47
MSNBOT/0.1 (http://search.msn.com/msnbot.htm)

I am not a LS advertiser.

- Ash

12:15 am on June 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 1, 2002
posts:774
votes: 0


So far, just one hit:

131.107.163.49 - - [18/Jun/2003:23:53:06 -0600] "GET /robots.txt HTTP/1.1" 200 5282 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"

dave

12:26 am on June 20, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 7, 2003
posts:540
votes: 0


The earlier version (before it had a UA or name) came to one of our sites several weeks ago, but the latest version, fully identified as msnbot, visited from 131.107.163.57 and took quite a few of our pages today.

It will be interesting to see what they do with it...

LisaB

2:01 am on June 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Looks like they've got a range of IPs going here.

Grabbed robots.txt, then index.html, then robots.txt again, then went deep, all with the same IP. No robots.txt violations.

131.107.163.58 - - [19/Jun/2003:02:05:37 -0400] "GET /robots.txt HTTP/1.1" 200 2507 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
131.107.163.58 - - [19/Jun/2003:02:05:38 -0400] "GET / HTTP/1.1" 200 32464 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"

Jim
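
For anyone who wants to answer the "which IPs?" question from their own logs, here is a minimal sketch. It assumes your server writes the Apache combined log format (as in the lines quoted in this thread); the function name msnbot_ips and the regular expression are my own and are not anything MSN publishes.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def msnbot_ips(lines):
    """Count requests per client IP for lines whose user agent mentions msnbot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Case-insensitive match, since the UA has been seen as "MSNBOT/0.1".
        if match and "msnbot" in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Sample lines copied from posts in this thread.
sample = [
    '131.107.163.58 - - [19/Jun/2003:02:05:37 -0400] "GET /robots.txt HTTP/1.1" 200 2507 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"',
    '131.107.163.58 - - [19/Jun/2003:02:05:38 -0400] "GET / HTTP/1.1" 200 32464 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"',
    '131.107.163.49 - - [18/Jun/2003:23:53:06 -0600] "GET /robots.txt HTTP/1.1" 200 5282 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"',
]

if __name__ == "__main__":
    for ip, hits in msnbot_ips(sample).most_common():
        print(ip, hits)
```

To run it on a real log, pass an open file object instead of the sample list. Lines in a different log format simply fail to match and are skipped.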

2:14 am on June 20, 2003 (gmt 0)

New User

10+ Year Member

joined:May 31, 2003
posts:16
votes: 0


Seems well behaved vis-à-vis robots.txt, but they do grab files of types other than HTML. I note that of the 275 requests MSNBOT made to my office webserver today, 40 were for PDF documents, and there were scattered others for PostScript files and some binary datasets with odd filename extensions. No sign yet, though, that they will be grabbing GIFs, JPEGs, etc.

If you don't want MSNBOT grabbing images, PDFs, etc., then you'll need to modify your RewriteRules appropriately. See the discussion in [webmasterworld.com...] about how to do so.
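
This is not a RewriteRule, but the matching logic such a rule would need can be sketched in Python so you can test it against your own request paths first. Everything here is an assumption for illustration: the should_block name, the extension list, and the choice to match the user agent by substring.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of file types you might not want the bot to fetch.
BLOCKED_EXTENSIONS = {".pdf", ".ps", ".gif", ".jpg", ".jpeg", ".png"}


def should_block(user_agent, request_path):
    """Return True when an msnbot request targets a blocked file type."""
    if "msnbot" not in user_agent.lower():
        return False
    # Drop any query string, then compare the extension case-insensitively.
    path = urlsplit(request_path).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in BLOCKED_EXTENSIONS


if __name__ == "__main__":
    agent = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
    for path in ("/docs/report.pdf", "/index.html", "/img/logo.GIF?v=2"):
        print(path, should_block(agent, path))
```

The same two conditions (user agent matches, extension matches) are what you would express as a RewriteCond plus a RewriteRule in the server config.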

2:32 am on June 20, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Dec 13, 2002
posts:275
votes: 0


Going deep over here, and it is well behaved. It seems they are going to spider widely, as I'm not in LookSmart or any other paid directory/program either.
3:27 pm on June 21, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 7, 2002
posts:47
votes: 0


I've decided to ban MSNbot for the moment. It seems to generate a lot of rubbish like requests for

www.site.com/path/to/file/9c2
www.site.com/path/to/file/4a3
www.site.com/path/to/file/0c9

which are all 404s. I've sent them some site logs, but it's happened on various sites and I can't be bothered to be used as a guinea pig for their problems.
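
If you want to check whether your own site is seeing the same thing, a rough sketch follows. It assumes you have already pulled (user agent, path, status) tuples out of your logs. The pattern, a final path segment of exactly three hex characters, is only my guess from the three examples above, and junk_requests is a made-up name.

```python
import re

# Guess at the junk pattern: path ends in a slash plus three hex characters.
JUNK_SUFFIX = re.compile(r"/[0-9a-f]{3}$")


def junk_requests(records):
    """Return paths of msnbot requests that 404'd and match the junk pattern.

    records: iterable of (user_agent, path, status) tuples.
    """
    return [
        path
        for agent, path, status in records
        if "msnbot" in agent.lower()
        and status == 404
        and JUNK_SUFFIX.search(path)
    ]


# Hypothetical records built from the paths quoted in the post above.
AGENT = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
sample_records = [
    (AGENT, "/path/to/file/9c2", 404),
    (AGENT, "/path/to/file/4a3", 404),
    (AGENT, "/path/to/file/0c9", 404),
    (AGENT, "/robots.txt", 200),
    ("Mozilla/4.0", "/path/to/file/abc", 404),
]

if __name__ == "__main__":
    for path in junk_requests(sample_records):
        print(path)
```

A legitimate three-character hex filename would also match, so treat the result as a list of suspects rather than proof.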