Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Msnbot/0.1
First time I've seen this one.
volatilegx




msg:401629
 1:50 pm on Jun 17, 2003 (gmt 0)

UA "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
IP 131.107.137.47

It left a referring URL. Submitting to LookSmart's paid submit will get you crawled by this bot. Obeys robots.txt. The index is not yet on search.msn.com, but they say they do intend to add it in the future.
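For anyone who wants to check what a robots.txt record would do for this UA, here is a minimal Python sketch using the standard library's urllib.robotparser. The "msnbot" record token and the /private/ path are assumptions made up for illustration; the token is only a guess based on the UA string above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "msnbot" token and the /private/ path
# are assumptions for illustration, not taken from MSN documentation.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/
"""

UA = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"


def allowed(path: str, user_agent: str = UA) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html"):
        print(path, allowed(path))
```

This only tells you what the rules say. Whether the bot actually honors them still has to be confirmed from your logs.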

 

marcs




msg:401630
 5:28 pm on Jun 17, 2003 (gmt 0)

More info here: [webmasterworld.com]

volatilegx




msg:401631
 8:13 pm on Jun 17, 2003 (gmt 0)

Thanks marcs... note the new User Agent. I saw 131.107.137.xxx referenced in some of the other threads... has anybody seen any IPs other than the one referenced above?

wilderness




msg:401632
 12:16 pm on Jun 18, 2003 (gmt 0)

Doesn't act any differently than it did when they weren't identifying themselves.
Tripped my dime-store trap.

anallawalla




msg:401633
 11:45 am on Jun 19, 2003 (gmt 0)

I had a few visits today but all from a different address:

131.107.163.47
MSNBOT/0.1 (http://search.msn.com/msnbot.htm)

I am not a LS advertiser.

- Ash

carfac




msg:401634
 12:15 am on Jun 20, 2003 (gmt 0)

So far, just one hit:

131.107.163.49 - - [18/Jun/2003:23:53:06 -0600] "GET /robots.txt HTTP/1.1" 200 5282 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"

dave
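A log line like the one above can be picked apart with a short Python sketch. It assumes the Apache "combined" log format, which is what the quoted line looks like; the function names are made up for illustration.

```python
import re

# Apache "combined" log format, matching the line quoted in the post above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = (
    '131.107.163.49 - - [18/Jun/2003:23:53:06 -0600] '
    '"GET /robots.txt HTTP/1.1" 200 5282 "-" '
    '"MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"'
)


def parse_line(line: str):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def is_msnbot(line: str) -> bool:
    """True if the line parses and its user agent mentions msnbot."""
    hit = parse_line(line)
    return hit is not None and "msnbot" in hit["agent"].lower()


if __name__ == "__main__":
    hit = parse_line(SAMPLE)
    print(hit["ip"], hit["path"], hit["status"], is_msnbot(SAMPLE))
```

The UA match is a plain substring test, so it only catches visits where the bot identifies itself.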

bunltd




msg:401635
 12:26 am on Jun 20, 2003 (gmt 0)

The earlier version (before it had a UA or name) came to one of our sites several weeks ago, but the latest version, fully identified as msnbot, visited from 131.107.163.57 and took quite a few of our pages today.

It will be interesting to see what they do with it...

LisaB

jdMorgan




msg:401636
 2:01 am on Jun 20, 2003 (gmt 0)

Looks like they've got a range of IPs going here.

Grabbed robots.txt, then index.html, then robots.txt again, then went deep, all with the same IP. No robots.txt violations.

131.107.163.58 - - [19/Jun/2003:02:05:37 -0400] "GET /robots.txt HTTP/1.1" 200 2507 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
131.107.163.58 - - [19/Jun/2003:02:05:38 -0400] "GET / HTTP/1.1" 200 32464 "-" "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"

Jim
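To get a feel for the range, the addresses reported so far in this thread can be grouped by network block. A small Python sketch follows; the /24 grouping is an arbitrary choice for illustration and says nothing about how the addresses are actually allocated.

```python
from collections import defaultdict
from ipaddress import ip_network

# Addresses reported by posters in this thread so far.
SEEN_IPS = [
    "131.107.137.47",
    "131.107.163.47",
    "131.107.163.49",
    "131.107.163.57",
    "131.107.163.58",
]


def group_by_block(ips, prefix=24):
    """Group dotted-quad IPs by their enclosing /prefix network."""
    groups = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address and get its network back.
        block = ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(block)].append(ip)
    return dict(groups)


if __name__ == "__main__":
    for block, members in group_by_block(SEEN_IPS).items():
        print(block, members)
```

With the five addresses above this gives two blocks, 131.107.137.0/24 and 131.107.163.0/24.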

rbs10025




msg:401637
 2:14 am on Jun 20, 2003 (gmt 0)

Seems well behaved vis-a-vis robots.txt, but they do grab file types other than HTML. I note that of the 275 requests MSNBOT made to my office webserver today, 40 were for PDF documents, and there were scattered others for PostScript files and some binary datasets with odd filename extensions. No sign yet, though, that they will be grabbing GIFs, JPEGs, etc.

If you don't want MSNBOT grabbing images, PDFs, etc., then you'll need to modify your RewriteRules appropriately. See the discussion in [webmasterworld.com...] about how to do so.
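The decision such a rule has to make can be sketched in Python. This is not Apache configuration and not the method from the linked discussion, just the same logic written out. The extension list and the function name are assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed list of extensions to keep away from the bot; adjust to taste.
BLOCKED_EXTENSIONS = {".pdf", ".ps", ".gif", ".jpg", ".jpeg"}


def should_block(user_agent: str, url_path: str,
                 blocked=BLOCKED_EXTENSIONS) -> bool:
    """True if the request comes from msnbot and targets a blocked file type."""
    if "msnbot" not in user_agent.lower():
        return False
    # Drop any query string before looking at the extension.
    path = urlsplit(url_path).path
    return PurePosixPath(path).suffix.lower() in blocked


if __name__ == "__main__":
    ua = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
    for path in ("/index.html", "/docs/paper.pdf"):
        print(path, should_block(ua, path))
```

A rewrite rule would express the same two conditions: a user-agent match and a filename-extension match.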

DavidT




msg:401638
 2:32 am on Jun 20, 2003 (gmt 0)

Going deep over here, and it is well behaved. It seems they are going to spider widely, as I'm not in LookSmart or any other paid directory/program either.

mvl22




msg:401639
 3:27 pm on Jun 21, 2003 (gmt 0)

I've decided to ban MSNbot for the moment. It seems to generate a lot of rubbish like requests for

www.site.com/path/to/file/9c2
www.site.com/path/to/file/4a3
www.site.com/path/to/file/0c9

which are all 404s. I've sent them some site logs, but it's happened on various sites and I can't be bothered to be used as a guinea pig for their problems.
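If you want to pull these out of your own logs, a rough Python sketch is below. The pattern (a final path segment of exactly three lowercase hex characters) is a guess from the three examples above and will also flag legitimate URLs that happen to look the same.

```python
import re

# Guessed pattern: last path segment is exactly three hex characters,
# based only on the three example URLs in this post.
SUSPECT_RE = re.compile(r"/[0-9a-f]{3}$")


def is_suspect(path: str) -> bool:
    """True if the path ends in a three-character hex segment."""
    return SUSPECT_RE.search(path) is not None


def suspect_404s(hits):
    """Filter (path, status) pairs down to 404s with a suspect path."""
    return [path for path, status in hits if status == 404 and is_suspect(path)]


if __name__ == "__main__":
    hits = [
        ("/path/to/file/9c2", 404),
        ("/path/to/file/4a3", 404),
        ("/path/to/file/0c9", 404),
        ("/path/to/file/index.html", 200),
    ]
    print(suspect_404s(hits))
```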


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved