
Search Engine Spider and User Agent Identification Forum

    
MSN bot Crawlers Renamed
jake66

5+ Year Member



 
Msg#: 3029007 posted 4:31 am on Jul 31, 2006 (gmt 0)

did a search but found 0 results for this

the ip resolves back to msn (spoofed?) or is this a legit bot, and what's it for?

msn working on a froogle type database?

Name: msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)
IP Address: 207.68.154.139
User Agent: msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)

 

MaxM

10+ Year Member



 
Msg#: 3029007 posted 11:38 am on Jul 31, 2006 (gmt 0)

I just had "msnbot-NewsBlogs/1.0 (+http://search.msn.com/msnbot.htm)" from 207.68.146.79 checking out robots.txt

Then a few minutes later it came from 65.55.233.155 to get the rss feed.

Looks like a bunch of new bots have been unleashed by MS.

msndude

10+ Year Member



 
Msg#: 3029007 posted 5:17 pm on Jul 31, 2006 (gmt 0)

These are all Microsoft bots that have been around for a while. Up until recently they were all called "msnbot," but that was getting confusing, so we asked the other groups to append something to the name so people could tell them apart. Here's a few:

The MSN Shopping bot is msnbot-products.
The MSN News bot is msnbot-news.
The MSN Image Search bot is msnbot-MM.
The MSN Search bot is still just plain msnbot.

By the way, this change was partly precipitated by people here at Webmaster World complaining that we were crawling them a lot but never indexing them; it always turned out that it wasn't MSN Search doing the crawling -- it was some other team at MSN. Now it should be much easier for people to see what's really going on -- and to block or restrict other bots (without blocking MSN Search) if they have to.

volatilegx

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3029007 posted 5:35 pm on Jul 31, 2006 (gmt 0)

Thanks for the info, msndude :)

Also, thanks for asking the other groups to be more specific with their agent names.

philaweb

5+ Year Member



 
Msg#: 3029007 posted 6:03 pm on Jul 31, 2006 (gmt 0)

How about this:

msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)

bobothecat



 
Msg#: 3029007 posted 6:06 pm on Jul 31, 2006 (gmt 0)

msndude:
By the way, this change was partly precipitated by people here at Webmaster World complaining that we were crawling them a lot but never indexing them; it always turned out that it wasn't MSN Search doing the crawling -- it was some other team at MSN. Now it should be much easier for people to see what's really going on -- and to block or restrict other bots (without blocking MSN Search) if they have to.

A move in the right direction... thanks for the response MSNDude.

ionchannels

5+ Year Member



 
Msg#: 3029007 posted 6:18 pm on Jul 31, 2006 (gmt 0)

How will this influence robots.txt? Will my msnbot "Disallow: /" still work, or will I have msnbot-#*$!xx crashing my server left, right and center again?

Bewenched

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3029007 posted 7:39 pm on Jul 31, 2006 (gmt 0)

I certainly wish search bot/product bot would visit us more often. We only have about 200 pages in msn, but others show 20k+

msndude

10+ Year Member



 
Msg#: 3029007 posted 8:45 pm on Jul 31, 2006 (gmt 0)

Correction: The MSN Image Search bot is msnbot-media.

Sorry about that! :-)
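
A minimal sketch of how a log script might tell these crawlers apart, using only the names given in this thread (with the corrected msnbot-media). The function name and the "unlisted" fallback are hypothetical, and the table is not an official list.

```python
from typing import Optional

# Names as listed by msndude in this thread; not an official or complete list.
MSN_CRAWLERS = {
    "msnbot-products": "MSN Shopping",
    "msnbot-news": "MSN News",
    "msnbot-media": "MSN Image Search",
    "msnbot": "MSN Search",
}


def identify_msn_crawler(user_agent: str) -> Optional[str]:
    """Map a User-Agent string to the MSN service named in the thread.

    Returns None when the agent does not look like an msnbot variant.
    """
    # The product token is everything before the first "/", e.g. "msnbot-media".
    product = user_agent.split("/", 1)[0].strip().lower()
    if product in MSN_CRAWLERS:
        return MSN_CRAWLERS[product]
    if product.startswith("msnbot-"):
        # Hypothetical fallback for variants the thread does not list.
        return "unlisted MSN crawler"
    return None


print(identify_msn_crawler("msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)"))
print(identify_msn_crawler("msnbot-NewsBlogs/1.0 (+http://search.msn.com/msnbot.htm)"))
```

Matching on the exact product token, rather than on a substring, keeps plain msnbot distinct from its suffixed variants.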

eyecaredr

5+ Year Member



 
Msg#: 3029007 posted 9:22 pm on Jul 31, 2006 (gmt 0)

I too received a visit from this mysterious spider. Since its visit, my MSN referrals have increased significantly. Maybe this is a good thing.

System
redhat


 
Msg#: 3029007 posted 9:30 pm on Jul 31, 2006 (gmt 0)

The following message was cut out to new thread by volatilegx. New thread at: search_engine_spiders/3031574.htm [webmasterworld.com]
9:11 am on Aug. 2, 2006 (CDT -6)

jdMorgan

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3029007 posted 11:36 pm on Jul 31, 2006 (gmt 0)

Thanks for the update, msndude.

Will the MSNBot info page for Webmasters [search.msn.com] be updated to reflect these newly-announced 'bots? I'm currently looking for answers to the following questions:

On sites with no non-proprietary multimedia files, and with no news or shopping content, would the following construct allow or deny msnbot-media, msnbot-news, etc.?

# Allow unrestricted access for msnbot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /




If the above would allow the various media/news/shopping bots, would the following work any better?

# Disallow all MSN specialty robots
User-agent: msnbot-
Disallow: /

# Allow unrestricted access for msnbot search robot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /


Thanks in advance for a reply, or for a pointer to an authoritative document that will answer this question.

Jim

[edited by: jdMorgan at 11:38 pm (utc) on July 31, 2006]
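
One way to explore the question is to feed both constructs to an existing robots.txt parser and see what it decides. The sketch below uses Python's standard urllib.robotparser. It only shows how that particular parser reads the files, which is an assumption about parsing behavior in general and says nothing authoritative about how the MSN crawlers behave. The helper name and the test URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

CONSTRUCT_1 = """\
# Allow unrestricted access for msnbot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /
"""

CONSTRUCT_2 = """\
# Disallow all MSN specialty robots
User-agent: msnbot-
Disallow: /

# Allow unrestricted access for msnbot search robot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /
"""

SUFFIX = " (+http://search.msn.com/msnbot.htm)"


def can_fetch(robots_txt: str, user_agent: str, url: str = "/index.html") -> bool:
    """Ask Python's stdlib parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for label, text in (("construct 1", CONSTRUCT_1), ("construct 2", CONSTRUCT_2)):
    for agent in ("msnbot/1.0", "msnbot-media/1.0", "SomeOtherBot/1.0"):
        print(label, agent, can_fetch(text, agent + SUFFIX))
```

This parser treats a group's User-agent value as a substring of the crawler's product token and uses the first group that matches. Under that reading, the first construct lets msnbot-media in through the msnbot group, while the second stops it at the msnbot- group. Other parsers may match differently.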

incrediBILL

WebmasterWorld Administrator WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3029007 posted 8:43 am on Aug 1, 2006 (gmt 0)

That's why I allow anything that contains msnbot to crawl as long as the IP belongs to Microsoft, just to avoid worrying about anything new that comes out.

I can swat it at leisure later ;)
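
A rough sketch of that kind of check, assuming the server keeps its own list of address ranges it treats as Microsoft's. The two networks below are illustrative placeholders chosen to cover the addresses reported in this thread. They are not an authoritative list, and the function name is hypothetical.

```python
import ipaddress

# ASSUMPTION: placeholder ranges for illustration only. Verify real ranges
# (for example via WHOIS or reverse DNS) before relying on anything like this.
ASSUMED_MICROSOFT_NETWORKS = [
    ipaddress.ip_network("207.68.128.0/18"),
    ipaddress.ip_network("65.52.0.0/14"),
]


def is_trusted_msnbot(user_agent: str, remote_ip: str) -> bool:
    """True if the agent contains 'msnbot' AND the IP is in an assumed range."""
    if "msnbot" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in ASSUMED_MICROSOFT_NETWORKS)


# A new msnbot-* variant passes without any change to the rule.
print(is_trusted_msnbot("msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)",
                        "207.68.154.139"))
# The same agent string from an unrelated address does not.
print(is_trusted_msnbot("msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
                        "192.0.2.10"))
```

Checking the address as well as the name is what guards against a spoofed user agent.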

wmuser

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3029007 posted 8:46 pm on Aug 1, 2006 (gmt 0)

I got this waproxyb10.msn.com

msndude

10+ Year Member



 
Msg#: 3029007 posted 4:01 pm on Aug 2, 2006 (gmt 0)

jd: The site isn't updated with the “official rules” yet, but the net of it is the following:

· MSNBot obeys robots.txt for MSNBot

· MSNBot-NAME obeys robots.txt for MSNBot *and* MSNBot-NAME.

This allows site owners to do no extra work for our additional crawlers and also gives them the flexibility to limit for specific crawlers.

Hope that helps.
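
The two rules above can be sketched roughly as follows. This assumes that "obeys robots.txt for MSNBot *and* MSNBot-NAME" means the Disallow lines of both groups apply together, and it uses plain prefix matching. The rule table and function names are hypothetical, not how the crawlers are actually implemented.

```python
from typing import Dict, List


def applicable_groups(crawler_name: str) -> List[str]:
    """Return the robots.txt groups a crawler honours under the stated rules."""
    name = crawler_name.lower()
    if name == "msnbot":
        return ["msnbot"]
    if name.startswith("msnbot-"):
        # A named crawler honours the generic group and its own group.
        return ["msnbot", name]
    return []


def is_allowed(crawler_name: str, path: str,
               disallow_rules: Dict[str, List[str]]) -> bool:
    """ASSUMPTION: a path is blocked if any applicable group disallows it."""
    for group in applicable_groups(crawler_name):
        for prefix in disallow_rules.get(group, []):
            if prefix and path.startswith(prefix):
                return False
    return True


# Hypothetical rules: keep every MSN crawler out of /private/, and keep
# only the image crawler out of /images/.
rules = {"msnbot": ["/private/"], "msnbot-media": ["/images/"]}
print(is_allowed("msnbot", "/images/logo.png", rules))
print(is_allowed("msnbot-media", "/images/logo.png", rules))
print(is_allowed("msnbot-media", "/private/notes.html", rules))
```

Under this reading an existing msnbot block keeps working for the renamed crawlers, and a site owner can add a narrower group for one of them.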

malachite

5+ Year Member



 
Msg#: 3029007 posted 10:42 pm on Aug 2, 2006 (gmt 0)

How about this:

msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)

I blocked this one a while back because it didn't look kosher, and ended up with the whole site de-indexed and MSN search bot stopped spidering the site.

It also took me a while to realise what was happening, to un-block it and get the site re-indexed. Doh!

AlexK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3029007 posted 5:16 pm on Aug 4, 2006 (gmt 0)

Any chance that MS can get these bots to start accepting compressed pages?

Both Google & Slurp! have accepted compressed pages for years now (although with G it was the "Mozilla/5" bot, which is now the standard bot). msnbot never has, and consequently consumes more bandwidth on my site than G & Y together, although both of the former take far more pages each than msnbot.

The inability to accept compressed pages really does give the impression of an old, out-dated technology being employed at MS. Time to join the 21st Century, no?
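
For context, a client normally signals that it accepts compressed pages through the HTTP Accept-Encoding request header, and the server compresses only when gzip is offered there. A minimal sketch of that server-side check follows. The function name is hypothetical and the parsing is deliberately simplified.

```python
from typing import Optional


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Return True if the Accept-Encoding header value permits gzip."""
    qvalues = {}
    for item in (accept_encoding or "").split(","):
        token, _, param = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0  # default weight when no q= parameter is given
        param = param.strip().lower()
        if param.startswith("q="):
            try:
                quality = float(param[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable weight as a refusal
        qvalues[token] = quality
    if "gzip" in qvalues:
        return qvalues["gzip"] > 0
    # No explicit gzip entry: fall back to the wildcard, if any.
    return qvalues.get("*", 0.0) > 0


print(accepts_gzip("gzip, deflate"))  # a client that offers gzip
print(accepts_gzip(None))             # a client that sends no header
```

A crawler that sends no such header gets the uncompressed page, which is where the extra bandwidth comes from.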

incrediBILL

WebmasterWorld Administrator WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3029007 posted 4:36 pm on Aug 5, 2006 (gmt 0)

Compressed pages are nice for you static web page sites but I could care less about compressed pages.

Why?

With a dynamic website it's bandwidth vs. CPU time and compressing the page on my site increases the time to deliver the page and chews up more CPU cycles meaning I can deliver fewer pages in the same amount of time.

I won't be sending compressed pages anytime soon, guess I'm using out-dated technology too ;)

AlexK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3029007 posted 11:03 pm on Aug 7, 2006 (gmt 0)

incrediBILL:
With a dynamic website it's bandwidth vs. CPU time

Dynamic, load-balanced compression. My early testing was < 0.002 secs on a twin Xeon 2.4 GHz, Linux 2.6. The routine is encapsulated within the Conteg Content-Negotiation Class [webmasterworld.com] (v0.11; v0.12.1 is available via my site and includes cache-control settings).

Compressed pages are nice for you static web page sites

My site is fully dynamic (PHP).

guess I'm using out-dated technology too ;)

Er, yes!

CPU is cheap now. Times have changed.

incrediBILL

WebmasterWorld Administrator WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3029007 posted 5:58 am on Aug 8, 2006 (gmt 0)

OK, you didn't get my point Alex....

If you just let the dynamic page be delivered as it's being created, the overall process is faster. Waiting to generate the whole page, then zip it and ship it, makes the total time for the page longer, because nothing starts transmitting until the entire process is completed.

Not sure how you can get around that fact and my server is just too busy to risk it.

[edited by: incrediBILL at 5:58 am (utc) on Aug. 8, 2006]
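
Whether the whole page has to be finished before compression starts depends on the implementation. As a hedged illustration only, gzip output can be produced piece by piece while a page is still being generated. The sketch below uses Python's zlib module, and the function name and the sample page parts are hypothetical. It does not describe how either poster's site works.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def stream_gzip(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Yield gzip-compressed chunks as each part of a page becomes available."""
    # wbits=31 selects the gzip container (header and trailer) around deflate.
    compressor = zlib.compressobj(wbits=31)
    for part in parts:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so the chunk
        # can be sent immediately instead of waiting for the full page.
        chunk = compressor.compress(part) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if chunk:
            yield chunk
    # The final flush finishes the stream and writes the gzip trailer.
    yield compressor.flush()


page_parts = [b"<html><head><title>Demo</title></head>",
              b"<body><p>generated later</p>",
              b"</body></html>"]
chunks = list(stream_gzip(page_parts))
# The concatenated chunks form one valid gzip stream.
print(gzip.decompress(b"".join(chunks)) == b"".join(page_parts))
```

Flushing after every part costs some compression ratio, so this trades a little bandwidth for earlier delivery. The CPU cost raised above still applies.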


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved