
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
new bot?
msnbot/0.3 (+http://search.msn.com/msnbot.htm)
bull

10+ Year Member



 
Msg#: 186 posted 4:45 pm on Sep 18, 2004 (gmt 0)

207.46.98.121 - - [18/Sep/2004:04:22:21 +0200] "GET /robots.txt HTTP/1.0" 200 2918 www.-.net "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)" "-"

Is this new?

 

pendanticist

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 186 posted 4:53 pm on Sep 18, 2004 (gmt 0)

65.54.188.50 is where my visitor is coming from overnight.

Methinks it is an upgraded 0.2, which was running around for a while. Bull, I did read in another thread that MSN is apparently ready to release... or sumptin like dat.

bull

10+ Year Member



 
Msg#: 186 posted 8:36 am on Sep 19, 2004 (gmt 0)

Today I got some unfriendly visits from an MS bot:

207.46.141.155 - - [19/Sep/2004:10:24:14 +0200] "GET /dir-where-all-deepfiles-are-linked/ HTTP/1.1" 200 1900 - "-" "HTTP: User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3" "-"
Very professional!
...

This one fed the following requests:

207.155.199.163 - - [19/Sep/2004:10:30:05 +0200] "GET /deepfile.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
65.164.129.91 - - [19/Sep/2004:10:30:12 +0200] "GET /deepfile2.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
207.155.199.163 - - [19/Sep/2004:10:30:18 +0200] "GET /deepfile3.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
65.164.129.91 - - [19/Sep/2004:10:30:26 +0200] "GET /deepfile4.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"

and so on. It definitely is a bot, as it does not seem to be impressed by a bunch of 403s. There is an MS guy in this forum; care to explain, please?
Of course, no robots.txt was fetched, and the bot trap was hit.
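
For anyone wanting to check their own logs for the same pattern, here is a minimal sketch that counts 403 responses per User-Agent. It assumes the log layout shown in the lines above (IP, two dashes, timestamp, request, status, size, host, referer, agent); the regex and the function name `forbidden_hits_by_agent` are illustrative, not part of any server tool, and the pattern would need adjusting for a different log format.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample lines in this post:
# ip - - [time] "request" status size host "referer" "agent" ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<host>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Two of the log lines quoted above, copied verbatim.
SAMPLE = [
    '207.155.199.163 - - [19/Sep/2004:10:30:05 +0200] "GET /deepfile.htm HTTP/1.0" '
    '403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"',
    '65.164.129.91 - - [19/Sep/2004:10:30:12 +0200] "GET /deepfile2.htm HTTP/1.0" '
    '403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"',
]


def forbidden_hits_by_agent(lines):
    """Count how many 403 responses each User-Agent string received."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group("status") == "403":
            counts[match.group("agent")] += 1
    return counts


counts = forbidden_hits_by_agent(SAMPLE)
for agent, hits in counts.most_common():
    print(hits, agent)
```

A client that keeps requesting after a long run of 403s, as described above, would show up at the top of this list.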

msndude

10+ Year Member



 
Msg#: 186 posted 4:50 pm on Sep 19, 2004 (gmt 0)

msnbot/0.3 is the latest version of the MSN Search crawler. This crawler is gathering pages to power an algorithmic search engine that we are building.

Bull, the requests that you show in your post do not appear to come from MSNBot. If you could e-mail us at msnbot@microsoft.com with more information, such as the domain on which this issue occurred, that would be quite helpful.

thanks,
-msndude

RobinK

10+ Year Member



 
Msg#: 186 posted 5:39 pm on Sep 19, 2004 (gmt 0)

msndude,

Can you give us any idea when the new search engine will be completed and go live?

bull

10+ Year Member



 
Msg#: 186 posted 5:40 pm on Sep 19, 2004 (gmt 0)

Thank you.
I do not think that domain details would be helpful.

Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 ) made exactly 500 requests on the domain, most of them returning 403 errors.
I do know that the official MSN spider is msnbot, so I hoped to get some information about this other bot. As you could not give any clarification on this issue, from now on any User-agent from Microsoft ranges apart from msnbot is blocked.
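
The blocking rule described above (deny anything from a given address range unless it identifies itself as msnbot) can be sketched roughly as follows. The network in `ASSUMED_MS_RANGES` is a placeholder chosen so that the 207.46.x.x addresses quoted in this thread fall inside it; it is an assumption, not a verified list, and real ranges should be confirmed with a WHOIS lookup. The function name `should_block` is hypothetical.

```python
import ipaddress

# Assumption: placeholder range for illustration only.
# Replace with ranges confirmed through a WHOIS/ARIN lookup.
ASSUMED_MS_RANGES = [ipaddress.ip_network("207.46.0.0/16")]


def should_block(ip, user_agent):
    """Return True for requests from the listed ranges that are not msnbot."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in ASSUMED_MS_RANGES)
    is_msnbot = user_agent.lower().startswith("msnbot/")
    return in_range and not is_msnbot


# The msnbot request from the first post is allowed through.
print(should_block("207.46.98.121", "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"))
# The MSIE-style agent from the same assumed range is blocked.
print(should_block("207.46.141.155", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

Keep in mind that a User-Agent string can be spoofed, so a rule like this only filters clients that identify themselves honestly.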

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 186 posted 7:00 pm on Sep 19, 2004 (gmt 0)

> HTTP: User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3

I've been seeing that one for months coming from either China or Korea... I don't remember which.

msndude, it might be a good idea to have the IP addresses appearing above checked to be sure that they are not configured as open proxies and/or have not been infected by a trojan. They are listed in ARIN as belonging to MS, or as being used by MS, so unless you've got a rogue employee, these are probably either open proxies or zombie machines.

Jim

msndude

10+ Year Member



 
Msg#: 186 posted 10:49 pm on Sep 19, 2004 (gmt 0)

RobinK. Thanks for the interest. Unfortunately, we cannot provide a specific timeline for the release of our search engine beyond what we have stated publicly. We have stated publicly that we will release our own algorithmic search engine within one year of our Technology Preview which launched July 1st, 2004.

Jim, we are aware that these are MS owned IP addresses. We are investigating this issue.

-msndude

eyezshine

10+ Year Member



 
Msg#: 186 posted 8:36 am on Oct 13, 2004 (gmt 0)

msndude,

Why doesn't your crawler obey the robots.txt file on my site?

I have...

User-Agent: msnbot
Disallow: /cgi-bin
Crawl-Delay: 30

But msnbot clearly is spidering and indexing pages from my cgi-bin which are not even pages. It is spidering links that run through a click tracking script and redirect to another site.

Like this...

207.46.98.78 www.mysite.com - [13/Oct/2004:03:53:38 -0400] "GET /cgi-bin/awredir.pl?tag=CLK&url=http://someothersite.edu/someotherpage.html HTTP/1.0" 302 316 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"

I have tried everything.
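
As a sanity check on the rules themselves, here is a small sketch that feeds the robots.txt lines quoted above to Python's standard `urllib.robotparser` and asks whether the logged URL is allowed for msnbot. This only shows how one parser reads the file; it says nothing about how msnbot itself interprets it.

```python
from urllib import robotparser

# The robots.txt rules exactly as quoted in this post.
ROBOTS_TXT = """\
User-Agent: msnbot
Disallow: /cgi-bin
Crawl-Delay: 30
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Agent string and request path taken from the log line above.
AGENT = "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
URL = "/cgi-bin/awredir.pl?tag=CLK&url=http://someothersite.edu/someotherpage.html"

blocked = not parser.can_fetch(AGENT, URL)
delay = parser.crawl_delay(AGENT)

print("blocked:", blocked)
print("crawl delay:", delay)
```

If the parser reports the URL as blocked, the file is at least well-formed for prefix matching on /cgi-bin, which points toward the crawler rather than the rules as the place to look.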

msndude

10+ Year Member



 
Msg#: 186 posted 3:56 pm on Oct 13, 2004 (gmt 0)

Eyezshine,
The best thing to do is to e-mail us at msnbot@microsoft.com or send a "sticky message". Please include the details of the domain on which the issue occurred, as this will help us investigate. We are committed to fixing these problems.

thanks,
-msndude (msd)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved