188.8.131.52 is where my visitor is coming from overnight.
Methinks it is an upgraded 0.2, which was running around for a while. I did read in another thread bull that apparently MSN is ready to release... or sumptin like dat.
Today I got some unfriendly visits from an MS bot:
184.108.40.206 - - [19/Sep/2004:10:24:14 +0200] "GET /dir-where-all-deepfiles-are-linked/ HTTP/1.1" 200 1900 - "-" "HTTP: User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3" "-"
This one fed the following requests:
220.127.116.11 - - [19/Sep/2004:10:30:05 +0200] "GET /deepfile.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
18.104.22.168 - - [19/Sep/2004:10:30:12 +0200] "GET /deepfile2.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
22.214.171.124 - - [19/Sep/2004:10:30:18 +0200] "GET /deepfile3.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
126.96.36.199 - - [19/Sep/2004:10:30:26 +0200] "GET /deepfile4.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
and so on. It definitely is a bot, as it does not seem to be impressed by a bunch of 403s. There is an MS guy in this forum; care to explain, please?
Of course, no robots.txt was fetched, and the bot trap was hit.
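For anyone who wants to run the same check on their own logs, here is a minimal sketch. It assumes log lines shaped like the ones quoted above (client, timestamp, request, status) and a trap directory at /bot-trap/, which is a made-up path for illustration and not the one on my site. It simply lists clients that requested something under the trap without ever requesting /robots.txt.

```python
import re

# Matches the start of an access-log line like the ones quoted above:
# client, two ignored fields, [timestamp], "METHOD path protocol", status.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

TRAP_PREFIX = "/bot-trap/"  # hypothetical trap URL, not taken from the thread


def trap_hits_without_robots(lines, trap_prefix=TRAP_PREFIX):
    """Return the sorted clients that hit the trap but never fetched /robots.txt."""
    fetched_robots = set()
    hit_trap = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        path = match.group("path")
        if path == "/robots.txt":
            fetched_robots.add(match.group("ip"))
        elif path.startswith(trap_prefix):
            hit_trap.add(match.group("ip"))
    return sorted(hit_trap - fetched_robots)
```

Keying on the client address is only an assumption that keeps the sketch short. A crawler that spreads its requests over several addresses, as the one above seems to do, would need grouping by User-Agent or by network instead.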
msnbot/0.3 is the latest version of the MSN Search crawler. This crawler is gathering pages to power an algorithmic search engine that we are building.
Bull, the requests that you show in your post do not appear to come from MSNBot. If you could E-mail us at firstname.lastname@example.org with more information, such as the domain on which this issue occurred, that would be quite helpful.
Can you give us any idea when the new search engine will be completed and go live?
I do not think that domain details would be helpful.
Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 ) made exactly 500 requests on the domain, most of them returning 403 errors.
I do know that the official MSN spider is msnbot, so I had hoped to get some information about this other bot. As you could not give any clarification on this issue, from now on any User-agent from Microsoft ranges apart from msnbot is blocked.
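Roughly, the rule I mean looks like the sketch below. The networks listed are placeholders from the reserved documentation ranges, not real Microsoft allocations; the actual ranges would have to come from a WHOIS/ARIN lookup. The function name and the "msnbot/" prefix test are my own assumptions about how to express the rule.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges). The real ranges
# are NOT given in this thread and would have to be looked up.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Only agents announcing themselves as msnbot are let through.
ALLOWED_AGENT_PREFIX = "msnbot/"


def should_block(client_ip, user_agent):
    """Block clients inside the listed networks unless they identify as msnbot."""
    addr = ipaddress.ip_address(client_ip)
    in_range = any(addr in net for net in BLOCKED_NETWORKS)
    is_msnbot = user_agent.lower().startswith(ALLOWED_AGENT_PREFIX)
    return in_range and not is_msnbot
```

Keep in mind that a User-Agent string is trivially faked, so this only keeps out bots that do not bother to pretend to be msnbot.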
> HTTP: User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3
I've been seeing that one for months coming from either China or Korea... I don't remember which.
msndude, it might be a good idea to have the IP addresses appearing above checked to be sure that they are not configured as open proxies and/or have not been infected by a trojan. They are listed in ARIN as belonging to MS, or as being used by MS, so unless you've got a rogue employee, these are probably either open proxies or zombie machines.
RobinK, thanks for the interest. Unfortunately, we cannot provide a specific timeline for the release of our search engine beyond what we have stated publicly: we will release our own algorithmic search engine within one year of our Technology Preview, which launched July 1st, 2004.
Jim, we are aware that these are MS-owned IP addresses. We are investigating this issue.
Why doesn't your crawler obey the robots.txt file on my site?
But msnbot clearly is spidering and indexing pages from my cgi-bin which are not even pages. It is spidering links that run through a click tracking script and redirect to another site.
188.8.131.52 www.mysite.com - [13/Oct/2004:03:53:38 -0400] "GET /cgi-bin/awredir.pl?tag=CLK&url=http://someothersite.edu/someotherpage.html HTTP/1.0" 302 316 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
I have tried everything.
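One way to rule out a mistake in the robots.txt itself is to feed it to a standard parser and ask whether the logged URL is allowed. The sketch below uses Python's urllib.robotparser. The robots.txt content shown is only an assumed example (a blanket Disallow on /cgi-bin/), not the actual file from my site, and is_allowed is just a helper name for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules, NOT the real file: keep all robots out of /cgi-bin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether a rule-following crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = ("http://www.mysite.com/cgi-bin/awredir.pl"
           "?tag=CLK&url=http://someothersite.edu/someotherpage.html")
    print(is_allowed(ROBOTS_TXT, "msnbot/0.3", url))
```

If this reports that the URL is disallowed and the crawler requests it anyway, the rules are fine and the problem is on the crawler's side. It says nothing about how msnbot itself reads the file.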
The best thing to do is to E-mail us, either at email@example.com or using a "sticky message". Please include the details of the domain on which the issue occurred, as this will help us investigate. We are committed to fixing these problems.