Forum Moderators: mack
126.96.36.199 - - [19/Sep/2004:10:24:14 +0200] "GET /dir-where-all-deepfiles-are-linked/ HTTP/1.1" 200 1900 - "-" "HTTP: User-Agent = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3" "-"
This one fed the following requests:
188.8.131.52 - - [19/Sep/2004:10:30:05 +0200] "GET /deepfile.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
184.108.40.206 - - [19/Sep/2004:10:30:12 +0200] "GET /deepfile2.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
220.127.116.11 - - [19/Sep/2004:10:30:18 +0200] "GET /deepfile3.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
18.104.22.168 - - [19/Sep/2004:10:30:26 +0200] "GET /deepfile4.htm HTTP/1.0" 403 1286 - "-" "Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )" "-"
And so on. It is definitely a bot, since a string of 403s does not seem to deter it. There is an MS representative in this forum; would you care to explain, please?
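A client that keeps requesting URLs after a run of 403s can be spotted in the access log. Below is a minimal Python sketch, assuming the log format shown above, including the extra bare `-` field after the response size. The names `parse` and `agents_ignoring_403s` and the threshold of 3 are hypothetical choices, not anything from this thread.

```python
import re
from collections import Counter

# Matches the log lines quoted in this thread: client, two identity fields,
# [timestamp], "request", status, size, an optional bare extra field (the "-"
# after the size in the lines above), "referer", "user-agent".
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)(?: [^"\s]\S*)? '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return a dict of the named fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def agents_ignoring_403s(lines, threshold=3):
    """Map each user agent that received at least `threshold` 403 responses
    to a (total requests, 403 count) pair."""
    total, forbidden = Counter(), Counter()
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # skip lines in an unexpected format
        total[rec["agent"]] += 1
        if rec["status"] == "403":
            forbidden[rec["agent"]] += 1
    return {a: (total[a], n) for a, n in forbidden.items() if n >= threshold}
```

Grouping by user agent rather than by IP matches how the requests are described in this thread. Grouping by IP instead would catch a bot that rotates its User-Agent string.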
Of course, no robots.txt was fetched, and the bot trap was hit.
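The two signals mentioned here, never fetching robots.txt and hitting the bot trap, can be checked together. A minimal sketch, assuming the log has already been reduced to (client IP, path) pairs in chronological order, and that the trap lives under a hypothetical `/bot-trap/` path that is disallowed in robots.txt:

```python
def trap_report(requests, trap_prefix="/bot-trap/"):
    """requests: iterable of (client_ip, path) pairs in log order.

    Returns {ip: fetched_robots_first} for every client that requested a path
    under trap_prefix (a HYPOTHETICAL trap location). Because the trap is
    assumed to be disallowed in robots.txt, every entry is a violation;
    False marks clients that had not even asked for robots.txt beforehand.
    """
    seen_robots = set()
    report = {}
    for ip, path in requests:
        if path == "/robots.txt":
            seen_robots.add(ip)
        elif path.startswith(trap_prefix) and ip not in report:
            # Record only the first trap hit per client.
            report[ip] = ip in seen_robots
    return report
```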
Bull, the requests shown in your post do not appear to come from MSNBot. If you could e-mail us at firstname.lastname@example.org with more information, such as the domain on which this issue occurred, that would be quite helpful.
Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 ) made exactly 500 requests on the domain, most of which returned 403 errors.
I know for a fact that the official MSN spider is msnbot, so I hoped to get some information about this bot. As you could not clarify the issue, from now on any user agent from Microsoft IP ranges other than msnbot is blocked.
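A rule like this (block anything from Microsoft address space unless it identifies itself as msnbot) comes down to a range check plus a user-agent check. Below is a sketch using Python's standard `ipaddress` module. The networks listed are RFC 5737 documentation ranges standing in for whatever ranges you attribute to Microsoft, and matching on an `msnbot/` prefix is an assumption. The User-Agent header is easily forged, so a client in those ranges that sends a fake msnbot string would still get through.

```python
import ipaddress

# PLACEHOLDER networks (RFC 5737 documentation ranges), NOT Microsoft's.
# Replace them with the ranges you attribute to Microsoft, e.g. from ARIN.
BLOCKED_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("198.51.100.0/24", "203.0.113.0/24")]

# ASSUMED prefix of the official crawler's User-Agent string.
ALLOWED_AGENT_PREFIX = "msnbot/"


def should_block(client_ip: str, user_agent: str) -> bool:
    """Block clients inside BLOCKED_NETWORKS unless they claim to be msnbot."""
    ip = ipaddress.ip_address(client_ip)
    in_blocked_range = any(ip in net for net in BLOCKED_NETWORKS)
    return in_blocked_range and not user_agent.startswith(ALLOWED_AGENT_PREFIX)
```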
I've been seeing that one for months coming from either China or Korea... I don't remember which.
msndude, it might be a good idea to have the IP addresses above checked to make sure they are not configured as open proxies and have not been infected by a trojan. ARIN lists them as belonging to, or being used by, MS, so unless you have a rogue employee, these are probably either open proxies or zombie machines.
Jim, we are aware that these are MS owned IP addresses. We are investigating this issue.
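Whether an address in a company's range really belongs to its crawler, rather than to a proxy or a compromised machine, is commonly checked with forward-confirmed reverse DNS. You look up the hostname for the IP, check that it sits under the crawler's domain, then resolve that hostname and confirm it maps back to the same IP. Below is a sketch using Python's `socket` module. The `.search.msn.com` suffix is an assumption about msnbot's hostnames. The resolver functions are injectable so the logic can be tested without network access.

```python
import socket


def is_verified_crawler(ip, suffixes=(".search.msn.com",),
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    `suffixes` is an ASSUMED tuple of crawler hostname domains; `reverse` and
    `forward` default to the real resolvers but can be replaced by fakes.
    """
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:                        # covers socket.herror / gaierror
        return False
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False                       # not under the crawler's domain
    try:
        addresses = forward(hostname)[2]   # A lookup: hostname -> IP list
    except OSError:
        return False
    return ip in addresses                 # must map back to the same IP
```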
Why doesn't your crawler obey the robots.txt file on my site?
But msnbot is clearly spidering and indexing URLs in my cgi-bin, which are not even pages: it follows links that run through a click-tracking script and redirect to another site.
22.214.171.124 www.mysite.com - [13/Oct/2004:03:53:38 -0400] "GET /cgi-bin/awredir.pl?tag=CLK&url=http://someothersite.edu/someotherpage.html HTTP/1.0" 302 316 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
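A first check is whether robots.txt actually disallows the logged URL for msnbot. Syntax slips there are easy to make. Below is a sketch using Python's standard `urllib.robotparser`, assuming a hypothetical robots.txt that disallows `/cgi-bin/` for all agents. The result reflects Python's interpretation of the file, which is not necessarily the one msnbot applies.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt for www.mysite.com; paste in your real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Evaluate robots_txt the way Python's robotparser does (not necessarily
    the way any particular crawler does)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If this returns True for the logged URL, the problem lies in the robots.txt file itself. If it returns False, as it does with the file above, the request in the log is one the crawler should not have made.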
I have tried everything?