131.107.151.*** - - [09/Apr/2009:03:35:24 -0700] "GET www.example.com/page.html HTTP/1.0" 200 8697 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
Took 100 pages in about 10 minutes, with no request for robots.txt.
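One way to spot this pattern in your own logs is to summarize requests per IP. The following is a minimal Python sketch that assumes Apache's combined log format, as in the line above. `LOG_RE` and `summarize` are hypothetical names, and any line that doesn't fit the assumed format is skipped rather than parsed.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Per-IP request count, time span in minutes, and whether robots.txt was fetched."""
    hits = defaultdict(list)  # ip -> list of request timestamps
    robots = set()            # ips that asked for robots.txt
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not in the assumed format; skip it
        ip = m["ip"]
        hits[ip].append(datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"))
        if m["path"].endswith("/robots.txt"):
            robots.add(ip)
    report = {}
    for ip, times in hits.items():
        minutes = (max(times) - min(times)).total_seconds() / 60
        report[ip] = {"requests": len(times), "minutes": minutes, "robots_txt": ip in robots}
    return report
```

An IP with a high request count, a short span, and `robots_txt` set to False matches the behaviour described above.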
The topic has been picked apart with possibilities and even with relayed insights from MSN.
In any event, I've yet to see anything useful or constructive from MSN that would change my 2003 decision to deny the Class B range.
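For anyone who wants to do the same, here is a minimal Python sketch of the range check. It assumes "the Class B" means the whole 131.107.0.0/16 block. The `DENIED` list and the `is_denied` helper are hypothetical and are not part of any server config.

```python
import ipaddress

# Assumption: "the Class B" = every address in 131.107.x.x
DENIED = [ipaddress.ip_network("131.107.0.0/16")]

def is_denied(addr: str) -> bool:
    """True if addr falls inside any denied network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DENIED)

print(is_denied("131.107.151.157"))  # True
print(is_denied("65.55.106.166"))    # False
```

In practice the block would normally be denied in the web server's own configuration. The function only shows what a /16 match covers.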
Don
Perhaps it's a test of sorts?
In any event, their "page bookmark" requests generate 404's.
In their requests MSN is using %23 rather than #, although the two may be the same character (I've no clue).
These requests all had a normal browser user-agent:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
The exact IP was 131.107.151.157
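The 404s fit how URLs are split into parts. A literal # starts a fragment, which the browser keeps to itself. A %23, by contrast, is an ordinary escaped character inside the path. Below is a small Python illustration that uses a hypothetical anchor name, `section2`:

```python
from urllib.parse import urlsplit, unquote

# A literal '#' starts the fragment; browsers never send the fragment to the server.
plain = urlsplit("http://www.example.com/page.html#section2")
print(plain.path)              # /page.html
print(plain.fragment)          # section2

# '%23' is an escaped character inside the path, so it stays part of the path.
encoded = urlsplit("http://www.example.com/page.html%23section2")
print(encoded.path)            # /page.html%23section2
print(repr(encoded.fragment))  # ''

# After percent-decoding, the path contains a literal '#'.
print(unquote(encoded.path))   # /page.html#section2
```

Once the server decodes the %23, it typically looks for a file literally named page.html#section2. No such file exists, so the request returns a 404.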
I did look it up after posting, though not the hex value.
I found Alt 35 to equal #. (I use Alt 0151, 0188, 0189 and 0190 frequently in pages, and sometimes Alt 0176.)
The hex escape may indeed be %23; however, it generates a 404 for a page that actually exists. No clue why.
Perhaps I should implement Alt 35 just to appease a Class B range that is NEVER going to get into my sites ;)
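Alt codes and URL escapes number the same characters in different bases. Windows Alt codes take a decimal code point, while %XX is hexadecimal. Here is a quick Python check of the mapping, shown only as an illustration:

```python
# Windows Alt codes use a decimal code point; URL escapes like %23 use hexadecimal.
code = int("23", 16)       # the "23" from %23, read as hex
print(code)                # 35
print(chr(code))           # #
print(chr(33))             # !
print(f"%{ord('#'):02X}")  # %23
```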
After all, as webmasters we perform so many otherwise unnecessary acts that one more would only cost more time and distract from constructive efforts ;)
After all, this is MSN, and they could retaliate by putting the whammy on my OS or stopping the constant flow of updates.
The worst scenario is that MS would add ten more .NET updates to their browsers' UAs!
65.55.106.nnn
Non-authoritative answer:
166.106.55.65.in-addr.arpa name = msnbot-65-55-106-166.search.msn.com.
I presume it is a genuine msnbot, but why is it looking for nonexistent pages?
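A reverse lookup alone proves little, because whoever controls an IP block's PTR records can make them say anything. The usual check is forward-confirmed reverse DNS, which works in three steps:

1. Look up the PTR record.
2. Confirm the name ends in a crawler domain.
3. Resolve that name and make sure it maps back to the same IP.

The following is a minimal Python sketch. It assumes `.search.msn.com`, taken from the nslookup output above, is the only accepted suffix. `is_genuine_msnbot` is a hypothetical helper. The resolver functions are parameters so the check can be tested without network access.

```python
import socket

def is_genuine_msnbot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: PTR must be under search.msn.com and resolve back to ip."""
    try:
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:                # socket.herror / gaierror subclass OSError
        return False
    # Assumption: only this suffix counts as a genuine crawler name.
    if not hostname.rstrip(".").endswith(".search.msn.com"):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses  # the name must point back to the same IP
```

Calling `is_genuine_msnbot("65.55.106.166")` with the default arguments performs live DNS lookups through the standard `socket` functions.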
65.55.106.241 - - [05/Apr/2009:09:21:46 +0100] "GET /news/viewpr.html?pid=27394 HTTP/1.0" 404 15839 "-" "msnbot/2.0b" 0 example.com "-" "-"
65.55.106.169 - - [11/Apr/2009:01:21:17 +0100] "GET /body.cfm?id=105&action=list&limit_startdate=07%2F20%2F2008&dtGraphicCalendar=07%2F12%2F2008 HTTP/1.0" 404 13331 "-" "msnbot/2.0b" 0 example.com "-" "-"
There are also 404's for
/clipframe.htm
Usually stuff like that is someone looking for targets for exploits. Why would an MS IP be looking for these pages?
[edited by: Frank_Rizzo at 9:22 am (utc) on April 11, 2009]
65.55.106.236 - - [11/Apr/2009:11:43:09 +0100] "GET /delivery/ck.php?n=ad9a19c0&cb=73604 HTTP/1.0" 404 12788 "-" "msnbot/2.0b"
65.55.106.192 - - [11/Apr/2009:12:28:01 +0100] "GET /calendar/index.php?y=2008&d=26&m=5&v=d HTTP/1.0" 404 13331 "-" "msnbot/2.0b"
65.55.106.204 - - [11/Apr/2009:15:09:52 +0100] "GET /Event-Categories-Florida-Tamarac.asp HTTP/1.0" 404 13331 "-" "msnbot/2.0b"
65.55.106.110 - - [11/Apr/2009:15:13:53 +0100] "GET /topic1784824_49910.html HTTP/1.0" 404 12775 "-" "msnbot/2.0b"
65.55.106.165 - - [11/Apr/2009:15:27:53 +0100] "GET /node/116?q= HTTP/1.0" 404 13331 "-" "msnbot/2.0b"
What the heck is going on here? Spoofed IPs? Badly configured msnbot?
[edited by: Frank_Rizzo at 2:34 pm (utc) on April 11, 2009]
And even worse, they were asking for those pages using an incorrect and totally unrelated domain/hostname. Since this was on an IP-based server, this means they were spoofing the hostname in the HTTP Host: request header while sending the request to my server's IP address (this would not be possible on a name-based virtual server, which would require the Host header to be correct).
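An IP-based server hands every request to the same site, whatever the Host header says. If the application cares about the header, it has to check it itself. The following is a minimal Python sketch that assumes a hypothetical `ALLOWED_HOSTS` set listing the sites actually hosted. What to do on a mismatch (log it, or return 400 or 404) is left to the caller.

```python
# Hypothetical set of hostnames this server really serves.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def host_is_expected(host_header: str, allowed=ALLOWED_HOSTS) -> bool:
    """Return True if the HTTP Host header names one of our own sites."""
    if not host_header:
        return False  # missing Host: treated as unexpected in this sketch
    host = host_header.strip().lower()
    # Drop an optional :port suffix (bracketed IPv6 literals are not handled here).
    if ":" in host and not host.startswith("["):
        host = host.rsplit(":", 1)[0]
    host = host.rstrip(".")  # a trailing dot names the same host
    return host in allowed
```

This sketch treats a missing Host header as unexpected. HTTP/1.0 clients may legitimately omit it, so a real server might fall back to its default site instead.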
MSN/Live has had, and still has, so many basic problems that I can't tell whether this is an error, an intentional "error-handling" scan, or some third party using an MSN server to scrape or probe sites for vulnerabilities. The way msnbot (or perhaps a third party pretending to be msnbot) normally behaves is so fouled-up that you just can't tell.
For example, while reviewing some of the regular expressions I use to classify msnbots, I was reminded of one recent version that sends "If-Modified-Since" plus a date in the HTTP From: request header. The If-Modified-Since value itself is perfectly valid, but the From: header is supposed to contain an e-mail address... Bug? Spoofer? I dunno.
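A regex-based classifier for the From: header might look like the following Python sketch. Both patterns are assumptions. The e-mail check is deliberately loose rather than RFC 5322-complete. The second pattern guesses that the bogus value is the header name followed by an RFC 1123 date, since the exact string isn't quoted here.

```python
import re

# Loose e-mail shape: local@domain.tld (not a full RFC 5322 parser).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
# Hypothetical pattern: "If-Modified-Since" (optional colon) followed by an RFC 1123 date.
IMS_IN_FROM_RE = re.compile(
    r"^If-Modified-Since:?\s+\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} GMT$", re.I
)

def classify_from(value: str) -> str:
    """Label a From: header value as 'email', 'if-modified-since-in-from', or 'other'."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if IMS_IN_FROM_RE.match(value):
        return "if-modified-since-in-from"
    return "other"
```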
Anyway, my main point is that I'm seeing these bogus requests as well.
Jim