131.107.x.xx - - [03/Jul/2004:18:57:43 -0400] "GET / HTTP/1.0" 403 862 "-" "MSNPTC/1.0"
I hope it's not an important robot, because unknown 'bots are blocked from this particular site to avoid abuse, and they just get a 403.
Robot authors/users: Please provide contact info, a link, and/or a meaningful user-agent name (MSN proxy tester/checker?), you know, like Google does... Thanks.
Jim
207.46.238.143 - - [30/May/2004:16:26:30 -0400] "GET / HTTP/1.0" 200 20684 "-" "MSNPTC/1.0"
*added* It never checked for robots.txt.
(there are two very extensive threads in the archives when MS first began crawling. One I recall being in the 15-20 page range.)
I have some directories and pages on my largest site that have existed for some five and a half years. These folders and pages have mixed-case names from when I began and didn't know otherwise.
Over time I've been able to use wrong-case spidering of some of these directories and pages to identify unknown, malicious, or simply badly programmed bots.
These malicious bots frequently visit only a single page without reading robots.txt, and in most instances they leave either the referrer or the UA blank.
When I log such a visit for my own records, I label it either a "snoop" or a "probe."
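The wrong-case trap can be checked automatically. Here's a minimal Python sketch, assuming Apache "combined" log format and a hypothetical KNOWN_PATHS set listing the site's real mixed-case URLs (the names are illustrative). It flags any host that requested a URL matching a real page only when case is ignored, never fetched /robots.txt, and sent a blank referrer or user-agent. It's a rough heuristic, not a definitive bot detector, and it only works if the server treats paths case-sensitively (as the 404s below suggest).

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<proto>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical: the site's real, correctly cased paths.
KNOWN_PATHS = {
    "/Folder/MyPage.html",
    "/Folder/DifferentPage.htm",
    "/OthwerFolder/anotherPage.html",
}
_BY_LOWER = {p.lower(): p for p in KNOWN_PATHS}


def is_wrong_case(path):
    """True if path matches a known page only when case is ignored."""
    correct = _BY_LOWER.get(path.lower())
    return correct is not None and correct != path


def flag_suspects(lines):
    """Map each suspect host to the wrong-case paths it requested.

    A host is a suspect if it hit at least one wrong-case URL, never
    fetched /robots.txt, and sent a blank ("-") referrer or user-agent.
    """
    requests = defaultdict(list)
    fetched_robots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        if m["path"] == "/robots.txt":
            fetched_robots.add(m["host"])
        requests[m["host"]].append(m)

    suspects = {}
    for host, reqs in requests.items():
        if host in fetched_robots:
            continue
        wrong = [r["path"] for r in reqs if is_wrong_case(r["path"])]
        blank = any(r["referer"] in ("", "-") or r["agent"] in ("", "-") for r in reqs)
        if wrong and blank:
            suspects[host] = wrong
    return suspects


# The MSNPTC/1.0 hits from this thread, each rejoined onto one line.
SAMPLE = [
    '207.46.238.143 - - [18/Jul/2004:21:58:52 -0700] "GET /folder/mypage.html HTTP/1.0" 404 - "-" "MSNPTC/1.0"',
    '131.107.3.84 - - [18/Jul/2004:21:58:52 -0700] "GET /OthwerFolder/anotherPage.html HTTP/1.0" 200 31872 "-" "MSNPTC/1.0"',
    '207.46.238.142 - - [12/Aug/2004:19:10:21 -0700] "GET /folder/differentPage.htm HTTP/1.0" 404 - "-" "MSNPTC/1.0"',
]

if __name__ == "__main__":
    for host, paths in flag_suspects(SAMPLE).items():
        print(host, paths)
```

With the sample above, the two 207.46 hosts that requested lowercased paths are flagged. The 131.107.3.84 request used the correct case and is not.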
I suppose it's entirely possible MSN has started out with a badly programmed bot? (And after all their research and perhaps two years' worth of Mr. Gates' money?)
I'm more inclined to believe, however, that this bot is a fake.
Add to this that MSN is filling my logs daily, simultaneously, from a variety of bots and IPs with little letup or benefit, and I'm not a happy camper.
I've only had two visits from this UA, one in July and another in August. Two of the three pages crawled were 404s as a result of case errors. Robots.txt was not read, and there were no referrers.
July
207.46.238.143 - - [18/Jul/2004:21:58:52 -0700] "GET /folder/mypage.html HTTP/1.0" 404 - "-" "MSNPTC/1.0"
131.107.3.84 - - [18/Jul/2004:21:58:52 -0700] "GET /OthwerFolder/anotherPage.html HTTP/1.0" 200 31872 "-" "MSNPTC/1.0"
August
207.46.238.142 - - [12/Aug/2004:19:10:21 -0700] "GET /folder/differentPage.htm HTTP/1.0" 404 - "-" "MSNPTC/1.0"
If it's a fake, then MSN must have an open proxy -- that 207.46/16 IP range resolves to Microsoft.
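To check whether a logged address really falls inside 207.46/16, Python's standard ipaddress module is enough. This is a minimal sketch: the range is the one named in this post, and the helper name is hypothetical. A range match only shows where the request came from. It can't tell a real Microsoft crawler apart from something relayed through a proxy inside that network.

```python
import ipaddress

# The range this post attributes to Microsoft.
MSFT_RANGE = ipaddress.ip_network("207.46.0.0/16")


def in_range(ip, network=MSFT_RANGE):
    """True if the dotted-quad address ip lies inside network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    for ip in ("207.46.238.143", "207.46.238.142", "131.107.3.84"):
        print(ip, in_range(ip))
```

Note that 131.107.3.84 from the July log lies outside 207.46/16, so a check limited to that one range would not cover it.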
I suppose it's possible, though. I had a badly-behaved 'bot show up from a leading anti-virus company one time. I emailed them, and their network admin replied that he shut it down because it was an unauthorized employee project -- I was surprised and grateful for the forthright reply.
Jim