Anyone know anything about this user-agent? I've gotten a dozen or so hits in two distinct visits for the same page on one of my sites from it in the last 24 hours, all from a .gov address, concurrent with MSIE 6.0 / Windows NT 5.1 visits. Checked for robots.txt, which is empty, so I can't comment on whether it *obeys* it, but...
sahp4058.sandia.gov - - [13/Dec/2004:20:07:14 +0000] "GET /robots.txt HTTP/1.0" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; compatible; StanleyWebSpider 1.0) StanleyWebSpider/1.0"
sahp4058.sandia.gov - - [13/Dec/2004:20:07:16 +0000] "GET /osama.html HTTP/1.0" 200 7931 "-" "Mozilla/4.0 (compatible; MSIE 6.0; compatible; StanleyWebSpider 1.0) StanleyWebSpider/1.0"
sahp4058.sandia.gov - - [13/Dec/2004:20:12:04 +0000] "GET /osama.html HTTP/1.0" 200 7931 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
sahp4058.sandia.gov - - [13/Dec/2004:20:18:12 +0000] "GET /index.html HTTP/1.0" 200 5423 "-" "Mozilla/4.0 (compatible; MSIE 6.0; compatible; StanleyWebSpider 1.0) StanleyWebSpider/1.0"
Makes me wonder what they're up to over there at Sandia. :)
Well... if the file name is really "osama", then I can guess what they're up to.
From their site:
"Helping Our Nation Secure a Peaceful and Free World Through Technology"
Well, yes, the page really is named osama.html, and it really is, you know, about Bin Laden. I don't know if I'm ready to believe that a web spider (that requests robots.txt!) really is part of our national intelligence programme. The name really intrigues me, as I can't find any reference anywhere to a "Stanley" spider, and if the intelligence community were going to be going a-spidering, I'd expect them to use a commercial product, or spoof Googlebot, or... something a little less attention-getting than a mystery spider.
Though a "unique" user-agent is a first, the pattern of requests does remind me of the behaviour of some caching proxies I've seen in the .nipr.mil range, where (I surmise) someone behind the proxy requests a page from a site, and the proxy then makes repeated requests for that page, presumably in an attempt to see whether the content is dynamic or static.
That's just an observation, though. It'll be interesting to see if anyone else observes this "spider" in the wild, and whether there are any discernible similarities between the pages it's "spidering". :)
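For anyone who wants to check their own logs for it, here is a rough Python sketch that pulls matching hits out of an access log. It assumes your server writes the same combined log format as the lines quoted above. The names `LOG_LINE` and `find_agent_hits` are just made up for this example, so adjust to taste.

```python
import re

# Assumed layout (combined log format, as in the lines quoted in this thread):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)


def find_agent_hits(lines, needle="StanleyWebSpider"):
    """Return (host, time, path) for every parsable line whose
    user-agent contains `needle` (case-insensitive)."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if needle.lower() in match.group("agent").lower():
            hits.append(
                (match.group("host"), match.group("time"), match.group("path"))
            )
    return hits
```

To run it against a real file, something like `find_agent_hits(open("access.log"))` should do, where `access.log` stands in for wherever your server keeps its log. Comparing the returned paths across a few sites would show whether the pages it fetches have anything in common.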