204.4.XX.X - - [20/Dec/2003:00:55:52 +0000] "GET /robots.txt HTTP/1.1" 200 5276 "-" "MSRBOT/0.1 (http://research.microsoft.com/research/sv/msrbot/)"
204.4.XX.X - - [20/Dec/2003:00:55:53 +0000] "GET /somepage.shtml HTTP/1.1" 301 345 "-" "MSRBOT/0.1 (http://research.microsoft.com/research/sv/msrbot/)"
204.4.XX.X - - [20/Dec/2003:01:24:21 +0000] "GET /some-page.shtml HTTP/1.1" 200 17734 "http://www.mydomain.com:80/somepage.shtml" "MSRBOT/0.1 (http://research.microsoft.com/research/sv/msrbot/)"
It obviously grabbed robots.txt, and the page it fetched doesn't violate my robots.txt - but it's still early in the game. ;)
Interesting that it added a referrer when it grabbed the correct page after the 301 redirect; I don't see very many bots do that. Also interesting is the addition of a port (:80) to the referring URL...
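For anyone who wants to check their own logs for the same behaviour, here is a rough Python sketch. It assumes the log is in Apache's "combined" format, as the lines above appear to be. The names `parse_line` and `bot_hits_with_referer` are made up for illustration, and the sample lines are just the three entries quoted above.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def bot_hits_with_referer(lines, agent_substring):
    """Collect entries whose user-agent contains agent_substring and whose referer is set."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in combined format
        if agent_substring.lower() in entry["agent"].lower() and entry["referer"] != "-":
            hits.append(entry)
    return hits


# The three entries quoted in this post, used as sample input.
AGENT = "MSRBOT/0.1 (http://research.microsoft.com/research/sv/msrbot/)"
SAMPLE = [
    '204.4.XX.X - - [20/Dec/2003:00:55:52 +0000] "GET /robots.txt HTTP/1.1" 200 5276 "-" "%s"' % AGENT,
    '204.4.XX.X - - [20/Dec/2003:00:55:53 +0000] "GET /somepage.shtml HTTP/1.1" 301 345 "-" "%s"' % AGENT,
    '204.4.XX.X - - [20/Dec/2003:01:24:21 +0000] "GET /some-page.shtml HTTP/1.1" 200 17734 '
    '"http://www.mydomain.com:80/somepage.shtml" "%s"' % AGENT,
]

if __name__ == "__main__":
    for hit in bot_hits_with_referer(SAMPLE, "msrbot"):
        print(hit["time"], hit["status"], hit["path"], "<-", hit["referer"])
```

Run against the sample, only the third request is reported, since it is the only one carrying a referrer. To use it on a real log, pass the open file object in place of `SAMPLE`.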
Then there was this somewhat related discussion [webmasterworld.com].
Pendanticist.
I lay blame on relying on the site search and not Google...
<added>
From the thread you mentioned, quoting you...
> this new one only crawls one very remote, very obscure file
Funny... I don't know if you would call it "remote & obscure," since it is accessible from every page, but they hit my privacy page - the least visited page on my site, unfortunately.
</added>