I recently became suspicious that msnbot was ignoring robots.txt and spoofing a human browser ID (User-Agent) on some of my sites. As a test, I installed the following code in Apache:
# Deny msnbot IP block access to any file
# when it's not being honest in its browser ID field
RewriteCond %{REMOTE_ADDR} ^207\.46\.12\. [OR]
RewriteCond %{REMOTE_ADDR} ^207\.46\.195\. [OR]
RewriteCond %{REMOTE_ADDR} ^207\.46\.199\. [OR]
RewriteCond %{REMOTE_ADDR} ^207\.46\.204\.
RewriteCond %{HTTP_USER_AGENT} !msnbot [NC]
RewriteRule .* /msnbot.html [L]
This delivers a unique page to any request that comes from one of the msnbot IP blocks but does not identify itself as msnbot.
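For anyone who wants to sanity-check the matching logic before putting it on a live server, here is a rough Python sketch of the same decision. It is only an illustration of how I read the conditions, not something Apache runs; the function name `gets_test_page` and the sample user-agent strings are made up for the example.

```python
import re

# Same four prefixes as the RewriteCond %{REMOTE_ADDR} lines,
# anchored at the start of the address and ending on a literal dot.
MSNBOT_PREFIXES = re.compile(r"^207\.46\.(?:12|195|199|204)\.")


def gets_test_page(remote_addr: str, user_agent: str) -> bool:
    """Return True when the request would be rewritten to /msnbot.html.

    Hypothetical helper: it mirrors the rule's logic, i.e. the address is
    in one of the listed blocks AND the User-Agent does not contain
    "msnbot" (case-insensitive, like the [NC] flag).
    """
    in_block = MSNBOT_PREFIXES.search(remote_addr) is not None
    claims_msnbot = re.search("msnbot", user_agent, re.IGNORECASE) is not None
    return in_block and not claims_msnbot


if __name__ == "__main__":
    # Address in a listed block, browser-style User-Agent: rewritten.
    print(gets_test_page("207.46.12.5", "Mozilla/4.0 (compatible; MSIE 7.0)"))
    # Same address, but it identifies itself as msnbot: left alone.
    print(gets_test_page("207.46.12.5", "msnbot/2.0b"))
```

Note that the trailing dot in each prefix matters: an address such as 207.46.120.5 does not match the 207.46.12. condition.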
After searching Bing today for the terms in the test page, sure enough, there it was. Why msnbot should feel that it's exempt from obeying robots.txt and from properly identifying itself, I can't imagine.