incrediBILL - 9:13 am on May 17, 2013 (gmt 0) [edited by: incrediBILL at 9:18 am (utc) on May 17, 2013]
But the 403 isn't intrinsic to robots.txt. That's the difference.
No 403s for the actual robots.txt request; that part is 100% according to spec. I let them have the robots.txt, and the server responds to the request with a valid robots.txt file stating they are DENIED or ALLOWED, with the appropriate robots.txt content for either scenario.
However, if they are denied, any other request to the site is also denied.
Basically, don't ask for the robots.txt if you aren't a robot, because it could get ugly. That said, I put a Turing test on the robots.txt 403 page, so those denied just for being nosy can still say "YES! I'M A HUMAN!" and get out of jail free.
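For anyone wanting to picture the flow, here's a minimal sketch of that trap as a bare WSGI app. All the names here (seen_robots_txt, verified_humans, /im-a-human) are made up for illustration, and the real setup serves different robots.txt content per scenario; this just shows the core idea: serve robots.txt cleanly, flag whoever fetched it, and 403 their later requests unless they pass the human check.

```python
# Illustrative sketch only -- not the actual implementation described above.
seen_robots_txt = set()    # IPs that fetched robots.txt (presumed bots)
verified_humans = set()    # IPs that passed the Turing test

ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"

def app(environ, start_response):
    ip = environ.get("REMOTE_ADDR", "")
    path = environ.get("PATH_INFO", "/")

    if path == "/robots.txt":
        # Serve a valid robots.txt per spec -- but remember who asked.
        seen_robots_txt.add(ip)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [ROBOTS_TXT]

    if path == "/contact":
        # Left wide open, like robots.txt, so accidental blocks can email in.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b"<p>Contact us</p>"]

    if path == "/im-a-human":
        # Stand-in for the Turing test: passing it lifts the block.
        verified_humans.add(ip)
        seen_robots_txt.discard(ip)
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b"<p>Welcome back.</p>"]

    if ip in seen_robots_txt and ip not in verified_humans:
        # Asked for robots.txt, then kept browsing: treated as a bot.
        start_response("403 Forbidden", [("Content-Type", "text/html")])
        return [b'<p>Denied. Human? <a href="/im-a-human">Prove it</a> '
                b'or <a href="/contact">contact us</a>.</p>']

    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>Normal page content.</p>"]
```

In practice you'd hang this off the web server or a rewrite rule rather than an app, but the state machine is the same: one set for trapped IPs, one escape hatch.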
I also leave the "contact us" page wide open like the robots.txt file, and it's linked from the 403 page, so if someone is locked out by accident they can drop me an email. Every now and then I get one, not too often, and more often than not I see in my log file that something odd was going on, in which case I ask them to explain themselves.
The email I get contains their IP address and a link to an admin page to pull up all their activity on demand.
Work smart, not hard ;)