incrediBILL - 10:43 pm on May 16, 2013 (gmt 0)
Robots.txt There is no law, it's just a guideline ;-)
Maybe on YOUR servers.
On my servers any violation of robots.txt gets a 403 Forbidden. Even Googlebot asking for pages it's told not to access gets a 403 Forbidden, and the script to do it is pretty easy really, using robots.txt processing rules readily available in open source. The same PHP functions and rule sets that crawlers use to process the robots.txt file can also be reversed and used by the site being crawled. When a bot requests a page, you take its user agent and just ask the robots.txt function, using your own robots.txt file, whether that request is allowed or not. Easy peasy.
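A rough sketch of that server-side check, for anyone who wants to see the shape of it. It is written in Python with the standard library's urllib.robotparser, not the PHP functions mentioned above, and the rules in ROBOTS_TXT, the "BadBot" name, and the helper names build_parser and status_for_request are all made up for illustration. One thing to watch: this parser only looks at the part of the user agent before the first "/", so pass it a product token like "Googlebot" and not a full browser-style string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; a real site would load its own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text with the same parser a crawler would use."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def status_for_request(parser, user_agent, path):
    """Return 200 if robots.txt allows this agent to fetch the path, else 403."""
    return 200 if parser.can_fetch(user_agent, path) else 403


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, path in [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/private/page.html"),
        ("BadBot/1.0", "/index.html"),
    ]:
        print(agent, path, status_for_request(parser, agent, path))
```

In a real setup this check would sit in front of every page request, with the status code handed back to the web server.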
Also, just asking for robots.txt and being denied puts you on the list, so come back with any user agent you like: the IP has been blocked. Basically the rule for robots.txt is "asked and answered", and the answer is either pass or fail for that IP.
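The "asked and answered" idea could look something like the sketch below. This is only one reading of it, not the actual script: it assumes "being denied" means robots.txt disallows the requesting user agent from the whole site ("/"), it keeps the block list in memory, and the RobotsGate class and the example IPs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


class RobotsGate:
    """Answers requests with 200 or 403 and remembers IPs that failed."""

    def __init__(self, robots_text):
        self.parser = RobotFileParser()
        self.parser.parse(robots_text.splitlines())
        self.blocked_ips = set()  # in-memory only; a real server would persist this

    def status(self, ip, user_agent, path):
        # Once an IP has failed, the user agent no longer matters.
        if ip in self.blocked_ips:
            return 403
        if path == "/robots.txt":
            # Assumption: "denied" means the agent is disallowed from "/".
            if not self.parser.can_fetch(user_agent, "/"):
                self.blocked_ips.add(ip)
                return 403
            return 200
        # Ordinary page request: enforce the robots.txt rules for this agent.
        return 200 if self.parser.can_fetch(user_agent, path) else 403
```

With rules that disallow "BadBot" everywhere, a robots.txt request from that agent fails and blocks the IP, and a later request from the same IP claiming to be "Googlebot" still gets a 403.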