I tried fetching one of the robots.txt-excluded pages as Googlebot. It just happily ignored robots.txt entirely, but hit a 500 error because the page is not intended for visitors.
Here is the result:
=============
Fetch as Google
This is how Googlebot fetched the page.
URL: http://www.example.com/forums/ips_kernel/HTMLPurifier/HTMLPurifier/PercentEncoder.php
Date: Friday, August 9, 2013 at 8:56:57 AM PDT
Googlebot Type: Web
Download Time (in milliseconds): 96
HTTP/1.1 500 Internal Server Error
Date: Fri, 09 Aug 2013 15:56:57 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.0-fips mod_bwlimited/1.4
Accept-Ranges: bytes
Content-Length: 2716
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><HTML><HEAD><TITLE>500 Internal Server Error</TITLE>...
=====================
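To double-check that the robots.txt rule really does cover that URL for Googlebot, I ran a quick check with Python's urllib.robotparser. This is just a minimal sketch, and www.example.com here stands in for my real hostname (per the edit note below):
=============
from urllib.robotparser import RobotFileParser

# Parse the live robots.txt (example hostname substituted for the real one).
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

url = ("http://www.example.com/forums/ips_kernel/"
       "HTMLPurifier/HTMLPurifier/PercentEncoder.php")
# False means the rule should block Googlebot from this URL,
# so the fetch above went through despite it.
print(rp.can_fetch("Googlebot", url))
=============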
Google says the referring page is the sitemap, but in fact none of these error pages are in the sitemap.
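To confirm the error pages really aren't listed, here's a quick membership check against the sitemap. Again only a sketch, and I'm assuming the sitemap lives at /sitemap.xml (a hypothetical path; substitute the real one):
=============
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sitemap location; substitute the real path.
data = urllib.request.urlopen("http://www.example.com/sitemap.xml").read()
loc_tag = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
urls = [el.text for el in ET.fromstring(data).iter(loc_tag)]
# False confirms the flagged URL is not in the sitemap at all.
print(any(u and "PercentEncoder.php" in u for u in urls))
=============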
What can I do in this case?
[edited by: phranque at 1:17 pm (utc) on Aug 10, 2013]
[edit reason] exemplified hostname [/edit]