It appears there is nothing you can do about this. As much as Google likes to talk about rules, it is the biggest violator there is.
Example: I have a website without a sitemap. I have a directory that is disallowed in robots.txt, all links to the pages in that directory are nofollow, and there are no external links to those pages.
Yet one of them made its way into the index. When I use the "site:" operator, the page shows up with the URL as the title, and the description reads "A description for this result is not available because of this site's robots.txt – learn more".
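Worth noting for anyone hitting the same thing: that result looks the way it does because Disallow only blocks crawling, not indexing. Google can still index a URL it discovers some other way; it just can't fetch the page, which is why the snippet falls back to the robots.txt message. If the goal is to keep a page out of the index entirely, the usual approach is the opposite of intuition: let Googlebot crawl the page and serve a noindex directive. A minimal sketch (the /private/ path here is hypothetical):

```
# robots.txt — blocks crawling only; the URL itself can still be indexed
User-agent: *
Disallow: /private/

# To remove a page from the index instead, do NOT disallow it in robots.txt;
# let it be crawled and send a noindex signal, either as an HTTP header:
#   X-Robots-Tag: noindex
# or as a meta tag in the page's <head>:
#   <meta name="robots" content="noindex">
```

The catch is that a noindex directive only works if the crawler is allowed to fetch the page and see it, which is exactly what a robots.txt Disallow prevents.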