indyank - 3:31 am on Jun 29, 2013 (gmt 0)
I understand why 401 URLs aren't shown in the SERPs. But even roboted-out URLs should not figure in their index in any form.
The reason most webmasters block bots that claim to obey robots.txt is to ensure those bots don't send any visitors to the site from their platforms. When a webmaster instructs googlebot via robots.txt not to crawl the content of a page (though there is no foolproof way to verify that they aren't crawling it anyway and keeping the content in their DB), googlebot is supposed to ensure its visitors never find that page in any form, by any means, via the SERPs.

How are they helping their genuine visitors by showing links with a boilerplate description? The fact that links to such pages exist on other sites in no way authorizes them to bypass or discard robots.txt for such pages and show them in their SERPs with a boilerplate description.
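For illustration, the kind of rule being discussed is just a couple of lines (the /private/ path here is a made-up example):

User-agent: Googlebot
Disallow: /private/

The catch, as I understand it, is that this only forbids fetching the page. Because the page is never fetched, Google also never sees any on-page noindex signal, so a URL discovered through links on other sites can still land in the index as a bare link with that boilerplate description.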