---- Pages are indexed even after blocking in robots.txt
Robert_Charlton - 10:33 am on Sep 2, 2012 (gmt 0)
Example: I have a website without a sitemap. I have a directory that is disallowed in robots.txt; all links to the pages in that directory are nofollow, and there are no external links to those pages. Yet one of them made its way into the index.
atlrus, basically you shouldn't be using robots.txt to keep a URL out of the index. Robots.txt only blocks crawling; a blocked URL can still be indexed if Google learns of it some other way. Use either password protection or the noindex robots meta tag. Again, note that if you use both robots.txt and noindex, Googlebot won't spider the page and therefore will never see the noindex. It's a little confusing at first, but it starts making sense if you give it some thought.
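To make the robots.txt/noindex interaction concrete, here is a minimal sketch using Python's standard urllib.robotparser. It is only an illustration of the logic, not how Googlebot is actually implemented. The robots.txt contents, the example.com URLs, and the function name noindex_is_visible are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def noindex_is_visible(robots_txt: str, url: str, page_has_noindex: bool,
                       user_agent: str = "Googlebot") -> bool:
    """Return True only if a compliant crawler could actually read the noindex tag.

    A crawler that obeys robots.txt never fetches a disallowed URL,
    so it never sees a robots meta tag on that page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed_to_fetch = parser.can_fetch(user_agent, url)
    return allowed_to_fetch and page_has_noindex


if __name__ == "__main__":
    # Blocked in robots.txt: the noindex tag is never seen.
    print(noindex_is_visible(ROBOTS_TXT, "https://example.com/private/page.html", True))
    # Not blocked: the crawler can fetch the page and honor noindex.
    print(noindex_is_visible(ROBOTS_TXT, "https://example.com/public/page.html", True))
```

The takeaway is the same as above: for noindex to work, the page has to stay crawlable, so remove the Disallow line for any URL you want dropped via noindex.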
Regarding how Google found the page in the first place, consensus on this forum and elsewhere is that it's most likely publicly available server logs. The topic is coming up with frightening frequency, so you are not alone. Here's the most recent discussion on the topic. I suggest you read all the references I link to in my post in the thread...