This is of course an educated guess as to how Google's architecture works.
However, Google, Yahoo, and Ask -- and perhaps MSN -- have taken to listing some pages as a URL-only listing, or using link text from the page where the link was discovered, for pages that your robots.txt tells them not to fetch. Since the robots standard was written solely with bandwidth conservation in mind, they're essentially taking advantage of this to "Spider the Deep Web." Some Webmasters think it's OK, some think it's part of the Grand Conspiracy. I think it's just a nuisance.
If you want a page "not mentioned" at all, then allow it to be fetched in robots.txt, and use the on-page <meta name="robots" content="noindex"> tag to prevent it from being listed.
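As a minimal sketch (the /private/page.html path is purely an illustration): the spider can only obey the meta tag if it's allowed to fetch the page, so robots.txt must not block it.

In robots.txt -- note there is deliberately no Disallow line covering the page:

    User-agent: *
    Disallow:

And in the <head> of /private/page.html itself:

    <meta name="robots" content="noindex">

The counter-intuitive part is that you have to *invite* the spider in so it can read the instruction telling it not to list the page.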
I should note that while this used to work perfectly, I've had some trouble with Google still listing a few of those pages, and have resorted to periodically using their URL removal tool to de-list them.
Notwithstanding this (hopefully temporary) problem, it's a trade-off depending on what you hope to accomplish -- reducing bandwidth consumed by spiders, or keeping low-value or non-optimal landing pages out of the index.
If you're looking for security, then password-protecting the page is the way to go, or using user-agent- *and* IP-address-sensitive redirection or rewriting to keep the 'bots out of certain pages/directories. This latter approach is closely related to cloaking, so be careful that no one could interpret it as an attempt to deceive search engines or their users. You've also got to keep a sharp eye out for new spider User-agent names and new IP address ranges.
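To sketch what that rewriting might look like -- assuming Apache with mod_rewrite, in an .htaccess file at the site root. The /private/ directory, the bot names, and the IP range shown here are only illustrations; the real names and ranges change over time and you'd have to verify and maintain them yourself:

    RewriteEngine On
    # Forbid /private/ to anything that *claims* to be a major spider...
    RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Teoma) [NC,OR]
    # ...or that comes from an address range a spider is known to use
    # (66.249. is one range Googlebot has crawled from -- check before relying on it)
    RewriteCond %{REMOTE_ADDR} ^66\.249\.
    RewriteRule ^private/ - [F]

Matching on User-agent *or* IP address catches both the honest spiders and anything spoofing their names. Since this rule returns a plain 403 rather than serving different content, it stays on the safer side of the cloaking line.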
Jim