Forum Moderators: Robert Charlton & goodroi
First of all, remember that robots.txt controls crawling and the noindex directive controls indexing.
When you block a page in robots.txt, Google can't crawl it, but that does not stop Google from indexing the blocked URL.
As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site can appear in Google search results.
You can prevent this by combining robots.txt with other URL blocking methods, such as password-protecting the files on your server, or inserting meta tags into your HTML.
To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index.
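For reference, the standard form of the noindex meta tag goes in the page's <head> (the page shown is a hypothetical example):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Hypothetical private page</title>
  <!-- Tells compliant crawlers not to index this page.
       Googlebot must be ABLE to fetch the page to see this tag. -->
  <meta name="robots" content="noindex">
</head>
<body>
  <p>Content that should stay out of the index.</p>
</body>
</html>
```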
--
When Google sees the noindex meta tag on a page, it will completely drop the page from its search results, even if other pages link to it.
A robots.txt file is a text file that asks web crawler software, such as Googlebot, not to crawl certain pages of your site.
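A minimal robots.txt illustrating the kind of block being discussed (the /private/ path is a hypothetical example):

```text
# Placed at https://example.com/robots.txt
# Compliant crawlers will not fetch anything under /private/
User-agent: *
Disallow: /private/
```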
[edited by: JD_Toims at 1:42 am (utc) on Jul 19, 2014]
Which is ridiculous: if Googlebot obeys the robots.txt directive and never fetches the page, the noindex tag on that page will never be seen. Their own support information is absolutely, unquestionably contradictory, because the robots.txt support page quoted above says the crawler is blocked from the page in the first place.
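The contradiction can be sketched with Python's standard-library robots.txt parser; the rules and URL below are hypothetical:

```python
# Sketch: a robots.txt Disallow means a compliant crawler never fetches
# the page, so a noindex meta tag on that page is never seen.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A crawler honoring these rules may not fetch the page at all...
blocked = not rp.can_fetch("Googlebot", "https://example.com/private/page.html")
print(blocked)
# ...and a page that is never fetched can never deliver its
# <meta name="robots" content="noindex"> to the crawler.
```

So to make noindex work, the page must be crawlable: remove it from robots.txt and let the meta tag do the blocking.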
Got all that?