But...Google's adherence to the robots meta tag and robots.txt is a bit different from most SEs. If it knows that a page exists, it will probably list it in some fashion, usually with just the URL and no snippet or description.
The only semi-successful way I've found to keep Google from listing a page (besides a ban) is to both disallow it in robots.txt and use noindex, nofollow in the robots meta tag. But sometimes even that doesn't work.
Oh, and instead of using ALL, just don't include the robots meta tag at all.
Jim
But...Google's adherence to the robots meta tag and robots.txt is a bit different from most SEs. If it knows that a page exists, it will probably list it in some fashion, usually with just the URL and no snippet or description.
I have seen Google do this often, BUT when I tracked back through the logs... Google had not crawled the page in question, but was listing the URL because it had crawled a page that linked to it. The pages were not crawled because the sites were new and had very few incoming links to inspire G to crawl deep.
Invariably the next month those pages were crawled and either listed or not as their robots tags required.
If they find any link to your page, they will list it. If it is Disallowed in robots.txt, they will not fetch it. However, they will still list it in their SERPs using the URL for the title. Because it has not been crawled, it will not show up in the SERPs for any search terms, except for those used as link text in the links they found. It will also show up for any search terms matching the keywords-in-URL for that page, and for domain searches.
JimB, the way to keep pages completely out of these two SEs (Google and AJ/T) is to Allow them in robots.txt (by not Disallowing them), and use the <meta name="robots" content="noindex"> tag or any valid variant. You have to allow the page to be fetched by the spider in order for it to read the robots meta tag.
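For anyone who wants to check their own setup, here is a rough Python sketch of that rule, using only the standard library. It is not how Google or AJ/T work internally, just an approximation of a well-behaved spider: the page must be fetchable under robots.txt, and it must carry a noindex directive in its robots meta tag. The names `RobotsMetaParser` and `can_be_delisted`, the `Googlebot` default agent string, and the example.com URL and sample files are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def can_be_delisted(robots_txt, page_html, url, agent="Googlebot"):
    """Return True if a spider may fetch the page AND the page says noindex.

    Hypothetical helper: a page that is Disallowed is never fetched, so
    its noindex tag is never read and the bare URL can still be listed.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    fetchable = rules.can_fetch(agent, url)

    meta = RobotsMetaParser()
    meta.feed(page_html)
    return fetchable and "noindex" in meta.directives


# Sample inputs (invented for this sketch).
URL = "http://www.example.com/contact.html"
OPEN_ROBOTS = "User-agent: *\nDisallow: /cgi-bin/\n"
BLOCKED_ROBOTS = "User-agent: *\nDisallow: /contact.html\n"
NOINDEX_PAGE = (
    '<html><head><meta name="robots" content="noindex"></head>'
    "<body>Contact form</body></html>"
)

if __name__ == "__main__":
    print("allowed + noindex:   ", can_be_delisted(OPEN_ROBOTS, NOINDEX_PAGE, URL))
    print("disallowed + noindex:", can_be_delisted(BLOCKED_ROBOTS, NOINDEX_PAGE, URL))
```

The second case is the trap: the page has the right meta tag, but the Disallow line keeps the spider from ever seeing it.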
Google and AJ/T apparently interpret the word "index" to mean "fetch" when applied to robots.txt, but interpret the same word to mean "include in index" when applied to the <robots> meta tag.
I can't figure out why, but 'tis not for me to ask. That's how it works, and there is a work-around, so it's off my priority list. It took me four months to get my e-mail contact forms de-listed, but that's how I did it. To see if it's working or not, search for your own domain name and/or any applicable keywords-in-URL.
HTH,
Jim
That sure as heck makes sense, though I think that's a variation I had until a few months ago, and it did not work. But it only takes a second to take out the disallow and see what happens on the next deep crawl.
Thanks JD,
Jim