I was unable to locate any suggestion of mixing the approved protocols (i.e., adding noarchive to them):
The 1996 protocol for "index and follow".
The 2008 additions for "noarchive, noodp, nosnippet, noydir", which are all specific to certain brands of bots.
It's fine - better to avoid the extra closing tag (</meta>), though. If you want XML validation you can do:
<meta name="robots" content="noindex, nofollow, noarchive"/>
|I was unable to locate any suggestion of mixing the approved protocols (i.e., adding noarchive to them): |
I've not been able to find any specific references to the mixing of noarchive either. I've been following this whole robots thing for years and thought I had it figured out.
I can tell you from experience, within the last 7 days, that the above does not work. Googlebot ignores the first two directives, noindex and nofollow, and defaults to noarchive. How do I know? I just had a page using the above configuration appear in the top ten for its targeted term. I didn't want that page to get indexed, and it did. :(
A flaw? You would think that you could use one robots metadata element to achieve the intended directives, yes? I mean, you can mix these four, and a comma-separated list of values is allowed...
About the Robots <META> tag
|Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW", so there's no need to spell that out. |
Robots and the META element
|The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX. Note. In early 1997 only a few robots implement this, but this is expected to change as more public attention is given to controlling indexing robots. |
Take the above and then read this page at Google...
Prevent or remove cached pages
|To prevent all search engines from showing a "Cached" link for your site, place this tag in the <HEAD> section of your page: |
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Now that I've tested this and have been bitten once, I guess I need to use two robots metadata elements to achieve the desired goal. I just don't understand why I should have to do that. I feel that I listed the values in their proper order, or maybe I didn't. Maybe noarchive should be first in the comma-separated list of values?
I thought so too. Apparently it is not. :(
|Better to avoid the extra closing tag (</meta>), though. If you want XML validation you can do: |
Actually, the </meta> validates as XHTML. It is an alternative to the /> self-closing syntax. I've been slowly switching over to the /> method as time permits. Both validate just fine. The /> syntax presented some challenges back in the beginning, so I chose the optional </meta> closing tag.
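For what it's worth, both forms should pass the XHTML validator; these lines are just to illustrate the two syntaxes:
<meta name="robots" content="noindex, nofollow, noarchive"></meta>
<meta name="robots" content="noindex, nofollow, noarchive" />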
I guess moving forward I'll have to do this...
<meta name="robots" content="noindex, nofollow" />
<meta name="robots" content="noarchive" />
You're probably wondering why I even have the noarchive in there with the noindex, nofollow. It's more of a fail-safe to keep the page from being cached in case something goes wrong. Heh! It worked in this instance. I have a top ten position for a product name that I was attempting to avoid. ;)
I'm going to give it another 48-72 hours before breaking them into two. I'd like to see some more input on this. I would think that one element is all that is needed.
There shouldn't be a need for a noarchive directive along with a noindex directive. If the page isn't indexed, it shouldn't be listed in the SERPs at all.
In fact, I wouldn't even think of noarchive as a robots directive at all. It's just a legal umbrella certain search engines use to give themselves some protection from lawsuits over copyright violations. Consider it an option for pages you do want indexed.
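For example, on a page you do want indexed but don't want cached, something along these lines is the idea (index and follow spelled out only for illustration; they are the defaults anyway):
<meta name="robots" content="index, follow, noarchive" />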
|I would think that one element is all that is needed |
Ideally that would be robots="none", wouldn't it?
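Presumably something like this, assuming the engines honor "none" as shorthand for "noindex, nofollow":
<meta name="robots" content="none" />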
Unfortunately the search engines all seem to do things differently.
Agreeing that "noindex" means what it says would be a start.
But they hate taking no for an answer.