noindex, nofollow, noarchive

Forum Moderators: phranque

pageoneresults

4:36 pm on Aug 16, 2008 (gmt 0)

<meta name="robots" content="noindex, nofollow, noarchive"></meta>

Can someone tell me if there is anything wrong with the above Robots Meta Element? Would noarchive trump the noindex, nofollow directives? I was always under the impression that the first directive will override any subsequent directives if there is a conflict. Am I correct?

wilderness

11:49 pm on Aug 16, 2008 (gmt 0)

I was unable to locate any suggestion of the mixing (i.e., addition of noarchive) of approved protocols:

The 1996 protocol for "index and follow".
The 2008 additions for "noarchive, noodp, nosnippet, noydir", which are all specific to certain brands of bots.

Lord Majestic

11:55 pm on Aug 16, 2008 (gmt 0)

It's fine - better avoid having extra tag though: </meta>, if you want XML validation you can do:

pageoneresults

2:10 am on Aug 17, 2008 (gmt 0)

I was unable to locate any suggestion of the mixing (i.e., addition of noarchive) of approved protocols:

I've not been able to find any specific references to the mixing of noarchive either. I've been following this whole robots thing for years and thought I had it figured out.

I can tell you from experience within the last 7 days that the above does not work. Googlebot ignores the first two directives of noindex, nofollow and defaults to noarchive. How do I know? I just had a page using the above configuration appear in the top ten for its targeted term. I didn't want that page to get indexed and it did. :(

A flaw? You would think that you could use one robots metadata element to achieve the intended directives, yes? I mean, you can mix these four, and a comma-separated list of values is allowed...

About the Robots <META> tag
[robotstxt.org...]

Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW", so there's no need to spell that out.
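To make the quoted rule concrete, here is a minimal sketch of how a comma-separated content value could be split into individual directives. The function name parse_robots_content is hypothetical, and the case-insensitive handling is an assumption based on the mixed upper and lower case used in the examples in this thread. It does not model how any particular search engine resolves the directives.

```python
def parse_robots_content(content=None):
    """Return the set of lowercase directives found in a robots meta content value.

    Hypothetical helper for illustration only.
    """
    if content is None:
        # No robots meta tag at all: the quoted default is "INDEX,FOLLOW".
        return {"index", "follow"}
    # Split on commas, trim whitespace, and normalise case (assumed case-insensitive).
    return {token.strip().lower() for token in content.split(",") if token.strip()}


directives = parse_robots_content("noindex, nofollow, noarchive")
print(sorted(directives))
```

Read this way, order carries no meaning: the value is just a set of flags, so "first directive wins" would have to be a crawler-specific behavior rather than something the format implies.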

Robots and the META element
[w3.org...]

The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX. Note. In early 1997 only a few robots implement this, but this is expected to change as more public attention is given to controlling indexing robots.

Take the above and then read this page at Google...

Prevent or remove cached pages
[google.com...]

To prevent all search engines from showing a "Cached" link for your site, place this tag in the <HEAD> section of your page:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Now that I've tested this and have been bitten once, I guess I need to utilize two robots metadata elements to achieve the desired goal. I just don't understand why I should have to do that. I feel that I listed the elements in their proper order, or maybe I didn't. Maybe noarchive should be first in the comma separated list of values?

It's fine.

I thought so too. Apparently it is not. :(

Better avoid having extra tag though: </meta>, if you want XML validation you can do:

Actually, the </meta> closing tag validates as XHTML. It is an alternative to the /> self-closing syntax. I've been slowly switching over to the /> method as time permits. Both validate just fine. The /> syntax presented some challenges back in the beginning, so I chose the optional </meta> closing tag.

I guess moving forward I'll have to do this...

<meta name="robots" content="noindex, nofollow" />

<meta name="robots" content="noarchive" />

You're probably wondering why I even have the noarchive there with the noindex, nofollow? It's more of a fail safe to keep the page from being cached in case something goes wrong. Heh! It worked in this instance. I have a top ten position for a product name that I was attempting to avoid. ;)

I'm going to give it another 48-72 hours before breaking them into two. I'd like to see some more input on this. I would think that one element is all that is needed.
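Whether one element or two makes any difference depends on how a given crawler merges directives, which none of the pages quoted above spell out. The following is a minimal sketch, assuming a hypothetical crawler that simply takes the union of every robots meta element on the page. It is not a description of how Googlebot behaves, and RobotsMetaCollector and collect_directives are illustrative names only.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from every <meta name="robots"> element (assumed union semantics)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def collect_directives(html):
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    return collector.directives


single = '<meta name="robots" content="noindex, nofollow, noarchive"></meta>'
split = (
    '<meta name="robots" content="noindex, nofollow" />\n'
    '<meta name="robots" content="noarchive" />'
)
print(collect_directives(single) == collect_directives(split))
```

Under this union reading the combined element and the two separate elements yield the same set of directives, so any difference seen in practice would come from the crawler's own handling rather than from the markup.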

Key_Master

3:14 am on Aug 17, 2008 (gmt 0)

There shouldn't be a need for a noarchive directive along with a noindex directive. If the page isn't indexed, it shouldn't be listed in the serps at all.

In fact, I wouldn't even think of noarchive as a robots meta directive. It's just a legal umbrella used by certain search engines to offer them some protection from lawsuits over copyright violations. Consider it an option for pages you do want indexed.

Samizdata

3:59 am on Aug 17, 2008 (gmt 0)

I would think that one element is all that is needed

Ideally that would be content="none", wouldn't it?

Unfortunately the search engines all seem to do things differently.

Agreeing that "noindex" means what it says would be a start.

But they hate taking no for an answer.
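For reference, the "none" value is conventionally read as shorthand for "noindex, nofollow", and "all" as shorthand for "index, follow". A rough sketch of that expansion follows, with the caveat that ALIASES and expand_directives are hypothetical names and that individual search engines may not honor the shorthand identically.

```python
# Shorthand values and the directives they are conventionally taken to mean.
ALIASES = {
    "none": {"noindex", "nofollow"},
    "all": {"index", "follow"},
}


def expand_directives(content):
    """Expand shorthand values in a robots meta content string into plain directives."""
    expanded = set()
    for token in content.split(","):
        token = token.strip().lower()
        if not token:
            continue
        # Unknown tokens (for example "noarchive") pass through unchanged.
        expanded |= ALIASES.get(token, {token})
    return expanded


print(sorted(expand_directives("none, noarchive")))
```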

...