Forum Library, Charter, Moderators: phranque

Website Technology Issues Forum

    
noindex, nofollow, noarchive
pageoneresults




msg:3724517
 4:36 pm on Aug 16, 2008 (gmt 0)

<meta name="robots" content="noindex, nofollow, noarchive"></meta>

Can someone tell me if there is anything wrong with the above Robots Meta Element? Would noarchive trump the noindex, nofollow directives? I was always under the impression that the first directive would override any subsequent directives if there is a conflict. Am I correct?

 

wilderness




msg:3724662
 11:49 pm on Aug 16, 2008 (gmt 0)

I was unable to locate any suggestion of the mixing (i.e., addition of noarchive) of approved protocols:

The 1996 protocol for "index and follow".
The 2008 additions for "noarchive, noodp, nosnippet, noydir", which are all specific to certain brands of bots.

Lord Majestic




msg:3724665
 11:55 pm on Aug 16, 2008 (gmt 0)

It's fine, but better to avoid the extra tag </meta>. If you want XML validation, you can do:

<meta name="robots" content="noindex, nofollow, noarchive"/>

pageoneresults




msg:3724717
 2:10 am on Aug 17, 2008 (gmt 0)

"I was unable to locate any suggestion of the mixing (i.e., addition of noarchive) of approved protocols:"

I've not been able to find any specific references to the mixing of noarchive either. I've been following this whole robots thing for years and thought I had it figured out.

I can tell you from experience within the last 7 days that the above does not work. Googlebot ignores the first two directives of noindex, nofollow and defaults to noarchive. How do I know? I just had a page using the above configuration appear in the top ten for its targeted term. I didn't want that page to get indexed and it did. :(

A flaw? You would think that you could use one robots metadata element to achieve the intended directives, yes? I mean, you can mix these four, and a comma-separated list of values is allowed...

About the Robots <META> tag
[robotstxt.org...]

Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW", so there's no need to spell that out.

Robots and the META element
[w3.org...]

The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX. Note. In early 1997 only a few robots implement this, but this is expected to change as more public attention is given to controlling indexing robots.

Take the above and then read this page at Google...

Prevent or remove cached pages
[google.com...]

To prevent all search engines from showing a "Cached" link for your site, place this tag in the <HEAD> section of your page:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Now that I've tested this and have been bitten once, I guess I need to use two robots metadata elements to achieve the desired goal. I just don't understand why I should have to do that. I feel that I listed the values in their proper order, or maybe I didn't. Maybe noarchive should be first in the comma-separated list of values?
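On the ordering question, here is a minimal Python sketch of one reading of the robotstxt.org text quoted above, where the content value is treated as an unordered, comma-separated set. The helper name parse_robots_content is hypothetical, and this is only an assumption about how a parser could interpret the value; it says nothing about what Googlebot actually does.

```python
def parse_robots_content(content):
    """Split a robots content value into a set of lowercase directives.

    Assumption: directives are unordered, so position in the list
    carries no meaning. Real crawlers may behave differently.
    """
    return frozenset(
        token.strip().lower() for token in content.split(",") if token.strip()
    )


original = parse_robots_content("noindex, nofollow, noarchive")
reordered = parse_robots_content("NOARCHIVE, noindex, nofollow")

# Under a set-based reading, both orderings carry the same directives.
print(original == reordered)
print("noindex" in original)
```

If a crawler parsed the value this way, moving noarchive to the front would change nothing, so the ordering alone would not explain the page being indexed.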

"It's fine."

I thought so too. Apparently it is not. :(

"Better to avoid the extra tag </meta>. If you want XML validation, you can do:"

Actually, the </meta> form validates as XHTML. It is an alternative to the /> self-closing syntax. I've been slowly switching over to the /> method as time permits. Both validate just fine. The /> syntax presented some challenges back in the beginning, so I chose the optional </meta> end tag.

I guess moving forward I'll have to do this...

<meta name="robots" content="noindex, nofollow" />
<meta name="robots" content="noarchive" />
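To compare the single-element and split versions locally, a rough Python sketch using the standard library's html.parser can collect the directives from every robots meta element on a page. The class and function names (RobotsMetaCollector, collect_robots_directives) are made up for illustration, and merging multiple elements into one set is an assumption, not a description of any search engine's parser.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect directives from every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> via handle_startendtag.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def collect_robots_directives(html):
    """Return the union of directives across all robots meta elements."""
    parser = RobotsMetaCollector()
    parser.feed(html)
    parser.close()
    return parser.directives


single = '<meta name="robots" content="noindex, nofollow, noarchive" />'
split = (
    '<meta name="robots" content="noindex, nofollow" />\n'
    '<meta name="robots" content="noarchive" />'
)

# Under this merging assumption the two forms are equivalent.
print(collect_robots_directives(single) == collect_robots_directives(split))
```

This only confirms what the markup declares; whether a given crawler honors it still has to be checked against the live index.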

You're probably wondering why I even have the noarchive there with the noindex, nofollow? It's more of a fail-safe to keep the page from being cached in case something goes wrong. Heh! It worked in this instance. I have a top ten position for a product name that I was attempting to avoid. ;)

I'm going to give it another 48-72 hours before breaking them into two. I'd like to see some more input on this. I would think that one element is all that is needed.

Key_Master




msg:3724737
 3:14 am on Aug 17, 2008 (gmt 0)

There shouldn't be a need for a noarchive directive along with a noindex directive. If the page isn't indexed, it shouldn't be listed in the SERPs at all.

In fact, I wouldn't even think of noarchive as a robots directive. It's just a legal umbrella used by certain search engines to offer them some protection from lawsuits over copyright violations. Consider it an option for pages you do want indexed.

Samizdata




msg:3724748
 3:59 am on Aug 17, 2008 (gmt 0)

"I would think that one element is all that is needed"

Ideally that would be content="none", wouldn't it?
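For reference, a small Python sketch of how the shorthand values are commonly described: "none" standing for noindex, nofollow and "all" for index, follow. The SHORTHAND table and expand_robots_value are hypothetical names, and the mapping is an assumption based on that common description rather than on any one engine's behavior.

```python
# Assumed shorthand mapping; individual crawlers may differ.
SHORTHAND = {
    "none": {"noindex", "nofollow"},
    "all": {"index", "follow"},
}


def expand_robots_value(content):
    """Expand shorthand tokens in a robots content value into a set."""
    directives = set()
    for token in content.split(","):
        token = token.strip().lower()
        if not token:
            continue
        # Unknown tokens (e.g. noarchive) pass through unchanged.
        directives |= SHORTHAND.get(token, {token})
    return directives


print(sorted(expand_robots_value("none")))             # ['nofollow', 'noindex']
print(sorted(expand_robots_value("none, noarchive")))  # ['noarchive', 'nofollow', 'noindex']
```

Under this mapping "none" does not include noarchive, so that value would still have to be listed separately.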

Unfortunately the search engines all seem to do things differently.

Agreeing that "noindex" means what it says would be a start.

But they hate taking no for an answer.

...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved