Is Noarchive, No Index, No Follow bad?

Jill

1:02 am on Jul 31, 2002 (gmt 0)

10+ Year Member



I have a friend who has this in their meta tags:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="robots" CONTENT="INDEX,NOFOLLOW">

This apparently did not stop Google from indexing the page or the pages it links to, since they do show some PageRank (PR). What would these tags do or prevent, if anything?

pageoneresults

1:09 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Document will not be archived, and no links on the page will be followed. The purpose of the NOARCHIVE tag is to allow content developers to permit indexing but forbid archiving. If you include:

<META NAME="robots" CONTENT="INDEX,NOFOLLOW">

The above tag is telling the robot to index that page only and not to follow links.

A page carrying both tags would get indexed, but the links on it would not be followed.
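To make this concrete, here is a minimal Python sketch that pulls the robots META directives out of a page and reports whether indexing, link-following, and caching are allowed. It assumes a crawler pools the directives from every robots META tag on the page, and it treats NOARCHIVE as affecting only the cached copy (Google's reading, as quoted later in this thread). `robots_meta` and `RobotsMetaParser` are hypothetical names, and only NOINDEX, NOFOLLOW, and NOARCHIVE are checked.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT tokens of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # Assumption: directives from all robots META tags are pooled.
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    d = parser.directives
    # Without an explicit "no" directive, a crawler indexes, follows, and caches.
    return {
        "index": "noindex" not in d,
        "follow": "nofollow" not in d,
        "archive": "noarchive" not in d,
    }
```

Fed Jill's two tags, it reports the page as indexable but neither followed nor cached.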

The NOARCHIVE tag has been discussed here before, and it was mentioned that it will raise an eyebrow with some, since it could be misconstrued as hiding content. Because Google will not show a cached version of a page that has this tag, such pages come under scrutiny.

As for Google following links despite the above tags, I would think Googlebot got in through links elsewhere, because it usually obeys the directives.

I personally don't think there is enough support for these META tags to use them safely. A robots.txt file would be more appropriate and provide a little more comfort. It's still not a perfect solution.

pageoneresults

1:17 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I believe those two tags can be combined and read like this...

<META NAME="ROBOTS" CONTENT="NOARCHIVE, INDEX, NOFOLLOW">

That would tell the spider to:

1. Not archive
2. Not follow links
3. Index that page only

It may be possible that the presence of those two META tags is confusing the spider. My understanding is that if the tag is not set up properly, the robot will ignore it and do what it normally does.
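Combining the tags amounts to merging their comma-separated directives. Below is a minimal Python sketch, assuming crawlers treat several robots META tags as the union of their directives. `combine_robots_meta` is a hypothetical helper; it also flags contradictory pairs such as INDEX with NOINDEX, the kind of setup that might leave a spider guessing.

```python
def combine_robots_meta(contents):
    """Merge several robots META CONTENT strings into a single tag.

    Assumption: crawlers read multiple robots META tags as the union of
    their directives; this helper only reproduces that assumption.
    """
    seen = []
    for content in contents:
        for token in content.split(","):
            token = token.strip().upper()
            if token and token not in seen:
                seen.append(token)  # keep first-seen order, drop duplicates
    # Contradictory pairs may be resolved unpredictably by a crawler.
    conflicts = [
        (yes, no)
        for yes, no in (("INDEX", "NOINDEX"), ("FOLLOW", "NOFOLLOW"))
        if yes in seen and no in seen
    ]
    tag = '<META NAME="ROBOTS" CONTENT="{}">'.format(", ".join(seen))
    return tag, conflicts
```

For example, `combine_robots_meta(["NOARCHIVE", "INDEX,NOFOLLOW"])` returns the single tag `<META NAME="ROBOTS" CONTENT="NOARCHIVE, INDEX, NOFOLLOW">` with an empty conflict list.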

[edited by: pageoneresults at 1:18 am (utc) on July 31, 2002]

Macguru

1:18 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Jill,

I would put

<META NAME="robots" CONTENT="NOINDEX,NOFOLLOW">

on pages you don't want indexed. Google can follow links from any other page on the net to find those pages. It can also find all pages at the root level of a site. Read all about the robots meta tag here [searchengineworld.com].

You can also exclude them with the robots.txt [searchengineworld.com] file.
You can also use the page exclusion tool [google.com] to ask Google to remove those pages from its index.
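For the robots.txt route, Python's standard `urllib.robotparser` can show which URLs a compliant robot would skip. The rules, paths, and the `allowed` helper below are hypothetical. Note that a robot that finds a record naming it ignores the `*` record, which is why `/private/` is repeated under Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every robot out of /private/, and keep
# Googlebot out of /drafts/ as well. Googlebot reads only its own record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/
Disallow: /private/

User-agent: *
Disallow: /private/
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

robots.txt controls which URLs a compliant robot fetches; it does not by itself remove pages already in an index, which is what the exclusion tool is for.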

pageoneresults

1:21 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmmm, it's been a little while since I've read that page, and I'm wondering whether NOARCHIVE should be added to the list of the four current directives. Or does NOARCHIVE need to reside in a META tag of its own to function?

Jill

1:41 am on Jul 31, 2002 (gmt 0)

10+ Year Member



Thanks, I was just curious because they certainly want the whole site indexed. I don't think they had a clue when they stuck those meta tags up there. I appreciate the information.

Marcia

1:42 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can see using noarchive on pages or sites that change a lot, so cached copies won't end up outdated (or for a couple of other reasons). But in a couple of categories I watch, some noarchive sites just showed up; they were the first ones I went to check out, and they sure were eyebrow raisers :)

pageoneresults

4:17 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some conflicting information...

I originally posted this...

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

Document will not be archived, and no links on the page will be followed. The purpose of the NOARCHIVE tag is to allow content developers to permit indexing but forbid archiving.

Google says this...

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag will tell robots not to archive the page. Google will continue to index and follow links from the page, but will not present cached material to users.

If you want to allow other robots to archive your content, but prevent Google's robots from caching, you can use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Note that the change will occur the next time Google crawls the page containing the NOARCHIVE tag (typically at least once per month). If you want the change to take effect sooner than this, the site owner must contact us and request immediate removal of archived content. Also, the NOARCHIVE directive only controls whether the cached page is shown. To control whether the page is indexed, use the NOINDEX tag; to control whether links are followed, use the NOFOLLOW tag. See the Robots Exclusion page for more information.
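The GOOGLEBOT variant works by changing the META name rather than the directive. A minimal Python sketch of how a crawler might pick the tags addressed to it, assuming it honors tags named "robots" plus tags carrying its own name and ignores tags aimed at other crawlers. `directives_for` and "SomeOtherBot" are hypothetical.

```python
def directives_for(crawler, meta_tags):
    """Return the robots directives a given crawler would read.

    meta_tags is a list of (name, content) pairs taken from <META> tags.
    Assumption: a crawler honors tags named "robots" plus tags carrying
    its own name (e.g. "googlebot"), and ignores tags for other crawlers.
    """
    wanted = {"robots", crawler.lower()}
    found = set()
    for name, content in meta_tags:
        if name.lower() in wanted:
            found.update(t.strip().lower() for t in content.split(",") if t.strip())
    return found
```

With only `<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">` on a page, Googlebot sees `noarchive` while another crawler sees no directives at all. Per the Google text above, that affects only whether a cached copy is shown, not indexing or link-following.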

Can anyone give a definitive answer for the NOARCHIVE directive?

Marcia

4:25 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>a definitive answer

For Google, this is as definitive as it gets:

[google.com...]

pageoneresults

4:53 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Marcia, I saw that. So NOARCHIVE has different meanings for different robots, or at least for Google.

Hey, did you notice they have incorrect syntax on that page...

User-Agent: Googlebot
Disallow: /*.gif$

The A in Agent should be lower case. The robots.txt file is case sensitive.
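The `/*.gif$` rule in Google's example relies on pattern matching that Googlebot supports but the original robots.txt standard did not define. Below is a minimal Python sketch of that matching, assuming `*` matches any run of characters and a trailing `$` anchors the rule to the end of the path; `rule_matches` is a hypothetical helper, and robots without this extension would read the rule as a literal prefix. On the case question, parsers differ: Python's `urllib.robotparser`, for one, lowercases field names, so it reads "User-Agent" and "User-agent" alike, while URL paths are still compared case-sensitively there and in this sketch.

```python
import re


def rule_matches(pattern, path):
    """Match a URL path against a robots.txt Allow/Disallow pattern.

    Sketch of the wildcard extension: '*' matches any run of characters
    and a trailing '$' anchors the pattern to the end of the path.
    Otherwise the pattern is a plain, case-sensitive prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None
```

So `Disallow: /*.gif$` would block `/images/logo.gif` but not `/images/logo.gif?size=2`.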

Marcia

5:40 am on Jul 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>The A in Agent should be lower case.

Hehe. I know. I copied and pasted it out of laziness to exclude a directory, and it flunked the validator.