Document will not be archived, and no links on the page will be followed. The purpose of the NOARCHIVE tag is to allow content developers to permit indexing but forbid archiving. If you include:
<META NAME="robots" CONTENT="INDEX,NOFOLLOW">
The above tag tells the robot to index that page only and not to follow its links, so the page itself would be indexed but the links on it would not be followed.
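To make the directive logic concrete, here is a minimal Python sketch of how a crawler might interpret a robots META CONTENT value. It assumes that directives are comma-separated and case-insensitive, that indexing, following and archiving are all allowed unless a matching NO... token appears, and that unrecognized tokens are ignored. The function name `parse_robots_content` is hypothetical; this is an illustration, not any search engine's actual code.

```python
def parse_robots_content(content):
    """Interpret a robots META CONTENT string (hypothetical helper).

    Assumption: everything is permitted by default and each NO* token
    switches off exactly one behaviour. Positive tokens such as INDEX or
    FOLLOW only restate the defaults; unknown tokens are ignored.
    """
    rules = {"index": True, "follow": True, "archive": True}
    for token in content.split(","):
        token = token.strip().lower()
        if token in ("noindex", "nofollow", "noarchive"):
            rules[token[2:]] = False  # "nofollow" -> "follow"
    return rules


# The tag discussed above: <META NAME="robots" CONTENT="INDEX,NOFOLLOW">
print(parse_robots_content("INDEX,NOFOLLOW"))
# → {'index': True, 'follow': False, 'archive': True}
```

Under these assumptions, the same function reports `{'index': False, 'follow': False, 'archive': True}` for `NOINDEX,NOFOLLOW`.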
The NOARCHIVE tag has been discussed here before, and it was mentioned that it may raise an eyebrow with some because it could be misconstrued as hiding content. Since Google will not show a cached version of a page that carries this tag, such a page comes under scrutiny.
As for Google following links despite the above tags, I would think Googlebot got in from elsewhere, because it usually obeys the directives.
I personally don't think there is enough support for these META tags to use them safely. A robots.txt file would be more appropriate and provide a little more comfort, although it's still not a perfect solution.
<META NAME="ROBOTS" CONTENT="NOARCHIVE, INDEX">
That would tell the spider to:
1. Not archive
2. Not follow links
3. Index that page only
It is possible that the presence of those two META tags is confusing the spider. My understanding is that if the tag is not set up properly, the robot will ignore it and do what it normally does.
[edited by: pageoneresults at 1:18 am (utc) on July 31, 2002]
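For illustration only, here is a minimal Python sketch (standard-library `html.parser`) of one way a crawler could combine several robots META tags found on the same page, such as the two discussed above. The merge rule it uses (any NO... token in any robots tag applies, and misspelled or unknown words are silently ignored) is an assumption, not documented behaviour of Googlebot or any other spider; `RobotsMetaParser` and `effective_rules` are hypothetical names.

```python
from html.parser import HTMLParser

DIRECTIVES = ("noindex", "nofollow", "noarchive")


class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            self.contents.append(attrs.get("content") or "")


def effective_rules(html):
    """Merge all robots META tags; any NO* token in any tag wins (assumption)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    rules = {"index": True, "follow": True, "archive": True}
    for content in parser.contents:
        for token in content.split(","):
            token = token.strip().lower()
            if token in DIRECTIVES:
                rules[token[2:]] = False
            # Anything else (typos, unknown words) is ignored.
    return rules


page = """<head>
<META NAME="ROBOTS" CONTENT="NOARCHIVE, INDEX">
<META NAME="robots" CONTENT="INDEX,NOFOLLOW">
</head>"""
print(effective_rules(page))
# → {'index': True, 'follow': False, 'archive': False}
```

A real crawler may resolve conflicting or malformed tags differently; the point is only that, under this rule, two tags add up rather than cancel each other out.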
I would put
<META NAME="robots" CONTENT="NOINDEX,NOFOLLOW">
on pages you don't want indexed. Google can still reach those pages by following links from any other page on the net, and it can also find all pages at the root level of a site. You can read all about the robots META tag here [searchengineworld.com].
You can also exclude them with the robots.txt [searchengineworld.com] file.
You can also use the page exclusion tool [google.com] to ask Google to remove those pages from its index.
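The robots.txt route mentioned above can be checked locally with Python's standard `urllib.robotparser` module. The sketch below assumes a hypothetical robots.txt that keeps every crawler out of /private/ and additionally keeps Googlebot out of /drafts/; the host name, paths and the crawler name "SomeOtherBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Googlebot-specific record plus a catch-all record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/drafts/page.html"),
    ("SomeOtherBot", "/drafts/page.html"),
    ("SomeOtherBot", "/private/page.html"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, "http://www.example.com" + path)
    print(f"{agent:13} {path:20} {'allowed' if allowed else 'blocked'}")
```

Googlebot matches its own record, so /drafts/ is blocked for it, while other crawlers fall back to the `*` record and may fetch /drafts/ but not /private/.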
I originally posted this...
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Document will not be archived, and no links on the page will be followed. The purpose of the NOARCHIVE tag is to allow content developers to permit indexing but forbid archiving.
Google says this...
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
This tag will tell robots not to archive the page. Google will continue to index and follow links from the page, but will not present cached material to users.
If you want to allow other robots to archive your content, but prevent Google's robots from caching, you can use the following tag:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
Note that the change will occur the next time Google crawls the page containing the NOARCHIVE tag (typically at least once per month). If you want the change to take effect sooner than this, the site owner must contact us and request immediate removal of archived content. Also, the NOARCHIVE directive only controls whether the cached page is shown. To control whether the page is indexed, use the NOINDEX tag; to control whether links are followed, use the NOFOLLOW tag. See the Robots Exclusion page for more information.
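Google's description above boils down to two rules: NOARCHIVE affects only the cached copy (indexing and link following are untouched), and a tag named GOOGLEBOT is read only by Google's crawler while NAME="robots" is read by all robots. The minimal Python sketch below models just those two rules under that reading; it is not Google's code, and `MetaCollector`, `rules_for` and the crawler name "otherbot" are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Records (name, content) for every <meta> tag that has a NAME attribute."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name"):
                self.metas.append((attrs["name"].lower(), attrs.get("content") or ""))


def rules_for(html, crawler):
    """Effective rules for one crawler (hypothetical helper).

    NAME="robots" applies to every crawler; a tag named after the crawler
    (e.g. NAME="GOOGLEBOT") applies only to that crawler. Each NO* token
    switches off exactly one behaviour and leaves the others alone.
    """
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    rules = {"index": True, "follow": True, "archive": True}
    for name, content in collector.metas:
        if name not in ("robots", crawler.lower()):
            continue  # tag addressed to some other crawler
        for token in content.split(","):
            token = token.strip().lower()
            if token in ("noindex", "nofollow", "noarchive"):
                rules[token[2:]] = False
    return rules


page = '<head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head>'
print(rules_for(page, "googlebot"))  # → {'index': True, 'follow': True, 'archive': False}
print(rules_for(page, "otherbot"))   # → {'index': True, 'follow': True, 'archive': True}
```

Under this model, NOARCHIVE on its own never turns off indexing or link following; only NOINDEX and NOFOLLOW do.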
Can anyone give a definitive answer for the NOARCHIVE directive?