<meta name="robots" content="noarchive">
<meta name="robots" content="index,follow">
I have been using the above HTML for years and never gave it a second thought until someone I respect told me today it's wrong even though it validates.
I don't want bots to archive any of my pages, so the first tag never changes.
The second tag changes according to the page. For example, I don't always want to use index,follow.
That's why I started using two tags.
What's the best way to deal with this, please? Is the markup below acceptable?
<meta name="robots" content="noarchive,index,follow">
Again, thanks for any help you all can offer. :)
<meta name="robots" content="noarchive">
<meta name="robots" content="index,follow">
If I were a search engine and found two <meta name="description" ...> tags, what would I do?! Would I be expected to combine them? Ignore the second one? Let the second override the first?
My guess is that they would *not* be combined; in my opinion, one or the other would be ignored. Or is the robots meta tag a special case?
And like RonPK says, 'index, follow' is the default action, so it does not need to be stated explicitly. The robots meta tag is only required if you want to restrict the spiders' behaviour... noarchive, noindex, nofollow...
The second tag changes according to the page. For example, I don't always want to use index,follow.
I do not always use the default settings, which is why this problem exists for me in the first place.
With that in mind how would you approach this problem?
<meta name="robots" content="noarchive,index,follow">
This seems like the most logical way to do it but I am not sure it's valid. Is it? Thank you.
I don't always want to use index,follow
You never need to use index, follow :)
So <meta name="robots" content="noarchive,index,follow"> is a bad example, as <meta name="robots" content="noarchive"> will have the same effect.
<meta name="robots" content="noarchive,noindex,nofollow"> is a better example of a valid tag with multiple arguments in the content attribute.
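To make the "same effect" point concrete, here is a minimal Python sketch that resolves a content string into an effective policy. It assumes index, follow, and archive are on by default and that a "no..." flag switches the matching default off, whatever order the flags appear in. The names DEFAULTS and effective_policy are hypothetical, and a real crawler may treat unknown or conflicting flags differently.

```python
# Hypothetical defaults: a page with no robots meta tag is indexed,
# its links are followed, and it may be cached/archived.
DEFAULTS = {"index": True, "follow": True, "archive": True}


def effective_policy(content: str) -> dict:
    """Resolve a robots meta content string such as "noarchive,index,follow".

    Assumption: "no<x>" turns off default <x>; positive flags like "index"
    only restate a default, so they can never undo a "no..." flag.
    Flags this sketch doesn't know (e.g. "noimageindex") are ignored.
    """
    policy = dict(DEFAULTS)
    for raw in content.split(","):
        flag = raw.strip().lower()
        if flag.startswith("no") and flag[2:] in policy:
            policy[flag[2:]] = False
    return policy


if __name__ == "__main__":
    # Listing the defaults explicitly changes nothing.
    print(effective_policy("noarchive,index,follow") == effective_policy("noarchive"))
    print(effective_policy("noarchive,noindex,nofollow"))
```

Under these assumptions, "noarchive,index,follow" and "noarchive" resolve to the same policy, which is why only the restrictive flags are worth writing out.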
The second tag changes according to the page. For example, I don't always want to use index,follow.
Respectfully, I think you all missed the above line in my message. :)
It wasn't missed... it was just the fact that you had a 2nd line in the first place, regardless of content. But you did use 'index,follow' in your example. :)
<meta name="robots" content="noarchive,index,follow">
As far as I understand it, the content="" attribute of the robots meta tag can take an arbitrary number of comma-separated 'flags' that tell the search robots what to do. It is not a fixed, single phrase like "index,follow" (that is two 'flags': "index" and "follow"). There are several other 'flags', like "noimageindex" and "noimageclick", but these are more search-engine-specific.
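If the content attribute really is just a comma-separated list of flags, the question of one tag versus two comes down to whether a crawler merges every robots tag on the page. The Python sketch below, using the standard library's html.parser, shows one plausible reading: collect the flags from all robots meta tags and merge them. That merging behaviour is an assumption, not documented behaviour of any particular engine, and the names RobotsMetaParser and robots_flags are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect comma-separated flags from every <meta name="robots"> tag.

    Assumption: all robots meta tags on a page are merged into one flag list.
    """

    def __init__(self) -> None:
        super().__init__()
        self.flags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for raw in attributes.get("content", "").split(","):
            flag = raw.strip().lower()
            if flag and flag not in self.flags:
                self.flags.append(flag)


def robots_flags(html: str) -> list[str]:
    """Return the merged robots flags found in an HTML snippet, in order."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.flags


if __name__ == "__main__":
    two_tags = ('<meta name="robots" content="noarchive">\n'
                '<meta name="robots" content="index,follow">')
    one_tag = '<meta name="robots" content="noarchive,index,follow">'
    print(robots_flags(two_tags))
    print(robots_flags(two_tags) == robots_flags(one_tag))
```

Under that assumption, the two-tag and one-tag versions produce identical flags. The single tag is still the safer choice, since it doesn't depend on how a given engine treats duplicate meta tags.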
<meta name="robots" content="noarchive,index,follow">
I don't see anything incorrect about this tag, even though index,follow could be omitted.
Google, MSN, and Yahoo all document "noarchive" in the content attribute.
However, MSN only specifically documents <meta name="msnbot" content="noarchive">
How I wish their documentation were complete and completely trustworthy.
If you try the tag on some "guinea pig" pages and it performs correctly, then you'll know. If the two tags perform exactly as you desire, I see no reason to change. (It ain't broke, so fix it -- NOT!)
Good old webmasterworld must be using some "noarchive" because there is no cache option in the SERPs.
I just had a look at the source of this very page, and...
<META NAME="ROBOTS" CONTENT="NOINDEX">
How does that work then...?
This prompted a quick look at the robots.txt file, and...
User-agent: *
Disallow: /
Eh?! WebmasterWorld is clearly indexed. Anyone care to explain...?
Perhaps my meta tag example was a bad one since I used the defaults and I know they can be omitted.
So let's use the following meta tag as an example:
<meta name="robots" content="noarchive,noindex,nofollow">
I think penders and lmo4103 have stated it's valid. Am I correct? Thanks. :)
Dynamically generated robots.txt file... Brett uses a whitelisting system for robots.txt. Since your user agent isn't on his whitelist, you see the dynamically created robots.txt file that disallows everything.
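A whitelisting robots.txt like the one described above can be sketched as a function that picks the response body based on the requesting user agent. This is a minimal Python illustration, not WebmasterWorld's actual setup. The WHITELIST entries (Googlebot, msnbot, Yahoo's Slurp) and the function name robots_txt_for are assumptions chosen for the example.

```python
# Hypothetical whitelist: lowercase substrings of user-agent strings
# that are allowed to crawl. A real list would be site-specific.
WHITELIST = ("googlebot", "msnbot", "slurp")

# An empty Disallow line permits everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve to the given user agent."""
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in WHITELIST):
        return ALLOW_ALL
    # Anything not on the whitelist (including a browser) sees the
    # disallow-everything version.
    return DENY_ALL


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("Mozilla/5.0 (Windows NT 10.0)"))
```

Because user-agent strings are trivial to fake, a production version would probably also verify that a request claiming to be a search engine crawler really comes from that engine (for example, with a reverse DNS check) before serving the permissive file.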
Ah right, I see! (I've had a closer read...) And I guess, since these pages are dynamically created, the <META NAME="ROBOTS" CONTENT="NOINDEX"> is inserted into the page in a similar way!