Correct Robots Meta Syntax

Am I doing this correctly?

GaryK

5:13 am on Oct 3, 2006 (gmt 0)

Hi everyone. I think it's my first post in this forum. I usually hang out in Spider ID. Thanks in advance for reading about the problem I'm having and hopefully offering a solution.

<meta name="robots" content="noarchive">
<meta name="robots" content="index,follow">

I have been using the above HTML for years and never gave it a second thought until someone I respect told me today that it's wrong, even though it validates.

I don't want bots to archive any of my pages so the first tag is always constant.

The second tag changes according to the page. For example, I don't always want to use index,follow.

That's why I started using two tags.

What's the best way to deal with this problem please? Is the below markup acceptable?

Again, thanks for any help you all can offer. :)

RonPK

2:01 pm on Oct 3, 2006 (gmt 0)

Yeah, /me thinks you should combine them in one tag.

'index' and 'follow' are default values: spiders will always index pages and follow links, unless either a robots.txt or a meta-tag specifies otherwise. So you could leave out those values.

penders

2:19 pm on Oct 3, 2006 (gmt 0)

<meta name="robots" content="noarchive">
<meta name="robots" content="index,follow">

If I was a search engine, what would I do? If I found two <meta name="description"... tags, what would I do?! Am I expected to combine them?! Ignore the 2nd one? Override the 1st with the 2nd?

My guess is that they would *not* be combined. One or other would be ignored in my opinion. Or is the robots meta tag a special case?

And like RonPK says, 'index, follow' is the default action so does not need to be explicitly stated. The robots meta tag is only required if you want to restrict the spider's behaviour: noarchive, noindex, nofollow...

GaryK

3:20 pm on Oct 3, 2006 (gmt 0)

The second tag changes according to the page. For example, I don't always want to use index,follow.

Respectfully, I think you all missed the above line in my message. :)

I do not always use the default settings which is why this problem exists for me in the first place.

With that in mind how would you approach this problem?

This seems like the most logical way to do it but I am not sure it's valid. Is it? Thank you.

RonPK

3:49 pm on Oct 3, 2006 (gmt 0)

I don't always want to use index,follow

You never need to use index, follow.

<meta name="robots" content="noarchive,index,follow">

is a bad example as

<meta name="robots" content="noarchive">

will have the same effect.

<meta name="robots" content="noarchive,noindex,nofollow">

is a better example of a valid tag with multiple arguments in the content attribute.
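
For anyone generating these tags from a template, here is a minimal Python sketch of the "one tag" approach: a site-wide noarchive is merged with whatever a given page needs, and the default index/follow values are dropped. The function name build_robots_meta and its always parameter are made up for illustration; nobody in this thread posted such code.

```python
from html import escape

# Values crawlers assume when nothing is specified, as noted in this thread.
DEFAULT_DIRECTIVES = {"index", "follow"}


def build_robots_meta(page_directives, always=("noarchive",)):
    """Return a single robots meta tag combining site-wide and per-page values.

    Hypothetical helper: 'always' holds directives applied to every page.
    """
    combined = []
    for directive in list(always) + list(page_directives):
        name = directive.strip().lower()
        # Skip blanks, default values, and anything already listed.
        if not name or name in DEFAULT_DIRECTIVES or name in combined:
            continue
        combined.append(name)
    content = ",".join(combined)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


# A page that would otherwise use index,follow only needs noarchive.
print(build_robots_meta(["index", "follow"]))
# A page that should be kept out of the index entirely.
print(build_robots_meta(["noindex", "nofollow"]))
```

Dropping the defaults is a style choice rather than a requirement; leaving them in the tag should be harmless, as the posts above point out.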

penders

4:15 pm on Oct 3, 2006 (gmt 0)

The second tag changes according to the page. For example, I don't always want to use index,follow.

Respectfully, I think you all missed the above line in my message. :)

It wasn't missed... it was just the fact that you had a 2nd line in the first place, regardless of content. But you did use 'index,follow' in your example. :)

<meta name="robots" content="noarchive,index,follow">

would be 'technically' valid (I guess), but as RonPK says is unnecessary (don't mean to repeat).

As far as I understand it, the content="" attribute of the robots meta tag can take an arbitrary number of comma-separated 'flags' that tell the search robots what to do. It is not a fixed singular phrase like "index,follow" (that is two 'flags': "index" and "follow"). There are several other 'flags', like "noimageindex" and "noimageclick", but these are more search engine specific.
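
To make the "comma-separated flags" idea concrete, here is a rough Python sketch of how a robot might read the content value. It is an illustration only, not how any particular search engine is known to work. The names parse_robots_content and effective_policy are hypothetical, and the sketch assumes that anything not explicitly restricted is allowed.

```python
def parse_robots_content(content):
    """Split a robots content value into a set of lowercase flags."""
    return {flag.strip().lower() for flag in content.split(",") if flag.strip()}


def effective_policy(content):
    """Resolve the flags to what the robot may do.

    Assumption: each action is permitted unless its 'no...' flag is present,
    so 'index' and 'follow' add nothing beyond the defaults.
    """
    flags = parse_robots_content(content)
    return {
        "index": "noindex" not in flags,
        "follow": "nofollow" not in flags,
        "archive": "noarchive" not in flags,
    }


# Both values resolve to the same policy, which is why the defaults can go.
print(effective_policy("noarchive,index,follow") == effective_policy("noarchive"))
print(effective_policy("noarchive,noindex,nofollow"))
```

Under these assumptions the flags behave like a set: order and letter case do not matter, and unknown engine-specific flags are simply carried along without changing the three basic answers.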

lmo4103

4:18 pm on Oct 3, 2006 (gmt 0)

<meta name="robots" content="noarchive,index,follow">

I don't see anything incorrect about this tag, even though index,follow could be omitted.

Google, MSN, and Yahoo all document a "noarchive" value for the content attribute.

However, MSN only specifically documents <meta name="msnbot" content="noarchive">
How I wish their documentation were complete and completely trustworthy.

If you try the tag on some "guinea pig" pages and it performs correctly, then you'll know. If the two tags perform exactly as you desire, I see no reason to change. (It ain't broke, so fix it -- NOT!)

Good old webmasterworld must be using some "noarchive" because there is no cache option in the SERPs.

penders

4:33 pm on Oct 3, 2006 (gmt 0)

Good old webmasterworld must be using some "noarchive" because there is no cache option in the SERPs.

I just had a look at the source of this very page, and...

<META NAME="ROBOTS" CONTENT="NOINDEX">

How does that work then...?

This prompted a quick look at the robots.txt file, and...

User-agent: * 
Disallow: /

Eh?! WebmasterWorld is clearly indexed, anyone care to explain...?

Tastatura

4:50 pm on Oct 3, 2006 (gmt 0)

...
This prompted a quick look at the robots.txt file, and...
User-agent: *
Disallow: /
Eh?! WebmasterWorld is clearly indexed, anyone care to explain...?

It's a dynamically generated robots.txt file (well, the commands in the file are). If you look at the file, read the commented-out section at the top.

GaryK

6:14 pm on Oct 3, 2006 (gmt 0)

Brett uses a whitelisting system for robots.txt. Since your user agent isn't on his whitelist you see the dynamically created robots.txt file that disallows everything.
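
A whitelisting robots.txt along these lines could be sketched in Python as follows. This is only a guess at the general technique, not Brett's actual code: the token list, the function name robots_txt_for, and the simple substring match are all assumptions.

```python
# Hypothetical whitelist; the real list used by any site is not given here.
WHITELISTED_AGENT_TOKENS = ("googlebot", "msnbot", "slurp")

# An empty Disallow permits everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Return the robots.txt body chosen by the requesting User-Agent header."""
    agent = (user_agent or "").lower()
    if any(token in agent for token in WHITELISTED_AGENT_TOKENS):
        return ALLOW_ALL
    # Browsers and unknown bots get the version that disallows everything.
    return DENY_ALL


# An ordinary browser sees the "Disallow: /" version quoted earlier.
print(robots_txt_for("Mozilla/5.0 (Windows; U) Firefox/1.5"))
```

User-agent strings are easy to fake, so a real system would probably check more than this header before trusting a visitor that claims to be a search engine.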

Perhaps my meta tag example was a bad one since I used the defaults and I know they can be omitted.

So let's use the following tag as an example:

<meta name="robots" content="noarchive,noindex,nofollow">

I think penders and lmo4103 have stated it's valid. Am I correct? Thanks. :)

RonPK

8:29 pm on Oct 3, 2006 (gmt 0)

<meta name="robots" content="noarchive,noindex,nofollow">
I think penders and lmo4103 have stated it's valid. Am I correct? Thanks. :)

No, I said it too ;)

penders

9:39 pm on Oct 3, 2006 (gmt 0)

Dynamically generated robots.txt file.... ....Brett uses a whitelisting system for robots.txt. Since your user agent isn't on his whitelist you see the dynamically created robots.txt file that disallows everything.

Ah right - I see! (Been to have a closer read...) And I guess, since these pages are dynamically created, then the

<META NAME="ROBOTS" CONTENT="NOINDEX">

is inserted into the page in a similar way!

GaryK

12:32 am on Oct 4, 2006 (gmt 0)

No, I said it too

I'm sorry Ron. You get extra credit points too. ;)

is inserted into the page in a similar way

I don't know all of Brett's secrets but that's certainly how I handle my own whitelisting system.

Thanks folks. You all have been helpful and I appreciate it very much.

lmo4103

1:31 pm on Oct 4, 2006 (gmt 0)

To boldly go where no man has gone before... and let us know how it comes out.

GaryK

3:37 pm on Oct 4, 2006 (gmt 0)

I see you're a fan of the original series. ;)

I'll let you all know how this works out. I'm a bit worried because right now I rank #1 and #2 for my keywords with the major SEs.