Meta tag question

cbpayne

11:28 pm on May 7, 2004 (gmt 0)

10+ Year Member



Should it be:
<meta name="robots" content="all=index,follow">

or

<meta name="robots" content="all,index,follow">

encyclo

12:12 am on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Neither ;)

<meta name="robots" content="index,follow">

You can specify the values "index" "noindex" "follow" and "nofollow".

All major search engine bots index and follow by default (otherwise they'd never index anything, because 99% of the web doesn't specify a robots.txt file or robots meta tag). So, in your case, if you want it to be indexed, you don't need the meta tag at all.

What's more, having a meta tag with a syntax error like the two versions you gave is increasing the risk of confusing the bot, and it may decide not to index to be on the safe side.
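
As a rough illustration of the point above, here is a minimal Python sketch that flags tokens in a robots meta `content` value that are not among the four values listed in this post. The function name `invalid_robots_tokens` and the `ALLOWED` set are made up for this example; it is not how any particular search engine parses the tag.

```python
# Values named in the post above; treat this list as an assumption, not a full spec.
ALLOWED = frozenset({"index", "noindex", "follow", "nofollow"})


def invalid_robots_tokens(content, allowed=ALLOWED):
    """Return the comma-separated tokens in `content` that are not in `allowed`."""
    tokens = [t.strip().lower() for t in content.split(",")]
    return [t for t in tokens if t not in allowed]


print(invalid_robots_tokens("all=index,follow"))  # ['all=index']
print(invalid_robots_tokens("all,index,follow"))  # ['all']
print(invalid_robots_tokens("index,follow"))      # []
```

Passing a different `allowed` set (for example one that also contains "all" and "none") changes what counts as invalid.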

iamlost

12:15 am on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<meta name="Robots" content="all, index, follow">

is correct: you want all robots, you want them to index and you want them to follow links.

cbpayne

12:25 am on May 8, 2004 (gmt 0)

10+ Year Member



THANKS.

The reason I asked is that a lot of sites that appear to have been banned from Google, possibly because they were caught using the "advertising pages" from a certain company, all had this version in their meta tags:

<meta name="robots" content="all=index,follow">

What would the consequences of using this tag be? Could this be the reason Google dropped them?

encyclo

12:25 am on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To clarify, "all" is the same as "index,follow", and "none" is the same as "noindex,nofollow".

iamlost - the syntax you gave is wrong - you either need "all" or "index,follow", not both, and there should be no space after the comma.

What would the consequences of using this tag be? Could this be the reason Google dropped them?

An invalid tag is worse than no tag. If googlebot wasn't sure, or was confused by the "=" sign (which has absolutely no place there), it may have chosen not to index as a precaution. As I said, if you want pages to be indexed, remove all the meta tags completely - that way, you won't need to worry.

[edited by: encyclo at 12:28 am (utc) on May 8, 2004]

iamlost

12:26 am on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



encyclo's answer and mine agree that "all" is the default - but there is no harm in including a default term.

From:
[w3.org ]


ROBOTS meta-tag

<META NAME="ROBOTS"
CONTENT="ALL | NONE | NOINDEX | NOFOLLOW">

default = empty = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"

The filler is a comma separated list of terms:
ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

Discussion: This tag is meant to provide users who cannot control
the robots.txt file at their sites. It provides a last chance to
keep their content out of search services. It was decided not to
add syntax to allow robot specific permissions within the meta-tag.

INDEX means that robots are welcome to include this page in
search services.

FOLLOW means that robots are welcome to follow links from this
page to find other pages.

So a value of "NOINDEX" allows the subsidiary links to be explored,
even though the page is not indexed. A value of "NOFOLLOW" allows the
page to be indexed, but no links from the page are explored (this may
be useful if the page is a free entry point into pay-per-view content,
for example). A value of "NONE" tells the robot to ignore the page.
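
The semantics in the quoted text can be sketched in Python as a function that turns a `content` string into a pair of booleans (index, follow). The name `resolve_robots` is hypothetical, and two behaviours here are assumptions rather than anything the quoted text states: later terms override earlier ones, and unknown terms are skipped.

```python
def resolve_robots(content=""):
    """Map a robots meta content string to (index, follow) booleans."""
    index, follow = True, True  # default = empty = "ALL"
    for term in content.split(","):
        term = term.strip().upper()
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":  # "NONE" = "NOINDEX, NOFOLLOW"
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Assumption: unknown or empty terms are skipped; real robots may differ.
    return index, follow


print(resolve_robots())            # (True, True)
print(resolve_robots("NOINDEX"))   # (False, True)
print(resolve_robots("NOFOLLOW"))  # (True, False)
print(resolve_robots("NONE"))      # (False, False)
```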

ytswy

12:44 am on May 8, 2004 (gmt 0)

10+ Year Member



Just to agree with encyclo:

If you want a page indexed, just remove all robots meta tags - I really don't think that explicit follow,index or whatever is likely to have any greater effect at all.

A bit of anecdotal evidence: I recently removed the noindex tag from a page that had had it for about a year and a half - 3 days later it was in Google and has been since.

HarryM

1:03 am on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have been reading this with interest because I am using the following on pages I want to keep bot free.

<meta name="robots" content="noindex,noarchive,nofollow">

You don't mention archive/noarchive - have I got it wrong?

Second question: is there a standard order for the syntax?

iamlost

10:45 pm on May 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



archive/noarchive are not valid. index/noindex are valid and accomplish the same - the search engine "indexes" (archives) the page for future reference.

I quite agree with encyclo that the robots meta tag is technically unnecessary if you want your page indexed and your links followed, as normal default bot behaviour is to index and follow everything. Leaving it out saves a tiny bit of bandwidth - all those bits can add up something awful on a large popular site! Also, many/most bots simply ignore that meta tag anyway.

Some people - like myself! - who started way back when meta tags were new awesome control features have a hard time taking them out. Some - like me - like to include "defaults" due to unhappy experiences where expected default behaviour has been ignored if not made explicit. Some - me! - like to use the tag as a record of what is wanted - along with author, etc.

iamlost - the syntax you gave is wrong - you either need "all" or "index,follow", not both, and there should be no space after the comma.

Your syntax correction is quite right - thank you.

There is no specification that I am aware of that requires "no space" after the comma. Indeed, all the examples I could find at w3c, as well as other informative sites and textbooks, and my own site experience use the space for clarity of reading code without bot penalty (as long as it is within the quote marks).
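
To show why the space need not matter to a tolerant reader of the tag, here is a small Python sketch using the standard library's `html.parser`. The class `RobotsMetaParser` and the helper `robots_directives` are invented for this example; the sketch only demonstrates that a parser which trims whitespace and lowercases terms sees both spellings identically. It says nothing about what any actual bot does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the terms from every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Match the name case-insensitively, so "Robots" and "robots" are equal.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Trim whitespace around each term and drop empty terms.
            self.directives.extend(
                t.strip().lower() for t in content.split(",") if t.strip()
            )


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


print(robots_directives('<meta name="Robots" content="index, follow">'))
print(robots_directives('<meta name="robots" content="index,follow">'))
```

Both calls yield the same list of terms, with or without the space after the comma.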

HarryM

12:54 pm on May 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



archive/noarchive are not valid

I'm still confused.

Google says it can be used to stop Google caching, and it is quoted without adverse comment in other threads.

Are you saying it's 'invalid' because it is not mentioned in W3C standards and therefore not a standard? Does it make other bots choke?

I take the point it is redundant if noindex is specified.

DaveAtIFG

5:39 pm on May 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AFAIK, the Robots Exclusion Protocol is a convention that most SEs follow, but it is not an adopted W3C standard.

Google has "trained" Googlebot to recognize "noarchive," see item B2 at [google.com...]

I think Yahoo offers a similar mechanism to prevent caching of pages, don't recall where that information is though. Anyone?