The Robots META Tag




Googlebot?

pageoneresults

12:41 pm on Jul 15, 2004 (gmt 0)

Can someone help me out here?

What is this tag?

<meta name="googlebot" content="index, follow">

A reference to the area on Google where the above tag is described and suggested would be appreciated.

In addition to the above, why do people utilize this Robots META Tag?

<meta name="robots" content="index, follow"> or <meta name="robots" content="all">

brakthepoet

8:48 pm on Jul 15, 2004 (gmt 0)

>><meta name="googlebot" content="index, follow">
>>reference to the area on Google

[google.com...]

Well, there's the name="googlebot" portion. I've only seen it listed on Google as a method to disallow Googlebot from crawling or indexing. As to the "index, follow": possibly over-zealous webmasters who are hoping that explicitly allowing Googlebot will improve their ranking or time between crawls?

pageoneresults

9:28 pm on Jul 15, 2004 (gmt 0)

lol, I was just waiting for someone to take the bait. ;)

>> Possibly over-zealous webmasters who are hoping that explicitly allowing Googlebot will improve their ranking or time between crawls? <<

Now we are getting somewhere. There is a whole new breed of SEOs coming into the marketplace who seem to want to create their own set of erroneous metadata, the "index, follow" for Googlebot being one of them.

The myth now begins. I've seen over 100 instances of that tag in the last 5 days, all of them from one particular region of the world. This happens when someone misinterprets the guidelines and then decides to insert another robots-term that they think will have an influence on Googlebot.

You watch: a year from now, many newcomers will have that piece of metadata in their <head></head>.

jdMorgan

9:45 pm on Jul 15, 2004 (gmt 0)

There's a legitimate reason you'll see this meta, though... Think server-side includes and centralized management. :)

(No need for the space between the parameters, though.)

Jim

pageoneresults

11:08 pm on Jul 15, 2004 (gmt 0)

Okay jd, I thought, I thought again, and I'm not getting it. If the default behavior of search engine spiders is index, follow or all, what purpose does this tag serve...

<meta name="googlebot" content="index, follow">

or this one for that matter...

I've searched, researched, dug holes, knocked down walls and I still can't find any authoritative references that suggest the use of index, follow in a META Robots Tag.

If I remember correctly, some time ago Inktomi's default behavior was to index, nofollow or something odd like that. I think they did suggest the use of the index, follow directive because of that issue. I never had any issues with Ink, so I never thought twice about using it.

Do either of these tags have any influence over the spiders' behavior?

jdMorgan

11:23 pm on Jul 15, 2004 (gmt 0)

index,follow is the default behaviour. However, a few of those sites may be database-driven, and have an associated index into a table used to populate the page:

0 noindex,nofollow
1 noindex,follow
2 index,nofollow
3 index,follow

So, even though the last option doesn't actually accomplish anything (except yes, for Ink), it is there as a placeholder on the page in case a future change is needed (think campaigns). That's why I mentioned SSI and a central administration function for indexing control.
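To make Jim's point concrete, here is a minimal Python sketch of that kind of central control, where the number stored with each page record selects the robots-terms written into the page. The table, the function name robots_meta_tag and its parameters are hypothetical, standing in for whatever database column and SSI template a real site would use.

```python
# Hypothetical lookup table mirroring the example above: the number stored
# with each page record selects the robots-terms written into <head>.
ROBOTS_TERMS = {
    0: "noindex,nofollow",
    1: "noindex,follow",
    2: "index,nofollow",
    3: "index,follow",  # the default behaviour; kept only as a placeholder
}


def robots_meta_tag(setting: int, name: str = "robots") -> str:
    """Return the META tag for a page's stored indexing setting.

    `setting` and `name` are illustrative parameters, not part of any
    particular CMS or SSI system.
    """
    if setting not in ROBOTS_TERMS:
        raise ValueError(f"unknown indexing setting: {setting!r}")
    return f'<meta name="{name}" content="{ROBOTS_TERMS[setting]}">'


# A central change (say, when a campaign ends) only touches the stored value.
print(robots_meta_tag(3))                    # <meta name="robots" content="index,follow">
print(robots_meta_tag(0, name="googlebot"))  # <meta name="googlebot" content="noindex,nofollow">
```

Option 3 changes nothing for the spiders; it simply keeps a slot in the template so a later switch to 0, 1 or 2 needs no edit to the page itself.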

Jim

brakthepoet

1:24 am on Jul 16, 2004 (gmt 0)

>> I was just waiting for someone to take the bait <<

As long as it's not prefaced by "stink" or "jail", then I'll usually take it. :)

>> sites may be database-driven, and have an associated index into a table used to populate the page <<

Certainly a legitimate and easy-to-manage method. I can see its uses. But I've seen this show up on a few SEO forums & sites as a method for improving rankings. pageoneresults' prediction that it will become an SEO urban myth is already coming true.

jdMorgan

2:07 am on Jul 16, 2004 (gmt 0)

Yeah, and don't forget your all-important "Revisit-after" tag! ;)

Jim

pageoneresults

2:07 am on Jul 16, 2004 (gmt 0)

stink or jail? lol!

jd, thanks for the explanation above. It makes sense but I've never seen an implementation like that.

Okay, I've put together some information to hopefully stop this one from becoming another element for the META tag generators out there. Most of this is extracted from Google's Webcrawler [google.com] information page, with some additional information from The HTML Author's Guide to the Robots META Tag [robotstxt.org].

Googlebot Robots META Tag

The Robots META Tag for Googlebot is meant to provide users who cannot upload or control the /robots.txt file at their websites with a last chance to keep their content out of Google's indexes and services.

The "robots" tag is obeyed by many different web robots. If you'd like to specify indexing restrictions just for googlebot, you may use "googlebot" in place of "robots".

<meta name="googlebot" content="robots-terms">
<meta name="robots" content="robots-terms">

Googlebot obeys the noindex, nofollow, and noarchive robots-terms of the Robots META Tag. If you place the tag in the head of your HTML/XHTML document, you can cause Google to not index, not follow, and/or not archive particular documents on your site.
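A rough way to see which of these tags a given crawler would read is to scan the page for META tags named "robots" or named after that crawler. The sketch below uses Python's standard html.parser module; the class and function names are made up for illustration, "otherbot" is a placeholder crawler name, and for simplicity it scans the whole document rather than only the <head>.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect (name, content) pairs from robots-style META tags.

    Keeps tags named "robots" (read by many robots) plus tags named after
    one specific crawler, e.g. "googlebot".
    """

    def __init__(self, bot_name: str):
        super().__init__()
        self.wanted = {"robots", bot_name.lower()}
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names,
        # but not attribute values, so the name value is lowered here.
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").strip().lower()
        if name in self.wanted:
            self.found.append((name, attributes.get("content", "")))


def robots_meta_for(html: str, bot_name: str = "googlebot") -> list[tuple[str, str]]:
    collector = RobotsMetaCollector(bot_name)
    collector.feed(html)
    collector.close()
    return collector.found


page = """<html><head>
<meta name="robots" content="noarchive">
<meta name="googlebot" content="noindex, nofollow">
<meta name="otherbot" content="noindex">
</head><body>...</body></html>"""

print(robots_meta_for(page))
# [('robots', 'noarchive'), ('googlebot', 'noindex, nofollow')]
```

Both the generic "robots" tag and the Googlebot-specific tag show up for Googlebot, while the tag aimed at another crawler is skipped.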

The content="robots-terms" attribute is a comma-separated list, used in the Robots META Tag for Google, that may contain one or more of the following keywords without regard to case: noindex, nofollow and/or noarchive.

noindex

Document will not be indexed by Googlebot.

nofollow

Internal and external links in the document will not be followed by Googlebot.

noarchive

Google will not archive a copy of the document (Google's Cached Page).

If this Robots META Tag is missing, if it has no content, or if no robots-terms are specified, the robots-terms are assumed to be "index, follow" (i.e. "all"), which is the default indexing behavior for most search engine spiders.
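Those rules can be written down as a small Python function. This is a sketch of the rules as quoted above, not Google's own parser; the name parse_robots_terms is hypothetical, and silently ignoring unrecognised terms is an assumption of the sketch.

```python
def parse_robots_terms(content):
    """Interpret a robots-terms string per the rules quoted above.

    Returns flags for index/follow/archive. A missing or empty content value
    means "index, follow" (i.e. "all"). Terms are comma-separated and
    case-insensitive. Unrecognised terms are ignored here, which is an
    assumption of this sketch rather than documented Google behaviour.
    """
    flags = {"index": True, "follow": True, "archive": True}
    if not content:
        return flags
    for term in content.split(","):
        term = term.strip().lower()
        if term == "noindex":
            flags["index"] = False
        elif term == "nofollow":
            flags["follow"] = False
        elif term == "noarchive":
            flags["archive"] = False
        # "index", "follow" and "all" only restate the defaults.
    return flags


print(parse_robots_terms("NOINDEX, nofollow"))
# {'index': False, 'follow': False, 'archive': True}
print(parse_robots_terms("index, follow") == parse_robots_terms(None))
# True
```

The second check is the point of this thread: under these rules, content="index, follow" produces exactly the same result as having no tag at all.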

Examples of the Googlebot Robots META Tag

The tags to include and their effects are:

The robots-term noindex has the following effect: Googlebot will retrieve the document, but it will not index it.

<meta name="googlebot" content="noindex">

The robots-term nofollow has the following effect: Googlebot will not follow any links on the page to other documents.

<meta name="googlebot" content="nofollow">

The robots-term noarchive has the following effect: Google maintains a cache of all the documents that we fetch, to permit our users to access the content that we indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.

<meta name="googlebot" content="noarchive">

You can also combine any or all of the above robots-terms into a single Robots META Tag for Google. For example:

<meta name="googlebot" content="noarchive, nofollow">

Misinterpretation of the Standards

Googlebot's default indexing behavior is "index, follow" (i.e. "all"). The Robots META Tag below is neither required nor suggested in the Google guidelines, which clearly state that the Robots META Tag is for restricting the indexing of content.

<meta name="googlebot" content="index, follow">

Utilizing erroneous metadata elements like the example shown above may not present a professional image to your peers or potential clients. It also adds unnecessary weight to your pages and shifts the text-to-HTML ratio of your documents.
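The same reasoning applies to META tag generators: a tag for Googlebot is only worth emitting when at least one restriction is wanted. A minimal Python sketch, using a hypothetical googlebot_meta() helper that only ever writes restrictive robots-terms:

```python
def googlebot_meta(noindex=False, nofollow=False, noarchive=False):
    """Build a Googlebot META tag containing only restrictive robots-terms.

    Returns an empty string when no restriction is requested, because
    "index, follow" is already the default and needs no tag at all.
    """
    terms = [
        term
        for term, wanted in (
            ("noindex", noindex),
            ("nofollow", nofollow),
            ("noarchive", noarchive),
        )
        if wanted
    ]
    if not terms:
        return ""
    return f'<meta name="googlebot" content="{", ".join(terms)}">'


print(googlebot_meta(nofollow=True, noarchive=True))
# <meta name="googlebot" content="nofollow, noarchive">
print(repr(googlebot_meta()))
# ''
```

The terms come out in a fixed order; since the content value is just a comma-separated list, the order should not change what the tag means.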

[edited by: pageoneresults at 2:21 am (utc) on July 16, 2004]

pageoneresults

2:13 am on Jul 16, 2004 (gmt 0)

>> Yeah, and don't forget your all-important "Revisit-after" tag! ;) <<

Ah-ha, it's nice to know someone read that one. ;)

Krapulator

6:26 am on Jul 16, 2004 (gmt 0)

My favourite made-up meta tag is one that someone (may have been Marcia) posted a long time ago:

TazMania

8:57 am on Jul 16, 2004 (gmt 0)

I don't get it... what is the "revisit after" tag?

pmkpmk

10:03 am on Jul 16, 2004 (gmt 0)

Actually, this makes me think of a rather hard-to-find way of compromising websites...

You hear every once in a while of (prominent) websites being hacked and slogans like "Kilroy was here" inserted. Well, the hacker gets the honour and the "damage" is easily fixed.
Sometimes, hackers destroy or erase the content of the page. Also easily fixed if the backup works.

If I really wanted to HURT somebody, I'd make the break-in very silent, very low profile. All I'd do is adjust the robots.txt to lock out all the spiders. I would even leave the top part of the robots.txt, which usually allows all spiders, then insert 100 blank lines, and then lock out Googlebot, Inktomi & co.

I think it would take the average webmaster a VERY long time to find out about this, but in the meantime all his pages would slowly be getting unindexed.

So, is everybody here checking their robots.txt now?
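For anyone who is checking, Python's standard urllib.robotparser module can report which user-agents a robots.txt locks out. The locked_out() helper below is hypothetical, and the standard-library parser only approximates how each search engine's own crawler reads the file, so edge cases may differ.

```python
from urllib.robotparser import RobotFileParser


def locked_out(robots_txt: str, agents, url: str = "/") -> list[str]:
    """Return the user-agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# The scenario described above: an allow-all record at the top, a long run
# of blank lines, then a record that shuts out one crawler.
robots_txt = (
    "User-agent: *\n"
    + "Disallow:\n"
    + "\n" * 100
    + "User-agent: Googlebot\n"
    + "Disallow: /\n"
)

print(locked_out(robots_txt, ["Googlebot", "Slurp"]))
# ['Googlebot']
```

The blank lines only separate records; a crawler that finds a record naming it specifically uses that record instead of the "*" one, which is why the hidden block at the bottom still takes effect.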

pageoneresults

3:18 pm on Jul 16, 2004 (gmt 0)

>> I don't get it... what is the "revisit after" tag? <<

Revisit-After META Tag - The Myth Continues in 2004 [webmasterworld.com]

jimbeetle

3:48 pm on Jul 16, 2004 (gmt 0)

>> <meta name="googlebot" content="robots-terms"> <<

Uh, oh! A new meta myth is born!

You know all those folks who rarely read beyond the first paragraph of an article, and even then only every other word?

"What does that do?"

"I dunno. I found it on Webmaster World. It's supposed to improve your ranking in Google."

I wonder how long it's going to take for this "great new tag" to start showing up. :)

jdMorgan

4:04 pm on Jul 16, 2004 (gmt 0)

>> Yeah, and don't forget your all-important "Revisit-after" tag! ;)
>> Ah-ha, it's nice to know someone read that one. ;) <<

Yes, but it was your META Tags [webmasterworld.com] thread back in January, 2003 that I read, and posted about it [webmasterworld.com], too (msg #8, item #1).

That tag's been dead for a long time.

With the Web as large as it is now, and 7 million new pages appearing every day, search engines spider your site when they decide to and when they can get around to it. Thinking you can "tell" Google to re-spider a page every day with a META tag is, um, delusional... Lisa's JediMindTrick tag will work just as effectively. Getting your pages to PR5 or above works a lot better. :)

Jim