Forum Moderators: Robert Charlton & goodroi


The robots meta tag and googlebot

         

bostonseo

2:50 pm on Mar 4, 2006 (gmt 0)



I'm curious what the consensus is on robots meta settings for Google and getting pages crawled and indexed. I don't see much consistency when I check top-ranking websites in multiple industries.

Many do have code such as the following:
<META NAME="robots" CONTENT="ALL">
<META NAME="GOOGLEBOT" CONTENT="INDEX, FOLLOW">
<META NAME="revisit-after" CONTENT="1 day">

Then again it appears just as many top ranked websites do not have any Robot code.

Can anyone weigh in on what, if any, importance Robot meta code has at this point?

kaled

8:09 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So far as I am aware, the only content that is observed by the major search engines is noindex,nofollow. All other content of the robots meta tag is ignored.

If a top-ranking page included <meta name="toprank" content="I'm a fish"> there are people in this world who would copy it, thinking they'd found something magical.

There's a lot of irrelevant junk out there in meta-world.

[searchengineworld.com...]
The link above suggests the values all and none are also recognised but all is the default so why use it?
It's possible that search engine bots obey scheduling instructions but I would check their individual webmaster notes rather than make guesses based on the works of others.
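
To illustrate the point about defaults, here is a sketch of how the shorthand values map onto the explicit ones. It assumes all and none behave as the linked page describes (all meaning index,follow and none meaning noindex,nofollow):

```html
<!-- These two say the same thing, and both match the default behaviour,
     so leaving the tag out entirely has the same effect. -->
<meta name="robots" content="all">
<meta name="robots" content="index,follow">

<!-- These two are also equivalent, but here the tag does change something. -->
<meta name="robots" content="none">
<meta name="robots" content="noindex,nofollow">
```

A real page would carry only one of these lines, not all four.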

Kaled.

g1smd

8:19 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your document should begin with a !DOCTYPE (this tells the browser what sort of HTML is in the file) followed by the <html> and <head> tags:

.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>

.

For your page to actually be valid you MUST declare the character encoding used for the page. This lets the browser know whether to use A to Z letters (Latin), or Chinese, Japanese, Thai, or Arabic script, or some other character set. Use something like:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

There are many other encodings available, such as UTF-8.

.

It is also a good idea to declare what human language the page is in, using:

<meta http-equiv="Content-Language" content="EN-GB">

The language and country codes come from ISO 639 and ISO 3166. This is useful for online translation tools as well. Change the "en" and "gb" to whatever language and country you need.

.

You need a <title> element for the page:

<title> Your Title Here </title>

This is displayed at the top of the browser window, and stored as the name of the bookmark if someone bookmarks the page URL in their browser. Most importantly, it is the <title> tag that is indexed and displayed by search engines in the search results page (SERPs).

.

You need the meta description tag, as this is very important for search engines, and it is useful but not vital to have a meta keywords tag:

<meta name="Description" content=" Your Description Here. ">
<meta name="Keywords" content=" your, keyword, list, here ">

.

Most search engines do obey the robots meta tag. The default robots action is index,follow (index the page, follow all outbound links), so if you want something else then add the robots tag to the page in question. There are 3 other possibilities: noindex,follow; index,nofollow; and noindex,nofollow. If you want to exclude whole directories then use the robots.txt file for this instead of marking every HTML file with the tag.

<meta name="robots" content="noindex,follow">

If you want a page spidered just omit this tag completely.

.

The last parts of your header should have your links to external style sheets and external javascript files:

.

Use this if the stylesheet is for all browsers:

<link type="text/css" rel="stylesheet" href="/path/file.css">

Use this for a style sheet that you want to hide from older browsers, as older browsers often crash on seeing CSS (they do not understand @import, so they never load the file):

<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"> @import url(/path/file.css); </style>

.

Use this for the javascript:

<script type="text/javascript" language="javascript" src="/path/file.js"></script>

.

End the header with this:

</head>
<body>

and then continue with the body page code.

.

It is as simple as that.
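
Putting the pieces above together, a minimal sketch of the whole header might look like the following. The title, description, keyword and /path/ values are placeholders to replace with your own, and the closing </body> and </html> tags are added only so the skeleton is complete:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- Character encoding and page language -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">

<title>Your Title Here</title>

<meta name="Description" content="Your Description Here.">
<meta name="Keywords" content="your, keyword, list, here">

<!-- No robots tag here: the default is index,follow.
     Add one only when you want to restrict the page. -->

<!-- External style sheet and script; the /path/ values are placeholders -->
<link type="text/css" rel="stylesheet" href="/path/file.css">
<script type="text/javascript" src="/path/file.js"></script>
</head>
<body>
<!-- body page code goes here -->
</body>
</html>
```

Running the finished page through an HTML validator is a quick way to confirm that nothing in the header was mistyped.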

lammert

10:32 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can anyone weigh in on what, if any, importance Robot meta code has at this point?

Google and the other major search engines obey the robots meta tag. Using these tags is often more flexible than using a robots.txt file because you can decide on a URL by URL basis what a search engine should do with a specific page. Furthermore, robots.txt is cached by some search engines, so the version they have in the cache might not be an exact copy of your current robots.txt.

My experience is that Google, MSN, Yahoo and others correctly handle the index/noindex, follow/nofollow and noarchive in the robots meta.

Common combinations:

URL is indexed and cached:
<meta name="robots" content="index,follow">

URL is not indexed, but links on page are spidered:
<meta name="robots" content="noindex,follow">

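URL is not indexed and links on the page are not spidered. This is close to, but not quite the same as, disallowing the URL in your robots.txt: robots.txt stops the page from being fetched at all, so a disallowed URL can still be listed if other pages link to it:
<meta name="robots" content="noindex,nofollow">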

URL is indexed, but without searcher accessible cache:
<meta name="robots" content="index,follow,noarchive">
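
One combination is missing from the list above, and the original question also asked about a tag aimed at Googlebot alone. As a sketch, assuming a crawler that reads a tag carrying its own name in addition to the generic robots tag (Google documents this for googlebot; other engines may behave differently):

```html
<!-- URL is indexed, but links on the page are not spidered -->
<meta name="robots" content="index,nofollow">

<!-- Only Google is told not to show a cached copy;
     other engines see no restriction and use their defaults -->
<meta name="googlebot" content="noarchive">
```

The two tags are separate examples for separate pages, not a pair that has to be used together.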

Pfui

11:37 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From the "Google Information for Webmasters [google.com]" pages, here's [google.com] their specific info regarding Googlebot, robots.txt and meta tags, etc. (Note: Their acceptance of wildcards in robots.txt is atypical.)