|MSNbot ignores meta-robots-tag|
I use this tag quite a lot:
<meta name="robots" content="index,nofollow">
It should prevent spiders from following the links on the page. My logs tell me MSNbot ignores the tag. It's easy to see, because the bot kindly passes the referrer string.
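For anyone who wants to check what a page actually asks of a crawler, here is a minimal Python sketch of how a well-behaved bot might read a tag like the one above and decide whether to follow links. It is only an illustration under my own assumptions. It is not how MSNbot or any other engine is known to work, and the names `RobotsMetaParser`, `robots_meta_directives`, `may_follow_links` and `may_index` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta robots>), so default to "".
        attrs = {k.lower(): (v or "") for k, v in attrs}
        if attrs.get("name", "").lower() == "robots":
            # content="index,nofollow" -> {"index", "nofollow"}
            for token in attrs.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_meta_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_follow_links(html):
    # "none" is treated here as shorthand for "noindex,nofollow".
    d = robots_meta_directives(html)
    return "nofollow" not in d and "none" not in d


def may_index(html):
    d = robots_meta_directives(html)
    return "noindex" not in d and "none" not in d


PAGE = '<html><head><meta name="robots" content="index,nofollow"></head></html>'
print(may_index(PAGE))         # True
print(may_follow_links(PAGE))  # False
```

A bot that honours the tag would index such a page but not queue its links. Seeing the page show up as the referrer for requests to the pages it links to is what suggests MSNbot skips this check.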
I've notified MS, but there's no reply yet (well, I guess they're asleep ;)).
Anyone else notice this?
Well, robots.txt has become the de facto standard. While some engines still (also) use the robots meta tag, you should not rely on it. My guess is that support for it will slowly fade away.
|Well, robots.txt has become the de facto standard. While some engines still (also) use the robots meta tag, you should not rely on it. |
I'm not sure it's that simple. Take a look at Jim Morgan's post, msg #12, in this thread about robots.txt:
This is one of the sections that applies here...
If Google finds a robots.txt Disallow for a page, it will remove the page's title and description from its search results. It will also no longer match search terms to the words on that page. So, the page essentially disappears from the Google search results pages. However, if Google finds a link to that page, it will still show that page in results when someone clicks on "More results from <this domain>".
I went around and around with this, trying to find a way to tell them "don't mention my contact forms pages at all, please", and here's what I ended up with:
For Google, don't Disallow the page in robots.txt, but place a <meta name="robots" content="noindex"> tag in the head section of the page itself.
You'll need to do this for Ask Jeeves/Teoma as well; their handling of robots.txt is the same as Google's.
All the others seem to interpret a robots.txt Disallow as "don't mention this page at all."
Jim goes on to point out why engines prefer robots.txt: it saves bandwidth, because to see the robots meta tag, an engine has to download the page. I suggest you read his post; it's more precise than my summary here.
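The ordering Jim describes can be sketched in Python with the standard library's `urllib.robotparser`. This is a hypothetical crawler step, not any engine's actual code. The `example.com` URLs, the `ExampleBot` user agent, and the in-memory `PAGES` dict standing in for HTTP responses are all assumptions. It shows why a robots.txt Disallow and a noindex meta tag don't combine on the same page: a Disallowed URL is never downloaded, so its meta tag is never read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact/
"""

# Hypothetical pages; this dict stands in for real HTTP responses.
PAGES = {
    "http://example.com/contact/form.html":
        '<html><head><meta name="robots" content="noindex"></head></html>',
    "http://example.com/about.html":
        '<html><head><meta name="robots" content="noindex"></head></html>',
    "http://example.com/index.html":
        "<html><head><title>Home</title></head></html>",
}


def crawl_decision(url, agent="ExampleBot"):
    """Return what a polite crawler would do with `url` under these assumptions."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())

    # Step 1: robots.txt is consulted before any page is downloaded.
    if not rules.can_fetch(agent, url):
        # The page is never downloaded, so any meta tag on it is never seen.
        return "not fetched"

    # Step 2: only now is the page downloaded (this is where bandwidth is spent).
    html = PAGES[url]

    # Crude substring check standing in for real meta-tag parsing.
    if "noindex" in html.lower():
        return "fetched, not indexed"
    return "fetched, indexed"


for page in PAGES:
    print(page, "->", crawl_decision(page))
```

In this sketch the contact form page carries a noindex tag, but because robots.txt blocks it, the tag is never read. Leaving a page out of the Disallow (as with `about.html` here) is what lets its noindex take effect. What an engine then does with links pointing at a blocked URL is not modelled.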
Interesting. Basically, both are non-optional solutions in my view.
Robots.txt indeed says "don't fetch this", but that does not imply "don't use or link to this". The problem with meta tags is that they have to be parsed, and not all bots do this (it depends on their purpose). They can also be interpreted in different ways: nofollow does not imply "do not follow links to other (sub)domains".
I use a combination of both, but I conclude that I may have to make some adjustments for Google and AJ/T.
|I use a combination of both, but I conclude that I may have to make some adjustments for Google and AJ/T. |
I think the point of Jim's post is that for Google and AJ/T you do need to use both. The robots.txt by itself won't do everything you might hope it will do, as someone reading just your first post could perhaps infer.