Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
MSNbot ignores meta-robots-tag
RonPK




msg:1534176
 10:20 am on Jun 21, 2003 (gmt 0)

I use this tag quite a lot:
<meta name="robots" content="index,nofollow">

It should prevent spiders from following the links on the page. My logs tell me MSNbot ignores the tag. It's easy to see, because the bot kindly passes the referrer string.

I've notified MS, but no reply yet (well, I guess they're asleep ;)).

Anyone else notice this?
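
For anyone who wants to run the same check on their own logs, here is a minimal sketch. It assumes the server writes Combined Log Format and that the bot's user-agent contains "msnbot"; the sample lines, the example.com URLs and the function name are made up for illustration, not taken from real logs.

```python
import re

# Combined Log Format (an assumption about the server setup):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_hits_from(lines, nofollow_pages, bot_token="msnbot"):
    """Return (referrer, path) pairs where the bot arrived via a nofollow page."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if bot_token not in match.group("agent").lower():
            continue  # some other visitor
        if match.group("referrer") in nofollow_pages:
            hits.append((match.group("referrer"), match.group("path")))
    return hits


# Hypothetical sample data; real agent strings and paths will differ.
sample = [
    '192.0.2.1 - - [21/Jun/2003:10:05:00 +0000] "GET /contact.html HTTP/1.0" '
    '200 512 "http://www.example.com/links.html" "msnbot/0.1 (hypothetical)"',
    '192.0.2.7 - - [21/Jun/2003:10:06:00 +0000] "GET /index.html HTTP/1.0" '
    '200 900 "-" "Mozilla/4.0"',
]

print(bot_hits_from(sample, {"http://www.example.com/links.html"}))
```

Any hit in the result means the bot requested a page while naming a nofollow page as its referrer, which is the behaviour described above.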

 

Orange_XL




msg:1534177
 11:51 am on Jun 21, 2003 (gmt 0)

Well, robots.txt has become the de facto standard. While some engines still (also) use the robots meta tag, you should not rely on it. My guess is that support for it will slowly fade away.

Robert Charlton




msg:1534178
 4:47 am on Jun 22, 2003 (gmt 0)

Well, robots.txt has become the de facto standard. While some engines still (also) use the robots meta tag, you should not rely on it.

I'm not sure it's that simple. Take a look at Jim Morgan's post, msg #12, in this thread about robots.txt:

[webmasterworld.com...]

This is one of the sections that applies here...


If Google finds a robots.txt Disallow for a page, it will remove the page's title and description from its search results. It will also no longer match search terms to the words on that page. So, the page essentially disappears from the Google search results pages. However, if Google finds a link to that page, it will still show that page in results when someone clicks on "More results from <this domain>".

I went around and around with this, trying to find a way to tell them "don't mention my contact forms pages at all, please", and here's what I ended up with:
For Google, don't Disallow the page in robots.txt, but place a <meta name="robots" content="noindex"> tag in the head section of the page itself.

You'll also need to do this for Ask Jeeves/Teoma as well; their handling of robots.txt is the same as Google's.
All the others seem to interpret a robots.txt Disallow as "don't mention this page at all."

Jim goes on to point out why engines prefer the robots.txt... it saves bandwidth, because to see the robots meta tag, the engines have to download the page. I suggest you read his post... it's more precise than this summary here.
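
To make the interaction concrete, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleBot" agent name and the example.com URL are assumptions for illustration, and the noindex check is a crude string match, not how any real engine works. It only shows the logic: a crawler that honours a Disallow never downloads the page, so it never gets to see a noindex meta tag on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the contact page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.html
"""

# Hypothetical head content of the contact page.
PAGE_HEAD = '<meta name="robots" content="noindex">'


def may_fetch(robots_txt, agent, url):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def sees_noindex(robots_txt, agent, url, page_html):
    """A compliant crawler only sees the meta tag if it may fetch the page."""
    if not may_fetch(robots_txt, agent, url):
        return False  # page is never downloaded, so the tag is never read
    return 'content="noindex"' in page_html  # crude check, for illustration


url = "http://www.example.com/contact.html"
print(sees_noindex(ROBOTS_TXT, "ExampleBot", url, PAGE_HEAD))  # Disallow in place
print(sees_noindex("", "ExampleBot", url, PAGE_HEAD))          # no Disallow
```

With the Disallow in place the tag goes unseen; with an empty robots.txt the page is fetched and the noindex can take effect, which matches the advice quoted above.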

Orange_XL




msg:1534179
 11:17 pm on Jun 22, 2003 (gmt 0)

Interesting. Basically, both are non-optional solutions in my view.
Robots.txt does say "don't fetch this", but that does not imply "don't use or link to this". The problem with meta tags is that they have to be parsed, and not all bots do this (it depends on their purpose). They, too, can be interpreted broadly. Nofollow does not imply "do not follow links to other (sub)domains".

I use a combination of both, but I conclude that I may have to make some adjustments for Google and AJ/T.
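
As an illustration of the parsing step a bot has to do before it can honour the meta tag, here is a minimal sketch built on Python's standard html.parser. The class and function names are made up for this example, and what a bot then does with the directives is up to that bot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; normalise case and whitespace.
        self.directives.update(
            token.strip().lower() for token in content.split(",") if token.strip()
        )


def robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = (
    '<html><head><meta name="robots" content="index,nofollow"></head>'
    '<body><a href="/x">x</a></body></html>'
)
directives = robots_directives(page)
print("may index:", "noindex" not in directives)
print("may follow links:", "nofollow" not in directives)
```

A bot that skips this step never learns about the nofollow, and even one that performs it still decides for itself which links the directive covers.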

Robert Charlton




msg:1534180
 12:28 am on Jun 23, 2003 (gmt 0)

I use a combination of both, but I conclude that I may have to make some adjustments for Google and AJ/T.

I think the point of Jim's post is that for Google and AJ/T you do need to use both. The robots.txt by itself won't do everything you might hope it will do, as someone reading just your first post could perhaps infer.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved