
Sitemaps, Meta Data, and robots.txt Forum

robots.txt and robots meta tag not mutually exclusive?

10+ Year Member

Msg#: 148 posted 7:33 pm on Jan 23, 2003 (gmt 0)

Am I right in thinking that robots.txt and the robots meta tags can work together? So you can have:

User-agent: abbabot
Disallow:

User-agent: *
Disallow: /

and then control what `abbabot` can and cannot index via meta tags, which would be these primarily:

<META name="ROBOTS" content="all">
<META name="ROBOTS" content="none">

There are reasons I don't want to do it all in robots.txt. The site in question adds new sections/channels on a daily basis, many of which we don't want spidered and some of which we do. The URL structure is such that, even using, say, the Google extensions to robots.txt, we'd end up with a very, very large robots.txt file that would be unmanageable on a daily basis. With meta tags, we can use the CMS (Content Management System) to control the state of the tags when the pages are created. Well, it sounds like a plan anyway...
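For what it's worth, here is a rough Python sketch of that plan, not a tested production setup. It assumes the `abbabot` record carries an empty `Disallow:` line (meaning nothing is disallowed for that robot), and `robots_meta_tag` is a hypothetical CMS helper invented for illustration, as is the example path. The standard library's `urllib.robotparser` is used only to show how a compliant robot would read the file.

```python
from urllib.robotparser import RobotFileParser

# robots.txt from the post, assuming the abbabot record has an empty
# Disallow line, i.e. nothing is off limits to that robot.
ROBOTS_TXT = """\
User-agent: abbabot
Disallow:

User-agent: *
Disallow: /
"""


def robots_meta_tag(allow_indexing: bool) -> str:
    """Hypothetical CMS helper: build the robots meta tag for a new page."""
    content = "all" if allow_indexing else "none"
    return f'<META name="ROBOTS" content="{content}">'


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative path only; any URL on the site behaves the same way here.
page = "/channels/new-section/index.html"

# abbabot may fetch every page, so it gets to read each page's meta tag.
print(parser.can_fetch("abbabot", page))
# Every other robot is turned away by the catch-all record.
print(parser.can_fetch("someotherbot", page))
# The CMS decides per page which tag to emit when the page is created.
print(robots_meta_tag(allow_indexing=False))
```

The point of the split is that robots.txt stays two records long no matter how many channels are added, while the per-page decision lives in the CMS.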

I see that at some point the W3C discussed putting user agents in the meta tag standard but didn't...



WebmasterWorld Senior Member 10+ Year Member

Msg#: 148 posted 7:51 pm on Jan 23, 2003 (gmt 0)

Most well-behaved bots will honor robots.txt, and some bots/SEs (including Googlebot) will honor "robots" meta tags. I usually use both to control access to specific pages, but I build pages with all SEs in mind.


WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Msg#: 148 posted 10:09 pm on Jan 23, 2003 (gmt 0)

A very interesting effect with Google and AJ (and maybe others) is that they will list a link to a page even if it is disallowed in robots.txt. The link is listed with no title and no page description, as you might expect, since you have told the robot not to fetch the page. But if Googlebot or AJ finds a link to that page anywhere on the web, it will still list the URL in its SERPs if it is sufficiently relevant to the search terms.

The only way I have found to tell Gbot and AJ, "Please don't mention this URL at all," is to not disallow the page in robots.txt, but rather to disallow it only with the on-page robots meta tag. The robot has to be allowed to fetch the page in order to see the tag. It's the only way I've found to make "semi-private pages" stay that way.
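As a sanity check on a setup like this, something along the following lines could flag pages where the meta tag can never be seen. It is only a sketch of the logic described above, not a statement of how any search engine behaves internally. The function name `noindex_is_reachable`, the class name, and the sample paths are made up for illustration; the parsing uses Python's standard `html.parser` and `urllib.robotparser`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_reachable(robots_txt: str, agent: str, path: str, html: str) -> bool:
    """True if the robot may fetch the page AND the page says noindex/none."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, path):
        # Blocked in robots.txt: the robot never reads the meta tag,
        # so a bare link to the URL could still be listed.
        return False
    finder = RobotsMetaFinder()
    finder.feed(html)
    return bool(finder.directives & {"none", "noindex"})


page = '<html><head><META name="ROBOTS" content="none"></head></html>'
print(noindex_is_reachable("", "Googlebot", "/private/page.html", page))
print(noindex_is_reachable("User-agent: *\nDisallow: /private/",
                           "Googlebot", "/private/page.html", page))
```

The first call models the "semi-private" setup (no robots.txt block, tag on the page); the second models the case where robots.txt hides the tag from the robot.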



10+ Year Member

Msg#: 148 posted 1:35 pm on Jan 24, 2003 (gmt 0)

Thanks for that!

Yes, I've seen deep links to our sites - which don't allow robots in - appear in Google. The last one I tracked back came from a Googlebot index of the Drudge Report.
