robots.txt and robots meta tag not mutually exclusive?

Check6

7:33 pm on Jan 23, 2003 (gmt 0)

Am I right in thinking that robots.txt and the robots meta tags can work together? So you can have:

User-agent: abbabot
Disallow:

User-agent: *
Disallow: /

and then control what `abbabot` can and cannot index via meta tags, which would be these primarily:

There are reasons I don't want to use robots.txt for this. The site in question adds new sections/channels on a daily basis, many of which we don't want spidered and some of which we do. The URL structure is such that, even using say the Google extensions to robots.txt, we'd end up with a very, very large robots.txt file which would be unmanageable on a daily basis. With the meta tag approach we can use the CMS (Content Management System) to control the state of the meta tags when the pages are created. Well, it sounds like a plan anyway...
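As a rough illustration of the CMS idea, here is a minimal Python sketch that turns a per-page flag into a robots meta tag at page-creation time. The function name `robots_meta_tag`, the `spiderable` flag and the channel paths are all made up for the example; a real CMS would have its own template hooks.

```python
from html import escape


def robots_meta_tag(allow_index: bool, allow_follow: bool = True) -> str:
    """Build a robots meta tag from two per-page flags (hypothetical CMS fields)."""
    directives = [
        "index" if allow_index else "noindex",
        "follow" if allow_follow else "nofollow",
    ]
    content = ",".join(directives)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


# Hypothetical channels created in the CMS, each with a "spiderable" flag.
channels = {
    "/channels/public-news/": True,
    "/channels/internal-draft/": False,
}

for path, spiderable in channels.items():
    # The tag would go into the <head> of every page under that channel.
    print(path, robots_meta_tag(allow_index=spiderable, allow_follow=spiderable))
```

The point of doing it this way is that the decision lives with the page, so robots.txt does not have to grow every time a channel is added.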

I see that at some point the W3C discussed putting user agents in the meta tag standard but didn't...

DaveAtIFG

7:51 pm on Jan 23, 2003 (gmt 0)

Most well-behaved bots will honor robots.txt, and some bots/SEs (including Googlebot) will honor "robots" meta tags. I usually use both to control access to specific pages, but I build pages with all SEs in mind.
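To see how a bot that honors robots.txt would read the file from the first post, here is a small sketch using Python's standard `urllib.robotparser`. The helper name `can_fetch` and the example.com URLs are assumptions for illustration; real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the original question: let abbabot in, keep everyone else out.
ROBOTS_TXT = """\
User-agent: abbabot
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/channels/news/"  # placeholder URL
    for agent in ("abbabot", "someotherbot"):
        print(agent, can_fetch(agent, url))
```

An empty `Disallow:` line means "nothing is disallowed", so `abbabot` gets the whole site while every other agent falls through to the `*` record and is blocked.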

jdMorgan

10:09 pm on Jan 23, 2003 (gmt 0)

A very interesting effect with Google and AJ (and maybe others) is that they will list a link to a page even if it is disallowed in robots.txt. The link is listed with no title and no page description - as you might expect, since you have told the robot not to fetch the page. However, if Googlebot or AJ finds a link to the page anywhere on the web, it will list that link in its SERPs if it is sufficiently relevant to the search terms.

The only way I have found to tell Gbot and AJ, "Please don't mention this URL at all" is to not disallow the page in robots.txt, but rather disallow it only using the on-page robots meta tag. It's the only way I've found to make "semi-private pages" stay that way.
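The reasoning is that a robot has to fetch a page before it can read that page's meta tag, so a robots.txt disallow hides the tag from the robot. The sketch below models that interaction in Python using only the standard library. It is a simplified model of the behavior described in this post, not a statement of how any search engine is implemented; `RobotsMetaFinder`, `url_listing_outcome` and the sample URL are hypothetical names for the example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def url_listing_outcome(robots_txt, user_agent, url, page_html):
    """Simplified model: what can a robots.txt-honoring bot learn about this URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The bot never downloads the page, so any meta tag on it goes unread.
        return "not fetched: meta tag never seen, bare URL may still be listed"
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    if "noindex" in finder.directives:
        return "fetched: noindex seen, URL can be kept out of the listings"
    return "fetched: page may be indexed normally"


PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
URL = "http://www.example.com/private/page.html"  # placeholder URL

print(url_listing_outcome("User-agent: *\nDisallow: /private/\n", "Googlebot", URL, PAGE))
print(url_listing_outcome("User-agent: *\nDisallow:\n", "Googlebot", URL, PAGE))
```

In the first call the disallow rule wins and the noindex tag is never read; in the second the page is fetched and the tag takes effect, which matches the "semi-private pages" approach.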

Jim

Check6

1:35 pm on Jan 24, 2003 (gmt 0)

Thanks for that!

Yes, I've seen deep links to our sites - which don't allow robots in - appear in Google. The last one I tracked back came from a Googlebot index of the Drudge Report.