
Forum Moderators: goodroi


robots.txt and robots meta tag not mutually exclusive?

7:33 pm on Jan 23, 2003 (gmt 0)

10+ Year Member



Am I right in thinking that robots.txt and the robots meta tag can work together? So you could have:

User-agent: abbabot
Disallow:

User-agent: *
Disallow: /

and then control what `abbabot` can and cannot index via meta tags, which would be these primarily:

<META name="ROBOTS" content="all">
<META name="ROBOTS" content="none">
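As a quick sanity check on the robots.txt rules above, here is a minimal sketch using Python's standard `urllib.robotparser`. The `can_fetch` wrapper, the `otherbot` name and the `/news/index.html` path are made up for illustration; only `abbabot` and the rules themselves come from the post.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post: abbabot may fetch everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: abbabot
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "/news/index.html" and "otherbot" are hypothetical examples.
    print("abbabot:", can_fetch("abbabot", "/news/index.html"))
    print("otherbot:", can_fetch("otherbot", "/news/index.html"))
```

This only shows how a parser that follows the original robots.txt convention reads the file. Whether a given real-world bot behaves the same way is up to that bot.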

There are reasons I don't want to rely on robots.txt alone. The site in question adds new sections/channels on a daily basis; many of them we don't want spidered, some we do. The URL structure is such that even using, say, the Google extensions to robots.txt, we'd end up with a very, very large robots.txt file that would be unmanageable on a daily basis. With meta tags, we can use the CMS (Content Management System) to set the state of the tags when the pages are created. Well, it sounds like a plan anyway...
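To make the CMS idea concrete, here is a rough sketch of how a template helper might pick the tag per section. Everything in it is an assumption for illustration: the `SECTION_INDEXABLE` table, the section names and the `robots_meta_tag` function are hypothetical, not part of any particular CMS. Unknown sections default to `none`, so a new channel stays out of the index until someone opts it in.

```python
# Hypothetical per-section settings; in a real CMS this would come from
# whatever the editors set when they create the section.
SECTION_INDEXABLE = {
    "news": True,       # we want this spidered
    "members": False,   # we don't
}


def robots_meta_tag(section: str) -> str:
    """Build the robots meta tag for a page in the given section.

    Sections not listed in SECTION_INDEXABLE are treated as not indexable.
    """
    allow = SECTION_INDEXABLE.get(section, False)
    content = "all" if allow else "none"
    return f'<META name="ROBOTS" content="{content}">'


if __name__ == "__main__":
    for name in ("news", "members", "brand-new-channel"):
        print(name, "->", robots_meta_tag(name))
```

Defaulting to `none` is a design choice, not a requirement; a site that mostly wants to be indexed might prefer the opposite default.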

I see that at some point the W3C discussed putting user agents in the meta tag standard but didn't...

7:51 pm on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most well-behaved bots will honor robots.txt, and some bots/SEs (including Googlebot) will also honor "robots" meta tags. I usually use both to control access to specific pages, but I build pages with all SEs in mind.

10:09 pm on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A very interesting effect with Google and AJ (and maybe others) is that they will list a link to a page, even if it is disallowed in robots.txt. The link is listed with no title and no page description - as you might expect, since you have told the robot not to fetch the page. However, if Googlebot or AJ find a link anywhere on the web, they will list the link in their SERPs if it is sufficiently relevant to the search terms.

The only way I have found to tell Gbot and AJ, "Please don't mention this URL at all" is to not disallow the page in robots.txt, but rather to disallow it only using the on-page robots meta tag, since the robot has to be allowed to fetch the page in order to read the tag. It's the only way I've found to make "semi-private pages" stay that way.
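The interaction described here can be sketched as a small check: a robots meta tag only counts if robots.txt lets the bot fetch the page in the first place. This is a simplified model of the behaviour reported in this thread, not a description of how any search engine is actually implemented. The function name, the status labels and the example paths are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def listing_status(user_agent, path, robots_txt, page_html):
    """Return a hypothetical label for how a page might end up listed."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, path):
        # Page is never fetched, so its meta tag is never read;
        # a bare URL may still be listed if linked from elsewhere.
        return "url_only_possible"
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    if finder.directives & {"none", "noindex"}:
        return "not_listed"
    return "indexed"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><META name="ROBOTS" content="none"></head></html>'
    print(listing_status("googlebot", "/private/a.html", robots, page))
    print(listing_status("googlebot", "/semi/a.html", robots, page))
```

In this model the page under `/private/` never gets its tag read, while the page under `/semi/` is fetched and its `none` directive is seen.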

Jim

1:35 pm on Jan 24, 2003 (gmt 0)

10+ Year Member



Thanks for that!

Yes, I've seen deep links to our sites, which don't allow robots in, appear in Google. The last one I tracked back came from a Googlebot index of the Drudge Report.

 
