Robot.txt

Only allow one bot

         

chris_f

8:54 am on Apr 23, 2002 (gmt 0)


Hi all,

Is there any easy way to allow only one bot to crawl part of your site?

Example.

Domain.com wants every bot to crawl DirectoryA of their site but only BotB to be able to crawl DirectoryB. How would I go about this?

Thanks.
Chris.

agerhart

1:04 pm on Apr 23, 2002 (gmt 0)


I believe it would go like this:

# (Alta Vista)
User-agent: Scooter
Disallow: /DirectoryB/

# (Google) (bot B)
User-agent: Googlebot
Disallow:

If you want only one spider or bot to have access to a directory, disallow every other bot from that directory.

Stickymail me for a live example.

Doofus

4:52 pm on Apr 23, 2002 (gmt 0)



> Is there any easy way to allow only one bot to crawl part of your site?

Easy, yes. Foolproof, no. Your example shows the limitations of robots.txt (be sure you don't call it "robot.txt").

I'd recommend this:

User-agent: BotB
Disallow:

User-agent: *
Disallow: /DirectoryB/

Bots are supposed to obey the first User-agent record that applies to them. BotB should treat the first two lines as definitive and look no further. Leaving the value after Disallow: empty is supposed to mean that everything is permitted.

All other bots would fall through the first two lines to the * record and be disallowed from /DirectoryB/ only.
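One way to sanity-check rules like these before relying on them is to feed them to a parser. Here is a minimal sketch using Python's standard urllib.robotparser. It only shows how that one parser reads the file, not how BotB or any other real crawler will behave. The page.html paths and the use of Scooter as the "other" bot are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, with an empty Disallow value for BotB.
ROBOTS_TXT = """\
User-agent: BotB
Disallow:

User-agent: *
Disallow: /DirectoryB/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# page.html is a hypothetical file name used only for this check.
for agent in ("BotB", "Scooter"):
    for path in ("/DirectoryA/page.html", "/DirectoryB/page.html"):
        print(agent, path, parser.can_fetch(agent, path))
```

With this parser, the only combination that should come back False is Scooter on /DirectoryB/page.html.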

The real question is whether BotB will interpret the Disallow: with nothing after it in the manner you want. You run the risk that BotB will equate "nothing" with a single slash, and go away entirely. If you use this, you have to watch closely to make sure this isn't happening.
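To see how much rides on that one character, here is a small sketch that compares an empty Disallow value with a single slash, again using Python's standard urllib.robotparser as a stand-in. The helper name allowed and the page.html path are made up for this example, and a real BotB may not parse the file the same way.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, path):
    """Return True if this parser lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Empty value: nothing is disallowed for BotB.
EMPTY = "User-agent: BotB\nDisallow:\n"
# Single slash: the whole site is disallowed for BotB.
SLASH = "User-agent: BotB\nDisallow: /\n"

empty_result = allowed(EMPTY, "BotB", "/DirectoryB/page.html")
slash_result = allowed(SLASH, "BotB", "/DirectoryB/page.html")
print("empty Disallow ->", empty_result)
print("Disallow: /    ->", slash_result)
```

If BotB reads the empty value the way this parser reads the slash, it will stay away from the whole site, which is why watching the server logs after deploying the file is worthwhile.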