
Forum Moderators: goodroi


Robots.txt standard

   
4:33 pm on Dec 4, 2001 (gmt 0)

10+ Year Member



Hi all

I am having difficulty creating a robots.txt file and was wondering if anyone could help. I have scoured the resources available at www.robotstxt.org and still don't understand how to format the file to get it to do the following:

I have 3 folders:
/private
/default
/special

1. I want the Altavista spider (scooter) to crawl everything except /special and the Inktomi spider (slurp) to crawl everything except /default.

2. I also want to exclude these 2 spiders from /private.

3. Lastly I want all other spiders to crawl everything except /special and /private

My problem is that I am not sure whether the records in a robots.txt file are applied cumulatively. If they are not, I believe the file should look like this:

User-agent: scooter
Disallow: /private/
Disallow: /special/

User-agent: Slurp
Disallow: /private/
Disallow: /default/

User-agent: *
Disallow: /private/
Disallow: /special/

If the robots.txt file is processed cumulatively, it would look something like this:

User-agent: *
Disallow: /private/
Disallow: /special/

User-agent: scooter
Disallow: /special/

User-agent: Slurp
Disallow: /default/

Is anyone an expert on this? Which one should I use, if either of them is correct? I really want to get this right the first time so I don't have to wait for the spiders to come round again...

TIA

3:17 am on Dec 7, 2001 (gmt 0)

I would use your first example. I believe that by naming scooter and slurp in their own User-agent records, those bots will follow what is written specifically for them and disregard anything written for User-agent: *.
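That "a named record wins, otherwise fall back to *" selection can be sketched in a few lines of Python. This is only an illustration of the matching logic described above, not how any particular spider actually implements it; the file contents are the first example from the question.

```python
# Sketch of how a robot picks its record from robots.txt:
# it obeys the record that names it and ignores everything else,
# falling back to "User-agent: *" only if no record names it.

def parse_robots(text):
    """Split robots.txt text into (agent-names, disallowed-prefixes) records."""
    records, agents, disallows = [], [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if disallows:                      # a new record starts here
                records.append((agents, disallows))
                agents, disallows = [], []
            agents.append(value.lower())
        elif field == "disallow":
            disallows.append(value)
    if agents:
        records.append((agents, disallows))
    return records

def allowed(records, agent, path):
    """True if `agent` may fetch `path` under these records."""
    agent = agent.lower()
    # Prefer a record that names this agent...
    chosen = next((d for a, d in records
                   if any(name != "*" and name in agent for name in a)), None)
    # ...and only fall back to the catch-all record.
    if chosen is None:
        chosen = next((d for a, d in records if "*" in a), None)
    if chosen is None:
        return True                            # no applicable record: allow
    return not any(d and path.startswith(d) for d in chosen)

ROBOTS = """\
User-agent: scooter
Disallow: /private/
Disallow: /special/

User-agent: Slurp
Disallow: /private/
Disallow: /default/

User-agent: *
Disallow: /private/
Disallow: /special/
"""

records = parse_robots(ROBOTS)
print(allowed(records, "Scooter", "/special/page.html"))      # False
print(allowed(records, "Slurp", "/special/page.html"))        # True
print(allowed(records, "SomeOtherBot", "/default/page.html")) # True
```

Note that scooter never reaches the `User-agent: *` record at all, which is why the first file has to repeat `Disallow: /private/` inside each named record.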

6:40 am on Dec 7, 2001 (gmt 0)

I would go with the first one too. A spider checks the robots.txt file for the record that pertains to it; all other records are ignored.
11:21 am on Dec 7, 2001 (gmt 0)

Thanks guys, I'll go with that.