Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt standard
count_zer0




msg:1526725
 4:33 pm on Dec 4, 2001 (gmt 0)

Hi all

I am having difficulty creating a robots.txt file and was wondering if anyone could help. I have scoured the resources available at www.robotstxt.org and still don't understand how to format the file to get it to do the following:

I have 3 folders:
/private
/default
/special

1. I want the Altavista spider (scooter) to crawl everything except /special and the Inktomi spider (slurp) to crawl everything except /default.

2. I also want to exclude these 2 spiders from /private.

3. Lastly I want all other spiders to crawl everything except /special and /private

My problem is that I am not sure if the robots.txt file works cumulatively or not. If it does not, I believe the file should look like:

User-agent: scooter
Disallow: /private/
Disallow: /special/

User-agent: Slurp
Disallow: /private/
Disallow: /default/

User-agent: *
Disallow: /private/
Disallow: /special/

If the robots.txt file is processed cumulatively, it would look something like this:

User-agent: *
Disallow: /private/
Disallow: /special/

User-agent: scooter
Disallow: /special/

User-agent: Slurp
Disallow: /default/

Is anyone an expert on this? Which one should I use, if either of them is correct? I really want to get this right the first time so I don't have to wait for the spiders to come round again...

TIA

 

Son_House




msg:1526726
 3:17 am on Dec 7, 2001 (gmt 0)

I would use your first example. I believe that because scooter and Slurp are named in their own User-agent records, those bots will follow what is written specifically for them and disregard anything written for User-agent: *.
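One way to sanity-check the first example before uploading it is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser, which is only one implementation and may not match how a given crawler behaves; the helper name allowed_folders, the test path page.html, and the agent name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The first (non-cumulative) example from the thread, copied as-is.
FIRST_EXAMPLE = """\
User-agent: scooter
Disallow: /private/
Disallow: /special/

User-agent: Slurp
Disallow: /private/
Disallow: /default/

User-agent: *
Disallow: /private/
Disallow: /special/
"""

FOLDERS = ("/private/", "/default/", "/special/")


def allowed_folders(robots_txt, agent):
    """Return the folders that `agent` may crawl according to `robots_txt`.

    Hypothetical helper: it probes one made-up page per folder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [f for f in FOLDERS if parser.can_fetch(agent, f + "page.html")]


if __name__ == "__main__":
    for agent in ("scooter", "Slurp", "SomeOtherBot"):
        print(agent, "may crawl:", allowed_folders(FIRST_EXAMPLE, agent))
```

Under this parser, each named spider is judged only by its own record, and every other agent falls back to the User-agent: * record.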

ssn5054




msg:1526727
 6:40 am on Dec 7, 2001 (gmt 0)

I would go with the first one too. The spider checks the robots file for the record that pertains to it, and all other records are ignored.
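To illustrate why the second (cumulative) version would not do what was asked, here is a small sketch using Python's standard urllib.robotparser. It is one parser's reading of the file, not a guarantee about any particular crawler; the helper name can_crawl, the page.html path, and the agent name SomeOtherBot are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# The second ("cumulative") example from the thread, copied as-is.
SECOND_EXAMPLE = """\
User-agent: *
Disallow: /private/
Disallow: /special/

User-agent: scooter
Disallow: /special/

User-agent: Slurp
Disallow: /default/
"""


def can_crawl(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under `robots_txt`.

    Hypothetical helper wrapping RobotFileParser.can_fetch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The named spiders match their own records, which never mention
    # /private/, so the "*" rules are not added on top for them.
    for agent in ("scooter", "Slurp", "SomeOtherBot"):
        print(agent, "/private/ allowed:",
              can_crawl(SECOND_EXAMPLE, agent, "/private/page.html"))
```

With this parser, scooter and Slurp are both allowed into /private/ because their own records say nothing about it, while other agents are still kept out by the User-agent: * record.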

count_zer0




msg:1526728
 11:21 am on Dec 7, 2001 (gmt 0)

Thanks guys, I'll go with that.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved