
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Only those on the list are allowed entrance
All others OUT
grnidone



 
Msg#: 218 posted 11:58 pm on Dec 14, 2003 (gmt 0)

Is there a way to write a robots.txt file to say

"If your robot's name is not listed in the list below, then you cannot crawl my site?"

Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.

 

oilman

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 218 posted 12:21 am on Dec 15, 2003 (gmt 0)

robots.txt is an exclusionary protocol, in that you have to explicitly list who and what you don't want to have access - not the other way around. I'm not an htaccess wizard, but I'm certain you can do what you are asking with htaccess instead of robots.txt.
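For the htaccess side of that, a minimal sketch (the allowed bot names are just examples, and note that a bots-only allow list like this also locks out ordinary browsers, whose user-agents start with "Mozilla"):

```
# Sketch only: deny everything except the listed crawler user-agents.
SetEnvIfNoCase User-Agent "Googlebot" allowed_ua
SetEnvIfNoCase User-Agent "Slurp" allowed_ua
Order Deny,Allow
Deny from all
Allow from env=allowed_ua
```

Remember that user-agent strings are trivially spoofed, so this only keeps out robots that tell the truth about who they are.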

jdMorgan

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 218 posted 12:40 am on Dec 15, 2003 (gmt 0)

The 'cooperation' of the robot with robots.txt is voluntary. For those that do obey, yes, you can construct your robots.txt to list those that you wish to allow, and deny the rest. As oilman says, the rest have to be handled with mod_rewrite on Apache or ISAPI filters on Windows servers.

An allow list construct in robots.txt would look like this:

User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /

This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.

I should also note that not all robots can handle multiple user-agent records as shown above, even though it is in the standard. Those too can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.
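A hedged sketch of that redirect idea with mod_rewrite (the bot name and the alternate filename here are placeholders, not real examples from this thread):

```
# Sketch only: serve a flattened robots file to a robot known to
# choke on grouped User-agent records. "OldBot" is a placeholder.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} OldBot [NC]
RewriteRule ^robots\.txt$ /robots-simple.txt [L]
```

robots-simple.txt would then repeat the same Disallow lines under a single User-agent record.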

Jim

Krapulator

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 218 posted 3:47 am on Dec 16, 2003 (gmt 0)

>>I also should note that not all robots can handle the multiple user-agent records as shown above, even though it is in the standard.

I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect and it took quite a few emails before they saw my point of view :D

TechMentaL

10+ Year Member



 
Msg#: 218 posted 7:10 am on Dec 22, 2003 (gmt 0)

hello,

kinda new at this robots.txt but my deadline doesn't have to know that...(!)

which web robots should I disallow, and why....(?)

thanx

Essex_boy

WebmasterWorld Senior Member essex_boy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 218 posted 3:11 pm on Dec 24, 2003 (gmt 0)

Bit odd, this one - why, when your aim is complete exposure on the web, would anyone want to disallow a bot entirely?

Bit odd to me.....

dwilson

10+ Year Member



 
Msg#: 218 posted 5:29 pm on Jan 2, 2004 (gmt 0)

Why completely block a spider?

I've thought about blocking Baidu b/c they're only in Chinese & my site is only in English. I don't see the point in allowing them to use my bandwidth when their users will probably never visit my site.
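For what it's worth, Baidu's crawler identifies itself as Baiduspider, so (assuming it honors robots.txt) a record like this would shut it out of the whole site:

```
User-agent: Baiduspider
Disallow: /
```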

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved