
Only those on the list are allowed entrance

All others OUT

11:58 pm on Dec 14, 2003 (gmt 0)

Senior Member

joined:June 27, 2000
posts:1548
votes: 0


Is there a way to write a robots.txt file to say

"If your robot's name is not listed in the list below, then you cannot crawl my site?"

Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.

12:21 am on Dec 15, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 10, 2000
posts:2151
votes: 0


robots.txt is an exclusionary protocol, in that you have to explicitly list who and what you don't want to have access - not the other way around. I'm not an .htaccess wizard, but I'm certain you can do what you're asking with .htaccess instead of robots.txt.
12:40 am on Dec 15, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


The 'cooperation' of the robot with robots.txt is voluntary. For those that do obey, yes, you can construct your robots.txt to list those that you wish to allow, and deny the rest. As oilman says, the rest have to be handled with mod_rewrite on Apache or ISAPI filters on Windows servers.

An allow list construct in robots.txt would look like this:


User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /

This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.
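For the robots that ignore robots.txt, a mod_rewrite rule along these lines is one way to turn them away at the door (a sketch only, assuming Apache with mod_rewrite enabled; "BadBot" is a placeholder for whatever user-agent string the offending robot actually sends):

```apache
RewriteEngine On
# Return 403 Forbidden to a known non-cooperative robot.
# Match on a substring of its User-Agent header, case-insensitively.
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F]
```

Matching on the user-agent is only as reliable as the robot's honesty about its identity; truly hostile robots may need blocking by IP instead.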

I should also note that not all robots can handle the multiple user-agent records shown above, even though they are in the standard. Those too can be handled by mod_rewrite or ISAPI filters, redirecting them to a simpler version of robots.txt.
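That redirection could look something like this (again a sketch for Apache; "SimpleBot" and the file name are hypothetical - substitute the actual user-agent of a robot known to choke on multiple User-agent lines):

```apache
RewriteEngine On
# Serve a simplified robots.txt to a robot that can't parse
# records with multiple User-agent lines.
RewriteCond %{HTTP_USER_AGENT} SimpleBot [NC]
RewriteRule ^robots\.txt$ /robots-simple.txt [L]
```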

Jim

3:47 am on Dec 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2002
posts:666
votes: 0


>>I should also note that not all robots can handle the multiple user-agent records shown above, even though they are in the standard.

I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect, and it took quite a few emails before they saw my point of view :D

TechMentaL

7:10 am on Dec 22, 2003 (gmt 0)

Inactive Member

hello,

kinda new at this robots.txt stuff, but my deadline doesn't have to know that...(!)

which web robots should I disallow, and why....(?)

thanx

3:11 pm on Dec 24, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member essex_boy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 19, 2003
posts:3171
votes: 2


Bit odd, this one - why, when your aim is complete exposure on the web, would anyone want to disallow a bot entirely?

Bit odd to me.....

5:29 pm on Jan 2, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Mar 3, 2003
posts:306
votes: 0


Why completely block a spider?

I've thought about blocking Baidu b/c they're only in Chinese & my site is only in English. I don't see the point in allowing them to use my bandwidth when their users will probably never visit my site.
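Assuming Baidu's crawler honors robots.txt (it identifies itself as Baiduspider), a record like this would keep it out of the whole site:

```
User-agent: Baiduspider
Disallow: /
```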