
Only those on the list are allowed entrance

All others OUT

     

grnidone

11:58 pm on Dec 14, 2003 (gmt 0)



Is there a way to write a robots.txt file to say

"If your robot's name is not listed in the list below, then you cannot crawl my site?"

Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.

oilman

12:21 am on Dec 15, 2003 (gmt 0)




robots.txt is an exclusionary protocol, in that you have to explicitly list who and what you don't want to have access - not the other way around. I'm not an htaccess wizard, but I'm certain you can do what you are asking with htaccess instead of robots.txt.
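A rough sketch of that kind of bouncer at the server level, assuming Apache with mod_rewrite enabled - the allowed names and the bot|crawl|spider|slurp pattern are only illustrative, not a definitive list:

RewriteEngine On
# if the User-Agent looks like a crawler...
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider|slurp) [NC]
# ...and is not one of the crawlers on the allow list...
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp) [NC]
# ...refuse the request with a 403
RewriteRule .* - [F]

Ordinary browsers don't match the first condition, so only self-identified crawlers outside the allow list get turned away.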

jdMorgan

12:40 am on Dec 15, 2003 (gmt 0)




A robot's 'cooperation' with robots.txt is voluntary. For those that do obey it, yes, you can construct your robots.txt to list the robots you wish to allow and deny the rest. As oilman says, the rest have to be handled with mod_rewrite on Apache or ISAPI filters on Windows servers.

An allow list construct in robots.txt would look like this:


User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /

This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.

I should also note that not all robots can handle multiple User-agent lines in a single record as shown above, even though it is in the standard. Those, too, can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.

Jim
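For the simpler-robots.txt trick Jim mentions, a minimal mod_rewrite sketch might look like this, assuming Apache; the user-agent "FussyBot" and the file name robots-simple.txt are placeholders you'd swap for the real ones:

RewriteEngine On
# send a robot that chokes on grouped User-agent lines to a simplified file
RewriteCond %{HTTP_USER_AGENT} FussyBot [NC]
RewriteRule ^robots\.txt$ /robots-simple.txt [L]

robots-simple.txt would then repeat the Disallow lines under a single User-agent record.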

Krapulator

3:47 am on Dec 16, 2003 (gmt 0)




>>I should also note that not all robots can handle multiple User-agent lines in a single record as shown above, even though it is in the standard.

I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect, and it took quite a few emails before they saw my point of view :D

TechMentaL

7:10 am on Dec 22, 2003 (gmt 0)




Hello,

I'm kinda new to this robots.txt stuff, but my deadline doesn't have to know that... (!)

Which web robots should I disallow, and why?

Thanks

Essex_boy

3:11 pm on Dec 24, 2003 (gmt 0)




Bit odd, this one: when your aim is complete exposure on the web, why would anyone want to disallow a bot entirely?

Bit odd to me...

dwilson

5:29 pm on Jan 2, 2004 (gmt 0)




Why completely block a spider?

I've thought about blocking Baidu because they're only in Chinese and my site is only in English. I don't see the point in allowing them to use my bandwidth when their users will probably never visit my site.
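If you did decide to do that in robots.txt, it's a two-line record - Baidu's crawler identifies itself as Baiduspider, assuming it honours robots.txt at all:

User-agent: Baiduspider
Disallow: /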

 
