WebmasterWorld
Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Only those on the list are allowed entrance
All others OUT
grnidone, 11:58 pm on Dec 14, 2003 (gmt 0)

Is there a way to write a robots.txt file to say

"If your robot's name is not listed in the list below, then you cannot crawl my site?"

Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.

 

oilman, 12:21 am on Dec 15, 2003 (gmt 0)

robots.txt is an exclusionary protocol: you have to explicitly list who and what you don't want to have access - not the other way around. I'm not an htaccess wizard, but I'm certain you can do what you're asking with htaccess instead of robots.txt.
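A minimal .htaccess sketch of that bouncer approach, assuming Apache with mod_rewrite enabled (the allowed names below are examples only):

# Return 403 Forbidden to any client whose User-Agent does not
# match one of the allowed patterns. Note: this turns away ordinary
# browsers too, so it only suits areas meant for the listed bots alone.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]
RewriteCond %{HTTP_USER_AGENT} !Slurp [NC]
RewriteRule .* - [F]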

jdMorgan, 12:40 am on Dec 15, 2003 (gmt 0)

The 'cooperation' of the robot with robots.txt is voluntary. For those that do obey, yes, you can construct your robots.txt to list those that you wish to allow, and deny the rest. As oilman says, the rest have to be handled with mod_rewrite on Apache or ISAPI filters on Windows servers.

An allow list construct in robots.txt would look like this:

User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /

This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.

I should also note that not all robots can handle multiple User-agent lines per record as shown above, even though it is in the standard. Those, too, can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.

Jim
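A sketch of that redirection, assuming Apache with mod_rewrite in .htaccess; the bot name and the alternate file name are made up for illustration:

RewriteEngine On
# Hand a single-record robots.txt to a bot that chokes on
# multiple User-agent lines per record.
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule ^robots\.txt$ /robots-simple.txt [L]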

Krapulator, 3:47 am on Dec 16, 2003 (gmt 0)

>>I also should note that not all robots can handle the multiple user-agent records as shown above, even though it is in the standard.

I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect, and it took quite a few emails before they saw my point of view :D

TechMentaL, 7:10 am on Dec 22, 2003 (gmt 0)

hello,

kinda new to this robots.txt stuff, but my deadline doesn't have to know that... (!)

which web robots should I disallow, and why?

thanx

Essex_boy, 3:11 pm on Dec 24, 2003 (gmt 0)

Bit odd, this one: when your aim is complete exposure on the web, why would anyone want to disallow a bot entirely?

Bit odd to me.....

dwilson, 5:29 pm on Jan 2, 2004 (gmt 0)

Why completely block a spider?

I've thought about blocking Baidu because they're only in Chinese and my site is only in English. I don't see the point in letting them use my bandwidth when their users will probably never visit my site.
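For reference, Baidu's crawler identifies itself with the Baiduspider user-agent token, so the corresponding robots.txt record would be:

User-agent: Baiduspider
Disallow: /

(Whether Baidu honors it is up to Baidu, per the voluntary nature of robots.txt noted earlier in the thread.)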

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved