
Sitemaps, Meta Data, and robots.txt Forum

    
Using Allow: / in robots.txt
Google says to use it?
werty
msg:1527284 - 5:48 pm on May 19, 2003 (gmt 0)

I have never heard of Allow: / being used in robots.txt, but the Google help pages say to use it.

[google.com...]

11. How do I block all crawlers except Googlebot from my site?

The following robots.txt file will achieve this for all well-behaved crawlers.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

 

pageoneresults
msg:1527285 - 5:56 pm on May 19, 2003 (gmt 0)

I'd be somewhat hesitant to use the Allow directive, as I don't think it is supported across the board. Also, the robots.txt validators I've checked return this error when the Allow directive is included...

Invalid fieldname. There is no Allow.

There has been some discussion on this for quite some time amongst those who govern the robots.txt protocol. As far as I know, it has not been implemented yet.

The default behavior of the robots.txt file is to allow all unless of course you have a Disallow for that resource.

User-agent: Googlebot
Disallow: /

P.S. Hehehe, I've only had to use that directive once! ;)

werty
msg:1527286 - 6:03 pm on May 19, 2003 (gmt 0)

Yeah, I thought it was kind of strange that they had it on there. Everything I have seen about robots.txt says the directive does not exist. Perhaps Google has implemented support for it?

I have no reason to use Allow; I just thought I would point it out.

pageoneresults
msg:1527287 - 6:09 pm on May 19, 2003 (gmt 0)

It's a good catch! Google always seems to be ahead of the pack in implementing search-engine-specific directives. In this case, I would not place an Allow directive in my robots.txt file, since the default behavior is to Allow.

Now, when I can get a valid robots.txt file using the Allow directive, I'll consider reformatting. Until the authoritative resource on the robots.txt file protocol states that Allow is now supported, I think it is best to follow the current standard.
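
For reference, here is a rough sketch of how Google's block-everyone-but-Googlebot example could be written with only standard directives. It relies on the original standard's rule that an empty Disallow value disallows nothing for that record; the file names and effect are only an illustration, not something pulled from Google's page:

# Googlebot matches this more specific record; nothing is disallowed for it
User-agent: Googlebot
Disallow:

# Every other well-behaved robot falls through to the catch-all and is kept out
User-agent: *
Disallow: /

A robot that follows the original exclusion standard picks the record whose User-agent line matches it most specifically, so Googlebot should read only its own record and ignore the catch-all.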

jdMorgan
msg:1527288 - 3:28 am on May 20, 2003 (gmt 0)

Google supports several extensions to the Standard for Robots Exclusion. Some of them can be life-savers, turning a daunting job into a trivial one. For example, their support for wildcard filename-matching, in addition to the simple (standard) prefix-matching, can come in very handy.

However, I would never use any of these extensions except in an exclusive User-agent: Googlebot record.

There is simply no telling what any other robot might do with those Google-specific extensions!
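
As an illustration only: the sketch below assumes Googlebot's wildcard (*) and end-of-URL ($) pattern matching, which are Google-specific extensions rather than part of the standard, and the paths and patterns are made up for the example. The non-standard syntax is confined to a Googlebot-only record, while all other robots see nothing but plain prefix rules:

# Google-specific patterns, kept where only Googlebot will read them
User-agent: Googlebot
Disallow: /*.pdf$
Disallow: /*?sessionid=

# Standard prefix matching only, safe for every other robot
User-agent: *
Disallow: /cgi-bin/

That way a robot that does not understand the wildcards never has to guess at them, which is exactly the risk described above.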

Jim
