
Using Allow: / in robots.txt

Google says to use it?

     
5:48 pm on May 19, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I have never heard of Allow: / being used in robots.txt, but the Google help pages say to use it.

[google.com...]

11. How do I block all crawlers except Googlebot from my site?

The following robots.txt file will achieve this for all well-behaved crawlers.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

5:56 pm on May 19, 2003 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'd be somewhat hesitant to use the Allow directive, as I don't think it is supported across the board. Also, the robots.txt validators that I've checked return this error when the Allow directive is included...

Invalid fieldname. There is no Allow.

There has been some discussion on this for quite some time amongst those who govern the robots.txt protocol. As far as I know, it has not been implemented yet.

The default behavior of the robots.txt file is to allow all unless of course you have a Disallow for that resource.

User-agent: Googlebot
Disallow: /

P.S. Hehehe, I've only had to use that directive once! ;)
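
For what it's worth, the original standard already gives you a way to get the effect Google's example is after without Allow at all: an empty Disallow value means nothing is disallowed for that User-agent. A rough sketch of the same block-everyone-but-Googlebot setup using only standard fields:

# Googlebot may fetch everything (empty Disallow = nothing disallowed)
User-agent: Googlebot
Disallow:

# Everyone else is kept out
User-agent: *
Disallow: /

Listing the named record before the catch-all is just belt-and-suspenders for less careful parsers; well-behaved robots should pick the record that names them regardless of order.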

6:03 pm on May 19, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yeah, I thought it was kind of strange that they had it on there. Everything I have seen about robots.txt says the Allow directive does not exist. Perhaps Google has implemented support for it?

I have no reason to use the allow, just thought I would point it out.

6:09 pm on May 19, 2003 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's a good catch! Google seems to always be ahead of the pack in implementing search engine specific directives. In this case, I would not place an Allow directive in my robots.txt file since the default behavior is to Allow.

Now, when I can get a valid robots.txt file using the Allow directive, I'll consider reformatting. Until the authoritative resource on the robots.txt file protocol states that Allow is now supported, I think it is best to follow the current standard.

3:28 am on May 20, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Google supports several kinds of extensions to the Standard for Robots Exclusion, and some of them can be life-savers, turning a daunting job into a trivial one. For example, their support for wildcard filename-matching, in addition to the simple prefix-matching the standard defines, can come in very handy.

However, I would never use any of these extensions except in an exclusive User-agent: Googlebot record.

There is simply no telling what any other robot might do with those Google-specific extensions!
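
For example, something along these lines (just a sketch, assuming Google's wildcard matching behaves the way their documentation describes) keeps the Google-specific patterns in a Googlebot-only record and leaves the catch-all record strictly standard:

# Googlebot-only record: Google's extensions (* and $) go here and nowhere else
User-agent: Googlebot
Disallow: /*.gif$
Disallow: /*?

# Catch-all record: plain prefix-matching only, per the original standard
User-agent: *
Disallow: /images/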

Jim
