| 2:43 pm on Jun 18, 2012 (gmt 0)|
Google supports the nonstandard 'Allow'.
However, using it is risky, as some bots based on older robots.txt libraries will interpret it as a "disallow".
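A minimal sketch of the risk (the /private/ paths are invented for illustration):

User-agent: *
Disallow: /private/
Allow: /private/public-page.html

Google reads the Allow line as an exception to the Disallow above it, but a robot built on an older library that only implements the original exclusion standard may ignore the line entirely, and a careless parser could even treat it as a disallow, blocking the very page you meant to open up.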
| 8:59 pm on Jun 18, 2012 (gmt 0)|
|Most of the SEO experts make this mistake of adding Allow in robots.txt |
I don't know about most experts elsewhere, but site search on WebmasterWorld suggests that "expert" opinion here cautions against using "allow" unless you're very careful about it.
Here are two threads, one old, one recent, with some comments on the topic that are worth reading....
Using Allow: / in robots.txt
Google says to use it?
May 2003
|Google supports several kinds of extensions to the Standard for Robots Exclusion. Some of them may be life-savers under certain circumstances - making a daunting job trivial in some cases. For example, their support of wildcard filename-matching, in addition to simple (standard) prefix-matching might come in very handy under certain circumstances. |
However, I would never use any of these extensions except in an exclusive User-agent: Googlebot record.
There is simply no telling what any other robot might do with those Google-specific extensions!
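To make that advice concrete, here is a sketch of keeping the extensions fenced off in a Googlebot-only record (the paths and patterns are invented):

User-agent: Googlebot
Disallow: /downloads/
Disallow: /*.pdf$
Allow: /downloads/index.html

User-agent: *
Disallow: /downloads/

The * wildcard, the $ end-anchor, and the Allow line in the first record are all Google extensions. Every other robot skips that record and falls through to the User-agent: * record, which uses only standard prefix matching, so nothing nonstandard ever reaches a robot that might choke on it.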
Lost all rankings from Google - due to robots.txt
|As this is the "Robots Exclusion Protocol" everything hinges on this being a disallow list. |
|...Even though both Bing and Google say they now support a few extensions to the standard syntax, the actual current standard is explained here: [robotstxt.org...] |
...and here is Google's Help page: [support.google.com...] If you start blocking some URLs or URL patterns, the details Google provides can become important for getting the exact results that you intended.
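As a quick sketch of why those details matter (the URL pattern is invented):

User-agent: Googlebot
Disallow: /shop/*?sort=

To Googlebot, the * matches any sequence of characters, so every /shop/ URL with ?sort= in its query string is blocked. To a robot that only implements the original standard, the same line is a literal prefix that matches essentially nothing, which is exactly why these patterns belong in a Googlebot-only record.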
| 4:41 am on Jun 19, 2012 (gmt 0)|
That means the post published at SEW gives wrong information; after reading this Google support page, I find we can use Allow in robots.txt.
| 7:16 am on Jun 19, 2012 (gmt 0)|
The more egregious error in that article is discussing robots.txt as a method of controlling indexing for a site, when it is actually used to control crawling, not indexing.
And the fact that it is actually a robots exclusion protocol makes Google's "Allow:" extension fundamentally unsound.
| 11:31 pm on Jun 20, 2012 (gmt 0)|
|And the fact that it is actually a robots exclusion protocol makes Google's "Allow:" extension fundamentally unsound. |
Allow just makes it easier to punch small holes in the firewall, albeit nonstandard ones. Besides, Google added a lot of extra crap to robots.txt that many robots don't support. It needs to be standardized, and to this day, best I know, it's not, so the whole thing is moot really.
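That hole-punching looks like this (directory names invented):

User-agent: Googlebot
Disallow: /archive/
Allow: /archive/current/

For Googlebot, everything under /archive/ is blocked except /archive/current/, which the Allow line carves back out; Google resolves the conflict in favor of the most specific (longest) matching rule.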
Without a script to back it up and enforce it, robots.txt is quite useless really.
| 9:56 am on Jun 26, 2012 (gmt 0)|
Google itself uses Allow in its own robots.txt [google.com...]
| 12:58 pm on Jun 26, 2012 (gmt 0)|
Use that only in a Googlebot user-agent section.
When there is a section for Googlebot, Google uses only that section of the robots.txt file and ignores the rest.
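For example:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Googlebot obeys only the record that names it and crawls everything; the Disallow: / in the * record never applies to it, because the records are not merged.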