|No Allow in Robots.txt|
Thanks to Mark Jackson [searchenginewatch.com], who cleared up a big question about robots.txt.
|There's no "/allow" command in the robots.txt file, so there's no need to add it to the robots.txt file. |
Most SEO experts make the mistake of adding Allow to robots.txt.
Google supports the nonstandard 'Allow' directive.
However, using it is risky, as some bots based on older robots.txt libraries will interpret it as a "disallow".
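That risk is easy to reproduce with Python's standard-library parser, which applies rules in file order rather than Google's longest-match precedence, so the very same Allow line gives different answers depending on where it sits in the file (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

def parse(lines):
    rp = RobotFileParser()
    rp.parse(lines)
    return rp

# Allow listed first: the stdlib parser uses first-matching-rule-wins,
# so the page is permitted.
allow_first = parse([
    "User-agent: *",
    "Allow: /private/public.html",
    "Disallow: /private/",
])

# Same rules, Disallow first: the broader Disallow now matches first,
# even though Google's longest-match precedence would still allow the page.
disallow_first = parse([
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /private/public.html",
])

url = "https://example.com/private/public.html"
print(allow_first.can_fetch("mybot", url))     # True
print(disallow_first.can_fetch("mybot", url))  # False
```

Two parsers (or even the same parser with rules reordered) can disagree about the same URL, which is exactly why relying on Allow outside a Googlebot-specific record is fragile.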
|Most SEO experts make the mistake of adding Allow to robots.txt |
I don't know about most experts elsewhere, but site search on WebmasterWorld suggests that "expert" opinion here cautions against using "allow" unless you're very careful about it.
Here are two threads, one old, one recent, with some comments on the topic that are worth reading....
Using Allow: / in robots.txt
Google says to use it?
May 2003
|Google supports several kinds of extensions to the Standard for Robots Exclusion. Some of them may be life-savers under certain circumstances - making a daunting job trivial in some cases. For example, their support of wildcard filename-matching, in addition to simple (standard) prefix-matching might come in very handy under certain circumstances. |
However, I would never use any of these extensions except in an exclusive User-agent: Googlebot record.
There is simply no telling what any other robot might do with those Google-specific extensions!
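As a sketch of that approach, the Google-specific extensions can be quarantined in an exclusive Googlebot record while every other robot gets only standard syntax (the paths here are hypothetical):

```
# Standard, prefix-matching rules for all other robots
User-agent: *
Disallow: /private/

# Google-only record: Googlebot reads this group instead of the one above
User-agent: Googlebot
Disallow: /private/
Disallow: /*.pdf$    # wildcard + end anchor: Google extension, not standard
```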
Lost all rankings from Google - due to robots.txt
|As this is the "Robots Exclusion Protocol" everything hinges on this being a disallow list. |
|...Even though both Bing and Google say they now support a few extensions to the standard syntax, the actual current standard is explained here: [robotstxt.org...] |
...and here is Google's Help page: [support.google.com...] If you start blocking some URLs or URL patterns, the details Google provides can become important for getting the exact results that you intended.
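Google's documented precedence for conflicting rules is: the rule with the most specific (longest) matching path wins, and on a tie the least restrictive rule (Allow) wins. A minimal sketch of that logic, ignoring wildcards and assuming simple prefix matching for brevity:

```python
def google_rule(path, rules):
    """Sketch of Google's documented precedence: longest matching
    path wins; on a tie, Allow beats Disallow.
    `rules` is a list of (directive, pattern) tuples; wildcard
    patterns are omitted here (an assumption for brevity)."""
    matches = [(len(pattern), directive)
               for directive, pattern in rules
               if path.startswith(pattern)]
    if not matches:
        return "allow"  # no matching rule: crawling is permitted
    _, directive = max(matches, key=lambda m: (m[0], m[1] == "allow"))
    return directive

rules = [("disallow", "/private/"),
         ("allow", "/private/public.html")]
print(google_rule("/private/public.html", rules))  # allow: longer match wins
print(google_rule("/private/other.html", rules))   # disallow
```

Parsers that instead apply rules in file order, or that ignore Allow entirely, will disagree with this result, which is where the unintended blocking comes from.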
So the post published at SEW gives wrong information; after reading this Google support page, I find we can use Allow in robots.txt.
The more egregious error in that article is discussing robots.txt as a method of controlling indexing for a site, when it is actually used to control crawling, not indexing.
And the fact that it is actually a robots exclusion protocol makes Google's "Allow:" extension fundamentally unsound.
|And the fact that it is actually a robots exclusion protocol makes Google's "Allow:" extension fundamentally unsound. |
Allow just makes it easier to punch small holes in the firewall, albeit non-standard ones. Besides, Google added a lot of extra crap to robots.txt that many robots don't support. It needs to be standardized, and to this day, as far as I know, it's not, so the whole thing is moot really.
Without a script to back up and enforce robots.txt, it's quite useless really.
Google itself uses Allow in its own robots.txt [google.com]
Use it only within a Googlebot user-agent record.
When there is a section for Googlebot, Google uses only that section of the robots.txt file.
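For example (hypothetical paths), when both groups below are present, Googlebot obeys only its own group, so /search/ is not blocked for it even though it is blocked for everyone else:

```
User-agent: *
Disallow: /search/

User-agent: Googlebot
Disallow: /sdch/
# Googlebot reads only this group; /search/ stays crawlable for Googlebot
```

Groups are not combined, so any rule you want to apply to Googlebot must be repeated inside its own record.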