No Allow in Robots.txt

   
7:28 am on Jun 18, 2012 (gmt 0)



Thanks to Mark Jackson [searchenginewatch.com], who cleared up a big question about robots.txt.

There's no "/allow" command in the robots.txt file, so there's no need to add it to the robots.txt file.


Most SEO experts make this mistake of adding Allow in robots.txt.
2:43 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Google supports the nonstandard 'Allow'.

However, using it is risky, as some bots based on older robots.txt libraries will interpret it as a "disallow".
[tools.seobook.com...]
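
A rough illustration of that risk, not a description of any particular bot: the sketch below assumes a hypothetical legacy parser (`legacy_can_fetch` is a made-up name) that understands only the original `User-agent` and `Disallow` fields and silently skips everything else, which is how the original standard says unrecognised fields are handled. This is a different mechanism from literally reading `Allow` as `Disallow`, but the practical outcome is similar: the page the `Allow` line was meant to open stays blocked.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""


def legacy_can_fetch(robots_txt: str, path: str) -> bool:
    """Hypothetical pre-extension parser: knows only Disallow prefixes.

    Simplification (an assumption for this sketch): the whole file is
    treated as a single record that applies to every robot.
    """
    disallowed = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "disallow" and value:
            disallowed.append(value)
        # Every other field, including Allow, is skipped.
    return not any(path.startswith(prefix) for prefix in disallowed)


print(legacy_can_fetch(ROBOTS_TXT, "/private/public-page.html"))  # False
print(legacy_can_fetch(ROBOTS_TXT, "/index.html"))                # True
```
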
8:59 pm on Jun 18, 2012 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Most SEO experts make this mistake of adding Allow in robots.txt.

I don't know about most experts elsewhere, but site search on WebmasterWorld suggests that "expert" opinion here cautions against using "allow" unless you're very careful about it.

Here are two threads, one old, one recent, with some comments on the topic that are worth reading....

Using Allow: / in robots.txt
Google says to use it?
May, 2003
http://www.webmasterworld.com/forum93/15.htm [webmasterworld.com]

Google supports several kinds of extensions to the Standard for Robots Exclusion. Some of them may be life-savers under certain circumstances - making a daunting job trivial in some cases. For example, their support of wildcard filename-matching, in addition to simple (standard) prefix-matching might come in very handy under certain circumstances.

However, I would never use any of these extensions except in an exclusive User-agent: Googlebot record.

There is simply no telling what any other robot might do with those Google-specific extensions!
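
To make the difference between the two matching styles in that excerpt concrete, here is a minimal sketch. It assumes the wildcard extension works as Google documents it (`*` matches any run of characters and a trailing `$` anchors the end of the URL); `prefix_match` and `wildcard_match` are hypothetical helper names, and this is not Googlebot's actual implementation.

```python
import re


def prefix_match(rule: str, path: str) -> bool:
    """Standard matching: the rule is a literal path prefix."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Extended matching: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None


rule = "/*.pdf$"
print(wildcard_match(rule, "/docs/report.pdf"))  # True
print(prefix_match(rule, "/docs/report.pdf"))    # False
```

A robot that only does prefix matching reads `/*.pdf$` as a literal string, so a rule written this way blocks nothing for that robot. That is one reason to keep such rules inside a Googlebot-only record.
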


Lost all rankings from Google - due to robots.txt
June, 2012
http://www.webmasterworld.com/google/4463765.htm [webmasterworld.com]

As this is the "Robots Exclusion Protocol" everything hinges on this being a disallow list.

...Even though both Bing and Google say they now support a few extensions to the standard syntax, the actual current standard is explained here: [robotstxt.org...]

...and here is Google's Help page: [support.google.com...] If you start blocking some URLs or URL patterns, the details Google provides can become important for getting the exact results that you intended.
4:41 am on Jun 19, 2012 (gmt 0)



So the post published at SEW gives wrong information. After reading this Google support page, I see that we can use Allow in robots.txt:
[support.google.com]
7:16 am on Jun 19, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



the more egregious error in that article is discussing robots.txt as a method of controlling indexing for a site when it is actually used to control crawling, not indexing.

and the fact that it is actually a robots exclusion protocol makes google's "Allow:" extension fundamentally unsound.
11:31 pm on Jun 20, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



and the fact that it is actually a robots exclusion protocol makes google's "Allow:" extension fundamentally unsound.


Allow just makes it easier to punch small holes in the firewall, albeit non-standard ones. Besides, Google added a lot of extra crap to robots.txt that many don't support. It needs to be standardized and, to this day, as far as I know, it isn't, so the whole thing is moot really.

Without a script to back up enforcing robots.txt it's quite useless really.
9:56 am on Jun 26, 2012 (gmt 0)



Google itself uses Allow in its own robots.txt [google.com]
12:58 pm on Jun 26, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Use that only in a section for the Googlebot user agent.

When you have a section for Googlebot, Google uses only that section of the robots.txt file.
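
A small sketch of that layout, checked with Python's standard `urllib.robotparser`. The paths and the `ExampleBot` name are made up for illustration, and Python's parser is not Googlebot, so treat the result as a sanity check rather than proof of how any real crawler behaves. Because only the Googlebot record is used for Googlebot, that record repeats the `Disallow` rule from the `*` record. The `Allow` line is listed first on the assumption that Python's parser stops at the first matching rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Allow appears only in the Googlebot record.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /private/public-page.html
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "/private/public-page.html"))   # True
print(parser.can_fetch("Googlebot", "/private/secret.html"))        # False
print(parser.can_fetch("ExampleBot", "/private/public-page.html"))  # False
```

Every other robot sees only standard `Disallow` lines in its own record, so it never has to interpret the extension.
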