
domo arigato, mr. roboto


ricnut

11:09 pm on Jan 20, 2007 (gmt 0)

10+ Year Member



I want to know whether placing an "Allow" directive in the robots.txt file will, by default, disallow everything else. It seems only logical that it would, but I really haven't the slightest idea.

e.g.
--------------------
User-agent: *
Allow: /index.html
--------------------

jdMorgan

11:41 pm on Jan 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, it won't. You'll need a "Disallow: /" rule alongside that Allow to make it work as desired.

Furthermore, "Allow" is only supported by Google and Yahoo!, and possibly a few other search engines, and so cannot reliably be used under "User-agent: *".

Google is promoting their Webmaster Tools, and so has made somewhat of a fragmented mess of their robots.txt information pages, but this page [google.com] describes their proprietary "Allow" directive.

The similar page at Yahoo! is here [help.yahoo.com], showing that they support that directive as well.

MSN/Windows Live Search does not support "Allow" -- or at least, it's not mentioned on their robots.txt page here [search.msn.com].

Be sure to check the robots.txt-related pages at all of the search engines that you are concerned with -- it's probably fair to say that no two of them support exactly the same feature set. If you assume they all treat a particular directive the same way and get it wrong, you could make a mess of your search rankings.

Jim
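
One hedged way to act on Jim's advice is to scope "Allow" to the engines known to honor it and give every other robot a record without it. A sketch only -- the user-agent tokens (Googlebot, Slurp) are real crawler names, but the policy itself is just an example:

--------------------
# Googlebot and Yahoo!'s Slurp understand "Allow" (example policy only)
User-agent: Googlebot
Disallow: /
Allow: /index.html

User-agent: Slurp
Disallow: /
Allow: /index.html

# All other robots get no "Allow" lines; an empty Disallow permits everything
User-agent: *
Disallow:
--------------------

Note the trade-off: a robot that doesn't understand "Allow" has no way to express "everything except index.html is off limits" in one line -- you either disallow nothing (as above) or list each directory in its own Disallow line.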

phranque

11:52 pm on Jan 20, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Everything not explicitly disallowed is considered fair game to retrieve.

if you want to allow only index.html for all bots:

User-agent: *
Allow: /index.html
Disallow: /

here are a couple of important references.
web server admin guide:
[robotstxt.org...]
the protocol:
[robotstxt.org...]
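
As a quick local sanity check of rules like the ones above, Python's standard-library urllib.robotparser can parse a robots.txt snippet and answer can_fetch queries. One caveat: this particular parser applies rules in file order (first match wins), which is why the Allow line must come before the blanket "Disallow: /"; the bot name "TestBot" is just a placeholder:

```python
# Sketch: verify "disallow everything except /index.html" using Python's
# standard-library robots.txt parser. In this parser the first matching
# rule wins, so the Allow line is listed before the blanket Disallow.
from urllib import robotparser

rules = """\
User-agent: *
Allow: /index.html
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# "TestBot" is a placeholder name; the "User-agent: *" record applies to it.
print(rp.can_fetch("TestBot", "http://www.example.com/index.html"))        # True
print(rp.can_fetch("TestBot", "http://www.example.com/private/page.html")) # False
```

Swapping the two rules back (Disallow first) makes this parser block index.html as well, which is a useful reminder that rule ordering matters to first-match implementations even though Google documents longest-match precedence.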