jdMorgan - 2:39 pm on Jul 29, 2009 (gmt 0)
If a directory is Disallowed, then all of its subdirectories are disallowed. To be more specific, if any URL path-part is Disallowed, then all URL-paths beginning with that path-part are disallowed; robots.txt handling is based on prefix-matching.

You have three major solution options available:

1) If possible, use an on-page <meta name="robots" content="noindex,follow"> instead of Disallowing the top-level directory. This only works if all objects to be disallowed in that directory are HTML pages.

2) Move the allowed directory out from under any Disallowed directory. This is the better long-term solution, and works for all robots.

3) For Google and other major robots which explicitly state that they recognize it, use the new "Allow:" extension to the robots.txt protocol, and also provide a separate policy record for those robots which do not claim support for it. (Obviously, this means that either those robots will never be able to access the "allowed" directory below the Disallowed directory, or that you cannot Disallow the top-level directory to these robots.)

Jim
It would be a very good idea for you to read the "Standard for Robot Exclusion [robotstxt.org]" rather than trying to guess at robots.txt syntax or functions.
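
To make the prefix-matching rule and option 3 above concrete, here is a rough sketch using Python's standard urllib.robotparser module. The directory names (/private/, /private/public/), the example.com host and the "OtherBot" user-agent are made up for illustration. Note that Python's parser is only one implementation and is not how any particular search engine evaluates the file. It applies the first matching rule in a record, so the Allow line is placed ahead of the Disallow line; real robots may resolve Allow/Disallow conflicts differently, so check each robot's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with two policy records:
#  - one for a robot assumed to support "Allow:" (Googlebot here),
#  - a catch-all record for robots that do not claim support for it.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /private/public/
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` on the illustrative host."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/index.html",
                     "/private/secret.html",
                     "/private/public/page.html"):
            print(agent, path, allowed(agent, path))
```

With this file, everything whose path begins with /private/ is blocked for every robot by prefix match, except that the Allow-aware record lets Googlebot into /private/public/. The catch-all record cannot express that exception, which is the trade-off described in option 3.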