In Joomla's default robots.txt file, I see that the images directory is disallowed by default, but below it there's a "stories" subdirectory that contains lots of graphic images. So would Google index the "stories" subdirectory? Or, because the parent directory (images) is disallowed, would robots skip all of the subdirectories underneath it as well?
That's how it appears in Joomla's default robots.txt file.
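For reference, the relevant part of the default file looks something like this (abridged from memory; the exact entries vary by Joomla version):

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
...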
So I also added: Disallow: /images/stories
BTW, do robots mind the spaces? I mean, in the original default Joomla robots.txt file there are no spaces, like:
I changed that to:
User-agent: * Disallow:.......
Disallow: /cache/Disallow: /components/Disallow:..........
You see, Disallow has no space between it and the previous word...
Is it OK?
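For comparison, the conventional layout puts each directive on its own line, something like:

User-agent: *
Disallow: /cache/
Disallow: /components/
Disallow: /images/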
I get an error from Googlebot in Webmaster Tools:

Text of http://example.com/robots.txt
Line 16: / Syntax not understood
As you can see, I disallowed a subdirectory after already disallowing its parent directory.
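In other words, the file now contains both rules:

Disallow: /images/
Disallow: /images/stories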
Then, after running the file through a robots.txt generator, I got this:
That doesn't make any sense to me. Why are some directories disallowed while others are both allowed and disallowed? Or maybe it means that the default on the site is to allow everything except what's disallowed; but then why does that Allow line sit in between blocks rather than right at the beginning?
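To illustrate the pattern being described (a hypothetical reconstruction, not the removed output), a generator might emit something like:

User-agent: *
Disallow: /cache/
Allow: /images/stories/
Disallow: /images/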
[edited by: goodroi at 12:54 pm (utc) on July 30, 2009]
[edit reason] Please no urls [/edit]
If a directory is Disallowed, then all of its subdirectories are disallowed as well. To be more specific: if any URL path-prefix is Disallowed, then all URL paths beginning with that prefix are disallowed; robots.txt handling is based on prefix matching.
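For example, the single rule

Disallow: /images

blocks every URL whose path begins with that string: /images/stories/fruit.jpg, /images/banner.png, and even /images.html (the filenames here are made up for illustration). Note that the last case matches because the rule is a string prefix, not a directory boundary.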
You have three main options available:
1) If possible, use an on-page <meta name="robots" content="noindex,follow"> instead of Disallowing the top-level directory. This only works if all objects to be disallowed in that directory are HTML pages.
2) Move the allowed directory out from under any Disallowed directory. This is the better long-term solution, and works for all robots.
3) For Google and other major robots which explicitly state that they recognize it, use the new "Allow:" extension to the robots.txt protocol, and also provide a separate policy record for those robots which do not claim support for it. (Obviously, this means that either those robots will never be able to access the "allowed" directory below the Disallowed directory, or that you cannot Disallow the top-level directory to these robots.)
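A sketch of option 3, using the directories from this thread (the Allow line is listed first so it takes effect whether a robot uses first-match or longest-match precedence):

User-agent: Googlebot
Allow: /images/stories/
Disallow: /images/

User-agent: *
Disallow: /images/

Robots that match only the fallback record never see the Allow line, so /images/stories/ stays off-limits to them; that is the trade-off noted above.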