Msg#: 3787816 posted 10:46 am on Nov 17, 2008 (gmt 0)
Right, this is probably a stupid question, but it's one of those I couldn't find an answer to...
I've got a bunch of automatically generated sitemaps being put in /wiki/sitemaps/. The problem is that /wiki/ is not a content directory; rather, the content from its scripts is presented through a virtual dir, /kb/. /wiki/ is then disallowed in robots.txt to keep things clean.
Will Google et al access and read the sitemaps in the /wiki/sitemaps/ dir?
If not, should I use Allow: on the subdir, or move the files somewhere else?
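To make it concrete, the relevant bits of the robots.txt look something like this (example.com and the sitemap filename are just placeholders; the Sitemap: line is how the files get announced):

  User-agent: *
  Disallow: /wiki/

  Sitemap: http://www.example.com/wiki/sitemaps/sitemap-pages.xml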
Msg#: 3787816 posted 8:23 pm on Jan 1, 2009 (gmt 0)
Have you ever come across a legitimate bot that doesn't support "Allow"?
Sure, how 'bout archive.org's ia_archiver, to name just one?
Support for Allow among the major bots is a comparatively recent development. Until Google started to recognize Allow and implemented wildcards in pattern matching (with the other majors following), maybe about two years ago, the *only* supported directive was Disallow. There are still legitimate bots out there that recognize neither Allow directives nor wildcard pattern matching. Use either, but then don't be surprised if your "blocked" content makes it into an index somewhere.
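To illustrate with a hypothetical example (the paths are invented for the purpose):

  User-agent: *
  Disallow: /wiki/
  Allow: /wiki/sitemaps/
  Disallow: /*.pdf$

A bot that implements Allow and wildcards reads this as "everything under /wiki/ is off limits except /wiki/sitemaps/, and no .pdf files anywhere." A bot that only knows plain Disallow sees "stay out of /wiki/" and nothing more; the Allow exception and the *.pdf pattern simply don't exist for it.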
Msg#: 3787816 posted 8:45 pm on Jan 1, 2009 (gmt 0)
Thanks, I wasn't aware of that. I'd never really thought about it, and when I finally had the opportunity to use Allow, I only checked Google and the other majors, and it was fine. Though I probably also didn't get into any trouble because we usually Disallow everyone and then specifically Allow the big ones in, saving bandwidth otherwise spent on obsolete search engines that never send us any traffic (at least in the German market; that may be different in the rest of the world, where Google is not above 90% market share).
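For reference, that setup is roughly this shape (the user-agent names are examples; our real list differs). A record with an empty Disallow grants that bot full access, which works even for bots that never learned Allow:

  User-agent: Googlebot
  Disallow:

  User-agent: Slurp
  Disallow:

  User-agent: msnbot
  Disallow:

  User-agent: *
  Disallow: /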