Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
If a dir is disallowed, will a sitemap in a subdir be read?
Sitemaps located in non-content directories
badbadmonkey




msg:3787818
 10:46 am on Nov 17, 2008 (gmt 0)

Right, this is probably a stupid question, but it's one of those I couldn't find the answer to...

I've got a bunch of automatically generated sitemaps being put in /wiki/sitemaps/. The problem is that /wiki/ is not a content directory; rather, the content from its scripts is presented under a virtual dir, /kb/. /wiki/ is then disallowed in robots.txt to keep things clean.

Will Google et al access and read the sitemaps in the /wiki/sitemaps/ dir?

If not, should I use Allow: on the subdir, or move the files somewhere else?

 

goodroi




msg:3788083
 5:33 pm on Nov 17, 2008 (gmt 0)

If you block a directory, Google will not access anything in that directory or its subdirectories ... unless you include a line in the robots.txt to allow it.

For example:

User-agent: *
Allow: /wiki/sitemaps/
Disallow: /wiki/

This would block example.com/wiki/ but still allow Google to access example.com/wiki/sitemaps/.

Don't forget you can test how Google would react to a robots.txt by logging into Google Webmaster Central and visiting their tool section.
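
As a rough local check of the rules above, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the sample paths are made up for illustration. Note that this parser applies the first matching rule in file order, which is not necessarily how any given crawler resolves Allow/Disallow conflicts, so treat the result as a sanity check rather than a prediction of Google's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, with Allow listed before Disallow.
ROBOTS_TXT = """\
User-agent: *
Allow: /wiki/sitemaps/
Disallow: /wiki/
"""


def is_allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    Hypothetical helper: example.com and the paths passed in are
    placeholders, not real URLs from the thread.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/wiki/sitemaps/sitemap.xml", "/wiki/index.php", "/kb/some-page"):
        print(path, is_allowed(path))
```

Because this parser stops at the first rule that matches, swapping the Allow and Disallow lines would change the answer for /wiki/sitemaps/, which is one reason to keep the more specific rule first.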

badbadmonkey




msg:3788476
 3:09 am on Nov 18, 2008 (gmt 0)

Right, thanks. The other problem I came across, though, was that with the above, sitemaps don't work in parallel subdirs, i.e. they must be in the same dir as their URLs or in a parent dir. Annoying!

So a bit of mod_rewrite magic fixed that problem - /kb/sitemap.xml now actually serves the file at /wiki/sitemaps/sitemap.xml !

Re-submitted to Google, 99% confident it'll be happy.
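
The location rule mentioned above (a sitemap only covers URLs in its own directory or below it) can be sketched as a simple prefix test. This is an illustration under that assumption only; the function name sitemap_covers and the example.com URLs are hypothetical, and it ignores any cross-submission mechanisms a search engine may offer.

```python
import posixpath
from urllib.parse import urlsplit


def sitemap_covers(sitemap_url, page_url):
    """Return True if page_url sits at or below the sitemap's directory.

    Assumes the plain location rule: same scheme and host, and the page
    path must start with the directory that contains the sitemap file.
    """
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return page.path.startswith(base_dir)


if __name__ == "__main__":
    page = "http://example.com/kb/some-page"
    for sitemap in (
        "http://example.com/wiki/sitemaps/sitemap.xml",  # parallel dir
        "http://example.com/kb/sitemap.xml",             # same dir
        "http://example.com/sitemap.xml",                # parent dir
    ):
        print(sitemap, sitemap_covers(sitemap, page))
```

Under this rule the file served at /kb/sitemap.xml covers the /kb/ pages, while the same file addressed as /wiki/sitemaps/sitemap.xml does not, which is why rewriting the URL helps.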

g1smd




msg:3817383
 4:26 pm on Jan 1, 2009 (gmt 0)

"Allow" is not standard syntax. It might not work.

janharders




msg:3817415
 6:00 pm on Jan 1, 2009 (gmt 0)

"Allow" is not standard syntax. It might not work.

I know it's not in the original document, but it's in the 1997 Internet-Draft. Have you ever come across a legitimate bot that doesn't support "Allow"?

jimbeetle




msg:3817452
 8:23 pm on Jan 1, 2009 (gmt 0)

have you ever come across a legitimate bot that doesn't support "Allow"?

Sure, how 'bout archive.org's ia_archiver to name just one?

It's a comparatively recent development that the major bots started to support Allow directives. Until Google started to recognize Allow and implemented wildcards in pattern matching (and the other majors followed), maybe about two years ago, the *only* supported directive was Disallow. There are still legitimate bots out there that don't recognize Allow directives and wildcard pattern matching. Use either, but then don't be surprised if your "blocked" content makes it into an index somewhere.
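
To make that concrete, here is a small sketch of how a crawler that only understands Disallow with plain prefix matching might read the same file. It is a deliberately simplified model, not the logic of any real bot: the function name disallow_only_allows is hypothetical, and User-agent grouping is ignored.

```python
def disallow_only_allows(robots_txt, path):
    """Model a crawler that honours only Disallow lines as literal prefixes.

    Allow lines are skipped and '*' is treated as an ordinary character,
    so wildcard patterns never match real paths.
    """
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        prefix = value.strip()
        if prefix and path.startswith(prefix):
            return False
    return True


if __name__ == "__main__":
    allow_rules = "User-agent: *\nAllow: /wiki/sitemaps/\nDisallow: /wiki/\n"
    wildcard_rules = "User-agent: *\nDisallow: /*.pdf\n"

    # The Allow line is ignored, so the sitemap dir stays blocked for this bot.
    print(disallow_only_allows(allow_rules, "/wiki/sitemaps/sitemap.xml"))
    # The wildcard is read literally, so the "blocked" PDF is still fetched.
    print(disallow_only_allows(wildcard_rules, "/docs/file.pdf"))
```

The second case is the one to watch: a rule that relies on wildcard matching blocks nothing for such a bot.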

janharders




msg:3817459
 8:45 pm on Jan 1, 2009 (gmt 0)

Thanks, I wasn't aware of that. I had never really thought about it, and when I finally had the opportunity to use Allow, I only checked Google and the other majors and it was fine.
Though I probably also didn't get into any trouble because we usually Disallow everyone and then specifically Allow the big ones in, saving on traffic for obsolete search engines that never get us any traffic (at least in the German market; that may be different in the rest of the world, where Google is not above 90% market share).
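
A sketch of that "block everyone, let the big ones in" setup, checked with Python's standard urllib.robotparser. The robots.txt content here is an assumed example, not the poster's actual file: it uses an empty Disallow line for the permitted bot, which also works for crawlers that never learned Allow. The agent name SomeOtherBot and the helper agent_may_fetch are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file: one named crawler is let in, everyone else is shut out.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def agent_may_fetch(agent, path="/"):
    """Return True if `agent` may fetch `path` under WHITELIST_ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(WHITELIST_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, agent_may_fetch(agent, "/kb/some-page"))
```

Since this relies only on User-agent groups and Disallow, it does not depend on Allow support at all.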

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved