| 8:17 pm on Aug 13, 2009 (gmt 0)|
Are you seeing the sitemap actually rank for keywords? Or is it just showing up in your site: query results? The main thing I'd suspect is that somewhere there is a link pointing to this sitemap file.
Here's a related thread on Google's forums, with a reply from Google's John Mueller (JohnMu):
|It looks like the URL of your Sitemap file is included in another one of your Sitemap files -- which is likely the reason we crawled and indexed it :). If you don't want them indexed, using the x-robots-tag HTTP header element is a great idea. An alternative would be to use a different Sitemap URL and remove this one with the URL removal tool. |
| 8:15 am on Aug 14, 2009 (gmt 0)|
Thanks for the reply.
I don't see them ranking for keywords, but they do show up in a site: query.
I am using the server-based Google Sitemap Generator, and there is no HTML link to the long (generated) sitemap filename. It is, of course, linked from the Sitemap index that is automatically generated.
How can I implement an X-Robots-Tag for XML.GZ files?
And if I disallow them using robots.txt, won't that prevent Google from accessing my sitemap altogether, thereby nullifying it?
| 8:29 am on Aug 14, 2009 (gmt 0)|
You are not creating a Disallow rule in robots.txt. You are creating a noindex directive in the X-Robots-Tag HTTP header - it's the equivalent of a noindex meta tag in an HTML head section.
Google MUST fetch a URL in order to read a noindex directive at all, whether it sits in a meta tag or in an HTTP header, so this approach does not stop Google from using your Sitemap for its intended purpose. It only stops the Sitemap from showing in the search results.
At the same time, if you are not getting any search traffic from the Sitemap's indexing, then you may want to just not worry about it.
Here is some more information about the X-Robots header:
WebmasterWorld Thread [webmasterworld.com]
Google Blog [googleblog.blogspot.com]
| 2:31 pm on Aug 14, 2009 (gmt 0)|
I was confused where to put the header since I had no control over the generation of the file!
The .htaccess magic does the trick! :)
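For anyone finding this thread later, here is a minimal sketch of what such an .htaccess rule might look like. It assumes Apache with mod_headers enabled and .htaccess overrides allowed for headers (AllowOverride FileInfo). The filename pattern is only a guess at how the generated sitemaps are named, so adjust it to match yours.

```apache
# Hypothetical pattern: matches any file ending in .xml or .xml.gz.
# Narrow it (for example to names starting with "sitemap") if you also
# serve other XML files, such as feeds, that you DO want indexed.
<IfModule mod_headers.c>
  <FilesMatch "\.xml(\.gz)?$">
    # Ask search engines not to list these files in results.
    # The files can still be fetched and used as Sitemaps.
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```

Because the header is attached by the server at response time, this works even when you have no control over how the file itself is generated. You can confirm the header is being sent by looking at the response headers for one of the sitemap URLs with any HTTP header viewer.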
| 5:39 pm on Aug 14, 2009 (gmt 0)|
"tedster" has referred to the thread I started in the Google forums. In my case, Google is actually directing real search traffic to my sitemap files. Not good. I added the "X-Robots-Tag" HTTP header for my XML sitemaps (with a "noindex, follow" value) about 2 weeks ago. The sitemaps are still in the SERPs, but I assume it will take several more weeks (or months) for this change to take effect.
What concerns me, and what other webmasters might want to take note of, is that Google may not handle sitemap indexes [sitemaps.org] properly. Sitemap indexes are supposed to be a way to divide large sitemaps and then point to those sitemaps. Google seems to treat such pointers as direct links to resources to be indexed.
Perhaps my mistake was submitting my sitemap index via Google Webmaster Tools (GWT). Because GWT allows you to submit multiple individual sitemap files per site, it may be redundant to also submit a sitemap index. (However, a sitemap index may be useful for Microsoft's Webmaster Center, which only has one URL input per site for sitemaps.)
I was hoping Google would clear up how they're handling sitemap indexes, but I've gotten no word from them on this. Do others have this problem?
| 6:57 pm on Aug 14, 2009 (gmt 0)|
|Google may not handle sitemap indexes properly |
Yes, that is my concern as well. And JohnMu's comments added to my concern, because he mentions links from one Sitemap file to another Sitemap file as a cause of getting it indexed - and that is the exact situation with a Sitemap index!
| 4:15 pm on Aug 16, 2009 (gmt 0)|
Last week, I noticed a sitemap in my traffic logs actually having a Google referrer. I was very surprised. I'm not aware of any kind of link to it.
I think it's not my job as a webmaster to explain to Google, by way of an X-Robots-Tag, that a sitemap shouldn't be in search results. Google should understand that themselves.
| 3:29 am on Aug 17, 2009 (gmt 0)|
|I added the "X-Robots-Tag" HTTP header for my XML sitemaps (with a "noindex, follow" value) about 2 weeks ago. |
I remember setting up instructions for the X-Robots-Tag and I don't recall seeing a follow value in the docs. There are three options with the X-Robots-Tag from what I understand...
Example of X-Robots-Tag NoIndex Directive
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "noindex"
</Files>
Example of X-Robots-Tag NoFollow Directive
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "nofollow"
</Files>
Example of X-Robots-Tag NoIndex, NoFollow Directive
<Files ~ "(about|contact|privacy)\.html$">
Header append X-Robots-Tag "noindex,nofollow"
</Files>
The default behavior for bots is to follow. You only need to provide directives on noindex and/or nofollow.
| 1:10 pm on Aug 17, 2009 (gmt 0)|
|I don't recall seeing a follow value in the docs |
"follow" wasn't mentioned by Google when they introduced the X-Robots-Tag [googleblog.blogspot.com] HTTP header, but I presume the header is based on the HTML robots meta tag, which does include it (according to the original notes [w3.org]). Since I don't know of any other detailed specifications for X-Robots-Tag, I chose to be explicit.
|The default behavior for bots is to follow. You only need to provide directives on noindex and/or nofollow. |
Definitely true for the HTML robots meta tag. In this case, Google [googlewebmastercentral.blogspot.com], Yahoo [ysearchblog.com], and Microsoft [bing.com] say so. Probably true for X-Robots-Tag too.
| 3:46 pm on Aug 17, 2009 (gmt 0)|
Is there any advice against just retrofitting the sitemap with a "top" section, so that above the fold it looks like the home page and the user can navigate wherever they wish from there, while the sitemap itself begins below the fold (in my case, 2 or 3 screen scrolls below the home page "topper")?
I get enough traffic originating from my sitemap that I figured this might be a low-tech workaround.
Does anything contraindicate this?
| 4:35 pm on Aug 17, 2009 (gmt 0)|
On a sitemap.html page you can do what you want. But the thread is talking about XML Sitemaps - in fact, the OP is asking about .GZ zipped files generated by a server side script. No chance to modify that directly.