The X-Robots-Tag directive belongs in the HTTP response header [w3.org] that your server sends ahead of the actual XML sitemap file. It's not included within the file itself. To put it another way, the HTTP header is NOT the <head> section of the file - they are two different things.
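For illustration, a raw response for sitemap.xml that carries the directive might look like this (the status line and headers here are a generic sketch, not output from any particular server):

```
HTTP/1.1 200 OK
Content-Type: application/xml
X-Robots-Tag: noindex
```

The X-Robots-Tag line travels with the response, invisibly to anyone viewing the file itself.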
I have read on non-authoritative blogs that an X-Robots-Tag directive can be added via the .htaccess file on an Apache server, but I cannot confirm that information. You might get more specific server help than I can offer by asking in our Apache Forum [webmasterworld.com].
Thanks very much Tedster. I can deal with Apache through .htaccess.
It's just that people like me lack basic knowledge (about headers and how things really work) while we run sites and play with Apache or other servers.
That's where questions like this one come from, and why they sit waiting for a moderator to answer while most other participants go "Huh?!".
Is Google really stupid enough to include sitemaps in the SERPs, or was it a mistake? Obviously they have no excuse for it, since they know the filename of the sitemap.
Are you certain it's not linked into one of your pages or into another site's page?
No link for sure. While searching on it, I've been finding posts from people that had the same problem.
One would expect that the "engine" of a search engine would say "no" to sitemap.xml in the SERPs.
It could say a generic "no", but sitemaps aren't always called sitemap.xml. Still, as I said, they have no excuse for not knowing what the file is called, since you have to tell Google in WMT.
In Apache .htaccess:
# Send an X-Robots-Tag "noindex" response header for sitemap.xml requests only
<Files "sitemap.xml">
Header set X-Robots-Tag "noindex"
</Files>
This presumes that the Apache mod_headers module is available on your server, which is not always the case. If that module isn't available, then the options are to output this header using a script 'wrapped around' the sitemap.xml file -- or to move to a host that allows the use of all common Apache modules.
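As a sketch of that 'wrapped around' approach: a small script (Python here, purely illustrative - the header set and the idea of reading the sitemap from disk are assumptions, not a known-working configuration) can emit the X-Robots-Tag header itself before the sitemap body:

```python
# Minimal sketch of a script 'wrapped around' the sitemap, for servers
# without mod_headers: emit the X-Robots-Tag header ourselves, CGI-style,
# then the XML body. Hypothetical example, not a production setup.

def sitemap_response(xml_body: bytes) -> bytes:
    """Prepend response headers, including X-Robots-Tag, to the sitemap body."""
    headers = (
        b"Content-Type: application/xml\r\n"
        b"X-Robots-Tag: noindex\r\n"
        b"\r\n"  # blank line separates headers from the body
    )
    return headers + xml_body

# In a real CGI script you would read sitemap.xml from disk instead
# of using an inline demo body like this one.
demo = sitemap_response(b'<?xml version="1.0"?><urlset></urlset>')
```

The server would be configured to route requests for sitemap.xml to this script instead of serving the file directly.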
I was wondering if one can extend Sitemap file to include
<xhtml:meta xmlns:xhtml='http://www.w3.org/1999/xhtml' name='robots' content='noindex' />
as can be done with xml feeds.
what is the harm in having your sitemap in the SERPs? Sure it's unintentional, but is it destructive?
I hope it's not ranking above your content pages...
Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to reach my content when I already have my pages listed, presenting the information the way I want it to appear?
Unless for some reason people search for XML files in SERPs to add to their Reader, without visiting the site!
Previous thread started by me: [webmasterworld.com]
Bing did the same thing to me after I changed my URL a couple months ago, but none of the other search engines. I couldn't figure out why, but didn't really think much of it.
We debated whether or not to leave sitemap.xml as-is and use the robots.txt Sitemap: line. In the end we decided not to rename it. The only significant reason I've heard for hiding the file is that a complete listing of our pages makes it easier to rip our site. Since that isn't very hard to do to start with, we decided "so what" and stayed with sitemap.xml. We have competitors that go both ways: some hide the file, some do not. A competitor might be interested in how we prioritize pages, but only a fool couldn't guess pretty closely.
Is there a solid value in hiding the file?
We do have certain pages on websites that we don't want in the files, but that is a separate issue and easy enough to leave them out.
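For reference, the robots.txt Sitemap: line mentioned above looks like this (the URL is a placeholder):

```
# robots.txt -- advertises the sitemap location to crawlers
Sitemap: https://www.example.com/sitemap.xml
```

Note that this line makes the sitemap location public to anyone who reads robots.txt, which is part of the hide-or-not trade-off discussed here.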
One of my competitors has her sitemap.xml in plain view. It lists every product displayed on her site, along with a roughly 200-word description per product - the same description that goes on the product page. Over 500 products on an average day.
She never gets above the 60 mark in the SERPs for those pages on long-tail searches.
A scraper's paradise! Her content is all over the web, and it gets spidered elsewhere before Googlebot gets to her site.
If a bot can get access to this information, any other computer can as well!
|Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to reach my content when I already have my pages listed, presenting the information the way I want it to appear?|
This raises a good question about the priority/rank values in sitemaps. Time to protect my sitemaps (but I can't, or I'd have to ban the same bot IP ranges).