
How to use X-Robots-Tag to remove sitemap.xml from SERPs?

     

smallcompany

6:22 pm on Sep 23, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I saw my sitemap.xml in Google SERPs earlier this morning. I went through everything I could find on the web, including posts from this forum.

I still don't know whether I can simply add X-Robots-Tag straight into my sitemap.xml file, or whether I have to do some .htaccess exercise.

If the head of my sitemap.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

Can I now put "X-Robots-Tag: noindex" somewhere, or do I have to serve it in some other way?
If I can, what would the new head look like, please?

I'm just confused by the syntax, as everybody writes this specific tag as X-Robots-Tag: noindex, while I see that the tags in an XML file always have their values in quotes.
Is it all about quotes?

Thanks

[edited by: Robert_Charlton at 6:38 pm (utc) on Sep. 23, 2009]
[edit reason] fixed typo & delinked sample links [/edit]

tedster

2:34 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The X-Robots directive belongs in the HTTP Header [w3.org] that your server uses to precede the actual xml sitemap file. It's not directly included within the file itself. To say it another way, the http header is NOT the <head> section of the file - they are two different things.
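To make the header/file distinction concrete, here is a minimal Python sketch. The raw response text is a made-up example (not captured from any real server), and `split_response` is a hypothetical helper name. It only shows that the X-Robots-Tag line travels in the HTTP header block, ahead of the blank line, while the XML body is left untouched:

```python
# Illustrative only: a hand-written HTTP response, not real server output.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/xml\r\n"
    "X-Robots-Tag: noindex\r\n"
    "\r\n"  # the blank line ends the HTTP header block
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>\n'
)


def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Header lines are "Name: value"; header names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


status_line, headers, body = split_response(RAW_RESPONSE)
print(headers.get("x-robots-tag"))  # the directive lives in the HTTP headers
print("X-Robots-Tag" in body)       # the sitemap file itself never mentions it
```

Note that in the header the value is written without quotes (`X-Robots-Tag: noindex`); quoting is an XML attribute convention and does not apply to HTTP header lines.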

I have read in non-authoritative blogs that an x-robots directive can be added to the .htaccess file on an Apache server, but I cannot confirm that information. You might get more specific server help than I can offer by asking in our Apache Forum [webmasterworld.com].

smallcompany

7:56 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Thanks very much Tedster. I can deal with Apache through .htaccess.

It is just that people like myself lack basic knowledge (about headers and how stuff really works), while we run sites and play with Apache or other servers.

That's where questions like this one come from, and they wait for a moderator to answer because most other participants go "Ha?!".

dstiles

9:10 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Is Google really stupid enough to include sitemaps in SERPs, or was it a mistake? Obviously they have no excuse for it, since they know the filename of the sitemap.

Are you certain it's not linked into one of your pages or into another site's page?

smallcompany

7:17 am on Sep 25, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



No link, for sure. While searching on it, I've been finding posts from people who had the same problem.
One would expect that the "engine" of a search engine says "no" to sitemap.xml in SERPs.

But...

dstiles

9:31 pm on Sep 25, 2009 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



It could say a generic No, but sitemaps aren't always called sitemap.xml. But as I said, they have no excuse for not knowing what it is called, since you have to tell Google in WMT.

jdMorgan

12:46 pm on Sep 26, 2009 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



In Apache .htaccess:

# Set HTTP X-Robots-Tag response header to "noindex" for sitemap.xml requests
<FilesMatch "^sitemap\.xml$">
Header set X-Robots-Tag: "noindex"
</FilesMatch>

This presumes that the Apache mod_headers module is available on your server, which is not always the case. If that module isn't available, the remaining options are to output this header from a script 'wrapped around' the sitemap.xml file, or to move to a host that allows the use of all common Apache modules.
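As a rough illustration of the 'wrapper script' route, here is a minimal Python WSGI sketch. It is an assumption about how such a wrapper might look, not a tested drop-in for any particular host: the factory name `make_sitemap_app` and the sitemap path you pass to it are hypothetical, and your server has to be set up to route /sitemap.xml requests to the script.

```python
from pathlib import Path


def make_sitemap_app(sitemap_path):
    """Return a WSGI app that serves one sitemap file with a noindex header.

    sitemap_path is an assumed location of the real sitemap on disk.
    """
    sitemap_path = Path(sitemap_path)

    def app(environ, start_response):
        # Only answer for /sitemap.xml; everything else gets a 404.
        if environ.get("PATH_INFO") != "/sitemap.xml" or not sitemap_path.is_file():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found\n"]

        body = sitemap_path.read_bytes()
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/xml"),
                ("Content-Length", str(len(body))),
                # The same header the .htaccess rule would have set.
                ("X-Robots-Tag", "noindex"),
            ],
        )
        return [body]

    return app
```

For a quick local check, the app can be hosted with the standard library, e.g. `wsgiref.simple_server.make_server("127.0.0.1", 8000, make_sitemap_app("sitemap.xml")).serve_forever()`, and the response headers inspected from there.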

Jim

wiian

1:06 pm on Sep 26, 2009 (gmt 0)

5+ Year Member



I was wondering if one can extend the Sitemap file to include

<xhtml:meta xmlns:xhtml='http://www.w3.org/1999/xhtml' name='robots' content='noindex' />

as can be done with xml feeds.

httpwebwitch

2:14 pm on Sep 26, 2009 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



What is the harm in having your sitemap in the SERPs? Sure it's unintentional, but is it destructive?

I hope it's not ranking above your content pages...

ashishp

2:57 pm on Sep 26, 2009 (gmt 0)

10+ Year Member



Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to visit my content? I already have my pages listed, which present the information the way I want it to appear.

Unless for some reason people search for XML files in SERPs to add to their Reader, without visiting the site!

Previous thread started by me: [webmasterworld.com]

SEOHolicc

4:41 pm on Sep 26, 2009 (gmt 0)

5+ Year Member



Bing did the same thing to me after I changed my URL a couple months ago, but none of the other search engines. I couldn't figure out why, but didn't really think much of it.

D_Blackwell

5:45 pm on Sep 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We debated whether or not to leave sitemap.xml as is and to use the robots.txt Sitemap: line. In the end we decided not to rename. The only significant reason that I've heard for hiding the file is that a complete listing of our pages makes it easier to rip our site. Since that isn't very hard to do to start with, we decided, "So what," and stayed with sitemap.xml. We have competitors that go both ways. Some hide the file, some do not. A competitor might be interested in how we prioritize pages, but only a fool couldn't guess pretty closely.

Is there a solid value in hiding the file?

We do have certain pages on websites that we don't want in the files, but that is a separate issue and easy enough to leave them out.

blend27

2:41 am on Sep 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One of my competitors has a sitemap.xml in plain view. She mentions every product that is displayed on her site along with a roughly 200-word description per product she sells = same description goes on the product page. Over 500 products on the average day.

She never gets above the 60 mark in the SERPs for those pages on long tail.

Scrapers' Paradise! Her content is all over the web, and it gets spidered elsewhere before GBot hits her site.

Future

9:50 pm on Sep 27, 2009 (gmt 0)

5+ Year Member



Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to visit my content? I already have my pages listed, which present the information the way I want it to appear.
If a bot can get access to this information, any other computer can as well!

This raises a good question about the priority/ranks in sitemaps. Time to protect my sitemaps (but I cannot, or I would have to ban the same bot ranges).