How to use X-Robots-Tag to remove sitemap.xml from SERPs?

     
6:22 pm on Sep 23, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:June 2, 2006
posts:2112
votes: 2


I saw my sitemap.xml in Google's SERPs earlier this morning. I went through everything I could find on the web, including posts from this forum.

I still don't know whether I can simply add X-Robots-Tag straight into my sitemap.xml file, or whether I have to do some .htaccess exercise.

If the head of my sitemap.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

Can I now put "X-Robots-Tag: noindex" somewhere in there, or do I have to serve it in some other way?
If I can, what would the new head look like, please?

I'm just confused by the syntax, as everybody writes this specific tag as X-Robots-Tag: noindex, while I see that the tags in an XML file always have their values in quotes.
Is it all about quotes?

Thanks

[edited by: Robert_Charlton at 6:38 pm (utc) on Sep. 23, 2009]
[edit reason] fixed typo & delinked sample links [/edit]

2:34 pm on Sept 24, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The X-Robots directive belongs in the HTTP Header [w3.org] that your server sends ahead of the actual XML sitemap file. It's not included within the file itself. To say it another way, the HTTP header is NOT the <head> section of the file - they are two different things.

I have read in non-authoritative blogs that an x-robots directive can be added to the .htaccess file on an Apache server, but I cannot confirm that information. You might get more specific server help than I can offer by asking in our Apache Forum [webmasterworld.com].
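
To make the header/body distinction concrete, here is a minimal Python sketch. The raw response text is hypothetical (written by hand for illustration, not captured from a real server), and split_response is just an illustrative helper name. It shows that X-Robots-Tag travels in the block of lines the server sends before the blank line, while the XML file itself stays untouched.

```python
# Hypothetical raw response for a sitemap request; the header values are
# illustrative, not captured from a real server.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/xml\r\n"
    "X-Robots-Tag: noindex\r\n"
    "\r\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "</urlset>\n"
)


def split_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, _, header_block = head.partition("\r\n")
    headers = {}
    for line in header_block.split("\r\n"):
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


status_line, headers, body = split_response(RAW_RESPONSE)
print(status_line)                  # the status line
print(headers.get("x-robots-tag"))  # the directive lives in the headers
print(body.splitlines()[0])         # the body is the unmodified XML file
```

Note that the header value is written bare (noindex), with no XML-style quoting, because it is not part of the XML document at all.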

7:56 pm on Sept 24, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:June 2, 2006
posts:2112
votes: 2


Thanks very much Tedster. I can deal with Apache through .htaccess.

It is just that people like me lack basic knowledge (about headers and how stuff really works), while we run sites and play with Apache or other servers.

That's where questions like this one come from, and they wait for a moderator to answer them, as most of the other participants go "Ha?!".

9:10 pm on Sept 24, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3091
votes: 2


Is Google really stupid enough to include sitemaps in the SERPs, or was it a mistake? Obviously they have no excuse for it, since they know the filename of the sitemap.

Are you certain it's not linked into one of your pages or into another site's page?

7:17 am on Sept 25, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:June 2, 2006
posts:2112
votes: 2


No link, for sure. While searching on it, I kept finding posts from people who had the same problem.
One would expect the "engine" of a search engine to say "no" to sitemap.xml in the SERPs.

But...

9:31 pm on Sept 25, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3091
votes: 2


It could say a generic No, but sitemaps aren't always called sitemap.xml. But as I said, they have no excuse for not knowing what it is called, since you have to tell Google in WMT.

12:46 pm on Sept 26, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


In Apache .htaccess:

# Set HTTP X-Robots-Tag response header to "noindex" for sitemap.xml requests
<FilesMatch "^sitemap\.xml$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

This presumes that the Apache mod_headers module is available on your server, which is not always the case. If that module isn't available, the options are to output this header using a script 'wrapped around' the sitemap.xml file -- or to move to a host that allows the use of all common Apache modules.

Jim
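
For the 'wrapped around' script option, a minimal sketch in Python might look like the following. It is only one possible shape, assuming a host that can run a WSGI application; make_sitemap_app and the file path passed to it are hypothetical names chosen for illustration, not anything prescribed in this thread. The script reads the sitemap file from disk and sends it with the X-Robots-Tag response header added.

```python
from pathlib import Path


def make_sitemap_app(sitemap_path):
    """Build a WSGI app that serves one sitemap file with X-Robots-Tag set.

    sitemap_path is a hypothetical location for the real sitemap.xml file.
    """
    path = Path(sitemap_path)

    def app(environ, start_response):
        try:
            body = path.read_bytes()
        except OSError:
            # File missing or unreadable: answer 404 instead of crashing.
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"sitemap not found\n"]
        headers = [
            ("Content-Type", "application/xml"),
            ("Content-Length", str(len(body))),
            # The directive goes in the response headers, not in the XML.
            ("X-Robots-Tag", "noindex"),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, make_sitemap_app("sitemap.xml")) can serve the app on port 8000; how the sitemap URL gets routed to such a script on a production host depends on the hosting setup.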

1:06 pm on Sept 26, 2009 (gmt 0)

New User

5+ Year Member

joined:Sept 11, 2009
posts:1
votes: 0


I was wondering if one can extend the Sitemap file to include

<xhtml:meta xmlns:xhtml='http://www.w3.org/1999/xhtml' name='robots' content='noindex' />

as can be done with xml feeds.

2:14 pm on Sept 26, 2009 (gmt 0)

Moderator from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


What is the harm in having your sitemap in the SERPs? Sure it's unintentional, but is it destructive?

I hope it's not ranking above your content pages...

2:57 pm on Sept 26, 2009 (gmt 0)

Junior Member

10+ Year Member

joined:May 27, 2004
posts:82
votes: 0


Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to visit my content? I already have my pages listed, which present the information the way I want it to appear.

Unless for some reason people search for XML files in SERPs to add to their Reader, without visiting the site!

Previous thread started by me: [webmasterworld.com ]

4:41 pm on Sept 26, 2009 (gmt 0)

New User

5+ Year Member

joined:June 16, 2009
posts:14
votes: 0


Bing did the same thing to me after I changed my URL a couple of months ago, but none of the other search engines did. I couldn't figure out why, but didn't really think much of it.

5:45 pm on Sept 26, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2003
posts: 1676
votes: 0


We debated whether to leave sitemap.xml as is or to rename it and use the robots.txt Sitemap: line. In the end we decided not to rename. The only significant reason that I've heard for hiding the file is that a complete listing of our pages makes it easier to rip our site. Since that isn't very hard to do to start with, we decided, "So what?" and stayed with sitemap.xml. We have competitors that go both ways: some hide the file, some do not. A competitor might be interested in how we prioritize pages, but only a fool couldn't guess pretty closely.

Is there a solid value in hiding the file?

We do have certain pages on websites that we don't want in the files, but that is a separate issue and easy enough to leave them out.

2:41 am on Sept 27, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1665
votes: 35


One of my competitors has a sitemap.xml in plain view. It lists every product displayed on her site, along with a description of about 200 words for each product she sells, and the same description goes on the product page. Over 500 products on the average day.

She never gets above the 60 mark in the SERPs for those pages on long-tail searches.

Scrapers' Paradise! Her content is all over the web, and it gets spidered elsewhere before GBot hits her site.

9:50 pm on Sept 27, 2009 (gmt 0)

Preferred Member

5+ Year Member

joined:Sept 23, 2008
posts:439
votes: 0


Not destructive, but undesirable. Why would I want someone to open a .gz or .xml file to visit my content? I already have my pages listed, which present the information the way I want it to appear.
If a bot can get access to this information, any other computer can as well!

This raises a good question about the priority/ranks in sitemaps. Time to save my sitemaps (but I cannot, or I have to ban the same BOT ranges).