Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

My SITEMAPS are getting indexed--pls help implementing this tag

 12:04 am on May 5, 2011 (gmt 0)

My SITEMAPS are getting indexed. Need help implementing the NOINDEX, follow tag.

I noticed some of my sitemaps are getting indexed. I obviously want them followed, ie their links crawled and those pages indexed, but not the sitemap itself. I found the following tag that I'd like to add.


However, I only know a little HTML. I'm used to putting meta tags in HTML pages, and I'm not sure how, or whether, I can place them in my .txt sitemap (my Yahoo sitemap, urllist.txt) or my .xml sitemap (my Google sitemap).

In other words, none of my sitemaps have <HEAD> tags, so I kind of doubt I can just paste the above tag at the top of these 2 sitemaps(?) In summary, how do I add a noindex-but-follow tag to my .txt and .xml sitemaps, which are not HTML? Thanks.



 4:23 am on May 5, 2011 (gmt 0)

You can't put anything except text in a .txt file. That's what the extension means. Raw text, period.

When you say "getting indexed" do you mean that pages named sitemap.txt and sitemap.xml are showing up in SERPs?


 4:44 am on May 5, 2011 (gmt 0)

that's what getting indexed means ;)


 8:40 am on May 5, 2011 (gmt 0)


:: panicked detour to google to make sure they aren't pulling the same thing on me, with further detour revealing that I've been mispronouncing one (non-English) word all along, and still further detour to verify that Bing isn't disregarding robots.txt* ::

You're definitely not alone, though. I fed a straight "sitemap.txt" into google, and found about half a dozen of them on the first page (I do 30/page). Same for "sitemap.xml"-- including, a bit hilariously, google's own :)

Can anyone figure out what the variable is? They crawl mine periodically, but it isn't indexed. The ones that are indexed seem to be completely random sites that nobody ever heard of. If example.com were a real domain rather than a reserved name, its sitemap would be indexed.

Aside: As long as I was googling, I searched similarly for "robots.txt" and found a lot of those indexed too. But unlike the sitemaps, they're for domains everyone would have heard of, like, er, whitehouse.gov or microsoft.com.

* Not my own, but on a site I'm pretty familiar with. I was searching for sitemap content and we've got a word in common.


 9:16 am on May 5, 2011 (gmt 0)

Well thanks for following up again, but looks like mostly you were free-associating (you can associate with me freely, lol). But still need to find an answer to how to get my NOINDEX, FOLLOW metatag into my .txt and .xml sitemaps. Again, we want Google to follow the links within the sitemap, but we don't want the index itself (obviously) showing up in SERPs. DOES ANYONE KNOW HOW? There's a lot of smart people here, so I'm hoping one or more of them will find this thread.



 1:54 am on May 6, 2011 (gmt 0)

OK, so you don't want to hear about how many detours it took me to arrive here:

(3 years ago, referring to robots.txt)


(1 1/2 years ago, referring to sitemaps)


Give it another 4 1/2 years and maybe google will admit that This Is Stupid ;)

I don't regret the detour, because it left me with this deathless quote:
these [i.e. robots.txt and sitemap.whatever] are urls and search engines index urls

Think I'll pin that to the bathroom wall.

Besides, it gave me something to do while waiting for the text editor to get bored with holding its breath and turning blue. When you neglect to Save before embarking on an ill-advised RegEx, the text editor holds all the cards.


 6:27 am on May 6, 2011 (gmt 0)

I would like to get others' opinions or experience regarding the following simple methods. Please note: my only important .txt files are robots and sitemaps (this post of course is asking how to get urllist.txt and sitemap.xml REMOVED from the SERPs, ie noindex...but follow). I don't have .htaccess on our server, so I can't do the X-Robots-Tag solution. Plus I'm not that advanced. But I have read about this solution:

Disallow: /*.txt$

or one of the following:

Disallow: /sitemap.txt


Since I don't have the experience yet, I need others' honest help and to benefit from their experience on this question:

Does "disallow" prevent the sitemap file from being INDEXED (sitemap file itself) or from altogether being accessed (its enclosed links crawled)? //I don't pretend to be the first person to ask this question and understand the circular logic everyone hates where if it can't be crawled then how can SE know to disallow and other related stuff//

So looking for the simplest, smartest solution. Thanks for anyone who takes the time to help me.
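
For reference, the two Disallow patterns quoted above do not match the same URLs. Here is a rough sketch of the wildcard matching that Google-style crawlers apply, where `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names `rule_to_regex` and `is_disallowed` are made up for illustration; this is not any crawler's actual code, and it ignores Allow rules and rule precedence.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.

    '*' matches any run of characters. A trailing '$' anchors the end
    of the path; without it the rule behaves as a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(rule, path):
    """True if the URL path is matched by the Disallow rule."""
    # re.match anchors at the start of the path, like a robots.txt rule.
    return rule_to_regex(rule).match(path) is not None

for path in ("/urllist.txt", "/sitemap.txt", "/sitemap.xml"):
    print(path,
          is_disallowed("/*.txt$", path),
          is_disallowed("/sitemap.txt", path))
```

Under these assumptions, `/*.txt$` catches every path ending in .txt, while `/sitemap.txt` only catches paths that start with exactly that string, so it would not cover a file named urllist.txt. Neither pattern touches sitemap.xml.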


 1:09 pm on May 6, 2011 (gmt 0)

disallow excludes the robot from crawling the url.

if you want to prevent indexing specific .txt resources i would suggest providing an X-Robots-Tag: noindex [googleblog.blogspot.com] HTTP Response header.
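
To make the crawl-versus-index distinction concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name, and the `may_crawl` helper are assumptions for illustration. Note that this parser does plain prefix matching and does not understand the `*` and `$` wildcards, so the rule is written out literally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the Yahoo-style sitemap file.
rules = """\
User-agent: *
Disallow: /urllist.txt
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def may_crawl(path, agent="ExampleBot"):
    """True if a compliant robot is allowed to fetch this path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

print(may_crawl("/urllist.txt"))  # the robot may not fetch the sitemap
print(may_crawl("/index.html"))   # other pages are unaffected
```

A robot that obeys this rule never requests urllist.txt at all, so it cannot read the links inside it and it never sees any X-Robots-Tag header sent with it. The URL can still show up in results if other pages link to it, which is why Disallow and noindex are not interchangeable.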


 4:55 am on May 7, 2011 (gmt 0)

awesome. thanks. phranque, I only know html a little, and have never once worked with htaccess or stuff like that. Can you show me where to start, so that I can create that X-Robots-Tag: noindex HTTP Response header? It would be so appreciated. thank you.


 5:28 am on May 7, 2011 (gmt 0)

assuming apache you could do something like this if mod_headers is enabled:
<Files "urllist.txt">
Header set X-Robots-Tag "noindex"
</Files>
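
Once a rule like that is in place, it helps to confirm the header is really being sent. A minimal sketch in Python, standard library only; the function names `is_noindex` and `check_url` are hypothetical, and the check is deliberately simple: it reads only the first X-Robots-Tag header and does not handle user-agent prefixes such as `googlebot: noindex`.

```python
from urllib.request import Request, urlopen

def is_noindex(header_value):
    """True if an X-Robots-Tag value contains the noindex directive."""
    if not header_value:
        return False
    # Directives are comma-separated and case-insensitive.
    directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in directives

def check_url(url):
    """Send a HEAD request and report whether the response is noindex."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return is_noindex(response.headers.get("X-Robots-Tag"))

# Usage (replace with your own sitemap URL):
# print(check_url("http://www.example.com/urllist.txt"))
```

If `check_url` returns False for the sitemap URL, the server is not sending the header, for example because mod_headers is not enabled or the configuration was never loaded.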


 1:15 pm on May 14, 2011 (gmt 0)

Thanks Phranque. I'll call my hosting support and ask if they have Apache, but I think we're on a Windows server. I'm not sure whether the two can coexist, or where exactly I'd go in to make those changes. If you know, please pass it along, thanks.


