
Forum Moderators: goodroi


My SITEMAPS are getting indexed--pls help implementing this tag

     

donna130

12:04 am on May 5, 2011 (gmt 0)

5+ Year Member



My SITEMAPS are getting indexed. Need help implementing the NOINDEX, follow tag.

I noticed some of my sitemaps are getting indexed. I obviously want them followed, ie their links crawled and those pages indexed, but not the sitemap itself. I found the following tag that I'd like to add.

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">

However, I only know a little HTML. I'm used to putting meta tags in HTML pages, and I'm not sure how, or even whether, I can place them in .txt (my Yahoo sitemap, called urllist.txt) or .xml (my Google sitemap) sitemap files.

In other words, none of my sitemaps have <HEAD> tags, so I kind of doubt I can just paste the above tag at the top of these 2 sitemaps(?) In summary, how do I add the above noindex-but-follow tag to my .txt and .xml sitemaps, which are not HTML? Thanks.

lucy24

4:23 am on May 5, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You can't put anything except text in a .txt file. That's what the extension means. Raw text, period.

When you say "getting indexed" do you mean that pages named sitemap.txt and sitemap.xml are showing up in SERPs?

donna130

4:44 am on May 5, 2011 (gmt 0)

5+ Year Member



that's what getting indexed means ;)

lucy24

8:40 am on May 5, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Ugh.

:: panicked detour to google to make sure they aren't pulling the same thing on me, with further detour revealing that I've been mispronouncing one (non-English) word all along, and still further detour to verify that Bing isn't disregarding robots.txt* ::

You're definitely not alone, though. I fed a straight "sitemap.txt" into google, and found about half a dozen of them on the first page (I do 30/page). Same for "sitemap.xml"-- including, a bit hilariously, google's own :)

Can anyone figure out what the variable is? They crawl mine periodically, but it isn't indexed. The ones that are indexed seem to be completely random sites that nobody ever heard of. If example.com were a real domain rather than a reserved name, its sitemap would be indexed.

Aside: As long as I was googling, I searched similarly for "robots.txt" and found a lot of those indexed too. But unlike the sitemaps, they're for domains everyone would have heard of, like, er, whitehouse.gov or microsoft.com.


* Not my own, but on a site I'm pretty familiar with. I was searching for sitemap content and we've got a word in common.

donna130

9:16 am on May 5, 2011 (gmt 0)

5+ Year Member



Well, thanks for following up again, but it looks like mostly you were free-associating (you can associate with me freely, lol). I still need to find an answer to how to get my NOINDEX, FOLLOW meta tag into my .txt and .xml sitemaps. Again, we want Google to follow the links within the sitemap, but we don't want the sitemap itself (obviously) showing up in SERPs. DOES ANYONE KNOW HOW? There are a lot of smart people here, so I'm hoping one or more of them will find this thread.

Thanks.

lucy24

1:54 am on May 6, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



OK, so you don't want to hear about how many detours it took me to arrive here:

[webmasterworld.com...]
(3 years ago, referring to robots.txt)

and

[webmasterworld.com...]
(1 1/2 years ago, referring to sitemaps)

Oops.

Give it another 4 1/2 years and maybe google will admit that This Is Stupid ;)

I don't regret the detour, because it left me with this deathless quote:
these [i.e. robots.txt and sitemap.whatever] are urls and search engines index urls

Think I'll pin that to the bathroom wall.

Besides, it gave me something to do while waiting for the text editor to get bored with holding its breath and turning blue. When you neglect to Save before embarking on an ill-advised RegEx, the text editor holds all the cards.

donna130

6:27 am on May 6, 2011 (gmt 0)

5+ Year Member



I would like to get others' opinions or experience regarding the following simple methods. Please note: my only important .txt files are robots and sitemaps (this post, of course, is asking how to get urllist.txt and sitemap.xml REMOVED from the SERPs, i.e. noindex...but follow). I don't have .htaccess on our server, so I can't do the X-Robots-Tag solution. Plus I'm not that advanced. But I have read about this solution:

Disallow: /*.txt$

or a rule like the following:

Disallow: /sitemap.txt


QUESTION:

Since I don't have the experience yet, I need others' honest help and to benefit from their experience on this question:

Does "Disallow" prevent only the sitemap file itself from being INDEXED, or does it prevent the file from being accessed at all (so that its enclosed links don't get crawled)? // I don't pretend to be the first person to ask this question, and I understand the circular logic everyone hates, where if it can't be crawled then how can the SE know to disallow, and other related stuff //

So looking for the simplest, smartest solution. Thanks for anyone who takes the time to help me.

phranque

1:09 pm on May 6, 2011 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



disallow excludes the robot from crawling the url.

if you want to prevent indexing of specific .txt resources i would suggest providing an X-Robots-Tag: noindex [googleblog.blogspot.com] HTTP Response header.
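
To illustrate the first point, here is a minimal Python sketch using the standard-library urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration, and this parser only does simple prefix matching, so it is not a model of how any particular search engine treats wildcard rules such as /*.txt$.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the literal-path rule discussed in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap.txt
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant robot may not fetch the disallowed sitemap...
sitemap_ok = parser.can_fetch("*", "http://www.example.com/sitemap.txt")
# ...but other URLs on the site are unaffected by that rule.
page_ok = parser.can_fetch("*", "http://www.example.com/index.html")

print("may fetch sitemap.txt:", sitemap_ok)
print("may fetch index.html:", page_ok)
```

The rule only answers "may this robot fetch this URL?". It says nothing about whether the URL itself may be listed in an index, which is why the response header is suggested for that part.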

donna130

4:55 am on May 7, 2011 (gmt 0)

5+ Year Member



awesome. thanks. phranque, I only know html a little, and have never once worked with htaccess or stuff like that. Can you show me where to start, so that I can create that X-Robots-Tag: noindex HTTP Response header? It would be so appreciated. thank you.

phranque

5:28 am on May 7, 2011 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



assuming apache, you could do something like this if mod_headers is enabled:
<Files "urllist.txt">
Header set X-Robots-Tag "noindex"
</Files>
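
Once a rule like that is in place, one way to confirm the header is actually being sent is a small Python check along these lines. This is only a sketch: the function names are hypothetical, the URL in the usage comment is a placeholder, and the parsing ignores the optional user-agent prefix form of the header (e.g. "googlebot: noindex").

```python
import urllib.request


def has_noindex(x_robots_tag):
    """Return True if an X-Robots-Tag value lists the 'noindex' directive."""
    if not x_robots_tag:
        return False
    # Directives are comma-separated and case-insensitive.
    directives = [d.strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in directives


def fetch_x_robots_tag(url):
    """Send a HEAD request and return the combined X-Robots-Tag value, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        values = response.headers.get_all("X-Robots-Tag")
    return ", ".join(values) if values else None


# Usage (placeholder URL, needs network access):
#   tag = fetch_x_robots_tag("http://www.example.com/urllist.txt")
#   print(tag, has_noindex(tag))
```

If the check reports no noindex directive for the sitemap URL, the header is not reaching the client, for example because the rule is not being applied to that file.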

donna130

1:15 pm on May 14, 2011 (gmt 0)

5+ Year Member



Thanks Phranque. I'll call my hosting support and ask if they have Apache, but I think we're on a Windows server. I'm not sure if they coexist, or where exactly I'd go in to make those script changes. If you know, please pass it along, thanks.

 
