Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
My SITEMAPS are getting indexed--pls help implementing this tag
donna130




msg:4308085
 12:04 am on May 5, 2011 (gmt 0)

My SITEMAPS are getting indexed. Need help implementing the NOINDEX, follow tag.

I noticed some of my sitemaps are getting indexed. I obviously want them followed, ie their links crawled and those pages indexed, but not the sitemap itself. I found the following tag that I'd like to add.

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">

However, what I need your help with is that I only know a little HTML and I'm used to putting meta tags in HTML pages, and I'm not sure how, or if, I can place them in .txt (my Yahoo sitemap, called urllist.txt) or .xml (my Google sitemap) sitemap pages.

In other words, none of my sitemaps have <HEAD> tags, so I kind of doubt I can just paste the above tag at the top of these 2 sitemaps(?) In summary, how do I add the above noindex-but-follow tag to my .txt and .xml sitemaps, which are not HTML? Thanks.

 

lucy24




msg:4308124
 4:23 am on May 5, 2011 (gmt 0)

You can't put anything except text in a .txt file. That's what the extension means. Raw text, period.

When you say "getting indexed" do you mean that pages named sitemap.txt and sitemap.xml are showing up in SERPs?

donna130




msg:4308127
 4:44 am on May 5, 2011 (gmt 0)

that's what getting indexed means ;)

lucy24




msg:4308167
 8:40 am on May 5, 2011 (gmt 0)

Ugh.

:: panicked detour to google to make sure they aren't pulling the same thing on me, with further detour revealing that I've been mispronouncing one (non-English) word all along, and still further detour to verify that Bing isn't disregarding robots.txt* ::

You're definitely not alone, though. I fed a straight "sitemap.txt" into google, and found about half a dozen of them on the first page (I do 30/page). Same for "sitemap.xml"-- including, a bit hilariously, google's own :)

Can anyone figure out what the variable is? They crawl mine periodically, but it isn't indexed. The ones that are indexed seem to be completely random sites that nobody ever heard of. If example.com were a real domain rather than a reserved name, its sitemap would be indexed.

Aside: As long as I was googling, I searched similarly for "robots.txt" and found a lot of those indexed too. But unlike the sitemaps, they're for domains everyone would have heard of, like, er, whitehouse.gov or microsoft.com.


* Not my own, but on a site I'm pretty familiar with. I was searching for sitemap content and we've got a word in common.

donna130




msg:4308188
 9:16 am on May 5, 2011 (gmt 0)

Well thanks for following up again, but looks like mostly you were free-associating (you can associate with me freely, lol). But still need to find an answer to how to get my NOINDEX, FOLLOW metatag into my .txt and .xml sitemaps. Again, we want Google to follow the links within the sitemap, but we don't want the index itself (obviously) showing up in SERPs. DOES ANYONE KNOW HOW? There's a lot of smart people here, so I'm hoping one or more of them will find this thread.

Thanks.

lucy24




msg:4308549
 1:54 am on May 6, 2011 (gmt 0)

OK, so you don't want to hear about how many detours it took me to arrive here:

[webmasterworld.com...]
(3 years ago, referring to robots.txt)

and

[webmasterworld.com...]
(1 1/2 years ago, referring to sitemaps)

Oops.

Give it another 4 1/2 years and maybe google will admit that This Is Stupid ;)

I don't regret the detour, because it left me with this deathless quote:
these [i.e. robots.txt and sitemap.whatever] are urls and search engines index urls

Think I'll pin that to the bathroom wall.

Besides, it gave me something to do while waiting for the text editor to get bored with holding its breath and turning blue. When you neglect to Save before embarking on an ill-advised RegEx, the text editor holds all the cards.

donna130




msg:4308597
 6:27 am on May 6, 2011 (gmt 0)

I would like to get others' opinions or experience RE the following simple methods. Please note: my only important .txt files are robots and sitemaps (this post of course is asking how to get urllist.txt and sitemap.xml REMOVED from the SERPs, ie noindex...but follow). I don't have htaccess on our server, so I can't do the X-Robots-Tag solution. Plus I'm not that advanced. But I have read about this solution:

Disallow: /*.txt$

or the following:

Disallow: /sitemap.txt


QUESTION:

Since I don't have the experience yet, I need others' honest help and to benefit from their experience on this question:

Does "disallow" prevent the sitemap file from being INDEXED (sitemap file itself) or from altogether being accessed (its enclosed links crawled)? //I don't pretend to be the first person to ask this question and understand the circular logic everyone hates where if it can't be crawled then how can SE know to disallow and other related stuff//

So looking for the simplest, smartest solution. Thanks for anyone who takes the time to help me.

phranque




msg:4308710
 1:09 pm on May 6, 2011 (gmt 0)

disallow excludes the robot from crawling the url.

if you want to prevent indexing specific .txt resources i would suggest providing an X-Robots-Tag: noindex [googleblog.blogspot.com] HTTP Response header.
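
To make the crawl-versus-index distinction concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. It only shows what a Disallow rule does: a compliant robot will not fetch the blocked URL. It says nothing about whether the URL itself can still turn up in SERPs, which is why the header approach is suggested for that. The example.com URLs and the can_crawl helper are made up for illustration, and the sketch sticks to the plain-path rule from the thread because, as far as I know, the standard-library parser matches simple path prefixes rather than wildcard patterns like /*.txt$.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the Disallow line is the one quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap.txt
"""


def can_crawl(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a reserved name, used here only as a placeholder.
sitemap_allowed = can_crawl(ROBOTS_TXT, "http://www.example.com/sitemap.txt")
homepage_allowed = can_crawl(ROBOTS_TXT, "http://www.example.com/index.html")
print("sitemap crawlable:", sitemap_allowed)
print("homepage crawlable:", homepage_allowed)
```

If the sitemap is disallowed this way, a robot that honors robots.txt would not read it at all, so the links inside it would not be picked up from that file.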

donna130




msg:4309167
 4:55 am on May 7, 2011 (gmt 0)

awesome. thanks. phranque, I only know html a little, and have never once worked with htaccess or stuff like that. Can you show me where to start, so that I can create that X-Robots-Tag: noindex HTTP Response header? It would be so appreciated. thank you.

phranque




msg:4309171
 5:28 am on May 7, 2011 (gmt 0)

assuming apache you could do something like this if mod_headers is enabled:
<Files "urllist.txt">
Header set X-Robots-Tag "noindex"
</Files>
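
Once a rule like that is in place, it may help to confirm the header is really being sent. The following is a rough Python sketch, not a definitive checker: blocks_indexing and check_url are hypothetical helper names, the URL in the usage note is a placeholder, and the parsing is simplified (it ignores the optional user-agent prefix that an X-Robots-Tag value can carry).

```python
from urllib.request import Request, urlopen


def blocks_indexing(header_value):
    """Return True if an X-Robots-Tag header value contains the noindex directive."""
    if not header_value:
        return False
    # Directives are comma-separated and case-insensitive, e.g. "noindex, follow".
    directives = [part.strip().lower() for part in header_value.split(",")]
    return "noindex" in directives


def check_url(url):
    """Send a HEAD request and report whether the response carries noindex."""
    with urlopen(Request(url, method="HEAD")) as response:
        return blocks_indexing(response.headers.get("X-Robots-Tag"))
```

Calling check_url("http://www.example.com/urllist.txt") with your own sitemap address in place of the placeholder should return True when the header is set, and False when it is missing.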

donna130




msg:4312385
 1:15 pm on May 14, 2011 (gmt 0)

Thanks Phranque. I'll call my hosting support and ask if they have Apache, but I think we're on a Windows server. Not sure if they coexist, or where exactly I'll go in to make those script changes. If you know, please pass it along, thanks.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved