Noindex vs Disallow within a Sitemap

What is the use of Noindex within a Sitemap?

     
9:03 am on Jul 25, 2012 (gmt 0)
Full Member from IN (joined: June 18, 2012; posts: 293; votes: 0)


Hi,

I have never heard of using 'noindex' within a sitemap. What is the difference between the 'disallow' and 'noindex' parameters?

Suppose I want to block a single URL on my website from being crawled and indexed. Which is the best way?

User-agent: Googlebot
Noindex: /fr/content.aspx

User-agent: Googlebot
Disallow: /fr/content.aspx


I just came across a website that uses both in its robots.txt file to make sure that its pages are not indexed in the search results.
[korinaithacahotel.com...]


I would very much appreciate your help guys. Thanks a lot!


Best,
7:17 pm on July 25, 2012 (gmt 0)
lucy24, Senior Member from US (joined: Apr 9, 2011; posts: 12693; votes: 244)


Short answer: the word 'noindex' is not part of the Robots Exclusion Standard. Use it at your own risk.

Disallow = robots stay out, no crawling allowed
Noindex = page is not mentioned in google's* search index

Yes, a page can be indexed even if a search engine has never crawled it. It only has to know the URL exists, which is why a Disallow'd page can still turn up in results as a bare URL.


* I say specifically google, because That Other Search Engine has indexed a few pages that are clearly and explicitly labeled noindex.
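
To put the two side by side, here is a minimal sketch (it reuses the URL from the question above; any path would do):

# robots.txt -- "stay out": a compliant crawler will not fetch this URL at all
User-agent: *
Disallow: /fr/content.aspx

# "noindex", by contrast, is a page-level signal (a meta tag or HTTP header on
# the page itself): a crawler may still fetch the page, but the engine is asked
# not to list it in its results.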
12:30 am on July 26, 2012 (gmt 0)
phranque, Administrator (joined: Aug 10, 2004; posts: 10542; votes: 8)


the problem with Noindex: in a robots exclusion protocol is that robots are for crawling, not indexing.


according to its documentation, google only supports the Disallow: and Allow: directives in robots.txt.

Block or remove pages using a robots.txt file - Webmaster Tools Help:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449 [support.google.com]
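
for example, a rule set that sticks to those two supported directives might look like this (the paths are just placeholders):

User-agent: Googlebot
# keep the whole /fr/ section out of the crawl...
Disallow: /fr/
# ...except this one page, which may still be fetched (google applies the most specific match)
Allow: /fr/index.aspx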
6:58 am on July 26, 2012 (gmt 0)
Full Member from IN


@phranque
Thank you so much for answering! "robots are for crawling, not indexing" - it couldn't be put any better :)

Cheers!
7:06 am on July 26, 2012 (gmt 0)
Full Member from IN


@lucy24

Thanks! After a lot of research, I found that 'noindex' is not a valid robots.txt directive. But I am still confused: the 'robots.txt checkers' available online do not flag the use of 'noindex' as an error. Why is that?

Also, when you say "I say specifically google, because That Other Search Engine has indexed a few pages that are clearly and explicitly labeled noindex":

Do you refer to NOINDEX in robots.txt or NOINDEX in meta tags?

Thanks again.
7:07 am on July 26, 2012 (gmt 0)
Full Member from IN


And sorry about the mistyped title.

The correct one is 'Noindex vs Disallow within a Robots.txt'.

:)
9:54 pm on July 26, 2012 (gmt 0)
lucy24, Senior Member from US


Oops. I meant the "noindex" meta tag. It would never occur to me to say "noindex" in robots.txt. I don't even use "allow", since only a handful of robots recognize the word.
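
For reference, the tag itself sits in the page's <head>, along these lines:

<!-- asks search engines not to list this page in results; they can still crawl it -->
<meta name="robots" content="noindex">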

Incidentally, when I first saw the topic header I thought it was going to be the perennial unanswered question: how the bleepity bleep do you prevent g### from indexing your sitemap and robots.txt? :)
11:19 pm on July 26, 2012 (gmt 0)
phranque, Administrator


how the bleepity bleep do you prevent g### from indexing your sitemap and robots.txt?


you could always try using the X-Robots-Tag HTTP header:
http://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag [developers.google.com]
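
for instance, on an apache server (just one possibility; other servers have equivalent settings) an .htaccess sketch along these lines would attach that header to both files:

# assumes Apache with mod_headers enabled
<FilesMatch "^(robots\.txt|sitemap\.xml)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>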
5:28 am on July 27, 2012 (gmt 0)
Full Member from IN


@lucy24

hahaha... yes, it's all because of my wrong title. Good that you asked; otherwise phranque would not have shared that resource link :)

Thanks to both lucy24 and phranque for your detailed answers.

Cheers!
 
