lucy24

msg:4478853 | 7:17 pm on Jul 25, 2012 (gmt 0) |
Short answer: the word 'noindex' is not part of the Robots Exclusion Standard. Use it at your own risk.
Disallow = robots stay out, no crawling allowed
Noindex = page is not mentioned in google's* search index
Yes, a page can be indexed even if a search engine has not seen it. They only have to know it exists.
* I say specifically google, because That Other Search Engine has indexed a few pages that are clearly and explicitly labeled noindex.
|
phranque

msg:4478905 | 12:30 am on Jul 26, 2012 (gmt 0) |
the problem with Noindex: in the robots exclusion protocol is that robots are for crawling, not indexing. according to their documentation, google only supports the Disallow: and Allow: directives in robots.txt.
Block or remove pages using a robots.txt file - Webmaster Tools Help: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449 [support.google.com]
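A minimal sketch of the crawl side of this, using Python's standard urllib.robotparser. The robots.txt content, the ExampleBot name and the example.com URLs are made up for illustration. This particular parser skips the unrecognised Noindex: line without complaint, so only the Disallow: rule has any effect. Other parsers and crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one standard rule, one non-standard "Noindex:" line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Noindex: /drafts/
"""

def build_parser(text):
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Disallow: is a crawl rule, so this URL may not be fetched.
print(parser.can_fetch("ExampleBot", "http://example.com/private/page.html"))  # False

# Noindex: is not a field this parser knows, so the line is ignored
# and the URL is still fetchable.
print(parser.can_fetch("ExampleBot", "http://example.com/drafts/page.html"))   # True
```

Note that neither result says anything about whether a URL ends up in a search index. The parser only answers "may I crawl this?".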
|
shaunm

msg:4478986 | 6:58 am on Jul 26, 2012 (gmt 0) |
@phranque Thank you so much for answering! "robots are for crawling, not indexing" - It could not be explained any better :) Cheers!
|
shaunm

msg:4478987 | 7:06 am on Jul 26, 2012 (gmt 0) |
@lucy24 Thanks! After a lot of research, I found that 'noindex' in a robots.txt is not a directive. But I am still very confused, since the 'robots.txt checkers' available online do not flag the use of 'noindex' as an error. Why is that? And also, when you say "I say specifically google, because That Other Search Engine has indexed a few pages that are clearly and explicitly labeled noindex", do you refer to the NOINDEX in robots.txt or the NOINDEX in Meta Tags? Thanks again.
|
shaunm

msg:4478990 | 7:07 am on Jul 26, 2012 (gmt 0) |
And sorry about the mistyped 'Title'. The correct one is "Noindex vs Disallow within a Robots.txt" :)
|
lucy24

msg:4479244 | 9:54 pm on Jul 26, 2012 (gmt 0) |
Oops. I meant the "noindex" meta tag. It would never occur to me to say "noindex" in robots.txt. I don't even use "allow", since only a handful of robots recognize the word. Incidentally, when I first saw the topic header I thought it was going to be the perennial unanswered question: how the bleepity bleep do you prevent g### from indexing your sitemap and robots.txt? :)
|
phranque

msg:4479269 | 11:19 pm on Jul 26, 2012 (gmt 0) |
"how the bleepity bleep do you prevent g### from indexing your sitemap and robots.txt?"
you could always try using the X-Robots-Tag HTTP header: http://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag [developers.google.com]
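One way this could look in practice is a small WSGI wrapper in Python that adds the header to selected responses. This is a sketch under assumptions, not a recommended setup: the function name add_x_robots_tag and the path list are hypothetical, and on most sites the header would more likely be set in the web server configuration.

```python
# Paths whose responses should carry the header (assumed for illustration).
NOINDEX_PATHS = {"/robots.txt", "/sitemap.xml"}

def add_x_robots_tag(app):
    """Wrap a WSGI app so responses for NOINDEX_PATHS get X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path in NOINDEX_PATHS:
                # Copy the list so the wrapped app's own header list is untouched.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)
    return wrapped
```

A crawler has to fetch the file to see the header, so the same path must not also be blocked with Disallow: in robots.txt.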
|
shaunm

msg:4479319 | 5:28 am on Jul 27, 2012 (gmt 0) |
@lucy24 hahaha...yes, it's all because of my wrong title. Good that you asked, otherwise phranque would not have shared that resource link :) Thanks to both lucy24 and phranque for your detailed answers. Cheers!
|