Forum Moderators: Robert Charlton & goodroi
That's too bad, as it has forced many to password-protect or cloak pages that the Webmaster simply does not want used as entry points to the site.
So, given that a robot must fetch a page to read its on-page meta tag, using a robots.txt Disallow will at least keep the wasted bandwidth down, as was its intent.
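For example, a Disallow rule like this (the path is just a placeholder) keeps compliant robots from fetching the page at all, so no bandwidth is spent on it:

```
User-agent: *
Disallow: /unwanted-entry-page.html
```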
Jim
[edited by: jdMorgan at 1:45 am (utc) on Sep. 21, 2006]
Google has a specific "nosnippet" option for the robots meta tag, for people who want a URL-only listing for a page. It would be strange if they treated noindex and nosnippet as equivalent.
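For reference, the Google-specific form looks something like this (a sketch; nosnippet suppresses the description while still allowing the page itself to be indexed):

```html
<meta name="googlebot" content="nosnippet">
```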
What often happens is that people overdo their robots blocking. If you put a "noindex" in the meta tag AND disallow crawling by Googlebot, as the OP proposed, the bot doesn't read the content of that page and so never sees the "noindex" meta tag. A Disallow in robots.txt does not, however, prevent Google from adding a URL-only listing, or even a complete listing where the DMOZ description is used as the snippet.
So to remove your content from the Google index, you have to allow Googlebot to read the file. If you block it in robots.txt, the content can still appear in the index. Quite a contradiction, but that is how the rules work.
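In other words, the combination that actually removes a page is: leave it crawlable in robots.txt (no Disallow for that URL), and put the exclusion in the page itself, e.g.:

```html
<meta name="robots" content="noindex">
```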
> If you put a "noindex" in the meta tag AND disallow crawling by Googlebot as the OP proposed, the bot doesn't read the content of that page and doesn't see the "noindex" meta tag.
Well yes, and I posted that same idea in different words above. But I have lots of URL-only listings in G right now, pointing to pages that are not Disallowed in robots.txt and that carry only the "noindex,nofollow" on-page meta tag, with no mention of nosnippet, noarchive, or noodp.
I have posted on WebmasterWorld several times in the past, explaining the hierarchy of robots.txt over on-page robots tags, and how to do it properly in various circumstances. But ever since G started talking about "The Deep Web," it has stopped working at G, and now at Y and M as well.
And I'm seeing this behaviour across multiple, diverse sites -- mine and others.
Jim
Google and Yahoo did list those pages in the SERPs for a search on the domain name for several months, but no longer do so. Hopefully, this was a temporary glitch.
Jim
> Google and Yahoo did list those pages in the SERPs for a search on the domain name for several months, but no longer do so. Hopefully, this was a temporary glitch.
Yes, I remember that glitch; it was also discussed in a thread [webmasterworld.com] here. For a short time, many of my "noindex" pages were visible in Google's SERPs.
Using the robots.txt exclusion results in the page not being spidered, but it can still appear as a URL-only listing in the SERPs - especially if someone links to it.
Yahoo goes further. They construct a title for that previously URL-only listing using the anchor text of one of the links that points to the page, but only if that anchor text is not "click here" or some other generally poor-quality text.