Forum Moderators: goodroi
1. A robots.txt Disallow rule will stop spiders (Googlebot etc.) from even reading the page at the referenced link/path.
2. A noindex tag will allow the spider to read the page, but the page won't be returned in search results.
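For concreteness, here is what the two mechanisms look like in practice (generic examples, not taken from any particular site):

```text
# robots.txt -- stops compliant spiders from fetching the page at all
User-agent: *
Disallow: /private-page.html
```

```text
<!-- in the page's <head> -- the spider fetches the page,
     but is asked not to index it -->
<meta name="robots" content="noindex">
```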
So far so good?
My question, therefore, is this: if the page carrying the noindex tag links to another (non-disallowed) page, will Googlebot (and the others) follow that link, index that page fully, and return its content in search results?
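For what it's worth, standard meta-robots syntax lets you state the two halves of that question separately; this is the generic directive, not something specific to this thread:

```text
<!-- don't index this page's content, but do follow its links -->
<meta name="robots" content="noindex,follow">
```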
Thanks
However, on point #2, the "noindex" tag tells the 'bot not to index the page's content, but it won't keep the page out of the search results altogether. A noindexed page won't show up for the words on the page itself (since you've told the bot not to use its text), but it *will* show up in searches for the domain, or for words used in the anchor text of links pointing to the page. Google just shows the URL as the page title, while Yahoo uses the link text of whichever inbound link it likes best.
This represents a relatively recent change in behaviour. Until a few years ago, the 'big' search engines would not include a URL in their index at all if its page was marked "noindex". But then the "deep Web" fad started, and with the added pressure of "my index is bigger than yours," we lost the ability to tell the 'bots "Please don't mention this page" without resorting to cloaking -- for example, serving 'bots a 401 (Unauthorized) response that demands a login.
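A minimal sketch of that kind of cloaking, assuming a hypothetical server-side check on the User-Agent header (the bot signatures and function name here are illustrative, not from any real setup):

```python
# Hypothetical sketch: serve known crawlers a 401 instead of the page,
# so the URL never makes it into an index (a form of cloaking).
BOT_TOKENS = ("googlebot", "slurp", "bingbot")  # assumed bot signatures

def status_for(user_agent: str) -> int:
    """Return 401 (login required) for known crawlers, 200 otherwise."""
    ua = user_agent.lower()
    if any(token in ua for token in BOT_TOKENS):
        return 401  # crawler: demand authentication, keep page out of index
    return 200      # ordinary visitor: serve the page normally
```

The real version would sit in the web server or application layer, but the decision logic is just this User-Agent test.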
Jim
Lea