Forum Moderators: goodroi
1. A robots.txt Disallow rule will stop spiders (Googlebot etc.) from even reading the page at the referenced link/path.
2. A noindex tag will allow the spider to read the page, but the page won't be returned in search results.
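For concreteness, here is what the two mechanisms look like in practice (generic examples, not taken from any particular site):

```text
# robots.txt -- stops compliant spiders from fetching the page at all
User-agent: *
Disallow: /private-page.html
```

```text
<!-- in the page's <head> -- the spider fetches the page,
     but is asked not to index it -->
<meta name="robots" content="noindex">
```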
So far so good?
My question, therefore, is this: if the page carrying the noindex tag links to another (non-disallowed) page, will Googlebot (and the others) follow that link, index that page fully, and return its content in search results?
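For what it's worth, standard meta-robots syntax lets you state the two halves of that question separately; this is the generic directive, not something specific to this thread:

```text
<!-- don't index this page's content, but do follow its links -->
<meta name="robots" content="noindex,follow">
```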
Thanks
However, on point #2, the "noindex" tag tells the 'bot not to index the page's content, but it won't keep the page out of the search results altogether. A noindexed page won't show up for the words on the page itself (since you've told the bot not to use its text), but it *will* show up in searches for the domain, or for words used in the anchor text of links pointing to the page. Google just shows the URL as the page title, while Yahoo uses the link text of whichever inbound link it likes best.
This represents a relatively recent change in behaviour. Until a few years ago, the 'big' search engines would not include a URL in their index at all if its page was marked "noindex". But then the "deep Web" fad started, and with the added pressure of "my index is bigger than yours," we lost the ability to tell the 'bots "Please don't mention this page" without resorting to cloaking -- for example, serving 'bots a 401 (Unauthorized) response that demands a login.
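A minimal sketch of that kind of cloaking, assuming a hypothetical server-side check on the User-Agent header (the bot signatures and function name here are illustrative, not from any real setup):

```python
# Hypothetical sketch: serve known crawlers a 401 instead of the page,
# so the URL never makes it into an index (a form of cloaking).
BOT_TOKENS = ("googlebot", "slurp", "bingbot")  # assumed bot signatures

def status_for(user_agent: str) -> int:
    """Return 401 (login required) for known crawlers, 200 otherwise."""
    ua = user_agent.lower()
    if any(token in ua for token in BOT_TOKENS):
        return 401  # crawler: demand authentication, keep page out of index
    return 200      # ordinary visitor: serve the page normally
```

The real version would sit in the web server or application layer, but the decision logic is just this User-Agent test.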
Jim
Lea