Forum Moderators: Robert Charlton & goodroi
Indexed, though blocked by robots.txt
“internal links to them are nofollow”
In spite of what G said when it first instituted the “nofollow” label (there's a recent thread somewhere hereabouts), “nofollow” never really meant “Pretend you have not seen this link”. It only means “Don’t tell them I sent you.” That’s why GSC's “who links to you” list shows a random mix of follow and nofollow links.
“why Google would index a page that robots.txt has never let it see”
That's because robots.txt does not prevent Google from discovering the URL through links, but it does prevent Google from crawling the page, so Google never gets to evaluate the noindex directive on it. The page probably is not actually indexed in the SERPs, just not noindexed. Confusing? Yes.
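To make that concrete, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt rules, the example.com URLs and the helper name crawler_can_see_noindex are all made up for illustration. It only shows what a rule-abiding crawler is allowed to fetch, not how Google actually decides what goes in its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks /private/ for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_can_see_noindex(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a compliant crawler may fetch `url`.

    Only a page that can be fetched can have its noindex directive read;
    a blocked page's noindex is never seen by the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative URLs only (assumptions, not from the thread).
blocked = crawler_can_see_noindex(ROBOTS_TXT, "https://example.com/private/contact.html")
allowed = crawler_can_see_noindex(ROBOTS_TXT, "https://example.com/public/contact.html")

print(blocked)  # False: the page is never fetched, so its noindex goes unread
print(allowed)  # True: the page can be crawled, so a noindex on it would be honoured
```

So if the goal is to keep a page out of the index, the page has to be crawlable for the noindex to be noticed. Blocking it in robots.txt hides the noindex along with everything else.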
“I'm still a bit hazy about why Google would index a page that robots.txt has never let it see. What, exactly, is in the index?”
That's why I believe most of the time it's a non-issue: sure, in some abstract hypothetical sense the unseen page is “indexed”--but will it ever crop up in any actual SERP seen by any actual human? Consider your Contact page. There are millions (literally) of pages on the internet whose link is the word “Contact” or similar, so it is not likely that anyone searching for “contact” will be offered pages the search engine has not seen. But if your linking text is some extremely unusual phrase, then a person searching for that phrase might arrive at “we are unable to show you”.
That's because robots.txt does not prevent Google from following links but it does prevent Google from crawling the page