Robert_Charlton - 7:25 pm on May 30, 2010 (gmt 0)
URI only listings...
This gets discussed periodically here. As I'll elaborate in a moment, Google has referred to these listings as "references".
The only way I've found to prevent these "references" is, as pageoneresults suggests, to use the meta robots tag with noindex,nofollow in the content served at those URIs... and (with a tip of the hat to jdMorgan for this)... not to simultaneously use robots.txt to disallow crawling of those same URIs.
If you disallow the crawling, then the robots meta tag with noindex,nofollow is never seen, and the reference... if it is linked from a spiderable page... will be indexed. Apparently, different engines treat the meta robots tag differently, but that's another can of worms.
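To make the interaction above concrete, here is a rough sketch in Python of the check I'd run on a URI before expecting it to stay out of the listings. It uses only the standard library. The function name, the example.com URLs and the "Googlebot" agent string are my own placeholders for illustration, and this only models the logic described above (tag present AND crawl not blocked), not what any engine actually does internally.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def noindex_can_be_seen(robots_txt, html, url, agent="Googlebot"):
    """Return True only if the page carries noindex AND may be crawled.

    Hypothetical helper: if robots.txt disallows the URL, a compliant
    crawler never fetches the page, so the meta tag is never read.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(agent, url)

    parser = MetaRobotsParser()
    parser.feed(html)
    return crawlable and "noindex" in parser.directives


# Assumed sample inputs, not taken from any real site.
PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
BLOCKING = "User-agent: *\nDisallow: /private/"
OPEN = "User-agent: *\nDisallow:"
URL = "http://example.com/private/page.html"

print(noindex_can_be_seen(BLOCKING, PAGE, URL))  # tag is hidden behind the disallow
print(noindex_can_be_seen(OPEN, PAGE, URL))      # tag can be fetched and read
```

The point of combining both checks in one function is that either condition alone is not enough: a noindex tag behind a Disallow is invisible, and a crawlable page without the tag gets indexed normally.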
My first experience with this situation was in this discussion in 2003....
Problem with Googlebot and robots.txt?
Google indexing links to blocked urls even though it's not following them
http://www.webmasterworld.com/forum3/11621.htm
I think it's worth quoting here GoogleGuy's comment on the situation and my response... both of which are still applicable...
GoogleGuy, with my emphasis added...
If we have evidence that a page is good, we can return that reference even though we haven't crawled the page.
My response...
GoogleGuy - Since you've asked in the past for suggestions for improving Google's serps, I'd suggest that less aggressive indexing here would be helpful. I can't imagine why Google would want to return a link to a blocked page.