They are listing pages they've found links to, but they're not actually fetching those pages, since fetching is disallowed by robots.txt. The listing in their SERPs shows only the anchor text of the link and the page's URL.
As with Google and Ask Jeeves, the work-around is probably to allow them to fetch the page, and then use a <meta name="robots" content="noindex"> tag in the head of each page. For non-HTML files such as PDF and XLS, that won't work, since there's no <head> to put the tag in.
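For anyone who wants to see it in place, here's a minimal sketch of such a page (the title and content are made up for illustration):

<html>
<head>
<title>Example Landing Page</title>
<meta name="robots" content="noindex">
</head>
<body>
<p>Page content here.</p>
</body>
</html>

The spider fetches the page, sees the noindex, and drops the URL from the index instead of showing a bare link-only listing.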
One of the reasons I disallow some pages is that they are just lousy landing pages, and fixing that would bloat the page. The <meta robots> technique will probably work, but it costs extra bandwidth, since the spider has to fetch the whole page just to read that tag. Oh, bother.
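For comparison, the kind of robots.txt rules being traded away here would look something like this (the paths are hypothetical):

User-agent: *
Disallow: /landing/
Disallow: /files/report.xls

Dropping Disallow lines like these in favor of the <meta robots> tag is exactly the bandwidth trade-off mentioned above.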
Jim
One of the reasons I disallow some pages is that they are just lousy landing pages...
Sometimes it gets way more inconvenient than this. ;) I've had links, without title or description, to "blocked" co-branded subdomains outrank the main site. In fact, here's the thread where I first saw this problem, and you're the one who guided me to the solution:
Problem with Googlebot and robots.txt?
Google indexing links to blocked urls even though it's not following them
[webmasterworld.com...]