They are listing pages they've found links to, but they're not actually fetching those pages, since fetching is disallowed by robots.txt. The listing in their SERPs shows only the anchor text of the link and the page's URL.
As with Google and Ask Jeeves, the work-around is probably to allow them to fetch the page, and then use a <meta name="robots" content="noindex"> tag in the head of each page. For non-HTML files such as PDF and XLS, that won't work, since there's no <head> to put the tag in.
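For anyone who wants to see it in place, here's a minimal sketch of such a page (the title and content are made up for illustration):

<html>
<head>
<title>Example Landing Page</title>
<meta name="robots" content="noindex">
</head>
<body>
<p>Page content here.</p>
</body>
</html>

The spider fetches the page, sees the noindex, and drops the URL from the index instead of showing a bare link-only listing.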
One of the reasons I disallow some pages is that they are just lousy landing pages, and fixing that would bloat the page. The <meta robots> technique will probably work, but it costs extra bandwidth, since the spider has to fetch the whole page just to read that tag. Oh, bother.
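For comparison, the kind of robots.txt rules being traded away here would look something like this (the paths are hypothetical):

User-agent: *
Disallow: /landing/
Disallow: /files/report.xls

Dropping Disallow lines like these in favor of the <meta robots> tag is exactly the bandwidth trade-off mentioned above.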
Jim
One of the reasons I disallow some pages is that they are just lousy landing pages...
Sometimes it gets way more inconvenient than this. ;) I've had links, without title or description, to "blocked" co-branded subdomains outrank the main site. In fact, here's the thread where I first saw this problem, and you're the one who guided me to the solution:
Problem with Googlebot and robots.txt?
Google indexing links to blocked urls even though it's not following them
[webmasterworld.com...]