Pages Restricted by Robots.txt Showing Link in result




Shows as blocked in Webmaster Tools...

supafresh

4:46 pm on May 30, 2007 (gmt 0)

So I have a page that is blocked by robots.txt, and Webmaster Tools confirms that it is blocked, yet it still shows in the results with a basic title, no description, and a link.

I've submitted a removal request in Webmaster Tools, but shouldn't this link not have been spidered at all, given that both robots.txt and the on-page meta tags tell bots not to spider the landing page?

g1smd

10:55 pm on May 30, 2007 (gmt 0)

Google has ignored these directives for several short spells in the past, but not any time recently.

Every time I see this happen now, it has turned out to be a typo somewhere in the site's pages or configuration.

hvacdirect

11:15 pm on May 30, 2007 (gmt 0)

It usually means that someone else has linked to it from outside your site. Google knows about the URL from that link but won't crawl it, which is why it shows no description, and it won't rank for anything other than the anchor text of the link that got it there in the first place.

andrewshim

4:12 am on May 31, 2007 (gmt 0)

I seem to have a similar problem. Google indexed my RSS file. Unfortunately, the title and description in the feed are the same as those of my homepage (I have since changed them).

I have blocked Googlebot from crawling the RSS feed with my robots.txt file and requested removal of the XML file, but I noticed that this request is still pending. I cannot stop others from linking to this feed, but is blocking Googlebot from crawling it enough?
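A robots.txt rule is matched per user-agent, so a rule aimed at Googlebot only tells Googlebot not to fetch the file; other crawlers are unaffected. The sketch below uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt. The feed path `/feed.xml`, the `example.com` domain, the agent name `OtherBot` and the helper `may_fetch` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Googlebot is told to stay away from the feed.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /feed.xml
"""

# Hypothetical feed URL used for illustration.
FEED_URL = "http://www.example.com/feed.xml"


def may_fetch(agent: str, url: str = FEED_URL) -> bool:
    """Return True if the robots.txt above allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        print(agent, "may fetch feed:", may_fetch(agent))
```

In this sketch only Googlebot is refused; a rule meant for every crawler would go under `User-agent: *`. Note that the check only governs fetching: it says nothing about whether a URL that is already known from links gets listed.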

supafresh

2:16 pm on May 31, 2007 (gmt 0)

Well, the problem is that the link is showing up in the SERPs and ranking better than my actual page. Google might see my page as duplicate content and penalize it.

jimbeetle

2:40 pm on May 31, 2007 (gmt 0)

URL-only listings for pages blocked by robots.txt are normal behaviour for Google (and Yahoo!). If they know about a page through links, it may show in the SERPs even if it is disallowed in robots.txt.
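The distinction being drawn is between crawling (fetching the page) and listing (knowing the URL exists). A toy model of that distinction, written as a hypothetical Python sketch and not any search engine's real pipeline: a URL becomes known when something links to it, and only a URL the crawler is allowed to fetch can get a title and snippet. The function `listing_type`, the agent name `ExampleBot` and the example URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser


def listing_type(url, robots_lines, linked_urls, agent="ExampleBot"):
    """Toy model of what a results page could show for `url`.

    Assumption for illustration only: a URL is known if some page links
    to it, and it can only get a title/snippet if robots.txt lets the
    crawler fetch it.
    """
    if url not in linked_urls:
        return "not listed"      # the engine has never heard of the URL
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if parser.can_fetch(agent, url):
        return "full entry"      # fetched, so title and snippet are available
    return "URL-only"            # known from links, but never fetched


ROBOTS = ["User-agent: *", "Disallow: /private/"]
LINKS = {"http://www.example.com/private/page.html"}

print(listing_type("http://www.example.com/private/page.html", ROBOTS, LINKS))
```

Here the disallowed page is linked, so the model returns a URL-only listing: the Disallow rule prevents the fetch, not the listing.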

The simple fix is to first expose the page to the bots by removing the disallow from robots.txt, then place a robots noindex meta tag in the head of the page. (Removing the disallow lets the bots fetch the page, so they can see and, hopefully, obey the noindex.)
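The reasoning behind removing the disallow can be sketched in Python: a crawler only reads a page's `<meta name="robots" content="noindex">` tag if robots.txt lets it fetch the page in the first place. This is a minimal sketch under stated assumptions; the names `RobotsMetaFinder` and `crawler_sees_noindex`, the agent name `ExampleBot` and the sample page are hypothetical, while `urllib.robotparser` and `html.parser` are Python standard-library modules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Sets self.noindex if the page has a robots meta tag containing 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        directives = [d.strip() for d in content.split(",")]
        if name == "robots" and "noindex" in directives:
            self.noindex = True


def crawler_sees_noindex(robots_lines, url, html, agent="ExampleBot"):
    """Hypothetical check: does a polite crawler ever read the noindex?"""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(agent, url):
        return False  # page is never fetched, so its meta tag is never read
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
URL = "http://www.example.com/page.html"

print(crawler_sees_noindex(["User-agent: *", "Disallow: /page.html"], URL, PAGE))
print(crawler_sees_noindex(["User-agent: *", "Disallow:"], URL, PAGE))
```

With the page disallowed, the noindex is never seen; with an empty `Disallow:` (which allows everything), it is. That is the order of operations described above: unblock first, then rely on the meta tag.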

It's possible that the page will show as a supplemental result for quite a while, but it will eventually drop out of the index completely.

g1smd

4:24 pm on May 31, 2007 (gmt 0)

Yes, URL-only entries in the SERPs occur because someone somewhere has linked to the page.

The OP mentioned a "basic title tag" and I took that to mean something more than a "URL-only" entry.

jimbeetle

4:37 pm on May 31, 2007 (gmt 0)

Oops, I skipped over the "basic title tag" since it read so much like a URL-only listing. Is it a new flavor of weird listing?

g1smd

4:49 pm on May 31, 2007 (gmt 0)

There have been a couple of occasions where Google showed "full entries" in the SERPs for excluded pages. There is a thread from about this time last year when that happened: [webmasterworld.com...]

I do occasionally see single-line entries that have a title but no snippet. I see them for pages with very high PR and many incoming links. It is as if Google can't believe that the page should not be in the SERPs and tries very hard to include it. In the cases I have looked at, they have been pages that did not need to be in the index (such as the main "admin" pages of a forum, and so on).

Yahoo! also does this: it actually uses the anchor text of an incoming link (where that anchor text is NOT a generic "click here" type message) to "invent" a title for the excluded page.
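The behaviour described for Yahoo! can be pictured with a toy heuristic: take the first incoming anchor text that is not a generic phrase and use it as the title. This is a hypothetical Python sketch for illustration, not Yahoo!'s actual method; the `GENERIC_ANCHORS` set and the `invent_title` function are assumptions.

```python
# Hypothetical list of anchor texts treated as too generic to serve as a title.
GENERIC_ANCHORS = {"click here", "here", "this page", "link", "more", "read more"}


def invent_title(anchor_texts):
    """Return the first non-generic inbound anchor text (whitespace-normalized), else None."""
    for text in anchor_texts:
        cleaned = " ".join(text.split())
        if cleaned and cleaned.lower() not in GENERIC_ANCHORS:
            return cleaned
    return None


print(invent_title(["click here", "  Forum   Admin Panel "]))
```

Under these assumptions the generic "click here" is skipped and "Forum Admin Panel" becomes the invented title; if every anchor is generic, no title is produced.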

jimbeetle

4:55 pm on May 31, 2007 (gmt 0)

I've heard of the Yahoo! practice; in fact, I think one of its reps described it in a thread a few months back. That Google is also doing it now is news to me. It looks like the SEs are really, really pushing the envelope when it comes to webmasters' directions on which pages to list or not.

jimbeetle

5:12 pm on May 31, 2007 (gmt 0)

Hey g1, your mailbox is full.

I didn't see any naked-title entries in the Google example you sent, just two normal entries and then a string of URL-only entries. They must be pulling results from different places.

Not that I'm doubting it, mind you.