I've submitted a removal request in Webmaster Tools, but shouldn't this link never have been spidered if the landing page says "do not spider" both in robots.txt and in its on-page meta tags?
I have blocked Googlebot from crawling the RSS feed with my robots.txt file and requested removal of the XML file, but I noticed that this request is still pending. I cannot stop others from linking to this feed, but is blocking Googlebot from crawling it enough?
The simple fix is to first expose the page to the bots by removing the disallow in robots.txt. Then, place a robots noindex meta in the head of the page. (Removing the disallow in robots.txt allows the bots to see and (hopefully) obey the noindex.)
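If you want to sanity-check that setup before waiting on the bots, here is a rough sketch in Python (standard library only). It tests the two conditions described above: the URL is not disallowed in robots.txt, and the page carries a robots noindex meta. The function names, the example.com URL, and the sample robots.txt and HTML strings are made up for illustration, and Python's robots.txt parser is not guaranteed to interpret every rule exactly the way Googlebot does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """True if this robots.txt text does not disallow `url` for `agent`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


def has_noindex(html):
    """True if the page has a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_can_take_effect(robots_txt, html, url, agent="Googlebot"):
    """The bot must be allowed to fetch the page AND find noindex on it."""
    return is_crawlable(robots_txt, url, agent) and has_noindex(html)


# Hypothetical sample data, not taken from any real site.
blocked_robots = "User-agent: *\nDisallow: /private/\n"
open_robots = "User-agent: *\nDisallow:\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
url = "http://example.com/private/page.html"

print(noindex_can_take_effect(blocked_robots, page, url))  # bot never sees the meta
print(noindex_can_take_effect(open_robots, page, url))     # bot can read the noindex
```

With the disallow still in place the check fails even though the meta tag is present, which is the situation the original poster is in: the bot is never allowed to fetch the page, so it never reads the noindex.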
It's possible that the page will show as a supplemental result for quite a while, but it will eventually fall completely out of the index.
There have been a couple of occasions where Google showed "full entries" in the SERPs for excluded pages. There is a thread from about this time last year when that happened: [webmasterworld.com...]
I do occasionally see single-line entries that have a title but no snippet. I see them for pages with very high PR and many incoming links. It is as if Google can't believe that the page should not be in the SERPs and tries very hard to include it. In the cases that I have looked at, they have been pages that did not need to be included in the index (such as main "admin" pages for a forum, and so on).
Yahoo also does this, and they actually use the anchor text of some incoming link (where that anchor text is NOT some generic "click here" type message) to "invent" a title for the excluded page.