Page is not externally linkable
g1smd - 12:32 pm on Mar 6, 2006 (gmt 0)
Lack of inbound links is probably one factor. I'd like to see what other factors there are.

One is that the page no longer exists, or even that the domain no longer exists.

Another is when you exclude the page using "robots.txt". The page reappears in the supplemental index a few months later, with a cache date from just before the "robots.txt" file first excluded that URL. Google seems to be saying "we were allowed to index it way back then, so we will still leave that copy in our index now". However, the reason for putting the URL in robots.txt in the first place was "oh #*$!, I didn't want that page indexed".

>> To me it has always been because the page has not got crawled recently - the page has not necessarily got a penalty - it has just not been crawled. <<

>> As soon as this page is crawled again then it no longer shows as supplemental <<

I have seen pages that are crawled weekly show as a supplemental result for words that were in the previous version of the page, and as a normal result when you search for current content. Google wants to hang on to the previous version of the page for some reason. Even when they are showing the old information in the snippet, the cache that it links to is only a few days old.

It is infuriating. You amend your website to update all the old email addresses and phone numbers, and three years later Google is still showing them in the snippets, even though that data is no longer on the real page, nor in any cache they have made public in the last two years or more. The exception is some "fully supplemental" pages, where they continue to show a cache from January 2004. Wake up! It is March 2006 now.
>> I still think people need to ask themselves why a page goes supplemental. <<