The site has never had very strong incoming links.
I am hoping you could share your thoughts on why sites go into the supplemental index. Does it relate to incoming links, or the lack of them? Does it relate to a lack of unique content on pages? Internal linking structure?
What can be done to get sites out of the supplemental index? Has anyone done this successfully, and if so, how?
If you could back your thoughts up with examples or things you have seen, that would be much appreciated.
Thanks.
[edited by: ciml at 12:53 pm (utc) on July 27, 2005]
[edit reason] No real domains in examples please. [/edit]
Have you checked for the www vs. non-www issue? There are numerous threads here about that.
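For readers who want to see what the usual fix looks like, here is a minimal sketch of a non-www to www 301 redirect, written as Python WSGI middleware rather than the server configuration most sites would actually use. `www_redirect`, `bare_host` and `canonical_host` are hypothetical names, and `example.com` stands in for your own domain.

```python
def www_redirect(app, bare_host="example.com", canonical_host="www.example.com"):
    """Wrap a WSGI app so requests for bare_host get a 301 to canonical_host.

    Both host names are placeholders (assumption); substitute your own domain.
    """
    def middleware(environ, start_response):
        # Host header without any port, lower-cased for comparison.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == bare_host:
            # Rebuild the same path and query string on the canonical host.
            path = environ.get("PATH_INFO", "") or "/"
            query = environ.get("QUERY_STRING", "")
            location = "http://" + canonical_host + path
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        # Already on the canonical host (or some other host): serve normally.
        return app(environ, start_response)
    return middleware
```

The sketch assumes plain ASCII paths and an application mounted at the site root. To try it locally, wrap any WSGI application and serve it with `wsgiref.simple_server.make_server`.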
It appears that pages spidered only by the Mozilla Googlebot will appear as Supplemental Results.
The Mozilla bot will spider URLs with more than two parameters, whereas the standard Googlebot won't.
The Mozilla Googlebot will also try to find URLs in JavaScript.
Do your pages have multiple parameters, or are they only accessible via JavaScript?
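The two-parameter limit above is the poster's observation, not documented Google behaviour. If you want to see which of your own URLs it would affect, a quick check is to count query-string parameters; the threshold of 2 below simply mirrors that claim, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def parameter_count(url):
    """Number of query-string parameters in url, counting blank values too."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))

def flag_heavy_urls(urls, threshold=2):
    """Return the URLs with more than `threshold` parameters.

    The default of 2 mirrors the claim in this thread (assumption),
    not a documented crawler limit.
    """
    return [u for u in urls if parameter_count(u) > threshold]
```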
Some people have reported pages bouncing back and forth between the Supplemental and standard index depending on which bot visited last, although I have not observed this on my own site.
Which of the two bots above visited your pages last?
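One way to answer that is from your server's access log. The sketch below assumes Apache's combined log format and tells the two crawlers apart by user agent: `Googlebot/2.1 (+http://www.google.com/bot.html)` for the standard bot and `Mozilla/5.0 (compatible; Googlebot/2.1; ...)` for the Mozilla one. User-agent strings can be spoofed, so treat the result as a hint rather than proof. `classify` and `last_googlebot_visit` are hypothetical helpers.

```python
import re
from datetime import datetime

# Apache combined log format (assumption):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def classify(user_agent):
    """Label a user agent as one of the two Googlebot variants, or None."""
    if user_agent.startswith("Mozilla/5.0") and "Googlebot" in user_agent:
        return "Mozilla Googlebot"
    if user_agent.startswith("Googlebot/"):
        return "standard Googlebot"
    return None

def last_googlebot_visit(lines):
    """Map each requested path to (bot label, timestamp) of its latest Googlebot hit."""
    latest = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a GET/HEAD line in the expected format
        stamp, path, ua = m.groups()
        bot = classify(ua)
        if bot is None:
            continue  # some other visitor
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if path not in latest or when >= latest[path][1]:
            latest[path] = (bot, when)
    return latest
```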
Since about July 4th, Googlebot/2.1 has only grabbed robots.txt. My site pages are actually being crawled by Mozilla/5.0, which fits your theory about pages crawled by that bot being in the supplemental index.
Here is something interesting, though. As I replied to oddsod earlier, I do have a 301 redirect from non-www to www, and it is functioning correctly on the site.
Oddly, when I do a site: search, 90% of the pages that Google shows in the cache are non-www pages. The cache of these non-www pages is also months old. Even the 10% of pages that Google shows with the www have very old data, and these pages are also in the supplemental index. I placed the redirect from non-www to www many months ago. I don't know the exact date, but I am guessing sometime between February and April.
My pages are static HTML and have no parameters. I don't have JavaScript on my site. I do, however, have some dynamic pages. These dynamic pages have a noindex, nofollow meta tag on them. I noticed before that Google would crawl these pages (despite the noindex, nofollow tag) and then display them as URL only. Perhaps it sent in the Mozilla/5.0 bot, if that is the one that spiders those pages.
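A robots meta tag can only be read after the page has been fetched, so seeing Google request those pages is expected; keeping the crawler from fetching a page at all is the job of robots.txt. To confirm which directives a page actually carries, a small parser like the following hypothetical helper, built on Python's standard `html.parser`, can list them. It checks `name="robots"` and `name="googlebot"` meta tags.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots|googlebot" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attrs = {k: (v or "") for k, v in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            for token in attrs.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```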
Okay, so what do I do now? How can I get out of the supplemental index? How can I even get Google to spider my www pages? Like I said, my redirect is functioning properly (I have checked it with the tool here at WW).
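Besides an online checker, you can test the redirect yourself with a single request that does not follow redirects, which is roughly what a crawler sees first. This is a sketch using Python's `http.client`; `check_redirect` is a hypothetical name and `example.com` a placeholder. A correct non-www to www setup should return status 301 with a `Location` header on the www host.

```python
import http.client
from urllib.parse import urlsplit

def check_redirect(url, timeout=10):
    """Request url once, without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        # http.client never follows redirects, so this is the first hop only.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `check_redirect("http://example.com/page.html")` should give `(301, "http://www.example.com/page.html")` if the redirect is working; a 200 on the non-www host would mean the redirect is not in effect.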
I have a page that is indexed normally for its current content: it shows in the SERPs when you search for that content, is cached every few days, and its snippet shows part of the current content. However, if you search for a word that was removed from the page over two years ago, the same page is still returned as a match, but this time it is flagged as a supplemental result. The cache shown is the same new, up-to-date one, so the word you searched for appears nowhere in it. The cache page even says "the following terms only appear in links pointing to this page...", which is untrue: the only link pointing to it is that Google SERP itself. This type of supplemental result is a pain, because it takes you to a page where that content used to be and shows a snippet with the information you are looking for, but then neither the real page nor the Google cache contains that information at all.
I have five friends whose sites all have the same issues. I got them all to contact Google help within the same week and ask the same questions in almost exactly the same words. When we asked about getting the old data deleted from Google's database, the responses ranged from "definitely yes" and "yes" through to "no" and "definitely no".
When I do a site: search, all my pages are supplemental. 90% of them are non-www pages, which have not been accessible on my server for months. The other 10% are www pages, but the cache of these pages is also months old.
What did your friends specifically ask Google? What were the responses? What action did they take and what were the results?
So far I have come up with the following as courses of action I might take:
1) Submit the site using "Add URL to Google", hoping that Google will start fresh.
2) Submit my site using Google sitemaps.
Any other ideas?
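For option 2, a sitemap is an XML file listing the URLs you want crawled. Here is a minimal sketch that builds one with Python's `xml.etree.ElementTree`, listing only the canonical www URLs. It uses the sitemaps.org 0.9 namespace; check the protocol documentation for the version your search engine expects. `build_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (bytes) with one <url><loc> entry per URL."""
    # Serialise the sitemap namespace as the default xmlns.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Listing only www URLs keeps the sitemap consistent with the 301 redirect instead of advertising both host variants.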
Redirect permanent /directoryxyz h**p://www.***.com/directoryabc
This seems to fix some problems.
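For anyone unfamiliar with it, `Redirect permanent` is an Apache mod_alias directive: it answers matching requests with a 301, and according to the Apache documentation any path beyond the matched URL-path is appended to the target. The sketch below models that prefix mapping in Python so you can check where old URLs will land. It assumes the match must end on a path-segment boundary and ignores query strings; `apply_redirect` is a hypothetical helper and `www.example.com` stands in for the elided domain.

```python
def apply_redirect(request_path, url_path, target):
    """Return the redirect target for request_path, or None if the rule does not match.

    Models a prefix rule like `Redirect permanent /old http://host/new`.
    Assumption: the prefix must end on a path-segment boundary, so
    /directoryxyz matches /directoryxyz/page.html but not /directoryxyzfoo.
    """
    boundary = url_path.rstrip("/") + "/"
    if request_path == url_path or request_path.startswith(boundary):
        # Append whatever follows the matched prefix to the target URL.
        return target + request_path[len(url_path):]
    return None
```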
I am using this to fix problems where Google is reporting ten times as many files as exist in the directory. I am also hoping it will fix the cases where everything has gone URL-only.
Anyone have thoughts on this?
Our site has been reporting 80,000 pages when there are only 20,000. Ever since we did the above, it has slowly been getting back to the correct page counts.
I too have a large site like yours. Google was reporting three to four times the actual number of pages as well. Now it is reporting a close-to-accurate count, but the pages are in the supplemental index. Most pages reported are non-www and show months-old data, despite a redirect from non-www to www being in place.
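One common source of inflated page counts is the same page being reachable under several host variants, such as `example.com` and `www.example.com`, each counted separately. The sketch below collapses those variants onto one canonical form and counts what remains. It is only one possible cause of the inflation described here, and `canonical_url` and `distinct_pages` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, canonical_host="www.example.com"):
    """Map host variants of a URL onto one form.

    Lower-cases scheme and host, folds the bare domain onto the www host
    (placeholder domain, an assumption), drops any port and fragment,
    and turns an empty path into "/".
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == bare:
        host = canonical_host
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

def distinct_pages(urls, canonical_host="www.example.com"):
    """Count URLs that remain distinct after canonicalisation."""
    return len({canonical_url(u, canonical_host) for u in urls})
```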
It seems to be working for you in terms of getting accurate page counts, but were your pages in the supplemental index, and did this help get them out of it?
Yahoo and MSN handle the 301 redirect really well. Google is a little slower at this.
Also, the page count goes down radically.
Not sure if this will really help. We lost most of our Google traffic after Feb. 2nd, so with Google there is not much left for us to lose. Only MSN and Yahoo provide traffic. I have been trying to get the site back into Google's good graces.
It is good to know that Yahoo and MSN handle the redirect well. I too have a feeling that links may help get you out of the supplemental index.
I am also going to put out press releases for my site when I release some new content, which should be coming soon. It is nice to know that this helped one of your sites.