|"Supplemental Result" on site:|
what does this mean?
Greetings and Gidday folks,
Been busy, how 'bout y'awl?
Just did a site:example.com for a client to check google status, and 3 "Supplemental Results" came up, plus a url only for the home page.
Interesting thing is the "Supplemental Results" are pages that have been deleted since June of last year, and when the link is clicked, it pulls the 404 error message, as should be so.
So what is a "Supplemental Result", why are defunct pages appearing and why OH why can't I get this darn site/url into the index proper?
I've posted a couple of times previoulsy about this problem, bounced a few ideas around with the fine folk here, but still no joy after nearly a YEAR!
(No, no spamdexing, no hidden keywords, no other stupid spammer crud), except I did go xml compliant. This site and one other xml code compliant site with the same non Google indexed problem appear in other SEs fine, but G seems to have spat the dummy.
Looking forward to responses.
[edited by: ciml at 3:56 pm (utc) on April 13, 2004]
[edit reason] Examplified domain. [/edit]
Supplemental results are 404 or redirected URLs. They come up when there aren't many results for the query you entered.
I'm still not sure why Google bothers showing 404 pages, since you can't actually look at the pages (especially if the site owner used NOARCHIVE).
I would add my "weird" experience with "Supplemental Results" and would appreciate it if the experts can answer this one too, along with the original question.
If I do site:mysite.com I see about 4 pages from my site, and my home page is NOT listed as a "Supplemental Result". If I take a phrase from my home page and do a phrase search (with quotes) on Google, I get only my home page as the result, and there it IS listed as a "Supplemental Result".
Googlebot has not visited for 3 weeks .... and my PR is still zero, while I have > 20 incoming links (PR2 - PR6) for almost 2 months now.
Does this mean my home page is really becoming a "Supplemental Result"?
I have thousands of supplemental results which were somehow generated when I had a problem with my HTTP headers ... G's cache of my page is totally empty and it cannot recognize the file type as html. But it continues to list the pages. If google returns to the page it will find a 301 to a new address with corrected headers. I wish I could do something to get google to check these again and follow the 301.
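For what it's worth, fixing that situation means the old address has to return a clean 301 and the new address has to return a 200 with a Content-Type the crawler recognises as HTML. A rough sketch of what each response should look like on the wire (the URLs and body are made up for illustration):

```python
# Sketch of the two raw HTTP responses involved in a corrected move:
# the old URL redirects, the new URL serves recognisable HTML.

def redirect_response(new_url):
    """Raw 301 response pointing the crawler at the new address."""
    return (
        "HTTP/1.0 301 Moved Permanently\r\n"
        f"Location: {new_url}\r\n"
        "Content-Type: text/html\r\n"
        "\r\n"
    )

def page_response(body):
    """Raw 200 response with a corrected Content-Type header, so the
    file type is recognised as HTML rather than 'Unrecognized'."""
    return (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html; charset=iso-8859-1\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )

old = redirect_response("http://www.example.com/new-page.html")
new = page_response("<html><body>Moved here.</body></html>")
print(old.splitlines()[0])   # HTTP/1.0 301 Moved Permanently
print(new.splitlines()[1])   # Content-Type: text/html; charset=iso-8859-1
```

If the Content-Type header on the destination page is missing or wrong, the "File Format: Unrecognized" listing shown below is exactly the symptom you'd expect.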
File Format: Unrecognized - View as HTML
Supplemental Result - Similar pages
Greetings again all
Thanks for the explanation on "Supplemental Pages".
Must admit, seems a bit redundant unless you want to check the page using G cache.
Still begs the question tho', is it our code or G's, or just "one of those things"?
Sigh, guess the next stop is a server switch to see if THAT works!
[edited by: jpalmer at 12:50 am (utc) on April 15, 2004]
Supplemental results aren't necessarily 404s. The overlap is high because many pages that haven't been crawled in a long time haven't been crawled precisely because they no longer exist.
|"Been busy, how 'bout y'awl?"|
Not too busy. Taxes, search engines. You know. :) Supplemental results are results that we can show for some more arcane queries that typically have fewer results. Danny Sullivan had a good write-up about them a little while ago: http://searchenginewatch.com/searchday/article.php/3071371
It is a different index, so some of the results can be a little older, but some of the supplemental results are pretty up-to-date too.
When a whole site seems to be relegated to only being 'supplemental', how does it clear itself so that it returns to the 'normal' index? That seems to be my problem.
GG said: "It is a different index"
If my home page is listed as a "Supplemental Result", that means the home page is now part of the "different index". What can I do to bring my home page back into the main index? Any suggestions?
Things that you can control, and which may or may not help (but do them anyway):
- Use an HTTP sniffer to look at the HTTP headers and make sure that they are all OK, that they return the correct (200) response code, and the last modified date of the page, and especially check what MIME type the pages are flagged as being.
- Run all of your site through the HTML validator to make sure the HTML code is without errors, to be absolutely sure that you aren't tripping the spiders in any way.
- Review all the meta tags. Cut out unnecessary stuff (I use meta Content-Type, meta Content-Language, <title>, meta description, meta keywords, and just a few others)
- Build more incoming links from quality sites of both diverse ownership and location.
- Run a link checker over your site to make sure that there are no gross navigation errors, dead links, and so on.
- Get a sitemap page up, linking to every page of the site in plain HTML code. Link to the sitemap page from the footer of every page of your site.
- Make sure that you aren't disallowing stuff that you want to be indexed by having a botched robots.txt file.
- Check what happens if you try to access your site as http://domain.com/ and then as http://www.domain.com/ instead.
Want any more?
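The first item on that list is easy to script yourself. A minimal sketch of what an HTTP sniffer is looking for, assuming you can capture the raw response text (the sample response here is made up):

```python
def inspect_response(raw):
    """Parse a raw HTTP response and pull out the three things worth
    checking: status code, Content-Type (MIME type), Last-Modified."""
    head, _, _ = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status = int(lines[0].split()[1])          # e.g. "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers.get("content-type"), headers.get("last-modified")

# A healthy page: 200 response, text/html, with a Last-Modified date.
good = ("HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=iso-8859-1\r\n"
        "Last-Modified: Tue, 13 Apr 2004 10:00:00 GMT\r\n"
        "\r\n<html>...</html>")

status, mime, modified = inspect_response(good)
print(status, mime)  # 200 text/html; charset=iso-8859-1
```

Anything other than a 200 status on a live page, or a MIME type other than text/html on an HTML page, is a red flag worth chasing.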
excellent list, always good to get a reminder about the housekeeping, but mostly been there done that, all (apparently) AOK.
I guess an HTTP sniffer is next, but really ... why G when other SEs seem to have no problem with it? Or is this finally the subject of discussion in the supporters forum at the moment?
The HTTP sniffer is just to check that you really are returning the correct response code and MIME type to the SE request. Don't forget that while most browsers request stuff using HTTP 1.1, most search engines ask for it using HTTP 1.0, and therein might be the problem if someone botched the server configuration in some way. It is worth checking.
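To illustrate the 1.0-versus-1.1 point (a sketch; example.com stands in for the real host): the two request styles differ on the wire, and a server configured with only one in mind can behave differently for the other.

```python
def build_request(path, host, version="1.0"):
    """Build a raw GET request. HTTP/1.1 requires a Host header; an
    old-style HTTP/1.0 client may omit it, which can trip up
    name-based virtual hosting if the server has no sensible
    default site configured."""
    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        lines.append(f"Host: {host}")   # mandatory in HTTP/1.1
    lines.append("User-Agent: header-check/0.1")
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_request("/", "www.example.com", "1.0"))
print(build_request("/", "www.example.com", "1.1"))
```

Sending both styles at your own server and comparing the responses is essentially what the sniffer check above amounts to.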
Gidday again g1smd,
Have downloaded a couple of freebie HTTP sniffers and, after the usual precautions, will install and give 'em a whirl.
I requested a while ago that our server host support folk thoroughly check DNS and IP addressing, which they did, and all is apparently OK, so this will be an interesting exercise for me.
But ... it still begs the question ... why were these two sites previously in the Google index just fine with validated loose 3.2 and 4.0 HTML, and is it just coincidence that I'd coded to the XML 1.0 standard and can't get 'em in the index now?
"Supplemental results are results that we can show for some more arcane queries that typically have fewer results."
With all due respect to GG, this sounds like a marketing reply to hide a deficiency. It looks like a separate supplemental index is necessary only if the main index has reached some kind of limit: an index address limit, a horsepower limit, or algorithm problems.
It has also been stated elsewhere by a Google person that Google has 4 billion documents and 3 billion web pages. Sounds like the difference is in the supplemental bucket.
About 25-30% of my pages have been relegated to the supplemental bucket. Can others tell me how many of your pages are in the supplemental bucket? Maybe with enough data, we can estimate how big the problem is.
renee, this effect is also known as "slow death" and some people, including me, guess that it's some kind of penalty.
I don't believe this is simply "some kind of penalty". Or at least it is an explanation that doesn't help any. The remaining pages of my site (25-30%) rank very well, as did most of the other pages before they were classified as supplemental. By the way, the pages are template driven, so the remaining as well as the supplemental pages are pretty much "of the same nature".
Another interesting aspect: Googlebot is aggressively crawling the site, including the supplemental pages. Some pages have been taken out of supplemental, but the total number of non-supplemental pages doesn't seem to increase much. Is it possible that Google is limiting total pages by domain, IP, or site family?
ThomasB, what percentage of your pages are classed as supplemental?
>> XML 1.0 standard <<
Umm, did you really mean XML or just XHTML instead? XML would be served with a different MIME type, and it might be one that Google does not recognise.
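That MIME-type distinction can be made concrete. A sketch of the usual serving conventions (the labels are mine; whether Google treated the XML types as ordinary web pages at the time is exactly the open question here):

```python
# Typical MIME types by document flavour. XHTML 1.0 may legitimately
# be served as text/html for browser compatibility; generic XML goes
# out with an XML type, which a crawler may not index as a web page.
MIME_BY_FLAVOUR = {
    "html4":          "text/html",
    "xhtml1-as-html": "text/html",             # XHTML served the safe way
    "xhtml1-as-xml":  "application/xhtml+xml", # strict XHTML serving
    "generic-xml":    "text/xml",
}

print(MIME_BY_FLAVOUR["xhtml1-as-html"])  # text/html
print(MIME_BY_FLAVOUR["generic-xml"])     # text/xml
```

So if the move to "XML 1.0" changed what the server sends in the Content-Type header, that alone could explain one engine coping and another not.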
Not sure why you had to download an HTTP sniffer. There are several online ones. You type in the web address, and they return the result immediately.