Been busy, how 'bout y'awl?
Just did a site:example.com search for a client to check their Google status, and 3 "Supplemental Results" came up, plus a URL-only listing for the home page.
The interesting thing is that the "Supplemental Results" are pages that were deleted back in June of last year, and clicking the links brings up the 404 error page, as it should.
So what is a "Supplemental Result", why are defunct pages appearing and why OH why can't I get this darn site/url into the index proper?
I've posted a couple of times previously about this problem and bounced a few ideas around with the fine folk here, but still no joy after nearly a YEAR!
(No, no spamdexing, no hidden keywords, no other stupid spammer crud.) The only change is that I went XML compliant. This site and one other XML-compliant site with the same "not in Google's index" problem appear fine in other SEs, but G seems to have spat the dummy.
Looking forward to responses.
JP
[edited by: ciml at 3:56 pm (utc) on April 13, 2004]
[edit reason] Examplified domain. [/edit]
If I do site:mysite.com, I see about 4 pages from my site, and my home page is NOT listed as a "Supplemental Result". If I take a phrase from my home page and do a phrase search (with quotes) on Google, the only result is my home page, and there it IS listed as a "Supplemental Result".
Googlebot has not visited for 3 weeks, and my PR is still zero, even though I have had more than 20 incoming links (PR2 to PR6) for almost 2 months now.
Does this mean my home page is really becoming a "Supplemental Result"?
Example:
www.site.com/location/abc.pl
File Format: Unrecognized - View as HTML
Supplemental Result - Similar pages
Thanks for the explanation on "Supplemental Pages".
Must admit, it seems a bit redundant unless you want to check the page using G's cache.
Still begs the question tho': is it our code or G's, or just "one of those things"?
Sigh, guess the next stop is a server switch to see if THAT works!
Hooroo
JP
[edited by: jpalmer at 12:50 am (utc) on April 15, 2004]
Not too busy. Taxes, search engines. You know. :) Supplemental results are results that we can show for some more arcane queries that typically have fewer results. Danny Sullivan had a good write-up about them a little while ago: http://searchenginewatch.com/searchday/article.php/3071371
It is a different index, so some of the results can be a little older, but some of the supplemental results are pretty up-to-date too.
[webmasterworld.com...]
- Use an HTTP sniffer to look at the HTTP headers and make sure they are all OK: that they return the correct response code (200), that they give the last-modified date of the page, and especially what MIME type the pages are flagged as.
- Run all of your site through an HTML validator to make sure the HTML code is error-free, so you can be absolutely sure that you aren't tripping up the spiders in any way.
- Review all the meta tags. Cut out unnecessary stuff (I use meta Content-Type, meta Content-Language, <title>, meta description, meta keywords, and just a few others).
- Build more incoming links from quality sites of both diverse ownership and location.
- Run a link checker over your site to make sure that there are no gross navigation errors, dead links, and so on.
- Get a sitemap page up, linking to every page of the site in plain HTML code. Link to the sitemap page from the footer of every page of your site.
- Make sure that you aren't disallowing stuff that you want indexed by having a botched robots.txt file.
- Check what happens if you try to access your site as http://domain.com/ and then as http://www.domain.com/ instead.
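To check the robots.txt point without waiting on the bot, Python's standard `urllib.robotparser` module can evaluate a robots.txt file against a list of URLs. This is a minimal sketch: the file contents, URLs and the `blocked_for` helper are hypothetical placeholders, and the parser only approximates how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; paste your own file's lines here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /location/
"""


def blocked_for(agent, urls, robots_txt):
    """Return the URLs that robots_txt would stop `agent` from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical page list; use the pages you actually want indexed.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/location/abc.pl",
    ]
    for url in blocked_for("Googlebot", pages, ROBOTS_TXT):
        print("Disallowed for Googlebot:", url)
```

Any URL it prints is one you are telling the spiders to stay away from.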
Want any more?
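The sitemap item in the checklist is easy to automate. Below is a minimal Python sketch that writes a plain-HTML page with one ordinary link per page; the `sitemap_html` name and the page list are hypothetical placeholders (you would build the list from your own file tree or CMS), and linking the result from every page's footer is still up to you.

```python
from html import escape


def sitemap_html(pages, title="Site map"):
    """Return a plain-HTML sitemap page with one ordinary link per page.

    `pages` is a list of (url, link_text) pairs covering every page of the site.
    """
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(text))
        for url, text in pages
    )
    return (
        "<html>\n<head><title>%s</title></head>\n<body>\n"
        "<h1>%s</h1>\n<ul>\n%s\n</ul>\n</body>\n</html>\n"
    ) % (escape(title), escape(title), items)


if __name__ == "__main__":
    # Hypothetical page list for illustration only.
    print(sitemap_html([("/", "Home"), ("/about.html", "About us")]))
```

Keeping it to plain `<a href>` links, with no scripts or image maps, is the point: any spider can follow them.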
Excellent list, always good to get a reminder about the housekeeping, but mostly been there, done that, all (apparently) AOK.
I guess an HTTP sniffer is next, but really ... why G, when other SEs seem to have no problem with it? Or is this finally the subject of discussion in the supporters forum at the moment?
Cheers
JP
Have downloaded a couple of freebie HTTP sniffers and, after the usual precautions, will install them and give 'em a whirl.
A while ago I asked our server host's support folk to thoroughly check DNS and IP addressing, which they did, and all was apparently OK, so this will be an interesting exercise for me.
But ... still begs the question ... why only these two sites, which were previously in the Google index just fine with validated loose 3.2 and 4.0 HTML? And is it just coincidence that I'd coded them to the XML 1.0 standard and can't get 'em into the index now?
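For a do-it-yourself sniffer, Python's standard `http.client` can send a HEAD request and show exactly which headers a page returns, and a small check function can flag the items from the checklist earlier in the thread: a status other than 200, a missing or unparseable Last-Modified date, and a Content-Type other than `text/html`. This is a sketch under assumptions: the function names and User-Agent string are hypothetical, treating anything but `text/html` as worth a second look is a judgment call for the spider worries discussed here rather than a Google rule, and some servers answer HEAD differently from GET.

```python
import http.client
import sys
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return (status, {lower-cased header name: value})."""
    parts = urlsplit(url)
    # Only plain http and https are handled in this sketch.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # The User-Agent string is a hypothetical placeholder.
        conn.request("HEAD", path, headers={"User-Agent": "header-check/0.1"})
        resp = conn.getresponse()
        return resp.status, {name.lower(): value for name, value in resp.getheaders()}
    finally:
        conn.close()


def header_problems(status, headers):
    """Flag the checklist items: non-200 status, Last-Modified, Content-Type."""
    problems = []
    if status != 200:
        problems.append("status is %d, not 200" % status)
    last_modified = headers.get("last-modified")
    if last_modified is None:
        problems.append("no Last-Modified header")
    else:
        try:
            parsedate_to_datetime(last_modified)
        except (TypeError, ValueError):
            # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
            problems.append("unparseable Last-Modified: %r" % last_modified)
    content_type = headers.get("content-type", "")
    # Assumption: anything other than text/html is worth a second look here.
    if content_type.split(";", 1)[0].strip().lower() != "text/html":
        problems.append("Content-Type is %r, not text/html" % content_type)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    status, headers = fetch_headers(sys.argv[1])
    print("Status:", status)
    for name, value in sorted(headers.items()):
        print("%s: %s" % (name, value))
    for problem in header_problems(status, headers):
        print("CHECK:", problem)
```

Run it with the page URL as the only argument; lines starting with `CHECK:` are the ones worth chasing up.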
Hooroo
JP
With all due respect to GG, this sounds like a marketing reply to hide a deficiency. It looks like a separate supplemental index would be necessary only if the main index had reached some kind of limit: an index address limit, a horsepower limit, or algorithm problems.
It has also been stated elsewhere by a Google person that Google has 4 billion documents and 3 billion web pages. Sounds like the difference is in the supplemental bucket.
About 25-30% of my pages have been relegated to the supplemental bucket. Can others tell me how many of their pages are in the supplemental bucket? Maybe with enough data we can estimate how big the problem is.
I don't believe this is simply "some kind of penalty", or at least that's an explanation that doesn't help any. The remaining pages of my site (25-30%) rank very well, as did most of the other pages before they were classified as supplemental. By the way, the pages are template-driven, so the remaining pages and the supplemental ones are pretty much "of the same nature".
Another interesting aspect: Googlebot is aggressively crawling the site, including the supplemental pages. Some pages have been taken out of supplemental, but the total number of non-supplemental pages doesn't seem to increase much. Is it possible that Google is limiting the total number of pages per domain, IP, or site family?
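If others do report their numbers, the tally is simple arithmetic. A minimal sketch, assuming you paste the result snippets from a site: query into a list and count any snippet containing the label "Supplemental Result" (as in the listing quoted earlier in the thread) as supplemental; the `supplemental_share` name and the sample snippets are hypothetical.

```python
def supplemental_share(result_snippets):
    """Return (supplemental, total, percent) for a list of site: result snippets.

    A snippet counts as supplemental if it contains the label "Supplemental Result".
    """
    total = len(result_snippets)
    supplemental = sum(1 for s in result_snippets if "Supplemental Result" in s)
    percent = 100.0 * supplemental / total if total else 0.0
    return supplemental, total, percent


if __name__ == "__main__":
    # Hypothetical snippets copied from a results page.
    snippets = [
        "www.site.com/location/abc.pl File Format: Unrecognized - Supplemental Result",
        "www.site.com/ - Cached - Similar pages",
        "www.site.com/a.html - Supplemental Result - Similar pages",
        "www.site.com/b.html - Cached - Similar pages",
    ]
    print("%d of %d results supplemental (%.1f%%)" % supplemental_share(snippets))
```

For comparison, the 4 billion documents versus 3 billion web pages quoted above differ by 1 billion, which is 25% of 4 billion; whether that gap really is the supplemental bucket remains the poster's speculation.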
ThomasB, what percentage of your pages are classed as supplemental?
Umm, did you really mean XML or just XHTML instead? XML would be served with a different MIME type, and it might be one that Google does not recognise.
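The MIME type question can be checked mechanically once you have the Content-Type header from a sniffer. XHTML 1.0 pages may be served as `text/html` (the XHTML 1.0 spec's Appendix C compatibility guidelines cover this), whereas `application/xhtml+xml`, `application/xml` or `text/xml` tell a client it is getting XML. The Python sketch below just sorts a header value into those buckets; the function and bucket names are made up for illustration, and it says nothing about which types Google actually indexes.

```python
def classify_content_type(header_value):
    """Sort a Content-Type header value into "html", "xhtml", "xml" or "other".

    Parameters such as "; charset=..." are ignored.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type == "application/xhtml+xml":
        return "xhtml"
    # Generic XML types, plus any other "+xml" type (e.g. application/rss+xml).
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    for value in ("text/html; charset=ISO-8859-1", "application/xhtml+xml", "text/xml"):
        print(value, "->", classify_content_type(value))
```

Anything that comes back other than "html" is the case this post is worried about.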
Not sure why you had to download an HTTP sniffer. There are several online ones: you type in the web address, and they return the result immediately.