Hi guys,
1. Supplemental pages also get cached; I see this.
2. Why does a page get marked as supplemental?
3. How are supplemental pages treated during a keyword search by a web user using Google search?
4. Do supplemental pages get re-spidered?
5. Are changes to supplemental pages re-cached?
6. Does a page stop being supplemental as soon as Google's system has a fresh spider cache to re-evaluate the page, e.g. directory pages initially light on content but now filled with human-edited entries?
7. Are the anecdotal stories of 1-year supplemental status true? :-)
Cheers
[edited by: tedster at 10:09 am (utc) on Dec. 22, 2006]
1. Supplemental pages also get cached.
That is clearly so. Let's clarify this a bit -- it is not a "page" that is marked Supplemental, because there is no technical definition for the word "page". It is a url that is given Supplemental status -- and even more precisely, it is a particular cache DATE for that url. A different cache date of the same url may still appear in the regular search results.
2. Why do urls get marked as Supplemental Results?
GoogleGuy's original stated purpose for their Supplemental Index [webmasterworld.com] is "to augment the results for obscure queries." So what does this mean in practice? Here is a summary from g1smd -- factors that may result in a url being placed in the Supplemental Index:
For a page that goes 404 or whose domain expires, Google keeps a copy of the very last version of the page that they saw, as a Supplemental Result, and shows it in the index when the number of other pages returned is low. The cached copy will be quite old.
For a normal site, the current version of the page should be in the normal index, and the previous version of the page is held in the supplemental index.
If you use search terms that match the current content, then you see that current content in the title and snippet, in the cache, and on the live page.
If you search for terms that were only on the old version of the page, then you see those old search terms in the title and snippet, even though they are not in the cache, nor found on the live page. That result will be marked as supplemental.
There are also supplemental results where the result is for duplicate content [webmasterworld.com] [the link goes to a detailed discussion] of whatever Google considers to be the "main" site. These results seemingly hang around forever, with an old cache, a cache that often no longer reflects what is really on the page right now. Usually there is no "normal" result for that duplicate URL - just the old supplemental, based on the old data. On the other hand, the "main" URL will usually have both a normal result and a supplemental result (but not always).
If you have multiple URLs leading to the same content, "duplicate content", some of the URLs will appear as normal results and some will appear as Supplemental Results. The Supplemental Results will hang around for a long time, even if the page is edited or is deleted. Google might filter out some of the duplicates, removing them from their index: in that case what is left might just be a URL that is a Supplemental Result.
The fix for this is to make sure that every page has only one URL that can access it; make sure that any alternatives cannot be indexed. Run Xenu LinkSleuth over the site and make sure that you fix every problem found. Additionally do make sure that you have a site-wide 301 redirect from non-www to www as that is another form of duplicate content waiting to cause you trouble.
Also, make sure that every page has a unique title tag and a unique meta description, as failing to do so is another problem that can hurt a site.
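To make that fix concrete, here is a rough Python sketch (all URLs are made up, and the function is not any official tool): given the status code and Location header you observe for each alternative URL, it flags the ones that still answer "200 OK" instead of 301ing to the one canonical URL -- those are the duplicate-content candidates.

```python
# Hypothetical sketch: flag alternative URLs that do not 301 to the canonical.
# "observed" fakes the crawl results; in practice you'd fill it from a
# server-header checker.

def find_duplicate_risks(observed, canonical):
    """Return alternative URLs that do not 301 to the canonical URL."""
    risks = []
    for url, (status, location) in observed.items():
        if url == canonical:
            continue
        if status != 301 or location != canonical:
            risks.append(url)
    return sorted(risks)

observed = {
    "http://www.example.com/page.html": (301, "http://example.com/page.html"),
    "http://example.com/page.html":     (200, None),   # the canonical itself
    "http://example.com/PAGE.html":     (200, None),   # still indexable!
}

print(find_duplicate_risks(observed, "http://example.com/page.html"))
# -> ['http://example.com/PAGE.html']
```

Any URL that shows up in that list is an extra door into the same content and needs a 301, a 404, or a meta robots noindex.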
3. How are supplemental urls treated during a keyword search by a web user using Google search?
They may still be returned for searches that are relatively narrow in focus or use obscure words.
4. Do supplemental urls get re-spidered?
Yes, but on a less frequent schedule than urls that are in the regular index.
5. Are changes to supplemental urls re-cached?
Keeping in mind what I mentioned above - that a Supplemental Result is a "url+cache date" - the answer is yes.
6. Does a url stop being supplemental as soon as Google's system has a fresh spider cache to re-evaluate the page?
If the original reason for the Supplemental status is no longer there, yes. The key is to stay focused on whether a relatively current version of the url is appearing in the regular search results. Some older cache date, or some alternate url that pointed to the same content, may still show in the [site:] results. If a url is getting ranked as a regular search result, then there is no reason for concern if a Supplemental version is also present, as long as the root cause for the Supplemental status is now removed.
7. Are the anecdotal stories of 1 year supplemental status true?
Yes, and even longer. Even if a newer version of a url is in the regular index, the older Supplemental version can hang around for a year or more. In fact, even when it's no longer visible, my suspicion is that Google never completely throws that data away.
--------------------------
Late Addition: Here's a quote attributed to Matt Cutts:
...having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We've got your home page in the main index, but if you look at your site ... you'll see not a ton of links ... So I think your site is fine ... it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index.
[edited by: tedster at 3:14 am (utc) on Oct. 13, 2006]
Why do urls get marked as Supplemental Results?
But of course it doesn't seem to happen in a short period of time.
Would you now save the page as a new URL and then 301 the old page to the new one and update your internal links to the new page?
Or do you just have to wait for an update whenever and hope you will be relisted?
Also, I have pages that I know are not duplicate content and are unique with plenty of content; what is the most likely reason for these to be supplemental?
One final question: I have 301'd all of my urls to the same page:
mysite.co.uk/, mysite.co.uk/index.htm, mysite.co.uk/index.html etc.
I have had my first cache date, but these urls are still in the index. Does it take more than one update, or should it have been corrected the first time?
Thanks
Mark
Are others finding that poor linking structure leads to low PageRank, which leads to being crawled once in a blue moon, which results in the page becoming supplemental?
To add the cherry to the sundae, I decided to delete my old sitemap from Google and was unable to add a new one, as I had checked the "non-www" preference button for my www.mysite.com account. I had to add a new non-www site and upload a new clean final sitemap, which Google grabbed immediately.
In the time since Google updated last week, my number of indexed pages soared and my traffic tripled. The traffic remains high, and now my pagerank on my new site has appeared as a 4. However, when I type a site:mysite search I now get both www and non-www listings for my site as the first and second results: a clear sign that Google is seeing two separate sites with duplicate content. And today almost every page in the index has gone supplemental.
So, where exactly was the misstep? I have corrected a lot of the broken urls from the new site and have 301 redirected any high-ranking pages from the old site to the new one. The only problem I can see is that since I have both the canonicalizing www -> non-www redirect AND some url rewrites, some of the URLs are being double 301 redirected. Also, previously my domain used the www, and I changed to non-www about two weeks ago, at which time I added the canonical 301 code. Server header tests show proper redirects and 200 OK for the non-www version. Google has become the bane of my existence.
There is no way that they are unique. Every forum that I have ever looked at, exposes at least four, often as many as twelve or more, URLs for every piece of content on the site.
In addition, the bot will see tens or hundreds of thousands of URLs that just return the message "Error: you are not logged in" for URLs that logged-in users would use to start a new thread, reply to a thread, send a private message, and so on.
Check the threads at WebmasterWorld from just a few months ago talking about vBulletin, for example, for more details. [webmasterworld.com...]
Additionally do make sure that you have a site-wide 301 redirect from non-www to www as that is another form of duplicate content waiting to cause you trouble.
=====================
RewriteEngine On
# Redirect the bare domain to the www host (domainname.com is a placeholder)
RewriteCond %{HTTP_HOST} ^domainname\.com$ [NC]
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]
=====================
- URLs that used to show content but are now redirecting. These are dropped after one year.
- URLs that used to show content but are now 404. These are dropped after one year.
- URLs that are Duplicate Content and still return that content as "200 OK". These are the only ones that need any more fixing.
- URLs that are relegated there due to very low PageRank for the site as a whole. They are not "good enough" for the main index. This accounts for perhaps 1% of the Supplemental Results that I see.
[webmasterworld.com...]
You can see the HTTP response codes by switching your web browser to Mozilla, Firefox, or SeaMonkey and then installing the Live HTTP Headers extension.
Alternatively, get WebBug, but do make sure that you always test using the HTTP/1.1 setting.
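If you'd rather script it, here is a small Python sketch along the same lines; the standard library's http.client speaks HTTP/1.1 by default, which matches the "always test with HTTP/1.1" advice. The fetch_status helper and the example.com host are illustrative only -- point it at your own site.

```python
# Minimal server-header check using only the standard library.
import http.client

def parse_status_line(line):
    """Split a raw status line like 'HTTP/1.1 301 Moved Permanently'."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def fetch_status(host, path="/"):
    """Return (status code, Location header or None) for one HEAD request."""
    conn = http.client.HTTPConnection(host, timeout=10)   # HTTP/1.1 by default
    conn.request("HEAD", path)
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

print(parse_status_line("HTTP/1.1 301 Moved Permanently"))
# e.g. fetch_status("example.com", "/index.html") on a live site
```

For checking Supplemental Result causes, the interesting values are 200 (duplicate still being served), 301, and 404, as in the list above.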
[edited by: g1smd at 9:12 pm (utc) on Oct. 4, 2006]
A better version in:
[webmasterworld.com...]
[webmasterworld.com...]
Home -> Forums Index -> Tools
Choose option: Server Header Checker
There were quite a number of variations of the 301 format in that thread. The final one that jdMorgan posted is:
==========================
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?
RewriteRule ^(([^/]*/)*)index\.html?$ http://example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
==========================
Could someone who has expertise in this stuff confirm that version to be the right one? Since .htaccess is so powerful, we do not want to make a mistake. Thanks....
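One way to sanity-check what those two rules are meant to do, before trusting your live .htaccess to them, is to simulate the intended mapping. This Python sketch models only the intended behaviour (index-file stripping plus www removal, using the example.com placeholder from the rules), not mod_rewrite itself:

```python
# Simulation of the *intended* effect of the two jdMorgan rules above.
import re

def expected_redirect(url):
    """Return the URL the rules should 301 to, or None if no redirect."""
    m = re.match(r"^http://(www\.)?example\.com(/.*)?$", url)
    if not m:
        return None
    path = m.group(2) or "/"
    # Rule 1: strip a trailing index.htm / index.html from any directory.
    new_path = re.sub(r"(?:^|(?<=/))index\.html?$", "", path)
    # Rule 2: strip the www. prefix.
    target = "http://example.com" + new_path
    return target if target != url else None

print(expected_redirect("http://www.example.com/index.html"))
# -> http://example.com/
print(expected_redirect("http://example.com/sub/index.htm"))
# -> http://example.com/sub/
print(expected_redirect("http://example.com/page.html"))
# -> None (already canonical, no redirect)
```

If your own site's expected mapping differs from this (say, you prefer the www form), the rules need adjusting before you deploy them.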
..............................
Despite it seeming they are, pages are not the same as URLs. The same URL can have both supplemental and non-supplemental pages indexed. Getting a page indexed normally does nothing to get rid of the supplemental; you just can't see it unless you search for words on the supplemental page that are not on the normal page.
Supplementals have nothing to do with crawl frequency, other than that infrequently crawled URLs become supplementals more often and more easily. A URL crawled every day can still have a supplemental page associated with it.
Even if a url is getting ranked as a regular search result, you should be greatly concerned if a supplemental version also exists, even if the cause of the supplemental status has been fixed. Supplemental pages of a URL can become dominant over a normally indexed page for the same URL, and having a hidden supplemental basically always hurts the ranking of a healthy page.
Hidden supplementals are like sweeping excrement under a rug. It's still there. It's still bad. It's not fixed. It will cause stinky problems until it is completely removed by Google, something that could take up to a couple of years.
This can be useful: site:www.domain.com -inurl:www
There is more than one type of Supplemental Result. You need to look at the HTTP response code for each one too.
[webmasterworld.com...]
[webmasterworld.com...]
This should also help too:
[webmasterworld.com...]
See the second of three blocks of text here (it begins "Supplemental Results are..."): [webmasterworld.com...]
If the original reason for the Supplemental status is no longer there, yes.
During the most recent supplemental index update, I noticed there was a delay between global supplemental cache refresh and re-evaluation.
Refreshing the cache is straightforward: pull a url from a database, recrawl it, and update the database.
Evaluating those pages, measuring trust, checking for duplicates, etc. involves more work, especially when you're dealing with a site that has hundreds of thousands of urls and comparing those pages with the rest of the web.
Until a site's supplemental caches are entirely refreshed, how do you identify dupes? I don't think you can.
I noticed pages I fixed show up with fresh cache in the supplemental index right after the recent cache refresh, and I couldn't understand what went wrong. But after a week or two, some of those pages started moving over into the main index.
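That refresh-then-evaluate idea can be sketched in miniature. This is pure speculation about Google's pipeline, of course, and the "crawl" is faked with a dict of made-up URLs; the point is only that duplicate grouping needs the whole refreshed set before a canonical URL can be chosen for each content group.

```python
# Two-phase sketch: phase 1 "recrawls" everything; phase 2 groups by
# content hash and keeps one canonical URL per group.
from hashlib import md5

live_content = {                       # phase 1: refreshed cache, faked
    "http://example.com/a":     "widgets page",
    "http://example.com/a?s=1": "widgets page",   # duplicate content
    "http://example.com/b":     "gadgets page",
}

def evaluate(cache):
    """Phase 2: group URLs by content hash; first URL in each group wins."""
    groups = {}
    for url in sorted(cache):
        key = md5(cache[url].encode()).hexdigest()
        groups.setdefault(key, []).append(url)
    canonical = {g[0] for g in groups.values()}
    supplemental = {u for g in groups.values() for u in g[1:]}
    return canonical, supplemental

canonical, supplemental = evaluate(live_content)
print(sorted(supplemental))
# -> ['http://example.com/a?s=1']
```

Until every entry in live_content has been refreshed, evaluate() would be comparing a mix of old and new copies, which fits the observed delay between the cache refresh and pages moving to the main index.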
These are what I call the "historical supplemental results" where if you search for a word newly added to the page you will see that URL as a normal result, but if you search for a word from the old version of the page it will continue to show up as a Supplemental Result - often for a very long time (like a year).
Everything is fine as long as all other alternative URLs for that content are now returning either 301 or 404, or contain a meta robots noindex tag. Those alternative URLs would be non-www vs. www, multiple domains, alternative URL parameters, capitalisation issues (IIS), http vs. https, etc.
The URLs that you no longer want indexed will show up only as Supplemental for a very long while, and the URL that you do want to show up, will show in the main index but will also contain the "historical" component that remains supplemental too.
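For illustration, here is a small Python sketch showing how those alternative forms (www vs. non-www, host capitalisation, http vs. https) collapse to a single canonical URL. Which form you pick as canonical is your choice; this one arbitrarily prefers non-www http, and all URLs are made up.

```python
# Illustrative canonicaliser for the alternative URL forms listed above.
from urllib.parse import urlsplit

def canonical_form(url):
    """Collapse host case, www prefix, and scheme to one canonical URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""        # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return "http://" + host + (parts.path or "/")

variants = [
    "http://www.example.com/page",
    "https://example.com/page",
    "http://EXAMPLE.com/page",
]
print({canonical_form(u) for u in variants})
# -> {'http://example.com/page'}  (one canonical URL for all three)
```

Every variant that is not the canonical form should return a 301 to it (or a 404, or carry a meta robots noindex), exactly as described above.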