Hi guys,
1. Supplemental pages also get cached; I see this.
2. Why does a page get marked as supplemental?
3. How are supplemental pages treated during a keyword search by a web user using Google search?
4. Do supplemental pages get re-spidered?
5. Are changes to supplemental pages re-cached?
6. Does a page stop being supplemental as soon as Google's system has a fresh spider cache to re-evaluate the page, e.g. directory pages initially light on content but now filled with human-edited entries?
7. Are the anecdotal stories of 1-year supplemental status true? :-)
Cheers
[edited by: tedster at 10:09 am (utc) on Dec. 22, 2006]
1. Supplemental pages also get cached.
That is clearly so. Let's clarify this a bit -- it is not a "page" that is marked Supplemental, because there is no technical definition for the word "page". It is a url that is given Supplemental status -- and even more precisely, it is a particular cache DATE for that url. A different cache date of the same url may still appear in the regular search results.
2. Why do urls get marked as Supplemental Results?
GoogleGuy's original stated purpose for their Supplemental Index [webmasterworld.com] is "to augment the results for obscure queries." So what does this mean in practice? Here is a summary from g1smd -- factors that may result in a url being placed in the Supplemental Index:
For a page that goes 404 or whose domain expires, Google keeps a copy of the very last version of the page that they saw, as a Supplemental Result, and shows it in the index when the number of other pages returned is low. The cached copy will be quite old.
For a normal site, the current version of the page should be in the normal index, and the previous version of the page is held in the supplemental index.
If you use search terms that match the current content, then you see that current content in the title and snippet, in the cache, and on the live page.
If you search for terms that were only on the old version of the page, then you see those old search terms in the title and snippet, even though they are not in the cache, nor found on the live page. That result will be marked as supplemental.
There are also supplemental results where the result is for duplicate content [webmasterworld.com] [the link goes to a detailed discussion] of whatever Google considers to be the "main" site. These results seemingly hang around forever, with an old cache, a cache that often no longer reflects what is really on the page right now. Usually there is no "normal" result for that duplicate URL - just the old supplemental, based on the old data. On the other hand, the "main" URL will usually have both a normal result and a supplemental result (but not always).
If you have multiple URLs leading to the same content, "duplicate content", some of the URLs will appear as normal results and some will appear as Supplemental Results. The Supplemental Results will hang around for a long time, even if the page is edited or is deleted. Google might filter out some of the duplicates, removing them from their index: in that case what is left might just be a URL that is a Supplemental Result.
The fix for this is to make sure that every page has only one URL that can access it; make sure that any alternatives cannot be indexed. Run Xenu LinkSleuth over the site and make sure that you fix every problem found. Additionally do make sure that you have a site-wide 301 redirect from non-www to www as that is another form of duplicate content waiting to cause you trouble.
Also, make sure that every page has a unique title tag and a unique meta description, as failing to do so is another problem that can hurt a site.
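To make that fix concrete, here is a rough Python sketch (all URLs are made up, and the function is not any official tool): given the status code and Location header you observe for each alternative URL, it flags the ones that still answer "200 OK" instead of 301ing to the one canonical URL -- those are the duplicate-content candidates.

```python
# Hypothetical sketch: flag alternative URLs that do not 301 to the canonical.
# "observed" fakes the crawl results; in practice you'd fill it from a
# server-header checker.

def find_duplicate_risks(observed, canonical):
    """Return alternative URLs that do not 301 to the canonical URL."""
    risks = []
    for url, (status, location) in observed.items():
        if url == canonical:
            continue
        if status != 301 or location != canonical:
            risks.append(url)
    return sorted(risks)

observed = {
    "http://www.example.com/page.html": (301, "http://example.com/page.html"),
    "http://example.com/page.html":     (200, None),   # the canonical itself
    "http://example.com/PAGE.html":     (200, None),   # still indexable!
}

print(find_duplicate_risks(observed, "http://example.com/page.html"))
# -> ['http://example.com/PAGE.html']
```

Any URL that shows up in that list is an extra door into the same content and needs a 301, a 404, or a meta robots noindex.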
3. How are supplemental urls treated during a keyword search by a web user using Google search?
They may still be returned for searches that are relatively narrow in focus or use obscure words.
4. Do supplemental urls get re-spidered?
Yes, but on a less frequent schedule than urls that are in the regular index.
5. Are changes to supplemental urls re-cached?
Keeping in mind what I mentioned above - that a Supplemental Result is a "url+cache date" - the answer is yes.
6. Does a url stop being supplemental as soon as Google's system has a fresh spider cache to re-evaluate the page?
If the original reason for the Supplemental status is no longer there, yes. The key is to stay focused on whether a relatively current version of the url is appearing in the regular search results. Some older cache date, or some alternate url that pointed to the same content, may still show in the [site:] results. If a url is getting ranked as a regular search result, then there is no reason for concern if a Supplemental version is also present, as long as the root cause for the Supplemental status is now removed.
7. Are the anecdotal stories of 1 year supplemental status true?
Yes, and even longer. Even if a newer version of a url is in the regular index, the older Supplemental version can hang around for a year or more. In fact, even when it's no longer visible, my suspicion is that Google never completely throws that data away.
--------------------------
Late Addition: Here's a quote attributed to Matt Cutts:
...having supplemental results these days is not such a bad thing. In your case, I think it just reflects a lack of PageRank/links. We've got your home page in the main index, but if you look at your site ... you'll see not a ton of links ... So I think your site is fine ... it's just a matter of we have to select a smaller number of documents for the web index. If more people were linking to your site, for example, I'd expect more of your pages to be in the main web index.
[edited by: tedster at 3:14 am (utc) on Oct. 13, 2006]
Why do urls get marked as Supplemental Results?
But of course it doesn't seem to happen in a short period of time.
Would you now save the page as a new URL and then 301 the old page to the new one and update your internal links to the new page?
Or do you just have to wait for an update whenever and hope you will be relisted?
Also, I have pages that I know are not duplicate content and are unique with plenty of content; what is the most likely reason for these to be supplemental?
One final question: I have 301'd all of my urls to the same page:
mysite.co.uk/, mysite.co.uk/index.htm, mysite.co.uk/index.html etc.
I have had my first cache date, but these urls are still in the index. Does it take more than one update, or should it have been corrected the first time?
Thanks
Mark
Are others finding that poor linking structure leads to low PageRank, which leads to being crawled once in a blue moon, which results in the page becoming supplemental?
To add the cherry to the sundae, I decided to delete my old sitemap from Google and was unable to add a new one, as I had checked the "non-www" preference button for my www.mysite.com account. I had to add a new non-www site and upload a new clean final sitemap, which Google grabbed immediately.
In the time since Google updated last week, my number of indexed pages soared and my traffic tripled. The traffic remains high, and now my pagerank on my new site has appeared as a 4. However, when I type a site:mysite search I now get both www and non-www listings for my site as the first and second results: a clear sign that Google is seeing two separate sites with duplicate content. And today almost every page in the index has gone supplemental.
So, where exactly was the misstep? I have corrected a lot of the broken urls from the new site and have 301 redirected any high-ranking pages from the old site to the new one. The only problem I can see is that since I have both the canonicalizing www -> non-www redirect AND some url rewrites, some of the URLs are being double 301 redirected. Also, previously my domain used the www, and I changed to non-www about two weeks ago, at which time I added the canonical 301 code. Server header tests show proper redirects and 200 OK for the non-www version. Google has become the bane of my existence.
There is no way that they are unique. Every forum that I have ever looked at, exposes at least four, often as many as twelve or more, URLs for every piece of content on the site.
In addition, the bot will see tens or hundreds of thousands of URLs that just return the message "Error: you are not logged in" for URLs that logged-in users would use to start a new thread, reply to a thread, send a private message, and so on.
Check the threads at WebmasterWorld from just a few months ago talking about vBulletin, for example, for more details. [webmasterworld.com...]
Additionally do make sure that you have a site-wide 301 redirect from non-www to www as that is another form of duplicate content waiting to cause you trouble.
=====================
RewriteEngine On
# Redirect the bare domain to the www host (domainname.com is a placeholder)
RewriteCond %{HTTP_HOST} ^domainname\.com$ [NC]
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]
=====================
- URLs that used to show content but are now redirecting. These are dropped after one year.
- URLs that used to show content but are now 404. These are dropped after one year.
- URLs that are Duplicate Content and still return that content as "200 OK". These are the only ones that need any more fixing.
- URLs that are relegated there due to very low PageRank for the site as a whole. They are not "good enough" for the main index. This accounts for perhaps 1% of the Supplemental Results that I see.
[webmasterworld.com...]
You can see the HTTP response codes by switching your web browser to Mozilla, Firefox, or SeaMonkey and then installing the Live HTTP Headers extension.
Alternatively, get WebBug, but do make sure that you always test using the HTTP/1.1 setting.
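If you'd rather script it, here is a small Python sketch along the same lines; the standard library's http.client speaks HTTP/1.1 by default, which matches the "always test with HTTP/1.1" advice. The fetch_status helper and the example.com host are illustrative only -- point it at your own site.

```python
# Minimal server-header check using only the standard library.
import http.client

def parse_status_line(line):
    """Split a raw status line like 'HTTP/1.1 301 Moved Permanently'."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def fetch_status(host, path="/"):
    """Return (status code, Location header or None) for one HEAD request."""
    conn = http.client.HTTPConnection(host, timeout=10)   # HTTP/1.1 by default
    conn.request("HEAD", path)
    resp = conn.getresponse()
    location = resp.getheader("Location")
    conn.close()
    return resp.status, location

print(parse_status_line("HTTP/1.1 301 Moved Permanently"))
# e.g. fetch_status("example.com", "/index.html") on a live site
```

For checking Supplemental Result causes, the interesting values are 200 (duplicate still being served), 301, and 404, as in the list above.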
[edited by: g1smd at 9:12 pm (utc) on Oct. 4, 2006]
A better version in:
[webmasterworld.com...]
[webmasterworld.com...]
Home -> Forums Index -> Tools
Choose option: Server Header Checker
There were quite a number of variations of the 301 format in that thread. The final one that jdMorgan posted is:
==========================
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?
RewriteRule ^(([^/]*/)*)index\.html?$ http://example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
==========================
Could someone who has expertise in this stuff confirm that version to be the right one? Since .htaccess is so powerful, we do not want to make a mistake. Thanks....
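One way to sanity-check what those two rules are meant to do, before trusting your live .htaccess to them, is to simulate the intended mapping. This Python sketch models only the intended behaviour (index-file stripping plus www removal, using the example.com placeholder from the rules), not mod_rewrite itself:

```python
# Simulation of the *intended* effect of the two jdMorgan rules above.
import re

def expected_redirect(url):
    """Return the URL the rules should 301 to, or None if no redirect."""
    m = re.match(r"^http://(www\.)?example\.com(/.*)?$", url)
    if not m:
        return None
    path = m.group(2) or "/"
    # Rule 1: strip a trailing index.htm / index.html from any directory.
    new_path = re.sub(r"(?:^|(?<=/))index\.html?$", "", path)
    # Rule 2: strip the www. prefix.
    target = "http://example.com" + new_path
    return target if target != url else None

print(expected_redirect("http://www.example.com/index.html"))
# -> http://example.com/
print(expected_redirect("http://example.com/sub/index.htm"))
# -> http://example.com/sub/
print(expected_redirect("http://example.com/page.html"))
# -> None (already canonical, no redirect)
```

If your own site's expected mapping differs from this (say, you prefer the www form), the rules need adjusting before you deploy them.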
..............................
Despite it seeming they are, pages are not the same as URLs. The same URL can have both supplemental and non-supplemental pages indexed. Getting a page indexed normally does nothing to get rid of the supplemental; you just can't see it unless you search for words on the supplemental page that are not on the normal page.
Supplementals have nothing to do with crawl frequency, other than that infrequently crawled URLs become supplementals more often and more easily. A URL crawled every day can still have a supplemental page associated with it.
Even if a url is getting ranked as a regular search result, you should be greatly concerned if a supplemental version also exists, even if the cause of the supplemental status has been fixed. Supplemental pages of a URL can become dominant over a normally indexed page for the same URL, and having a hidden supplemental basically always hurts the ranking of a healthy page.
Hidden supplementals are like sweeping excrement under a rug. It's still there. It's still bad. It's not fixed. It will cause stinky problems until it is completely removed by Google, something that could take up to a couple of years.
This can be useful: site:www.domain.com -inurl:www
There is more than one type of Supplemental Result. You need to look at the HTTP response code for each one too.
[webmasterworld.com...]
[webmasterworld.com...]
This should also help too:
[webmasterworld.com...]
See the second of three blocks of text here (it begins "Supplemental Results are..."): [webmasterworld.com...]
If the original reason for the Supplemental status is no longer there, yes.
During the most recent supplemental index update, I noticed there was a delay between global supplemental cache refresh and re-evaluation.
Refreshing the cache is straightforward: pull a url from a database, recrawl it, and update the database.
Evaluating those pages, measuring trust, checking for duplicates, etc. involves more work, especially when you're dealing with a site that has hundreds of thousands of urls and comparing those pages with the rest of the web.
Until a site's supplemental caches are entirely refreshed, how do you identify dupes? I don't think you can.
I noticed pages I fixed show up with fresh cache in the supplemental index right after the recent cache refresh, and I couldn't understand what went wrong. But after a week or two, some of those pages started moving over into the main index.
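That refresh-then-evaluate idea can be sketched in miniature. This is pure speculation about Google's pipeline, of course, and the "crawl" is faked with a dict of made-up URLs; the point is only that duplicate grouping needs the whole refreshed set before a canonical URL can be chosen for each content group.

```python
# Two-phase sketch: phase 1 "recrawls" everything; phase 2 groups by
# content hash and keeps one canonical URL per group.
from hashlib import md5

live_content = {                       # phase 1: refreshed cache, faked
    "http://example.com/a":     "widgets page",
    "http://example.com/a?s=1": "widgets page",   # duplicate content
    "http://example.com/b":     "gadgets page",
}

def evaluate(cache):
    """Phase 2: group URLs by content hash; first URL in each group wins."""
    groups = {}
    for url in sorted(cache):
        key = md5(cache[url].encode()).hexdigest()
        groups.setdefault(key, []).append(url)
    canonical = {g[0] for g in groups.values()}
    supplemental = {u for g in groups.values() for u in g[1:]}
    return canonical, supplemental

canonical, supplemental = evaluate(live_content)
print(sorted(supplemental))
# -> ['http://example.com/a?s=1']
```

Until every entry in live_content has been refreshed, evaluate() would be comparing a mix of old and new copies, which fits the observed delay between the cache refresh and pages moving to the main index.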
These are what I call the "historical supplemental results" where if you search for a word newly added to the page you will see that URL as a normal result, but if you search for a word from the old version of the page it will continue to show up as a Supplemental Result - often for a very long time (like a year).
Everything is fine as long as all other alternative URLs for that content are now returning either 301 or 404, or contain a meta robots noindex tag. Those alternative URLs would be non-www vs. www, multiple domains, alternative URL parameters, capitalisation issues (IIS), http vs. https, etc.
The URLs that you no longer want indexed will show up only as Supplemental for a very long while, and the URL that you do want to show up, will show in the main index but will also contain the "historical" component that remains supplemental too.
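For illustration, here is a small Python sketch showing how those alternative forms (www vs. non-www, host capitalisation, http vs. https) collapse to a single canonical URL. Which form you pick as canonical is your choice; this one arbitrarily prefers non-www http, and all URLs are made up.

```python
# Illustrative canonicaliser for the alternative URL forms listed above.
from urllib.parse import urlsplit

def canonical_form(url):
    """Collapse host case, www prefix, and scheme to one canonical URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""        # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return "http://" + host + (parts.path or "/")

variants = [
    "http://www.example.com/page",
    "https://example.com/page",
    "http://EXAMPLE.com/page",
]
print({canonical_form(u) for u in variants})
# -> {'http://example.com/page'}  (one canonical URL for all three)
```

Every variant that is not the canonical form should return a 301 to it (or a 404, or carry a meta robots noindex), exactly as described above.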