The current situation:
Google refuses to recognize a 301 of a Supplemental listing.
Google refuses to delete a Supplemental listing that is now a nonexistent 404 (not a custom 404 page, a literal nothing there), even if it is linked to from dozens of pages.
In both the above situations, even if Google crawls through links every day for six months, it will not remove the Supplemental listing or obey a 301.
Google refuses to obey its own URL removal tool for Supplementals. It only "hides" the supplementals for six months, and then returns them to the index.
As of the past couple of days, I have succeeded (using the tactics below) in getting some Supplementals removed from about 15% of the datacenters. On the other 85%, however, they have returned to being Supplemental.
Some folks have hundreds or thousands of this type of Supplemental, which would make this strategy nearly impossible, but if you have fewer than twenty or so...
1) Place a new, nearly blank page on old/supplemental URL.
2) Put no real words on it (nothing it could ever rank for in the future). Include only "PageHasMoved" text plus link text like "MySiteMap" or "GoToNewPage" pointing to appropriate pages on your site, for any human who stumbles onto this page.
3) If you have twenty Supplementals, put links on each of these new pages to all twenty of them. In other words, interlink all the new pages so they all have quite a few links to them.
4) Create a new master "Removed" page which will serve as a permanent sitemap for your problem/supplemental URLs. Link to this page from your main page. (In a month or so you can get rid of the front page link, but continue to link to this Removed page from your site map or other pages, so Google will continually crawl it and be continually reminded that the Supplementals are gone.)
5) Also link from your main page (and others if you want) to some of the other Supplementals, so these new pages and the links on them get crawled daily (or as often as you get crawled).
6) If you are crawled daily, wait ten days.
7) After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches.
8) Now put 301s on all the Supplemental URLs. Redirect them either to the page with the content that used to be on the Supplemental, or to some page you don't care about ranking, like an "About Us" page.
9) Link to some or all of the 301ed Supplementals from your main page, your Removed page and perhaps a few others. In other words, make very sure Google sees these new 301s every day.
10) Wait about ten more days, longer if you aren't crawled much. At that point the 15% datacenters should first show no cache for the 301ed pages, and then hours later the listings will be removed. The 85% datacenters will however simply revert to showing the old Supplemental caches and old Supplemental listings, as if nothing happened.
11) Acting on faith that the 15% datacenters will be what Google chooses in the long run, now use the URL removal tool to remove/hide the Supplementals from the 85% datacenters.
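If you build pages by script rather than by hand, steps 1 to 4 can be sketched roughly as below. This is a minimal sketch, assuming Python and static HTML files; the old paths, the `/sitemap.html` link and the function names are hypothetical placeholders for your own site.

```python
from html import escape

# Hypothetical old Supplemental URLs on your own site (step 1).
OLD_PATHS = ["/old/widgets.html", "/old/gadgets.html", "/old/gizmos.html"]


def placeholder_page(new_page: str, all_paths: list[str],
                     sitemap: str = "/sitemap.html") -> str:
    """Near-blank 'PageHasMoved' page for one old URL (steps 1-3).

    It carries no rankable words, only navigation text for humans, and it
    links to every placeholder so each one collects plenty of internal links.
    """
    interlinks = "\n".join(
        f'<a href="{escape(p)}">PageHasMoved</a>' for p in all_paths
    )
    return (
        "<html><head><title>PageHasMoved</title></head><body>\n"
        "<p>PageHasMoved</p>\n"
        f'<a href="{escape(new_page)}">GoToNewPage</a>\n'
        f'<a href="{escape(sitemap)}">MySiteMap</a>\n'
        f"{interlinks}\n"
        "</body></html>\n"
    )


def removed_page(all_paths: list[str]) -> str:
    """Master 'Removed' page (step 4): a permanent map of the problem URLs."""
    items = "\n".join(
        f'<li><a href="{escape(p)}">{escape(p)}</a></li>' for p in all_paths
    )
    return (
        "<html><head><title>Removed</title></head><body><ul>\n"
        f"{items}\n"
        "</ul></body></html>\n"
    )
```

Write each placeholder string to the file at its old path, then link the Removed page from your main page as step 4 describes.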
Will the above accomplish anything? Probably not. The 85% of the datacenters may just be reflecting the fact that Google will never under any circumstances allow a Supplemental to be permanently removed. However, the 15% do offer hope that Google might actually obey a 301 if brute forced.
Then, from now on, whenever you remove a page be sure to 301 the old URL to another one, even if just to an "About Us" page. Then add the old URL to your "Removed" page where it will regularly be seen and crawled. An extra safe step could be to first make the old page a "PageHasMoved" page before you redirect it, so if it ever does come back as a Supplemental, at least it will come back with no searchable keywords on the page.
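The "always 301 a removed URL" habit can be sketched as a tiny WSGI app using only Python's standard library. The redirect map, paths and port are hypothetical, and many sites would express the same rules in their web server's own configuration instead; this only shows the shape of the response, not whether Google will obey it.

```python
from wsgiref.simple_server import make_server

# Hypothetical redirect map: each removed URL points at the page that now
# holds its content, or at a page you don't care about ranking (step 8).
REDIRECTS = {
    "/old/widgets.html": "/widgets.html",
    "/old/gadgets.html": "/about-us.html",
}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: tells clients the target replaces the old URL.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]
    # Stand-in for the rest of the site; a real server would serve the page here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]


if __name__ == "__main__":
    # Local test server only.
    with make_server("", 8000, app) as httpd:
        httpd.serve_forever()
```

Each time you retire a page, add one entry to the map and one link on the Removed page.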
Examples of 15% datacenter: 126.96.36.199 188.8.131.52 184.108.40.206
Examples of 85% datacenter: 220.127.116.11 18.104.22.168 22.214.171.124
I have those kinds of indented results: current cache, current page, different from the 'main' result (not duplicate content).
>> Some are from DMOZ listings, at least that is the case periodically with our site. <<
Yes, the ODP descriptions must be stored somewhere else too, but in the example I was quoting I was searching (just a few days ago) for words that were removed from a particular web page nearly two years ago. The page returned as a matching result had a snippet showing the words I was searching for, but the cache was only three days old and did not contain any of the searched-for content (because that content has not existed for nearly two years).
I just had my first case of a page that was Supplemental only on the 15% datacenters: its 301 kicked in on the 85% datacenters, which you would expect since it was merely a "normal" page there, but it reverted to being a Supplemental on the 15% datacenters. Since 301s seemed to work on the 15% for Supplementals listed on 100% of the datacenters, I hoped this one, which was Supplemental only on the 15%, would work too, but it didn't.
So, bottom line, no 301 redirect of a Supplemental has been obeyed on 100% of Google's datacenters, even though in all cases there was a current (non-supplemental) page regularly crawled on that URL. (It sounds so confusing to even type...)
I am hoping that they will clear many of them away later in the week, but I suspect that all that will happen is that certain classes of supplemental page will simply be hidden from view in the SERPs rather than fixing the underlying problem.
It is very easy to create a supplemental result. Have a page that can be accessed through two URLs. Get both of them indexed, wait for them both to be cached (note that one URL is cached more often than the other), then change the content very slightly. If you are quick enough, Google will no longer see them as duplicates, as its duplicate checking mechanism seems to compare cache copies, or one cache copy with one live copy (and often gets it wrong). Google will then continue to present the "old" copy as a supplemental result, seemingly forever, for the search terms that were on the page back then, and you can now change the page content to whatever you want and the other URL will rank for any search terms that are now on the live page too.
If you weren't quick enough, don't worry. If the duplicate page checker did catch the other URL and hide it from the SERPs with an old cache, then after just a few more weeks it will usually reappear anyway. It does this because the page as seen through the other URL (the one that still shows in the SERPs) now has different content. The old cache will come back as a supplemental result simply because it no longer matches the live page at the other URL, so it is no longer filtered out as a duplicate. It does not matter that the live content at the URL with the old cache has also changed: once the page is marked as supplemental, the old cache is no longer compared with what is really on the live page at that same URL.
(Yes, this also sounds so confusing to type...)
Are the following all true?
A page is either supplemental or not - the query does not matter.
Supplemental pages effectively won't rank for any but obscure searches.
Pages can move from supplemental to regular listings but this is rare and nobody understands how it happens.
The query does matter. For a page that still exists but whose content has been changed, the page may be supplemental for one search query, often for very old page content, while also being found for the new content of the page as a normal listing. The "new" query will show a modern snippet and a modern cache. The "old" query will usually show an "old" snippet reflecting the "old" content, but the cache that it links to may be a cache from 18 months ago, or it may be a cache of the "new" content. I haven't yet worked out why one or the other is shown. I think it just depends on whatever data was imported into Google's database at the time.
>> Supplemental pages effectively won't rank for any but obscure searches. <<
They will usually rank for content that used to be on the page, as a supplemental result usually represents an old version of the page. Some supplemental results, however, are just leftover remnants of sites where the whole site is 404, or even where the domain no longer exists at all.
The other class of supplemental result is where the page was originally a duplicate page, and was previously filtered out, but which has re-appeared when the page was modified and the old cache copy for "URL.A" is no longer a duplicate of the real page at the canonical URL, "URL.B". So "URL.A" reappears in the index to bug you for all time. Many of these are on sites where "URL.A" represents a non-www URL that has been redirected to www for the last 6 months, but where Google just re-imported old data back into their database in July or August without doing a reality check on it.
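For reference, the non-www to www redirect mentioned above can be sketched as WSGI middleware. This is a minimal sketch, assuming Python; `www.example.com`, `www_redirect` and the stand-in `site` app are hypothetical. It shows the redirect itself, not a cure for the re-imported old data the post describes.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # hypothetical; use your own www host


def www_redirect(app, canonical_host=CANONICAL_HOST):
    """WSGI middleware: 301 requests for any other host name to the www host."""
    def wrapper(environ, start_response):
        # Drop any port and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = quote(environ.get("PATH_INFO", "/"))
            location = f"{scheme}://{canonical_host}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper


def site(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


application = www_redirect(site)
```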
>> Pages can move from supplemental to regular listings but this is rare and nobody understands how it happens. <<
It is almost impossible to update a supplemental result to a regular listing. If it is supplemental when searching for "red widget" and the page content has been changed to "blue gadget", then the "blue gadget" search may show the page as a regular result, but the page will continue to be returned as a supplemental result when searching for "red widget". The cache for the "old" search may show the old cache from 18 months ago, or may link to a cache only days old, showing the "blue gadget" content.
Google needs to re-evaluate the data that it has. There is a lot of junk in the supplemental results: page content from yesteryear. Google should link to the current version of a page, and try to forget what the old version contained.
They should leave that job to archive.org, which does it much better.
I'm still digesting the implications for our site, which recovered about 10% of our former G traffic during this update for reasons I'm trying to understand but seem to be related to supplemental changes.
If I query "site:mysite.com oregon" I get regular results, but for "site:mysite.com montana" it's all Supplementals.
Also specific Oregon queries do fine but similar queries for other states fail, even though we have equally thorough content.
Why did they go supplemental, can we reverse it, and what are the ramifications of the supplemental pages for the site?
Any help would be greatly appreciated; we have not been as well versed in this area as we should be.
There is no way to get rid of a Supplemental, currently. Your best bet is to tweak the content of any Supplemental, and get it crawled. You would have to believe that eventually Google will get rid of the Supplemental database as it exists now.
However, I have also seen the supplemental result get dropped from the index in a matter of only a week or so.
I have been testing this for several months; it has worked nearly every time (for pages where the page is still online).
The original URL is a static page at www.domain.com/folder/page.html and I linked to it using www.domain.com/folder/page.html?-index-me- instead.
That new URL appeared in the index within days and ranked OK. Within a few weeks the basic URL listed as a supplemental result had dropped out.
[edited by: g1smd at 9:37 pm (utc) on Nov. 4, 2005]
It was on a disposable domain; just messing about to see if there was any way to dislodge supplemental results in any way at all -- and there are very few.
You need a real link for this. Google indexes the "new" URL. Try it for a couple of pages and see what you get.
There are no guarantees; use at your own risk.
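The query-string trick above depends on Google following a genuine link to the variant URL. A small helper for building that link might look like this; it is a sketch assuming Python, the function names are hypothetical, and `-index-me-` is simply the marker used in the post. As the post says, there are no guarantees the trick works.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

MARKER = "-index-me-"  # throwaway query string from the post above


def index_me_url(url: str) -> str:
    """Return the same URL with the marker appended to its query string."""
    parts = urlsplit(url)
    query = f"{parts.query}&{MARKER}" if parts.query else MARKER
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def index_me_link(url: str, text: str) -> str:
    """Build a real <a href> link to the variant URL, HTML-escaped."""
    return f'<a href="{escape(index_me_url(url))}">{escape(text)}</a>'
```

Place the generated link on a page that is crawled often, as the post suggests.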
1) mysite.ext/mypage.html (crawled daily)
2) mysite.ext/Mypage.html (supplemental) (old cache date)
3) mysite.ext/MyPage.html (supplemental) (old cache date)
They are all the same content, and I suspect that page 1, the correct page, is being penalized for duplicate content. If I put in a 301 redirect from page 2 and page 3, to page 1, then place links somewhere on a regularly crawled page to page 2 and page 3 (so that they get re-crawled?), would that have no effect on the supplementals indexed?
So, if I just put different content (or no content) on the supplemental pages, and then link to them from a daily crawled page, would they then get re-indexed and eliminate the possibility of dup content penalty on page 1?
Once the page is supplemental, the content on it now will live (at this point) forever in Google's memory.
Put new content on the pages you want to get rid of, let them get crawled for a period of time (week, months, decades...) then 301 to somewhere. This won't accomplish anything right away, but someday Google should get its act together and start obeying the 301s.
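For the mixed-case URLs in the question, the eventual "301 to somewhere" step can be sketched as a function that maps any mixed-case path to its lowercase form. This assumes Python and assumes every real URL on the site is all lowercase (like mypage.html); `case_redirect` is a hypothetical name, and per the thread the redirect may not clear Supplementals that already exist.

```python
from typing import Optional


def case_redirect(path: str, query: str = "") -> Optional[str]:
    """Return a 301 target for a mixed-case path, or None if it is canonical.

    Assumes all real URLs are lowercase, so /Mypage.html and /MyPage.html
    both collapse onto /mypage.html.
    """
    canonical = path.lower()
    if canonical == path:
        return None  # already canonical: serve the page normally
    return canonical + (f"?{query}" if query else "")


for p in ["/mypage.html", "/Mypage.html", "/MyPage.html"]:
    print(p, "->", case_redirect(p))
# /mypage.html -> None
# /Mypage.html -> /mypage.html
# /MyPage.html -> /mypage.html
```

Whenever the function returns a target, the server would answer with a 301 and that target in the `Location` header.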
Because of this I used the Google Removal Tool and deleted all these old URLs with "noindex, noarchive".
All these URLs came back after 180 days.
I see the possibility of deleting these URLs not with meta tags, but with a 404 or 410.
Is this method better, or will the URLs definitely come back after 180 days as well?
Do you think that deleting with the Google removal tool has any effect?
Do deleted ("hidden") URLs play any further role in calculations such as duplicate content during the 180 days?
Thank you for your help!
For a page that is 404, or even for a domain that no longer exists, you can "remove" the pages from the index, but, as you say, even if the pages no longer exist at the end of the 90 or 180 days then they are automatically added back in. That seems to me to be wrong. If the pages still no longer exist, then they should be permanently removed.
Even where Google might discard the old cache, they still keep a record of the old title and the old snippets forever. For pages that are supplemental results, there is no way to get Google to forget about those pages.
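The question above weighs three ways of retiring a URL: a robots meta tag, a 404, or a 410. A hedged, stdlib-only WSGI sketch that serves each kind of response might look like this; the paths are hypothetical, and it makes no claim about which option Google respects, which this exchange leaves open.

```python
# Hypothetical URL sets; the thread does not settle how Google treats each.
GONE = {"/old/page1.html", "/old/page2.html"}  # answer 410 Gone
NOINDEX = {"/old/hidden.html"}                   # answer 200 with robots meta tag

NOINDEX_BODY = (
    b'<html><head><meta name="robots" content="noindex, noarchive">'
    b"<title>Removed</title></head><body></body></html>"
)


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in GONE:
        # 410: the resource is intentionally and permanently gone.
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"Gone\n"]
    if path in NOINDEX:
        # The meta-tag route: the page stays reachable but asks search
        # engines not to index or cache it.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [NOINDEX_BODY]
    # A plain 404 for anything unknown ("a literal nothing there").
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]
```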