The current situation:
Google refuses to recognize a 301 of a Supplemental listing.
Google refuses to delete a Supplemental listing that is now a nonexistent 404 (not a custom 404 page, a literal nothing there) no matter if it is linked to from dozens of pages.
In both the above situations, even if Google crawls through links every day for six months, it will not remove the Supplemental listing or obey a 301.
Google refuses to obey its own URL removal tool for Supplementals. It only "hides" the supplementals for six months, and then returns them to the index.
As of the past couple of days, I have succeeded (using the tactics below) in getting some Supplementals removed from about 15% of the datacenters. On the other 85%, however, they have returned to being Supplemental.
Some folks have hundreds or thousands of this type of Supplemental, which would make this strategy nearly impossible, but if you have fewer than twenty or so...
1) Place a new, nearly blank page on old/supplemental URL.
2) Put no actual words on it (that it could ever rank for in the future). Only put "PageHasMoved" text plus link text like "MySiteMap" or "GoToNewPage" to appropriate pages on your site for a human should they stumble onto this page.
3) If you have twenty supplementals put links on all of them to all twenty of these new pages. In other words, interlink all the new pages so they all have quite a few links to them.
4) Create a new master "Removed" page which will serve as a permanent sitemap for your problem/supplemental URLs. Link to this page from your main page. (In a month or so you can get rid of the front page link, but continue to link to this Removed page from your site map or other pages, so Google will continually crawl it and be continually reminded that the Supplementals are gone.)
5) Also link from your main page (and others if you want) to some of the other Supplementals, so these new pages and the links on them get crawled daily (or as often as you get crawled).
6) If you are crawled daily, wait ten days.
7) After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches.
8) Now put 301s on all the Supplemental URLs. Redirect them to either the page with the content that used to be on the Supplemental, or to some page you don't care about ranking, like an "About Us" page.
9) Link to some or all of the 301ed Supplementals from your main page, your Removed page and perhaps a few others. In other words, make very sure Google sees these new 301s every day.
10) Wait about ten more days, longer if you aren't crawled much. At that point the 15% datacenters should first show no cache for the 301ed pages, and then hours later the listings will be removed. The 85% datacenters will however simply revert to showing the old Supplemental caches and old Supplemental listings, as if nothing happened.
11) Acting on faith that the 15% datacenters will be what Google chooses in the long run, now use the URL removal tool to remove/hide the Supplementals from the 85% datacenters.
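For anyone who would rather not hand-write twenty of these pages, steps 1 through 4 can be roughly sketched in Python. This is only an illustration under assumptions: the function names, the file paths in the usage example, and the choice of "PageHasMoved" as the link text for the interlinks are all made up here, not anything from Google or from the steps themselves.

```python
from html import escape


def placeholder_page(sitemap_url, new_url=None, sibling_urls=()):
    """Build a nearly blank 'PageHasMoved' page (steps 1-3).

    sibling_urls are the other old/supplemental URLs, so the
    placeholder pages end up interlinked as step 3 describes.
    """
    links = ['<a href="%s">MySiteMap</a>' % escape(sitemap_url, quote=True)]
    if new_url:
        links.append('<a href="%s">GoToNewPage</a>' % escape(new_url, quote=True))
    # Assumption: reuse "PageHasMoved" as link text so no rankable words are added.
    for url in sibling_urls:
        links.append('<a href="%s">PageHasMoved</a>' % escape(url, quote=True))
    return (
        "<html><head><title>PageHasMoved</title></head>\n"
        "<body><p>PageHasMoved</p>\n"
        "<p>%s</p></body></html>\n" % "\n".join(links)
    )


def removed_page(old_urls):
    """Build the master 'Removed' page that links to every old URL (step 4)."""
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (escape(u, quote=True), escape(u))
        for u in old_urls
    )
    return (
        "<html><head><title>Removed</title></head>\n"
        "<body><ul>\n%s\n</ul></body></html>\n" % items
    )


if __name__ == "__main__":
    old = ["/old-widgets.html", "/old-gadgets.html"]  # hypothetical paths
    for url in old:
        others = [u for u in old if u != url]
        print(placeholder_page("/sitemap.html", sibling_urls=others))
    print(removed_page(old))
```

You would still have to upload each generated page to its old URL and add the front page link to the Removed page by hand.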
Will the above accomplish anything? Probably not. The 85% of the datacenters may just be reflecting the fact that Google will never under any circumstances allow a Supplemental to be permanently removed. However, the 15% do offer hope that Google might actually obey a 301 if brute forced.
Then, from now on, whenever you remove a page be sure to 301 the old URL to another one, even if just to an "About Us" page. Then add the old URL to your "Removed" page where it will regularly be seen and crawled. An extra safe step could be to first make the old page a "PageHasMoved" page before you redirect it, so if it ever does come back as a Supplemental, at least it will come back with no searchable keywords on the page.
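If the site happens to run on Apache with mod_alias (an assumption; nothing above depends on a particular server), the 301s can be written as `Redirect 301` lines in .htaccess. Below is a minimal sketch that generates those lines from a list of old paths. The function name, the example.com URLs and the paths are hypothetical.

```python
def redirect_rules(mapping, default_target):
    """Return Apache mod_alias 'Redirect 301' lines, one per old path.

    mapping: old URL path -> new absolute URL, or None when the old
             content has no new home.
    default_target: where to send old paths that have no new home,
             e.g. an "About Us" page.
    Note: paths containing spaces would need quoting; not handled here.
    """
    lines = []
    for old_path in sorted(mapping):
        target = mapping[old_path] or default_target
        lines.append("Redirect 301 %s %s" % (old_path, target))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rules = redirect_rules(
        {
            "/old-widgets.html": "http://www.example.com/widgets.html",
            "/old-gadgets.html": None,  # no replacement page
        },
        default_target="http://www.example.com/about.html",
    )
    print(rules)
```

The same list of old paths can feed the "Removed" page, so every redirected URL keeps at least one link pointing at it.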
Examples of 15% datacenter: 184.108.40.206 220.127.116.11 18.104.22.168
Examples of 85% datacenter: 22.214.171.124 126.96.36.199 188.8.131.52
However I suspect that it could matter in many cases and I wonder if part of Google's current problem of killing good pages stems from conflicts between the gigantic and unmanageable supplemental index and the regular index.
I've tried everything on one set of pages. I had redone them all on new pages in order to update my site. I could never get my old deleted pages out of supplemental and truly gone from Google so I decided to move the new pages to another domain. I don't want to risk any penalty on my big domain.
When I deleted these pages I left the links to them (or what was them) on my site. They seem to be completely gone from Google now. But the old old ones remain. It seems if you can get Google to see the deleted pages right away before they are declared supplemental you can get rid of them.
The older deleted pages are still a problem. I'm tired of having a link to each "now deleted URL" on my homepage. It looks tacky. Should I just give up? That homepage gets a new date in the serps every 2 or 3 days so I KNOW Google must have found them. <sigh>
The only thing to do now is put a 301 on the links to the recently deleted pages. You don't have to 301 to the new location if you don't want to. Just pick any page to 301 to. Then remove those links from your front page in two or three weeks. The newly deleted pages should not go supplemental now. Even better, make sure one or more links to the newly deleted/301ed URLs continue to exist (doesn't need to be a front page) so Google is forced to see the 301 regularly.
>>I just don't understand why Google hangs on to all these deleted pages. I even found one in supplemental that has been gone for 4 or 5 years.<<
As far as supplementals are concerned, it seems that Google still neither forgives nor forgets ;-)
Let's hope the next update deals with the supplementals and canonicals issues as well.
From what GG and MC said early on, that was the plan for this update. Something must not have worked out so it was back to the drawing board.
As long as they don't count the sups as duplicate copy and penalize the site it doesn't matter. Is there any evidence this has happened?
<meta name="robots" content="noindex,follow">
- on the gone pages?
Oh yes, been there done that. Once the pages turn supplemental it doesn't do any good.
Not that you shouldn't try. It used to work; maybe it will again.
Now that I think about it, if you set it up that way and leave links to the 'gone' pages, maybe it would work. Just do it before they go supplemental.
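If you try the noindex route, it is worth confirming that the tag really made it onto every 'gone' page before waiting on Google. Here is a rough sketch using only the Python standard library. The class and function names are made up for illustration, and it only reads HTML you hand it; fetching the pages is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives += [
            d.strip().lower() for d in content.split(",") if d.strip()
        ]


def robots_directives(html_text):
    """Return the robots meta directives found in html_text, lowercased."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print("noindex" in robots_directives(page))
```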
I am hoping Google will do something to solve this problem. Makes more sense for them to do it from their end.
Pages that went supplemental and then had their content changed, still continued to show up as supplemental results for keywords no longer on the page, and correctly showed as a normal result for any current content that was searched for. The snippet would reflect the search that was made, but in both cases the cache was always a bang up to date modern copy from ~2 to ~10 days ago. For the supplemental result, words would show in the snippet that are no longer on the real page and no longer in the copy of the cache that Google was showing to the public. This appears to be fixed for some of the pages that I have tracked for the last 2 years or more.
For pages where the supplemental result represents a page that is now replaced with a 301 redirect to another version of the page, nothing has been fixed at all; neither is there any change for a URL that is really a long-term 404.
Google attempted to combat certain types of spam, and the usual "domain-hopping when found out" scenarios by inserting some latency into the system, latency that was useful in detecting duplicate content when spammers abandon a domain and start again on a new one, for example. Maybe that is where the supplemental results come in, a record of the previous content of a site that can be used to penalise a spammer migrating it to a fresh domain?
Whilst a good idea on the surface, maybe Google naively assumed that non-spammers would usually have "perfect" sites (regarding URLs and redirects), but in reality it has been found that even normal sites have duplicate content across www and non-www, mixed links pointing to both www and non-www within a site, and multiple URLs that can reach the same content, along with 302 redirects that go to error pages instead of serving a 404 for "page not found".
It must be very difficult to write a universal algorithm to cover all eventualities, but it has taken Google a very long time to see that their database has a lot of "rogue" data within it. The test DC appears to have a fix for one problem, but I see no movement yet on several other classes of screw up.
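To see how much of the www versus non-www mixing a site has, one rough approach is to take a list of its URLs (from logs or a crawl of its own links) and group the ones that differ only by the "www." prefix. This is a sketch under assumptions: the function names and example.com URLs are made up, and it says nothing about how Google itself decides what counts as a duplicate.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to host (minus any leading 'www.') + path + query."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def duplicate_groups(urls):
    """Return {canonical key: sorted URLs} for keys reached by 2+ distinct URLs."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical_key(url)].add(url)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    found = duplicate_groups([
        "http://www.example.com/a.html",
        "http://example.com/a.html",
        "http://example.com/b.html",
    ])
    for key, variants in found.items():
        print(key, variants)
```

Each group it reports is a candidate for a single 301 from one hostname to the other.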
And I wonder: could they also be handicapped when it comes to passing PR?