Forum Moderators: Robert Charlton & goodroi
The current situation:
Google refuses to recognize a 301 of a Supplemental listing.
Google refuses to delete a Supplemental listing that is now a nonexistent 404 (not a custom 404 page, a literal nothing there) no matter if it is linked to from dozens of pages.
In both the above situations, even if Google crawls through links every day for six months, it will not remove the Supplemental listing or obey a 301.
Google refuses to obey its own URL removal tool for Supplementals. It only "hides" the supplementals for six months, and then returns them to the index.
As of the past couple of days, I have succeeded (using the tactics below) in getting some Supplementals removed from about 15% of the datacenters. On the other 85%, however, they have returned to being Supplemental.
Some folks have hundreds or thousands of this type of Supplemental, which would make this strategy nearly impossible, but if you have fewer than twenty or so...
1) Place a new, nearly blank page on old/supplemental URL.
2) Put no actual words on it (that it could ever rank for in the future). Only put "PageHasMoved" text plus link text like "MySiteMap" or "GoToNewPage" to appropriate pages on your site for a human should they stumble onto this page.
3) If you have twenty supplementals put links on all of them to all twenty of these new pages. In other words, interlink all the new pages so they all have quite a few links to them.
4) Create a new master "Removed" page which will serve as a permanent sitemap for your problem/supplemental URLs. Link to this page from your main page. (In a month or so you can get rid of the front page link, but continue to link to this Removed page from your site map or other pages, so Google will continually crawl it and be continually reminded that the Supplementals are gone.)
5) Also link from your main page (and others if you want) to some of the other Supplementals, so these new pages and the links on them get crawled daily (or as often as you get crawled).
6) If you are crawled daily, wait ten days.
7) After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches.
8) Now put 301s on all the Supplemental URLs. Redirect them to either the page with the content that used to be on the Supplemental, or to some page you don't care about ranking, like an "About Us" page.
9) Link to some or all of the 301ed Supplementals from your main page, your Removed page and perhaps a few others. In other words, make very sure Google sees these new 301s every day.
10) Wait about ten more days, longer if you aren't crawled much. At that point the 15% datacenters should first show no cache for the 301ed pages, and then hours later the listings will be removed. The 85% datacenters will however simply revert to showing the old Supplemental caches and old Supplemental listings, as if nothing happened.
11) Acting on faith that the 15% datacenters will be what Google chooses in the long run, now use the URL removal tool to remove/hide the Supplementals from the 85% datacenters.
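For anyone who would rather script steps 1 through 4 than hand-edit twenty files, here is a minimal Python sketch. It is only an illustration of the steps above, not anything Google requires: the function names, the example URLs, and the choice of "PageHasMoved" as the link text between placeholders are all assumptions of mine.

```python
from html import escape


def placeholder_page(links):
    """Return HTML for a nearly blank "PageHasMoved" page (steps 1 and 2).

    links: (url, link_text) pairs, e.g. ("/sitemap.html", "MySiteMap").
    """
    anchors = "\n".join(
        f'<p><a href="{escape(url, quote=True)}">{escape(text)}</a></p>'
        for url, text in links
    )
    return (
        "<html><head><title>PageHasMoved</title></head>\n"
        f"<body>\n<p>PageHasMoved</p>\n{anchors}\n</body></html>\n"
    )


def build_placeholders(old_urls, site_links):
    """Map each old URL to placeholder HTML that links to all the others (step 3)."""
    pages = {}
    for url in old_urls:
        # Link text for the interlinks is an arbitrary choice with no keywords in it.
        others = [(u, "PageHasMoved") for u in old_urls if u != url]
        pages[url] = placeholder_page(list(site_links) + others)
    return pages


def removed_page(old_urls):
    """Return HTML for the master "Removed" page listing every old URL (step 4)."""
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in old_urls
    )
    return (
        "<html><head><title>Removed</title></head>\n"
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )
```

Calling build_placeholders(["/old-a.html", "/old-b.html"], [("/sitemap.html", "MySiteMap")]) gives one HTML string per old URL; writing those strings to disk and linking the Removed page from the main page is left to you.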
Will the above accomplish anything? Probably not. The 85% of the datacenters may just be reflecting the fact that Google will never under any circumstances allow a Supplemental to be permanently removed. However, the 15% do offer hope that Google might actually obey a 301 if brute forced.
Then, from now on, whenever you remove a page be sure to 301 the old URL to another one, even if just to an "About Us" page. Then add the old URL to your "Removed" page where it will regularly be seen and crawled. An extra safe step could be to first make the old page a "PageHasMoved" page before you redirect it, so if it ever does come back as a Supplemental, at least it will come back with no searchable keywords on the page.
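Before waiting ten days on a crawl, it may be worth confirming what each old URL really returns. The following is a rough sketch, assuming Python's standard http.client (which does not follow redirects on its own) and a HEAD request; some servers answer HEAD differently from GET, so treat the result as a hint. The function names and the example.com URLs are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_status(url, timeout=10.0):
    """Request url once, without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(old_url, status, location, expected_target=None):
    """Summarise what a crawler requesting old_url would be told."""
    if status == 301:
        # Location may be relative, so resolve it against the old URL.
        target = urljoin(old_url, location or "")
        if expected_target and target != expected_target:
            return f"301 to unexpected target {target}"
        return f"301 to {target}"
    if status == 302:
        return "302 (temporary), not a permanent redirect"
    if status in (404, 410):
        return f"{status} (gone)"
    if status == 200:
        return "200 (page still served)"
    return f"other status {status}"
```

For a list of old URLs, something like describe(url, *fetch_status(url)) in a loop prints one line per URL. A 302 where you meant a 301 is the sort of mistake this catches.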
Examples of 15% datacenter: 216.239.59.104 216.239.57.99 64.233.183.99
Examples of 85% datacenter: 216.239.39.104 64.233.161.99 64.233.161.105
You would think that if you request it to be removed, and the page then is either redirected, excluded via robots.txt, or even results in a 404, Google would leave it out of the index (main or supplemental) even after the 6 months expires.
Maybe Google Guy can or will shed some light on this situation, especially at a time when people are trying to eliminate unintentional duplicates from Google so that they can get out of a duplicate content penalty situation.
I have several hundred unintentional duplicates that are supplementals. I would at least like to be a good citizen and get them out of Google's supplemental index so that someone else's pages can use that space. I just don't feel right hogging all that space in the supplemental index for pages on my site that no longer exist because they return a 404.
So is it the removal tool or the hide-it tool?
I once tried to call my website a cashcow, but my wife wouldn't let me because she said she had undeniable proof that it didn't fit that definition.
Yes, I did say two hundred thousand. That's far more than the real number of pages on the domain.
After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches.
I also have to wonder if those 15% datacenters have anything to do with something Matt Cutts noted on his blog last week:
I have heard some reports of people having issues with doing a 301 ... but we may be due to replace the code that handles that in the next couple months or so.
For a while it seemed to help to have added a noindex to my custom 404 Not Found page.
Then I tried linking the deleted pages from the homepage in hopes that Google would find them and see they had been deleted. That seemed to work temporarily but then they came back. Plus it looks terrible to have a list of deleted pages on your homepage.
Now I've dropped my custom 404s, as someone suggested. It doesn't seem to have helped.
I do think Google knows the pages are gone, as they list them as supplemental. But why list deleted pages?
The other frustrating thing is the deleted pages still have the original PR while the identical pages with the new URL have no PR. <sigh>
I also have to wonder if those 15% datacenters have anything to do with something Matt Cutts noted on his blog last week:
I have heard some reports of people having issues with doing a 301 ... but we may be due to replace the code that handles that in the next couple months or so.
I think that is the case - 301 worked properly a few months ago, so they are likely to repair it, and this thread is a good place to point the problem out to GoogleGuy.
Google's "hide URL" tool shows (C) 2002 Google in the footer. It just needs a lot of development, and I'm sure it will be upgraded in time. It has plenty of bugs, some of them dangerous, but the team apparently has more important things to do right now.
<speculation>
I think it was a one-off loss i.e. they lost their database of pages that had been requested via the removal tool. I don't know when exactly that happened but any pages requested prior to that date would have come back into supp and pages requested subsequently would stay delisted.
</speculation>
Pages I've removed via removal tool sometime ago still seem removed. Fingers crossed. But, yes, the whole affair is pretty messy and we shouldn't have to submit URLs for removal anyway. My grandmother understands 301s, why can't Google?
Both. If you search for pagehasmoved, the new title is displayed and the listing looks normal. If you search for "some text that was on the page in 2004", then the 2004 title and a listing marked Supplemental will be shown.
To some degree this makes sense, as Google says the Supplemental listings are in a separate database.
"are you sure that it was your actions that caused the pages to be removed, or is it just Google noise?"
Definitely sure. The 85/15 split occurs at the same moment, so it can't possibly be a coincidence or glitch.
Separate note: I used the URL removal tool on more 85/15 pages twenty hours ago. These pages had previously been removed seven months ago, and reappeared like clockwork last month. After 20 hours they still have not been removed. First-time removals go through in six or so hours in my experience. If the ones from yesterday don't get removed at all, that is a new wrinkle.
I swore a little while back that I was going to stay away from the URL removal (er, hiding) tool and just provide Google with some on-site cues to help sort itself out. Fat lot of luck I've had with that. URL removal tool, here I come.
(The solution, BTW, seems obvious: Google need to do a thorough crawl of every URL in the supplemental index and treat the results (301s, 404s, etc.) properly, just as if they were URLs in the normal index/crawl. A little too obvious, I suppose?)
Around the time they were messing with 302 redirects, a bunch of stuff started to happen with 301 redirects. My site happened to disappear from the index at about the time I noticed this happening. I did have an opportunity to send G-engineers a bunch of examples of these poorly handled 301s. I haven't heard anything back, as usual.
What I did see is old 301 pages being resurrected with the design of the new pages they redirect to, a design the old pages have NEVER had. This creates an exact duplicate of the new page in their index. At the time our site was disappearing from the index, OLD 301 pages appeared with old cache dates but showing our newest design (the old pages never had that design at any point in history), and at the same time the newer version started to disappear from the index.
To make matters worse external 301 redirects (tracking scripts) also appeared in the index with a cache of the page that contains the URL to the tracking 301 script creating yet another duplicate of a page under a separate URL. Imagine tons of these redirects and the possible effects.
My main concern is whether Google would treat all those old pages as some sort of dupe content or not. If they do, could there be tons of dupe content especially if a webmaster makes huge directory changes? If they don’t treat them as the same page but separate then what impact (such as orphans) would it have on pagerank and such if they never get rid of them – or – does Google just ignore the old but keep it around for, hmmmmm ever? If it does ignore then what is up with what I described in the first paragraph?
I am also wondering if the little trick described in this thread could help to alleviate some of these G-generated problems. Maybe if I resurrect some of our old pages by slapping some junk on them for a while and then 404/410 the darn things, at least this could help get rid of possible G-generated dupes and the effects of such (if any).
If a page is Supplemental now, this won't work. Google will "permanently" (apparently till they extract their head from their rectum) remember the old content.
macdave, "Despite twice-daily homepage crawls and numerous new pages indexed, Googlebot hasn't touched the supplementals in the two weeks they've been linked there"... what is on those Supplementals? I have no problem at all getting Supplemental URLs crawled that were NOT canonical problems in the past. If your Supplementals are from www versus non-www issues, that seems more difficult. Also, I suppose as a basic that pagerank matters. Did other non-new pages get crawled? I can see the bot going for a new URL, but it might skip all the ones it knows about, including the old Supplemental.
I have several supplementals ending their process now and in the past 36 hours. There seemed to be bad news in that after 24 hours the first of my pseudo-successfully redirected supplementals came back on the 15% datacenters. Some other pages "disappeared" but I could tell they were still there, as results for the directory they were in would say "4 of about 5", meaning that fifth one was a supplemental trying to disappear but being prevented.
Frustrated, I used the URL hiding tool on this whole batch, since my rankings disappeared when they were un-hid, so maybe hiding them again might be good enough. I still have two more pages set to either redirect or not in eleven days or so (based on how long it has taken this first batch).
I do intend to permanently keep a sitemap of all the dead and redirected locations. Regardless of whether it does any good or even any bad, I feel like I have to do everything I can to minimally avoid the annoyance of Google stupidly showing my eyes these ancient URLs anytime I check out my domains.
Oh and Matt, if you seriously did not know that a Supplemental can't be redirected, that should tell you this is an area you should learn more about.
Google, I've told you these pages do not exist and should be removed from your index, please do so.
Google, I have told you these pages have been "moved permanently", please obey the 301.
"what is on those Supplementals?"
They've been 301s for many months. But now that you've gotten me thinking about it, there are a couple of possibly complicating factors with these particular URLs:
1) They're in the form /directory/index.html -- which G may be merging with /directory/ and/or /directory. An [inurl:/directory/] search shows "results 1-1 of 3", which would suggest that it has additional information about those variations.
2) Until quite recently /directory/ was disallowed in my robots.txt.
I'll have to try with some others that don't have those particular issues...
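The robots.txt point matters because a crawler that obeys a Disallow never requests the URL, so it cannot see a 301 placed there. Here is a small sketch for checking which of the three URL variants a given robots.txt blocks, using Python's standard urllib.robotparser. Note the assumptions: Python's parser does simple prefix matching and is not Google's own matcher, and the helper names and the sample robots.txt are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def directory_variants(directory):
    """Return the three paths that may be treated as the same page."""
    base = "/" + directory.strip("/")
    return [base, base + "/", base + "/index.html"]


def crawlable(robots_txt, paths, user_agent="Googlebot"):
    """Map each path to True if robots_txt lets user_agent fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical robots.txt matching the situation described above.
ROBOTS = "User-agent: *\nDisallow: /directory/\n"

if __name__ == "__main__":
    for path, allowed in crawlable(ROBOTS, directory_variants("directory")).items():
        print(path, "crawlable" if allowed else "blocked")
```

With that sample file, /directory/ and /directory/index.html come out blocked while /directory (no trailing slash) does not, which is one way the three variants could end up with different histories.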
"I have no problem at all getting Supplemental URLs crawled that were NOT canonical problems in the past. If your Supplementals are from www versus non-www issues, that seems more difficult. Also, I suppose as a basic that pagerank matters."
No canonical problems that I've ever noticed, and we do have a non-www to www redirect in place. Homepage PR has been 5 for years with a couple thousand natural IBLs (per Yahoo). We had some hijacking problems about a year ago, but cleared those up with no ill side effects.
Did other non-new pages get crawled? I can see the bot going for a new URL, but it might skip all the ones it knows about, including the old Supplemental.
Frustrated, I used the URL hiding tool on this whole batch, since my rankings disappeared when they were un-hid, so maybe hiding them again might be good enough.
My major reservation about using the tool again is this: Suppose that sometime during the six months of hiding, Google gets its act straightened out and crawls and cleans up the supplemental index. Are hidden URLs part of that cleanup? Or do they just reappear in six months only to be ignored again? I suppose that I have nothing to lose aside from being thrust back into the same position in another 6 months' time, but I'd hate to miss the opportunity to have this cleared up correctly in the meantime.
"Google, I've told you..."
You're starting to sound like Dayo_UK :-)
That won't work. That is a waste of time. Google will refuse to obey a 301 to a Supplemental.
You need to put actual pages back on those old URLs and get them crawled. Then, after Google has these new pages firmly in its cache (after at least ten days), put the 301 on. Eleven or so days after that, see what happens.
The problem is that even though Google crawls the new page, it still remembers the old page... meaning if the page now simply says the word "red", and it used to simply say "blue", the page will show up for searches for either "red" (not shown as Supplemental) or "blue" (showing the Supplemental tag).
On my stats page, they are showing up as URL unreachable and HTTP Error. I just created the "page has moved" page for each of these and then searched for other supplemental results and created those pages also.
Has anyone else noticed this?
Reason I say this is only one of our sites has supplemental listings. I jumped for joy when I saw them, but then I realized that every one of our competitors in that competitive market all of a sudden had them as well. None of our other sites (different markets, same categories) have them, nor do any of our competitors in these other markets.
In a few days I have a couple 301s about to kick in on Supplementals that are only supplemental on the 15% datacenters. Might be interesting what happens with those.
I have recently removed a lot of pages from a site, and after a few days (2-5) all the pages were removed from the Google index, even from the supplemental results.
The only thing that is different about this site, compared to my other sites where pages do not disappear, is that this site uses verified Google Sitemaps.
Has anyone else tried this on a site that keeps old pages in the index?
lovethecoast: Reason I say this is only one of our sites has supplemental listings. I jumped for joy when I saw them, but then I realized that every one of our competitors in that competitive market all of a sudden had them as well. None of our other sites (different markets, same categories) have them, nor do any of our competitors in these other markets.
That's not my experience at all. We have a lovely supplemental listing for a current page and our best competitor does not. So we get 2 results while they only get 1. Nor do I notice a consistent trend amongst other websites in our industry.
Another type of supplemental is where the cache is one or two years old and the page is returned for searches on that content even though the real page has been 404 for a year or more (or in several cases the domain itself doesn't even exist, and hasn't for more than 6 months). In this case Google only has ancient data in the cache and in the snippet.
Another type of result is where a page ranks for the current content, and both the cache and the snippet are modern, but if you search for content that was on the page a year or more ago, the page is returned for that search but both the cache and the snippet are ancient. Google has both a modern and an ancient copy of both the cache and the snippet.
In all cases, the "ancient data" is always tagged as being supplemental, but the modern data may be supplemental or more usually it is shown as a normal result, not as a supplemental. That is, a supplemental page may not be supplemental for all keyword searches that it is returned for. I first made this point more than a year ago, and I am glad to see a wider discussion of that now.
(*)From this, it seems obvious that the snippet content does NOT always come from the cache data; maybe there is a separate place that snippets are generated from?
a supplemental page may not be supplemental for all keyword searches that it is returned for.
I notice the pages that I deleted in July have a cached date of July 17, 2005. That's three months! I'd reorganized things a bit and had to set the pages up with new URLs. I wouldn't mind, since the supplemental version is usually buried in the SERPs, except for the fear of a duplicate penalty.