| This 77 message thread spans 3 pages: 77 (  2 3 ) > > || |
|How to remove (some) Supplemental Listings|
sort of... maybe
Google's ill-advised Supplemental index is polluting their search results in many ways, but the most obviously stupid one is in refusing to EVER forget a page that has been long deleted from a domain. There are other types of Supplementals in existence, but this post deals specifically with Supplemental listings for pages that have not existed for quite some time.
The current situation:
Google refuses to recognize a 301 of a Supplemental listing.
Google refuses to delete a Supplemental listing that is now a nonexistent 404 (not a custom 404 page, a literal nothing there) no matter if it is linked to from dozens of pages.
In both the above situations, even if Google crawls through links every day for six months, it will not remove the Supplemental listing or obey a 301.
Google refuses to obey its own URL removal tool for Supplementals. It only "hides" the supplementals for six months, and then returns them to the index.
As of the past couple of days, I have succeeded (using the tactics below) in getting some Supplementals removed from about 15% of the datacenters. On the other 85%, however, they have returned to being Supplemental.
Some folks have hundreds or thousands of this type of Supplemental, which would make this strategy nearly impossible, but if you have less than twenty or so...
1) Place a new, nearly blank page on old/supplemental URL.
2) Put no actual words on it (that it could ever rank for in the future). Only put "PageHasMoved" text plus link text like "MySiteMap" or "GoToNewPage" to appropriate pages on your site for a human should they stumble onto this page.
3) If you have twenty supplementals put links on all of them to all twenty of these new pages. In other words, interlink all the new pages so they all have quite a few links to them.
4) Create a new master "Removed" page which will serve as a permanent sitemap for your problem/supplemental URLs. Link to this page from your main page. (In a month or so you can get rid of the front page link, but continue to link to this Removed page from your site map or other pages, so Google will continually crawl it and be continually reminded that the Supplementals are gone.)
5) Also link from your main page (and others if you want) to some of the other Supplementals, so these new pages and the links on them get crawled daily (or as often as you get crawled).
6) If you are crawled daily, wait ten days.
7) After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches.
8) Now put 301s on all the Supplemental URLs. Redirect them to either the page with the content that used to be on the Supplemental, or to some page you don't care about ranking, like an "About Us" page.
9) Link to some or all of the 301ed Supplementals from your main page, your Removed page and perhaps a few others. In other words, make very sure Google sees these new 301s every day.
10) Wait about ten more days, longer if you aren't crawled much. At that point the 15% datacenters should first show no cache for the 301ed pages, and then hours later the listings will be removed. The 85% datacenters will however simply revert to showing the old Supplemental caches and old Supplemental listings, as if nothing happened.
11) Acting on faith that the 15% datacenters will be what Google chooses in the long run, now use the URL removal tool to remove/hide the Supplementals from the 85% datacenters.
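For anyone with more than a handful of URLs, steps 1 through 4 can be scripted rather than done by hand. What follows is only a rough sketch of one way to do it, not anything tested against Google: the function names, the "/sitemap.html" default and the example URLs are all made up for illustration. It builds one near-empty "PageHasMoved" page per old URL, interlinks them, and builds the master "Removed" page.

```python
from html import escape


def placeholder_page(new_url, sitemap_url, sibling_urls):
    """Steps 1-3: a near-empty page with no rankable words, plus interlinks."""
    links = [
        f'<a href="{escape(new_url)}">GoToNewPage</a>',
        f'<a href="{escape(sitemap_url)}">MySiteMap</a>',
    ]
    # Interlink every placeholder with all the others so each gets crawled.
    links += [f'<a href="{escape(u)}">PageHasMoved</a>' for u in sibling_urls]
    body = "\n".join(links)
    return (
        "<html><head><title>PageHasMoved</title></head><body>\n"
        f"PageHasMoved\n{body}\n</body></html>"
    )


def removed_page(old_urls):
    """Step 4: the master "Removed" page listing every problem URL."""
    items = "\n".join(
        f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in old_urls
    )
    return (
        "<html><head><title>Removed</title></head><body><ul>\n"
        f"{items}\n</ul></body></html>"
    )


def build_all(old_to_new, sitemap_url="/sitemap.html"):
    """Map each old URL to the HTML of its placeholder page."""
    pages = {}
    for old, new in old_to_new.items():
        siblings = [u for u in old_to_new if u != old]
        pages[old] = placeholder_page(new, sitemap_url, siblings)
    return pages
```

Calling build_all({"/old-a.html": "/new-a.html", "/old-b.html": "/new-b.html"}) gives you the HTML to drop onto each old URL. The links from your main page (steps 4 and 5) still have to be added by hand.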
Will the above accomplish anything? Probably not. The 85% of the datacenters may just be reflecting the fact that Google will never under any circumstances allow a Supplemental to be permanently removed. However, the 15% do offer hope that Google might actually obey a 301 if brute forced.
Then, from now on, whenever you remove a page be sure to 301 the old URL to another one, even if just to an "About Us" page. Then add the old URL to your "Removed" page where it will regularly be seen and crawled. An extra safe step could be to first make the old page a "PageHasMoved" page before you redirect it, so if it ever does come back as a Supplemental, at least it will come back with no searchable keywords on the page.
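The "placeholder first, 301 later" routine for newly removed pages comes down to a date check. Here is a minimal sketch of that logic. The ten-day window is just the figure from the steps above for a site crawled daily, and the function name, the "/about-us.html" fallback and the redirect table are hypothetical.

```python
from datetime import date

# Assumption: ten days, as in the steps above for a daily-crawled site.
PLACEHOLDER_DAYS = 10


def response_for(old_url, placeholder_since, redirects, today=None):
    """Return (status, location) for a retired URL.

    While the placeholder is younger than PLACEHOLDER_DAYS, keep serving
    the "PageHasMoved" page (200). After that, switch to a 301 pointing
    at the new location, or at a page you don't care about ranking.
    """
    today = today or date.today()
    age_days = (today - placeholder_since).days
    if age_days < PLACEHOLDER_DAYS:
        return 200, None
    return 301, redirects.get(old_url, "/about-us.html")
```

If you aren't crawled daily, raise the window, per the "longer if you aren't crawled much" caveat in step 10.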
Examples of 15% datacenter: 126.96.36.199 188.8.131.52 184.108.40.206
Examples of 85% datacenter: 220.127.116.11 18.104.22.168 22.214.171.124
Maybe Google should just use a proper definition and call it the Google Hide-It tool.
You would think that if you request it to be removed and the page then is either redirected or excluded via robots.txt or even results in a 404, Google would leave it out of the index (main or supplemental) even after the 6 months expires.
Maybe Google Guy can or will shed some light on this situation, especially at a time when people are trying to eliminate unintentional duplicates from Google so that they can get out of a duplicate content penalty situation.
I have several hundred unintentional duplicates that are supplementals. I would at least like to be a good citizen and get them out of Google's supplemental index so that someone else's pages can use that space. I just don't feel right hogging up all that space in the supplemental index for pages on my site that no longer exist because they return a 404.
So is it the removal tool or the hide-it tool?
I once tried to call my website a cashcow, but wife wouldn't let me because she said she had undeniable proof that it didn't fit that definition.
Thank you for the tips. I tried the blank page route and linked from my homepage.. G would never look at the supplemental. Had them out there for 3 months. They were in my google sitemap too :(
Just to be clear, I didn't say "blank". There should be something there to be indexed.
steveb, are you sure that it was your actions that caused the pages to be removed, or is it just Google noise? The number of non-existent supplemental pages on my domains has been fluctuating alarmingly during the past two months - anything between 100,000 and 200,000 supplemental 404 pages depending on which day or datacenter I check.
Yes, I did say two hundred thousand. That's far more than the real number of pages on the domain.
I have an entire site listed in supplemental as %20www.. I cannot work out the best way to remove them. Even if Google gets a 404 for each page, it appears that it will keep the pages, making me appear to have 2 duplicate sites...
steveb, thanks for posting your findings on this topic.
|After ten days the old Supplemental pages should show their new "PageHasMoved" caches. If you search for that text restricted to your domain, those pages will show in the results, BUT they will still ALSO continue to show for searches for the text on the ancient Supplemental caches. |
When the pages show for their "PageHasMoved" text, are they still listed as supplemental, or do they appear to be part of the normal index at this point?
I also have to wonder if those 15% datacenters have anything to do with something Matt Cutts noted on his blog last week:
|I have heard some reports of people having issues with doing a 301 ... but we may be due to replace the code that handles that in the next couple months or so. |
I really wish Google Guy would give us some suggestions on this. I have tried so many things.
For a while it seems to help having added a noindex to my custom 404 not found.
Then I tried linking the deleted pages from the homepage in hopes that Google would find them and see they had been deleted. That seemed to work temporarily but then they came back. Plus it looks terrible to have a list of deleted pages on your homepage.
Now I've dropped my custom 404s as someone suggested that. It doesn't seem to have helped.
I do think Google knows the pages are gone as they list them as supplemental. But why list deleted pages?
The other frustrating thing is the deleted pages still have the original PR while the identical pages with the new URL have no PR. <sigh>
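A lot of this thread hinges on what the server really answers for a deleted URL: a hard 404, a custom error page that quietly returns 200, or a 301. So it can be worth checking the raw status before blaming Google. The sketch below uses Python's standard http.client, which does not follow redirects, so you see the first response much as a crawler would. The function names and the wording of the descriptions are my own.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """HEAD the URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location=None):
    """Turn a raw status code into a short note for a deleted URL."""
    if status == 404:
        return "hard 404: nothing there"
    if status == 410:
        return "410 Gone"
    if status == 301:
        return f"301 permanent redirect to {location}"
    if status == 302:
        return f"302 temporary redirect to {location}"
    if status == 200:
        return "200 OK: a deleted URL answering 200 is a 'soft' error page"
    return f"other status {status}"
```

Usage would be something like describe(*fetch_status("http://www.example.com/old-page.html")), with your own URL in place of the example one. Some servers answer HEAD differently from GET, so treat the result as a hint.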
|I also have to wonder if those 15% datacenters have anything to do with something Matt Cutts noted on his blog last week: |
|I have heard some reports of people having issues with doing a 301 ... but we may be due to replace the code that handles that in the next couple months or so. |
I think that is the case - 301 worked properly a few months ago, so they are likely to repair it, and this thread is a good place to point the problem out to GoogleGuy.
Google "hide URL" tool shows (C) 2002 Google in footer - it just requires much development and I'm sure it will be upgraded in time. It has plenty of bugs, some of them dangerous, but the team has apparently more important things to do right now.
>> Google refuses to obey its own URL removal tool for Supplementals.
I think it was a one-off loss i.e. they lost their database of pages that had been requested via the removal tool. I don't know when exactly that happened but any pages requested prior to that date would have come back into supp and pages requested subsequently would stay delisted.
Pages I've removed via removal tool sometime ago still seem removed. Fingers crossed. But, yes, the whole affair is pretty messy and we shouldn't have to submit URLs for removal anyway. My grandmother understands 301s, why can't Google?
"When the pages show for their "PageHasMoved" text, are they still listed as supplemental, or do they appear to be part of the normal index at this point?"
Both. If you search for pagehasmoved, the new title is displayed and the listing looks normal. If you search for "some text that was on the page in 2004", then the 2004 title and a listing marked Supplemental will be shown.
To some degree this makes sense, as Google says the Supplemental listings are in a separate database.
"are you sure that it was your actions that caused the pages to be removed, or is it just Google noise?"
Definitely sure. The 85/15 split occurs at the same moment, so it can't possibly be a coincidence or glitch.
Separate note, I used the URL removal tool on more 85/15 pages twenty hours ago. These pages had been previously removed seven months ago, and reappeared like clockwork last month. After 20 hours they still have not been removed. First time removals go in six or so hours in my experience. If the ones from yesterday don't get removed at all, that is a new wrinkle.
I am also thinking about putting a list of URL's that are supplemental (and have been deleted a long time ago) at the bottom of one of my pages so that Google can see they are 404 - but I was wondering if you could then be penalised for linking to all these non-existent pages?
I've been having a heck of a time getting Googlebot to even look at any of my supplemental pages. Trying to test some techniques similar to what steveb has suggested, a couple weeks ago I linked to a couple long-dead supplemental URLs from my home page. Despite twice-daily homepage crawls and numerous new pages indexed, Googlebot hasn't touched the supplementals in the two weeks they've been linked there. So that's the other half of the battle...
I swore a little while back that I was going to stay away from the URL removal (er, hiding) tool and just provide Google with some on-site cues to help sort itself out. Fat lot of luck I've had with that. URL removal tool, here I come.
(The solution, BTW, seems obvious: Google need to do a thorough crawl of every URL in the supplemental index and treat the results (301s, 404s, etc.) properly, just as if they were URLs in the normal index/crawl. A little too obvious, I suppose?)
“I have heard some reports of people having issues with doing a 301 ... but we may be due to replace the code that handles that in the next couple months or so.”
Around the time they were messing with 302 redirects a bunch of stuff started to happen with 301 redirects. My site happened to disappear from the index about that time when I noticed this stuff happening. I did have an opportunity to send G-engineers a bunch of examples of these poorly handled 301s. I haven’t heard anything back – like usual.
What I did see is old 301 pages being resurrected with the design of the new pages they redirect to, a design the old pages have NEVER had. This creates an exact duplicate of the new page in their index. At the time our site was disappearing from the index, OLD 301 pages appeared with old cache dates but in our newest design (the old page never had that design at any point in history), and at the same time the newer version started to disappear from the index.
To make matters worse external 301 redirects (tracking scripts) also appeared in the index with a cache of the page that contains the URL to the tracking 301 script creating yet another duplicate of a page under a separate URL. Imagine tons of these redirects and the possible effects.
My main concern is whether Google would treat all those old pages as some sort of dupe content or not. If they do, could there be tons of dupe content especially if a webmaster makes huge directory changes? If they don’t treat them as the same page but separate then what impact (such as orphans) would it have on pagerank and such if they never get rid of them – or – does Google just ignore the old but keep it around for, hmmmmm ever? If it does ignore then what is up with what I described in the first paragraph?
I am also wondering if this little trick described in this thread could possibly help to alleviate some of these G-generated problems. Maybe if I resurrect some of our old pages by slapping some junk on them for a while, then 404/410 the darn things - at least this could help get rid of possible G-generated dupes and effects of such (if any).
If you were to decide you wanted to delete or move a page today, I would first put up a nearly blank Page Moved page and leave that for awhile, let's say a month. Link to it from a few decent locations. After that, I would 301 to the new location of the page (or any other page besides your main page). This way, if the page ever reappears, it will likely come back as the Page Moved page, rather than the old/real content.
If a page is Supplemental now, this won't work. Google will "permanently" (apparently till they extract their head from their rectum) remember the old content.
macdave, "Despite twice-daily homepage crawls and numerous new pages indexed, Googlebot hasn't touched the supplementals in the two weeks they've been linked there"... what is on those Supplementals? I have no problem at all getting Supplemental URLs crawled that were NOT canonical problems in the past. If your Supplementals are from www versus non-www issues, that seems more difficult. Also, I suppose as a basic that pagerank matters. Did other non-new pages get crawled? I can see the bot going for a new URL, but it might skip all the ones it knows about, including the old Supplemental.
I have several supplementals ending their process now and in the past 36 hours. There seemed to be bad news in that after 24 hours the first of my pseudo-successfully redirected supplementals came back on the 15% datacenters. Some other pages "disappeared" but I could tell they were still there, as results for the directory they were in would say "4 of about 5", meaning that fifth one was a supplemental trying to disappear but being prevented.
Frustrated, I used the URL hiding tool on this whole batch, since my rankings disappeared when they were un-hid, so maybe hiding them again might be good enough. I still have two more pages set to either redirect or not in eleven days or so (based on how long it has taken this first batch).
I do intend to permanently keep a sitemap of all the dead and redirected locations. Regardless of whether it does any good or even any bad, I feel like I have to do everything I can to minimally avoid the annoyance of Google stupidly showing my eyes these ancient URLs anytime I check out my domains.
Oh and Matt, if you seriously did not know that a Supplemental can't be redirected, that should tell you this is an area you should learn more about.
Google, I've told you these pages do not exist and should be removed from your index, please do so.
Google, I have told you these pages have been "moved permanently", please obey the 301.
They've been 301s for many months. But now that you've gotten me thinking about it, there are a couple possibly complicating factors with these particular URLs:
|what is on those Supplementals? |
1) They're in the form /directory/index.html -- which G may be merging with /directory. An [inurl:/directory/] search shows "results 1-1 of 3", which would suggest that it has additional information about those variations.
2) Until quite recently /directory/ was disallowed in my robots.txt
I'll have to try with some others that don't have those particular issues...
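The robots.txt factor is easy to check offline: if a path is disallowed, a well-behaved crawler never requests it, so it never sees the 301 or 404 sitting there. Here is a small sketch using Python's standard urllib.robotparser. The robots.txt text and the example.com URLs are invented for illustration, and whether Googlebot's own matching agrees with this parser in every corner case is an assumption.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt, url):
    """True if the given robots.txt text disallows Googlebot from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt resembling the situation described above.
robots = """User-agent: *
Disallow: /directory/
"""
```

With that text, blocked_for_googlebot(robots, "http://www.example.com/directory/index.html") comes back True, so any redirect placed on that URL would go unseen until the Disallow line is lifted.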
No canonical problems that I've ever noticed, and we do have a non-www to www redirect in place. Homepage PR has been 5 for years with a couple thousand natural IBLs. (per Yahoo) We had some hijacking problems about a year ago, but cleared those up with no ill side effects.
|I have no problem at all getting Supplemental URLs crawled that were NOT canonical problems in the past. If your Supplementals are from www versus non-www issues, that seems more difficult. Also, I suppose as a basic that pagerank matters. |
|Did other non-new pages get crawled? I can see the bot going for a new URL, but it might skip all the ones it knows about, including the old Supplemental. |
Crawling frequency is at least daily for most major pages on the site, usually twice a day for the home page. We have a lot of deep content that gets crawled fairly regularly -- most cache dates are within the last two weeks.
|Frustrated, I used the URL hiding tool on this whole batch, since my rankings disappeared when they were un-hid, so maybe hiding them again might be good enough. |
I'm thinking along those same lines. I did a major round of URL hiding on March 15-20. Count forward six months: those URLs came out of hiding within days of September 22 (possibly on Sept. 22 itself -- I wasn't paying much attention to them until our Google traffic dried up).
My major reservation about using the tool again is this: Suppose that sometime during the six months of hiding, Google gets its act straightened out and crawls and cleans up the supplemental index. Are hidden URLs part of that cleanup? Or do they just reappear in six months only to be ignored again? I suppose that I have nothing to lose aside from being thrust back into the same position in another 6 months' time, but I'd hate to miss the opportunity to have this cleared up correctly in the meantime.
You're starting to sound like Dayo_UK :-)
macdave it sounds like you are linking from your main page to a Supplemental URL and having a 301 on that.
That won't work. That is a waste of time. Google will refuse to obey a 301 to a Supplemental.
You need to put actual pages back on those old URLs and get them crawled. Then after Google has these new pages firmly in their cache (after at least ten days), then put the 301 on. Eleven or so days after that, see what happens.
Once a page is designated "supplemental" does Google ever look at it again? Would it even help to put new material on it if Google might not ever find it?
As described above, Google will crawl a supplemental page every day. No problem there.
The problem is that even though Google crawls the new page, it still remembers the old page... meaning if the page now simply says the word "red", and it used to simply say "blue", the page will show up both for searches for "red" (not shown as Supplemental) and for "blue" (showing the Supplemental tag).
I have not attempted any of these suggestions until today when I noticed that G (via my sitemaps stats page) has been trying to fetch a bunch of pages that I deleted over a year ago when I switched my site from php to asp. These are all showing up in the supplemental index.
On my stats page, they are showing up as URL unreachable and HTTP Error. I just created the "page has moved" page for each of these and then searched for other supplemental results and created those pages also.
Has anyone else noticed this?
I think these supplementals are put in by hand at Google. Not someone literally putting them in by hand, but an Engineer that goes to a specific section of the index and clicks a button to "add supplemental links" -- that's why after a while, you got some of them to change -- they re-did that section of the index.
Reason I say this is only one of our sites has supplemental listings. I jumped for joy when I saw them, but then I realized that every one of our competitors in that competitive market all of a sudden had them as well. None of our other sites, (different markets, same categories), have them, nor do any of our competitors in these other markets.
Been working on a few different ones the past week. Currently one Supplemental has disappeared on the 15% datacenters while reverting to a Supplemental on the 85% ones. If it behaves as others did, it will revert to a Supplemental on the 15% ones in the next 48 hours, but hopefully it won't.
In a few days I have a couple 301s about to kick in on Supplementals that are only supplemental on the 15% datacenters. Might be interesting what happens with those.
How does Google deal with a redirect 410 "gone" error?
Is it any more effective than using a 301 for deleted pages? Or will Google still continue to list them in supplemental results. Googlebot uses a HTTP/1.1 UA, which understands a 410 error.
I have recently removed a lot of pages from a site, and after a few days (2-5) all the pages were removed from the Google index, even from the supplemental results.
The only thing that is different about this site, compared to my other sites where pages do not disappear, is that this site uses verified Google sitemaps.
Has anyone else tried this on a site that keeps old pages in the index?
I wouldn't consider a page out of the index until at least one year. Google has been resurrecting pages from well over a year ago that have been shown as deleted for a long time.
In other words, just because it shows as deleted now doesn't mean it will stay that way.
|How does Google deal with a redirect 410 "gone" error? |
Removing pages by sending a 410 was very effective for me with Googlebot and Slurp.
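For anyone wanting to try the 410 route, here is a bare-bones sketch of a server answering "410 Gone" for a list of retired paths. It is written as a plain WSGI app so it can sit behind most Python web servers. The paths in GONE are hypothetical, and the fallback 404 only keeps the example short, since a real site would serve its normal content there. Nothing in it says how Google will treat the 410.

```python
# Hypothetical retired URLs; replace with your own list.
GONE = {"/old-page.html", "/old-dir/index.html"}


def app(environ, start_response):
    """WSGI app: 410 Gone for retired paths, 404 for anything else."""
    path = environ.get("PATH_INFO", "/")
    if path in GONE:
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"Gone\n"]
    # Placeholder branch: a real site would hand off to its normal pages.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, app) will serve it. On Apache the same effect is usually had with a rewrite or redirect rule rather than code.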
|lovethecoast: Reason I say this is only one of our sites has supplemental listings. I jumped for joy when I saw them, but then I realized that every one of our competitors in that competitive market all of a sudden had them as well. None of our other sites, (different markets, same categories), have them, nor do any of our competitors in these other markets. |
That's not my experience at all. We have a lovely supplemental listing for a current page and our best competitor does not. So we get 2 results while they only get 1. Nor do I notice a consistent trend amongst other websites in our industry.
You guys are talking about something else. Supplemental results are never something you want to have.
You are thinking of extra results displayed in the search results.
I know of sites with supplemental results reflecting content that was removed up to 2 years ago - and yes the pages have a cache from yesterday, which is updated (almost) daily. They rank well for words contained in the current content, and have a snippet that reflects the current content. If you then search for "old content", the page is returned, you see "old content" in the snippet, but the cache page is from yesterday, with none of the words that were found in the snippet anywhere to be found in the cache. In this case, Google has one modern cache stored, but it has both a modern snippet and an ancient snippet stored(*). Why do they do this?
Another type of supplemental is where the cache is one or two years old and the page is returned for searches on that content even though the real page has been 404 for a year or more (or in several cases the domain itself doesn't even exist, and hasn't for more than 6 months). In this case Google only has ancient data in the cache and in the snippet.
Another type of result is where a page ranks for the current content, and both the cache and the snippet are modern, but if you search for content that was on the page a year or more ago, the page is returned for that search but both the cache and the snippet are ancient. Google has both a modern and an ancient copy of both the cache and the snippet.
In all cases, the "ancient data" is always tagged as being supplemental, but the modern data may be supplemental or more usually it is shown as a normal result, not as a supplemental. That is, a supplemental page may not be supplemental for all keyword searches that it is returned for. I first made this point more than a year ago, and I am glad to see a wider discussion of that now.
(*)From this, it seems obvious that the snippet content does NOT always come from the cache data; maybe there is a separate place that snippets are generated from?
|a supplemental page may not be supplemental for all keyword searches that it is returned for. |
I hadn't realized that. Thanks for repeating the info.
I notice the pages that I deleted in July have a cached date of July 17, 2005. That's three months! I'd reorganized things a bit and had to set the pages with new URLs. I wouldn't mind since the supplemental version is usually buried in the serps except for the fear of a duplicate penalty.