Google Changes 410 Status Handling
Google's JohnMu made a post on Google's Help Forum that contains a bit of news on the 404 versus 410 status code issue:
|...we are now treating the 410 HTTP result code as a bit "more permanent" than a 404. So if you're absolutely sure that a page no longer exists and will never exist again, using a 410 would likely be a good thing... |
In the worst case, the 410 will be treated the same as a 404; in the best case it'll be a bit quicker & stickier.
Google's Help Forum [google.com]
So, what does "quicker and stickier" mean? I would guess that a 410 will result in more rapidly lowered spidering of the url, and dropping the url from Google's search results. Naturally there would be some safeguards in place to protect webmasters against user error.
Hat tip to SE Roundtable [seroundtable.com]
For very big sites, using a 410 when appropriate might help spidering just a tiny bit - but it's more likely to help a site that is trying to recover from some kind of technical error that flooded G with too many urls accidentally.
That's good news! -- It's about time they listened to me! ... ;)
I agree that they mean that 410ed URLs will be dropped faster and subsequently spidered less often, although I imagine that they *will* still spider them occasionally if they continue to find links to those URLs -- and especially from within the same domain (so we Webmasters need to be sure not to leave any orphan links behind on our own sites pointing to those 410ed URLs).
If there are no links at all to the 410ed URLs, I'd expect (hope) them to disappear fast and only rarely get spidered again -- if ever.
So the main question in my mind is, what happens if there are no on-site links to the 410ed URLs, but there *are* some third-party links to them? That's the grey area, and I'd guess that the URLs will still get spidered occasionally -- but *how* occasionally is the question...
I understand that Google has to deal with (probably a majority of) sites that don't use 410s, and many that use them incorrectly. But I'm glad to see the balance tipping a bit in favor of those sites that say "410-Gone" only when we really mean "It's gone, gone for good, it ain't never coming back, but if it ever does you'll see one or more *new* links to it appear on this domain."
The discussion in the cited thread got a bit heated and a bit confusing, so for those who read that thread and came away with some uncertainty, I'd recommend the following responses for URLs that are removed:
Resource is gone, but a very-relevant replacement exists:
301-Moved Permanently redirect old URL to replacement URL.
Resource is gone, but no very-relevant replacement exists:
Respond with 410-Gone and serve a custom 410 error page which explains (in a fairly apologetic tone) that the resource has been removed. Then provide text links to related/relevant resources, one or more related-category pages, your site search facility, your (HTML) site map page, and your home page -- as applicable, as possible, and in that order. A long-delay meta-refresh allowing the visitor enough time to read and fully-comprehend the custom 410-Gone error page is permissible, but not necessary or required.
To generate a 410-Gone response, you'll need to use mod_alias or mod_rewrite in a .htaccess file or server config file on Apache, ISAPI Rewrite on IIS, or use a PHP or .asp script, etc.
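As a rough illustration of the script route, here is a minimal sketch of the 301-or-410 choice written as a Python WSGI callable, using only the standard library. The paths in MOVED and GONE, the link targets, and the page wording are all made-up examples rather than anything from the thread, and on Apache or IIS the same logic would normally live in rewrite rules instead.

```python
# Hypothetical URL tables for illustration; replace with your own paths.
MOVED = {"/old-widget.html": "/widgets/new-widget.html"}  # gone, relevant replacement exists
GONE = {"/discontinued-widget.html"}                      # gone, no relevant replacement

# Custom 410 page: apologize, then link to related resources, site search,
# the HTML site map, and the home page (all link targets are placeholders).
GONE_PAGE = (
    "<html><head><title>Page removed</title></head><body>"
    "<h1>Sorry, this page has been removed</h1>"
    '<p><a href="/widgets/">Related widgets</a> | '
    '<a href="/search">Site search</a> | '
    '<a href="/sitemap.html">Site map</a> | '
    '<a href="/">Home</a></p>'
    "</body></html>"
).encode("utf-8")


def app(environ, start_response):
    """WSGI callable: 301 for moved URLs, 410 for removed ones, 200 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # Permanent redirect straight to the replacement URL.
        start_response("301 Moved Permanently", [
            ("Location", MOVED[path]),
            ("Content-Length", "0"),
        ])
        return [b""]
    if path in GONE:
        # 410 status plus a human-readable error page in the body.
        start_response("410 Gone", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(GONE_PAGE))),
        ])
        return [GONE_PAGE]
    body = b"This URL is still live.\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() should be enough to serve it on port 8000.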
As with any tweaking of server responses, it's a very good idea to check your work using a reliable server headers checker, and to be sure that the URL request returns the expected server status code directly, with no intervening 301, 302, or 303 redirects, for example. As usual, I recommend the "Live HTTP Headers" add-on for Firefox/Mozilla browsers as basic Webmaster kit.
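If you'd rather script the check than eyeball it in a browser add-on, something along these lines could work. It is only a sketch: the function names fetch_status and verdict are invented for this example, and it relies on the fact that Python's http.client does not follow redirects, so the status it reports is the one returned directly for the URL you asked for. Some servers may answer HEAD differently from GET, so swap the method if the result looks odd.

```python
import http.client
from urllib.parse import urlsplit

REDIRECTS = {301, 302, 303, 307, 308}


def fetch_status(url, timeout=10):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so this is the status the
    server returns directly for the requested URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def verdict(status, location=None, expected=410):
    """Describe whether the directly returned status is the expected one."""
    if status == expected:
        return "OK: %d returned directly" % status
    if status in REDIRECTS:
        return "PROBLEM: %d redirect to %s before any %d" % (status, location, expected)
    return "PROBLEM: got %d, expected %d" % (status, expected)


# Example (hypothetical URL, needs network access):
# print(verdict(*fetch_status("http://www.example.com/discontinued-widget.html")))
```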
If you can't generate a 410-Gone due to hosting restrictions or lack of necessary technical knowledge, then letting the URL go 404 and using Webmaster Tools very, very carefully to request removal is a still-viable approach. But before using the Removal Tool, it'd probably be a good idea to read the several "Help, I just accidentally removed my whole site!" threads here, and to be very sure that you understand exactly how the prefix-matching used by such tools works...
Where there is a risk that more than the desired single obsolete URL might be removed due to the Removal Tool's prefix-matching on 'common URL-paths,' a short-delay (instant) meta-refresh to a replacement URL may be a viable last resort if such a relevant replacement URL exists. To my knowledge, only Yahoo! has ever stated that they treat such a meta-refresh as a 301, but I haven't (re-)researched it recently. At the least, this might help Google 'figure it out' even if the unambiguous server response and Removal Tool options aren't possible.
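To make the prefix-matching risk concrete, here is a small sketch. It assumes the tool does a plain string-prefix comparison, which is a simplification for illustration and not a documented description of how any removal tool actually matches. The function name and the URLs are hypothetical.

```python
def would_be_removed(removal_prefix, indexed_urls):
    """Return every URL that begins with removal_prefix (plain string prefix match)."""
    return [url for url in indexed_urls if url.startswith(removal_prefix)]


# Hypothetical URLs on a hypothetical site.
indexed = [
    "http://www.example.com/widgets",
    "http://www.example.com/widgets/blue.html",
    "http://www.example.com/widgets-archive/2008.html",
    "http://www.example.com/gadgets/red.html",
]

# Intending to remove only the obsolete "/widgets" URL...
for url in would_be_removed("http://www.example.com/widgets", indexed):
    print(url)
```

Under that assumption the first three URLs all match, not just the one that was meant to go, which is the kind of surprise behind the "I just removed my whole site" threads.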
Anyway, it'll be interesting to see how the newly-described Google 410 behavior plays out... :)
Ah, good news. Just a few days ago I posted a question on this forum [webmasterworld.com] about this same issue. Would this mean I can serve a 410 now for these pages and replace it later (when?) with a 404, or should I serve a 410 forever?
John Mueller is recommending "forever", as I read it.
It's about time. There is a need for "forever."
I'd hope that 410 returned for example.com/ isn't treated as 'forever', and that finding new links from other sites pointing to such a URL triggers new spidering. :)
This is great, but what about other SEs? Will 410 be treated the same as a 404 for Yahoo and Bing? Or will I need to serve both 404 and 410?
It is about time. We suffered a hack on one of our big sites last week where hundreds of doorway pages were inserted. We nuked them with 404s but are now thinking of 410ing them. Or maybe it would be better to wait and not confuse Googlebot? They have already picked up over 700 of the pages in WMT as 404s....
<sigh> My experience is to not wait for Google to sort it out with a 404. Sorry. I had a similar problem for the WHOLE MONTH OF OCTOBER! <grin> I waited weeks for my 20,000 (! I know, brutal) 404s to resolve, and slowly they showed up in WMT, but they were being removed from the SERPs at the rate of about 100 a week. I couldn't wait. I eventually said, screw this, did a URL Removal Request for the whole damn bunch, and within 4 hours they were gone from the SERPs (!, I swear to god, 4 hours). Within 2 days, the problem that these pages had caused me in the SERPs was resolved and my SERPs in this respect are fine again, i.e. I now have my homepage back in the SERPs with my sitelinks all in. Crazy. So, if you are anxious, I wouldn't wait for 404s to resolve. And I wouldn't debate 404 vs. 410 with yourself; in my opinion, it is moot. I swapped over to a noindex tag on these pages, and did a mass deletion with the URL Removal Request tool, and... BOOM! Fixed! Ciao, baby! Insert a huge sigh of relief here. My only regret is that I waited 3 weeks for the 404s to fix it, and they never did.