Forum Moderators: phranque

Returning 410 instead of 404

Marcia

5:54 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a way to return a 410 instead of 404 using .htaccess?

[httpd.apache.org...]

Somehow it doesn't seem exactly right, but going by how it's explained for that module, would this be how?

Redirect 410 /foo/page.html

Let me explain why. Some search engines keep 404s in their index, which makes sense, since the page may just be temporarily inaccessible. But for 410/gone:

Returns a "Gone" status (410) indicating that the resource has been permanently removed. When this status is used the URL argument should be omitted.

Couldn't that send a different message and result in the pages being dropped from the index? Could it also help prevent duplicate content issues, which *do* happen with 404s and even 301s at times?

Key_Master

6:33 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Matt Cutts has said that Google treats a 410 the same as it treats a 404. So it wouldn't work with Google.

I know it's frustrating. Google hates to drop URLs from their index, even URLs they haven't crawled.

jdMorgan

6:39 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



410-Gone is the proper response to an HTTP/1.1 request for a resource that has been intentionally removed. It is unambiguous, in contrast to a 404, which, as you state, may indicate either an intentionally removed resource or a temporary condition caused by a Webmaster boo-boo or some other problem.

410-Gone was introduced in HTTP/1.1 to resolve this ambiguity. Over time, some HTTP/1.0 clients were 'extended' to support most, but not all, HTTP/1.1 requirements. The major search engine spiders that identify their requests as HTTP/1.0 fall into this category.

If you are hosted on a dedicated server or VPS, the proper thing to do is to detect HTTP/1.1 or extended HTTP/1.0 requests by checking for the presence of the Host request header (available to mod_rewrite as %{HTTP_HOST}). If this header is present, the client will most likely understand a 410-Gone response. If it is not present, a 404 should be returned instead.

If you are hosted on a name-based virtual server, this is a non-issue, since a true HTTP/1.0 client that does not send the Host header cannot reach your site at all. Therefore, no conditional checking is needed, and you may use mod_alias or unconditional mod_rewrite code to return a 410 response.
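For the name-based hosting case, a minimal unconditional sketch might look like the following. It assumes the lines go in the site's root .htaccess, that the host has mod_alias and mod_rewrite enabled, and that /foo/page.html and /old-directory/ are placeholder paths standing in for the removed content. Either the mod_alias lines or the mod_rewrite lines are enough on their own; using both for the same URL is redundant.

```apache
# mod_alias: return 410 Gone for one removed page.
# With the "gone" (or 410) status, the target URL argument is omitted.
Redirect gone /foo/page.html

# mod_alias: return 410 Gone for everything under a removed directory.
# /old-directory/ is a hypothetical placeholder path.
RedirectMatch gone ^/old-directory/

# mod_rewrite alternative: the [G] flag returns 410 Gone.
# In .htaccess, the pattern is matched without the leading slash.
RewriteEngine On
RewriteRule ^foo/page\.html$ - [G]
```

Requesting one of the listed URLs should then return a 410 status instead of the default 404.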

In order to test for HTTP_HOST, you can use mod_rewrite:


# Enable mod_rewrite for this .htaccess file (required before any rules run)
RewriteEngine On
# Test for HTTP/1.1 (or extended HTTP/1.0) hostname request header
RewriteCond %{HTTP_HOST} .
# If present, return 410-Gone for removed page
RewriteRule ^removed_page\.html$ - [G]
# Else let server generate default 404-Not Found
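If more than one page or directory has been removed, keep in mind that a RewriteCond applies only to the single RewriteRule immediately following it. A sketch of two ways to handle that, where old-directory/, another_page.html and old-stuff/ are hypothetical placeholder paths:

```apache
RewriteEngine On

# Option 1: repeat the Host-header test before each rule,
# because a RewriteCond only guards the very next RewriteRule.
RewriteCond %{HTTP_HOST} .
RewriteRule ^removed_page\.html$ - [G]
RewriteCond %{HTTP_HOST} .
# Hypothetical removed directory: everything under it returns 410
RewriteRule ^old-directory/ - [G]

# Option 2: let several removed URLs share one test via regex alternation.
# another_page.html and old-stuff/ are hypothetical examples.
RewriteCond %{HTTP_HOST} .
RewriteRule ^(another_page\.html|old-stuff/.*)$ - [G]
```

Requests without a Host header fall through all of these rules and get the server's normal 404.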

As to the question, "Does 410 work better with regard to search engines?", I'm afraid that's up to the search engines. They *should* interpret a 410 as a request to drop the defunct URL immediately, perhaps with a single retry after a 24-hour grace period, just in case the Webmaster made an error. Therefore, using a 410 *should* prevent their spiders from returning month after month looking for a page that was removed last year (Inktomi used to drive me nuts with this).

However, I haven't yet seen compelling evidence that 410s are treated any differently from 404s. I just use 410s because that's what HTTP/1.1 says we should use. If the search engines conform their behaviour in the future, I'm good to go. And if not, doing it correctly has no significant downside.

Jim

Marcia

9:29 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's shared name-based hosting, so if that's how it's done I guess it can't hurt to give it a try.

> Matt Cutts has said that Google treats a 410 the same as it treats a 404. So it wouldn't work with Google.
>
> I know it's frustrating. Google hates to drop URLs from their index, even URLs they haven't crawled.


Google will index and save forever pages you've only thought of in your head and never even put online! :)

But it isn't Google I'm concerned about here. I've caught some problems with Yahoo over removed/redirected pages and directories (like there used to be with Inktomi), and I'd like to try whatever I can to deal with that. So I'll double-check the problematic Yahoo listings again and take it from there.

Thanks!

kevinpate

5:28 pm on May 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I haven't seen compelling evidence that 410s
> are treated any differently from 404s yet

I'm in full agreement. I've been sending 410s for many, many months on a number of gone-forever items. The G, Y, and M bots don't care. They just keep coming back time after time.