Resource is gone... HTTP 404 or 410?

How to tell GoogleBot that a resource is gone and it won't come back

         

Martin Dunst

5:07 pm on Jun 5, 2003 (gmt 0)

10+ Year Member



Hello,

Let's assume we have a database driven online shop.
The product URLs all look like "http://www.example.com/productXYZ.html", while "XYZ" is the product's database id.
All documents are spidered / indexed perfectly.

When a product is deleted by the shop administrator, its database record is also deleted and the corresponding URL would henceforth remain unused. The resource is now gone, and it won't come back.

I thought that returning a "410 Gone" status code would be the right thing in this case.
Then again, 410 is an HTTP/1.1 status code, but GoogleBot is sending HTTP/1.0 requests.

It seems that sending a "404 Not Found" instead keeps GoogleBot coming back to check for the resource again and again.

Is there any way of letting GoogleBot know that the resource is _really_ gone for good?

regards
Martin

Mohamed_E

7:02 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, Martin!

A Google site search of WebmasterWorld for "410 gone" google [google.com] will give you the collective wisdom of our community :)

The very first thread [webmasterworld.com] I clicked on had this as msg #12, from jdMorgan, someone whose knowledge and judgement I respect very much:

If you have removed a page, and have no replacement for it, then a 410-gone is the proper server response. If you use a 404, the spider will assume that your server is having problems, and will "give you a break" by trying to retrieve that file for a few months.

404-Not Found means the file was not found for unspecified reasons, but this condition is not necessarily permanent.

410-gone means it's really, really gone, and the condition is permanent.
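
The 404/410 distinction quoted above can be sketched for a shop like Martin's. This is only an illustration, not anyone's actual setup: `LIVE_IDS`, `DELETED_IDS` and `status_for` are made-up names, and it assumes the shop keeps a list of deleted product ids (the original post says the record is simply deleted, in which case the server cannot tell "gone" from "never existed").

```python
import re

# Hypothetical data: ids still in the shop, and ids the admin has deleted.
LIVE_IDS = {101, 102}
DELETED_IDS = {7, 42}

# Matches product URLs of the form /productXYZ.html, where XYZ is the id.
PRODUCT_URL = re.compile(r"^/product(\d+)\.html$")


def status_for(path, live_ids=LIVE_IDS, deleted_ids=DELETED_IDS):
    """Return the HTTP status code to send for a requested path."""
    match = PRODUCT_URL.match(path)
    if match is None:
        return 404  # not a product URL at all
    product_id = int(match.group(1))
    if product_id in live_ids:
        return 200
    if product_id in deleted_ids:
        return 410  # existed once, removed on purpose, not coming back
    return 404  # never existed, or nothing is known about it
```

With the sample data, `status_for("/product42.html")` gives 410, while an id that was never used falls through to 404.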

I use Google for local searches much more often than the BB's own search function.

Martin Dunst

7:29 pm on Jun 5, 2003 (gmt 0)

10+ Year Member



Mohamed_E,

Thanks a lot for your reply...
Shame on me, I should have found the thread myself :/

jdMorgan said in his posting that 410 was the proper response.
I think so, too, but:
is an HTTP/1.1 status code really a proper response for a client sending HTTP/1.0 requests?

regards
Martin

jdMorgan

7:48 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Martin,

That's a very good point about 410-Gone being an HTTP/1.1 response code.

In the case of an HTTP/1.0 request for a missing resource, either a 404-Not Found or 403-Forbidden response would be correct.

It's also interesting that mod_rewrite on Apache, although written in 1996, supports returning an HTTP/1.1 (1996) 410-Gone status, and further, that it does so regardless of the request's HTTP version.

I'm still digging through specs right now... In line with the "robustness" required of clients and servers under HTTP, it seems to me that an HTTP/1.0 client receiving an HTTP/1.1 "enhanced" response should treat it as a member of the general status group it can recognize, i.e. treat 410 as 4xx, or 400.
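
The fallback described here matches what RFC 2616 (section 6.1.1) asks of clients: an unrecognized status code is treated as the x00 code of its class. A rough sketch of how a spider might apply that, assuming a hypothetical `KNOWN_HTTP10_CODES` set (my reading of the codes listed in the HTTP/1.0 spec, not a description of any real crawler):

```python
# Assumption: the status codes an HTTP/1.0-only client recognizes,
# taken from the list in the HTTP/1.0 specification (RFC 1945).
KNOWN_HTTP10_CODES = {
    200, 201, 202, 204,
    301, 302, 304,
    400, 401, 403, 404,
    500, 501, 502, 503,
}


def effective_status(code, known=KNOWN_HTTP10_CODES):
    """Map a received status code to one the client understands.

    Known codes pass through unchanged; unknown ones fall back to
    the x00 code of their class (e.g. 410 becomes 400).
    """
    if code in known:
        return code
    return (code // 100) * 100
```

Under this sketch a 410 is handled as a plain 400, so such a client still sees a client error but loses the "permanent" meaning.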

I'd love to hear comments from any resident spider authors/administrators!

Jim

Mohamed_E

9:00 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Martin,

If you really want an authoritative answer, Google's Googlebot FAQ, Googlebot: Google's Web Crawler [google.com], suggests:

My Googlebot question is not answered here. Where do I send my question?

Please send questions regarding our Googlebot technology to googlebot@google.com.

I find it almost inconceivable that Google would not interpret such a useful response code; the fact that they call their agent HTTP/1.0 is, to my mind, irrelevant.

killroy

9:41 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sometimes I feel like totally giving up on "them" doing it right... then I just dynamically create a simple page with a notice that the resource is not available anymore and a link to "related" resources or the homepage. A simple meta robots noindex, nofollow seems to do a better job than a 404.

SN

PS: Funny thing, I was about to post a comment on the "new design" of webmasterworld with less clutter... when I wanted to post a reply I realised I wasn't logged in... I didn't remember what it looked like when not logged in...

Martin Dunst

11:56 am on Jun 8, 2003 (gmt 0)

10+ Year Member



Thank you everybody for your answers.

I think the 404 response might be the best option.

After all, it's not only about GoogleBot understanding the 410-response, it's also about other HTTP/1.0 clients.
I guess every HTTP-client deserving its name should know the 404 status code, so I prefer it over 410, which could be mis-interpreted by HTTP/1.0 clients (as jdMorgan suggested).

jdMorgan said:

I'd love to hear comments from any resident spider authors/administrators!

Me too.

regards
Martin

killroy

1:02 pm on Jun 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I've only written small spiders for my own use (but lots of those), and if you want a comment: put the "404 page not found" somewhere in the body text as well. Many small quick'n'dirty bots don't do separate header handling, but simply scan the body instead.

just my 2cents

SN

jaski

1:11 pm on Jun 8, 2003 (gmt 0)

10+ Year Member



404 is very well known .. and spiders will stop requesting it after a few times I guess.

annej

1:36 pm on Jun 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I finally got Google to drop my 404s with

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
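
For anyone generating such a notice page dynamically, as killroy described earlier in the thread, a minimal sketch could look like this. `gone_notice_page` and its parameters are hypothetical names chosen for the example; which status code to send along with the page is a separate decision that the posters here disagree on.

```python
from html import escape


def gone_notice_page(title, related_links):
    """Build a small HTML notice page that asks robots not to index it.

    related_links is a list of (url, label) pairs shown to the visitor.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for url, label in related_links
    )
    return f"""<!DOCTYPE html>
<html>
<head>
  <title>{escape(title)}</title>
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  <p>This resource is no longer available.</p>
  <ul>
{items}
  </ul>
</body>
</html>
"""
```

Calling `gone_notice_page("Product removed", [("/", "Homepage")])` returns the page as a string, ready to be sent as the response body.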

killroy

2:50 pm on Jun 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There ya go :)

SN