Let's assume we have a database driven online shop.
The product URLs all look like "http://www.example.com/productXYZ.html", where "XYZ" is the product's database id.
All documents are spidered / indexed perfectly.
When a product is deleted by the shop administrator, its database record is also deleted and the corresponding URL would henceforth remain unused. The resource is now gone, and it won't come back.
I thought that returning a "410 Gone" status code would be the right thing in this case.
Then again, 410 is an HTTP/1.1 status code, but GoogleBot is sending HTTP/1.0 requests.
It seems that sending a "404 Not Found" instead keeps GoogleBot coming back to check for the resource again and again.
Is there any way of letting GoogleBot know that the resource is _really_ gone for good?
regards
Martin
A Google site search of WebmasterWorld for "410 gone" google [google.com] will give you the collective wisdom of our community :)
The very first thread [webmasterworld.com] I clicked on had this as msg #12, from jdMorgan, whose knowledge and judgement I respect very much:
If you have removed a page, and have no replacement for it, then a 410-gone is the proper server response. If you use a 404, the spider will assume that your server is having problems, and will "give you a break" by trying to retrieve that file for a few months.
404-Not Found means the file was not found for unspecified reasons, but this condition is not necessarily permanent.
410-gone means it's really, really gone, and the condition is permanent.
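To make the 404 versus 410 distinction in that quote concrete, here is a minimal Python sketch for the shop described in the first post. It assumes the shop keeps a list of permanently deleted product ids (a "tombstone" list, which the original setup does not have, since the record is deleted outright). The function name, the URL pattern, and the id sets are all hypothetical.

```python
import re

# Assumed URL scheme from the first post: /product<ID>.html with a numeric id
PRODUCT_URL = re.compile(r"^/product(\d+)\.html$")


def status_for_product(path, live_ids, deleted_ids):
    """Return the HTTP status code to send for a requested path.

    live_ids    -- ids that currently exist in the shop database
    deleted_ids -- ids removed for good (assumed tombstone list)
    """
    match = PRODUCT_URL.match(path)
    if match is None:
        return 404  # not a product URL at all
    product_id = int(match.group(1))
    if product_id in live_ids:
        return 200  # product exists, serve the page
    if product_id in deleted_ids:
        return 410  # removed permanently, no replacement
    return 404  # never existed, or missing for an unknown reason
```

Without some record of which ids were deleted, the server cannot tell "gone for good" from "never existed", so it could only ever answer 404.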
I use Google for local searches much more often than the BB's own search function.
That's a very good point about 410-Gone being an HTTP/1.1 response code.
In the case of an HTTP/1.0 request for a missing resource, either a 404-Not Found or 403-Forbidden response would be correct.
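As a rough sketch of that idea in Python: pick the status for a permanently removed resource from the HTTP version in the request line. The function name is made up for illustration, and whether a server should vary its answer this way is exactly what this thread is debating.

```python
def missing_resource_status(request_line):
    """Pick a status code for a permanently removed resource.

    request_line is the first line of the request,
    e.g. "GET /product42.html HTTP/1.0".
    """
    parts = request_line.split()
    # A request line without a version token is treated as pre-1.1 here.
    version = parts[2] if len(parts) == 3 else "HTTP/0.9"
    # Only clients that announce HTTP/1.1 get the HTTP/1.1 status code.
    return 410 if version == "HTTP/1.1" else 404
```

For example, `missing_resource_status("GET /product42.html HTTP/1.0")` gives 404, while the same request sent as HTTP/1.1 gives 410.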
It's also interesting that mod_rewrite on Apache, although written in 1996, supports returning an HTTP/1.1 410-Gone status (HTTP/1.1 was first published as RFC 2068 in January 1997), and further, that it does so regardless of the request's HTTP version.
I'm still digging through specs right now... In line with the "robustness" required of clients and servers under HTTP, it seems to me that an HTTP/1.0 client receiving an HTTP/1.1 "enhanced" response should treat it as a member of the general status group it can recognize, i.e. treat 410 as 4xx, or 400.
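Here is a small Python sketch of how a client could apply that rule: any status code it does not recognize is treated as the x00 code of the same class. The set of "known" codes is my assumption of what a plain HTTP/1.0 client would understand, and the function name is hypothetical. This is not how any particular spider is known to behave.

```python
# Assumed set of status codes a plain HTTP/1.0 client recognizes
HTTP_1_0_CODES = frozenset({
    200, 201, 202, 204,
    301, 302, 304,
    400, 401, 403, 404,
    500, 501, 502, 503,
})


def interpret_status(code, known=HTTP_1_0_CODES):
    """Map an unrecognized status code onto the x00 code of its class."""
    if code in known:
        return code
    # Keep only the first digit: 410 -> 400, 307 -> 300
    return (code // 100) * 100
```

Under this rule a 410 is handled like a generic 400 client error, so the "permanent" meaning is lost, but the client still knows the request failed and that it was not a server fault.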
I'd love to hear comments from any resident spider authors/administrators!
Jim
If you really want an authoritative answer, Google's Googlebot FAQ (Googlebot: Google's Web Crawler [google.com]) suggests:
My Googlebot question is not answered here. Where do I send my question?
Please send questions regarding our Googlebot technology to googlebot@google.com.
I find it almost inconceivable that Google would not interpret such a useful response code; the fact that they call their agent HTTP/1.0 is, to my mind, irrelevant.
SN
PS: Funny thing, I was about to post a comment on the "new design" of webmasterworld with less clutter... when I wanted to post a reply I realised I wasn't logged in... I didn't remember what it looked like when not logged in...
I think the 404 response might be the best option.
After all, it's not only about GoogleBot understanding the 410 response, it's also about other HTTP/1.0 clients.
I guess every HTTP client deserving its name should know the 404 status code, so I prefer it over 410, which could be misinterpreted by HTTP/1.0 clients (as jdMorgan suggested).
jdMorgan said:
I'd love to hear comments from any resident spider authors/administrators!
Me too.
regards
Martin