
Does Googlebot honor 410 responses?

         

Mohamed_E

1:05 pm on Oct 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This question periodically comes up, and as far as I know has not been answered satisfactorily. I emailed googlebot@google.com and got a canned reply telling me to look at the FAQ, which, of course, tells me to email them if I do not find the answer there :(

The problem is simple. HTTP/1.1 (RFC 2616, hosted at w3.org) has introduced the nice 410 response:

10.4.11 410 Gone

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.

This, of course, is the ideal solution for those who want to delete pages. But Googlebot advertises herself as an HTTP/1.0 type of person.
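To make the quoted spec text concrete, here is a minimal sketch (not from the original thread) of a toy server that answers 410 for pages the owner has deliberately removed and 404 for everything else. It uses Python's standard `http.server` module. The `REMOVED_PATHS` set and the `status_for` helper are hypothetical illustrations, not part of any real site.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical set of pages the site owner has deliberately and permanently removed.
REMOVED_PATHS = {"/announce.html", "/2002news.html"}


def status_for(path: str) -> int:
    """Return 410 for intentionally removed pages and 404 for any other path."""
    return 410 if path in REMOVED_PATHS else 404


class GoneHandler(BaseHTTPRequestHandler):
    # Speak HTTP/1.1; Content-Length is always sent so keep-alive works.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Ignore any query string when looking the path up.
        code = status_for(urlsplit(self.path).path)
        body = b"410 Gone\n" if code == 410 else b"404 Not Found\n"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, start `HTTPServer(("127.0.0.1", 8000), GoneHandler).serve_forever()` and request `/announce.html`. Falling back to 404 for unknown paths follows the spec's advice to use 404 whenever the server cannot say whether a removal is permanent.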

Does anyone here (hint, hint :) ) have a definite answer to that question?

BlueSky

3:17 pm on Oct 2, 2003 (gmt 0)

10+ Year Member



Yes, Googlebot honors 410s. The bot spidered some pages I didn't want it to. After I served it 404s, the cached copies dropped, but the links stayed in the index for several weeks. I read jdMorgan's posts about using 410, so I tried that instead. The links dropped out within a couple of days of the pages being recrawled.

jdMorgan

11:28 pm on Oct 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mohamed_E,

This is a really good question. BlueSky's experience mirrors my own: Despite 410-Gone being an HTTP/1.1 response, it seems to be honored by many 'bots, even if they advertise HTTP/1.0.

If you are concerned with going 'by the book' with 404/410 responses, you can always use something like this to return 410 to HTTP/1.1 and higher user-agents only:


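# Respond with 410-Gone status to HTTP/1.1 requests for removed resources.
# Enable mod_rewrite (omit if it is already enabled earlier in this file).
RewriteEngine on
# Match only requests whose protocol version is HTTP/1.1 through HTTP/9.9...
RewriteCond %{THE_REQUEST} ^[^\ ]+\ [^\ ]+\ HTTP/(1\.[1-9]|[2-9]\.[0-9])
# ...and whose URI is one of the removed files.
RewriteCond %{REQUEST_URI} ^/(announce|sp_event/event1|sp_event/event2)\.html$ [OR]
RewriteCond %{REQUEST_URI} ^/(2002news|2002weather)\.html$
# [G] returns 410 Gone; "-" means no URL substitution.
RewriteRule .* - [G]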

This code checks for a request HTTP version from 1.1 through 9.9 before returning 410-Gone for the listed files. Otherwise, the rule falls through, and HTTP/1.0 requests for those (already deleted) files get the server's default 404 page, just as they would if the code above were not present.
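The sketch below is a minimal Python model of the rule's matching logic. It is not Apache itself; it only replays the same regular expressions. mod_rewrite ANDs RewriteCond lines by default, and [OR] joins a condition with the next one, so the block reads as: version check AND (first URI list OR second URI list). The function name `returns_gone` is hypothetical. Python's `re` treats these particular patterns the same way Apache's regex engine does, but the unescaped spaces here replace Apache's `\ `, which Apache needs because a bare space separates directive arguments.

```python
import re

# Same pattern as the THE_REQUEST condition: method, URI, then HTTP/1.1-1.9 or 2.0-9.9.
VERSION_RE = re.compile(r"^[^ ]+ [^ ]+ HTTP/(1\.[1-9]|[2-9]\.[0-9])")

# The two REQUEST_URI conditions that are joined by [OR].
GONE_URI_RES = [
    re.compile(r"^/(announce|sp_event/event1|sp_event/event2)\.html$"),
    re.compile(r"^/(2002news|2002weather)\.html$"),
]


def returns_gone(the_request: str, request_uri: str) -> bool:
    """True if the [G] rule would fire (410), False if the request falls through."""
    version_ok = VERSION_RE.match(the_request) is not None
    uri_listed = any(pattern.match(request_uri) for pattern in GONE_URI_RES)
    # Implicit AND between the version check and the OR-ed URI checks.
    return version_ok and uri_listed
```

For example, `returns_gone("GET /announce.html HTTP/1.1", "/announce.html")` is true, while the same request sent as `HTTP/1.0` falls through to the normal 404 handling.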

Jim