The '410 Gone' error message

"Remove All References to this Resource"

         

g1smd

9:20 am on Mar 24, 2010 (gmt 0)




There has been much previous discussion as to the merits of using '410 Gone' instead of using '404 Not Found' when a page of content is no longer available, and will not be coming back.

I've recently been looking at a CMS that had been modified to use the 410 status code for pages that have been deleted. Ordinarily, and especially for static sites, that would not be a problem.

Upon reading the verbose error message sent by the server used for this CMS, I have one possible concern when there are parameters involved in the request.

Before explaining the problem, you need to be aware that while most people would say that 'www.example.com/somepage.php?parameter=value is a URL', there is one very important point within the HTTP specs to take careful note of.

Technically the URL is just www.example.com/somepage.php and the appended parameter=value part is 'data to be used by the resource found at the aforementioned URL'.
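To make that split concrete, here is a minimal sketch using Python's standard urllib.parse module. The http:// scheme is an assumption added only so that the parser can recognise the host; the address is otherwise the one quoted above.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: a scheme is prepended so that urlsplit can identify the host.
parts = urlsplit("http://www.example.com/somepage.php?parameter=value")

# The host and path, i.e. the part described above as "the URL".
resource = parts.netloc + parts.path

# The appended parameters, parsed into a dictionary of value lists.
data = parse_qs(parts.query)

print(resource)  # www.example.com/somepage.php
print(data)      # {'parameter': ['value']}
```

The parser hands back the path and the query string as separate components, which is the distinction that matters for the rest of this post.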

So, when accessing the website in question, and asking for www.example.com/somepage.php?someparam=somevalue&page=23456 the returned status code was "HTTP/1.1 410 Gone" and the verbose error message said:


Gone

The requested resource /somepage.php is no longer available on this server and there is no forwarding address. Please remove all references to this resource.


Apache/2.2.2 (Fedora) Server at example.com Port 80


With a "404 Not Found" it would be www.example.com/somepage.php?someparam=somevalue&page=23456 that is Not Found.

With a "410 Gone" the message says that /somepage.php no longer exists, and to remove all references to it.

To me, that says "remove all references to /somepage.php whatever the attached parameters, not just the ones you requested now".

So, is there a danger that some spidering/indexing system will take that literally and remove "all" references? Is that what is actually intended, or am I reading too much into this?
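To show what is at stake in that question, here is a rough sketch of the two ways a crawler could act on the 410. Everything in it is hypothetical: the function names, the tiny index, and the extra URLs are made up for illustration, and it does not describe how any real search engine is known to behave.

```python
from urllib.parse import urlsplit

def drop_exact(index, gone_url):
    """Narrow reading: forget only the exact URL that returned 410."""
    return {url for url in index if url != gone_url}

def drop_by_path(index, gone_url):
    """Broad reading: forget every URL with the same host and path,
    whatever query string is attached."""
    gone = urlsplit(gone_url)
    return {
        url for url in index
        if (urlsplit(url).netloc, urlsplit(url).path) != (gone.netloc, gone.path)
    }

# Hypothetical index; only the first URL actually returned 410.
index = {
    "http://www.example.com/somepage.php?someparam=somevalue&page=23456",
    "http://www.example.com/somepage.php?someparam=somevalue&page=23457",
    "http://www.example.com/other.php",
}
gone = "http://www.example.com/somepage.php?someparam=somevalue&page=23456"

print(len(drop_exact(index, gone)))    # 2
print(len(drop_by_path(index, gone)))  # 1
```

Under the broad reading, the still-live page=23457 entry is dropped along with the deleted one, which is exactly the outcome I'm worried about.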

I'm not waiting around to find out: the CMS has now been converted to use www.example.com/something/34567-somepage/ style URLs, and old-format URLs are redirected to the new format.
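For anyone wanting to do something similar, the old-to-new mapping could be sketched roughly like this. This is not the CMS's actual code: the lookup table, the function name, and the assumption that the "page" parameter identifies the content are all mine, and a real system would read the mapping from its database.

```python
from urllib.parse import urlsplit, parse_qs

OLD_SCRIPT = "/somepage.php"

# Hypothetical lookup from old page id to new-style path.
NEW_PATHS = {"34567": "/something/34567-somepage/"}

def new_location(old_url):
    """Return the new-format path for an old-format URL, or None if
    the request is not old-format or the page id is unknown."""
    parts = urlsplit(old_url)
    if parts.path != OLD_SCRIPT:
        return None
    page_ids = parse_qs(parts.query).get("page")
    if not page_ids:
        return None
    return NEW_PATHS.get(page_ids[0])

print(new_location("http://www.example.com/somepage.php?someparam=somevalue&page=34567"))
# /something/34567-somepage/
```

The caller would then send a redirect to the returned path (a permanent one, I'd assume), and decide separately what to return when the result is None.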

jdMorgan

12:54 pm on Mar 24, 2010 (gmt 0)




The problem is in the ambiguity of the CMS's definition of (the scope of) the URL, as you summarized in your introduction above.

The "remove *all* references" bit in their error description text is simply a request to remove all links, listings, etc. that point to this (single) URL, because if this (single) URL no longer resolves, then *all* references to it will be obsolete.

So the word "all" refers to "all of your references" and not to "all possible query string variations of our defunct URL."

However, I'd agree that because most "Web people" probably don't realize that a query string attached to a URL is not technically part of that URL (because it specifies data, and not a Web "location"), it is potentially dangerous to return a 410 or 404 --or indeed, any "actionable error" response code-- unless you are sure that the client software will handle it properly.

If I had a major investment in a query-string-based site using that CMS and no option to switch to "static" URLs, I'd do some serious search-engine-behaviour testing on this subject -- and continue re-testing periodically as well. Or I'd keep a very close eye on what big companies like Amazon do with their removed resources.

Jim

g1smd

1:31 pm on Mar 24, 2010 (gmt 0)




Thanks Jim.

While the error message returned in this case is sent by the CMS for those requests, I note the exact same error message is sent directly by Apache 2.2.2 (Fedora) for other non-existent URLs requested from the same server.

I tested that by using this code:
RewriteRule ^test12345 - [G]

in the .htaccess file, and then requesting the URL example.com/test12345 from the server.

So, the 'problem' runs through to the core of Apache itself.

jdMorgan

1:48 pm on Mar 24, 2010 (gmt 0)




It's not so much a "problem" as an ambiguity in the scope of the word "URL." I think that most of the major search engines understand that they should use the wider scope --including the query string as part of the URL-- when removing references to resources.

Unfortunately, most search engine robots' "Webmaster help" pages are devoid of any such detailed information -- on this and on many other subjects.

Jim