Forum Moderators: phranque
Wow, I seem to have started quite a discussion. :)
So, all the 404s that GWT (Google Webmaster Tools) finds are for articles or news stories that have been deleted.
Is it better to return a 410 instead of a 404 for those? And how is that set up?
And is it possible to set it up so that visitors land on a 410 page and then, after a few seconds, get redirected to my homepage?
Seem to recall reading comments here at WW some years back, but I've slept since then and brain's fuzzy.
The check for the non-existent file is to make sure that the server returns a 404 for a file that doesn't exist (if the server returns a 200, then we have no way of knowing if the verification file actually exists or if the server just returns a 200 for everything).
Vanessa Fox
Unless GWT shows an error somewhere, I have no evidence that they make invalid requests on purpose.
In that case, 404 and 301 say the same thing: the link is no longer valid.
10.4.5 404 Not Found
[W3.org...]
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
10.4.11 410 Gone
[W3.org...]
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
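On the "how is that set up?" question, here is a minimal sketch for Apache. It assumes mod_alias is enabled. The paths `/news/old-story.html`, `/articles/2009/` and `/gone.html` are hypothetical placeholders, not URLs from this thread. Treat it as one possible setup rather than the only way to do it.

```apache
# Assumption: mod_alias is loaded; every path below is a hypothetical example.

# Return 410 Gone for a single deleted article (no target URL is given).
Redirect gone /news/old-story.html

# Return 410 Gone for every URL under a deleted section.
RedirectMatch gone ^/articles/2009/

# Optional: serve a custom page as the body of the 410 response.
# A local path (leading slash) keeps the 410 status in the response headers.
ErrorDocument 410 /gone.html
```

The custom page should live outside any path matched by the rules above, otherwise the request for the error page would itself be answered with a 410.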
10.3.2 301 Moved Permanently
[W3.org...]
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
The requested resource has been assigned a new permanent URI.
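When a deleted story does have a replacement, the 301 behaviour quoted above could be configured roughly like this in Apache. Again this assumes mod_alias, and both the path and the `www.example.com` hostname are made up for illustration.

```apache
# Assumption: mod_alias is loaded; path and hostname are hypothetical.
# Sends a 301 with a Location header pointing at the replacement page.
Redirect permanent /news/old-story.html https://www.example.com/news/new-story.html
```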
[edited by: pageoneresults at 4:38 pm (utc) on Nov 30, 2011]
It's a signal that there "used to be" a document at the location requested.
The website exposes specific content, so as the owner or webmaster you expect every single request to your website to relate to the content you expose, don't you?
I'd also be willing to set up a test page of 1,000 URIs (301 > 200) pointed to your site with my choice of path names and anchor text. Are you that sure that there would be "zero effect"? ;)
301 > 200 != 200
Okay, I'm not following all this, but what's wrong with using "ErrorDocument 404 siteindex.html"? That returns a proper 404 for the requested page (the one that doesn't exist), and then sends the user to the site index so they can find what they want.
(if you are using ErrorDocument then this "home page" lookalike would have to be pure html rather than generating the page dynamically)
ErrorDocument 404 /cgi-bin/bad_urls.pl
is to return a 404 response in the headers, with the content being the same or similar to your home page
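A hedged sketch of the ErrorDocument approach discussed above, assuming Apache and a hypothetical `/siteindex.html` under the DocumentRoot. The leading slash matters: as far as I know, Apache treats an argument that is neither a local path nor a full URL as literal message text rather than as a file to serve.

```apache
# Assumption: /siteindex.html is a hypothetical page under the DocumentRoot.

# Local path (leading slash): Apache serves this page as the response body
# while the status in the headers stays 404.
ErrorDocument 404 /siteindex.html

# Avoid a full URL here: Apache would answer with a redirect to that URL,
# so the client would not receive the original 404 status.
# ErrorDocument 404 https://www.example.com/siteindex.html
```

A local script path such as the `/cgi-bin/bad_urls.pl` line above fits the same pattern, since it also starts with a slash.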
These are basic protocols that have been established for many years
No indication is given of whether the condition is temporary or permanent.
Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval.
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise.
Google requests random URLs to test the response, and is expecting a 404
There is no documentation or reference that backs this up.
[edited by: pageoneresults at 4:22 pm (utc) on Dec 9, 2011]