jdMorgan - 2:22 pm on May 6, 2011 (gmt 0)
You have mentioned this several times now.
There is no such thing. 4xx codes are error codes.
Redirect codes are always 3xx.
Tell it to Apache [httpd.apache.org] ;)
The Apache documentation was written by humans, not deities. It is technically in error to call a 400-series error response a "redirect." To promote clarity, it should really be called an "internal rewrite," or at least an "internal redirect," to avoid confusion.
400-series error responses do not change the requested URL on the client side (watch your browser address bar during a 30x response, then compare to a 4xx response). Therefore, this is clearly not a "client redirect." Rather, a 4xx status code is sent, along with the content of the 4xx ErrorDocument (if one is defined) or with server-generated error-message text, depending on how you've configured the ErrorDocument directive.
If your address bar changes during 400-series error handling, it indicates that you have improperly defined the ErrorDocument as a full URL instead of as a local URL-path. This is one of the most common errors seen in error-handling configuration on Apache servers -- despite the fact that Apache warns about it in the ErrorDocument directive documentation. Briefly, use
ErrorDocument 404 /404-error-page.html
and never use
ErrorDocument 404 http://example.com/404-error-page.html
as the latter will result in a 302-Found client redirect response instead of the desired 404 error response.
Since a 302 redirect tells search engines that the document exists but has moved, this is a potentially serious problem.
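To illustrate the correct behavior, here is a minimal local sketch in Python (not Apache itself -- the handler, port, and /missing-page path are invented for this demo): a server that, like a properly configured local-path ErrorDocument, sends the custom error page body while keeping the 404 status, so the client never sees a redirect.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ERROR_BODY = b"<html><body>Custom 404 page</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Behaves like Apache with "ErrorDocument 404 /404-error-page.html":
        # the status code stays 404; only the response body is the custom page.
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(ERROR_BODY)))
        self.end_headers()
        self.wfile.write(ERROR_BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = None, None
try:
    urllib.request.urlopen(f"http://127.0.0.1:{port}/missing-page")
except urllib.error.HTTPError as e:
    # The client receives the 404 status AND the custom body -- no redirect,
    # so a browser's address bar would still show /missing-page.
    status, body = e.code, e.read()
finally:
    server.shutdown()

print(status)  # 404
```

With the full-URL form of ErrorDocument, the client would instead receive a 302 pointing at the error page's URL -- which is exactly the search-engine problem described above.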
On the robots.txt versus noindex issue, clarification is also needed:
If you block a resource URL-path (e.g. a page) using robots.txt, then that resource will not be fetched by any robots.txt-compliant robot. Therefore, its on-page meta-robots tag is irrelevant except in a "glare" situation such as the one IncrediBill described -- where the resource is fetched while you are in the middle of making robots.txt and on-page meta-robots changes, or while an error exists in the robots.txt file.
Resource URL-paths which are Disallowed by robots.txt will not be fetched by robots.txt-compliant robots, but they may still appear as URL-only listings in search results, based on incoming links from other pages.
If a resource URL-path is not Disallowed by robots.txt, then it may be fetched. If the page is marked "noindex," then it will not be placed in the search engine's index, and it will not appear in meta-robots-compliant search engine results.
If a page's URL-path is not Disallowed by robots.txt and the page is marked "nofollow," then the links on that page will not be followed.
So, there is a hierarchy between robots.txt and on-page meta-robots tags, and they do very different things.
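That hierarchy can be sketched with Python's standard-library robots.txt parser (the example.com paths here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A Disallowed path is never fetched by a compliant robot, so any
# meta-robots tag on that page is never even seen:
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False

# A path that is not Disallowed may be fetched; only then do the on-page
# "noindex"/"nofollow" directives come into play:
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```

Note that robots.txt only gates the fetch; it says nothing about indexing the URL itself, which is why Disallowed URLs can still show up as URL-only listings.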
If your server returns either a 404-Not Found or a 410-Gone response to a request, Google reports it as a 404-Not Found in their Webmaster Tools status reports. They really should fix this to promote clarity, but that's how it is for now.