| 3:14 pm on Oct 26, 2010 (gmt 0)|
or better even, how do you rewrite the wrongly entered url (eg. http://www.example.com/wrongurl/) to the custom 404 error page (eg. http://www.example.com/notfound.html) ?
| 3:24 pm on Oct 26, 2010 (gmt 0)|
One reason it's not trivial to do this is that it's such a bad idea.
For purposes of keeping your site's search ranking and not confusing visitors, you should resist the urge to "correct" all incorrect URLs, and simply improve your 404 page to be informative and useful.
A good 404 page explains (in a somewhat apologetic tone) that the requested resource could not be found for an unknown reason, and then presents helpful resources for the visitor to find what he/she was looking for.
These typically include a link to the home page, links to major "categories" or "sections" of the site, a link to your HTML site map, and a link to your site's search facility, as applicable.
You can, if you like, include a meta-refresh on this page to forward the user to your home page after sufficient time has been allowed for the user to read and completely understand the page and select one of the links provided. But don't be in a rush here. Allow 15 to 30 seconds -- enough time for a new, non-technical visitor to read the page, fully understand it, and make an informed choice.
If you set this meta-refresh time too short, then some search engines will treat it as a redirect, and you will get the same problems as described here for explicit redirects.
If you insist on redirecting requests for all missing resources to your home page, you will create an essentially 'infinite' URL-space, where requests for *any* URL that resolves to your server will be served the home page. The result will be duplicate content, and search engine spiders will arbitrarily limit the depth to which they are willing to crawl your site, since they will see that it has an infinite number of URLs on it. Neither of these is good.
It is true that most major search engines have methods to avoid these problems, but I have never been one to rely on their algorithms to be 'perfect and fault-free' 100% of the time. If you set up your server correctly and in compliance with the requirements and intent of the HTTP protocol [w3.org], then you simply don't have to worry about whether each search engine can compensate for the problems in your server configuration and "figure it out."
Note that for resources which are intentionally removed, a 410-Gone response (and error page) is the correct approach. The 410-Gone error page can be mostly identical to the 404 error page, except that it should state that the requested resource has been intentionally removed, rather than being "missing for an unknown reason."
Implementing 410-Gone requires that you keep a list of intentionally-removed resources (for example, as part of your .htaccess file or in your "main script"). Then, when you get a request for one of them, your code "knows" that a 410 response should be served, either by using a RewriteRule or by generating a 410 page and response header in your script(s). This presumes that your site is stable and well-administered, so that you have (and allow) only a very few URLs which must be removed over the life of the site.
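As a rough sketch of the .htaccess variant described above, assuming mod_alias and mod_rewrite are both enabled, the list of removed resources might look something like this. The paths /old-promo.html and discontinued/, and the error page /gone.html, are hypothetical placeholders and not taken from any real site.

```apache
# Hypothetical error page shown for intentionally removed resources
ErrorDocument 410 /gone.html

# mod_alias style: one line per removed URL-path (path is an assumption)
Redirect gone /old-promo.html

# mod_rewrite style: the [G] flag returns 410 Gone for anything
# under the (hypothetical) discontinued/ directory
RewriteEngine On
RewriteRule ^discontinued/ - [G]
```

In practice it is probably cleaner to pick one of the two styles rather than mixing mod_alias and mod_rewrite directives in the same file.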
Nothing in the above should be construed to mean that you cannot "correct" minor common typos in requested URLs and redirect them to the correct page. Useful corrections include fixing ".htm" when ".html" is needed, removing "punctuation" from the end of requested URLs such as "my-page.php." or "my-page.php," when this occurs due to poorly-coded link-posting software in forums, blogs, etc., and redirecting requests for resources for which you have knowingly changed the URL. The point is that if a requested URL is completely unresolvable, then you should not just redirect it to your home page. Reserve redirects for replacing only those URLs for which you know the exact, correct, relevant, and unique replacement URL, as intended by the HTTP protocol.
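The kinds of narrow corrections listed above could be sketched in .htaccess roughly as follows. This assumes mod_rewrite is enabled, that the file sits in the document root, and that the site's canonical host is www.example.com. The old-page.html and new-page.html names are made up for illustration.

```apache
RewriteEngine On

# Strip trailing punctuation such as "my-page.php." or "my-page.php,"
RewriteRule ^(.+\.php)[.,]+$ http://www.example.com/$1 [R=301,L]

# Redirect ".htm" to ".html", but only when the ".html" file really exists,
# so that unresolvable URLs still fall through to a 404
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.+)\.htm$ http://www.example.com/$1.html [R=301,L]

# A page whose URL was knowingly changed (both names are hypothetical)
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]
```

Each rule targets one known mistake and redirects to one known replacement. Anything that matches none of them is left alone and receives the normal 404 response.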
| 8:10 am on Oct 27, 2010 (gmt 0)|
thanks for the informative reply, i like the meta refresh option, but i would rather redirect to the 404 page (/notfound.html), since as you explained it's a bad idea to go to the homepage.
The 404 page i designed is very informative and has all the most important links present
What i wanna ask is, although it's a bad idea because of the 'infinite' url space that will be created, can the below redirect be done in apache ?
Redirect ALL 404 pages to http://www.example.com/notfound.html without using the meta-refresh option ?
| 8:38 am on Oct 27, 2010 (gmt 0)|
When requesting a page that does not exist, your server configuration should already be set up so that the contents of the file "notfound.html" are displayed.
This text should be displayed at the originally requested URL, and the server will automatically deliver the correct "HTTP/1.1 404 Not Found" status line in the HTTP response.
The 404 status code in the HTTP header signals that there is nothing at the URL. The page of error message text is merely a hint for the human.
It would be a bad idea to redirect to example.com/notfound.html and for the browser URL bar to show this URL, because the error message page would then be delivered with a "200 OK" HTTP response code. By its very nature, the "not found" error page MUST be delivered with a "404" HTTP status code.
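To illustrate the difference, here is a minimal sketch of the two forms the ErrorDocument directive can take, using the /notfound.html page from this thread. Per the Apache documentation, a local URL-path keeps the original status code, while a full URL makes Apache send the client a redirect instead.

```apache
# Local URL-path: Apache serves the page internally at the originally
# requested URL and keeps the 404 status code.
ErrorDocument 404 /notfound.html

# Full URL (shown commented out): Apache would send the client a redirect
# to this address, so the original request never receives a 404.
# ErrorDocument 404 http://www.example.com/notfound.html
```

Only the first form delivers the error page with the 404 status on the URL that was actually requested.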
| 9:03 am on Oct 27, 2010 (gmt 0)|
i see what ur saying, thanks for the help
| 12:30 pm on Oct 27, 2010 (gmt 0)|
The HTTP protocol document that I linked to above should be treated as "the law." In some cases, your site can break the law and get away with it. In other cases, breaking the law will result in your site getting the death penalty. So, it is not good to be ignorant of the law or to break it.
HTTP requires that certain server responses occur under certain circumstances, and these responses form the very foundation of how the Web works. If you return a 301, 302, 303, or 200 response when a 404 or 410 is called for, you should expect to have serious problems with both users and with search engines.
Returning incorrect server responses is no more safe than driving the wrong way on a high-speed road... and usually has a similarly-unpleasant result.
| 12:55 pm on Oct 27, 2010 (gmt 0)|
Noted ! thanks webmasters I understand that concept now
| 9:11 am on Oct 28, 2010 (gmt 0)|
although it's a bad idea, can't you give me the htaccess code to do this anyway ?
| 11:35 am on Oct 28, 2010 (gmt 0)|
When someone announces they want to shoot themselves in the foot, my inclination is to hide the gun.
Other members may disagree.
| 1:29 pm on Oct 28, 2010 (gmt 0)|
lol, it's just for sake of learning more about htaccess rewrites, apache and everything that goes with it, good or bad
| 12:25 pm on Nov 3, 2010 (gmt 0)|
and... how about doing this:
normal htaccess 404 rule:
ErrorDocument 404 /notfound.php
but put header redirect in the php file. It will then redirect to a dedicated 404 page (/notfound.php)
| 1:49 pm on Nov 3, 2010 (gmt 0)|
This forum's Library docs (linked above) and gazillion posts, plus online Apache resources, etc., contain darn near everything you need to learn about htaccess, apache, mod_rewrite, etc.
(Aside to g1smd: Amen:)
| 7:32 pm on Nov 3, 2010 (gmt 0)|
Yes, but a redirect sends a 301, 302, or 307 code in response to the request. That is, a redirect does not send a 404 HTTP response code.
"but put header redirect in the php file. It will then redirect to a dedicated 404 page"
It might be that when the browser makes a new request for the new URL mentioned in the redirect header, it then receives a 404 header for this second request, but that is not at all the same thing as getting the 404 error code returned for the very first URL request.
| 10:23 am on Nov 4, 2010 (gmt 0)|
i tested now, this is what happened
1. htaccess tells 404 error page to display notfound.php
2. then php tells browser to redirect to notfound.php using header redirect
3. then the server returns a 200 OK response because notfound.php is actually found afterwards
i need the server to return a 404 error response only when the browser displays notfound.php because of a broken link, and not when it is accessed directly.
almost like an if-statement
| 5:58 pm on Nov 29, 2010 (gmt 0)|
You're still trying to shoot yourself in the foot...
This is all you need:
ErrorDocument 404 /notfound.php
Then the script at /notfound.php should generate a nice "resource not found" page and help the user find what they wanted.
If you insist on *any* kind of redirect, then you likely won't have any users after a few months, because you will have created such a huge duplicate-content mess that Google and Bing will likely just decide that it's easier to ignore your site than to try to index your infinite URL-space.
Your competitors can help you along to total obscurity by linking to as many non-existent URLs on your site as they feel they can get away with. Since your site will then return a 200-OK response, this will help the search engines decide to kick you out of their indices.
Really, if you want success, stick with the HTTP protocol requirements and find a compliant way to achieve whatever goal it is that continues to tempt you to hurt yourself. The correct response to a request for a missing resource is a 404-Not Found. The correct response to a request for an intentionally-removed resource is a 410-Gone. Only in the case where a typo in a powerful incoming link causes traffic for a high-traffic page to be lost should a 301 redirect to the correct page be used. And the code for that depends on *exactly* how the link is "wrong," and so can't be given without concrete examples to work from.
| 8:08 am on Nov 30, 2010 (gmt 0)|
thank you, this thread was a good learning curve
"ErrorDocument 404 /notfound.php" it will be !