Problem using ErrorDocument 404 to redirect to PHP script

404 response triggers BHO Spyware/Adware detection

         

justinanderson

6:51 pm on Jul 8, 2004 (gmt 0)

10+ Year Member



I am pretty new to Apache and have written an .htaccess file to redirect 404 errors to index.php in the website's root folder.

Here is what I have in the .htaccess file:
----------------------------------------------
ErrorDocument 404 /index.php
----------------------------------------------

The purpose is to allow virtual folders like 'www.mydomain.com/anything' without needing a real folder called 'anything', so that index.php can use the folder name in a database query to track page hits.

This works great except when the visitor's browser has a Browser Helper Object (BHO) installed that detects that the page came from a 404 response and substitutes its own little search page saying "oops.. we couldn't find the page, search our ads!" This is becoming more and more of an issue in my opinion, which is why I am looking for a solution now.

So my question is...
"Is there a method of ErrorDocument Handling that will allow Redirection from a 404 Error to a script on the same server without allowing the client's browser to know it is an Error 404 so it can't override the redirection?"

My thoughts...
"I think it is receiving the 'Error 404' part in a header while it is redirecting.. can that header be shut off so the client's browser can't receive it?"

"Is it possible that I have overlooked something in the .htaccess file, or written something incorrectly?"

Thanks!
Justin Anderson

[edited by: jdMorgan at 7:31 pm (utc) on July 8, 2004]
[edit reason] No sigs, please... See TOS [/edit]

jdMorgan

7:50 pm on Jul 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Justin,

Welcome to WebmasterWorld [webmasterworld.com]!

I'd suggest that you replace the functionality of ErrorDocument (as you are using it) with mod_rewrite.


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . /index.php

This will internally rewrite any request for a non-existent resource to /index.php. Since this is an internal rewrite, the client is unaware of it. As written, it won't rewrite requests for existing directories, nor will it rewrite requests for existing files (such as index.php itself). If this does not fully meet your needs, see the references below for more options.
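
As a hedged sketch of one variation on the rules above: if index.php needs the requested folder name, the rule can capture the path and hand it over as a query-string parameter. The parameter name "folder" is an assumption made for illustration and does not come from this thread; everything else mirrors the rules already shown.

```apache
Options +FollowSymLinks
RewriteEngine on
# Leave requests for existing directories and files alone
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# Capture the requested path and pass it to the script.
# "folder" is a hypothetical parameter name; QSA keeps any original query string.
RewriteRule ^(.+)$ /index.php?folder=$1 [QSA,L]
```

With this in place, a request for /anything would be served internally by /index.php?folder=anything while the browser still sees /anything. Alternatively, the original rules can stay as they are, since PHP normally still exposes the originally requested URL in $_SERVER['REQUEST_URI'] after an internal rewrite.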

ErrorDocument 404 should only be used for reporting missing resources to the client.

Apache mod_rewrite documentation [httpd.apache.org]
Apache URL Rewriting Guide [httpd.apache.org]
Regular Expressions Tutorial [etext.lib.virginia.edu]

Jim

justinanderson

8:23 pm on Jul 8, 2004 (gmt 0)

10+ Year Member



KICK A$$!

It worked!

Thank you Jim!

phpmaven

5:25 am on Jul 9, 2004 (gmt 0)

10+ Year Member



Since I see you are using PHP, another option would be to use PHP's header() function so that the browser or spider never gets a 404. For example:

header("HTTP/1.1 200 OK");

Then display your content. To the outside world it will look like the page was there.

jdMorgan

2:15 pm on Jul 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> the Browser/Spider will never get a 404.

That's really quite bad. If a spider gets a 200-OK response for *anything* it requests from your site, you're likely to end up with duplicate-content problems, a bunch of unnecessary/non-optimal URLs listed in the SERPs, good URLs not indexed, and a limited crawl depth on your site. Spiders fear these spider-trap sites where the URL-space is infinite, and will often arbitrarily limit their crawl depth on such sites.

Best practices require returning a 404-Not Found response for any URL that does not locate unique content on your site. It does not matter what technology you use, whether it's ErrorDocument 404 or PHP writing the 404 response. Also note that HTTP/1.1 allows you to respond with 410-Gone, which unambiguously flags intentionally-removed content.
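
A minimal sketch of how these two responses might be configured in .htaccess. The paths /notfound.php and /old-page.html are hypothetical examples, not taken from this thread.

```apache
# Serve a custom page for missing resources while keeping the 404 status.
# Use a local URL-path here: a full http:// URL makes Apache send a redirect,
# so the client never sees the original 404 status.
ErrorDocument 404 /notfound.php

# Answer 410 Gone for content that was removed on purpose (mod_alias).
Redirect gone /old-page.html
```

If every non-existent URL is instead rewritten to a script, as described earlier in the thread, these directives alone are not enough: the script itself then has to send the 404 status for names it does not recognize, for example with header("HTTP/1.1 404 Not Found"); in PHP.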

Jim

phpmaven

8:07 pm on Jul 9, 2004 (gmt 0)

10+ Year Member



> That's really quite bad. If a spider gets a 200-OK response for *anything* it requests from your site, you're likely to end up with duplicate-content problems

Your point is well taken. I was certainly not suggesting that you just send a 200 header on every request for a page. I was merely trying to point out that if you were going to use the "ErrorDocument 404 /index.php to send every request to a monolithic do-everything script" method, you would need to issue the proper headers. I could have explained myself a bit better. Obviously your code would have to be well written to avoid the pitfalls you mentioned.

As you pointed out yourself, this is not a good way to design your site. I tried this method myself when I first started using PHP and ran into many problems. I now use mod_rewrite to parse static URLs to dynamic ones and it works very well. It did take me many hours of banging my head against the wall before I got it working correctly though ;-)
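
For illustration only, here is one way such a static-to-dynamic mapping might look. The URL layout and the script name product.php are hypothetical and are not the rules actually used on my site.

```apache
Options +FollowSymLinks
RewriteEngine on
# Map a static-looking URL such as /products/42.html
# onto the dynamic script /product.php?id=42 (internal rewrite).
RewriteRule ^products/([0-9]+)\.html$ /product.php?id=$1 [L]
```

Because the pattern only matches numeric IDs under /products/, any other URL falls through to normal handling and can still produce a 404, which keeps the URL space finite.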