404 problem

404 ErrorDocument

         

bstras32

6:54 pm on Mar 17, 2010 (gmt 0)

10+ Year Member



I recently got my rewrite working, and now my 404 error page no longer works. Here is my code:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /WebFrame/index\.php\?page=([a-z0-9]+) [NC,OR]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /WebFrame/\?page=([a-z0-9]+) [NC]
RewriteRule (.*) http://localhost/WebFrame/%1.html? [R=301,L]

If I type a wrong url in the browser I get a warning:

Warning: include(Page4456.php) [function.include]: failed to open stream: No such file or directory in C:\wamp\www\WebFrame\files\pageManager.php on line 17

Now if I comment out the code I just wrote, I get the 404 error. Is there any way to fix this in .htaccess? I tried to use file_exists() in my PHP script but I still get the warnings.

[edited by: jdMorgan at 7:11 pm (utc) on Mar 17, 2010]
[edit reason] De-linked localhost url for clarity. [/edit]

jdMorgan

7:12 pm on Mar 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code above does a redirect to a .html URL, so it is unclear how or why you get a PHP warning. Do you also have an internal RewriteRule that rewrites .html requests to .php? If so, we will need to see that code as well.

Jim

bstras32

7:39 pm on Mar 17, 2010 (gmt 0)

10+ Year Member



Here was my other RewriteRule:
RewriteRule (.*)\.html index.php?page=$1

So I am using PHP and just rewriting the URLs to .html. Not sure if this is the right way of doing it. I tried just keeping it as PHP and stripping off the query, but I think I am getting an infinite loop.

bstras32

7:40 pm on Mar 17, 2010 (gmt 0)

10+ Year Member



Sorry for the spelling....

bstras32

8:17 pm on Mar 17, 2010 (gmt 0)

10+ Year Member



It seems that if I request a wrong URL ending in .php I get the 404 error, but I get a warning if it ends in .html. Would it be easier if I did not convert it to HTML? If I don't convert it, how do I change this URL:

index.php?page=Page1

to

Page1.php

Here is the code I used:

RewriteRule (.*)\.php index.php?page=$1

I've tried it a bunch of ways but keep getting the infinite loop.

jdMorgan

3:00 am on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't panic.

Don't change your whole design approach because of a minor problem. You remain in charge, and the server must do your bidding, not the reverse...

First steps:

With your internal rewrite rule, you are rewriting all requests for .html page URLs to the index.php file. Therefore, index.php decides "what exists" and what does not -- likely based upon your database (or hard-coded page names in the script). On your server, with that internal rewrite in place, *all* .html page URLs exist, because they all resolve to the file index.php, which exists.

Therefore, your PHP script must handle all 404 errors for all "so-called virtual/dynamic HTML pages" that do not exist because your script cannot produce a page for that URL. In fact, only your script can decide what pages do or do not exist regardless of "file extension", because it is tasked with producing them.
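As a rough sketch of what "the script decides what exists" could look like: the helper name resolvePage and the idea of passing in a pages directory are assumptions for illustration, not the poster's actual pageManager.php. The function builds an absolute path, so it does not depend on the current working directory, and it rejects anything that is not a plain page name.

```php
<?php
// Hypothetical helper: map a requested page name to a real file, or return
// null when the script cannot produce a page for that name.
// $pagesDir and the ".php" naming scheme are assumptions, not from the thread.
function resolvePage($page, $pagesDir)
{
    // Accept only simple names, so the value cannot point outside $pagesDir.
    if (!is_string($page) || !preg_match('/^[A-Za-z0-9]+$/', $page)) {
        return null;
    }

    // Build a full path rather than relying on the include path.
    $file = rtrim($pagesDir, '/\\') . '/' . $page . '.php';

    return is_file($file) ? $file : null;
}
```

A caller would pass the rewritten query value, for example resolvePage($_GET['page'], dirname(__FILE__) . '/files'), and treat a null result as "this URL does not exist".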

One other thing: I'm not sure if this is even related, but sometimes these odd-ball problems are caused because the server has MultiViews (content-negotiation) enabled by default. If you don't use them, turn them off, as this can simplify "404 problems" and save some otherwise-wasted CPU time as well. Use
 Options -MultiViews 

or combine this with your existing Options directive, e.g.
 Options +FollowSymLinks -Indexes -MultiViews 


Jim

jdMorgan

3:07 am on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, if you're on Apache 2.0+, you might want to disable AcceptPathInfo as well, unless you use it.
 AcceptPathInfo Off 


This can cause problems in the case where the requested URL does not exist, but a shortened version of that URL does exist. It's easy to hit the case where a non-existent URL gets "rewritten" by AcceptPathInfo to a shorter URL which in turn resolves to your script through another mechanism, such as mod_dir's DirectoryIndex directive. Since the original URL is truncated, with the removed part moved into the PATH_INFO variable, this can often cause unexpected results if you're not even aware that this function is active.

Jim

bstras32

12:55 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



Thanks jd, I finally got it working. I was able to get file_exists() working; it turned out I wasn't looking in the right directory for the file. So now my script looks for the file and, if it doesn't exist, sends the user to the 404 page. On the other hand, if someone tries to enter "Page304.php" and it doesn't exist the .htaccess will redirect to the 404 page.

Just wanted to say this is a great forum! Much better than Experts Exchange.

g1smd

3:26 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am concerned about the words "redirects to the 404 page".

Please use "Live HTTP Headers" to examine the HTTP request and reply. Ensure the FIRST response contains the 404 status. If you see a 301 or 302 status, or anything else other than 404, then there is another fatal problem to fix.

jdMorgan

3:41 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> if someone tries to enter "Page304.php" and it doesn't exist the .htaccess will redirect to the 404 page.

I hope that this is just loose usage of the terminology, here... Be *very* sure that no redirect is involved, and that a 404-Not Found response is sent directly. If any 301, 302, or 303 redirect responses are being sent, this is what we call "SEO suicide." Verify with the "Live HTTP Headers" add-on for Firefox/Mozilla browsers (or a similar add-on or tool).
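For a scripted check alongside the browser add-on, something along these lines may help. It is only a sketch: firstStatusCode and checkUrl are made-up names, and it relies on PHP's built-in get_headers(), which as far as I know needs allow_url_fopen enabled and returns the headers of every response in a redirect chain in order, so element 0 is the status line of the first response.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the first status
// line in an array shaped like the return value of get_headers().
function firstStatusCode(array $headers)
{
    if (isset($headers[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical wrapper: fetch the headers for $url and report the status of
// the FIRST response, or null if the request could not be made.
function checkUrl($url)
{
    $headers = @get_headers($url);
    return ($headers === false) ? null : firstStatusCode($headers);
}
```

Calling checkUrl() on a URL that should not exist ought to give 404; a 301 or 302 there would mean a redirect is being sent first.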

Jim

jdMorgan

3:42 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And that's what I get for leaving a thread open in my browser for too long... :)

Jim

g1smd

4:58 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That happens to me on a regular basis, too. :)

bstras32

6:50 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



Thanks for all the great advice! I think I just misspoke in haste. Basically I check to see if the file exists and if it doesn't then I utilize my Error.php file. Here is my result from Live HTTP Headers, let me know what you think:

[localhost...]

GET /WebFrame/Page4456.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: [localhost...]

HTTP/1.1 200 OK
Date: Thu, 18 Mar 2010 18:44:33 GMT
Server: Apache/2.2.11 (Win32) PHP/5.2.9-1
X-Powered-By: PHP/5.2.9-1
Content-Length: 1675
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

g1smd

7:09 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The '200 OK' says the file exists. It failed to send the correct 404 status code.

Your script should be sending an HTTP 404 header before sending the error message out.

bstras32

7:50 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



That's because my script checks the file directory, and if the file doesn't exist it sends them to Error.php, which does exist.

bstras32

7:54 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



Rather than doing it that way should I be trying to actually send a 404 status code?

bstras32

8:09 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



Just so there is no confusion, here is my PHP:

if (file_exists($filename)) {
    include($page.".php");
    $page1 = new $page;
} else {
    include("Error.php");
    $page1 = new Error;
    header('HTTP/1.1 404 Not Found');
}

Is this what you guys are getting at? That I do actually need to specify the header is a 404?
I added the header and now my HTTP header looks like this:

[localhost...]

GET /WebFrame/Page4456.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cache-Control: max-age=0

HTTP/1.1 404 Not Found
Date: Thu, 18 Mar 2010 20:04:05 GMT
Server: Apache/2.2.11 (Win32) PHP/5.2.9-1
X-Powered-By: PHP/5.2.9-1
Content-Length: 1675
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

g1smd

8:44 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, if a content page does not exist, the only way for the UA to know that it does not exist is for your server to send the 404/Not Found or 410/Gone status code.

Looking at the code, I think you need to move the HEADER line so that it is above the INCLUDE line. I don't know what the other line is for ($page1).


This is why the script needs to first analyse the URL request and check the content exists, next send any HTTP headers, then finally start sending HTML code and content. Once you have sent even a single character out to the browser, it is too late to send any HTTP headers.
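A minimal sketch of that ordering, assuming a front controller that already knows which file it wants to serve. The names planResponse and respond are invented for illustration, and the error file path is whatever your own script uses.

```php
<?php
// Hypothetical helper: decide the status line and the file that will produce
// the body, without sending anything to the browser yet.
function planResponse($pageFile, $errorFile)
{
    if (is_string($pageFile) && is_file($pageFile)) {
        return array('HTTP/1.1 200 OK', $pageFile);
    }
    return array('HTTP/1.1 404 Not Found', $errorFile);
}

// Hypothetical front-controller step, in the order described above.
function respond($pageFile, $errorFile)
{
    // 1. Analyse the request first, while no output has been produced.
    list($statusLine, $bodyFile) = planResponse($pageFile, $errorFile);

    // 2. Send the status line; this only works before any output is sent.
    header($statusLine);

    // 3. Only now let the included file emit HTML.
    include $bodyFile;
}
```

Keeping the decision in its own function means it can be checked without producing any output at all.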

Anytime I see a PHP script that starts off with the HTML DOCTYPE and <html> tag, way before it does any sort of looking at GET parameters, or way before opening the database and fetching stuff, I know the site will fail to provide the correct handling of Not Found URL requests. I see this problem very very often.

Many popular products are riddled with these problems, redirecting to a new URL to show an error message instead of sending the correct HTTP header (usually 404 or 410) directly followed by the appropriate content (usually some sort of error message and links to appropriate parts of the site).

Should you ever come across any of my coding, you would find the sending of DOCTYPE and start of the HTML page is usually at least 60 to 80% of the way through the PHP code. :)

bstras32

10:38 pm on Mar 18, 2010 (gmt 0)

10+ Year Member



Thanks for all the help. It makes sense, and I will be sure to add the header before I do anything.