Forum Moderators: coopster


Check headers returned by my pages

My custom 404 page created an endless loop

         

barns101

12:36 am on Jun 16, 2006 (gmt 0)

10+ Year Member



I get a few requests for URLs that have been URL encoded. My mod_rewrite rules do not match these, so they generate a 404 Not Found error. So I decided that my custom 404 error page could URL decode the URI, check whether the decoded URI would return a 200 OK header and, if so, offer the correct link to the visitor.

Not knowing the best way to check the headers returned by a URI on my own site, I tried fsockopen(). The 404 error page URL decoded my test URI and offered the correct link. However, I realised that truly non-existent pages would create an endless loop, because the 404 error page would request the URI again, which would trigger another 404 error page, and so on...

(Don't I feel stupid! ;) )

I'm using mod_rewrite, so the pages don't exist as files and file_exists() is not an option. Is there a way to do what I'm trying to achieve without creating an endless loop?

coopster

1:10 am on Jun 18, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



If you are getting to your 404 page in the first place, does this not mean that the page wasn't found? Can you clarify your post a bit? I think that it may be a bit confusing and therefore others aren't certain how to respond.

barns101

12:50 pm on Jun 19, 2006 (gmt 0)

10+ Year Member



Hi coopster, thanks for responding.

I have a "virtual" URL (http://www.example.com/city-centre/banker's-draft/) which mod_rewrite rewrites to http://www.example.com/search.php?area=city-centre&pub=banker's-draft and the search script then searches the database based on the area and pub name provided.

That's all well and good. But a small number of spiders and browsers (e.g. Opera) URL encode the apostrophe and request [example...]

My mod_rewrite is set up to match a-z, 0-9, apostrophes, ampersands, full stops (periods) and hyphens like so:

RewriteRule ^([a-z-]+)/([a-z'0-9&\.-]+)/$ search.php?area=$1&name=$2 [NC,L]

When a request is made that has been URL encoded, mod_rewrite fails to make a match and so a 404 error is generated.

I thought that it would be a good idea to use a custom 404 error page (a PHP script) to URL decode the $REQUEST_URI and open a socket to request the decoded URL, to see whether mod_rewrite would match it and a 200 OK header would come back.

That proved successful for URLs that had been URL encoded (like in the example above). However, for truly non-existent files (e.g. http://www.example.com/load_of_junk.htm) a loop would be created. This was because of my faulty logic!

Upon requesting http://www.example.com/load_of_junk.htm a 404 would be returned and the 404 script would URL decode that URI and try to request the decoded URI again. As the page really doesn't exist, this would trigger another 404 error and another instance of the script would URL decode the address and request it again... And so this would theoretically go on forever.

All I really want to do is check whether a URL will be matched with mod_rewrite and return a 200 OK header without triggering an endless chain of 404s!
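One way to make that check without any second request is to test the decoded URI against the same pattern inside the 404 script itself. The following is only a rough sketch under a few assumptions: the function name suggestDecodedPath is made up for illustration, the pattern is copied from the RewriteRule above and applied to the path without its leading slash (as a rule in .htaccess would see it), and the encoded apostrophe is assumed to arrive as %27.

```php
<?php
// Hypothetical helper for the custom 404 script: returns a corrected path to
// offer the visitor, or null when a plain 404 should be shown.
function suggestDecodedPath(string $requestUri): ?string
{
    // Drop any query string, then decode percent-escapes such as %27.
    $path    = (string) parse_url($requestUri, PHP_URL_PATH);
    $decoded = rawurldecode($path);

    // Loop guard: if decoding changed nothing, the URL was not encoded,
    // so there is nothing new to try.
    if ($decoded === $path) {
        return null;
    }

    // Test against the same pattern as the RewriteRule (the "i" flag stands
    // in for [NC]) instead of making an HTTP request back to this server.
    $pattern = '~^([a-z-]+)/([a-z\'0-9&\.-]+)/$~i';

    return preg_match($pattern, ltrim($decoded, '/')) === 1 ? $decoded : null;
}

var_dump(suggestDecodedPath('/city-centre/banker%27s-draft/')); // "/city-centre/banker's-draft/"
var_dump(suggestDecodedPath('/load_of_junk.htm'));              // NULL
```

A match here only says that mod_rewrite would hand the decoded URL to search.php, not that the search finds anything, so the script would still have to cope with an empty result. The pattern also has to be kept in step with the RewriteRule by hand.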

Sorry for the long post and I hope that it makes more sense now!
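
If the socket check is kept, the chain described above can probably be cut by marking the request that the 404 script makes, so that a second instance of the script recognises the mark and stops. A minimal sketch, assuming a made-up header name (X-404-Probe), plain HTTP on port 80, and hypothetical function names throughout:

```php
<?php
// Hypothetical marker header; any name not used elsewhere would do.
const PROBE_HEADER = 'X-404-Probe';

// True when the current request was made by the 404 script itself.
// PHP exposes the request header X-404-Probe as $_SERVER['HTTP_X_404_PROBE'].
function isProbeRequest(array $server): bool
{
    return isset($server['HTTP_X_404_PROBE']);
}

// Build a HEAD request that carries the marker header.
function buildProbeRequest(string $host, string $path): string
{
    return "HEAD {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . PROBE_HEADER . ": 1\r\n"
         . "Connection: close\r\n\r\n";
}

// Extract the numeric status from a line such as "HTTP/1.1 404 Not Found".
function parseStatusCode(string $statusLine): ?int
{
    return preg_match('~^HTTP/\d\.\d\s+(\d{3})~', $statusLine, $m) === 1
        ? (int) $m[1]
        : null;
}

// Send the probe and return the status code, or null if the request failed.
function probeStatus(string $host, string $path): ?int
{
    $socket = @fsockopen($host, 80, $errno, $errstr, 5);
    if ($socket === false) {
        return null;
    }
    fwrite($socket, buildProbeRequest($host, $path));
    $statusLine = fgets($socket);
    fclose($socket);

    return $statusLine === false ? null : parseStatusCode($statusLine);
}
```

The 404 script would call isProbeRequest($_SERVER) first and, when it returns true, send the plain 404 without probing again. Together with only probing when the decoded URI differs from the requested one, that limits each visitor request to at most one extra request. Paths containing spaces or other characters that are not valid in a request line would need re-encoding before being passed to buildProbeRequest().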