ReWrite PHP to HTML, and return 404 on .php


expresspotato

6:06 am on Oct 8, 2009 (gmt 0)

10+ Year Member



Hi,

My site uses PHP; however, I have written the following rewrite rule, which lets me link directly to .html pages:


RewriteEngine On
RewriteBase /forums
#
RewriteRule ^(.*)\.html $1.php [L]

Now I would like any request for a .php page to return a 404.

Trying


RewriteRule ^(.*)\.php $.h [R=404,S=1]

either before or after the above rule didn't help. Any advice (or simply a solution) would be greatly appreciated.

[edited by: jdMorgan at 1:13 pm (utc) on Oct. 8, 2009; reason: formatting]

expresspotato

6:08 am on Oct 8, 2009 (gmt 0)

10+ Year Member



PS: Odd, I can't edit my message to fix the typo...?

jdMorgan

1:31 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A proper solution is to examine the HTTP request line from the client and, if the client is directly requesting a .php URL-path, externally redirect it to the corresponding .html URL-path. This prevents .php URLs from appearing (or remaining) in the search engine results.

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1.html [R=301,L]
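To see what the RewriteRule pattern above captures, here is a rough Python sketch. It is not Apache itself: `redirect_target` is a hypothetical helper, and the sketch assumes the pattern is applied to a URL-path with no leading slash, as mod_rewrite does in a per-directory (.htaccess) context.

```python
import re

# The RewriteRule pattern from the 301 example above, copied unchanged.
RULE = re.compile(r"^(([^/]+/)*[^.]+)\.php$")

def redirect_target(path):
    """Return the .html URL the rule would redirect to, or None if no match.

    `path` is the URL-path without a leading slash (an assumption of
    this sketch, mirroring .htaccess context).
    """
    m = RULE.match(path)
    if m is None:
        return None
    # $1 in the substitution corresponds to the first capture group.
    return "http://www.example.com/" + m.group(1) + ".html"

print(redirect_target("index.php"))         # matches, $1 is "index"
print(redirect_target("sub/dir/page.php"))  # matches, $1 is "sub/dir/page"
print(redirect_target("page.html"))         # no match, rule is skipped
```

Only paths ending in .php match, and everything before the extension is carried over to the .html target.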

If you insist on using a 404 Not Found (which tells the search engines that the URL can't be found right now and that the reason for the error is unknown, that is, that the URL may be valid tomorrow), then on Apache 2.x you can use:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^([^/]+/)*[^.]+\.php$ - [R=404,L]

while for Apache 1.3 or 2.x, internally rewriting the request to any filepath that does not exist and will never exist is another way to throw a 404:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^([^/]+/)*[^.]+\.php$ /path-which-will-never-exist.html [L]

Jim

g1smd

2:49 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use the 301 redirect especially if the .php URLs are already indexed. The redirect gets the visitor to the right page and also forces the search engines to update their indexes.

expresspotato

6:42 pm on Oct 13, 2009 (gmt 0)

10+ Year Member



Hi,

Sorry, but those solutions weren't exactly what I was looking for. Close, though.

The .php files already link to the .html URLs, so rather than redirecting .php to .html, I simply want to rewrite .html to .php (for use with all the includes and <files *.php> stuff) and return a 404 for the .php requests.

expresspotato

6:58 pm on Oct 13, 2009 (gmt 0)

10+ Year Member



Hi,

Tweaking the rules ever so slightly got it to work the way it should.


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ $1.notfound [R=404]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.html(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.html$ $1.php [L]

jdMorgan

7:08 pm on Oct 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We recommended using a 301 to preserve the existing links and 'ranking power' of the .php URLs. If those URLs were not linked and ranked, then all to the good. If they were linked and ranked, then you got the code working the way you wanted it to, but not "the way it should."

Keep in mind that server config code must be considered in terms of its effects on both server operation *and* search engine listings. We see Webmasters committing 'virtual SE suicide' on an almost daily basis, and we can only hope that you're not the latest... :)

[added]
Your second rule is unnecessarily complex, and contains an exploitable security hole. You can delete the RewriteCond entirely, and you should change the rule to:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.html(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.html$ /$1.php [L]
[/added]
so that HTTP clients cannot control the beginning of the rewritten path, and use that control to attack your server.

Jim

expresspotato

7:20 pm on Oct 13, 2009 (gmt 0)

10+ Year Member



Hi,

I'm kinda confused. Could you re-post the whole thing (both rules) as it should be, without the security hole, jdMorgan?

jdMorgan

1:33 pm on Oct 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, that was a cut-n-paste error: I included the RewriteCond that I had advised deleting from the second rule...

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ /$1.notfound [R=404]
#
RewriteRule ^(([^/]+/)*[^.]+)\.html$ /$1.php [L]

In both cases here, we're forcing a slash at the beginning of the substitution path to ensure that clients cannot get control of the very beginning of the new path.

The purpose of the RewriteCond on the first rule is to make sure that requests rewritten from ".html" paths to ".php" paths by the second rule do not then get subsequently 404ed by the first rule. We check the original client request to confirm that it is the client asking for a ".php" file, rather than an internal request that has already been rewritten by the second rule.

This same kind of test is not necessary on the second rule because we always want to deliver content using PHP scripts whenever a ".html" resource is requested, and because this rule is second (it follows the first rule), you "won't be able to reach" the second rule until the first rule has already been invoked.
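The THE_REQUEST check described above can be illustrated with a small Python sketch. This is only an approximation run outside Apache: `is_direct_php_request` is a hypothetical name, and the sample request lines are made up. The point it shows is that the pattern is tested against the original request line sent by the client, which stays the same even after the URL-path has been rewritten internally.

```python
import re

# The RewriteCond pattern from the first rule, copied unchanged.
DIRECT_PHP = re.compile(
    r"^[A-Z]+\ /([^/#?\ ]+/)*[^.#?\ ]+\.php(#[^?\ ]*)?(\?[^\ ]*)?\ HTTP/"
)

def is_direct_php_request(request_line):
    """True if the client itself asked for a .php URL-path."""
    return DIRECT_PHP.match(request_line) is not None

# The client asks for a .php URL directly: the 404 rule would apply.
print(is_direct_php_request("GET /forums/index.php HTTP/1.1"))
print(is_direct_php_request("GET /forums/index.php?page=2 HTTP/1.1"))

# The client asks for .html. The second rule rewrites the path to .php
# internally, but the request line is still the .html one, so the
# condition fails and the 404 rule is skipped.
print(is_direct_php_request("GET /forums/index.html HTTP/1.1"))
```

The first two calls report a direct .php request and the third does not, which is why rewritten .html requests are not caught by the 404 rule.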

Jim