htaccess rewrite .php -> .html: 404 issue

stefanosc

9:57 am on Mar 4, 2011 (gmt 0)

10+ Year Member



Hello

I am transitioning many static html pages on a small website to php.
To maintain PageRank and incoming links, I thought of using an .htaccess rewrite so that each new page.php could still be reached via its old URL, page.html.

I am not very experienced with .htaccess, so I did some research on this forum and others and tested various rules, including:

    RewriteRule ^(.*)\.html$ $1.php [R=301,L]
    RewriteRule ^(.*)\.html$ $1.php [L]
    RewriteRule ^([^.]+)\.html$ $1.php [L]



The last one seemed to work fine, but pages that were never converted to .php (native .html pages) now return a 404.

Could anyone help with this?
Any suggestion would be very much appreciated.

Thank you!

g1smd

9:13 pm on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does [webmasterworld.com...] help with more ideas?

The mod_rewrite code runs as soon as the server receives the URL request, long before any content is served.

If you rewrite all .html requests to be fulfilled by .php scripts, then no native .html pages can ever be served.
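
To make that concrete with the third rule from the question (the file name contact.html below is only a hypothetical example), this is roughly what happens to a request for a page that still exists as a real .html file:

    # Every .html request whose path contains no other dot matches this rule
    RewriteRule ^([^.]+)\.html$ $1.php [L]
    # So a request for /contact.html (hypothetical) is handed to contact.php;
    # if no contact.php exists, the visitor gets a 404 even though contact.html is on disk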

You need to construct a better pattern to sort out which requests should be rewritten and which should be served.

stefanosc

4:51 am on Mar 5, 2011 (gmt 0)

10+ Year Member



Thank you g1smd for your reply.

Unfortunately I don't have enough knowledge to construct a pattern, or at least I don't know the syntax to use.

In general, I would think the appropriate pattern is to rewrite only pages that end with the .php suffix. That's what I need to do, and then leave the .html pages unaltered.

Is this possible?
Could someone help with the syntax?

Thank you so much

g1smd

8:32 am on Mar 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You wrote: "rewrite only pages that end with the .php suffix. That's what I need to do, and then leave the .html pages unaltered."

That's the problem. mod_rewrite does not rewrite your .php files and "make" .html URLs for them.

What it does is accept URL requests ending in .html and rewrite the internal pointer so that a .php file is fetched instead. There needs to be a clue in the requested URL that tells mod_rewrite whether or not the request needs to be rewritten. That clue should come from the structure of the URLs you decide to use within your site.

There is another way to approach this, but it is brutally inefficient and will slow the site down. Add a RewriteCond that checks whether the .html request matches a real .html file on the server, and if it does not, then perform the rewrite. This method is best avoided.

stefanosc

9:09 am on Mar 5, 2011 (gmt 0)

10+ Year Member



Hello g1smd

Thank you for clarifying.

Could you help me with the syntax to use in the htaccess to exclude directories or files?

Thanks a million

g1smd

9:20 am on Mar 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I just said, the method of checking whether the .html URL request matches an existing .html file is inefficient, slow, and best avoided.

Before going down that route, do this first:

Look at all of your .html URLs and make two lists: those served directly by .html files, and those that must be rewritten to be served by .php files.

You are looking for a "pattern" that separates the two sets of URLs, one you can build into the RewriteRule RegEx. That pattern might be as simple as a certain folder prefix: all URLs in that folder are rewritten, or they are exactly the URLs that are NOT rewritten. Maybe all the URLs to be rewritten contain a hyphen. Maybe they all begin with the same word, or the same letter. Whatever it is, that pattern can be used to signal how mod_rewrite should make a decision.
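
For instance, if every URL now served by a .php script happened to sit under a single folder (the folder name articles/ is purely hypothetical here), a minimal sketch for a .htaccess file in the document root could look like this:

    RewriteEngine on
    # Hypothetical layout: only URLs under /articles/ are now served by .php scripts
    # In a root .htaccess the pattern is matched without the leading slash
    RewriteRule ^(articles/[^.]+)\.html$ /$1.php [L]
    # Any other .html request matches no rule and is served from the real .html file

The folder name itself is the clue, so no file-exists check is needed.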

If the list of URLs to be rewritten or to NOT be rewritten is short, then those names could be built into the rule.
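
A sketch of that idea, assuming (hypothetically) that about.html, contact.html and services.html are the only pages that moved to .php:

    RewriteEngine on
    # Hypothetical names: only these three pages were converted to .php
    RewriteRule ^(about|contact|services)\.html$ /$1.php [L]
    # Every other .html request is left alone and served as a real .html file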

In short, investigate all of those other approaches first. You need to avoid the -f "exists" checks.

jdMorgan

5:10 pm on Mar 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Beware of a potential security problem: Do not start a substitution with a 'naked' back-reference such as "$1" if it can be avoided...

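    # Rewrite only when no real .html file exists (one disk check per request)
    RewriteCond %{DOCUMENT_ROOT}/$1.html !-f
    # The substitution starts with "/" rather than a naked $1
    RewriteRule ^([^.]+)\.html$ /$1.php [L]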

As g1smd points out, this code will result in a disk check for every .html URL requested from your server. For best performance, identify as many exclusions as possible and code them as negative-match RewriteConds in order to avoid unnecessary 'file exists' checks.
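
A sketch of that approach, with hypothetical exclusions: suppose everything under /static/ and the file /index.html are known to be real .html files. mod_rewrite first matches the RewriteRule pattern and then evaluates the RewriteConds in order; because they are ANDed, it stops at the first one that fails, so excluded requests never reach the -f disk check.

    RewriteEngine on
    # Hypothetical exclusions: these requests are known to be real .html files,
    # so they skip the costly file-exists check below
    RewriteCond %{REQUEST_URI} !^/static/
    RewriteCond %{REQUEST_URI} !^/index\.html$
    # Only now check the disk: rewrite if no real .html file exists
    RewriteCond %{DOCUMENT_ROOT}/$1.html !-f
    RewriteRule ^([^.]+)\.html$ /$1.php [L]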

Jim

stefanosc

6:14 pm on Mar 10, 2011 (gmt 0)

10+ Year Member



Thank you g1smd and jdMorgan for your very useful suggestions.

Together with my partners, we looked at the whole project and decided to make every page .php; that way it is really easy, with no logic to figure out.

I have tested this in the .htaccess file and it works fine:

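    # Allows automatic directory listings; not needed for the rewrite itself
    Options +Indexes
    # mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch)
    Options +FollowSymlinks
    RewriteEngine on
    RewriteBase /
    # Internally serve /name.html from /name.php; the visitor still sees the .html URL
    RewriteRule ^([^.]+)\.html$ $1.php [L]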

I did not fully understand your point about security, jdMorgan. If you have time to clarify, I would very much appreciate it.

And Thanks a million for all the help!
:)

jdMorgan

7:39 pm on Mar 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I will not elaborate on the exploit in a public forum, but you can make your rule 'safe' and more robust:

    RewriteRule ^(([^/]+/)*[^.]+)\.html$ /$1.php [L]
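
Broken into its parts, the rule reads roughly as follows; the comments are an illustrative annotation of the pattern only, assuming the rule sits in the .htaccess file in the document root:

    # ^              start of the URL-path as seen in .htaccess (no leading slash)
    # ([^/]+/)*      zero or more directory segments; these may contain dots
    # [^.]+          the rest of the path, normally the file name; it cannot contain a dot
    # \.html$        the literal .html extension at the very end
    # /$1.php        the substitution begins with "/", never with a naked back-reference
    RewriteRule ^(([^/]+/)*[^.]+)\.html$ /$1.php [L]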

Jim

stefanosc

7:54 pm on Mar 17, 2011 (gmt 0)

10+ Year Member



Hello Jim,

Thank you for taking the time to answer; I very much appreciate your input on this.
Moreover, I appreciate your concern regarding online security for all; I respect that.

Thank you for making this forum such a great resource and generous place.
Stefano