.htaccess and removing php and trailing slashes

Need to amend this code but stuck! Any help welcome :)

intotheblue

9:55 am on Oct 20, 2011 (gmt 0)

10+ Year Member



Hi everyone,

I was a member under another name, but that was quite a few years ago and I no longer have access to the email address, so I can't get into that account! I've now come across a problem and knew this would be the place to come back to and ask! :)

Ok so I have some code that allows the .php extension to be removed:


Options +FollowSymLinks -Indexes
DirectorySlash Off

RewriteEngine On

## Remove trailing slashes...
# If it's a directory
RewriteCond %{SCRIPT_FILENAME} -d [OR]
# or it's a PHP file.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Redirect to remove the trailing slash.
RewriteRule ^(.+)/$ /$1 [R=301,L]

## Remove .php
# If it's a .php file
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{SCRIPT_FILENAME} -f
RewriteRule ^(.+)\.php$ /$1 [R=301,L]

## Add .php.
# If a .php file exists
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.*[^/])/?$ /$1.php [QSA,L]


This seems to work well on one site I use it for (it might not be the most efficient though, so apologies). However, I'm trying to adapt it for use on another site and can't get it to work.

The problem I'm having is that the first site has all its files in the root, so the code works fine. The second site has its main files in /subdirectory/, so the code doesn't work. I've amended the very last line to:

RewriteRule ^(.*[^/])/?$ /subdirectory/$1.php [QSA,L]


which seems to work, but I don't really know enough to start messing around with the code too much. I tried setting RewriteBase to /subdirectory/ but that seemed to have no effect.

Any help or pointers would be very much appreciated. Sorry if this has been asked before! :)

g1smd

10:00 am on Oct 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't use .* or .+ at the beginning of a pattern. Use something that parses left to right. The .* grabs the entire input and then forces hundreds of back-off-and-retry trial matches.

The /?$ allows a URL request with or without a trailing slash, and therefore promotes duplicate content.

Redirects should include the protocol and domain name in the target URL; otherwise they either promote duplicate content or create an unwanted multiple-step redirection chain for non-canonical requests.

The -f and -d checks are very slow as they make the server read the hard drive twice for every request. By selecting better RegEx patterns and in one instance testing THE_REQUEST instead of the hard drive, some of those conditions can be eliminated from the code.
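
As a rough sketch of how those points might be combined for the two redirects (not a tested drop-in): www.example.com is a placeholder hostname, the patterns assume extensionless URLs contain no periods in their last segment, and real directories are assumed to keep their trailing slash, which differs from the original code.

```apache
RewriteEngine On

# Redirect direct client requests for .php URLs to the extensionless URL.
# THE_REQUEST holds the original request line, so this tests what the
# client actually asked for instead of checking the disk with -f.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.#?\ ]+\.php([#?][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^./]+)\.php$ http://www.example.com/$1 [R=301,L]

# Redirect a trailing-slash URL to the slashless form, unless it is a
# real directory (assumption: directories keep their slash).
# The pattern reads left to right, one path segment at a time,
# rather than starting with a greedy "match everything" token.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^./]+)/$ http://www.example.com/$1 [R=301,L]
```

Both targets carry the protocol and hostname, so a non-canonical request should reach the final URL in a single redirect.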

intotheblue

10:59 am on Oct 20, 2011 (gmt 0)

10+ Year Member



Thanks g1smd, knew it wouldn't be the best ;) Doing some further reading here I came across this:

#unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ $1 [R=301,L]

#redirect external .php requests to extensionless url
RewriteCond %{THE_REQUEST} ^(.*)\.php([#?][^\ ]*)?\ HTTP/
RewriteRule ^(.*)\.php$ $1 [R=301,L]

#resolve .php file for extensionless php urls
RewriteRule ^(([^/]+/)*([^/.]+))/$ /$1.php [L]


Which seems to address some of your suggestions. Unfortunately I can't test this at the moment, but would you recommend any improvements? The regex pattern in the last line presumably redirects to a subdirectory, if I read that right, so that might be useful.

Thanks again for your continued help :)

intotheblue

11:14 am on Oct 20, 2011 (gmt 0)

10+ Year Member



Or this, which might be better?

RewriteEngine On
#
# Internally rewrite extensionless URL to corresponding .php
# file unless the URL exists as a directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^.]+)$ $1.php [L]
#
# Externally redirect (only) direct client requests for .php URLs to extensionless URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.#?\ ]+\.php([#?][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php http://www.example.com/$1 [R=301,L]

lucy24

10:45 pm on Oct 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^(([^/]+/)*[^.]+)$

Yes. This is the correct wording for capturing a request that may contain multiple directories, and whose final element does not end in / or a period. But it's better to say [^./] for the last segment, to avoid matching malformed requests ending in a double slash (//).
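
Applied to the internal rewrite quoted above, that change might look like the sketch below. Everything except the character class is carried over unchanged from the earlier post.

```apache
# Internally rewrite an extensionless URL to the corresponding .php file,
# unless the URL exists as a directory.
# With [^./]+ the final segment can contain neither "." nor "/", so
# requests such as "foo/" or "foo//" no longer match and are not
# rewritten to "foo/.php" or "foo//.php".
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^./]+)$ $1.php [L]
```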

Reasons for not starting with .* are

#1 it will make your server do a lot of unnecessary work
and
#2 it will cause *someone* hereabouts to give you ###.

Neither one is desirable.

If you are rewriting all external requests in .php, it is probably sufficient to say (without anchors)

RewriteCond %{THE_REQUEST} \.php
RewriteRule { blahblah }

This may look impossibly minimalist, but it works for me. In my case, looking for THE_REQUEST excludes auto-indexing, which uses php somewhere in the innards of the server. If you use php-based analytics such as piwik, you have to exclude that too or you'll be locking yourself out!
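
A minimal sketch of that kind of exclusion, with assumptions flagged: /piwik/ is a hypothetical install path, www.example.com is a placeholder hostname, and the RewriteRule line only stands in for whatever rule actually follows the condition.

```apache
# Redirect direct client requests for .php URLs to extensionless URLs.
RewriteCond %{THE_REQUEST} \.php
# Skip anything under /piwik/ (hypothetical path; adjust to wherever
# the php-based analytics actually lives).
RewriteCond %{REQUEST_URI} !^/piwik/
RewriteRule ^(([^/]+/)*[^./]+)\.php$ http://www.example.com/$1 [R=301,L]
```

The unanchored \.php condition matches anywhere in the request line, including a query string, so it is the rule pattern that limits the redirect to paths that actually end in .php.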