.htaccess modrewrite - remove extension, add trailing slash

         

woemlavy

1:55 pm on Sep 29, 2015 (gmt 0)

10+ Year Member



I'm working on a site's htaccess and need a little guidance to do the following...

Force www
Remove .php extensions on root and in directories
Add trailing slash on root and in directories
Exclude specific directories from these rules

The following code works on root, but not in directories.

RewriteEngine on 

RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]


RewriteCond %{REQUEST_FILENAME} !-f

RewriteCond %{REQUEST_URI} !/directory1
RewriteCond %{REQUEST_URI} !/directory2
RewriteCond %{REQUEST_URI} !/directory3
RewriteCond %{REQUEST_URI} !/directory4

RewriteCond %{REQUEST_URI} !\.php$
RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]

[edited by: engine at 2:47 pm (utc) on Sep 29, 2015]
[edit reason] please use example.com [/edit]

woemlavy

6:03 pm on Sep 29, 2015 (gmt 0)

10+ Year Member



OK... I've been reading the manual and there's some progress. The www. is forced. The .php extensions are dropped and the trailing slash is added, both on root and in directories. My issue now is that requests for files that do not exist (404) fall into a redirect loop that keeps appending .php/ to the URL. I'm not sure how to fix this.

Here's my current .htaccess. Any ideas?


RewriteEngine on

# CANONICAL URL
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

#REDIRECT INDEX.PHP TO ROOT
RewriteCond %{THE_REQUEST} ^.*/index.php
RewriteRule ^(.*)index.php$ /$1 [R=301,L]

#REMOVE .PHP EXTENSION
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1/ [R=301,L]

# ADD TRAILING SLASH
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteRule . %{REQUEST_URI}/ [L,R=301]

# INTERNALLY FORWARD (example: /dir/foo to /dir/foo.php)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*?)/?$ $1.php [L]

lucy24

9:03 pm on Sep 29, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add trailing slash on root and in directories

If you're talking about directories that physically exist, you shouldn't need mod_rewrite for this. mod_dir has two jobs, and that's one of them. (There's no such thing as a slash or no-slash in the root, because the server only receives one type of request; it's the browser's whim whether to physically display a trailing slash. So there are a few places where .* can be replaced with .+ in patterns.)
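For reference, a minimal sketch of the two mod_dir directives involved. DirectorySlash is on by default, so the first line only makes the behavior explicit; index.php as the index file is an assumption based on the rules in this thread.

```apache
# mod_dir job 1: redirect /dir to /dir/ when "dir" is a real directory.
# This is the default setting; it is shown here only to make it visible.
DirectorySlash On

# mod_dir job 2: serve an index file for directory requests.
# index.php is assumed from the rules posted above.
DirectoryIndex index.php
```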

Some of your rules are in the wrong order. The rule you call "force www" -- better seen as the domain-name-canonicalization redirect -- should be the very last external redirect, and its condition should be a negative match, expressed as
!^(www\.example\.com)?$
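
Put together, the canonicalization redirect might look like the sketch below. The condition is negated so that the rule fires only when the host is something other than the canonical name; the empty alternative lets requests with no Host header pass through untouched. The http scheme is taken from the rules posted earlier in the thread.

```apache
# Domain-name canonicalization: place this after all other external redirects.
# Fires for any non-empty host that is not exactly www.example.com.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```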

The second- or third-to-last external redirect is for index.xtn. (Generally second-last but in your case it should go before the trailing slash, if it turns out you still need it.) There are a few variations; one form is
RewriteCond %{THE_REQUEST} index\.php
RewriteRule ^(([^/]+/)*)index\.php http://www.example.com/$1 [R=301,L]
Always include the full protocol-plus-domain in the target to avoid a possible double redirect. The element ^.* is never necessary if you're not capturing. (Similarly .*$) Note too that you don't need the $ closing anchor after "index.php"; if you leave it out, the same rule will take care of extra stuff after the extension. If your filepaths never contain periods, you might instead express the pattern as
^([^.]+/)?index\.php
which is a teeny bit more efficient. (Literal periods are perfectly legal, and necessary on some sites--including apache.org itself--but it makes things a lot simpler if you don't use them.)

To prevent infinite loops in the trailing-slash redirect, you need to ensure that the request doesn't already end in .php. This is another place where things are much easier if your filepaths don't contain literal periods, because then the pattern of the rule can be expressed as
^([^.]+[^./])$
Anchors here are, of course, essential.
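
As a sketch, the trailing-slash redirect built on that pattern could read as follows. It assumes that no URL path on the site contains a period, and it keeps the !-f condition from the rules posted above. Note that the pattern needs at least two characters, so a one-character path such as /a would not be redirected.

```apache
# Add a trailing slash to extensionless requests that lack one.
# The pattern cannot match any path containing a period, so requests
# ending in .php never reach this rule and cannot loop.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^.]+[^./])$ http://www.example.com/$1/ [R=301,L]
```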

The target of an internal rewrite should always start with / for safety.

Now, here's the problem:
^(.*?)/?$
If the request happens to end in more than one / slash, a slash will still be captured. The lazy ? only hands a single trailing slash over to the /? that follows, so it can't be relied on to exclude the / slash; in fact I wouldn't use it in mod_rewrite at all. (In a text editor, yes, sometimes, but it isn't strict enough for situations like Apache, where you have to be unambiguous.) Instead you need something like this:
^(.+[^/])/?$
Use .+ rather than .* because as already explained, requests for the root will never meet this rule.
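
A sketch of the internal rewrite with that pattern and a target that starts with a slash. The %{DOCUMENT_ROOT}/$1.php test is a swapped-in assumption, not something from the posts above: it checks for the file using the captured path instead of %{REQUEST_FILENAME}, and it presumes this .htaccess sits in the document root.

```apache
# Internally map /dir/foo/ (or /dir/foo) to /dir/foo.php.
# $1 never ends in a slash because the capture must end in [^/].
RewriteCond %{REQUEST_FILENAME} !-d
# Assumption: .htaccess is in the document root, so $1 is the full path.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+[^/])/?$ /$1.php [L]
```

A request for a page that does not exist fails the -f test, so it is left alone and falls through to the normal 404 handling.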

One other issue I noticed: If your URLs end in / slash, you will get requests for /index.html even if you don't use html. Search engines do this as a matter of routine. Insert nasty comments about Entrapment. They will also request forms without trailing slash, but that's already taken care of. So, wherever your rules currently say "index\.php" you might want to substitute "index\.(htm|php)" without closing anchor.
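
Applied to the index redirect given earlier, that substitution would look something like this. Because there is no closing anchor, "htm" also covers requests for index.html.

```apache
# Redirect /index.php, /index.htm and /index.html (in any directory)
# to the bare directory URL.
RewriteCond %{THE_REQUEST} index\.(htm|php)
RewriteRule ^(([^/]+/)*)index\.(htm|php) http://www.example.com/$1 [R=301,L]
```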