Force www & trailing slash to URL using htaccess file rules

         

stemc

10:20 pm on Nov 11, 2010 (gmt 0)

10+ Year Member



Hi there,

I'm using a PHP/MySQL CMS (ExpressionEngine) on a Linux web server running Apache 2. I'm currently using these .htaccess rules to remove index.php from the URLs:

RewriteEngine On

# Remove Index.php from URLs
RewriteCond $1 !(^index\.php|(\.(gif|jpe?g|png|css|js)))$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]


This means that mywebsite.com/index.php/about/ becomes mywebsite.com/about/, which is much nicer.

I'd like to add two more rules to this:

1. Force www if it's not typed in, so www.mywebsite.com instead of mywebsite.com
2. Add a trailing slash onto a URL if there's not one there already, so mywebsite.com/about/ instead of mywebsite.com/about

I've been searching around for standard rules to add for these, but none of them seem to play nicely with my existing rule to remove index.php.

When they worked at all, they tended to convert www.mywebsite.com/about/ into www.mywebsite.com/index.php/about/, as with this rule:


# Force www
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]


I wonder if anyone could help me out with this please?

Thanks,

Stephen

g1smd

11:46 pm on Nov 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This should be fairly easy.

The first rule should be your add slash rule, and it should force the www for those requests at the same time.

The next rule will be the standard non-www to www rule.

Finally, you'll use your rewrite which you already have now.

Placing the rules in any other order will lead to unwanted side effects, the most common of which is the external redirect re-exposing the internal server folder structure back out onto the web as external URLs.

That looks somewhat like the problem you're currently experiencing.

"This means that example.com/index.php/about/ becomes example.com/about/, which is much nicer."

Be very clear that .htaccess does not "make" or "change" URLs. What your rule actually does is accept a URL request for example.com/about/ and rewrite it, so that content from the internal server path /index.php/about/ is served for that request instead of content from the default internal server location suggested by the initial URL request.

One other rule you need to add is an external redirect, such that if there's an external request for example.com/index.php/about/ the request will be externally redirected to the new URL.

Finally, make sure that all of the pages of the site link to the right URL. It is bad form to click on an internal navigation link within a site, and for that request to then be redirected.
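
As a rough sketch of that extra redirect (not code posted in this thread): the rule below assumes the canonical hostname is www.example.com, used here as a placeholder, and that the .htaccess file sits in the document root. The THE_REQUEST check limits it to URLs the client actually asked for, so the internal rewrite to /index.php/... should not trigger it in a loop.

```apache
# Sketch only: www.example.com is a placeholder hostname.
# Redirect client requests for /index.php or /index.php/<path> to /<path>.
# THE_REQUEST holds the original request line
# (e.g. "GET /index.php/about/ HTTP/1.1"), so paths produced by the
# internal rewrite are not redirected again.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php(/[^\ ]*)?\ HTTP/
RewriteRule ^index\.php(/(.*))?$ http://www.example.com/$2 [R=301,L]
```

Like the other external redirects, it would go ahead of the internal rewrite.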

stemc

2:40 am on Nov 14, 2010 (gmt 0)

10+ Year Member



Hi g1smd,

Sorry for the delay in replying but I didn't seem to get the email notification of your reply (or it's gone in my junkmail). It took a few days for me to wonder if anyone had responded so I came back here manually to see your reply, thanks.

I think I was previously going wrong by adding the 'Remove index.php' rules first.

Seeing as I'm having trouble, I thought I'd start again and try adding one rule at a time. So this time, I thought I'd have a go at forcing the trailing slash and see if I could get this working.

Now I've got the rules in the right order, I'm experiencing the issue you mentioned about exposing the server folder structure.

Here's what I've got:


RewriteEngine On

# Add a trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ $1/ [L,R=301]

# Remove Index.php
RewriteCond $1 !(^index\.php|(\.(gif|jpe?g|png|css|js)))$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]


The above works fine if my trailing slashes are in place, but the second I remove a trailing slash for test purposes, it converts www.mywebsite.com/about to mywebsite.com/home/myaccount/public_html/about/

Also, I can confirm that I'm using internal links on my website that are in the correct format, so these htaccess rules should just tidy up any incoming external links that I can't control.

I wasn't sure what you meant when mentioning adding the external redirect - if you mean to account for people linking to www.mywebsite.com/index.php/about, then I'm pretty sure that's not going to happen so I'm happy to leave this. It's just the missing trailing slash (first of all), and then the www that I want to address.

Can you think of any ideas why the above is exposing my internal folder structure? I've googled for other variations of adding a trailing slash, but they all did the same thing of exposing the internal folder structure. Some of the options included adding 'RewriteBase /' after the 'RewriteEngine On', but that didn't help either.

Thanks,

Stephen

g1smd

8:52 am on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You must specify the protocol and domain for the redirect target, every time, otherwise the server will use the CanonicalName instead.

Another difficulty is that a URL with a trailing slash indicates a folder or the index page of a folder. You'll find the going much easier if you use extensionless URLs for pages, without a trailing slash.
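
To illustrate the point about the redirect target using the trailing-slash rule from the previous post (a sketch; www.example.com stands in for the real hostname):

```apache
# Relative target, as in the earlier attempt. In .htaccess this can end up
# redirecting to a URL built from the filesystem path:
#   RewriteRule ^(.*)$ $1/ [R=301,L]

# Target with protocol and hostname spelled out:
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ http://www.example.com/$1/ [R=301,L]
```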

jdMorgan

11:24 pm on Nov 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See also the concurrent discussion in [webmasterworld.com...] for other server functions which can cause such "slash" problems if they are enabled and your site does not require them to be enabled.

Jim

stemc

12:35 pm on Nov 24, 2010 (gmt 0)

10+ Year Member



Thanks g1smd - I think the domain was where I was going wrong with this, as I wasn't specifying it!

Here are the working rules I now have:

RewriteEngine On

# Add trailing slash to URLs if it is not there
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ http://www.mywebsite.com/$1/ [R=301,L]

# Remove index.php from URLs
RewriteCond $1 !(^index\.php|(\.(gif|jpe?g|png|css|js)))$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]


I've tested this with all the URLs on the site and I haven't noticed any issues at all yet, so I hope it's going to work out okay for me.

I just think URLs with the trailing slash look nicer, though I'd be willing to rethink this view if you tell me that the above code causes a performance hit or some other issue that a rule to always remove the trailing slash would avoid.

Now onto forcing the www bit - I might be back... :)

Thanks for your help as always,

Stephen

stemc

1:03 pm on Nov 24, 2010 (gmt 0)

10+ Year Member



Hi there,

Just a quick update: I added some code to force the www in the URLs, and this seems to work fine alongside the trailing slash rule:

RewriteEngine On

# Force www in the URLs
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

# Add trailing slash to URLs
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule (.*)$ http://www.mywebsite.com/$1/ [R=301,L]

# Remove index.php from URLs
RewriteCond $1 !(^index\.php|(\.(gif|jpe?g|png|css|js)))$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]

Let me know if you can see any issues or can think of any performance improvements that could be made with this, but it seems to be working okay.

Thanks,

Stephen

g1smd

9:45 pm on Nov 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The pattern: [a-zA-Z0-9]

can be simplified to:

[a-z0-9] with the [NC] flag.

It will parse 33% faster.
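
Applied to the trailing-slash condition posted above, the change would look roughly like this (only the character class and the flag differ):

```apache
# Before: both letter cases listed explicitly
#   RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$

# After: lowercase class, with [NC] making the match case-insensitive
RewriteCond %{REQUEST_URI} !(\.[a-z0-9]{1,5}|/)$ [NC]
```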

jdMorgan

5:04 pm on Dec 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, always put the file- and directory-exists-check RewriteConds *last* when possible. This avoids unnecessary calls to the OS filesystem handler, which may in turn require *physical disk reads* -- a very slow operation to be avoided whenever possible, and a good way to make your hard drive last a bit longer as well.

In fact, I see no reason to check for file- and/or directory-exists at all in your trailing-slash rule, as directories will always have a trailing slash (or mod_dir will add one), and any 'real files' on the server must have a filetype, so that it will be possible for the server to identify the correct MIME-type to send with its response. Since these two cases are already covered, there is no use beating your disk to death checking them again.

You've apparently ignored or mis-interpreted g1smd's rule-order pointer above. The correct rule order, along with the exists-check optimization and others, is:

# Externally redirect to add missing trailing slash to URLs with no filetype
# (the .+ pattern skips the empty path of a root request, which would otherwise redirect to "//")
RewriteCond $1 !(\.[a-z0-9]{1,5}|/)$ [NC]
RewriteRule ^(.+)$ http://www.mywebsite.com/$1/ [R=301,L]

# Externally redirect non-blank non-canonical hostname request to canonical hostname
# (if not already done by the above rule)
RewriteCond %{HTTP_HOST} !^(www\.mywebsite\.com)?$
RewriteRule ^(.*)$ http://www.mywebsite.com/$1 [R=301,L]

# Rewrite all requests which do not resolve to existing files to the CMS script, except
# for image, css, and JS file requests, none of which need to be handled by the CMS,
# and requests for index.php itself (to avoid a wasteful second-pass exists check).
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|png|css|js))$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php/$1 [L]

Jim

Adam_Khan

2:46 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



I had this precise problem, thanks stemc for posting the solution and g1smd and jdMorgan for refinements.

stemc

3:19 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



No worries Adam, and a belated thanks to Jim and g1smd for the additional refinements too.

Thanks,

Stephen

g1smd

8:38 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am confused as to why this form of code is so widely used:

RewriteRule ^(.*)$ /index.php/$1 [L]


when this could be so much simpler:

RewriteRule (.*) /index.php?param=$1 [L]


It eliminates a whole chunk of (slow and inefficient) PHP code that chops up the URL request and attempts to parse the various parameters and values within it.

Using mod_rewrite to analyse the requested URL and then call the right scripts with the right parameters can be a lot more efficient, and yet many systems actively avoid using the right tool in the right way to do the best possible job.
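
As a hypothetical illustration of that approach (the URL layout, script names and parameter names here are invented for the example and are not taken from ExpressionEngine), mod_rewrite can pick out the parts of the URL and hand them to a script as named parameters:

```apache
# Hypothetical URLs and scripts, for illustration only.

# /products/123 or /products/123/  ->  /product.php?id=123
RewriteRule ^products/([0-9]+)/?$ /product.php?id=$1 [L]

# /articles/2010/some-title  ->  /article.php?year=2010&slug=some-title
RewriteRule ^articles/([0-9]{4})/([a-z0-9-]+)/?$ /article.php?year=$1&slug=$2 [L]
```

Each script then reads its values from the query string rather than parsing the path itself.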

jdMorgan

11:49 pm on Mar 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One reason is likely that on servers that support it, the whole rule in question can be omitted and replaced with "AcceptPathInfo on" -- if and only if the "script parameters" are appended to the URL-path as part of that URL-path (and not, for example, as a query string).

So, sticking the parms on the end of the URL as is done in the code above means that the script receiving those parms will work with either a mod_rewrite solution or with the AcceptPathInfo method without needing to be changed.
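
A minimal sketch of the AcceptPathInfo approach, assuming the server allows the directive in .htaccess (it falls under AllowOverride FileInfo):

```apache
# Let index.php accept trailing path information.
# A request for /index.php/about/ is then served by index.php,
# with /about/ available to the script as PATH_INFO.
<Files "index.php">
    AcceptPathInfo On
</Files>
```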

Jim