Redirect from non www and remove trailing slash and more!

         

maxed

12:15 am on Oct 17, 2008 (gmt 0)

10+ Year Member



Okay, so I'm trying to redirect from

http://example.com/ to http://www.example.com/
http://example.com/name1 to http://www.example.com/name1
so that all non-www urls go to www urls.

Also, I'm trying to redirect from
http://www.example.com/name1/ to http://www.example.com/name1
http://example.com/name1/ to http://www.example.com/name1 (no www and unwanted trailing slash)
so that unwanted trailing slashes on these URLs are 301 redirected to URLs without the slash.

Additionally, I want to organize my site so that categories are redirected as follows:
http://www.example.com/+london to http://www.example.com/index.php?r=london
http://www.example.com/+london+restaurant to http://www.example.com/index.php?r=london&c=restaurant
http://example.com/name1 to http://www.example.com/index.php?d=name1

After a lot of googling, I came up with this rewrite code in my .htaccess:
------------------------------------------------------------------------
Options +FollowSymLinks
RewriteEngine On

# Redirect from non www to www version of domain
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,NC]

# Redirect to remove trailing slashes on extensionless URL requests
# If requested URL ending with slash does not resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Externally redirect to remove trailing slash
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,NC]

# Redirect from non www to www version of domain
RewriteRule ^+([a-z0-9]+)+([a-z0-9]+)$ index.php?r=$1&c=$2 [NC,L]
RewriteRule ^+([a-z]+)$ index.php?a=$1 [NC,L]
RewriteRule ^([a-z0-9]+)$ index.php?d=$1 [NC,L]
-----------------------------------------------------------------------------

But it doesn't seem to work :(
Help please!

Thanks
max

[edited by: jdMorgan at 12:22 am (utc) on Oct. 17, 2008]
[edit reason] Please use example.com only. [/edit]
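
One thing that stands out in the last three rules of the posted code: "+" is a regex quantifier, so a literal plus sign in the URL-path has to be escaped as "\+". A minimal sketch of the category rewrites under that assumption follows. Mapping the single-category case to "r=" (the posted rule used "a=") is a guess based on the description in the post, not something confirmed in the thread.

```apache
# Sketch only: internal rewrites for the "+category" URLs described in the post.
# Assumes per-directory (.htaccess) context, where the pattern sees no leading slash.
# /+london+restaurant -> index.php?r=london&c=restaurant
RewriteRule ^\+([a-z0-9]+)\+([a-z0-9]+)$ index.php?r=$1&c=$2 [NC,L]
# /+london -> index.php?r=london  (parameter name is an assumption, see note above)
RewriteRule ^\+([a-z0-9]+)$ index.php?r=$1 [NC,L]
# /name1 -> index.php?d=name1
RewriteRule ^([a-z0-9]+)$ index.php?d=$1 [NC,L]
```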

jdMorgan

9:51 pm on Oct 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you unconditionally remove trailing slashes, you'll be in deep trouble if you get a request for a directory [webmasterworld.com]. The result will be an infinite loop. Don't mix that function with the lowercasing.

Jim

maxed

10:08 pm on Oct 19, 2008 (gmt 0)

10+ Year Member



How about using something like -d to check whether it's a directory?
RewriteCond %{SCRIPT_FILENAME} !-d

-edit
Oh, I guess this would be bad, since then it won't rewrite URLs that are physical yet contain uppercase characters... I guess I will just have to do two 301 redirects if something like http://example.com/in-naME/ is requested.

jdMorgan

1:24 am on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you could chain two rules, the first checking to see if the file exists, and correcting the case if not, and the second one checking to be sure the lowercase URL exists, and doing the redirect if so.
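
The two chained checks described above can also be folded into a single rule. This is only a sketch: it assumes a map named "lc" has been defined in the server config with "RewriteMap lc int:tolower" (the name used later in this thread), that the rules live in the .htaccess file at the document root, and that www.example.com is the canonical host.

```apache
# Sketch only: lowercase the URL-path, but leave real mixed-case files alone.
# Assumes "RewriteMap lc int:tolower" exists in the server config.
#
# 1) The path as requested is not an existing file or directory...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# 2) ...and it contains at least one uppercase letter.
RewriteCond $1 [A-Z]
# 3) Lowercase it once and keep the result in %1.
RewriteCond ${lc:$1} (.+)
# 4) Redirect only if the lowercased path exists as a file or a directory.
RewriteCond %{DOCUMENT_ROOT}/%1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/%1 -d
RewriteRule ^(.+)$ http://www.example.com/%1 [R=301,L]
```

Friendly URLs that are rewritten internally do not exist on disk, so step 4 would have to be dropped or relaxed for those.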

I have to warn you: If you put *some* of your redirects in the server config file, then you need to put all of them into the server config file; otherwise, you will create many opportunities for stacked (multiple) redirects for a single client-invoked request. So, you need to take care of trailing slashes, direct index.php file requests (redirect to "/"), domain canonicalization --all of it-- in one place to avoid multiple redirects.

Another thing you need to watch out for is that you don't want to do anything that limits your URLs or your domain/subdomain names if you cannot easily edit the config file and restart the server. For example, I included the checks for "file exists" in my lowercasing logic description above, because there are some files that you may want to add later that do have uppercase characters in them, notably the MSN/Live site authentication files, "LiveSearchSiteAuth.xml". So, you don't want to just blindly force lowercase and "www.example.com" because you might have a need for a URL with uppercase characters, or you might want to add a subdomain (for example, for a mobile site or to "export" your images for faster loading) later.

The only thing that comes to mind is to ask your host to simply define the RewriteMap in the config file, but not do the redirect there. Remember, you cannot define a RewriteMap in .htaccess, but you *can* call it from .htaccess. This would allow you to do all your checking and canonicalization redirects in the proper order to avoid stacked redirects.

I've been thinking about this situation half the day. If I reach any more useful conclusions, I'll re-post.

Jim

[edited by: jdMorgan at 1:24 am (utc) on Oct. 20, 2008]

maxed

1:42 am on Oct 20, 2008 (gmt 0)

10+ Year Member



Thank you very much, jdMorgan; your help is invaluable.
I did not know I could define the RewriteMap separately; that is indeed very interesting! I shall ask my host to just add
--
rewriteEngine on
rewritemap lowercase int:tolower
--
to httpd.conf so that I can put all my rules in .htaccess and sort them out until they work, as you suggested.

maxed

7:25 am on Oct 20, 2008 (gmt 0)

10+ Year Member



Okay, everything works now: URLs containing uppercase letters are redirected to all-lowercase URLs. The only problem is with trailing slashes.

There seems to be no way to say: for all URIs that contain uppercase letters and a trailing slash, redirect to a URL with the URI converted to lowercase and the trailing slash removed, except when a physical directory is requested, in which case leave the trailing slash and just redirect to a URL containing the lowercased URI.

At the moment, if a URI containing a trailing slash and any uppercase letters is requested, there is a redirection chain that first redirects to convert everything to lowercase and then redirects again to remove the slash.

My httpd.conf contains:
--
RewriteEngine On
RewriteMap lc int:tolower
--

My .htaccess contains:
--
Options +FollowSymLinks
RewriteEngine On
#redirects to lowercased uri
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.+)$ http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?r=([a-zA-Z0-9]+)
RewriteRule ^index\.php$ http://www.example.com/in-${lc:%1}? [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to index.php
RewriteRule ^in-([a-z]+)$ index.php?r=$1 [L]
--

There must be some way to check whether the generated lowercased URI points to a physically existing directory: if so, keep the slash and simply redirect; otherwise, remove the slash and then redirect.

-edit-
The only method I can think of to solve my woes is to force the infamous "trailing slash" on everything; that way I can't go wrong! Technically it would be better if non-directory URIs that just pass variables had no trailing slash (it also looks better), but at the moment I see no other way.

argh mod_rewrite kills me!

g1smd

7:47 am on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why not have an extra rule that goes first? It would deal with *only* stuff that is *both* upper-case *and* has a trailing slash.

This new rule would be a combination of your first and second rules, and you would then follow it with everything that you already have, exactly as you have it now, including rule one and two. They do not change.

This works in the same way that while you have a specific non-www to www rule, the www forcing is also built into all the other rules. In this case you have separate rules for A and for B and you now add one up ahead of those for both A and B.

g1smd

7:57 am on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/${lc:$1} [R=301,L]

.

You've still got the issue that index file requests that are upper case will be double redirected.

In the same way, you could have a combined rule for those too.

It would be the first or second rule, it would have to go before the general upper to lower case rule.

g1smd

8:08 am on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I replace each of your "problems" with "short names", then your rules become something like this:

Fix URLs that have problems Upper and Index and www
Fix URLs that have problems Upper and Slash and www
Fix URLs that have problems Upper and www
Fix URLs that have problems Slash and www
Fix URLs that have problems Index and www
Fix URLs that have problems www (and Port).

This still leaves you with a double-redirect when the upper-case and/or trailing-slash URL has a port number and/or some other trailing punctuation attached. That's so rare that I would not cater for it as it would double the number of rules here.

maxed

8:44 am on Oct 20, 2008 (gmt 0)

10+ Year Member



The problem, I think, is that the combined rule for URIs that are uppercase and contain a trailing slash doesn't seem to work alongside the rules I currently have:

# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/${lc:$1} [R=301,L]

This will not work, because RewriteCond %{SCRIPT_FILENAME} !-d checks whether the URI as requested, with its uppercase letters, is a physical directory. It isn't; only the lowercase version exists. So the rule redirects to the folder's URL without the trailing slash, and then all sorts of bad stuff seems to happen.

For example, there is a folder /test/.
I request http://example.com/TEST/; the rule checks, finds that there is no folder TEST, and so redirects to http://www.example.com/test.

g1smd

9:03 am on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I need to think about this a bit more, but I still think the basic idea would be a rule that does several things at once, ahead of all the other rules.

jdMorgan

3:28 pm on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Something like this, perhaps? :)

# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{SCRIPT_FILENAME}} (.+)
RewriteCond %1 !-d
RewriteRule ^(.+)/$ http://www.example.com/%1 [R=301,L]

The extra RewriteCond is not strictly necessary, but I used it to prevent having to call "tolower" from both the RewriteCond that checks directory-exists and the RewriteRule. Using a single added RewriteCond to put the lowercased filepath into %1 and then back-referencing %1 twice should be much more efficient.

Jim

g1smd

3:36 pm on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I knew there would be one more step that was way above my head. Now that looks like a good solution.

Just need one more rule for "index URLs that have some upper-case somewhere within the path" to redirect to "www and lower-case, without index filename included". Currently those also see a double redirect.

maxed

8:04 pm on Oct 20, 2008 (gmt 0)

10+ Year Member



Thanks. It seems SCRIPT_FILENAME is the full server path (home/~user/public_html/example.com/name), so I tried using REQUEST_URI instead, but it doesn't seem to complete successfully.

# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} (.+)
RewriteCond %1 !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]

It works with SCRIPT_FILENAME but doesn't seem to with REQUEST_URI or SCRIPT_URL. Any ideas?

maxed

8:17 pm on Oct 20, 2008 (gmt 0)

10+ Year Member



I think I got it working with:

# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} ^(.+)/$
RewriteCond %1 !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]
#

-edit
Actually, this doesn't work, because the last RewriteCond checks whether the URI without the slash is a directory, not the original, so maybe it's better to do
--
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} ^(.+)/$
RewriteCond ${lc:%{SCRIPT_FILENAME}} !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]
#
--

maxed

9:17 pm on Oct 20, 2008 (gmt 0)

10+ Year Member



Okay, I'm quite happy with how it's all turned out, thanks to help from both of you:

--
Options +FollowSymLinks
RewriteEngine On
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ [example...] [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?r=([a-zA-Z0-9]+)
RewriteRule ^index\.php$ http://www.example.com/in-${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with index and upper-case" to "lowercased uri without index"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} ^(.+)/$
RewriteCond ${lc:%{SCRIPT_FILENAME}} !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} ^(.+)/$
RewriteCond ${lc:%{SCRIPT_FILENAME}} !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]
#
# Externally redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.+)$ http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to index.php
RewriteRule ^in-([a-z]+)$ index.php?r=$1 [L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z]+)$ view.php?d=$1 [L]
--

Actually, for some reason this makes a direct request for http://www.example.com/index.php?r=name go to http://www.example.com/?r=name. Hmm...
-Edit
I solved that by putting the redirects for ?r=ddd first.

Is there anything else I need to take care of? I think I've covered all cases of duplicate content from URIs!

jdMorgan

10:55 pm on Oct 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



 # Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond ${lc:%{REQUEST_URI}} ^(.+)/$
RewriteCond ${lc:%{SCRIPT_FILENAME}} !-d
RewriteRule ^(.+)/$ http://www.example.com%1 [R=301,L]

The second RewriteCond in this snippet is mostly redundant, since the RewriteRule already tests for the trailing slash. You're also calling "tolower" twice in a row for each request, which is inefficient: each ${lc:...} reference is a separate map lookup, even though int:tolower runs inside mod_rewrite rather than as an external program.

You should be able to code that more efficiently as:


# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/$ http://www.example.com/%1 [R=301,L]

Here, DOCUMENT_ROOT plus the URL-path is equivalent to REQUEST_FILENAME, with some adjustments for slashes, because I'm using the RewriteRule's $1 back-reference (lowercased once in the second RewriteCond and captured as %1) instead of REQUEST_URI. This should work identically to what you have, but faster, since there's only one call to "tolower".

It also appears that there are two copies of this snippet in your file... ?

Jim

maxed

12:39 am on Oct 21, 2008 (gmt 0)

10+ Year Member



Yes, I seem to have added two identical rulesets by mistake :o
I have added the more efficient ruleset and everything seems to be working fine!
Thanks for all the help.

g1smd

12:53 am on Oct 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A few very minor changes...

^(.+)$
can be simplified to
(.+)

(\.php|\.html)
can be simplified to
\.(php|html)
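
Applied to the index-file redirect posted earlier in the thread, the second simplification would look roughly like this. It is a sketch that assumes the "lc" map from the server config and a solid "|" for the alternation:

```apache
# Sketch only: index-file redirect with the grouped extension pattern.
# Assumes "RewriteMap lc int:tolower" exists in the server config.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.(php|html) [NC]
RewriteRule ^(.*)index\.(php|html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
```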

maxed

2:02 am on Oct 21, 2008 (gmt 0)

10+ Year Member



Ooh, thanks, I have made the changes. Is there any major speed downside to this many rules? There are so many rules, and every request has to go through them until it either finds a match or, if nothing is wrong with it, falls through every rule.

g1smd

7:20 am on Oct 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am currently working on a file with over 50 rules, and the associated site works just fine.

It's not the number of rules but the efficiency of each one that is the most important.
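
As one illustration of that point: mod_rewrite tests the RewriteRule pattern first and only evaluates the RewriteCond lines when the pattern matches, so moving a cheap test into the pattern lets most requests skip the conditions entirely. A sketch, using the uppercase redirect from earlier in the thread and assuming the same "lc" map:

```apache
# Sketch only: intended to behave like the version that uses
# "RewriteCond %{REQUEST_URI} [A-Z]", but the uppercase test is part of the
# rule pattern, so all-lowercase requests fail the pattern immediately and
# no RewriteCond is evaluated for them.
RewriteRule ^(.*[A-Z].*)$ http://www.example.com/${lc:$1} [R=301,L]
```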
