Redirecting vhost from www to non-www and forcing to lowercase

mod_rewrite canonical redirectMatch virtualhost

         

DyeA

7:29 am on May 7, 2008 (gmt 0)

Hello folks,

OK, I give up. I have spent hours trying to figure this out and searched this site a ton, but I can't find my exact situation, and I can't afford to have the dedicated virtual server and the 15 sites it runs come crashing down for any length of time! If anyone could give any assistance I would bake you cookies, wear glasses with tape in the middle for a week, and vote against my preferred candidate (OK, not quite that one :))

--------------------

My goals
1. Any URL prefixed with "www." gets forced to the non-www version.

2. Force all URLs to lowercase: there are links to my sites out on the net with the TLD capitalized, as well as the file path and file names, so I need to detect uppercase everywhere and lowercase it.

--------------------

My attempt at www to non www rewrite:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule (.*) http://example.com/$1
I put this code in a newly created vhost.conf file in /var/www/vhosts/example.com/conf/ and then asked the server to reconfigure the vhost with
/usr/local/psa/admin/sbin/websrvmng --reconfigure-vhost --vhost-name=example.com
and restarted Apache with
/etc/init.d/httpd restart
Everything went well, but there was no redirection upon visiting www.example.com in my browser.

Also, I know this would be a sloppy way to write it even if it did work, because URLs that came in as www.example.com would be rewritten as example.com/ (notice the trailing slash), and I don't think I would want that for SEO. But I don't know how to look for the / properly and only include it if it's present and has a file or file path after it.

My attempt at force tld, file path, and file name to lowercase:
I haven't actually tried to implement this yet, as I don't want all the sites to come crashing down around my ears. My plan, however, was this:
Add this to my newly created /var/www/vhosts/example.com/conf/vhost.conf file, inside a newly created <VirtualHost> container:
RewriteEngine On
RewriteOptions Inherit
This should allow me to write the rules in the main server config and have all the virtual hosts use them.
Then add this to the main server config:
RewriteEngine on
RewriteMap lowercase int:tolower

RewriteCond %{PATH_INFO} [A-Z]
RewriteRule (.*) {lowercase:$1} [R=301]

RewriteCond %{HTTP_HOST} [A-Z]
RewriteRule (.*) {lowercase:${HTTP_HOST}$1} [R=301]

###
Anyway, I am in over my head here; any help would be much appreciated!

Thanks,
Ward

DyeA

10:28 pm on May 7, 2008 (gmt 0)

Update to this:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule (.*) http://example.com/$1

This is now working; I'm not sure why there was a delay. My regex is incorrect though, as it gives some inappropriate responses. Example below:

www.example.com/apple.htm --is correctly rewritten to --- example.com/apple.htm

however

www.example.com --is rewritten to --- example.com// in Firefox Win XP only - in Safari Win XP and IE Win XP it rewrites fine

Anyone have an idea? Do I sound crazy saying that one browser does it differently since we are discussing code that is run on the server?

Thanks,
Ward

g1smd

10:44 pm on May 7, 2008 (gmt 0)

Firstly. Be very sure you want to redirect to the without-www version for your sites. There are some very good reasons to be redirecting to the with-www version instead. There was a very long thread on this subject only a few months ago.

Next, the URLs in the links on your site should point directly to the correct version. When navigating your site you should never encounter a redirect when you click on a link.

The double / is a problem. In httpd.conf be aware that the leading / of the folder and file part of the URL is seen by RewriteRule, but in .htaccess it is not.

Next, be aware that the case of domain names and the TLD is not important. However, folder and file paths are case sensitive on Apache servers. The wrong case usually delivers a 404 error, and I prefer it to work that way.

For IIS you can mix the case any way you like, and the server will send back the content. That is a massive duplicate content issue. Apache is generally immune to that.

If you have pre-existing wrongly-cased links pointing at your site, then a 301 redirect might help you out a bit.

You need to clear your cache each time, otherwise you will not see the right results for the code you just changed.

If $1 contains the / at the beginning of it, then you can do this:
RewriteRule (.*) http://example.com$1
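
To illustrate the point about the leading slash, here is a minimal sketch of the same redirect written for each context. It assumes example.com as the target host, as used in this thread. The [NC] and [R=301,L] flags are additions: without an explicit R=301, mod_rewrite answers an absolute URL on another host with a 302 redirect.

```apache
# Per-server context (httpd.conf or a <VirtualHost> block):
# the pattern sees the URL-path WITH its leading slash, e.g. "/apple.htm",
# so the slash is matched outside the capture group.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^/(.*)$ http://example.com/$1 [R=301,L]

# Per-directory context (.htaccess): the leading slash is stripped
# before matching, so the pattern sees "apple.htm" instead.
# RewriteEngine on
# RewriteCond %{HTTP_HOST} ^www\. [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

Either form sends a request for the bare www hostname to http://example.com/ with a single slash.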

g1smd

10:53 pm on May 7, 2008 (gmt 0)

Be aware that the canonical forms for various URLs should be these:

- www.domain.com/ - domain name with trailing / at the end.

- www.domain.com/folder/ - folder with trailing / at the end.

So, yes, you should be redirecting to include the trailing / at the end if it is missing from the request. Use a 301 redirect.

DyeA

11:36 pm on May 7, 2008 (gmt 0)

Thanks for the great response!
Firstly. Be very sure you want to redirect to the without-www version for your sites. There are some very good reasons to be redirecting to the with-www version instead. There was a very long thread on this subject only a few months ago.
*** We made the move to non-www probably more than a year ago and are registered everywhere with goog as non-www. I wasn't aware there was any difference. I ran a goog search on this site for "www vs non www" but I came back with a ton of results and couldn't find the specific discussion you were talking about. Do you remember what forum it was in?

Next, the URLs in the links on your site should point directly to the correct version. When navigating your site you should never encounter a redirect when you click on a link.
*** We have no internal www links. We have a couple of meta redirects for short URLs published in print/magazines, but these should not be indexed and will be switched to RedirectMatch ASAP (we are just moving to a server where we can control and implement 301s). Our problem links are out there all over the internet.

The double / is a problem. In httpd.conf be aware that the leading / of the folder and file part of the URL is seen by RewriteRule, but in .htaccess it is not.
*** Thanks for that distinction, I'll try to file that in my brain. We won't be doing any .htaccess redirects if we can help it. As I understand it, they are the least efficient in terms of server load.

Next, be aware that the case of domain names and the TLD is not important. However, folder and file paths are case sensitive on Apache servers. The wrong case usually delivers a 404 error, and I prefer it to work that way.
***

For IIS you can mix the case any way you like, and the server will send back the content. That is a massive duplicate content issue. Apache is generally immune to that.
***
We were originally on an IIS server and have moved to Linux, and we are going through a thorough lowercasing effort and a comprehensive 301 campaign.

If you have pre-existing wrongly-cased links pointing at your site, then a 301 redirect might help you out a bit.
***
Yes we do.

You need to clear your cache each time, otherwise you will not see the right results for the code you just changed.
***
Thanks, I will. Is there a browser that is best to use (read: most reliable at actually clearing its cache when told to) when implementing these? I would hate to be running in circles thinking I am doing something wrong when actually it's IE not clearing its cache even though I am telling it to.

If $1 contains the / at the beginning of it, then you can do this:
RewriteRule (.*) http://example.com$1
***
I guess I should implement your example above and test that out, since I am working in a vhost.conf file. Am I mistaken to think that I am only getting a double / in Firefox? Shouldn't that happen in all clients?

Should I implement a redirect for example.com/index.htm to example.com? I know there are links on the net pointing there.

Where is the most server efficient spot to redirect individual pages? In a configuration file with a redirectMatch?

Thanks so much for your help - this has been killing me! You know what would be amazing? If some elite JavaScript coder put together an Apache simulator app: pick your file location, add your rules, punch in your URL and see how things are treated. It would probably help people get a lot further with this without running in circles and searching Google on and on.

Thanks,
Ward

DyeA

12:04 am on May 8, 2008 (gmt 0)

I found a clear-cache extension for Firefox that puts a button on your toolbar, which is convenient. I made the change that eliminates the unnecessary slash between the TLD and the $1 back-reference in the substitution, used websrvmng to mark the vhost for reconfiguration by Plesk, restarted the server, and it seems to work just fine!

Could I write something more abstract in my main server config and set my vhosts to inherit, so I don't have to code for each host?

Something like this?

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteCond %{HTTP_HOST} (^www\.)([^.])(.*)
RewriteRule ^(.*)$ [%2%3...] [R=301]

I am assuming the implicit AND between RewriteConds means that if the first RewriteCond does not match, Apache skips past the RewriteRule?

My other thinking is that we get a match for www. and so continue with the next RewriteCond. That one makes three captures: the first is the www., the next is everything up to the next ., and the third is everything remaining. The RewriteRule discards the first back-reference and, using #2 and #3, puts essentially everything else back into the URL and prefixes it with http.

How off am I? Lol. This wouldn't work for subdomains because it relies on the sites having only one dot in a correctly written TLD, but that's fine for me: there are no subdomains in my sites.

Thanks!
Ward

jdMorgan

12:47 am on May 8, 2008 (gmt 0)

This will strip "www." off the front of any domain, sub-domain, sub-sub-domain, etc. It will also drop the valid-but-unwanted trailing "." and/or port number from the domain. For example, www.sub.example.com.:80/foo.html is redirected to sub.example.com/foo.html

RewriteEngine on
#
RewriteCond %{HTTP_HOST} ^www\.(([^.:]+\.)+([^.:]+))
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]

Note the nested parentheses; count left-parentheses to resolve the back-reference number. The end-anchor on the hostname pattern was omitted intentionally.

Jim
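
As a reading aid, here is an annotated sketch of a rule of this shape, showing how the back-references are numbered. The character classes here exclude ":" as well as ".", so a port number cannot end up inside %1. The sample hostnames in the comments are illustrative only.

```apache
RewriteEngine on
# %N refers to groups captured by the last matched RewriteCond,
# numbered by opening parenthesis from left to right:
#   %1 = (([^.:]+\.)+([^.:]+))  the whole hostname after "www."
#   %2 = ([^.:]+\.)             the last repetition of the "label." group
#   %3 = ([^.:]+)               the final label, e.g. "com"
# For "www.sub.example.com.:80": %1 = "sub.example.com", %2 = "example.",
# %3 = "com". There is no "$" end-anchor, so the trailing "." and ":80"
# are simply left unmatched and never reach the redirect target.
RewriteCond %{HTTP_HOST} ^www\.(([^.:]+\.)+([^.:]+))
# $1 is the URL-path without its leading slash (per-server context).
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]
```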

DyeA

1:52 am on May 8, 2008 (gmt 0)

Perfect! I'm giving that a try tonight. Do you have any input on the lowercase code I have going? Does the PATH_INFO server var include the filename?

RewriteMap lowercase int:tolower
RewriteCond %{PATH_INFO} [A-Z]
RewriteRule (.*) {lowercase:$1} [R=301]

A couple of questions on protocol - you marked a blank line by commenting it with the # symbol - do I have to do that with blank lines? Can blank lines just have nothing on them? No hash?

Also, you marked the RewriteRule as the last rule to process with that L in the square bracket. If there is only one rule in my file do I need to do that? If there are multiple rules in a file should we be explicit about marking the last one as the last? If there is nothing more to process does it hurt to leave it out?

I am also trying to write a rule to force characters to lowercase wherever they might exist, so I will add that code after this and then mark that rewrite rule as the last [L].

Have you ever come across an Apache simulator, perhaps coded in JavaScript? Seems like that would be the perfect learning tool and a do-able project for someone with the JavaScript chops.

Thanks Jim!

DyeA

2:13 am on May 8, 2008 (gmt 0)

Oops one more

Where is the most server efficient spot to redirect individual pages? In a configuration file with a redirectMatch?

Ward
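
For reference, a hedged sketch of what single-page redirects could look like with mod_alias in the server or vhost configuration. The page names are hypothetical. Directives in the main configuration are parsed once at startup, whereas .htaccess files are read on each request, which is the usual argument for keeping redirects in the config files.

```apache
# Hypothetical page names, for illustration only.

# Redirect one old URL-path to its new location with a 301.
Redirect 301 /old-page.htm http://example.com/new-page.htm

# Pattern-based variant: $1 is the part of the path captured by (.*).
RedirectMatch 301 ^/archive/(.*)\.asp$ http://example.com/archive/$1.htm
```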

g1smd

6:35 pm on May 8, 2008 (gmt 0)

*** I couldn't find the specific discussion you were talking about. Do you remember what forum it was in? ***

It was started by Pageoneresults and was called something like No more www.

*** Should I implement a redirect for example.com/index.htm to example.com? ***

Yes, you should redirect for the index file filename, but not just for root. Do it for folders too.
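
A minimal sketch of such an index-file redirect, assuming per-server (vhost) context, example.com as the canonical host, and index files named index.htm or index.html. It tests THE_REQUEST, the raw request line sent by the client, so that Apache's own internal lookup of the index file does not trigger the redirect again. As written it ignores requests that carry a query string.

```apache
RewriteEngine on
# THE_REQUEST looks like "GET /folder/index.htm HTTP/1.1".
# "\ " is an escaped literal space inside the pattern.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
# $1 is the folder path (possibly empty) in front of the index file,
# so /index.htm goes to / and /folder/index.htm goes to /folder/.
RewriteRule ^/(([^/]+/)*)index\.html?$ http://example.com/$1 [R=301,L]
```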

*** clear your cache each time ***

I forgot to mention that I much prefer to use the Live HTTP Headers extension (for Mozilla browsers), or somesuch, to check redirects out.

*** Does the PATH_INFO server var include the filename? ***

Can't remember, but SCRIPT_NAME does (and returns "index.html" for index files, even if you only requested "www.domain.com/" for the URL).
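
On the lowercase rule itself, here is a hedged sketch for per-server context, assuming example.com as the canonical host. PATH_INFO normally holds only extra path data that follows the matched file or script, so this version tests REQUEST_URI instead. Note that a map lookup is written ${map:key} with a leading "$", and server variables are written %{NAME}.

```apache
RewriteEngine on
# RewriteMap is only allowed in server config or <VirtualHost> context,
# not in .htaccess. "int:tolower" is a built-in internal function.
RewriteMap lowercase int:tolower

# Act only when the requested URL-path contains an uppercase letter.
RewriteCond %{REQUEST_URI} [A-Z]
# Redirect to the same path, lowercased, on the canonical host.
# The query string, if any, is carried over unchanged.
RewriteRule ^/(.*)$ http://example.com/${lowercase:$1} [R=301,L]
```

Because the redirect target contains no uppercase letters in its path, the condition fails on the follow-up request and the rule cannot loop.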

*** last rule to process with that L in the square bracket. If there is only one rule in my file do I need to do that? ***

I put the L on the end of almost every rule.
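
A short sketch of why that habit is useful, using hypothetical rules. It also touches on the earlier blank-line question: Apache ignores both empty lines and lines starting with "#", so the lone "#" is only a visual spacer.

```apache
RewriteEngine on
# Hypothetical rules, for illustration only.

# [L] ends rule processing for this pass as soon as the rule matches,
# so the redirect target is not modified by any rule further down.
RewriteRule ^/old\.htm$ http://example.com/new.htm [R=301,L]

# Only requests that did not match the rule above get this far.
RewriteRule ^/products/([0-9]+)$ /product.php?id=$1 [L]
```

With a single rule in the file the flag changes nothing, but it keeps the rule safe if more rules are added below it later.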