OK, I give up. I have spent hours trying to figure this out and searched this site a ton, but I can't seem to find my exact situation, and I can't afford to have the dedicated virtual server and the 15 sites it's running come crashing down for any extended length of time! If anyone could give any assistance I would bake you cookies, wear glasses with tape in the middle for a week, and vote against my preferred candidate (ok, not quite that one :))
--------------------
My goals
1. Any url prefixed with "www." gets forced to the non-www version.
2. Force all urls to lowercase: there are links to my sites out there on the net with the tld capitalized as well as the file path and file names so I need to detect it everywhere and lowercase it.
--------------------
My attempt at www to non www rewrite:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule (.*) http://example.com/$1
I put this code in a newly created vhost.conf file in /var/www/vhosts/example.com/conf/ and then asked the server to reconfigure the webhost thus
/usr/local/psa/admin/sbin/websrvmng --reconfigure-vhost --vhost-name=example.com
and restarted with
/etc/init.d/httpd restart
Everything went well, but there was no redirection upon visiting www.example.com in my browser.
Also, I know this is a sloppy way to write it even if it did work, because requests for www.example.com would be rewritten as example.com/ (notice the trailing slash), and I don't think I want that for SEO. I don't know how to test for the / properly and only include it if it's present and has a file or file path after it.
My attempt at force tld, file path, and file name to lowercase:
I haven't actually tried to implement this yet as I don't want all the sites to come crashing down around my ears. My plan, however was this -
Add this to my newly created /var/www/vhosts/example.com/conf/vhost.conf file in a newly created <VirtualHost> container:
RewriteEngine On
RewriteOptions Inherit
This should allow me to write the rules in the main server config and have all the virtual hosts use them.
Adding this
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteCond %{PATH_INFO} [A-Z]
RewriteRule (.*) {lowercase:$1} [R=301]
RewriteCond %{HTTP_HOST} [A-Z]
RewriteRule (.*) {lowercase:${HTTP_HOST}$1} [R=301]
###
Anyway I am in over my head here, any help would be much appreciated!
Thanks,
Ward
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteRule (.*) http://example.com/$1
Is now working, not sure why the delay. My regex is incorrect though, it gives some inappropriate responses - example below -
www.example.com/apple.htm --is correctly rewritten to --- example.com/apple.htm
however
www.example.com --is rewritten to --- example.com// in Firefox Win XP only - in Safari Win XP and IE Win XP it rewrites fine
Anyone have an idea? Do I sound crazy saying that one browser does it differently since we are discussing code that is run on the server?
Thanks,
Ward
Next, the URLs in the links on your site should point directly to the correct version. When navigating your site you should never encounter a redirect when you click on a link.
The double / is a problem. In httpd.conf be aware that the leading / of the folder and file part of the URL is seen by RewriteRule, but in .htaccess it is not.
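To illustrate the context difference just described, here is a minimal sketch of the same www-to-non-www redirect written for each location. It assumes the canonical host is example.com, as used elsewhere in this thread; the two fragments are alternatives and are not meant to be combined in one file.

```apache
# Sketch only: assumes example.com is the canonical hostname.

# --- Server config or <VirtualHost> context (httpd.conf, vhost.conf) ---
# Here the URL-path seen by RewriteRule starts with "/", so the slash is
# matched outside the parentheses and written once in the target.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^/(.*)$ http://example.com/$1 [R=301,L]

# --- Per-directory context (.htaccess in the document root) ---
# Here the leading "/" has already been stripped before matching,
# so the pattern must not expect it.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

In server context, a pattern of (.*) captures the leading slash into $1, so a target of http://example.com/$1 ends up with two slashes after the hostname.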
Next, be aware that the case of domain names and the TLD is not important. However, folder and file paths are case sensitive on Apache servers. The wrong case usually delivers a 404 error, and I prefer it to work that way.
For IIS you can mix the case any way you like, and the server will send back the content. That is a massive duplicate content issue. Apache is generally immune to that.
If you have pre-existing wrongly-cased links pointing at your site, then a 301 redirect might help you out a bit.
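As a rough sketch of such a 301 for wrongly-cased paths, something like the following might work in the server config or a <VirtualHost> container (RewriteMap is not allowed in .htaccess). It assumes example.com is the canonical host and that every file and folder on disk is already lowercase; the map name "lowercase" is just a label.

```apache
# Sketch only: assumes all paths on disk are lowercase and that
# example.com is the canonical hostname.
RewriteEngine on

# Declare a map backed by mod_rewrite's internal tolower function.
RewriteMap lowercase int:tolower

# Only act when the requested path contains an uppercase letter.
RewriteCond %{REQUEST_URI} [A-Z]

# Map lookups use the ${mapname:key} syntax; server variables use %{NAME}.
# The query string, if any, is passed through unchanged.
RewriteRule ^/(.*)$ http://example.com/${lowercase:$1} [R=301,L]
```

Because the target hostname is written out in lowercase, a mixed-case hostname in the request is normalized by the same redirect.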
You need to clear your cache each time, otherwise you will not see the right results for the code you just changed.
If $1 contains the / at the beginning of it, then you can do this:
RewriteRule (.*) http://example.com$1
- www.domain.com/ - domain name with trailing / at the end.
- www.domain.com/folder/ - folder with trailing / at the end.
So, yes, you should be redirecting to include the trailing / at the end if it is missing from the request. Use a 301 redirect.
Next, the URLs in the links on your site should point directly to the correct version. When navigating your site you should never encounter a redirect when you click on a link.
*** We have no internal www links, we have a couple of meta redirects for print/magazine published shorty urls but these should not be indexed and will be switched to redirectMatch asap. (we are just moving to a server where we can now control and implement 301's) - our problem links are out there all over the internet.
The double / is a problem. In httpd.conf be aware that the leading / of the folder and file part of the URL is seen by RewriteRule, but in .htaccess it is not.
*** Thanks for that distinction, I'll try to file that in my brain. We won't be doing any .htaccess redirects if we can help it. As I understand it, they are the least efficient in terms of server load.
Next, be aware that the case of domain names and the TLD is not important. However, folder and file paths are case sensitive on Apache servers. The wrong case usually delivers a 404 error, and I prefer it to work that way.
***
For IIS you can mix the case any way you like, and the server will send back the content. That is a massive duplicate content issue. Apache is generally immune to that.
***
We were originally on an IIS server and have moved to Linux, and we are going through a thorough lowercasing effort and a comprehensive 301 campaign.
If you have pre-existing wrongly-cased links pointing at your site, then a 301 redirect might help you out a bit.
***
Yes we do.
You need to clear your cache each time, otherwise you will not see the right results for the code you just changed.
***
Thanks, I will. Is there a browser that is best to use (read most reliable at actually clearing its cache when told to?) when implementing these? I would hate to be running in circles thinking I am doing something wrong when actually it's IE not clearing its cache even though I am telling it to.
If $1 contains the / at the beginning of it, then you can do this:
RewriteRule (.*) http://example.com$1
***
I guess I should implement your example above and test that out, since I am working in a vhost.conf file. Am I mistaken to think that I am only getting a double / in Firefox? Shouldn't that happen in all clients?
Should I implement a redirect for example.com/index.htm to example.com? I know there are links on the net pointing there.
Where is the most server efficient spot to redirect individual pages? In a configuration file with a redirectMatch?
Thanks so much for your help, this has been killing me! You know what would be amazing? If some elite JavaScript coder put together an Apache simulator app: pick your file location, add your rules, punch in your URL, and see how things are treated. It would probably help people get a lot further with this without running in circles and searching Google on and on.
Thanks,
Ward
Could I write something more abstract in my main server config, set my vhosts to inherit, and not have to code for each host?
Something like this?
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.
RewriteCond %{HTTP_HOST} (^www\.)([^.])(.*)
RewriteRule ^(.*)$ [%2%3...] [R=301]
I am assuming the implicit AND after a RewriteCond means that if the first RewriteCond makes no match, Apache will skip down until it passes the RewriteRule?
My other thinking is: we get a match for www. and so continue with the next RewriteCond. That one makes three matches: the first is the www., the next is everything up to the ., and the third is everything remaining. The RewriteRule discards the first backreference and, using #2 and #3, puts essentially everything else back into the URL and prefixes it with http.
How off am I? Lol. This wouldn't work for subdomains because it relies on the sites only having one dot in a correctly written TLD, but that's fine for me: there are no subdomains in my sites.
Thanks!
Ward
RewriteEngine on
#
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]
Jim
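For the inheritance idea raised earlier in the thread, one possible arrangement is sketched below: the generic rule above lives once in the main server config, and each virtual host opts in. This is only an outline under the assumption that every vhost should get the same www-stripping behavior; where exactly the per-host lines go depends on how the control panel assembles its <VirtualHost> containers.

```apache
# Sketch only: placement is an assumption, not a tested Plesk layout.

# --- Main server config (httpd.conf), outside any <VirtualHost> ---
RewriteEngine on
# %1 is the hostname with the leading "www." removed.
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]

# --- Inside each <VirtualHost> that should use the rule ---
# mod_rewrite settings are not inherited by default, so each
# virtual host has to enable the engine and ask for inheritance.
RewriteEngine on
RewriteOptions Inherit
```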
RewriteMap lowercase int:tolower
RewriteCond %{PATH_INFO} [A-Z]
RewriteRule (.*) {lowercase:$1} [R=301]
A couple of questions on protocol - you marked a blank line by commenting it with the # symbol - do I have to do that with blank lines? Can blank lines just have nothing on them? No hash?
Also, you marked the RewriteRule as the last rule to process with that L in the square bracket. If there is only one rule in my file do I need to do that? If there are multiple rules in a file should we be explicit about marking the last one as the last? If there is nothing more to process does it hurt to leave it out?
I am also trying to write a rule to force characters to lowercase wherever they might exist, so I will add that code after this and then mark that rewrite rule as the last [L].
Have you ever come across an Apache simulator, perhaps coded in JavaScript? Seems like that would be the perfect learning tool and a do-able project for someone with the JavaScript chops.
Thanks Jim!
It was started by Pageoneresults and was called something like No more www.
*** Should I implement a redirect for example.com/index.htm to example.com? ***
Yes, you should redirect for the index file filename, but not just for root. Do it for folders too.
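A sketch of that index-file redirect for server or <VirtualHost> context follows. It assumes the index files are named index.htm or index.html and that example.com is the canonical host. The condition on THE_REQUEST is there so that only requests the client actually made for the index file are redirected, not the internal lookup Apache performs when serving a folder URL, which would otherwise loop.

```apache
# Sketch only: assumes index.htm / index.html and host example.com.
# This simple condition does not cover requests carrying a query string.
RewriteEngine on

# Match the original request line, e.g. "GET /folder/index.htm HTTP/1.1".
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/

# $1 is the folder path (empty for the root), so
# /index.htm -> /   and   /folder/index.htm -> /folder/
RewriteRule ^/(([^/]+/)*)index\.html?$ http://example.com/$1 [R=301,L]
```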
*** clear your cache each time ***
I forgot to mention that I much prefer to use the Live HTTP Headers extension (for Mozilla browsers), or somesuch, to check redirects out.
*** Does the PATH_INFO server var include the filename? ***
Can't remember, but SCRIPT_NAME does (and returns "index.html" for index files, even if you only requested "www.domain.com/" for the URL).
*** last rule to process with that L in the square bracket. If there is only one rule in my file do I need to do that? ***
I put the L on the end of almost every rule.
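A short sketch of why that habit helps once a file holds more than one rule. The second rule and its page names (old-page.htm, new-page.htm) are hypothetical and only there for illustration. Without [L], a rule flagged [R] hands its already-substituted URL on to the rules that follow, which can produce unexpected results; with [L], processing stops and the redirect is sent as soon as a rule matches.

```apache
# Sketch only: old-page.htm and new-page.htm are hypothetical names.
RewriteEngine on

# Rule 1: canonical hostname. [L] stops here once the redirect is set.
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^/(.*)$ http://example.com/$1 [R=301,L]

# Blank lines like the one above are ignored by the config parser;
# a lone "#" is not required.

# Rule 2: a single moved page. Only reached when rule 1 did not match.
RewriteRule ^/old-page\.htm$ http://example.com/new-page.htm [R=301,L]
```

On the last rule in a file the flag changes nothing, but leaving it in place means a rule added later cannot alter the outcome by accident.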