RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
RewriteRule ^(.*)$ [%1...] [R=301,L]
The goal is for the URL to always end up in its www form.
I have seen somewhere that NC would not be a good choice.
Thanks
What is the {2,6} part for? I assume it is for matching the .com part. You shouldn't end-anchor hostnames, in case there is an appended port number. Your code does not allow for subdomains other than www, and does not allow for hyphens in the domain name. Does that fit the specification of what you want it to do? As for capitalisation, hostnames can be upper, lower or mixed case and will still refer to the same resource. Case changes in folder or file names, however, are treated as referring to a different resource by the HTTP specification and by servers such as Apache that follow those specs (note that M$ servers such as IIS break this rule).
Using ^(.*)$ is over-specified. The (.*) will suffice. As well as fixing the non-www canonicalisation, don't forget the index canonicalisation too.
RewriteCond %{HTTP_HOST} !^www\. [NC]
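This directive is a condition that tests the hostname (the Host header, not the URL) for the www prefix. The ! negates the pattern, so the condition is true only when the hostname does not begin with "www.". If the hostname already has the www prefix, the condition fails and the rule below it is skipped. The [NC] flag makes the comparison case-insensitive.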
RewriteCond %{HTTP_HOST} ^([^.]+\.[a-z]{2,6})$ [NC]
This directive is a condition that matches the general pattern of a domain name. The regular expression matches one or more characters other than a dot, followed by a literal dot ( . ) and an alphabetic string of two to six characters, anchored at both the start and the end of the hostname. For example, the common example of a domain name, domain.tld, will be matched by the regex. The intent is to match a bare domain name, but because only one dot is allowed, the condition matches two-part names only.
RewriteRule ^(.*)$ [%1...] [R=301,L]
This directive is where the actual URL rewriting takes place. Whenever both of the previous conditions prove true, the RewriteRule directs Apache to send the visitor to a URL that includes the www prefix. The ^(.*)$ pattern matches and captures the whole URL-path following the domain name. The [%1...] part is the substitution (the target URL), in which %1 is the domain captured by the second RewriteCond. The [R=301,L] flags make this an external redirect with status 301 (Moved Permanently) and mark it as the last rule, so no further rewrite rules are processed in this pass.
In the case of domain I need this for, hyphens are a must.
I see I’ll have to read more about this.
This would match example.com but it would not match example.co.uk if I understand this right.
The pattern !^www\. is far simpler and more to the point: it just means "doesn't begin with www.".
*** This directive is where the actual URL rewriting takes place. ***
Technically, this is not a rewrite. It is a redirect.
While this code could "work", it only works for certain input formats. If those exactly match what you are doing then you will never see a problem.
Because you must match the end of the hostname, the best way to fix it is to look for these optional parts (a trailing dot and a port number) specifically. At the same time, we can allow hyphens within the domain and handle the ".co.uk"-type hostnames that g1smd mentioned:
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.)?[a-z]{2,6})\.?(:[0-9]{1,5})?$ [NC]
RewriteRule (.*) http://www.%1/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^(([a-z0-9][a-z0-9\-]*[a-z0-9]\.)+(co\.)?[a-z]{2,6})\.?(:[0-9]{1,5})?$ [NC]
Jim
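On the rewrite-versus-redirect point above, here is a minimal sketch of the difference, assuming a .htaccess file in the document root with mod_rewrite enabled. The file names old-page.html and new-page.html and the host www.example.com are placeholders, not taken from the thread.

```apache
RewriteEngine On

# Internal rewrite (no R flag): Apache serves /new-page.html in place of
# /old-page.html; the visitor's address bar still shows /old-page.html.
RewriteRule ^old-page\.html$ /new-page.html [L]

# External redirect (R=301): Apache answers "301 Moved Permanently" with a
# Location header, and the browser requests the new URL itself.
# Use this instead of the rule above, not alongside it.
#RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]
```

The canonicalisation rules in this thread all take the second form, because the aim is for browsers and search engines to switch to the www URL.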
Now, instead of matching characters, how about putting the domain name into the code?
Something like this:
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
I know I started the thread with a universal code example, but now I'm thinking it may be easier for me to grasp if I put my domain name into it.
From the above, what would the code below look like if we replaced the matching characters with the real domain name:
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.)?[a-z]{2,6})\.?(:[0-9]{1,5})?$ [NC]
RewriteRule (.*) [%1...] [R=301,L]
The point of matching is only to have something universal, right?
Thanks
[edited by: jdMorgan at 7:46 pm (utc) on Oct. 16, 2008]
[edit reason] example.com [/edit]
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]{1,5})$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
# Canonicalize all non-www domain variants
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.[a-z]{2}|[a-z]{2,6}))\.?(:[0-9]{1,5})?$ [NC]
RewriteRule (.*) http://www.%1/$1 [R=301,L]
#
# Canonicalize all www domain variants
RewriteCond %{HTTP_HOST} ^www\.([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.[a-z]{2}|[a-z]{2,6}))(\.|\.?:[0-9]{1,5})$ [NC]
RewriteRule (.*) http://www.%1/$1 [R=301,L]
Note: posting on this forum changes solid pipe characters into broken pipes "¦"; make sure every pipe in the code above is a solid "|" before use.
Jim
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]{1,5})$ [NC]
What’s this for:
(\.|\.?:[0-9]{1,5})$
Also, does the universal code need both the non-www and the www domain-variant rules? In other words, do I put the whole thing into .htaccess, not just one of them?
I guess that part of my (understanding) problem is not just getting the regex part into my brain, but also having a good idea of which incoming URLs count as non-www and www domain variants.
Thanks
If you didn't do that, then all your content could be indexed both with and without a trailing dot on the hostname like www.example.com/yourfile.html and www.example.com./yourfile.html and again both with and without a port number like www.example.com/yourfile.html and www.example.com:80/yourfile.html for every page of your site.
By allowing for those as "inputs" and then removing them at the same time as you make other fixes to the URL, you eliminate the issue of Duplicate Content indexing for any and all pages of your site.
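On the question of using both: the two universal rule sets are meant to sit together in one .htaccess file. Below is a minimal sketch of how they might be combined, assuming a .htaccess in the document root with mod_rewrite enabled. The directives are the ones from Jim's post above (with solid pipes); the comments, which break down the regex and list example hosts each rule catches, are added for explanation only, and example.com and example.co.uk are placeholders.

```apache
RewriteEngine On

# Rule 1: non-www variants
#   example.com, example.com., example.com:80, example.co.uk ...
#   -> http://www.example.com/... (or http://www.example.co.uk/...)
# Regex parts:
#   [a-z0-9][a-z0-9\-]*[a-z0-9]  domain label, 2+ chars, hyphens allowed
#                                but not at the start or end
#   \.(co\.[a-z]{2}|[a-z]{2,6})  ".co.xx" or a 2-6 letter TLD
#   \.?(:[0-9]{1,5})?            optional trailing dot and port; these sit
#                                outside %1, so the redirect drops them
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.[a-z]{2}|[a-z]{2,6}))\.?(:[0-9]{1,5})?$ [NC]
RewriteRule (.*) http://www.%1/$1 [R=301,L]

# Rule 2: www variants with a trailing dot and/or a port
#   www.example.com., www.example.com:80, www.example.com.:80
#   -> http://www.example.com/...
# (\.|\.?:[0-9]{1,5})$ requires one of those extras, so the canonical
# www.example.com never matches and cannot loop.
RewriteCond %{HTTP_HOST} ^www\.([a-z0-9][a-z0-9\-]*[a-z0-9]\.(co\.[a-z]{2}|[a-z]{2,6}))(\.|\.?:[0-9]{1,5})$ [NC]
RewriteRule (.*) http://www.%1/$1 [R=301,L]
```

Whichever rule matches sends the request to the canonical www hostname in a single 301, and [L] stops further rule processing for that pass.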
I use this and it works fine for me:
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html? [NC]
RewriteRule ^(([^/]*/)*)index\.html?$ [mydomain.com...] [R=301,L]
[Rewrite /index.html --> / (main page or folders)]
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule ^(.*)$ [mydomain.com...] [R=301,L]
[Rewrite mydomain.com --> www.mydomain.com]
I'm not an expert at all. I got this code here in WW.
jdMorgan, g1smd:
Without adding more rules, could I also redirect some wrong links to the main page? For example, links that point to:
www.mydomain.com/default.htm
www.mydomain.com/index.aspx
www.mydomain.com/.
www.mydomain.com/,
www.mydomain.com/%20
www.mydomain.com*/
Also, I have links that point to www.mydomain.com/somefolder/page.htm/ (with a trailing /) or www.mydomain.com/somefolder/page.htm#*$! (all kinds of characters after .htm).
The code is correct, but both rules are redirects, not rewrites. That's what the R=301 bit does. Change your notes to say Redirect.
You can extend the rule that currently caters for index.html and index.htm and make it work for other names; something like this:
(index|default)\.(html?|php[45]?|[aj]spx?|cfm)
Yes. You will need one more rule to fix most of the trailing stuff. Several variants of such a rule have been posted here recently.
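A minimal sketch of both suggestions, assuming the site is at www.example.com (a placeholder; the real target was shortened by the forum in the code above) and that these lines go above the non-www rule in the document-root .htaccess. The first pair is the existing index rule with the wider pattern. The second rule is only one possible way to deal with trailing junk after .htm or .html, such as a trailing slash or comma; it is an assumption, not one of the variants posted here. Anything after a # in a link (page.htm#...) is a fragment, which browsers do not send to the server, so no server-side rule can see it.

```apache
RewriteEngine On

# Redirect index/default pages (any of the listed extensions) to their folder.
# THE_REQUEST holds the request line the client actually sent, so the
# internal request Apache makes when serving a folder's DirectoryIndex
# file is not redirected (which would otherwise cause a loop).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*(index|default)\.(html?|php[45]?|[aj]spx?|cfm) [NC]
RewriteRule ^(([^/]*/)*)(index|default)\.(html?|php[45]?|[aj]spx?|cfm)$ http://www.example.com/$1 [R=301,L]

# Hypothetical sketch: strip anything after ".htm"/".html" when the next
# character is not a letter or digit, e.g. page.htm/ or page.htm,
# (page.html itself is left alone, because "l" is a letter)
RewriteRule ^(.+\.html?)[^a-z0-9].*$ http://www.example.com/$1 [NC,R=301,L]
```

Test any such rule against your real URLs first: for example, a file named page.html.bak would also be redirected to page.html by the second rule.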