Here's the rewrite code I'm currently using to rewrite non-www to www, and I see that it addresses http, but it doesn't address https....
# force www
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
As I search through the forums, I see several discussions that seem to apply, though they're all a bit above my head, and all seem to leave at least some of the important questions unanswered.
I'm looking for a generic set of rewrites (if such a thing is possible) that I can apply to the current site so the developer can go ahead, one which I can also apply to future sites that have both http and https pages.
Here are the most relevant prior threads I've found...
Apache mod_rewrite non-www to www
...for both http AND https (is this the best strategy?)
[webmasterworld.com...]
redirect non-www to www considering both HTTP and HTTPS
best syntax?
[webmasterworld.com...]
mod_rewrite question
[webmasterworld.com...]
[edited by: Robert_Charlton at 7:51 pm (utc) on July 29, 2006]
Options +FollowSymlinks
RewriteEngine on
# Redirect HTTPS non-canonical domain requests
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
#
# Redirect non-HTTPS non-canonical domain request cases
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
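The two rules above differ only in the scheme of the target URL. As a rough sketch, and only on the assumption that the server is Apache 2.x, where mod_rewrite exposes the %{HTTPS} variable ("on" or "off"), they could be folded into a single rule. The blank-hostname guard on the first condition is the one discussed later in this thread. This has not been tested on the site in question, so treat it as a starting point rather than a drop-in replacement.

```apache
RewriteEngine on
# Skip requests that carry no Host header at all
RewriteCond %{HTTP_HOST} .
# Hostname is anything other than the canonical one
RewriteCond %{HTTP_HOST} !^www\.example\.com
# "%{HTTPS}s" expands to "ons" or "offs". The pattern captures "s" into %1
# when HTTPS is on, and matches with an empty %1 when it is off.
RewriteCond %{HTTPS}s ^on(s)|
# %1 refers to the capture in the last RewriteCond above,
# so the target is https:// for SSL requests and http:// otherwise
RewriteRule ^(.*)$ http%1://www.example.com/$1 [R=301,L]
```

The port-based version above is easier to read and does not depend on %{HTTPS} being available, so it may still be the better choice on older servers.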
I'm looking for a generic set of rewrites (if such a thing is possible) that I can apply to the current site so the developer can go ahead, one which I can also apply to future sites that have both http and https pages.
You gave me what I asked for, code that forces www with both http and https. But in the redirect non-www to www considering both HTTP and HTTPS [webmasterworld.com] thread you say something that caught my eye now that I've struggled through what the rewrites are doing...
My emphasis:
I wouldn't tinker with the SSL side of things; it shouldn't be necessary unless you let robots crawl your SSL stuff. You can prevent your HTTP domain redirect code from messing with the HTTPS side by checking the server port number:
# Setup
RewriteEngine on
# Redirect non-www to www
RewriteCond %{HTTP_HOST} ^example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
As I think about it, I'm not concerned about canonical consistency with the HTTPS stuff at all. To use your wording, what I do want is to "prevent [the] HTTP domain redirect code from messing with the HTTPS side," and otherwise I want to minimize any possible problems (secure icon problems and the like) that might result from rewriting the HTTPS. This suggests to me that the best approach might be an .htaccess version of the code you posted in the above-cited thread (your message #:1520225).
Apart from the fact that it wasn't .htaccess code, one of the things that gave me pause about the approach in that thread was skyflye's last post...
However,
http://example.com/
...is not getting rewritten at all, and I'm not sure why.
Obviously, I would want example.com to get rewritten.
Doing that, you're left with:
# Redirect non-www domain to www
RewriteCond %{HTTP_HOST} ^example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Redirect all non-www domains to www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
Jim
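One difference between the code quoted from the earlier thread and the .htaccess versions above is the RewriteRule pattern: ^/(.*)$ versus ^(.*)$. In server config context (httpd.conf) the URL-path that the rule sees starts with a slash. In .htaccess context the leading slash has already been stripped, so a pattern that requires it never matches there. As a sketch, a pattern with an optional leading slash should behave the same in either place:

```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} !^443$
# "^/?" makes the leading slash optional:
# it is present in httpd.conf context and absent in .htaccess context.
# In both cases $1 holds the path without the leading slash.
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
```

This is only a convenience for code that may be moved between the two contexts; if the rules will always live in .htaccess, the plain ^(.*)$ form above is sufficient.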
Before using the new rewrite, I pored over the code and I could pretty much understand what every line is doing except one....
RewriteCond %{HTTP_HOST} .
Why is this here? What does it do?
My regular expressions syntax guides tell me that a period matches any character except \n. I'm not understanding how that works in the rewrite.
If you're not sure, it's cheap insurance to include it.
Jim
...that is, it's not needed on shared hosting where your server 'account' does not have a unique IP address.
I assume this means that it is needed on shared hosting where my server account does have a unique IP address. If so, I have some .htaccess files to update. Thanks.
And a PS to this... I also assume that since this is a "test for 'not blank,'" this line must come first, after
RewriteEngine on
[edited by: Robert_Charlton at 5:50 am (utc) on Aug. 23, 2006]
That is, if you have code with a negative hostname match like
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
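then a request with a blank hostname satisfies the condition (a blank string does not match the pattern, and the "!" negates that result), so the redirect is issued. The client's next request still carries no Host header, so it is redirected again. Therefore, the code will loop until either the server or the browser reaches its maximum redirection limit, as configured in the server and browser settings, and the access will eventually fail after wasting a lot of CPU time and TCP/IP packets on both ends of the connection.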
A true HTTP/1.0 client does not send a Host header when making requests. Without this hostname, it is impossible for such a client to tell a shared-IP-address server which site is to be accessed. Therefore, sites without a unique IP address are not accessible with true HTTP/1.0, and this whole issue of the code looping on a blank hostname becomes a non-problem for them.
HTTP/1.1 added the Host header as a way to support name-based virtual servers. The Host header specifies the name, allowing the server to select among the sites sharing one IP address.
Note that many search engine spiders (and a few other clients) will 'advertise' HTTP/1.0 in their requests, but they *do* send a Host header. These are referred to as "extended HTTP/1.0 clients." That's why you may see HTTP/1.0 requests in the log files of name-based servers that don't have a unique IP address, even though it is technically impossible to access such a server with a true HTTP/1.0 request.
Essentially, such a client added hostname support at some time in the past, and is lying about its protocol version for much the same reason that most clients today state that they are "Mozilla/n.n" clients. These extended-protocol HTTP/1.0 clients didn't want to be rejected by server-side tests requiring a match on the HTTP/1.0 protocol string, and yet did not fully implement the requirements to claim that they were HTTP/1.1 clients. (We humans have a penchant for creating some really ugly pitfalls for ourselves in the name of expediency.)
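To make the blank-hostname case concrete, here is an annotated sketch of the guarded redirect, with a trace of how three kinds of request would be handled. The commented-out !^$ line is, as far as I can tell, just another way of writing the same "not blank" test; it is shown for comparison and is not taken from the posts above.

```apache
RewriteEngine on
# True only when the Host header contains at least one character
RewriteCond %{HTTP_HOST} .
# An equivalent way to write the same test:
# RewriteCond %{HTTP_HOST} !^$
# True when the hostname does not start with the canonical name
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Host: www.example.com  -> second condition is false, no redirect
# Host: example.com      -> both conditions are true, one 301 redirect
# (no Host header)       -> first condition is false, no redirect and no loop
```

Without the first condition, the third case would satisfy the negative match on every request and would be redirected over and over.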
---
The order of RewriteConds is not important unless the code contains a mixture of RewriteConds with and without [OR] flags. If the [OR] flag is present on all but the last RewriteCond, then all RewriteConds are ORed together, and *any one* that is true will invoke the RewriteRule. If no RewriteConds have an [OR] flag, then all are ANDed together, and *all* must be true in order to invoke the RewriteRule. Only when ANDs and ORs are mixed does order become important. I'll have to defer to basic logic tutorials on this point, though, because explaining operator precedence and Boolean logic is beyond the scope of this forum (and my time). :)
Jim
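The three cases described above (all ANDed, all ORed, mixed) might look something like the following. These are three independent illustrations, not meant to be pasted into one file together. The example.net and example.org hostnames are hypothetical alias domains made up for this sketch. The note on the mixed case reflects my understanding of how mod_rewrite evaluates [OR], so verify it on a test server before relying on it.

```apache
RewriteEngine on

# Case 1: no [OR] flags. All three conditions must be true.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Case 2: [OR] on all but the last condition. Any one true condition
# invokes the rule, and the order of the conditions does not matter.
RewriteCond %{HTTP_HOST} ^example\.net [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.net [OR]
RewriteCond %{HTTP_HOST} ^example\.org
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Case 3: mixed. [OR] joins a condition to the one that follows it, and
# each ORed group is then ANDed with the rest, so this reads as
# (example.net OR example.org) AND not-port-443.
# Moving the port test between the two hostname tests would change the meaning.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.net [OR]
RewriteCond %{HTTP_HOST} ^(www\.)?example\.org
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```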
I was hoping to report that everything's worked out successfully with the https issue, but life is never that simple. It turns out that there's an additional twist that I trust will be resolved next week.
The developer and client had been relying on the shared SSL certificate that comes with the hosting account. If I understood the host's tech support correctly, it turns out that the shared SSL is incompatible with our directive for the rewrite to ignore Port 443, and that we need to get a private SSL certificate. The client will be doing that next week.
I'm not quite sure why the shared SSL is being affected by part of our mod_rewrite (the attempt to rewrite https) and not being affected by another part (to prevent us from rewriting https).
I'll let you know whether the private SSL fixes things. The tech support technician felt that our rewrite should work.