
Forcing www for both http and https

updating current http-only mod_rewrite in .htaccess

         

Robert Charlton

7:51 pm on Jul 29, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm hearing from a developer installing a Miva store on a client site that our .htaccess file "is trying [to] rewrite the url from https to http and it is [preventing] the Miva admin from logging in securely."

Here's the rewrite code I'm currently using to rewrite non-www to www, and I see that it addresses http, but it doesn't address https....

# force www
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
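To see why the developer's secure logins break, here is a rough Python model of that rule. It is an illustration only, not Apache itself, and the path `login` is hypothetical. The condition looks only at the hostname, so an HTTPS request for the bare domain gets redirected to a plain-HTTP URL:

```python
import re

# Simplified model (an assumption, not Apache itself) of the original rule:
# any host not starting with "www.example.com" is redirected to the *http*
# canonical URL, whatever scheme the request arrived on.
CANONICAL = re.compile(r"^www\.example\.com")

def original_rule(scheme: str, host: str, path: str):
    """Return the redirect target, or None if the rule does not fire.

    `scheme` is deliberately unused: the rule never looks at it.
    `path` is what RewriteRule sees in .htaccess (no leading slash).
    """
    if not CANONICAL.search(host):
        return "http://www.example.com/" + path
    return None

# An HTTPS request to the bare domain is pushed back to plain HTTP.
print(original_rule("https", "example.com", "login"))
```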

As I search through the forums, I see several discussions that seem to apply, though they're all a bit above my head, and all seem to leave at least some of the important questions unanswered.

I'm looking for a generic set of rewrites (if such a thing is possible) that I can apply to the current site so the developer can go ahead, one which I can also apply to future sites that have both http and https pages.

Here are the most relevant prior threads I've found...

Apache mod_rewrite non-www to www
...for both http AND https (is this the best strategy?)
[webmasterworld.com...]

redirect non-www to www considering both HTTP and HTTPS
best syntax?
[webmasterworld.com...]

mod_rewrite question
[webmasterworld.com...]

[edited by: Robert_Charlton at 7:51 pm (utc) on July 29, 2006]

jdMorgan

3:20 pm on Jul 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the absence of any other discussion, I'll argue the case for simplicity:

Options +FollowSymlinks
RewriteEngine on
# Redirect HTTPS non-canonical domain requests
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
#
# Redirect non-HTTPS non-canonical domain request cases
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
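A rough Python model of these two rules, for illustration. Assumptions: SERVER_PORT is compared as the string Apache reports, and the [L] flag is modelled by returning early. The port test on the first rule is what keeps HTTPS requests on HTTPS:

```python
import re

CANONICAL = re.compile(r"^www\.example\.com")
HTTPS_PORT = re.compile(r"^443$")

def redirect_target(port: str, host: str, path: str):
    """Return the redirect URL, or None if neither rule fires."""
    if CANONICAL.search(host):
        return None                              # already canonical: no redirect
    if HTTPS_PORT.search(port):                  # first rule: SERVER_PORT ^443$
        return "https://www.example.com/" + path # [L]: stop here
    return "http://www.example.com/" + path      # second rule: everything else

print(redirect_target("443", "example.com", "login"))  # stays on https
print(redirect_target("80", "example.com", "login"))   # goes to http
```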

Jim

Robert Charlton

1:55 am on Aug 16, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Jim - Many thanks. As I stickied you, I've put off replying until after I got back from SES and had a chance to check over things. Now that I've looked at the code and re-read the previous threads, I'm wondering whether I've in fact made the most appropriate request....

I'm looking for a generic set of rewrites (if such a thing is possible) that I can apply to the current site so the developer can go ahead, one which I can also apply to future sites that have both http and https pages.

You gave me what I asked for, code that forces www with both http and https. But in the redirect non-www to www considering both HTTP and HTTPS [webmasterworld.com] thread you say something that caught my eye now that I've struggled through what the rewrites are doing...

My emphasis:

I wouldn't tinker with the SSL side of things; It shouldn't be necessary unless you let robots crawl your SSL stuff.

You can prevent your HTTP domain redirect code from messing with the HTTPS side by checking the server port number:

# Setup 
RewriteEngine on
# Redirect non-www to www
RewriteCond %{HTTP_HOST} ^example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

As I think about it, I'm not concerned about canonical consistency with the HTTPS stuff at all. To use your wording, what I do want is to "prevent [the] HTTP domain redirect code from messing with the HTTPS side," and otherwise I want to minimize any possible problems (secure icon problems and the like) that might result from rewriting the HTTPS. This suggests to me that the best approach might be an .htaccess version of the code you posted in the above-cited thread (your message #:1520225).

Apart from the fact that it wasn't .htaccess code, one of the things that gave me pause about the approach in that thread was skyflye's last post...

However,
http://example.com/
...is not getting rewritten at all, and I'm not sure why.

Obviously, I would want example.com to get rewritten.

jdMorgan

1:07 pm on Aug 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only difference between .htaccess code and code for use in httpd.conf is that, in a per-directory (.htaccess) context, the path to the location of the .htaccess file is stripped from the path info that RewriteRule 'sees'. In practical terms, this means you need to remove the leading slash (and any path info to the current directory) from the pattern in the RewriteRule.
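A small Python illustration of that stripping. The helper `per_dir_path` is hypothetical and only approximates what mod_rewrite does in a per-directory context:

```python
import re

def per_dir_path(url_path: str, dir_prefix: str) -> str:
    """Hypothetical helper: drop the directory holding the .htaccess file
    (e.g. "/" for the document root) from the front of the URL-path."""
    if not url_path.startswith(dir_prefix):
        raise ValueError("URL-path is outside this directory")
    return url_path[len(dir_prefix):]

SERVER_PATTERN = re.compile(r"^/(.*)$")   # pattern as written for httpd.conf
HTACCESS_PATTERN = re.compile(r"^(.*)$")  # same rule adapted for .htaccess

seen = per_dir_path("/shop/index.html", "/")   # what the .htaccess rule sees
print(SERVER_PATTERN.match(seen))              # no leading slash left to match
print(HTACCESS_PATTERN.match(seen).group(1))   # the captured path for $1
```

An .htaccess file in /shop/ would see only "index.html", which is why any path info up to the current directory must also come out of the pattern.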

Doing that, you're left with:


# Redirect non-www domain to www
RewriteCond %{HTTP_HOST} ^example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

An alternative form of the code, which redirects all non-www domain requests rather than just example.com, would be:

# Redirect all non-www domains to www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
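The three RewriteConds in this block have no [OR] flags, so they are ANDed. A Python model of the decision (illustrative only; `redirect_non_www` is a made-up name) shows which requests it redirects:

```python
import re

def redirect_non_www(host: str, port: str, path: str):
    """Model of the 'all non-www domains' block: every condition must hold."""
    conditions = [
        re.search(r".", host) is not None,               # %{HTTP_HOST} .  (not blank)
        re.search(r"^www\.example\.com", host) is None,  # !^www\.example\.com
        re.search(r"^443$", port) is None,               # %{SERVER_PORT} !^443$
    ]
    if all(conditions):
        return "http://www.example.com/" + path
    return None

print(redirect_non_www("example.net", "80", "page.html"))  # any other name: redirected
print(redirect_non_www("example.com", "443", "page.html")) # HTTPS left alone
```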

Because there are many ways to configure (and to misconfigure) servers, all I can say is that if that code is executed for http requests to example.com, it will redirect to www.example.com. But for practical reasons, it's up to those who wish to use this code to test it fully and modify it to suit; all I can really do is point the way to the documentation and provide an example.

Jim

Robert Charlton

8:11 am on Aug 22, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Jim - Thanks. I uploaded the new .htaccess tonight and tested it every way I could imagine, and it works perfectly. It will now be up to the developer to test that it leaves his https alone.

Before using the new rewrite, I pored over the code and I could pretty much understand what every line is doing except one....

RewriteCond %{HTTP_HOST} . 

Why is this here? What does it do?

My regular expressions syntax guides tell me that a period matches any character except \n. I'm not understanding how that works in the rewrite.

jdMorgan

12:03 pm on Aug 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It prevents an infinite loop for HTTP/1.0 requests, where the HTTP_HOST will be blank, and therefore not match the canonical domain.
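The `.` pattern matches as soon as the string contains any one character, so the condition is false only when HTTP_HOST is empty. In Python terms (`is_not_blank` is a hypothetical helper):

```python
import re

def is_not_blank(host: str) -> bool:
    """Model of `RewriteCond %{HTTP_HOST} .`: true if the host has any character."""
    return re.search(r".", host) is not None

for host in ["www.example.com", "example.com", ""]:
    print(repr(host), is_not_blank(host))
```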

Jim

Robert Charlton

10:18 pm on Aug 22, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It prevents an infinite loop for HTTP/1.0 requests, where the HTTP_HOST will be blank, and therefore not match the canonical domain.

Thanks for that extra bit of info... very good to know. Is that something that's wise to add as a precaution to all rewrites involving HTTP_HOST?

jdMorgan

10:29 pm on Aug 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The test for "not blank" must be included in rewrite code that uses a negative hostname match on any server that is accessible via HTTP/1.0. So, it's not needed on any purely name-based virtual server -- that is, it's not needed on shared hosting where your server 'account' does not have a unique IP address.

If you're not sure, it's cheap insurance to include it.

Jim

Robert Charlton

5:45 am on Aug 23, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



...that is, it's not needed on shared hosting where your server 'account' does not have a unique IP address.

I assume this means that it is needed on shared hosting where my server account does have a unique IP address. If so, I have some .htaccess files to update. Thanks.

And a PS to this... I also assume that since this is a "test for 'not blank,'" this line must come first, after

RewriteEngine on

[edited by: Robert_Charlton at 5:50 am (utc) on Aug. 23, 2006]

jdMorgan

12:45 pm on Aug 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your site is accessible by typing "http://192.168.0.1/" in the browser address bar, where 192.168.0.1 is changed to be your actual IP address, and no other path info (such as "~account_name") is required in order to successfully access the site, then the RewriteCond test for blank is required if a negative match on HTTP_HOST RewriteCond is also present in that code section.

That is, if you have code with a negative hostname match like


RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

and this code is executed in response to a true HTTP/1.0 request which does not include a hostname header, then a redirect will be invoked, because a blank hostname does not equal "www.example.com". After the redirect, the client will issue a new request, again using HTTP/1.0, and the redirect will be invoked again, because the redirected HTTP/1.0 request will still not contain a hostname header.

Therefore, the code will loop until either the server or the browser reaches its maximum redirection limit, as configured in the server and browser settings, and the access will eventually fail after wasting a lot of CPU time and TCP/IP packets on both ends of the connection.
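The loop can be simulated in a few lines of Python. This is a sketch: `MAX_REDIRECTS = 20` is an assumed limit (real browser and server limits vary), and a client that does send a Host header is assumed to request the canonical name after the first redirect:

```python
import re

MAX_REDIRECTS = 20  # assumed limit; real browsers and servers differ

def count_redirects(host: str, with_blank_check: bool) -> int:
    """Count the redirects a client follows when its first request carries `host`.

    A true HTTP/1.0 client sends no Host header, modelled as "", and keeps
    sending nothing on every redirected request.
    """
    redirects = 0
    while redirects < MAX_REDIRECTS:
        not_blank = re.search(r".", host) is not None
        non_canonical = re.search(r"^www\.example\.com", host) is None
        fires = non_canonical and (not_blank or not with_blank_check)
        if not fires:
            break
        redirects += 1
        # A client with Host support now asks for www.example.com;
        # a true HTTP/1.0 client still sends no hostname at all.
        host = "www.example.com" if host else ""
    return redirects

print(count_redirects("", with_blank_check=False))  # hits the limit: a loop
print(count_redirects("", with_blank_check=True))   # rule never fires
```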

HTTP/1.0 does not send a Host header when making requests. Without this hostname, it is impossible for HTTP/1.0 to tell a shared-IP-address server which site is to be accessed. Therefore, sites without a unique IP address are not accessible with HTTP/1.0, and this whole issue of the code looping on a blank hostname becomes a non-problem.

HTTP/1.1 added the Host header as a way to support name-based virtual servers. The Host header specifies the name, allowing the server to select among the sites sharing one IP address.
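A simplified Python model of name-based selection on one shared IP address. It assumes Apache's behaviour of answering with the first-listed virtual host when the Host header is missing or unknown; `select_site` and the site names are hypothetical:

```python
from typing import Dict, Optional

def select_site(host: Optional[str], vhosts: Dict[str, str]) -> str:
    """Choose which site on a shared IP answers a request.

    Matches the Host header (port and case ignored) against the configured
    names; with no Host header, or an unknown name, falls back to the
    first-listed virtual host.
    """
    if host is not None:
        name = host.split(":")[0].lower()
        if name in vhosts:
            return vhosts[name]
    return next(iter(vhosts.values()))

# Hypothetical sites sharing one IP address.
VHOSTS = {"www.example.com": "site A", "www.example.org": "site B"}
print(select_site("www.example.org", VHOSTS))  # request with a Host header
print(select_site(None, VHOSTS))               # true HTTP/1.0: no Host header
```

Without a Host header only the default site can ever answer, so every other site on that address is unreachable by a true HTTP/1.0 client.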

Note that many search engine spiders (and a few other clients) will 'advertise' HTTP/1.0 in their requests, but they *do* send a Host header. These are referred to as "extended HTTP/1.0 clients." That's why you may see HTTP/1.0 requests in the log files of name-based servers that don't have a unique IP address, even though it is technically impossible to access such a server with a true HTTP/1.0 request.

Essentially, that client added hostname support at some time in the past, and is lying about its protocol version for much the same reason that most clients today state that they are "Mozilla/n.n" clients -- These extended-protocol HTTP/1.0 clients didn't want to be rejected by server-side tests requiring a match on the HTTP/1.0 protocol string, and yet did not fully implement the requirements to claim that they were HTTP/1.1 clients. (We humans have a penchant for creating some really ugly pitfalls for ourselves in the name of expediency).

---

The order of RewriteConds is not important unless the code contains a mixture of RewriteConds with and without [OR] flags. If the [OR] flag is present on all but the last RewriteCond, then all RewriteConds are ORed together, and *any one* that is true will invoke the RewriteRule. If no RewriteConds have an [OR] flag, then all are ANDed together, and *all* must be true in order to invoke the RewriteRule. Only when ANDs and ORs are mixed does order become important. I'll have to defer to basic logic tutorials on this point, though, because explaining operator precedence and Boolean logic is beyond the scope of this forum (and my time). :)
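A Python sketch of how mixed conditions combine, based on my reading of mod_rewrite's behaviour: a condition flagged [OR] is ORed with the next one, each run joined by [OR] forms a group, and the groups are ANDed. How a stray [OR] on the last condition is handled is an assumption. Moving the same three lines around changes the result:

```python
from typing import List, Tuple

def conditions_match(conds: List[Tuple[bool, bool]]) -> bool:
    """Each entry is (condition_result, has_OR_flag), in file order."""
    group = False      # value of the current [OR] group so far
    pending = False    # True while a group is still open
    for result, or_next in conds:
        group = group or result
        pending = True
        if not or_next:            # a condition without [OR] closes the group
            if not group:
                return False       # one false AND-term fails the whole rule
            group, pending = False, False
    # Assumption: a trailing [OR] on the last condition just closes the group.
    return group if pending else True

# Three lines: X = (A true, [OR]), Y = (B false), Z = (C true)
X, Y, Z = (True, True), (False, False), (True, False)
print(conditions_match([X, Y, Z]))  # evaluates as (A or B) and C
print(conditions_match([Y, X, Z]))  # evaluates as B and (A or C)
```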

Jim

Robert Charlton

7:00 am on Sep 4, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Jim - Thank you for your very thorough reply. Setting up a feedback loop on a server has been one of my great fears in playing with mod_rewrite, so I'm glad to have this bit of understanding.

I was hoping to report that everything's worked out successfully with the https issue, but life is never that simple. It turns out that there's an additional twist that I trust will be resolved next week.

The developer and client had been relying on the shared SSL certificate that comes with the hosting account. If I understood the host's tech support correctly, it turns out that the shared SSL is incompatible with our directive for the rewrite to ignore Port 443, and that we need to get a private SSL certificate. The client will be doing that next week.

I'm not quite sure why the shared SSL is being affected by part of our mod_rewrite (the attempt to rewrite https) and not being affected by another part (to prevent us from rewriting https).

I'll let you know whether the private SSL fixes things. The tech support technician felt that our rewrite should work.