Forum Moderators: phranque


another simple rewrite question

         

finlander

12:20 am on Nov 28, 2010 (gmt 0)

10+ Year Member



I'm looking for the .htaccess rewrite code that will allow some seemingly simple functions. I understand the basic Apache rewrite, but have not been able to get the https rewrite that simply tacks on 'www' for a page that is normally secure.

So, for the really basic stuff, of course, I want this:

'domain.com' redirects to
'http://www.domain.com'

and

'domain.com/index.php?main_page=faq' redirects to
'http://www.domain.com/index.php?main_page=faq'

but then I also want this for secure pages:

'domain.com/index.php?main_page=login' redirects to
'https://www.domain.com/index.php?main_page=login'

It's easy to see this in action at a website such as sears.com. If you remove the beginning part of the URL, the non-secure pages will add on 'http://www' to a raw non-secure URL and the secure pages will add on 'https://www' to a raw secure URL. Is this a simple thing to do? Oh, forgot to mention, yes of course the secure pages are already set up with SSL and are already served secure when arriving at them through a site link. I'm just trying to figure out how to force rewrite the URLs, correctly, when the front part of the URL is removed.

g1smd

12:36 am on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, it is relatively simple and is a question that is asked at least several times each month.

There are a lot of threads with example code.

You need several sets of redirects: those that force HTTPS when it is needed, and those that force HTTP when HTTPS is not needed. You MUST have both parts of the code installed.

All of the necessary code uses RewriteRule syntax.

finlander

1:25 am on Nov 28, 2010 (gmt 0)

10+ Year Member



okay, thank you, I have been having trouble finding threads, but I found a mention here of redirecting a http request -- for a secure page -- back to https. This is similar for me? Since I want the request for 'domain.com/login' to redirect to 'https://www.domain.com/login'?

So, if that's the case, then I could use something like this, below, but I need to know what the variable is for "any secure page." Is there a variable for 'any secure page' that would be put in place of 'checkout/payment' so that a truncated request (front-end) for any secure page redirects to [domain.com...]

RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(basket/checkout/payment)$ [%{HTTP_HOST}...] [R=301,L]

I would replace the part in bold with a variable for 'any secure page'?

After I get this part of the rewrites done, then I can work on adding in a redirect of non-secure 'http' pages to go to 'http://www' pages.

finlander

1:36 am on Nov 28, 2010 (gmt 0)

10+ Year Member



This experiment with a single page didn't work. I've been searching this forum for hours, but I don't know where to look for threads that talk about redirecting truncated URLs to https when it's needed, and redirecting truncated URLs to http when https is not needed. It's such a simple thing that is in place on all major commerce sites. Could you point me to any threads you might know about? thanks.

g1smd

9:11 am on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have to list the secure page paths within the rule.

You tell the system what does and does not need redirecting.

[google.com...]
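As a rough sketch of "listing the secure page paths within the rule": there is no single variable that means "any secure page", so the paths go into the pattern as an alternation. The paths and the example.com host below are placeholders chosen for illustration, not taken from a real site.

```apache
# Sketch only: the listed paths and example.com are hypothetical placeholders.
RewriteEngine On

# Any listed path requested on a port other than 443 is redirected
# to the https and www form of the same path in a single hop.
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(basket/checkout/payment|account/login|account/register)$ https://www.example.com/$1 [R=301,L]
```

The captured path is reused as $1 in the target, so adding another secure page means adding one more alternative inside the parentheses.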

finlander

9:35 am on Nov 28, 2010 (gmt 0)

10+ Year Member



(The purpose of this code is to secure pages when URLs are typed in unsecure or when the first part of the URL gets truncated by the shopper in the browser.) This is what I came up with (code below). I run a Zen Cart which serves pages as query_string through the index.php page, so I had to use query_string to identify the pages I want to always be secure. I used some extra code to cover all possibilities for https on those pages, as well as converting all pages all the time to www. Here is an explanation and the code is down below.

    1. example.com/index.php?main_page=login becomes
    [example.com...]
    2. [example.com...] becomes
    [example.com...]
    3. www.example.com/index.php?main_page=login becomes
    [example.com...]
    (this seemingly indestructible URL broke for awhile after I had written some rules that apparently conflicted; when entering a www URL in the browser, for which there was a https rule, the browser was not able to add the http or https to it, and so the page completely broke, and so I had to rewrite the rules to avoid this mistake in the rules)
    4. all non-secure pages get www under all circumstances.
    5. secure pages do not have redundant redirect when form is already correct, i.e., if https is already 'on' and www is already present.


RewriteEngine On
# first, all pages w/o www get www rewrite under all circumstances
RewriteCond %{HTTP_HOST} ^vintage-adventures\.com$ [NC]
RewriteRule ^(.*)$ [%{HTTP_HOST}%{REQUEST_URI}...] [R=301]

# then, certain pages that already have www, but that
# also need to be secure, get the https rewrite
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^www\.vintage-adventures\.com$ [NC]
RewriteCond %{QUERY_STRING} login|logoff|account|checkout|contact|address
RewriteRule ^(.*)$ [%{HTTP_HOST}%{REQUEST_URI}...] [R=301]

# then, certain secure pages currently w/ or w/o https, but that
# are also missing www, get www rewrite, and https rewrite if missing
RewriteCond %{HTTP_HOST} ^vintage-adventures\.com$ [NC]
RewriteCond %{QUERY_STRING} login|logoff|account|checkout|contact|address
RewriteRule ^(.*)$ [%{HTTP_HOST}%{REQUEST_URI}...] [R=301,L]


PS: how do you make posts here w/o URLs becoming links (above)? I experimented with the forum 'code' style, but the resulting text was way too small. And if anyone sees a more efficient way of achieving these rewrites in the Zen php system, please let us know! thank you!

finlander

9:36 am on Nov 28, 2010 (gmt 0)

10+ Year Member



thanks, g1smd, I was posting when you were posting. check out what I did and let me know what you think. I'm not a programmer .. teaching myself this stuff.

g1smd

6:21 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hard-code the domain name in the redirect target for both of the last two rules.

Delete the very first set of rules; you must avoid a double redirect for non-www requests.

For any request, the user must be redirected to the final URL in a single hop, not in a chain.

You don't need to test HTTP_HOST in any of the rules, but you do need to test HTTPS on/off in both rules. However, it is usually more reliable to test SERVER_PORT instead of HTTPS.

The rules are simply:

1. If "these" paths are requested with HTTP, redirect to https and www, irrespective of www or non-www in the original request.

2. If the path requested is NOT "these" and they have been requested with HTTPS, then redirect to http and www, irrespective of www or non-www in the original request.

The final rules should be the non-www to www canonicalisation rules. These apply only if the page has been requested as non-www but with the correct protocol, and will catch any requests that haven't been caught by the previous rules.


By the way, these pages don't "get a http rewrite". This code is not for a rewrite. These are external 301 redirects.

You must also add the [L] flag to every rule.
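The ordering described above could be sketched roughly as follows. This is an illustration under assumptions, not tested code for any particular site: example.com stands in for the real host, the two page names in the list are placeholders, and it assumes the http and https sites share the same .htaccess file.

```apache
# Sketch only: example.com and the (login|checkout) list are placeholders.
RewriteEngine On

# 1. Listed pages requested over http go to https and www in one hop,
#    whether or not the request already had www.
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{QUERY_STRING} main_page=(login|checkout)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# 2. Pages NOT in the list requested over https go to http and www in one hop.
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{QUERY_STRING} !main_page=(login|checkout)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 3. Anything left has the correct protocol but may have the wrong host.
#    Keep the protocol it arrived with and only fix the host name.
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Because every rule carries [R=301,L] and names the final protocol and host in its target, no request should need more than one redirect to reach its canonical URL.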

wilderness

6:28 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PS: how do you make posts here w/o URLs becoming links (above)?


You do not use active domain names or URLs.
The Forum Charter stipulates the requirement to obscure all such names.

Best practice for a domain name is "example.com".

finlander

8:04 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



thank you, g1smd, everything you have explained makes sense and I have followed your instructions precisely. I am close to being done, but I have not been able to locate the canonicalisation rule for https. Canonicalisation for http is the most common thing in the world, but how to write canonicalisation so that [example.com...] becomes [example.com....] If you can point me to this answer, then I will be able to finish and test everything and reply to this thread if everything is working. Thanks again.
Jim

g1smd

8:44 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well I cheated a bit in the previous answer, to simplify it. You can actually do the canonicalisation within the existing two rules. It just requires a bit of extra thought.

Your two rules should each have a RewriteCond for testing the requested path and/or query string. This will contain a 'list' of matching values. One condition will be looking for a positive match, the other a negative match.

Look at the existing rule that redirects to https. There will be a RewriteCond that says to do this if the request was http. Add another [OR]d RewriteCond that tests for non-www requests.

Look at the rule that redirects to http. There will be a RewriteCond that says to do this if the request was https. Add another [OR]d RewriteCond that tests for non-www requests.

The rules will now look like this:

Redirect to https and www if "request matches 'list' AND (request was for http OR non-www)".

Redirect to http and www if "request does NOT match 'list' AND (request was for https OR non-www)".

In this example, 'list' is your list of pages that must be delivered by HTTPS.


The two rules will be mirror images of each other. One will have a RewriteCond that checks for SERVER_PORT 443 and the other for SERVER_PORT NOT 443. Not is a ! in the code. Likewise one will test that the request matches the 'list', and the other that it does NOT match the 'list'.

The test for non-www requests is best done as
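RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
which means "not EXACTLY 'www.example.com' and not blank", so it will also redirect requests with unwanted trailing port numbers. The blank case covers HTTP/1.0 requests that send no Host header, which would otherwise be redirected over and over.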

Check carefully that ALL combinations for 'list' and 'not list', http and https, non-www and www are catered for.
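Put together, the two mirror-image rules might look something like the sketch below. The host name and the short (login|checkout) list are placeholders; a real list would name every page that must be served over HTTPS.

```apache
# Sketch only: example.com and the (login|checkout) list are placeholders.
RewriteEngine On

# Redirect to https and www if the request matches the list
# AND (it arrived over http OR on the wrong host name).
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (login|checkout)
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# Redirect to http and www if the request does NOT match the list
# AND (it arrived over https OR on the wrong host name).
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(login|checkout)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

A quick way to check coverage is to walk through the eight combinations by hand (list or not, http or https, www or non-www) and confirm that each one either matches exactly one rule or is already in its final form.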

finlander

9:12 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



thanks, again, I am working on it now. I am having some trouble with the syntax for saying "NOT any of this pipe-separated list" on the query_string condition. The pipe-separated list that says "ANY of this pipe-separated list" works fine, but I have to find the syntax for "not any of these" pipe-separated items. I'm working on it now.

finlander

9:29 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



okay, I am getting ready to test this. Thanks again g1smd for so much help for a newbie. What I have below is the basic form of two rules that each contain an [OR] statement. Perfect, except that I suspect I will have a problem with the query_string NOT statements in the second rule. I am guessing at the syntax for saying "not any of the items in this list of pipe-separated items" and I'm sure I am guessing wrong. Want to chime in?

RewriteEngine On

RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.zzz-zzz\.com)?$
RewriteCond %{QUERY_STRING} login|logoff|account|checkout|contact|address|time_out
RewriteRule ^(.*)$ [www\.zzz-zzz\.com...] [R=301,L]

RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.zzz-zzz\.com)?$
RewriteCond %{QUERY_STRING} !login|!logoff|!account|!checkout|!contact|!address|!time_out
RewriteRule ^(.*)$ [www\.zzz-zzz\.com...] [R=301,L]

g1smd

9:39 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use example.com to stop forum auto-linking.

The NOT rule pattern is simply:
!(log(in|off)|account|checkout|contact|address|time_out)

but check whether you need to start or end anchor any of it, to make sure it doesn't inadvertently match URLs it should not.

That is, without any anchoring, "account" will also match "accounts", "your-account" and "account-information" etc.

The target URL will be http(s)://www.example.com/$1 - make sure you add the $1 to both.
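If exact matching were wanted, the anchoring could be sketched like this. It assumes the page name always arrives as a main_page=... parameter, as in the URLs quoted earlier in the thread; whether exact matching is actually desirable depends on the page names the cart really uses.

```apache
# Sketch only: assumes the page is always named by a main_page parameter.
# (^|&) requires the parameter to start the query string or follow an "&";
# (&|$) requires the value to end there, so "account" no longer matches
# "accounts" or "account-information".
RewriteCond %{QUERY_STRING} (^|&)main_page=(log(in|off)|account|checkout|contact|address|time_out)(&|$)

# The negated form for the other rule uses the same pattern with a leading "!".
RewriteCond %{QUERY_STRING} !(^|&)main_page=(log(in|off)|account|checkout|contact|address|time_out)(&|$)
```

Each condition belongs in its own rule; they are shown together here only for comparison.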

finlander

9:45 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



ok, cool, I think it's okay, actually good, that 'account' matches all variations, as all variations are also supposed to be secure. Likewise w/ the other pages.

I will adjust the NOT statement you supplied and test it. Earlier today I had some funky thing happening with secure pages serving everything except images. It was caused I'm sure by my wrong syntax in that NOT rule. I'll be back in a little while w/ report of results!
Jim

g1smd

10:00 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have to avoid the https pages linking to images and scripts using http:... in the src or href.

Links to images should be src="/path/image.ext" and not src="protocol://domain/path/image.ext".

The latter will cause "mixed content" security warnings.

The only way you can have the protocol in the link is if your pages are dynamically generated and you can detect the protocol requested and adjust the protocol in the links on the fly to match that.

finlander

10:07 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



re: images, yeah I've seen that also, that shortest path references rather than full URL is best for images. Thanks for explaining the 'why' of it!

OK, here is the final code, and it works perfectly in all instances where it is supposed to work. Thanks again g1! I could not have done it without you. It seems simple now, but was quite daunting at first. Have a nice Sunday.


RewriteEngine On

RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

g1smd

10:18 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's one final thing to add to the code: clear concise comments as to what each code block does:

# Redirect specific http OR non-www requests to https AND www.

# Redirect all other https OR non-www requests to http AND www.


You need a "reminder" for when you look at the code in several years time and have no idea what it all does.


Mod_rewrite code is incredibly powerful, and is very compact to the point of being obtuse. Once you're clear on the differences between URLs "used out on the web" and filepaths "used inside the server", and between rewrites and redirects, there are some incredibly cool things you can get the server to do. The http/https auto-fix is but one of them.


The best part of this thread is that because YOU did all the work YOU now understand what it does and how it works. This means YOU have a fighting chance of being able to maintain the code should you need to alter it in some way.

Hope you can now also appreciate why the "please tell me the code so I can cut and paste it, I haven't got time to learn this" type of threads are of no use to anyone except the original poster and are not encouraged here. There's not enough forum respondents to offer a global free code-writing service; however this thread is useful for every person with a similar problem. It will cover many of the same thought processes that other potential posters will have.

finlander

11:20 pm on Nov 28, 2010 (gmt 0)

10+ Year Member



Good thoughts, all, g1. Thanks again. Yes, the comments (#) are a good idea even for just me (and anybody), as I will forget things I don't see everyday. Below is a possible final iteration of our work. Feel free to adjust as necessary -- I'm not a coder and don't know all the conventions. I also had to add some code at the beginning to prevent the https areas of the admin area of Zen Cart from being interrupted/fouled by our code -- this was a better solution I think than putting our code in a dozen directories under the 'includes' (shopping cart) directory. I can specify redirects for the admin area in the .htaccess of that directory, if necessary.

RewriteEngine On

# Do not apply following rules to admin area of Zen Cart.
RewriteRule ^(zc_admin) - [L]

# Redirect specific http OR non-www requests to https AND www.
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

# Redirect all other https OR non-www requests to http AND www.
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

Jim

finlander

6:26 am on Nov 29, 2010 (gmt 0)

10+ Year Member



Well, it was too good to be true. The [OR] statement in our second rule, which adds redirect to http and/or adds www (for https requests that don't need to be https or valid http requests that are missing www), is causing the SSL padlock to appear and disappear quickly for all https pages, including when the page does not seemingly take part in any redirection. I've narrowed it down and it's definitely the [OR] statement causing the padlock to disappear. I'll try to write that final condition as a separate rule ... see if separating it, from the SERVER_PORT = 443 condition, will bring back the padlock and still allow all conditions and redirects.

RewriteEngine On

# do not apply following rules
# to admin area of Zen Cart
RewriteRule ^(zc_admin) - [L]

# redirect to https (port 443) and/or add www, when needed,
# for all secure pages in RewriteCond list
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

# redirect to http (port 80) and/or add www, when needed,
# for all pages other than those in RewriteCond list
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

finlander

7:05 am on Nov 29, 2010 (gmt 0)

10+ Year Member



ah ha! (hopefully)
We (I) should have caught this earlier.

The condition for the query_string in the second rule, to NOT contain certain words, has to say 'does NOT contain ALL of these words' but we earlier wrote it as 'does not contain this word OR this word OR this word' so that condition was ringing true when it needs to be false when request is for valid https page. (For example, it's true that the login page does not contain the word 'account' but we don't want the condition to be true because of that; we want the condition to be false when dealing with ANY of those secure pages.) I'll make that change and see what happens.

this is wrong way; I need to make it 'not' any of these pages:
RewriteCond %{QUERY_STRING} !(log(in|off)|account|checkout|contact|address|time_out)

will post soon with results.

Jim

finlander

8:43 am on Nov 29, 2010 (gmt 0)

10+ Year Member



I'm sorry, I can't figure it out. Even when just testing on the login page, with only !login as the third Cond of the second Rule, the padlock still disappears. If I delete the first Cond of the second Rule, then all is well with the padlock, but with the loss of the one goal of changing https to http for pages that don't need https. There is something about everything that is happening with the second rule that bungles things up. I think I give up .. don't know what's going wrong so don't know how to fix it.

finlander

8:53 am on Nov 29, 2010 (gmt 0)

10+ Year Member



for what it's worth, here is a new statement of the goal:

Certain secure pages get https and www if they are missing the 's' or are missing the www or both.

All other pages get http and www if they happen to have an 's' or are missing the www or both.

Padlock in browser bar for any page with https.

g1smd

8:59 am on Nov 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the code
"NOT (this or that)"
is wrong.

You actually need:

NOT this and 
NOT that


so you'll need a RewriteCond for each item, in that rule only.

The "OR" pattern in the other rule is fine.

I wasn't thinking straight.
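For comparison, the two spellings can be put side by side. By De Morgan's law, "NOT (this OR that)" and "NOT this AND NOT that" select the same requests, and a leading ! in a RewriteCond negates the match of the whole pattern, so the single-line form should already mean "none of these". Splitting it into one condition per item mainly makes the intent easier to read. The short list and the example.com target are placeholders.

```apache
# Sketch only: the two alternatives are meant to be equivalent; use one, not both.

# Alternative A: one negated alternation ("matches none of these").
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{QUERY_STRING} !(login|account|checkout)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Alternative B: one negated condition per item, joined by the implicit AND.
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{QUERY_STRING} !login
RewriteCond %{QUERY_STRING} !account
RewriteCond %{QUERY_STRING} !checkout
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If both alternatives behave the same on a test server, the cause of a remaining problem is probably somewhere other than the negated list.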

finlander

9:11 am on Nov 29, 2010 (gmt 0)

10+ Year Member



thanks, g1smd, yes, I went ahead and tested the login page with only the single !login in that Cond, just to see if that was actually the problem, but the padlock still disappeared, so the padlock disappearing seems to still be related to the combination of Cond 1 and Cond 2 in that second Rule. It seems to be something about 443 being true in that second Rule, even though the second Rule should not even get used when testing the login page and www is present. Baffling.

finlander

9:23 am on Nov 29, 2010 (gmt 0)

10+ Year Member



just to be thorough and test it, I made the change for the NOTs in second Rule, but padlock still disappears on bona fide secure pages.

#
RewriteEngine On
#
# do not apply following rules
# to admin area of Zen Cart
RewriteRule ^(zc_admin) - [L]
#
# redirect to https (port 443) and/or add www, when needed,
# for all secure pages in QUERY_STRING list
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]
#
# redirect to http (port 80) and/or add www, when needed,
# for all pages other than those in QUERY_STRING list
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !login
RewriteCond %{QUERY_STRING} !logoff
RewriteCond %{QUERY_STRING} !account
RewriteCond %{QUERY_STRING} !checkout
RewriteCond %{QUERY_STRING} !contact
RewriteCond %{QUERY_STRING} !address
RewriteCond %{QUERY_STRING} !time_out
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]
#

g1smd

9:27 am on Nov 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Remove the "OR non-www" and see if it works for some requests without it.

finlander

9:38 am on Nov 29, 2010 (gmt 0)

10+ Year Member



will do ... I'll let you know.

here's a side thought:

Does Apache know that when you use [OR] between Cond1 and Cond2 (with Cond3 present), that you are 'really' saying Cond1 and Cond3 have to both be true 'or' Cond2 and Cond3 have to both be true?

Or, does Apache think you are saying Cond1, by itself, is all that needs to be true 'or' Cond2 and Cond3 together have to both be true?

If Apache thinks the latter, then in our second Rule, the [OR] after Cond1 makes Cond1 satisfy the entire rule, and creates a conflict since the login page was already satisfied in Rule 1.
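On the side question, as far as the mod_rewrite documentation describes it, [OR] joins a condition only to the one that follows it, and that pair is then ANDed with the remaining conditions. In other words the first reading applies: (Cond1 OR Cond2) AND Cond3. A commented sketch, with example.com and the short list as placeholders:

```apache
# Sketch only: read the three conditions as
#   (port is 443  OR  host is not www.example.com)
#   AND  query string contains none of the listed words.
RewriteCond %{SERVER_PORT} ^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !(login|checkout)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

So a request for the login page on port 443 satisfies the first condition but fails the third, and the rule as a whole does not fire for it.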

finlander

9:46 am on Nov 29, 2010 (gmt 0)

10+ Year Member



As expected, I removed [OR] non-www in second rule, and the faq page will change https to http, but will not add in www. So that's as expected, however, the padlock still disappeared on login page. It really seems to be something about that first Cond in second Rule monkeying with something. I'm going to break http(s) rules apart from www rules, now that we addressed the multiple NOT issue, and see what happens. I'll post a revision and results in a minute.

finlander

10:00 am on Nov 29, 2010 (gmt 0)

10+ Year Member



I broke Rule 2 apart (code below) and tested and it is exactly the same as when Rule 2 was combined -- total functionality but padlock disappears. I guess that also answers my question about how Apache interprets [OR] prior to two or more Condition lines ... it apparently does just fine in a combined format because I saw no difference in a separated format that would have otherwise occurred.

So, padlock problem still seems to be related to Cond1 in Rule2, even though evaluation of Rule2 should not apply to login page.

RewriteEngine On
#
# do not apply following rules
# to admin area of Zen Cart
RewriteRule ^(zc_admin) - [L]
#
# redirect to https (port 443) and/or add www, when needed,
# for all secure pages in QUERY_STRING list
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} (log(in|off)|account|checkout|contact|address|time_out)
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]
#
# redirect to http (port 80), when needed,
# for all pages other than those in QUERY_STRING list
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{QUERY_STRING} !login
RewriteCond %{QUERY_STRING} !logoff
RewriteCond %{QUERY_STRING} !account
RewriteCond %{QUERY_STRING} !checkout
RewriteCond %{QUERY_STRING} !contact
RewriteCond %{QUERY_STRING} !address
RewriteCond %{QUERY_STRING} !time_out
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]

# add in www, when needed,
# for all pages other than those in QUERY_STRING list
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{QUERY_STRING} !login
RewriteCond %{QUERY_STRING} !logoff
RewriteCond %{QUERY_STRING} !account
RewriteCond %{QUERY_STRING} !checkout
RewriteCond %{QUERY_STRING} !contact
RewriteCond %{QUERY_STRING} !address
RewriteCond %{QUERY_STRING} !time_out
RewriteRule ^(.*)$ [www\.example\.com...] [R=301,L]
#
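One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: the rule that redirects to http has no condition on the type of file requested, so an image, stylesheet or script fetched over https by a secure page also satisfies it (port 443, none of the listed words in its query string) and would be redirected to http. A hypothetical guard placed before the redirect rules would leave such requests alone; the extension list is a guess and would need adjusting.

```apache
# Hypothetical guard, to be placed before the redirect rules.
# The extension list is an assumption and would need adjusting for a real site.
# Requests for static files stop here and are never redirected by later rules.
RewriteRule \.(gif|jpe?g|png|css|js|ico)$ - [L]
```

The trade-off is that static files requested on the non-www host would no longer be canonicalised either, which may or may not matter for a given site.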