Rewrite to redirect HTTPS to HTTP

Problem trying to negate a regexp

     
8:57 pm on Mar 26, 2015 (gmt 0)

New User

joined:Mar 26, 2015
posts: 4
votes: 0


I have a surprisingly simple problem (as in, it should be simple, but it's a problem!). I want to ensure every request for a page in the /admin, /user or /system directories is redirected to HTTPS if it arrives over HTTP:

# Now redirect HTTP pages we want protected to SSL...
RewriteCond %{HTTPS} off
RewriteRule ^(admin/|user/|system/) https://%{HTTP_HOST}%{REQUEST_URI} [QSA,R=301,L]

This seems to work fine. The problem comes when I try to add a rule for the opposite - i.e. any path NOT starting with one of those should get sent back to HTTP if the request was HTTPS:

# ... and return any others to HTTP
RewriteCond %{HTTPS} on
RewriteRule ^!(admin/|user/|system/) http://%{HTTP_HOST}%{REQUEST_URI} [QSA,R=301,L]

This negation doesn't work. Other pages requested with HTTPS stay there.

I don't have a lot of experience negating regexps - am I missing something obvious? Does anyone know why this doesn't seem to work, or can anyone suggest an alternative?
10:19 pm on Mar 31, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11867
votes: 244


welcome to WebmasterWorld, cbfuk!


RewriteCond %{HTTPS} on
RewriteRule ^!(admin/|user/|system/) http://%{HTTP_HOST}%{REQUEST_URI} [QSA,R=301,L]


the Pattern of a RewriteRule is used to match the request, but to exclude a request it is simpler and safer to use RewriteCond.
the [QSA] flag is unnecessary unless you are specifying a query string in your RewriteRule Target.
you should specify the canonical hostname in the RewriteRule Target instead of %{HTTP_HOST}.
also note that the HTTPS apache variable historically depended on mod_ssl being active (mod_rewrite in Apache 2.4 reports it whether or not mod_ssl is loaded), so on an older server it might be more robust in that second ruleset to test SERVER_PORT.
RewriteCond %{SERVER_PORT} !=80
RewriteCond %{REQUEST_URI} !^/(admin|user|system)/
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


what are you doing about a general hostname canonicalization redirect ruleset?
1:49 am on Apr 10, 2015 (gmt 0)

New User

joined:Mar 26, 2015
posts: 4
votes: 0


@phranque - Hey many thanks, I'd almost given up on anyone answering. Fantastic, I'll try it that way - so what you're saying is that negation within a RewriteRule match just doesn't work? That seems a bit odd, but I'll have a think about it over a coffee - I'm sure there must be a good logical reason for separating negation into RewriteConds, I just haven't realised what it is!

Yeh, I run the server so %{HTTPS} is OK.
I use %{HTTP_HOST} just to save typing - the same basic .htaccess file is copied when creating a new vhost so that saves editing.

what are you doing about a general hostname canonicalization redirect ruleset?

Yep, earlier rules enforce canonical redirects to www.example.com for base domain requests as well as handling any domain aliases, so by this stage I know what HTTP_HOST should be.
3:06 am on Apr 10, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15928
votes: 884


what you're saying is that negation within a RewriteRule match just doesn't work?

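It isn't that negation in the body of a rule can't work. It's that it very seldom does what you want. For a start, the ! only negates when it is the very first character of the pattern; in ^!(admin/|user/|system/) it comes after the anchor, so it is a literal exclamation mark and the rule only matches paths that actually begin with "!admin/" and the like. That's why nothing happened. But even written correctly,
RewriteRule !^(admin|user|system)/
doesn't mean "a request for any pages outside these directories"; it means "a request for any content at all that doesn't match this pattern". If, on your site, all https content is in these directories, and all non-https content is elsewhere, it might work ... except that your RewriteRule presumably involves a capture. It is not possible to capture a negative.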
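
To make the capture problem concrete, here is a minimal sketch of the two ways the "back to HTTP" rule could be written. It assumes an .htaccess (per-directory) context and www.example.com as the canonical host, as in the example earlier in the thread; adjust both for a real site. The two forms are alternatives, so use one or the other, not both.

```apache
# Form 1: negated Pattern. The "!" must come before the "^".
# A negated Pattern captures nothing, so $1 cannot be used and the
# path has to be taken from %{REQUEST_URI} (which includes the
# leading slash).
RewriteCond %{HTTPS} on
RewriteRule !^(admin|user|system)/ http://www.example.com%{REQUEST_URI} [R=301,L]

# Form 2: the exclusion moves into a RewriteCond, which leaves the
# Pattern free to capture the path for the target.
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^/(admin|user|system)/
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Note the leading slash in the RewriteCond test: in .htaccess the RewriteRule Pattern sees the path without it, while %{REQUEST_URI} keeps it.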

Yep, earlier rules enforce canonical redirects
Why earlier? Domain name canonicalization is generally your very last external redirect. Sometimes it can be combined with an http/https redirect, but only if your whole site uses the same protocol.

Get rid of those {REQUEST_URI} and {HTTP_HOST} elements in the target. The host at this point is not whatever the user happened to type in; it's one and only one acceptable form. (That's why your domain-name canonicalization redirect typically comes last. It's only for requests that have not already been picked up in the course of other rules.) And there's no reason for {REQUEST_URI} when you can just capture.
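
As an illustration of that advice, the original HTTP-to-HTTPS rule could be rewritten roughly as follows. This is only a sketch: it assumes www.example.com is the canonical host and that the rule lives in .htaccess, where the Pattern sees the path without its leading slash.

```apache
# Original form, for comparison:
#   RewriteRule ^(admin/|user/|system/) https://%{HTTP_HOST}%{REQUEST_URI} [QSA,R=301,L]

# The outer group captures the whole requested path; the inner (?:...)
# group only selects the directory and captures nothing itself.
# The target names the one acceptable host instead of %{HTTP_HOST}.
# The query string is carried over automatically because the target
# does not contain one, so [QSA] is not needed.
RewriteCond %{HTTPS} off
RewriteRule ^((?:admin|user|system)/.*) https://www.example.com/$1 [R=301,L]
```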


I'd almost given up on anyone answering

If someone happens to put up a new post just as the likeliest responders are refreshing their Unread Posts window, it might happen that the brand-new post falls on the "already read" side and thus goes unnoticed. I've got a lurking suspicion that phranque (specifically) comes through now and then and does an Unanswered Messages request, which is everyone's second chance ;)
6:19 am on Apr 10, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11867
votes: 244


I use %{HTTP_HOST} just to save typing - the same basic .htaccess file is copied when creating a new vhost so that saves editing.

%{HTTP_HOST} is whatever was requested which isn't necessarily the canonical hostname.

Get rid of those {REQUEST_URI} and {HTTP_HOST} elements in the target.

what she said...
2:55 am on Apr 12, 2015 (gmt 0)

New User

joined:Mar 26, 2015
posts: 4
votes: 0


%{HTTP_HOST} is whatever was requested which isn't necessarily the canonical hostname.

Yep, understood - which is why I do my canonicalisation (is that a word?!) early on, so that later I can assume HTTP_HOST on anything that gets through is what I expect - as I say, it saves detailed editing deeper inside .htaccess files. Or am I missing something important there?
3:03 am on Apr 12, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11867
votes: 244


I do my canonicalisation (is that a word?!) early on so that later I can assume HTTP_HOST on anything that gets through is what I expect

explain "early on".
it sounds like you are okay with chained redirects.
ideally you should redirect the first request to the canonical url instead of redirecting the first request to the canonical hostname and then possibly fixing other url issues in subsequent response(s).
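
One way to arrange the rules so that a single response fixes everything might look like the sketch below. It assumes www.example.com is the canonical host and an .htaccess context; the split into an HTTPS and an HTTP variant of the final rule is an illustrative choice for a site that mixes protocols, not something prescribed in this thread.

```apache
# Protocol redirects go here, ahead of everything below, and each one
# names www.example.com in its target. A request that has both the
# wrong host and the wrong protocol is then fixed in a single hop.

# Host canonicalization comes last. It only sees requests that the
# protocol rules did not redirect, so the protocol is already correct
# and is simply preserved.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{HTTPS} on
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{HTTPS} off
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The host test is written so that an empty Host header (as sent by some HTTP/1.0 clients) does not trigger a redirect.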
6:35 am on Apr 12, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15928
votes: 884


Or am I missing something important there?

Only the fact that this will lead to multiple (i.e. more than one) redirects. Search engines absolutely hate this-- and so do humans with slowish connections. Satellite, for example, which people only use if nothing else is available where they live.

Besides, most redirects-- including the ones you were asking about-- are site-specific. So they'll have to be typed in one at a time anyway. Redirects that are the same on all domains can pretty well be counted on the fingers of one hand.

Changing HTTP_HOST to www.example.com should not be time-consuming. It's a single global replace. I recommend something like
http://%{HTTP_HOST}/
>>
http://www.example.com/
(a form that would only occur in a RewriteRule target) so you don't accidentally change any RewriteConds at the same time.
2:04 am on Apr 13, 2015 (gmt 0)

New User

joined:Mar 26, 2015
posts: 4
votes: 0


Thanks for that, I see where you mean about the multiple redirects. I'll review for that. Cheers guys.