force https on one page only

         

garibaldibiscuit

10:23 am on May 4, 2010 (gmt 0)

10+ Year Member



I am having trouble forcing the use of SSL (or not) when entering (or leaving) a checkout page.

I think part of the problem is that all my links are relative throughout the site and are also already being rewritten to something else.

For example, I already have rules like:

RewriteRule ^products/([^/\.]+)/([^/\.]+)/?$ index.php?mc=$1&sc=$2 [L]
RewriteRule ^basket/checkout/delivery?$ checkout_delivery.php [L]
RewriteRule ^basket/checkout/summary?$ checkout_summary.php [L]
RewriteRule ^basket/checkout/payment?$ checkout_payment.php [L]

This means if the checkout_summary.php page form posts to https checkout_payment.php then all other links in the page are now https. So I want users to click a page that might have an https link but be forced back to http if it is not the checkout_payment.php page as this is the only page I want to be https.

I have tried things I have found on the web like:

RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^basket/checkout/payment?.*$
RewriteRule ^(.*)$ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

but this ends up as something that doesn't redirect properly. I have tried putting these rules before the first ones, but that doesn't work either. Does anyone have any ideas where I am going wrong?

[edited by: jdMorgan at 12:38 pm (utc) on May 4, 2010]
[edit reason] Corrected title as requested. [/edit]

jdMorgan

1:09 pm on May 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> So I want users to click a page that might have an https link but be forced back to http if it is not the checkout_payment.php page

This is the basic problem, and you should not rely on mod_rewrite to try to "fix it" -- By the time your redirect executes, the "wrong" protocol has already been requested, and the current HTTP request must be terminated with a server redirect response telling the client to ask again at the corrected (SSL) URL for what it wanted. The pages which have links from HTTP to HTTPS, or which have links from HTTPS to HTTP, should be coded so that those links are always correct. This can easily be done by using complete and canonical links such as <a href="https://www.example.com/basket/checkout/payment"> on the non-SSL pages, for example.

Another problem is the repeated use of the "?" in the patterns above, which indirectly creates duplicate content. For example, in the second rule posted above, the "?" makes the final "y" optional, so the effect is to rewrite requests for either "/basket/checkout/delivery" or "/basket/checkout/deliver" to the script at /checkout_delivery.php. This means that the same content will be delivered with a 200-OK response for either requested URL-path -- duplicate content.

This may not have been intentional, but if not, then my advice is to be very careful with mod_rewrite and with the regular-expression patterns in mod_rewrite code -- A tiny error in the code or in a pattern can have unforeseen effects -- including ranking problems in the search engines, as in this case.
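
To make the effect of the trailing "?" concrete, here is a small sketch based on the delivery rule from the question. The guess that an optional trailing slash was what the rule was meant to allow is an assumption; the original post does not say.

```apache
# As posted: "y?" makes the final letter optional, so this pattern matches
# both "basket/checkout/delivery" and "basket/checkout/deliver".
# RewriteRule ^basket/checkout/delivery?$ checkout_delivery.php [L]
#
# Exact match only: one URL-path maps to one script.
RewriteRule ^basket/checkout/delivery$ checkout_delivery.php [L]
#
# If an optional trailing slash was intended (assumption), the "?" has to
# follow a slash. This still serves one page at two URLs, so redirecting
# the slashed form to the unslashed one is usually the safer choice.
# RewriteRule ^basket/checkout/delivery/?$ checkout_delivery.php [L]
```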

The general solution to prevent problems caused by direct-type-in errors, incorrect bookmarks, and incorrect links on third-party Web sites would look something like this on your site. Note that the redirects must come first.

# Redirect HTTPS requests for non-SSL pages back to HTTP. (Note that shared objects
# such as images on both HTTP and HTTPS pages are excluded from this rule)
RewriteCond %{SERVER_PORT} =443
RewriteCond $1 !^basket/checkout/payment
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
#
# Redirect HTTP requests for SSL checkout page to HTTPS
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(basket/checkout/payment)$ https://%{HTTP_HOST}/$1 [R=301,L]
#
# Redirect extensionless page requests to remove trailing slash
RewriteRule ^(products/[^/.]+/[^/.]+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
RewriteRule ^(basket/checkout/(delivery|summary|payment))/$ http://%{HTTP_HOST}/$1 [R=301,L]
#
#
# Internally rewrite extensionless page requests to scripts
RewriteRule ^products/([^/.]+)/([^/.]+)$ index.php?mc=$1&sc=$2 [L]
RewriteRule ^basket/checkout/(delivery|summary|payment)$ checkout_$1.php [L]

I assume that you are using this code to support multiple domains, because you are using the HTTP_HOST variable in the RewriteRule substitutions instead of hard-coding the domain, and because there is no domain-canonicalization redirect rule in evidence here. Do be sure to enforce proper domain canonicalization by including that function in each of the redirect rules above, again in order to avoid duplicate content problems.
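
As a rough illustration of a domain-canonicalization redirect, the sketch below assumes that www.example.com is the single canonical hostname; that name is a placeholder, not something taken from this thread. The conditions are anchored at both ends, so variants with a trailing dot or an appended port number fail the match and get redirected, while a blank Host header is let through to avoid a redirect loop.

```apache
# Assumption: www.example.com is the canonical hostname (placeholder).
#
# Redirect non-canonical hostnames requested over HTTP
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Redirect non-canonical hostnames requested over HTTPS
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT} =443
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

Folding canonicalization into the other redirects would then mean writing www.example.com in place of %{HTTP_HOST} in their substitutions, so that a single redirect corrects protocol and hostname together.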

Jim

garibaldibiscuit

10:00 am on May 6, 2010 (gmt 0)

10+ Year Member



Firstly, thank you very much for your answer.
You have brought up some very interesting points that I didn't realise were problems before.

My reason for using relative links on the web pages and in the htaccess is so I can develop the site on my local computer without every link taking me to the live site.
I have a variable at the beginning of every URL which, depending on the HTTP_HOST value, is one of:
'/site123' - for my local server
'/~admin17' - for my live temp site
'/' - for the live production site
So it was actually very easy to just add a $_SERVER['HTTP_HOST'] var before each of those variable definitions, thereby creating absolute links and solving my http/https problem.

I kind of had your code working until I tested adding an 's' to 'http' on a non https page and it forwarded me back to the root as it should. But as I was working on the live temp site in subdirectory /~admin17/ I have tried to alter your code making it:

# Redirect HTTPS requests for non-SSL pages back to HTTP.
RewriteCond %{SERVER_PORT} =443
RewriteCond $1 !^basket/checkout/payment
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/~admin17/$1 [R=301,L]

# Redirect HTTP requests for SSL checkout page to HTTPS
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(basket/checkout/payment)$ https://%{HTTP_HOST}/~admin17/$1 [R=301,L]

# Redirect extensionless page requests to remove trailing slash
RewriteRule ^(products/[^/.]+/[^/.]+)/$ http://%{HTTP_HOST}/~admin17/$1 [R=301,L]
RewriteRule ^(basket/checkout/(delivery|summary|payment))/$ http://%{HTTP_HOST}/~admin17/$1 [R=301,L]

# Internally rewrite extensionless page requests to scripts
RewriteRule ^basket/checkout/(delivery|summary|payment)$ checkout_$1.php [L]


But for some reason now the /~admin17/basket/checkout/payment url displays as /~admin17/checkout_payment.php.
Can you see a reason why this is the case?


Assuming the above code would work, is there a way I can define a variable once in the htaccess and just use that instead of retyping /~admin17 or /site123 or nothing?

BTW, is it really necessary to create absolute URLs (domain-canonicalization - a new term for me) within the htaccess file?

jdMorgan

2:54 pm on May 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) Consider using the RewriteBase directive once at the top of your file to define "/site123", "/~admin17", or "/". See the Apache mod_rewrite documentation for details.
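
A minimal sketch of item 1, assuming the .htaccess file sits in the /~admin17/ directory described earlier in the thread. RewriteBase only affects relative substitutions in per-directory rules, so the absolute http:// and https:// redirect targets still need the prefix written out. The extra checkout_payment\.php alternative in the second condition is an assumption, not something confirmed in this thread: after the internal rewrite, the .htaccess rules run again against checkout_payment.php, and without that exclusion the first rule would likely redirect the HTTPS request to the script's own URL over HTTP, which would match the symptom reported above.

```apache
RewriteEngine On
# Assumption: change this one line per environment (/site123/, /~admin17/, or /)
RewriteBase /~admin17/
#
# Redirect HTTPS requests for non-SSL pages back to HTTP.
# The script name is excluded too, so the rule does not fire on the
# second pass that follows the internal rewrite to checkout_payment.php.
RewriteCond %{SERVER_PORT} =443
RewriteCond $1 !^(basket/checkout/payment|checkout_payment\.php)
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/~admin17/$1 [R=301,L]
#
# Relative substitution: resolved against the RewriteBase above
RewriteRule ^basket/checkout/(delivery|summary|payment)$ checkout_$1.php [L]
```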

2) To support different hostnames, you can define and use a variable:

# Declare hostname (one of three, comment-out any two of these)
# RewriteRule ^ - [E=HostName:www.example.com]
RewriteRule ^ - [E=HostName:localhost]
# RewriteRule ^ - [E=HostName:test-domain.com]
#
# Do some redirect using variable hostname defined above
RewriteRule ^foo\.html$ http://%{ENV:HostName}/bar.html [R=301,L]

Do beware of casing on your variable name. If it's referenced with incorrect case, it may not work (it will return a blank value).

3) If you do not enforce domain canonicalization, then you allow duplicate content to be created -- either due to accidental incorrect links created by you or others, or by incorrect links created maliciously by your competitors. While you may find the phrase "duplicate content penalty" all over the Web, the reality is that multiple URLs leading to the same content simply compete with each other for links, ranking, and traffic.

For example, let's take the 'home page' of a typical PHP site. On many servers it is possible to access it using any of these:
example.com/
example.com./
example.com:80/
example.com.:80/
example.com/index.php
example.com./index.php
example.com:80/index.php
example.com.:80/index.php
www.example.com./
www.example.com:80/
www.example.com.:80/
www.example.com/index.php
www.example.com./index.php
www.example.com:80/index.php
www.example.com.:80/index.php

This home page URL starts life with fifteen ready-made and equally-relevant competitors -- Not to mention its actual business competition!
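
The index-page half of that list can be collapsed with a rule along the following lines. This is only a sketch: www.example.com and index.php are placeholder assumptions, and the rule would sit with the other external redirects, ahead of the internal rewrites.

```apache
# Assumptions: www.example.com is the canonical hostname and index.php is
# the directory index; both are placeholders.
#
# Redirect direct client requests for index.php to the bare directory URL.
# THE_REQUEST holds the original request line sent by the client, so
# internal rewrites to index.php (such as the products rule) do not
# trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
```

Any query string on the original request is carried over to the redirect target by default, so this rule on its own does nothing about bogus query strings.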

Add in the possibility that arbitrary and bogus query strings could be appended to any or all of those variations, and the number of possible URLs becomes practically infinite.

Remember that search engines index and list URLs -- not sites, not domains, not pages, not files. Only URLs. Except for allowable casing variations in the domain name, even a single character different makes it a different URL.

To clarify, hostnames are case-insensitive, but everything else is case-sensitive. While Microsoft servers are internally case insensitive, Apache servers and search engines are not.

Take control now, and run a very tight ship; allowing even small leaks can cost you money. I do not even define the DNS for a new site until all protocol, domain, subdomain, FQDN, port number, URL, index-page, and query string canonicalization is already in place... From day one, all non-canonical requests are 301-redirected to the canonical URL. Saves a lot of headaches and "repair work."

Jim