Forum Moderators: phranque


Assistance with a Rewrite for Specific Directory

         

RandallK

4:21 pm on Oct 7, 2010 (gmt 0)

10+ Year Member



My website is www.example.com, but my SSL certificate is for example.com.

Normally, I'd use something like the following to make sure everyone is accessing the site through www:

RewriteEngine On
# Send requests for the bare domain to the www hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]


My shopping cart is the only place where the SSL cert is used, and all the URLs one takes during the checkout process contain: /cgi-bin/sc/

Can I make some sort of rewrite rule that will rewrite all URLs unless they contain /cgi-bin/sc/?

Thanks in advance for any advice.

jdMorgan

5:03 pm on Oct 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it "contain" or "start with" /cgi-bin/sc/?

When possible, be precise: This can make a big difference in the efficiency of the rules.
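The difference being asked about can be shown with a minimal sketch built on the rule from the original question. It uses example.com as the placeholder hostname, and the rule set is only an illustration, not something posted in the thread. An anchored pattern can be rejected as soon as the first characters of the path differ. An unanchored one has to be tried at every position in the path, and it also matches the string anywhere, not just at the start:

```apache
RewriteEngine On

# "Start with": anchored at the beginning of the URL-path, so the
# match fails fast for any path not beginning with /cgi-bin/sc/
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteCond %{REQUEST_URI} !^/cgi-bin/sc/
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# "Contain": unanchored, so the regex is tried at every position and
# would also exempt a path such as /shop/cgi-bin/sc/page.cgi
#RewriteCond %{REQUEST_URI} !/cgi-bin/sc/
```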

Jim

[edited by: jdMorgan at 3:16 pm (utc) on Oct 11, 2010]

sublime1

6:49 pm on Oct 7, 2010 (gmt 0)

10+ Year Member



Ooh, that's not pretty. I once had some luck getting Verisign to switch a cert for a customer who had this issue. Granted, I had to reconfigure the server with the new cert, and it was a pain. Another option would be to get a wildcard cert, though that costs more money, of course.

Having two variants of your URLs is bound to cause confusion. You really don't want two separate URL structures on your site: unless you're scrupulous, you'll end up needlessly redirecting users, loading your servers, and angering the gods of Google.

Another option would be to go the other way and canonicalize on the non-www variant of the domain name.

I think consistency is the more important objective.

And if none of this makes you reconsider, and assuming you mean "start with", not "contain", one way would be to write two rewrites. The first says "if the path doesn't start with /cgi-bin/sc/ and the host is example.com (two RewriteCond statements), redirect to www with a 301." The second says the opposite: "if the path does start with /cgi-bin/sc/ and the host is www.example.com, redirect to example.com with a 301."

But even just writing this gives me a queasy feeling :-)

Tom

RandallK

10:43 pm on Oct 8, 2010 (gmt 0)

10+ Year Member



A little backstory (skip down if you just want to help with the code):
Yes, the SSL cert WAS supposed to have www. There was a misunderstanding and it was purchased without the www. A year went by, and it was time to renew. I spoke to the hosting company that installed it to see if they would help me transition it to 'www'.

I was told that "it takes up to 5 business days" to install the cert. They wouldn't give me a more exact time frame, or contact me before they made the switch so I could change where my resources were coming from (www vs. non-www). So I couldn't risk having an unsecured cart if they did it at a time I wasn't sitting in front of the computer. Quite frustrating. In the interest of not losing customers, I kept the same name on the cert for another year.

I am already making heavy use of the canonical link element, but some pages still slip through the cracks, and as much as Google SAYS they respect it... it seems to take some time for things to straighten themselves out.

Back to the problem at hand.

There are in fact only two pages on the whole site that use SSL. Step One, the actual cart, is still served over http; it only switches to https once you hit the 'checkout now' button.

Step Two is entering billing info at:
[example.com...]

Step Three is reviewing the order at:
[example.com...]

And Step Four is on the same page, where the order is actually processed:
[example.com...]

I've tried to be as exact as possible. Thank you very much for your assistance.

sublime1

1:35 am on Oct 9, 2010 (gmt 0)

10+ Year Member



I feel your pain.

So could it be as simple as this? No internal rewriting, no messing with query strings, just these cases you note?


# normal canonicalization for non-cart cases
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteCond %{REQUEST_URI} !^/cgi-bin/sc/(billing|thankyou)\.cgi
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Special case anti-www-canonicalization for cart
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteCond %{REQUEST_URI} ^/cgi-bin/sc/(billing|thankyou)\.cgi
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]


My assumptions:
  • The second RewriteCond will match thankyou.cgi with and without parameters. I am sure there's a more efficient way to get at everything before the query string (maybe %{PATH_INFO}), in which case you could use the "=" operator. The parameters, if any, should get passed along by the actual RewriteRule without further effort.
  • I think it's correct, regardless of .htaccess or server-config context, that the REQUEST_URI or PATH_INFO values will start with a /, but I could be wrong.
  • I am assuming there are some other cases than billing.cgi and thankyou.cgi that start with /cgi-bin/sc/ -- if not, then the pattern could be simplified.
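The "=" idea in the first assumption can be sketched like this. In mod_rewrite, %{REQUEST_URI} holds only the URL-path and excludes the query string. Because of that, the two cart scripts can be tested with exact string comparisons (the "=" CondPattern prefix, negated as "!=") instead of a regex. This sketch assumes billing.cgi and thankyou.cgi are the only SSL pages, as described above:

```apache
RewriteEngine On

# Normal canonicalization for non-cart pages: conditions are ANDed by
# default, so the path must differ from BOTH cart scripts
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteCond %{REQUEST_URI} !=/cgi-bin/sc/billing.cgi
RewriteCond %{REQUEST_URI} !=/cgi-bin/sc/thankyou.cgi
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Cart pages: the host test is ANDed with the [OR]-chained pair,
# i.e. host AND (billing.cgi OR thankyou.cgi)
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteCond %{REQUEST_URI} =/cgi-bin/sc/billing.cgi [OR]
RewriteCond %{REQUEST_URI} =/cgi-bin/sc/thankyou.cgi
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```

Any query string on the original request is carried over to the redirect target unchanged, because the substitution URLs above do not specify one of their own.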


Give it a try (back up any working .htaccess first!) and see if this is close.

Tom

g1smd

5:48 pm on Oct 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's no issue with having example.com pages as HTTPS and www.example.com pages as HTTP just as long as you always link to the correct version (protocol and domain name included) from within the site. That is, whatever link a user clicks within the site, they immediately access the correct URL. Clicking any internal link should NEVER result in a user being redirected to a different protocol or URL. Additionally ALL links to images, scripts and stylesheets must be coded as root relative links beginning with a slash and must not include the protocol or domain name. This latter point avoids users continually seeing "mixed protocol" security warning pop-ups all over your site.

RandallK

9:46 pm on Oct 9, 2010 (gmt 0)

10+ Year Member



Thanks sublime1. I will definitely give that a try. Anyone else have an opinion?

g1smd, if someone, at some point, on some site, were to say, "Hey! Check out the great stuff on <a href="example.com">example.com</a>," wouldn't that lead to Googlebot ending up indexing your entire site as the non-www version?

Or am I misunderstanding your point?

g1smd

10:27 pm on Oct 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need to make sure that internal links on your site point to the right protocol and URL. That way it is not possible for anything on your site to suggest an incorrect URL.

Next, you install the various redirects to http and to https so that if an external site tries to suggest a non-canonical version of the URL, the redirect makes the user's browser make a new request for the correct URL.

What I was saying earlier is that there is no problem with your non-secure pages being http://www.example.com/.... and your secure pages being [example.com...] BUT you must also get both the internal linking and the non-canonical redirects right.

sublime1

1:45 am on Oct 10, 2010 (gmt 0)

10+ Year Member



RandallK --

The fundamental concern (not a problem, just a concern) is that your site does not have a single canonical domain name: in most cases it'll be www and in a few it will be non-www. If the rewrite rules I provided are correct, your example, where someone enters http://example.com, would match the first two rewrite conditions (yes, the host is example.com; yes, the path is not the cgi-bin/sc/... one) and send an HTTP 301 response back to the browser.

A 301 doesn't contain the actual page, but instead contains the new location that the browser should use to make a second request: in this case a request to the document root, but using the domain http://www.example.com and path /. Likewise, if a search engine finds a link with the "wrong" variant of the domain name for any given path, the 301 will cause the same behavior, and search engines know to update their links so that what they present in results is consistent and as directed.

As g1smd points out, it is important that all of the URLs (to pages) on your site are produced with the correct fully qualified domain name: either http://www.example.com, or https://example.com for the special cart pages. For static resources like images, CSS, JavaScript, etc. that are used in both contexts, your pages should generate root-relative paths, such as /css/styles.css. The browser will then use whatever protocol and domain name were used to request the page. These resources are not (or at least should not be) indexed by search engines, so they do not need to be canonicalized. It's OK if some are requested via the https version of the URL and others via the http version.

Make sense?

Tom

jdMorgan

3:48 pm on Oct 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some tweaks to improve efficiency and robustness:

# normal http www canonicalization for non-cart cases
# (and excluding common included objects to prevent mixed-content warnings)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [OR]
RewriteCond %{SERVER_PORT} =443
RewriteCond %{REQUEST_URI} !^/cgi-bin/sc/(billing|thankyou)\.cgi$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Special case https non-www canonicalization for cart
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [OR]
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(cgi-bin/sc/(billing|thankyou)\.cgi)$ https://example.com/$1 [R=301,L]

Not perfect, but pretty close if all internal links are correct.

Note that these two rules will only go into the same .htaccess file if both SSL and non-SSL requests resolve to that same .htaccess file. Otherwise, the first rule goes in your https .htaccess file and the second rule goes in your http .htaccess file.
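Here is a sketch of what the two files could look like, assuming the host gives SSL and non-SSL requests separate document roots, each with its own .htaccess. It was derived by simplifying the two combined rules above, not posted by jdMorgan. Each file only ever sees one port, so the port tests drop out. The host-only halves are kept so that wrong-hostname requests are still caught in each file:

```apache
# ---- .htaccess in the SSL (https) document root ----
RewriteEngine On
# Non-cart pages requested over https go back to http://www.example.com
# (images, CSS and JS are excluded to avoid mixed-content warnings)
RewriteCond %{REQUEST_URI} !^/cgi-bin/sc/(billing|thankyou)\.cgi$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Cart pages requested on the wrong hostname go to https://example.com
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule ^(cgi-bin/sc/(billing|thankyou)\.cgi)$ https://example.com/$1 [R=301,L]

# ---- .htaccess in the non-SSL (http) document root ----
RewriteEngine On
# Non-cart pages requested on the wrong hostname go to http://www.example.com
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{REQUEST_URI} !^/cgi-bin/sc/(billing|thankyou)\.cgi$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|ico|css|js)$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Cart pages requested over http always go to https://example.com
RewriteRule ^(cgi-bin/sc/(billing|thankyou)\.cgi)$ https://example.com/$1 [R=301,L]
```

Keeping both original rules unchanged in both files would also work, since the port conditions decide which one fires. The split version just avoids tests that can never change outcome in a given file.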

Jim

RandallK

6:52 pm on Oct 16, 2010 (gmt 0)

10+ Year Member



Thanks to everyone for your assistance. jdMorgan, I installed it and it looks like it's working like a charm. Thank you so much!

RandallK

9:03 pm on Oct 29, 2010 (gmt 0)

10+ Year Member



I meant to ask this at the time... since I am trying to learn how and why this works, can you explain the port 443 lines: what they do, and why to use them?

If not, no big deal, just curious and trying to learn!

jdMorgan

9:10 pm on Oct 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Testing for port 443 detects HTTPS requests without relying on a non-native server variable.

Testing for NOT port 443 detects non-HTTPS requests, again without relying on a non-native server variable.

You will also find code posted online testing the variable %{HTTPS} for a value of "on", but that variable must be set by a module outside of the Apache core. So the %{SERVER_PORT} testing method is just a bit more robust.
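For comparison, here is a minimal sketch of the two detection methods, written as the cart rule from this thread. Recent mod_rewrite documentation describes %{HTTPS} as containing "on" or "off" and as safe to use whether or not mod_ssl is loaded, so check the documentation for your Apache version. The port test, for its part, assumes SSL is served on the standard port 443 and is not terminated by a proxy in front of Apache:

```apache
RewriteEngine On

# Port-based test: any cart request not arriving on port 443 (or on the
# wrong hostname) is redirected to the SSL hostname
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [OR]
RewriteCond %{SERVER_PORT} !=443
RewriteRule ^(cgi-bin/sc/(billing|thankyou)\.cgi)$ https://example.com/$1 [R=301,L]

# Roughly equivalent rule using the HTTPS variable; use one form or the
# other, not both
#RewriteCond %{HTTP_HOST} !^(example\.com)?$ [OR]
#RewriteCond %{HTTPS} !=on
#RewriteRule ^(cgi-bin/sc/(billing|thankyou)\.cgi)$ https://example.com/$1 [R=301,L]
```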

Jim