the canonicalization redirect

     
10:18 pm on Oct 24, 2019 (gmt 0)

lucy24 (Senior Member, US)


A think piece prompted by a recent offline conversation:

Consider the typical canonicalization redirect on an HTTPS site without subdomains:

RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]

or possibly

RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]

Anyone remember Flowers for Algernon? (I think the movie adaptation was called Charly.) Early in the book, there's a scene where the main character's bakery co-workers are trying to help him out by showing him how to make rolls, so he can move up from being a janitor. He comes away hopelessly confused because no two men have identical procedures, and he doesn't have the mental equipment to figure out which parts are essential and which are a matter of personal preference.

Looking at this simplest of all redirects, I see a minimum of four variables. (Four either/or choices make 2 x 2 x 2 x 2 = 16 possibilities, all perfectly valid for the job at hand.)

#1 You can say off or you can say !on
#2 You can make the hostname optional
#3 You can use the [NC] flag with the hostname
#4 You can capture the request, or you can use %{REQUEST_URI}

What's the poor user to do?

#1 Given a binary toggle, there shouldn't be any alternative to “off” or “on”. (Servers don't have soft bits, do they?) But can it hurt to cover all possibilities?
#2 This may be the easiest choice: in shared hosting, or any situation using a <VirtualHost> envelope where the site in question isn't the catchall, there will never be a request with no hostname.
#3 Human browsers currently flatten hostnames when sending in a request: You can type “ExAmple.com” or “examPLE.com” but your browser will send in a request for “example.com”. Show me a hostname with capitals, and I’ll show you a malign robot. (But I won't be able to show you many: less than 1/10 of 1% of all requests use capitals in the hostname.)
#4 Has anyone ever benchmark-tested the choice between the two options, capture and REQUEST_URI? One way, the server has to capture something which will, 99% of the time, end up not being used. Another way, the server has to ask itself “What was it they were looking for?” (or does it even need to ask?) How enormous would a site need to be before the difference in speed or CPU becomes significant?
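For concreteness, here is one of the sixteen combinations written out with a comment on each of the four choices. Treat it as a sketch rather than a recommendation: www.example.com is the placeholder hostname from the examples above, and the rules are assumed to live in the site's top-level .htaccess.

```apache
RewriteEngine On

# Choice 1: "off" is a plain regex match here; %{HTTPS} is documented
# as holding either "on" or "off", so "!on" selects the same requests.
RewriteCond %{HTTPS} off [OR]
# Choices 2 and 3: hostname not optional, no [NC]. Any Host value other
# than the exact lowercase canonical name gets redirected.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
# Choice 4: no capture; reuse the requested path as-is. The original
# query string is carried over to the target automatically.
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

One place where the two forms of #4 stop being interchangeable: in a .htaccess file below the site root, a captured (.*) is relative to that directory, while %{REQUEST_URI} is always the full path. Likewise, in the server config or a <VirtualHost> the pattern sees the leading slash, so "/$1" would produce a doubled slash there.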

Food for thought.
2:33 am on Oct 25, 2019 (gmt 0)

not2easy (Administrator, US)


My guess is that the average person understands why they should use a canonicalization redirect, but because it isn't like learning a hard and fast rule, they give up trying to understand what they are doing and just look at enough examples and choose one. If nothing bad happens they don't think they've solved the riddle, they just think, "Whew! Glad that worked!"

Because there is more than one way to write several of the parts of that puzzle, it confuses and confounds. I have seen a number of variations on the 'Rule' part of that exercise expressed as "^(.*)$" or "(.*)" and others, most of which do the same job but there isn't some common knowledge to tell people which they should use or why. The first time I faced the task I was proud to be able to do it. Then I learned that it should be a 301 redirect, not a 302. That was about 20 years ago. I haven't meddled with it a lot. The old thing about if it ain't broke.
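To put those 'Rule' variations side by side, here is a sketch with one form active and the alternatives commented out. The hostname and the single RewriteCond are placeholders for illustration. For an ordinary request path, all three patterns match every request; they differ only in whether anything is captured.

```apache
RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.example\.com$
# Anchored capture: matches any path and stores it in $1
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# Alternatives for the RewriteRule line above (use one, not all three):
# Unanchored capture; .* is greedy, so $1 is still the whole path
# RewriteRule (.*) https://www.example.com/$1 [R=301,L]
# No capture; ^ matches at the start of any path
# RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```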
1:01 pm on Oct 25, 2019 (gmt 0)

Junior Member from CA 



This is mine; it seems to be the only way I can get it to validate HSTS preload at hstspreload.org:

# HTTP to HTTPS on the same host
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# Add www to any hostname that lacks it
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L,E=HTTPS:1]

# Redirect requests for index.* to the bare directory URL
RewriteCond %{THE_REQUEST} /index\.(.*) [NC]
RewriteRule ^(.*?)index\.(.*)$ /$1 [R=301,L]
1:59 pm on Oct 25, 2019 (gmt 0)

penders (Senior Member)


... the only way I can get it to validate HSTS preload


Yes, a requirement of the HSTS preload list is that you need to redirect HTTP to HTTPS on the same host first, before any other redirects. So you end up having to have two separate redirects (if you later want to redirect non-www to www or vice versa).
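A minimal sketch of that ordering with a fixed canonical hostname, assuming the rules sit in the top-level .htaccess and that www.example.com (a placeholder) is the name you want to end up on:

```apache
RewriteEngine On

# Step 1: scheme only, same host.
# http://example.com/page -> https://example.com/page
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Step 2: hostname, reached only once the request is already HTTPS.
# https://example.com/page -> https://www.example.com/page
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

The cost is that a visitor arriving at the non-www HTTP address goes through two redirects instead of one.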