|Every redirect should include the canonical hostname in the target?|
Rather than hijacking some other thread, I thought it'd be a good idea to start a new discussion about this.
On these forums, some members have repeatedly presented it as a "rule" that every redirect should include the canonical hostname in the target. But I think this is a gray area with trade-offs.
The downside of not including the canonical in all redirects is that, in some circumstances, the user would go through an extra hop.
The downside of including the canonical in all redirects is that we have to repeat it throughout our htaccess, and ideally we want our code to be DRY (Don't Repeat Yourself) [en.wikipedia.org].
I actually don't think either of these downsides is especially significant. Yes, there might be some repetition, but htaccess files are typically small, and search-and-replace should be able to deal with this easily. And yes, there might be an extra hop, but these occurrences should be rare. All your internal links should already be canonical; people's bookmarks are highly likely to be canonical; etc. For the extra hop to occur, the user would not only have to arrive via a non-canonical hostname, but the URL they request would also have to have another redirect associated with it.
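To make the two options concrete, here is a minimal sketch in htaccess terms. It assumes mod_rewrite is enabled, that the file sits in the document root, that www.example.com is the canonical hostname, and that /old-page.html has moved to /new-page.html. All of these names are placeholders for illustration. It also assumes the default UseCanonicalName Off, so a path-only target is redirected on whatever hostname was requested.

```apache
RewriteEngine On

# Option 1: path-only target (hostname not repeated).
# A request for example.com/old-page.html is redirected to
# example.com/new-page.html, and the canonical rule below then
# adds a second hop to www.example.com/new-page.html.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# Option 2: canonical hostname in the target (use instead of Option 1).
# The same request reaches www.example.com/new-page.html in one hop,
# but the hostname is repeated in every rule written this way.
#RewriteRule ^old-page\.html$ https://www.example.com/new-page.html [R=301,L]

# Canonical hostname redirect, needed with either option.
# Hostname check only; scheme handling is left out of this sketch.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

With either option, a request that already uses the canonical hostname goes through a single redirect. The two only differ for requests that arrive on a non-canonical hostname and also hit one of the specific rules.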
I see validity in both options, and I don't think either is necessarily the "right" way.
Personally, I opt not to include the canonical in all redirects. The philosophy behind this preference is:
|Write your code first and foremost for humans. Focus on readability and maintainability. Then profile your application to find the bottlenecks. Use that information to optimize smartly. Ideally, your code will be optimized for speed where it matters, and optimized for human maintainability everywhere else. |
But I want to emphasize again that I see validity in both options. The point I want to make is that I think we do a disservice to other users here when we talk about this subject as if there's only one right way. I think we could better educate them if we were to tell them about the options and trade-offs.
|The downside of not including the canonical in all redirects is that, in some circumstances, the user would go through an extra hop. |
please help me understand how ease of configuration and maintenance of the server trumps optimal response time for every visitor requesting a non-canonical path on a non-canonical hostname.
how many intentional extra hops are acceptable?
It largely comes down to how significant the optimization is. Not everyone here may agree -- but hopefully we're all aware -- that the prevailing wisdom is that micro-optimizations are a bad reason to complicate our code. But, as your question implies, where do we draw the line? At what point does an optimization become a micro-optimization?
Sometimes the answer is obvious. For example, if *every* request on a site has an unnecessary redirect hop, then that's an opportunity for a significant optimization, not because of the number of hops, but because of the number of requests it affects.
In this case, we need to know how often a non-canonical URL is requested that also has other redirects associated with it. This number will vary from one site to another, but in a typical case, I would expect it to be vanishingly small. If, for example, just 1-in-10,000 requests had an extra redirect hop, then I would consider that to be an insignificant optimization, and not a worthy reason to sacrifice maintainability.
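One rough way to size this, assuming purely for illustration that an extra redirect hop costs about 100 ms (a made-up figure, not a measurement): with a fraction p of requests affected and a per-hop cost t, the average added latency per request works out to

```latex
\bar{t}_{\text{extra}} = p \cdot t = \frac{1}{10{,}000} \times 100\ \text{ms} = 0.01\ \text{ms}
```

The affected visitors still each pay the full hop; the average only shows how little the site as a whole is slowed under these assumptions.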
That said, I do consider it close enough to a gray area that we each have to decide whether it is a worthwhile optimization. That's why I emphasized that my preference is not "the" right way. Conversely, though, it's just as wrong to tell people that it's a "rule" that you must always include the hostname.
you called it a "rule".
i call it a "best practice".
a best practice is used to prevent surprises down the road when, for example, that "just 1-in-10,000 requests" ends up being googlebot following a valuable link and your content gets indexed on a non-canonical url.
it's about risk management - when you get sloppy or lazy or you simplify your configuration for the benefit of the webmaster rather than thinking of the visitor first, you increase your risk of unintended consequences.
it also means you have to do more testing for unintended consequences and/or reacting to the unwanted results that you missed in test.
this forum is loaded with examples of problems stemming from "starting out on the easy path".
|...for example, that "just 1-in-10,000 requests" ends up being googlebot following a valuable link and your content gets indexed on a non-canonical url. |
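Well, actually no, that wouldn't happen. You're confusing "optimization" with "correctness." The first redirect sends the user to the new path on the hostname they requested, and the canonical redirect then fires on that second request. So the user, whether a bot or not, *will* still be redirected to the correct canonical URL; it just takes one more hop.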
In fact, focusing on maintainability is really how we can avoid surprises down the road. If our code relies on copy-pasting and search-replacing, then it's easier to make mistakes and to introduce bugs. That's why it's important that we use profiling information so that we can optimize for speed where it truly matters, and optimize for maintainability everywhere else.