I have searched through the forum and found different ideas on achieving what I'm trying to do but the thought has occurred to me - am I understanding what I'm trying to do (probably not)?
I'm trying to get all calls to redirect to http://www.example.com/
A few months ago, I did a 301 htaccess redirect -
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]
That seems to work well enough when it comes to simply the www part of the URL. But I had thought it would also work with the http://www.example.com/index.php duality as well.
It doesn't appear to be working that way - I seem to have a far higher link popularity with index.php than to root.
Are these being generated by internal site links pointing to the home page, because the internal links do href to index.php? I've no doubt some links from outside might target index.php directly but the number surprises me if that's the case. So do I simply change my internal links to '/' or is a redirect still required?
Could I go for the canonical tag and hope ... or would it be a useful thing to put in anyway even if I change other things?
[edited by: jdMorgan at 8:32 pm (utc) on Jan. 14, 2010]
[edit reason] example.com [/edit]
Fix your links first. Once search engines start requesting "/" instead of "/index.php", you can redirect all further direct client requests from "/index.php" to "/". This redirect *will not* help with your problem unless your links are fixed first.
There is no "duality" here. You are linking to a filepath instead of linking to the correct URL. The duality exists only in confusing a URL with a filepath -- they are two very different things, associated by the action of the server, but not at all equivalent.
You also need an [L] flag on your existing rule.
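For reference, the existing rule with the [L] flag added would look something like the sketch below. It assumes "RewriteEngine on" appears earlier in the file and uses the example.com placeholder from the original post.

```apache
# Redirect any request whose Host header is not exactly www.example.com
# (compared case-insensitively) to the same path on www.example.com.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# [L] stops mod_rewrite processing for this request once this rule matches.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```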
Jim
The existing rule redirects non-www to www. You'll need a separate rule to redirect index requests and that rule also needs to fix the domain name for those requests. The new rule should be placed before the existing rule.
I follow what you mean about duality. I picked up the problem on Google Webmaster Tools when they indicated duplicate title tags - would the linking of the filepath to index.php cause this or have I missed something else?
I didn't use the [L] flag on the redirect rule as the htaccess runs through to other rules, such as blocking some referers and bots so I have the [L] at the end of all of the rules. As I'm not using ifmodules, I thought that's the place it should go as it indicates the last rule. I don't mind if I'm wrong as I'm learning (slowly ... but I'm learning).
Ray
I'll do as Jim says first and redo the internal links - then do the redirect before the existing one in htaccess. I've seen a couple of versions of that redirect, both on here and elsewhere and I've noticed the odd warning about loops - if you don't mind me asking, which version do you advise?
The [L] flag means "stop processing if this rule is invoked" -- and only if the rule is invoked. You've left it out for a rather bad reason.
Put your rules in this order:
Rule-type ordering mnemonic: abcd [EFG] ... [R] ... xyz
In this way, stacked/chained/multiple redirects are avoided, and the filepaths to which URLs have been internally rewritten won't be exposed to clients as URLs.
And all rules end with an [L] flag. :)
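As a rough sketch of that ordering, read from the rest of this thread as access-control rules first, then external redirects from most to least specific, then internal rewrites, an .htaccess file might be laid out as below. That reading is an assumption, and the bot name and the /articles/ rewrite are made-up placeholders, not rules from the site being discussed.

```apache
RewriteEngine on

# 1. Access control: refuse unwanted clients before anything else runs.
#    "BadBot" is a hypothetical user-agent substring. [F] returns 403
#    and implies [L].
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F]

# 2. External redirects ([R=301,L]) go here, most specific first,
#    with the domain canonicalization redirect last in this group.

# 3. Internal rewrites go last, so the filepaths they produce are never
#    sent back to clients as URLs. Hypothetical example:
RewriteRule ^articles/([0-9]+)$ /article.php?id=$1 [L]
```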
Jim
Am I glad that I came on here - I put my hand up to thinking that I knew and as a result, I'm grateful for being lucky enough not to come unstuck - so far ... not a good situation methinks.
I thought the [L] was taken as the last rule whether it was invoked or not - many thanks for putting me wise.
My stacking order needs a look too in that the 301 needs moving - in my defence on that, I found several references putting it as the first rule but I'll certainly follow your guidelines. It makes sense really in that anything getting a 403 won't need to know anyway.
Along those lines - up till now, I've been led to believe that 'ErrorDocument', any <FilesMatch> and 'Deny from ...' statements go prior to 'RewriteEngine on' - then the rules. I might as well get this right once and for all.
I really appreciate you taking the time to come off the original topic and help me on this as well.
I shall, of course, use the 'I put it down to old age' excuse for all that it's worth :-)
Ray
Be aware that Apache modules (e.g. mod_rewrite, mod_access, etc.) each execute in turn, with each handling only the directives that it understands in your .htaccess file. Therefore, the directives only "execute in order" if they belong to the same module. You cannot control the module execution order from within .htaccess, as that is determined by the reverse-ordering of the LoadModule list on Apache 1.x and by an internal priority scheme on Apache 2.x.
Therefore, you cannot view your code as a "linear sequential program" except within/among directives all targeted to the same Apache module.
So, in other words, it makes no difference at all whether you put your "Deny from" directives before or after your RewriteRule directives; either all "Denys" will execute first, or all RewriteRules will execute first, and nothing you can do in your .htaccess file can change that. This is the main reason that several contributors here recommend *not* mixing mod_alias Redirect and RedirectMatch directives with mod_rewrite RewriteRule directives: The execution order might change after a server upgrade, a 'tweak' by your host, or a move to a different hosting provider...
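To illustrate the point about not mixing modules, here is a sketch built around a hypothetical moved page (/old-page.html and /new-page.html are invented for the example, as is the blocked address). The mod_alias form is shown commented out; the mod_rewrite form keeps the redirect in the same module as the other rules, where file order is honoured.

```apache
# mod_alias version: runs whenever mod_alias gets its turn, regardless of
# where this line sits relative to the RewriteRule directives.
# Redirect 301 /old-page.html http://www.example.com/new-page.html

# mod_rewrite version: evaluated in file order along with the other rules.
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# mod_access directives: their position relative to the rules above makes
# no difference. 192.0.2.10 is a documentation-range placeholder address.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```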
The module execution order is not totally arbitrary, and 98% of the time mod_access will execute before mod_alias, followed by mod_rewrite (just to name three). This is because that execution order "makes the most sense to the most server administrators." But the fact is that we do see exceptions here.
Another aspect to consider is that if a redirect is invoked, that terminates the current HTTP transaction, and informs the client that it should start a new one, using the new URL provided in the server's redirect response. It's important to realize that this means that all of your server-config code will be re-executed from the top, with no "memory" whatsoever of the previous transaction. This can also make .htaccess directives appear to execute out-of-order, if you don't realize that you're handling a second HTTP request distinct from the first one.
The domain canonicalization redirect rule should be the last external redirect 99.99% of the time. Why? Well, taking into account the above discussion, imagine that you put the recommended "index.php" canonicalization rule *after* the domain canonicalization rule, and a client requests "example.com/index.php" from your server. With the domain canonicalization rule first, that client will first get redirected to "www.example.com/index.php" and so will issue a second HTTP request for that new URL from your server. This time, since the domain is correct, only the second rule will fire, and redirect that same client to "www.example.com/". And then the client will come back using a third HTTP request for the now-fully-canonicalized index page URL (which triggers neither rule), and it will finally get the content that it wanted in the first place.
Now reverse the rule order, and let the request for "example.com/index.php" get redirected straight to "www.example.com/" -- One redirect instead of two, two HTTP transactions instead of three, and correspondingly less request-handling work and time for the client and your server.
Note that even if you don't put the [L] flag on the first rule in this scenario, the server still has to process two rules instead of only one...
So anyway, that's the explanation of the "most-specific redirects first" part of the recommendations above.
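Put together, the two redirects discussed in this thread might be ordered as in the sketch below. It assumes both rules live in the site's root .htaccess and that "RewriteEngine on" is already set; the trailing comments trace the "example.com/index.php" request described above.

```apache
# Most specific first: a direct client request for /index.php on either
# host goes straight to the canonical home page URL.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# Least specific last: everything else arriving on a non-canonical host.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Request for example.com/index.php:
#   this order:      301 -> www.example.com/             (1 redirect)
#   reversed order:  301 -> www.example.com/index.php
#                    301 -> www.example.com/             (2 redirects)
```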
By all means, if you have a question about how a particular directive or flag works, go straight to the source instead of getting "opinions" on some forum somewhere. Apache docs are all online at apache.org, and they are far more correct than most second-hand knowledge (including ours here). :)
I added *just a few* citations to our Apache Forum Charter when I came on board here, and I commend them to you.
Jim
P.S. "...use the ... excuse for all that it's worth."
Note grey chin in profile pic -- You're not the only one!
Many thanks for taking the time to run through the process with me. It's a bit of an eye-opener on how much I didn't know compared with what I simply assumed. Assumptions made over the years by lumping together snippets that I gleaned when I wanted to do something specific ... I take your point about experience and this has been a good 'un for me.
I've changed all the internal links to "/" and moved the htaccess around and changed it according to what you and g1 have advised. I'll be putting the index.php redirect on as well.
Once again, many thanks for your help and to g1 too.
Ray
I've used one that you've shown several times -
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
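On the earlier question about loops, the sketch below annotates the same rule to show what the condition is doing. It assumes index.php is served for "/" through the server's directory-index handling; the commented-out line is a contrast case, not a recommendation.

```apache
# Without a condition, this rule can loop: a request for "/" is mapped
# internally to "/index.php", the rule then matches that internal path
# and redirects the client back to "/", and the cycle repeats.
# RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# THE_REQUEST holds the request line exactly as the client sent it
# (for example "GET /index.php HTTP/1.1") and is not changed by internal
# rewrites, so the redirect fires only when the client itself asked
# for /index.php.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
```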
I've tested it coming into the site by calling 'http://www.example.com(/)', 'http://example.com(/)', 'http://www.example.com/index.php' and 'http://example.com/index.php' and they have all come up showing the required 'http://www.example.com/' in the browser bar.
Testing the internal links has the desired effect as well.
I've used different browsers and also refreshed and cleared history. There was no looping, all pages appeared, no glitches or hang-ups and everything appeared to be working normally.
Is it the case that if it works ... then it works (subject to some change in the future) or is there a situation that I've not thought through and accounted for above?
Ray
Sounds dangerous, so let's be very clear... If you have not changed all of those 'internal links' to point to "/" instead of /index.php, then you are forcing every client that 'clicks' on one of those links to do *two* HTTP requests -- One resulting in a redirect response, and the second actually returning the desired content. Users will be slowed down, search engines will cast a jaundiced eye, and your logs and stats will be severely skewed.
You cannot count on this redirect as a "magic fix" for your own site's linking errors; it will truly "help" only with linking errors on sites that you do not control.
You must link only to "/" from your own pages to avoid trouble. If you cannot do that, then remove the redirect and just live with the ugly URLs.
Jim
I've got my links contained in 'include' files so that all pages call the same link set relative to their level - it was a matter of changing those. The odd exception has also been changed, as has the link to the home page from the forum. I've also changed the internal redirects where they apply to 'home' when logging in and out.
I've simply changed the links to ./ or ../ as that keeps it all in the relative form being used. The linking 'error' was using index.php and up until doing all of this, Google has shown no problems at all with any page or link. Plus, using the site, all links are working OK - I double check links every time something goes in or gets changed. But I'll still run LinkSleuth to treble check.
Nothing appears to have slowed and I've spent quite a time running it through.
I do now appreciate the difference between the internal link aspect and that the htaccess redirect should only affect those coming in from outside links if a change is needed.
If all that has been done helps a bit or a lot in achieving a 'one site view' - instead of bits scattered under different banners - then I'll be pleased. I don't expect miracles but it doesn't hurt to improve things if they can be bettered.
Once again, many thanks for your continuing assistance.
Ray
I've tended to use it ... well, simply because it's more concise and has worked trouble free so far. I have sometimes had a wander through the web looking at thoughts on which form to use and found both appear to have their good and bad points.
Again, that is me, perhaps basing my approach through the dogma that Jim refers to and it would be refreshing to know your reasons based on sound experience.