Forum Moderators: phranque

Redirect rules for canonicalization not quite working

htaccess, apache, redirect, 301, canonicalization

         

pyaar

10:51 pm on Aug 15, 2014 (gmt 0)

10+ Year Member



Greetings fine people of the web!

In my .htaccess file I have rules both to redirect a) non-www to www URLs and b) index.html to the root. The non-www to www redirect is successful when visiting example.com. However I notice that when visiting sub-pages such as example.com/page.html, the www is not affixed to the URL as I was expecting.

Secondly, the index.html redirect is failing to redirect to the root. If it matters, I'll mention that this particular domain is a sub-domain/add-on domain to another domain, but the .htaccess file is placed in the root of the subdomain.

My current .htaccess file is below.

RewriteEngine on

# Redirect index in any directory to root of that directory
RewriteRule ^(([^/]+/)*)index\.(php|html?)$ http://www.example.com/$1? [R=301,L]

# Redirect non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Appreciate your help! Interestingly, the redirects work completely as intended when testing this code in [an htaccess tester], but not so much on the live site. I also cleared my browser cache to ensure that that wasn't the problem. I am using a shared host so not sure I can use the RewriteLog directive for debugging as I don't see a conf file available for editing.

[edited by: phranque at 10:08 pm (utc) on Aug 27, 2014]
[edit reason] removed link to buggy htaccess tester [/edit]

not2easy

3:05 am on Aug 16, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi pyaar, welcome to the Forums. Your first rule has no condition set. You need a condition that defines when the rule should fire.

pyaar

3:47 am on Aug 16, 2014 (gmt 0)

10+ Year Member



Hi there not2easy, thanks for the welcome! :) Ah yes, I previously had a variant with the condition but wasn't having luck with it, then saw some other examples on the web where it was suggested that the rule alone could handle the redirect, without a condition. But here is the previous snippet I attempted, with the condition included. Maybe there is a flaw here that is preventing the condition from being met?

RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lucy24

3:49 am on Aug 16, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Edit: Oops, overlapped.
Your first rule has no condition set.

It doesn't need one. It does need the [NS] flag, to keep the rule from deploying in response to mod_dir activity. (In some cases you need something involving {THE_REQUEST} and "index\.xtn", but in most situations [NS] is enough.)

The final ? in the target isn't really needed unless you are much plagued with requests that come with extraneous query strings.

I am using a shared host so not sure I can use the RewriteLog directive for debugging

You are right: the RewriteLog can only be declared in the config file, not in htaccess. But since each rule is a redirect, you should be able to see it in effect simply by looking at your browser's address bar. At most, use an extension like LiveHeaders. Then you can verify that your browser is sending only the requests you want it to send.

the index.html redirect is failing to redirect to the root

Do you mean that it works with inner directories, but not with requests for the root? If the rule is exactly as you've shown it, with pattern
^(([^/]+/)*)index\.(php|html?)$

it ought to work everywhere: you've got a * "zero or more" in the correct place. (If you had used a + "one or more" the rule would only work on inner directories.)

Incidentally you only need the form
\.(php|html?)$

if all three extensions actually occur on your site. Otherwise, people who ask for bogus extensions can get the 404 they deserve.
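
A minimal sketch of the index rule with these suggestions applied. It assumes the site only has index.html files (so the php and htm alternatives are dropped) and that www.example.com stands in for the real hostname. It illustrates the [NS] flag and the anchored pattern only; nothing in this thread confirms that it fixes the behaviour reported above.

```apache
RewriteEngine On

# [NS] ("nosubreq") skips this rule for internal subrequests, such as the
# lookup mod_dir makes for index.html when a directory URL is requested.
# The * after the group means "zero or more directories", so the pattern
# matches index.html in the root as well as in inner directories.
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L,NS]
```

If [NS] alone turns out not to be enough on a given server, the usual fallback is a RewriteCond on %{THE_REQUEST}, as in the variant posted earlier in the thread.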

not2easy

4:07 am on Aug 16, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I saw the [R=301,L] so it looked like it would have needed a condition.

Part of the problem may be that the ? at the end of the target URL in your first rule will clear a query string rather than add it on.
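
To make the effect of the trailing ? concrete, here is a small sketch with two alternative forms of a root-only index rule. The query string ref=abc is a made-up example, and only one of the two variants would be active at a time.

```apache
RewriteEngine On

# Variant A: target ends with "?", so any query string is discarded.
#   /index.html?ref=abc  ->  http://www.example.com/
RewriteRule ^index\.html$ http://www.example.com/? [R=301,L,NS]

# Variant B (use instead of A): no trailing "?", so the query string is kept.
#   /index.html?ref=abc  ->  http://www.example.com/?ref=abc
# RewriteRule ^index\.html$ http://www.example.com/ [R=301,L,NS]
```

Either way the redirect itself still happens, so the ? would not by itself explain a redirect that never fires.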

pyaar

1:41 pm on Aug 16, 2014 (gmt 0)

10+ Year Member



Do you mean that it works with inner directories, but not with requests for the root?


Hi Lucy, thanks, on this statement I meant that visiting example.com/index.html does not redirect to www.example.com as desired. It simply stays on example.com/index.html.

I tweaked my code to the following to incorporate the feedback. Still, no luck on the index.html redirect, and example.com/page1.html and other sub-pages are still not redirecting to the www variant. I double-checked using LiveHeaders as suggested, and the only 301 being carried out is when example.com is input, in which case the redirect to www.example.com is working as intended.

RewriteEngine on
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L,NS]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lucy24

8:06 pm on Aug 16, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the only 301 being carried out is when example.com is input, in which case the redirect to www.example.com is working as intended

OK, I'm mystified, which probably means we're overlooking something obvious.

-- index.html redirect is not taking place at all
-- www redirect ONLY takes place when request is for wrong form of root

Double-check, please: Do you have any other htaccess files in any other directories deeper than the root? And if so, do any of those htaccess files contain RewriteRules? It doesn't matter whether the rules actually apply, only that they exist.

pyaar

9:14 pm on Aug 16, 2014 (gmt 0)

10+ Year Member



Hiya, the only other .htaccess file is at the root of the primary domain, but it is completely blank - no text whatsoever in that one.

And this .htaccess file in question is placed at the root of the sub-domain.

I'm sorry to be such a troublemaker! :)

lucy24

8:30 am on Aug 17, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh, wait, what do you mean "root of the subdomain"? What's the physical directory structure?

wilderness

10:51 am on Aug 17, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



example.example.com

pyaar

12:07 pm on Aug 17, 2014 (gmt 0)

10+ Year Member



Hi, yes, the directory structure is as follows:

-public_html
--.htaccess for the primary domain (blank file)
--(other primary domain files)
--example.com (subdomain) folder
---.htaccess (inside the subdomain folder. contains the redirect rules for the subdomain.)
---(other subdomain files)

Hope that clarifies!

lucy24

7:50 pm on Aug 17, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, thanks, that's what I meant. The existence of an empty htaccess file-- assuming it really is empty-- has no effect. I just wanted to make sure there were no other RewriteRules (or even the statement "RewriteEngine on") on the same path. Can I assume there are no other htaccess files containing mod_rewrite further inside the subdirectory?

:: scratching head in continuing puzzlement because I'm still overlooking something obvious ::

example.example.com

Haha. But that's the URL. Server directories-- and hence htaccess files-- operate as if they only had two dimensions. It would be easier if you could hop into a third dimension and have parallel directories for all the subdomains.

pyaar

3:11 am on Aug 18, 2014 (gmt 0)

10+ Year Member



Can I assume there are no other htaccess files containing mod_rewrite further inside the subdirectory?


Correct! No other .htaccess files in existence. :)

You don't suppose an infinite redirect loop could possibly be occurring in the scenario of the index.html redirect do you?

Thanks again for your patience on this Lucy! I am taking solace in the fact that there must be a logical explanation for this, if only I knew what it was, ha.

lucy24

7:02 am on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If there were an infinite (external) redirect loop, you would get an error message from the browser. But you're saying that the redirect simply isn't happening at all, right? No extra activity in LiveHeaders or equivalent.

Of course you have emptied your browser cache, tried the same request with a different browser, etc. Right? Ahem. :)

pyaar

1:55 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



G'morning Lucy, correct, the redirect simply isn't occurring.

Yes, tried after emptying the cache and using a different browser. I've even had someone else check on their own machine in case I was going mad. :) No luck.

I may just open a ticket to my host service to see what zany thoughts they have in mind. They are probably not as brilliant as you but I suppose no harm trying. :)

lucy24

3:39 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the only 301 being carried out is when example.com is input

YES, consult your host. This specific redirect may be happening on the config file level; I know that my host lets you select this option (with or without www) and then your domain name gets canonicalized even if you don't have an htaccess of your own. It should still be redirecting all the time, though, unless the host's rule is simply ineptly worded.

Check one more thing before talking to the host. Put this rule before other RewriteRules (use your own domain name!), but after the "RewriteEngine on" statement:

RewriteRule foobar http://www.example.com/widget.html [R=301,L]


Request anything with "foobar" in the path and you should end up on your 404 page, with address bar saying /widget.html. (This is assuming you don't have any content named ..foobar.. otherwise make up something else.) If this does NOT happen, it means mod_rewrite is not working at all.
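
Placed in context, the test might look like the sketch below. The comment standing in for the existing rules is a placeholder, not part of any real file, and the diagnostic rule is meant to be removed once the test is done.

```apache
RewriteEngine On

# Temporary diagnostic rule: any path containing "foobar" should be
# redirected to /widget.html. Remove it once the test is done.
RewriteRule foobar http://www.example.com/widget.html [R=301,L]

# ... existing index.html and hostname rules follow here ...
```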

pyaar

5:02 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



Thanks. Tried that and got the expected outcome. When my request had foobar in the path, I did get the 404 page with the address bar displaying /widget.html.

pyaar

12:07 pm on Aug 19, 2014 (gmt 0)

10+ Year Member



Ack, going to support was like pulling teeth unfortunately, with a lot of muddled attempts at adding Redirect 301s (at one point support added one Redirect 301 for every sub-page!), but absolutely no improvement. Finally my request was escalated and they indicated they are not familiar with advanced Apache rewrite rules and that I should check my code with a good programmer! lol. He said he is certain it is not an issue with the Apache configuration though, because the basic redirect rules are working. And in their view, an index.html redirect to the home was "impossible."

Lucy, I don't suppose I can send you a private mail and have you take a peek at my site? Feel free to say no. I don't want to needlessly burden you as I know it's already taken up some of your time. :)

wilderness

12:41 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



pyaar,
lucy charges 500 Euros an hour ;)

lucy24

3:26 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, wilderness, what a dreadful liar you are. You know very well I've never charged you more than 250 :)

pyaar, it is theoretically possible to use mod_alias (Redirect by that name) in place of any conditionless RewriteRule. Use RedirectMatch to enable Regular Expressions with captures. But this isn't good practice because in htaccess mod_rewrite always executes before mod_alias, so you will end up with things happening in the wrong order at best. At worst, things will get thoroughly messed up.

they are not familiar with advanced apache rewrite rules and that I should check my code with a good programmer! lol. He said he is certain it is not an issue with the Apache configuration though, because the basic redirect rules are working. And in their view, an index.html redirect to the home was "impossible."

Hm, this is the point where we start talking about the "change hosts" option, because that last sentence is manifestly nonsense. But he's correct on one point: we did establish that mod_rewrite is enabled for your site.

Does your htaccess currently contain redirects using mod_alias? If so, we'll need to change and consolidate them as noted above.
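
For illustration only, here is how one conditionless redirect could be written in either module. The /archive/ and /articles/ paths are hypothetical, and the sketch assumes the htaccess file sits in the site's document root. Only one form should be active; as noted above, mixing the two modules in one file is best avoided.

```apache
RewriteEngine On

# mod_rewrite form: in htaccess the pattern has no leading slash.
RewriteRule ^archive/(.*)\.html$ http://www.example.com/articles/$1.html [R=301,L]

# mod_alias form of the same redirect: the pattern is matched against the
# URL path, which begins with a slash.
# RedirectMatch 301 ^/archive/(.*)\.html$ http://www.example.com/articles/$1.html
```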

pyaar

6:00 pm on Aug 19, 2014 (gmt 0)

10+ Year Member



lol, 250 rupees and you have yourself a deal. :P

Does your htaccess currently contain redirects using mod_alias?


Lucy, naw, my .htaccess does not presently contain any redirects using mod_alias. Actually the rewrite rules I posted earlier represent the entirety of my htaccess file contents - so at present there is nothing else but the lines below (with my domain of course):

RewriteEngine on
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L,NS]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Use RedirectMatch to enable Regular Expressions with captures. But this isn't good practice because in htaccess mod_rewrite always executes before mod_alias, so you will end up with things happening in the wrong order at best. At worst, things will get thoroughly messed up.


That's good to know Lucy. So from that I gather one should preferably not have both mod_rewrite and mod_alias in the htaccess file. In my situation, could I actually replace both of the RewriteRules with RedirectMatch expressions, so that I am exclusively using mod_alias? Or would the non-www to www redirect not be achievable through a RedirectMatch?

lucy24

10:45 pm on Aug 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



mod_alias (Redirect by that name) doesn't have conditions, and it looks only at the "path" of a request. So if you need to look at any other element-- such as your domain name-- you need mod_rewrite.

If you comment-out the last ruleset-- the RewriteRule with preceding RewriteCond that deals with HTTP_HOST --does the remaining rule (the index.html redirect) still not work?

We did the widget/foobar experiment earlier, right? That confirms one other thing: For RewriteRules to work in htaccess, you need to set the FollowSymLinks option. But normally this will be set in the config file on any site that allows mod_rewrite within htaccess. (Not for Apache reasons, but because if they didn't, the host's technical support would be flooded with questions from people whose RewriteRules didn't work.) So no need to think about that, either.
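
The experiment could be set up roughly as follows, reusing the rules already posted in this thread. The Options line is an assumption on my part: it is normally unnecessary, and on hosts that do not allow Options to be overridden in htaccess it will produce a 500 error, in which case simply remove it.

```apache
# Usually already set by the host; shown here only for completeness.
Options +FollowSymLinks
RewriteEngine On

# The index.html rule, tested on its own.
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L,NS]

# Hostname ruleset temporarily commented out for the test:
# RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```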

pyaar

12:23 am on Aug 20, 2014 (gmt 0)

10+ Year Member



Hiya,

After commenting out the final ruleset, the index.html redirect still does not work unfortunately.

I did try, just for kicks, adding Options +FollowSymLinks before RewriteEngine On, but it did not produce any change in behavior.

If I can't achieve the redirect, I will probably settle for at least adding link rel canonical to my index.html file and other files. I've lowered my expectations, lol.