Forum Moderators: phranque


Removing www doesn't work on requests for a dir w/no trailing slash

www.example.com/dir/ works, www.example.com/dir doesn't


MichaelBluejay

5:52 am on Nov 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm trying to remove the www from requests to directory, via the .htaccess file in that directory, but it doesn't work if the request doesn't have a trailing slash.

Requests for www.example.com/dir/ are properly changed to example.com/dir/.

But requests for www.example.com/dir get changed to example.com/dir//home/username/example.com/dir

Here's my code in /home/user/example.com/dir/.htaccess:

RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/dir/$1 [R=permanent,L]

I know I could inherit the rule from the .htaccess at the root level, but there are lots of rules in the main .htaccess file that I don't want to run one level down.

Any ideas?

jdMorgan

3:51 pm on Nov 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It looks like you may not have a choice: it appears that an internal rewrite in the root executes first, then your external redirect kicks in, and so the internal filepath is being exposed as a URL.

At the least, you need *all* external redirects to execute before *any* internal rewrites, and this may make it necessary to do all domain and URL canonicalization in the root .htaccess (or better still, at the per-server config level).

When viewed over-all, with all subdirectory .htaccess files taken into consideration, you want all external redirects first, in order from most-specific to least-specific, followed by any internal rewrites, again in order from most- to least-specific. This prevents unintended operation and avoids exposing internally-rewritten filepaths to the client as URLs.
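To make the ordering concrete, a root .htaccess laid out per that principle would look something like this sketch (the internal-rewrite line is a hypothetical placeholder, not your actual rules):

```apache
RewriteEngine on

# External redirects first: canonicalize the hostname for every request
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule (.*) http://example.com/$1 [R=permanent,L]

# Internal rewrites last, ordered most-specific to least-specific
# (hypothetical example rule)
RewriteRule ^widgets/([0-9]+)$ /widgets.php?id=$1 [L]
```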

Also, note that the [NC] in your RewriteCond should be removed, so that hostnames with casing errors are also redirected. However, this will necessitate the addition of another RewriteCond, in case '/dir' is already present in the URL:


RewriteCond $1 !^dir

Note also that in cases like this one, having MultiViews enabled can cause trouble. If you don't use content-negotiation, then disable MultiViews -- See the core "Options" directive.
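Assembled, the subdirectory ruleset would then look something like this sketch -- [NC] removed, the extra RewriteCond added so an existing '/dir' prefix isn't doubled, and MultiViews forced off (assuming your host permits Options in .htaccess):

```apache
# /home/user/example.com/dir/.htaccess
Options -MultiViews

RewriteEngine on
# Redirect any non-canonical hostname (case-sensitive match, no [NC])
RewriteCond %{HTTP_HOST} !^example\.com$
# ...but only if '/dir' isn't already in the captured path
RewriteCond $1 !^dir
RewriteRule (.*) http://example.com/dir/$1 [R=permanent,L]
```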

Jim

MichaelBluejay

8:10 am on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, jdMorgan, though I'm afraid I understood very little of what you said.

First, I tried to just use "RewriteOptions inherit" in /dir/.htaccess, but then requests for "www.example.com/dir/page.html" get redirected to "example.com/page.html". (The "www" is removed successfully, but the directory is improperly removed from the path.) My code in example.com/.htaccess is:

RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=permanent,L]

I thought maybe I needed to add a RewriteBase to /example.com/dir/.htaccess, but none of the following solved the problem, whether RewriteBase was placed above or below RewriteOptions inherit:

RewriteBase dir
RewriteBase /dir
RewriteBase dir/
RewriteBase /dir/

I thought maybe I could solve the original problem if I had dir/.htaccess add the missing slash, but when I added the code to do that, I had the original problem (the internal filepath winds up in the address bar).

I'm willing to pursue either adding the missing slash so the www-removal code can work properly, or inheriting the main .htaccess rules, if I can get either to work. How do you suggest I proceed?

TheMadScientist

12:28 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have you tried the following:

1.) Remove the Inherit option from the sub dir.
2.) Place the following in the root .htaccess:

RewriteEngine on
RewriteRule ^dir - [L]

3.) Place all your rules, including canonicalization in the /dir/.htaccess.

* For this to work the way I'm thinking, you need all your rules for the sub-dir (including the internal rewrites) in the sub-dir .htaccess, and the files you'd like to access (including via rewrite) in the sub-dir, so you might need to move a couple of files around if you are rewriting to the root. I don't know how practical this is for you, but if you isolate the directory and the files it requires in the sub-dir, you should be able to eliminate the sub-dir from the main .htaccess and get the desired result. IOW: it's not quite ideal, but it could be a fairly easy fix if you can't figure out exactly how to fix it any other way.
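Sketched end-to-end, the isolation approach looks something like this (root file first, then the subdirectory file; the internal rewrite at the bottom is a hypothetical placeholder):

```apache
# /home/user/example.com/.htaccess -- pass /dir requests through untouched
RewriteEngine on
RewriteRule ^dir - [L]

# /home/user/example.com/dir/.htaccess -- all rules for the sub-dir live here
RewriteEngine on
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule (.*) http://example.com/dir/$1 [R=permanent,L]
# Internal rewrites for files inside /dir go below (hypothetical example):
# RewriteRule ^page$ page.php [L]
```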

jdMorgan

1:16 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a comment... There is nothing seriously wrong with this/your code, and it should work just fine in example.com/.htaccess:

RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [R=permanent,L]

The only real-but-minor error here is the use of [NC] on the hostname pattern. Any other differences between this code and the code that I have deployed on hundreds of servers is down to 'style' and has nothing to do with effect.

Your code for use in the subdirectory (in example.com/dir/.htaccess) is also correct, with the same reservations as noted above:


RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/dir/$1 [R=permanent,L]

So, you have --somewhere-- another bad rule, script, Option setting, or other directive that is incorrectly "removing the directory-level" for these requests. Rather than *adding code* to try to fix this problem, you should be trying to find the root cause of the problem and to correct that -- Remove the error, rather than try to add something to cover it up.

Jim

TheMadScientist

1:53 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I've actually had this happen once before and I can't remember what I finally did to fix it or I would have posted... I know I used a work-around as a solution until I found and resolved the issue, but I can't remember exactly what it was. (It happened about 3 years ago.)

To fix it I would recommend looking at (and adjusting) the little things in an isolated directory (or on a test server)....

I usually start by removing (commenting out) everything, except what I know works (like basic canonicalization) and then adding in one ruleset at a time until it 'breaks' again. It *can* have to do with server configuration too.

Here's one I ran into the other day... This is an .htaccess for an extensionless site I'm building that runs off the root and requires a login. It's the *entire* file except for Expires and Cache-Control.

It looks like it should work to me, but this version did not, and it took me an hour to figure out why and fix it. (The error pages did not show.) I didn't figure out what was wrong until I requested one of the error pages directly, and doing so exposed the requested file path.

##### ### #####

Options -Indexes +FollowSymLinks
ErrorDocument 403 /ErrorDocs/403.php
ErrorDocument 404 /ErrorDocs/404.php
ErrorDocument 410 /ErrorDocs/410.php

RewriteEngine on
RewriteRule \.(txt|gif|jpg|css|ico|js)$ - [L]
RewriteRule ^ErrorDocs/403|404|410\.php$ - [L]

RewriteCond %{THE_REQUEST} \?
RewriteRule .? http://www.example.com/? [R=301,L]

RewriteCond %{THE_REQUEST} !^[A-Z]{2,6}\ /(Page1|Dir/Page1|Page2|Page3)?\ HTTP/1
RewriteRule .? - [F]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule .? http://www.example.com/? [R=301,L]

RewriteRule ^Page1$ /Page1.php [L]
RewriteRule ^Dir/Page1$ /Dir/Page1.php [L]
RewriteRule ^Page2$ /Page2.php [L]
RewriteRule ^Page3$ /Page3.php [L]

##### ### #####

Adding the ErrorDocs directory to the exclusions in the THE_REQUEST condition did not work. (It was the first thing I tried.)

The file path exposed was /cgi-bin/php-cgi/ErrorDocs/404.php

This did not work:
RewriteRule ^cgi-bin/php-cgi/ErrorDocs/403|404|410\.php$ - [L]

This did not work:
RewriteRule php-cgi/ErrorDocs/403|404|410\.php$ - [L]

This did not work:
RewriteRule ^ErrorDocs/403|404|410\.php$ - [L]

This worked:
RewriteRule ErrorDocs/403|404|410\.php$ - [L]

Maybe Jim can explain it to me, because I can't figure out why I cannot detect the file path -- or at least start-anchor the rule -- if the full file path is not present for mod_rewrite to match in the root .htaccess. But my main point in posting the preceding is to highlight the 'little things' you sometimes need to look at to get everything working together correctly. If I remember correctly, it was some silly little 'obscure' nuance that took me quite a while to find when I ran into the same issue you are having.

jdMorgan

2:03 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since error documents are served in the context of the current/original request, you won't be able to detect (or exclude) an error document request using THE_REQUEST unless that request is sent directly by the client.

You'd need to exclude those pages separately, testing a variable that *does* get updated by internal rewrites, such as %{REQUEST_URI} or %{REQUEST_FILENAME}.

[added] To clarify/expand, since the concept of 'request context' isn't widely-understood: When an error occurs, the server internally rewrites the request to an error document, so there is no client redirect. Therefore, the client still shows the originally-requested URL in its address bar, no new client HTTP request is invoked, and therefore, THE_REQUEST remains as it was -- containing the originally-requested URL-path, not the error document path. [/added]
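As a sketch of that approach (my assumption of how it would be written, not tested here): let the error documents through with a separate rule that tests REQUEST_URI, which -- unlike THE_REQUEST -- does get updated by internal rewrites. Note that in a RewriteCond, %{REQUEST_URI} carries its leading slash, so the pattern must account for it:

```apache
# Pass error-document requests through, ahead of any blocking rules;
# REQUEST_URI reflects the internally-rewritten path, leading slash included
RewriteCond %{REQUEST_URI} ^/ErrorDocs/
RewriteRule .? - [L]
```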

No feedback so far on the OP's MultiViews setting here, so I'm 'idling' on this one.

Jim

TheMadScientist

2:23 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the info...

I actually tried %{REQUEST_URI} !^ErrorDocs preceding %{THE_REQUEST}, but it did not work either... I did not think to try %{REQUEST_FILENAME} though.

The more interesting thing to me is the inability to detect the path or start anchor the rule, but that's probably a question for another thread, so I'll just leave it at mildly confused and will know better next time.

MichaelBluejay

2:32 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks again, jdMorgan. Here's where I'm at.

(1) I don't see why I can't use [NC] to do a case-insensitive match on the hostname pattern, but in any event, I removed it, and I still have the same problem.

(2) I've never turned on MultiViews, and never even heard of it.

(3) I took your advice and simplified everything to an extreme, but I still have the same problem. The problem is exactly as I described it in the original post even if I have *no* .htaccess file at the root level, and even if the *only* code in /dir/.htaccess is:

RewriteEngine on
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule (.*) http://example.com/dir/$1 [R=permanent,L]

If you think this should work, can you confirm that it does indeed work for you?

Thanks, -MBJ-

jdMorgan

2:57 pm on Nov 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It works for me on hundreds of servers, as I said. Given that several people have likely copied the code I've posted here, there may be thousands, but I can vouch for only a few hundred myself.

The reason you don't want [NC] is that the whole point of the rule is to redirect if the requested hostname is anything other than the canonical one -- which is, by definition, exact and singular. An upper- or mixed-case variant of the hostname is still a variant, and therefore not canonical.

If you exclude uppercase or mixed-case hostnames from redirection (as [NC] would), then every other rule and script on your site that looks at hostnames would also have to make allowances for these non-canonical variants. Removing [NC] makes the rule say, "If the hostname is anything but exactly example.com, then redirect." This is of no concern with normal browsers, which almost always force requested hostnames to lowercase, but it does matter with badly-coded 'bots, many of which are malicious.

Again, beyond that, your code is fine, and you need to be looking elsewhere... If you haven't heard of MultiViews/content-negotiation, then with all respect, I suggest a stop at Apache.org, and a review of the Options directive in Apache core, and an overview of the mod_negotiation module for your server version.

If you've got MultiViews enabled because of your server's default config settings, that could be the *only* cause of your problem, and looking into that (or just forcing it off) is a far better use of your time than waiting for us to 'guess' what your problem might be.

Jim

MichaelBluejay

8:06 am on Nov 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, jdMorgan. I believe you that it's a settings issue, but this isn't my specialty and I'm at a loss as to how to continue. I found documentation about Content Negotiation, but it might as well have been in Greek. From what I could gather, if that were the culprit then I should have seen either Multiviews or something in the Options line in my httpd.conf file. But my httpd.conf file doesn't contain the string "MultiViews", and the Options directive in httpd.conf doesn't look like it's a problem:

Options Includes Indexes SymLinksIfOwnerMatch ExecCGI

So I've stripped my .htaccess files down to nothing, and I don't see anything bad in the httpd.conf file. Should I be looking at other files? Or should I be looking for something else in httpd.conf?

MichaelBluejay

4:55 am on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I had an unusual resolution to this problem: My webhost has an option in their webpanel to either force or remove the www. When I originally signed up with them they didn't have this option, so I used my own code. So I just turned on the option to remove the www, and that fixed my original problem. I then looked at httpd.conf to see what code they added, and it's a little different:

RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]
RewriteRule ^(.*) http://example.com$1 [R=301]

They're matching the wrong host rather than negative-matching the desired host. But whatever, it works.

I still don't know what setting in httpd.conf or elsewhere was causing my original code to fail, but since my problem is solved, that's good enough.

jdMorgan

12:48 pm on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, they're matching a string that is only 'sort of' the wrong host.

That pattern matches "www<anycharacter>example<anycharacter>com" in all uppercase or lowercase variations.
It does not match if FQDN-format is requested (a period following "com").
It does not match if a port number is appended.
It does not match if any of a possibly-infinite number of "wrong" subdomains is requested (e.g. "naff.example.com").

... which is why we use a negative-match when possible.

In addition, if any internal rewrite is invoked after this rule, then this rule will expose the rewritten filepath as a URL, giving you an instant duplicate-content problem of a different sort. This is because the [L] flag is missing from this rule.

IOW, this solution is quite flawed, and I wouldn't use it either. You'd be better off leaving the www/non-www problem as-is than using this code.

Ask your host to fix the problem, or start looking for a host where simple and correct mod_rewrite code works correctly.

Jim

g1smd

3:19 pm on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nasty code:

RewriteCond %{HTTP_HOST} ^www.example.com$ [NC]

should be:

RewriteCond %{HTTP_HOST} !^www\.example\.com$

You'll also need an L with that R=301.
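Assembled, the corrected pair would read as below -- a sketch combining that condition with the [L]-flagged redirect. Note this form canonicalizes *to* the www hostname; for the non-www preference discussed earlier in the thread, swap the hostnames in both lines:

```apache
# Redirect any hostname that isn't exactly www.example.com
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```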

Hosting companies seem to produce some of the most broken code on the web.

jdMorgan

4:07 pm on Nov 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



should be:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

as previously discussed, but it doesn't work. And that's the problem.

Jim