I'm having a hard time with a rewrite rule. I want to redirect any URL that does not begin with a particular directory name. I am using this as my rule:
RewriteCond %{HTTP_HOST} ^.*mydomain\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteRule (.*) http://www.mydomain.com/SOME_DIR/ [R,L]
The result is that any URL entered that does not begin with [mydomain.com...] gets redirected to the SOME_DIR index page. The problem is that invalid URLs are now returning 302 status codes where they were returning 404 before the above rule. For example:
http://www.mydomain.com/SOME_DIR/bogus.html
Thanks!
Welcome to WebmasterWorld!
You'll need to add a check for 'file exists' then. Something like this:
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteCond /SOME_DIR%{REQUEST_FILENAME} -f
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
File-exists checking is inefficient, and so should be done only when necessary -- as the *last* RewriteCond in this example.
I changed the redirect to a 301 to avoid the well-known problems with Google's handling of 302s, and cleaned up a few more instances of unnecessary regular-expression tokens, like ^.* and the unused back-reference, in the interest of efficiency.
In some cases, it is necessary to use the construct
RewriteCond %{DOCUMENT_ROOT}/SOME_DIR%{REQUEST_FILENAME} -f
If you have trouble
A major problem with this technique is that the expansion of the file-exists check is invisible. So if it doesn't work, it's hard to figure out why. There's a good possibility that I've got a slash in the wrong place, for example, in which case the resulting malformed URL-path will never exist, and the rule will always be applied.
So, for a temporary test, you can copy the path into a query string, where you can see it in your browser, in order to reveal what is being tested for existence:
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteCond /SOME_DIR%{REQUEST_FILENAME} -f [OR]
RewriteCond /SOME_DIR%{REQUEST_FILENAME} !-f
RewriteRule .* http://www.example.com/SOME_DIR/?tested-path=/SOME_DIR%{REQUEST_FILENAME} [R=301,L]
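Another way to see what is being tested is mod_rewrite's own log. This is only a sketch, and it assumes that the rules live in the main server configuration (these logging directives are not allowed in .htaccess) and that the server is Apache 2.2 or earlier. The log path is a placeholder.

```apache
# Assumption: server config or <VirtualHost> context, Apache 2.2 or earlier.
# The log file location is a placeholder; adjust it for your system.
RewriteLog /var/log/apache/rewrite.log
# Higher levels are more verbose (0 disables logging, 9 is the maximum).
# Set this back to 0 after testing, because logging slows the server down.
RewriteLogLevel 3
```

On Apache 2.4 these two directives were removed; the equivalent there is "LogLevel alert rewrite:trace3", with the output going to the error log.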
For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].
Jim
Thanks for the quick reply! I'm still having some problems, and perhaps you can clarify my thinking on this. Consider my original rewrite rule (which you cleaned up for me):
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
http://www.example.com/SOME_DIR/bogus.htm
I actually only need to catch one particular 404 (I should have mentioned this in my original message, but didn't want to complicate things.) The three lines above do exactly what I want, but they break a Java applet on my site. Because of a bug in Java 1.5, the applet will always send a request for:
http://www.example.com/SOME_DIR/META-INF/services/javax.xml.parsers.DocumentBuilderFactory
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteCond %{REQUEST_URI} !^DocumentBuilderFactory$
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
I would think that the above code would not rewrite any URL that contains "DocumentBuilderFactory", but, here I am! If there is an explicit way to force any URL that contains "DocumentBuilderFactory" to return a 404, that would be perfect.
By the way, I'd like to thank you for all your efforts on this site. I know that you hear only the problems people are having most of the time. I'd like to say that lurking on your forums has helped me solve all the many other rewrite rule questions I've had. It is much appreciated!
[edited by: jdMorgan at 8:02 pm (utc) on Oct. 4, 2005]
[edit reason] Example.com [/edit]
Because your URL
http://www.example.com/SOME_DIR/META-INF/services/javax.xml.parsers.DocumentBuilderFactory
has a local URL-path that starts with "SOME_DIR", it should not be affected by your code. Adding the special exclusion is not necessary.
However, if it *were* necessary, your pattern would not work because you have start-anchored it with "^". Since the REQUEST_URI in this case would be "/SOME_DIR/META-INF/services/javax.xml.parsers.DocumentBuilderFactory" it would not *start* with "DocumentBuilderFactory" and so the start-anchored pattern would never match.
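To make the anchoring point concrete, here is a small sketch of the three variants side by side. Only one of them would be used at a time, so the other two are commented out.

```apache
# Anchored at both ends: the pattern can match only when the whole
# REQUEST_URI is exactly "DocumentBuilderFactory". REQUEST_URI always
# begins with "/", so the pattern never matches, the negated condition
# is always true, and nothing is excluded.
# RewriteCond %{REQUEST_URI} !^DocumentBuilderFactory$

# Unanchored: excludes any URL-path that contains the string anywhere.
RewriteCond %{REQUEST_URI} !DocumentBuilderFactory

# End-anchored only: excludes just those URL-paths that end with the string.
# RewriteCond %{REQUEST_URI} !DocumentBuilderFactory$
```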
Because you are seeing 'strange' behaviour associated with that special URL, I suspect that it has been aliased or proxied to another URL, and that a 302 redirect is being applied by *some other code*. If it was being affected by the code I posted, you'd be seeing a 301, not a 302.
As a test, you could try rewriting that request to a known-non-existent URL-path, thus creating a 404:
RewriteRule ^SOME_DIR/META-INF/services/javax\.xml\.parsers\.DocumentBuilderFactory$ /abc_123_this_here_file_will_never_be.hmtl [L]
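If the server happens to be Apache 2.2 or later (an assumption; the version has not been mentioned), the R flag also accepts non-redirect status codes, so the same test can be written without inventing a dummy file name. A sketch, using the same per-directory pattern style as the rule above:

```apache
# Assumption: Apache 2.2 or later. With a status outside the 300-399 range
# the substitution is dropped and rewriting stops, so "-" serves as a placeholder.
RewriteRule ^SOME_DIR/META-INF/services/javax\.xml\.parsers\.DocumentBuilderFactory$ - [R=404,L]
```

Either form of the test would need to be placed ahead of the redirect rule so that it is evaluated first.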
The other possibility is that the applet is making an internal file request, not an HTTP request. I don't know. But in that case, no .htaccess code will have any effect, because .htaccess only applies to HTTP requests.
Oh, and be sure to flush your browser cache before testing any change to access-control code.
Jim
Again, thanks!
Because your URL [...] has a local URL-path that starts with "SOME_DIR" it should not be affected by your code. Adding the special exclusion is not necessary.
http://www.example.com/SOME_DIR/META-INF/services/javax.xml.parsers.DocumentBuilderFactory
If it was being affected by the code I posted, you'd be seeing a 301, not a 302.
The other possibility is that the applet is making an internal file request, not an HTTP request.
By the way, I am not putting any of this code in an .htaccess file. It is directly in the VirtualHost directive of my httpd.conf file, where all my other (working) rewrites are. I wouldn't think this would matter, though.
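One place where the context can matter, offered as a general mod_rewrite note rather than a diagnosis of this case: in httpd.conf or a VirtualHost section, the URL-path that RewriteRule matches against keeps its leading slash, whereas in .htaccess the directory prefix (including that slash) is stripped first. A pattern written in .htaccess style, such as the suggested test rule beginning with ^SOME_DIR/, would need a leading slash to match here. A sketch, with the VirtualHost address and ServerName as placeholders:

```apache
<VirtualHost *:80>
    # Placeholder server name; use the real one.
    ServerName www.example.com

    RewriteEngine On
    # Server context: the pattern sees "/SOME_DIR/...", so it starts with "^/".
    RewriteRule ^/SOME_DIR/META-INF/services/javax\.xml\.parsers\.DocumentBuilderFactory$ /abc_123_this_here_file_will_never_be.hmtl [L]
</VirtualHost>
```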
I'm really glad that you think this should work the way I think it should. At least that means I have a real problem, and not just a stupid regular expression error! If I could impose to ask you just one other question: since we both think that the code above shouldn't make a difference for requests starting with "/SOME_DIR", but it does, is there any "quick fix" way to return a 404 status any time a particular URL is requested? Again, I don't need 404 to work on any other URLs...just this one in particular, so my applets stop freaking out.
Because you are seeing 'strange' behaviour associated with that special URL, I suspect that it has been aliased or proxied to another URL, and that a 302 redirect is being applied by *some other code*.
This behaviour is not unique to this one URL. If I comment out my code, typing in any random, non-existent resource path returns a 404 properly:
http://www.example.com/SOME_DIR/blah_blah_blah.htm
...etc.
If I uncomment my code, the above (completely bogus) URL redirects to:
http://www.example.com/SOME_DIR/
...which is fine, except for one particular request, which MUST return a 404 when it doesn't exist because of a Java bug. That's the only thing that makes this particular URL special...a human can just be redirected to my main page when they type an invalid location, but that one nonexistent URL has to return 404. That's why I don't care whether the solution explicitly returns a 404 for that one URL, or whether all nonexistent resources return 404 properly.
RewriteCond %{REQUEST_URI} !DocumentBuilderFactory
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
Jim
Still no dice. Because there is no other rewrite code (my httpd.conf is about as plain as it gets), I'm not sure what's going on. I have a new theory, though. Is it possible that the original rewrite rule is working correctly, but because the error documents do not start with SOME_DIR, they are being rewritten? Here's how I'm picturing the flow:
(1) A user requests http://www.example.com/SOME_DIR/bogus.htm, which does not exist.
(2) The request begins with /SOME_DIR, so the rewrite rule does not apply.
(3) The 404 error page is fetched, but the URL for that page does not begin with /SOME_DIR, so the rewrite rule catches it and rewrites it to a 30X.
I think I'm probably wrong about this, because then the 30X would probably be rewritten in the same way and cause a loop. I'm grasping at straws, though. I can't understand why the rule is redirecting /SOME_DIR/nonexistent.htm to /SOME_DIR...it should leave it alone!
Thanks for all your help on this.
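One thing that may be worth checking here, offered as a possible explanation rather than a confirmed diagnosis: the pattern ^/SOME_DIR$ is anchored at both ends, so it matches only the exact path /SOME_DIR. A request for /SOME_DIR/bogus.htm does not match it, which makes the negated condition true, and the redirect fires. A sketch of a looser exclusion that also covers everything beneath the directory:

```apache
RewriteCond %{HTTP_HOST} example\.com [NC]
# Exclude "/SOME_DIR" itself and anything under "/SOME_DIR/".
# The (/|$) alternation still lets look-alikes such as "/SOME_DIRX" be redirected.
RewriteCond %{REQUEST_URI} !^/SOME_DIR(/|$)
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
```

With the whole subtree excluded, a request for a missing file under /SOME_DIR/ is never touched by the rule and falls through to the server's normal 404 handling.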
If you suspect your ErrorDocuments are being rewritten, add (a) RewriteCond(s) to exclude them.
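As an illustration of that exclusion, suppose the custom error page were served from /errors/404.html. That location is made up; nothing posted so far says where the error documents live, or whether custom ones are configured at all. The extra condition might then look like this, with the other lines unchanged:

```apache
# Hypothetical error page location.
ErrorDocument 404 /errors/404.html

RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} !^/SOME_DIR$
# Added: skip the error documents so they are served with their original status.
RewriteCond %{REQUEST_URI} !^/errors/
RewriteRule .* http://www.example.com/SOME_DIR/ [R=301,L]
```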
The quickest way to master this subject is to experiment and make lots of mistakes, and get lots of 500-Server Error responses... and then fix them. :)
Once you've got it working, then a quick review to optimize the code is all that'll be needed.
Jim
Maybe you can suggest an alternative solution to the problem I originally intended to solve. I have a Web site at:
http://www.example.com/SOME_DIR/
This site exists entirely within SOME_DIR. The URL is printed in a book, so I can't move it around (I can't simply bump SOME_DIR's contents up to the root folder.)
Here is what I want to accomplish:
(1) There is no index page at http://www.example.com, so I want requests for that URL to be redirected permanently to http://www.example.com/SOME_DIR/
(2) People keep finding new ways to misspell SOME_DIR, or use mixed case letters (SoME_DiR), etc. I was using symbolic links to handle the most common misspellings, but I'd like any request that does not start with SOME_DIR to be redirected permanently.
(3) I need 404 errors to be correctly generated for resources that do not exist under SOME_DIR even after the solutions for #1 and #2 are applied.
I thought that the rules I proposed would solve my problem. They did solve #1 and #2, but created a problem for #3. Is there a better way to do this?
Thanks!
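A minimal sketch of one way the three goals could fit together. It assumes the rules sit in the VirtualHost section (so the pattern sees the leading slash) and that nothing else, custom error pages included, needs to be served from outside SOME_DIR:

```apache
RewriteEngine On

RewriteCond %{HTTP_HOST} example\.com [NC]
# Goals 1 and 2: the root URL, misspellings, and mixed-case variants all fail
# this case-sensitive test (note: no NC flag on the rule) and are redirected.
# Goal 3: "/SOME_DIR" and anything under "/SOME_DIR/" is skipped, so missing
# files there receive the server's ordinary 404.
RewriteRule !^/SOME_DIR(/|$) http://www.example.com/SOME_DIR/ [R=301,L]
```

The leading "!" negates the RewriteRule pattern, which stands in for the separate REQUEST_URI condition.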
http://www.example.com
Then I wrote a rewrite ruleset that handles the most common misspellings of the subdirectory and redirects the request to the root directory. Up to now, I've been using SOME_DIR in my posts, but I'll need to be more specific here so my rule makes sense. The subdirectory is named "CE06"...that's C-E-Zero-Six. People frequently misspell this as C-E-O-Six (letter Oh substituted for zero), or use lower case letters. Further, this URL was misprinted as CT06 at one source. Here are my rules:
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{REQUEST_URI} ^/C[ET][0O]6(.*)$ [NC]
RewriteRule .* http://www.example.com%1 [R=301,L]
Thanks, Jim, for all your help!
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteRule ^/C[ET][0O]6(.*)$ /$1 [NC,L]
Even if you do keep the redirect, note that the pattern (and [NC] flag) can be moved into RewriteRule itself, eliminating the second RewriteCond.
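For instance, the redirect version might collapse to something like the following sketch. It assumes server-level configuration, as the poster described, where the pattern keeps its leading slash:

```apache
RewriteCond %{HTTP_HOST} example\.com [NC]
# $1 is the RewriteRule back-reference; it replaces %1, which referred to
# the group captured by the RewriteCond that has been removed.
RewriteRule ^/C[ET][0O]6(.*)$ http://www.example.com$1 [R=301,NC,L]
```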
Jim
Best regards,
Steve