Redirect with Exclusions

using htaccess to redirect a domain using exclusions

         

cucumberdesign

8:42 am on Mar 13, 2009 (gmt 0)

10+ Year Member



Hi guys,

I have written a rule that redirects any request for one of my domains to another domain, unless one of a few specific folders is requested, in which case it should not redirect.

Due to the 2 domains both working off the same root folder I have put a condition on the host header to make sure that the rule only works on the specific domain.

On my testing server it works correctly and redirects (I comment out the host header condition there), but when I put it live it doesn't work.

Here is the rule.


# redirect all .co.uk requests to .com except the exclusions
RewriteCond %{HTTP_HOST} ^(www\.)?example\.co\.uk$
RewriteCond %{REQUEST_URI} !^/(software|include)(/?|/.*)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The rule was working, but caused some redirect loops on the .com domain and now it no longer redirects in the live environment.

To test that the htaccess file was in fact working, I added some additional rules which work fine, so I have eliminated the possibility of htaccess not working.

Also as mentioned, if I comment out the first rule and test on my local server it works perfectly.

Any ideas?

Thanks

cucumberdesign

1:58 pm on Mar 13, 2009 (gmt 0)

10+ Year Member



Problem was the server not the file. After a DNS flush and restart the redirects worked.

Thanks.
Byron

jdMorgan

5:41 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Glad you got it fixed!

Jim

g1smd

8:33 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Note that ^(.*)$ simplifies to (.*) here.

Additionally, I believe that everything after ...include) could be deleted.
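
For readers following along, here is a rough sketch of what the rule from the first post might look like with both suggestions applied. It assumes RewriteEngine is already switched on elsewhere in the same .htaccess file; the host names and folder names are the ones from the original question.

```apache
# redirect all .co.uk requests to .com except the exclusions
RewriteCond %{HTTP_HOST} ^(www\.)?example\.co\.uk$
# Shortened exclusion: skips any path that merely STARTS with /software or /include
RewriteCond %{REQUEST_URI} !^/(software|include)
# Unanchored catch-all pattern, as suggested above
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

One caveat with the shortened condition: it also excludes paths such as /software-old or /includes from the redirect. If only the exact folders should be excluded, ending the pattern with (/|$) instead of deleting the tail keeps that restriction.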

Caterham

9:59 pm on Mar 13, 2009 (gmt 0)

10+ Year Member



Note that ^(.*)$ simplifies to (.*) here.

Why do you want to remove the anchor and let PCRE decide if (.*) can be treated as anchored or not?

jdMorgan

2:14 pm on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally, I leave the anchoring off all stand-alone ".*" patterns on the assumption that two fewer characters to parse is a good thing (especially in .htaccess, where code is interpreted rather than pre-compiled as it is at the server-config level). Since the default behaviour of the maximally-greedy-and-promiscuous ".*" pattern is to match anything and everything, and since I have not actually reviewed the POSIX regex library source code or set up a performance benchmark comparison, I think it's a reasonable approach.

However, better-informed opinions are *always* welcome here, and if "making the regex library decide" does result in slower execution, I'll happily go back and re-anchor all my ".*" patterns.

As should be obvious, the vast majority of Webmasters reading here are limited to .htaccess context on name-based shared servers. So that is our usual "default context." Where differences exist between the per-dir context and the server-config context, we do like to try to make those differences explicit in the discussion.

Jim

Caterham

4:52 pm on Mar 15, 2009 (gmt 0)

10+ Year Member



PCRE, the regEx engine used since Apache 2.0 (srclib/pcre directory in Apache's source), has some functions to optimize a regular expression at compile time. One is to set the PCRE_ANCHORED flag if the regEx wasn't compiled with it explicitly.

The regEx is automatically anchored by PCRE if
- .* was used and the regEx is compiled with PCRE_DOTALL and, if it's capturing, there's no backreference like \1 within the pattern
- ^ was used and the regEx is not compiled with PCRE_MULTILINE
and in some other cases.

Benchmarking is not easy... Some time ago I ran ApacheBench three times in a row without changing anything, and the results differed by more than what I'd call tolerance. Comparing those results against a modification that doesn't make a massive difference therefore seems difficult.

I wouldn't expect too much especially on fast machines but that doesn't mean that I'd remove ^ explicitly. The regEx is not compiled with PCRE_MULTILINE, so PCRE would set PCRE_ANCHORED.

ap_regcomp() does not call pcre_compile() with the option PCRE_DOTALL, i.e. you shouldn't get the optimization unless you anchor the pattern with ^ explicitly.
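
As a rough illustration of the two spellings being discussed (a sketch only; the target URL is the placeholder from the original question), the lines below are alternatives, not rules to use together. For a URL-path without newline characters both capture the whole path in $1; going by the explanation above, the difference lies only in whether PCRE can flag the pattern as anchored when it compiles it.

```apache
# Alternative 1: explicit anchors, so the compiled pattern can be treated as anchored
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Alternative 2: no anchors; same capture, but per the post above the automatic
# anchoring optimization would need PCRE_DOTALL, which ap_regcomp() does not set
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```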

I don't know what optimization POSIX extended (apache 1.3.x) does, but that engine is known to be slower than PCRE.

g1smd

6:12 pm on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What about usage in .htaccess, where it is interpreted, vs. usage in httpd.conf, where it is compiled?

Most people here seem to be using servers where this code is usually added to the .htaccess file.

Caterham

6:54 pm on Mar 15, 2009 (gmt 0)

10+ Year Member



The compilation itself is context-independent. The regEx is compiled when the configuration is read. For httpd.conf that happens once at server startup (twice on Windows) and, for .htaccess files, each time the .htaccess file is read. Reading the .htaccess file is done in the directory walk (map_to_storage phase). That is the point where a regEx used by a directive in a .htaccess file is compiled.

The execution (per-directory) of the merged directives by its corresponding modules (mod_rewrite -> fixup-phase) occurs at a later stage of processing.

If the request was denied (access_check phase), the regEx from the .htaccess file was compiled but never executed, since access_check runs prior to fixups.

[edited by: Caterham at 7:01 pm (utc) on Mar. 15, 2009]

Caterham

11:03 am on Mar 16, 2009 (gmt 0)

10+ Year Member



BTW: To be clearer, the performance issue with .htaccess files is not caused at runtime by a module in its registered hook(s), but within the directory walk, which searches for, reads and parses .htaccess files for each request (mostly the first main request; the dir_walk is cached per request). The parsing will invoke the module, of course, since the module provides the commands that say what to do with its directives when one is found in a configuration file. That is not the runtime execution ("applying rewrite rules") but the parsing of the argument line (building the rewrite list, compiling regular expressions, setting flags, ...).

At runtime, the module gets its merged configuration and can't distinguish if parts of the configuration were originally defined by directives in a .htaccess file or in a <Directory> section.
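
To make that last point concrete, here is a minimal sketch of the rule from the original question placed in a <Directory> section of the server config instead of a .htaccess file. The directory path is hypothetical, and the sketch assumes mod_rewrite is loaded and that you have access to httpd.conf, which is often not the case on shared hosting.

```apache
# httpd.conf: read and compiled when the server configuration is loaded
# "/var/www/example" is a hypothetical document root, not taken from the thread
<Directory "/var/www/example">
    # Ignore .htaccess files under this directory entirely
    AllowOverride None
    # Per-directory rewriting needs FollowSymLinks (or SymLinksIfOwnerMatch)
    Options +FollowSymLinks

    RewriteEngine On
    # redirect all .co.uk requests to .com except the exclusions
    RewriteCond %{HTTP_HOST} ^(www\.)?example\.co\.uk$
    RewriteCond %{REQUEST_URI} !^/(software|include)(/?|/.*)?$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
</Directory>
```

The directives are the same as in the .htaccess version. What changes is when they are parsed: once when the configuration is read, rather than during the directory walk of a request.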