RewriteCond and RewriteRule explanation needed for multiple domains

moroandrea

3:08 pm on Oct 28, 2010 (gmt 0)

10+ Year Member



Hi All,

I'm not an Apache expert and I rarely use redirect rules. I did my best to read the documentation and look for examples that would answer my question, but without success.

In my .htaccess file I implemented the following rules (I've pasted just the first four):

RewriteCond %{HTTP_HOST} ^andreamoro.eu$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.eu$
RewriteRule ^contactme\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/contactme\/" [R=301,L]

RewriteCond %{HTTP_HOST} ^andreamoro.eu$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.eu$
RewriteRule ^about\-andrea\-moro\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/about\-andrea\-moro\/" [R=301,L]

RewriteCond %{HTTP_HOST} ^andreamoro.co.uk$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.co.uk$
RewriteRule ^contattami\/?$ "http\:\/\/www\.andreamoro\.eu\/contattami\/" [R=301,L]

RewriteCond %{HTTP_HOST} ^andreamoro.co.uk$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.co.uk$
RewriteRule ^biografia\/?$ "http\:\/\/www\.andreamoro\.eu\/biografia\/" [R=301,L]

They work fine when written this way.

As there are two groups of conditions based on the incoming host, I read that it was possible to group the rules underneath the same condition. So I tried publishing the file as follows:

RewriteCond %{HTTP_HOST} ^andreamoro.eu$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.eu$
RewriteRule ^contactme\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/contactme\/" [R=301,L]
RewriteRule ^about\-andrea\-moro\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/about\-andrea\-moro\/" [R=301,L]

RewriteCond %{HTTP_HOST} ^andreamoro.co.uk$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.co.uk$
RewriteRule ^contattami\/?$ "http\:\/\/www\.andreamoro\.eu\/contattami\/" [R=301,L]
RewriteRule ^biografia\/?$ "http\:\/\/www\.andreamoro\.eu\/biografia\/" [R=301,L]

As soon as the file is live, the rules continue to work as intended, but each rule falls into an invisible redirect chain that you generally don't notice in a browser like Firefox.
I discovered it through Chrome and double-checked with a header checker, and it's true. Written this way, it takes 5 redirects to arrive at the final destination, and I can't work out why. Apart from any performance downside, this is very detrimental for SEO.

My website is relatively small, but I have another one where I need the above pattern with many host conditions, and that website requires 300 rules or more. If I need to repeat the conditions for each rule, the .htaccess file will become big and unmanageable, not to mention that it could negatively affect the performance of the server.

Can you give me a clue on this, please?

Cheers
Andrea

jdMorgan

4:00 pm on Oct 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteConds apply only to the *single* RewriteRule that follows them. Therefore in your latest code, the second RewriteRule in each "set" of two rules has no conditions, causing an 'infinite' loop that stops only when the browser or the server gives up.
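
To make that concrete, here is the grouped .eu block from the question with comments added to show which rule each condition attaches to. The comments are annotations only and the directives are unchanged; the sketch assumes, as the question implies, that both hostnames are served from the same .htaccess file.

```apache
# These two conditions are consumed by the first RewriteRule below and by nothing else
RewriteCond %{HTTP_HOST} ^andreamoro.eu$ [OR]
RewriteCond %{HTTP_HOST} ^www.andreamoro.eu$
RewriteRule ^contactme\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/contactme\/" [R=301,L]
# No RewriteCond sits directly above this rule, so it runs for every hostname,
# including www.andreamoro.co.uk, where it redirects the request to its own URL
RewriteRule ^about\-andrea\-moro\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/about\-andrea\-moro\/" [R=301,L]
```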

Note also that all of your rules would fail even if the RewriteCond problem was corrected, in the case where the perfectly-valid hostnames "andreamoro.co.uk.", "andreamoro.co.uk:80", "andreamoro.co.uk.:80", "www.andreamoro.co.uk.", "www.andreamoro.co.uk:80", or "www.andreamoro.co.uk.:80", were requested. This is because you've end-anchored your hostname patterns.

Furthermore, casing errors in the hostname, which are rare but possible, would also cause the redirect to be missed.

You should also take advantage of the power of regular expressions to reduce the number of required RewriteConds in your rules from two to one -- For example, using either
 RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ 

or
 RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk 

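in place of the two co.uk conditions would handle those hostname variants with a single RewriteCond; adding the [NC] flag, as in the rules further down, also covers casing errors.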

You've also got a lot of character-escaping where it is not needed, and missing character-escaping where it is needed. Referring to the (somewhat-simplified) regular-expressions reference in the mod_rewrite documentation would be a good idea.

Note that in general, only those characters which are "tokens" in regular-expressions patterns need to be escaped, and that generally, only the strings in the patterns (and not in the substitutions) need any escaping at all.
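
As an illustration of that rule of thumb, here is one possible cleanup of a single condition and rule from the question. It is only a sketch: it changes the escaping and nothing else, so the end-anchoring and casing points above still apply to it.

```apache
# Before (as posted): the hostname dots are unescaped, so each "." matches any character,
# while the slashes, the colon and the whole substitution are escaped needlessly
# RewriteCond %{HTTP_HOST} ^www.andreamoro.eu$
# RewriteRule ^contactme\/?$ "http\:\/\/www\.andreamoro\.co\.uk\/contactme\/" [R=301,L]

# After: only the regex token "." is escaped, and only in the pattern
RewriteCond %{HTTP_HOST} ^www\.andreamoro\.eu$
RewriteRule ^contactme/?$ http://www.andreamoro.co.uk/contactme/ [R=301,L]
```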

If you'd like to try to use a more-advanced and more-efficient approach, here's a method that accomplishes what you appear to want with far less code:

RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^contactme/?$ [OR]
RewriteCond $1 ^about-andrea-moro/?$
RewriteRule ^(([^/]+/)*[^.]+)$ http://www.andreamoro.co.uk/$1/ [R=301,L]
#
RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^contattami/?$ [OR]
RewriteCond $1 ^biografia/?$
RewriteRule ^(([^/]+/)*[^.]+)$ http://www.andreamoro.eu/$1/ [R=301,L]

This replaces your original four rules with just two, and reduces the total number of RewriteConds from eight to six as well.

I can't tell from your post whether you may want to add a lot more "pages," a lot more hostnames, or both to these rules. So this solution is a compromise between possible additional efficiency improvements and flexibility. In particular, the RewriteRule pattern shown here matches only "extensionless" URLs -- those which do not contain a period in the final path-part. This prevents the RewriteConds from having to be processed for most requests -- for example, for image, CSS, and JavaScript file requests.

However, if you do not have more "pages" to add, then these two rules could be made even more efficient by coding them as:

RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteRule ^(contactme|about-andrea-moro)/?$ http://www.andreamoro.co.uk/$1/ [R=301,L]
#
RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ [NC]
RewriteRule ^(contattami|biografia)/?$ http://www.andreamoro.eu/$1/ [R=301,L]

Which approach you take from here depends on whether you need to 'expand' these redirect rules and in what "direction" they need to be expanded -- that is, the number of additional hostnames versus the number of additional "pages" you might want to add.
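
If it is the number of pages that grows, one way to avoid repeating the hostname condition on every rule is mod_rewrite's skip flag, [S=n]. This is not something proposed in the posts above; it is a sketch, assuming both hostnames are served from the same .htaccess file, that inverts the hostname test and skips the whole group when the host does not match.

```apache
# If the host is NOT a variant of andreamoro.eu, skip the next two rules
RewriteCond %{HTTP_HOST} !^(www\.)?andreamoro\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteRule ^ - [S=2]
# These two rules are only reached for andreamoro.eu hostnames
RewriteRule ^contactme/?$ http://www.andreamoro.co.uk/contactme/ [R=301,L]
RewriteRule ^about-andrea-moro/?$ http://www.andreamoro.co.uk/about-andrea-moro/ [R=301,L]
```

The number in [S=2] must equal the number of rules in the group, so it has to be updated whenever a rule is added to or removed from that group.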

The resources cited in our Apache Forum Charter, and the example threads and tutorials in our Apache Library may prove useful to you. BTW, if you are familiar with general language theory, be aware that mod_rewrite has its roots firmly planted in "lexical rewriting" and that regular expressions form the base used in almost all machine-based language parsing. While mod_rewrite itself is quite specialized to Apache servers and the Web, its immediately-underlying foundations are not.

Jim

moroandrea

4:39 pm on Oct 28, 2010 (gmt 0)

10+ Year Member



Jim,
thank you very much for shedding light on this. I really appreciate it.

If you pass by London, please drop me an email and I'll buy you a beer.
You are the first person to give me an answer.

Cheers
Andrea

moroandrea

9:50 am on Oct 29, 2010 (gmt 0)

10+ Year Member



Hi Jim,

the rule you wrote works fine, and again thank you very much for the time you spent putting together the detailed explanation.

However, because I don't just want to implement a ready-made solution but also aim to learn (so that I can support others in the future), I hope you can find a little more time to help me with the doubts listed below:

You said that RewriteConds apply only to the rule immediately following them. According to the following example,

RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^contactme/?$ [OR]
RewriteCond $1 ^about-andrea-moro/?$

it looks like I can have multiple RewriteConds following one another. Is there a limit to this number?
Or can I add as many RewriteConds as necessary, marking each line that belongs to the same group with [OR]?

Still with reference to the above example, what does the placeholder $1 in the second RewriteCond contain?
I don't think it is a back-reference to the preceding hostname. If that were the case, I guess it should have been %1. Is this correct?

Is it a placeholder in the second RewriteCond that will contain the value of whichever condition is true?
i.e. RewriteCond $1 ^contactme/?$ [OR] ... here $1 will contain contactme if that condition matches. Is that correct?

Finally, suppose I wanted to redirect from a domain to a subdomain while preserving the query string. How would this be possible?

I.e. www.andreamoro.co.uk/blog-en/ to become en.andreamoro.com/blog/

As I have limited knowledge of URL rewriting, I don't know how to refer to a back-reference. I surrounded the TLD with parentheses in order to create a reference, but how can I specify which reference I want?
I'm a bit confused by % and $.

It looks like % refers to the references in the pattern, while $ refers to the references in the substitution. Is this correct?

Can something like the following work?

RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.(*)(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $2 ^eu$ [OR]
RewriteCond $2 ^co.uk$
RewriteRule ^blog-en/?$ [%2.andreamoro.com...] [R=301,L]

g1smd

10:29 am on Oct 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$1 comes from the RewriteRule pattern.
%1 comes from the RewriteCond pattern.

If there is more than one set of parentheses, count the left "(" characters to find the "number" for $1, $2, %1, %2, etc.

It may also help to know that the left side of the RewriteRule is evaluated before any of the RewriteCond patterns.
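
A small sketch of how the two kinds of back-reference are numbered. The target hostname www.example.com and its path layout are placeholders invented for the illustration, not part of the site discussed in this thread.

```apache
# RewriteCond groups, counted by their opening parenthesis:
#   %1 = "www." (or empty), %2 = "eu" or "co.uk", %3 = trailing dot and/or port (or empty)
RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.(eu|co\.uk)(\.|\.?:[0-9]+)?$ [NC]
# RewriteRule group: $1 = "contactme" or "biografia"
RewriteRule ^(contactme|biografia)/?$ http://www.example.com/%2/$1/ [R=301,L]
```

With these rules, a request for andreamoro.co.uk/biografia/ would be redirected to http://www.example.com/co.uk/biografia/.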

moroandrea

10:34 am on Oct 29, 2010 (gmt 0)

10+ Year Member



Hi G1smd,

thanks for the useful answer, but it leaves me confused when I try to read the following:

RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^contattami/?$ [OR]
RewriteCond $1 ^biografia/?$
RewriteRule ^(([^/]+/)*[^.]+)$ [andreamoro.eu...] [R=301,L]

Why does the second RewriteCond contain a $1? What does it refer to if there is no RewriteRule before it?

jdMorgan

2:18 pm on Oct 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Back-references $1 through $9 in RewriteConds refer to the parts of the requested URL-path matched by the first through the ninth parenthesized subpatterns in the RewriteRule that *follows* those RewriteConds.

You may have as many RewriteConds as you like on one RewriteRule, limited only by processing efficiency considerations.

As described in the Apache mod_rewrite documentation, processing goes something like this:
  1. Evaluate RewriteRule pattern. If no match, end processing of this rule and begin processing the next one.
  2. If the RewriteRule pattern matched, begin processing this rule's RewriteConds. Note that since the RewriteRule pattern has already been evaluated, the back-references $1 through $9 are available to these RewriteConds.
  3. RewriteConds can be ANDed, where all must be true to invoke the RewriteRule, they may be [OR]ed where any one (or more) must be true to invoke the RewriteRule, or there may be a mixture of ANDed and ORed RewriteConds. AND is the default logical operator, while OR can be specified using the [OR] flag. In the case of [OR]ed RewriteConds, the first one that is evaluated as "true" ends the RewriteCond processing, and any back-references created by that matched RewriteCond will be available for use in the RewriteRule substitution path (the new URL or filepath output by the RewriteRule).
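
The order described in these steps can be traced on the co.uk rule posted earlier in this thread. The directives below are unchanged from that post; the comments are added annotations marking the order in which each line is evaluated.

```apache
# 2nd: hostname test; it carries no [OR], so it is ANDed with the group that follows
RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ [NC]
# 3rd: $1 is already available because the RewriteRule pattern below was matched first
RewriteCond $1 ^contattami/?$ [OR]
# 4th: only evaluated if the preceding [OR]ed condition was false
RewriteCond $1 ^biografia/?$
# 1st: the pattern is matched against the URL-path and captures it as $1;
# last: the substitution is applied only if the conditions above passed
RewriteRule ^(([^/]+/)*[^.]+)$ http://www.andreamoro.eu/$1/ [R=301,L]
```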

So the above rule says:
'If the requested URL-path consists of any number of "directory levels" followed by a "filename" that does not contain any periods, then remember that entire URL-path AND
If the requested hostname is any valid variant of andreamoro.co.uk AND
If the requested URL-path is "contattami" OR "biografia"
THEN
Redirect the client to the same URL-path as originally requested by the client, but using the "www.andreamoro.eu" hostname instead of the originally-requested hostname.'

The term "url-path" refers to the part of the complete URL after the protocol (http) and the hostname (e.g. www.andreamoro.co.uk), and before any query string (denoted by a "?") or URL-fragment (denoted by a "#").

If you request http://google.com/search/?q=abc#123, then the protocol is "http", the hostname is "google.com", the URL-path is "/search/", the query string is "q=abc" and the URL-fragment is "123".
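
Because a RewriteRule pattern is matched against the URL-path only, the query string is not part of what the pattern sees. Here is a sketch for the domain-to-subdomain redirect asked about earlier in the thread. It relies on mod_rewrite's default behaviour of passing the original query string through when the substitution contains no "?" of its own; the en.andreamoro.com target is taken from that question, the ?page=2 query is a made-up example, and the rule is untested.

```apache
RewriteCond %{HTTP_HOST} ^(www\.)?andreamoro\.co\.uk(\.|\.?:[0-9]+)?$ [NC]
# "blog-en" or "blog-en/" on the old host goes to /blog/ on the subdomain;
# a request for /blog-en/?page=2 would carry ?page=2 over to the new URL
RewriteRule ^blog-en/?$ http://en.andreamoro.com/blog/ [R=301,L]
```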

Jim

moroandrea

4:30 pm on Oct 29, 2010 (gmt 0)

10+ Year Member



Jim,

thank you very much. This definitely clarifies my doubts.